
xAI Grok audio models now available on Vercel AI Gateway
Quick Take
xAI's audio models, including real-time voice (xai/grok-voice-think-fast-1.0), text-to-speech (xai/grok-tts), and speech-to-text (xai/grok-stt), are now available on Vercel AI Gateway. They integrate through AI SDK 7 and get the same routing, observability, and spend controls as other models on the gateway.
Key Points
- Models available: xai/grok-voice-think-fast-1.0, xai/grok-tts, xai/grok-stt.
- Real-time voice uses a server route that mints a short-lived token, so the API key never reaches the client.
- Text-to-speech can generate audio files in MP3 format.
- Speech-to-text transcribes audio recordings into text.
- An interactive playground lets you test the audio models directly in the browser.
xAI's audio models are now live on AI Gateway. Realtime voice, text to speech, and speech to text are all available through the AI SDK with the same routing, observability, and spend controls as your other models.
These capabilities are available in the AI SDK 7 release.
npm install ai @ai-sdk/react @ai-sdk/gateway
Available models
| Capability | Models |
|---|---|
| Realtime voice | xai/grok-voice-think-fast-1.0 |
| Text to speech | xai/grok-tts |
| Speech to text | xai/grok-stt |
Realtime
A voice agent has two pieces: a server route that mints a short-lived token, so your API key never reaches the client, and a browser component that connects with it.
Add the token route. This example sets model to xai/grok-voice-think-fast-1.0:
app/api/realtime/token/route.ts
import { gateway } from '@ai-sdk/gateway';

export async function POST() {
  const { token, url } = await gateway.experimental_realtime.getToken({
    model: 'xai/grok-voice-think-fast-1.0',
  });
  return Response.json({ token, url, tools: [] });
}
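The route above responds with a JSON body containing token, url, and tools. If you ever call that route by hand rather than through the hook, a small runtime check can catch a malformed response early. The following is a minimal sketch: RealtimeTokenResponse and isRealtimeTokenResponse are hypothetical names, and treating token and url as non-empty strings is an assumption that this article does not confirm.

```typescript
// Hypothetical shape of the JSON returned by the token route above.
// Assumption: token and url are strings; the article does not state their types.
interface RealtimeTokenResponse {
  token: string;
  url: string;
  tools: unknown[];
}

// Type guard: true only when all three expected fields are present and well-typed.
function isRealtimeTokenResponse(value: unknown): value is RealtimeTokenResponse {
  if (typeof value !== 'object' || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.token === 'string' && v.token.length > 0 &&
    typeof v.url === 'string' && v.url.length > 0 &&
    Array.isArray(v.tools)
  );
}
```

A caller could run this on the parsed response and refuse to open a connection when it returns false.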
Then connect from the browser. The useRealtime hook from @ai-sdk/react fetches that route and manages the WebSocket connection, microphone capture, and audio playback:
'use client';

import { experimental_useRealtime as useRealtime } from '@ai-sdk/react';
import { gateway } from '@ai-sdk/gateway';

// Inside a client component:
const { status, connect, startAudioCapture } = useRealtime({
  model: gateway.experimental_realtime('xai/grok-voice-think-fast-1.0'),
  api: { token: '/api/realtime/token' },
  sessionConfig: { turnDetection: { type: 'server-vad' } },
});

// Call connect(), then startAudioCapture(stream) to start talking.
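The call order in that last comment can be wrapped in a small helper. This is a sketch under assumptions: startTalking and VoiceControls are hypothetical names, the stream is assumed to be a microphone MediaStream, and whether connect() returns a promise that resolves once the session is open is not stated in this article, so the helper simply awaits whatever it returns.

```typescript
// Minimal structural type for the two hook functions used here.
// Assumption: their return types are not documented in this article, so they stay unknown.
interface VoiceControls<Stream> {
  connect: () => unknown;
  startAudioCapture: (stream: Stream) => unknown;
}

// Hypothetical helper: connect first, then acquire the audio stream and start capture.
// getStream is injected so the helper is not tied to a browser-only API.
async function startTalking<Stream>(
  controls: VoiceControls<Stream>,
  getStream: () => Promise<Stream>,
): Promise<void> {
  await controls.connect();
  const stream = await getStream();
  await controls.startAudioCapture(stream);
}
```

In a browser, getStream could be `() => navigator.mediaDevices.getUserMedia({ audio: true })`, usually triggered from a click handler so the microphone permission prompt follows a user gesture.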
Text to speech
Generate spoken audio from text with generateSpeech. This example uses xai/grok-tts, passes a voice and an output format, and writes the result to a file:
import { generateSpeech } from 'ai';
import { writeFile } from 'node:fs/promises';

const result = await generateSpeech({
  model: 'xai/grok-tts',
  text: 'Thanks for trying out AI Gateway.',
  voice: 'eve',
  outputFormat: 'mp3',
});

await writeFile('speech.mp3', result.audio.uint8Array);
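Before writing the bytes to disk, it can be useful to confirm they look like MP3 data. The sketch below is a heuristic only, and looksLikeMp3 is a hypothetical helper, not part of the AI SDK. It relies on two general facts about the format: MP3 files commonly start with an ID3v2 tag (the ASCII bytes "ID3") or with an MPEG frame sync (eleven set bits). The frame-sync pattern is shared by some other MPEG audio streams, so a true result is not proof.

```typescript
// Heuristic check: does this byte array start like an MP3 file?
// Accepts either an ID3v2 tag ("ID3") or an MPEG frame sync
// (0xFF followed by a byte whose top three bits are set).
function looksLikeMp3(bytes: Uint8Array): boolean {
  if (bytes.length < 3) return false;
  const hasId3Tag = bytes[0] === 0x49 && bytes[1] === 0x44 && bytes[2] === 0x33;
  const hasFrameSync = bytes[0] === 0xff && (bytes[1] & 0xe0) === 0xe0;
  return hasId3Tag || hasFrameSync;
}
```

With the example above, this would be called as `looksLikeMp3(result.audio.uint8Array)` before the writeFile call.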
Speech to text
Transcribe recordings into text with transcribe. This example uses xai/grok-stt:
import { transcribe } from 'ai';
import { readFile } from 'node:fs/promises';

const result = await transcribe({
  model: 'xai/grok-stt',
  audio: await readFile('audio.mp3'),
});

console.log(result.text);
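To handle several recordings, the single call above can be repeated in a loop. This is a minimal sketch: transcribeAll is a hypothetical helper, and the transcription step is passed in as a function so the helper makes no assumptions about the SDK beyond what the example shows.

```typescript
// Hypothetical helper: run a transcription function over several recordings, one at a time.
// transcribeOne is injected by the caller; it receives a file path and returns the text.
async function transcribeAll(
  paths: string[],
  transcribeOne: (path: string) => Promise<string>,
): Promise<Map<string, string>> {
  const results = new Map<string, string>();
  for (const path of paths) {
    // Sequential on purpose: one request in flight at a time.
    results.set(path, await transcribeOne(path));
  }
  return results;
}
```

Here transcribeOne could wrap the example above: read the file, pass it to transcribe with model 'xai/grok-stt', and return result.text.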
Playground
You can also try the xAI audio models in the AI Gateway playground. Open the models list and click into any of the models to use it in the browser. The xai/grok-voice-think-fast-1.0 playground lets you talk to the agent and see responses instantly.


More information
— Originally published at vercel.com

