This is our first generally available (GA) realtime model. It responds to audio and text inputs in real time over WebRTC, WebSocket, or SIP connections.
Specifications
Context: 32,000 tokens
Max output: 4,096 tokens
Input: text, audio, image
Output: text, audio
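The two token limits above can be checked before sending a request. The sketch below is illustrative only: the helper name `fits_limits` is hypothetical, and it assumes (this page does not say so) that input tokens and requested output tokens share the 32,000-token context window.

```python
# Limits as listed in the specifications above.
CONTEXT_WINDOW = 32_000
MAX_OUTPUT = 4_096


def fits_limits(input_tokens: int, max_output_tokens: int) -> bool:
    """Return True if a request stays within the listed limits.

    Hypothetical helper. Assumes input and output tokens share the
    context window; this is an assumption, not a documented rule.
    """
    if input_tokens < 0 or max_output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    # The requested output alone may not exceed the max output limit.
    if max_output_tokens > MAX_OUTPUT:
        return False
    # Input plus requested output must fit in the context window.
    return input_tokens + max_output_tokens <= CONTEXT_WINDOW


print(fits_limits(27_000, 4_096))  # True: 31,096 <= 32,000
print(fits_limits(30_000, 4_096))  # False: 34,096 > 32,000
```

Under this assumption, a request that uses the full 4,096-token output budget leaves at most 27,904 tokens for input.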
Pricing
Input: $4.00 × 1.1 per 1M tokens
Output: $16.00 × 1.1 per 1M tokens
Cached input: $0.50 × 1.1 per 1M tokens
Input audio: $32.00 × 1.1 per 1M tokens
Output audio: $64.00 × 1.1 per 1M tokens
Input image: $5.00 × 1.1 per 1M tokens
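To estimate the cost of a session, each token type can be billed at its own rate. The sketch below assumes the "× 1.1" factor is a multiplier applied to every listed base price (the page does not explain what it represents) and that prices are per one million tokens. The names `estimate_cost` and the dictionary keys are hypothetical labels chosen for illustration. `Decimal` is used to avoid floating-point rounding in currency math.

```python
from decimal import Decimal

# Assumption: the "× 1.1" shown next to each price multiplies the base price.
MARKUP = Decimal("1.1")
TOKENS_PER_UNIT = Decimal(1_000_000)  # prices are quoted per 1M tokens

# Base prices in USD per 1M tokens, as listed in the pricing section.
BASE_PRICE_PER_MTOK = {
    "input": Decimal("4.00"),
    "output": Decimal("16.00"),
    "cached_input": Decimal("0.50"),
    "input_audio": Decimal("32.00"),
    "output_audio": Decimal("64.00"),
    "input_image": Decimal("5.00"),
}


def estimate_cost(token_counts: dict[str, int]) -> Decimal:
    """Return the estimated USD cost for the given per-type token counts."""
    total = Decimal("0")
    for kind, tokens in token_counts.items():
        if kind not in BASE_PRICE_PER_MTOK:
            raise KeyError(f"unknown token type: {kind}")
        if tokens < 0:
            raise ValueError("token counts must be non-negative")
        # base price × markup × (tokens / 1M)
        total += BASE_PRICE_PER_MTOK[kind] * MARKUP * tokens / TOKENS_PER_UNIT
    return total


cost = estimate_cost({"input_audio": 10_000, "output_audio": 5_000, "input": 2_000})
print(cost)
```

In this example, 10,000 input-audio tokens and 5,000 output-audio tokens each cost $0.352 ($32.00 × 1.1 × 0.01 and $64.00 × 1.1 × 0.005), and 2,000 text input tokens add $0.0088, for about $0.7128 in total. Audio dominates the bill even at modest token counts.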