This is our first generally available realtime model. It responds to audio and text inputs in real time over WebRTC, WebSocket, or SIP connections.
Specifications
Context: 32,000
Max Output: 4,096
Input: text, audio, image
Output: text, audio
Performance (7-day Average)
Uptime
TPS
RURT
API Paths
/v1/realtime
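As a rough illustration of how the path above might be combined with a WebSocket endpoint, the sketch below only builds a URL string and opens no connection. The host `api.example.com`, the `model` query parameter, and the model name are placeholders assumed for the example; none of them come from this page, so check the provider's documentation for the real values.

```python
from urllib.parse import urlencode, urlunsplit

# Path taken from the "API Paths" listing.
REALTIME_PATH = "/v1/realtime"


def realtime_ws_url(host: str, model: str) -> str:
    """Build a wss:// URL for the realtime path.

    Both `host` and the `model` query parameter are hypothetical,
    used here only to show the shape of the URL.
    """
    query = urlencode({"model": model})
    return urlunsplit(("wss", host, REALTIME_PATH, query, ""))


if __name__ == "__main__":
    # Placeholder host and model name, not real values.
    print(realtime_ws_url("api.example.com", "example-realtime-model"))
```

A WebRTC or SIP client would need its own setup; this sketch covers only the WebSocket URL shape.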
Pricing
Input: $4.00 × 1.1 / MTokens
Output: $16.00 × 1.1 / MTokens
Cached Input: $0.50 × 1.1 / MTokens
Input Audio: $32.00 × 1.1 / MTokens
Output Audio: $64.00 × 1.1 / MTokens
Input Image: $5.00 × 1.1 / MTokens
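To make the pricing arithmetic concrete, here is a minimal sketch that estimates the cost of a request from token counts. It assumes that "× 1.1" is a multiplier applied to the listed base price and that "MTokens" means one million tokens. The dictionary keys and the function name `estimate_cost_usd` are hypothetical, chosen for this example only.

```python
from decimal import Decimal

# Assumption: "× 1.1" on the pricing lines multiplies the base price.
MULTIPLIER = Decimal("1.1")

# Base prices in USD per million tokens, copied from the pricing list.
# The key names are illustrative, not official usage-field names.
BASE_USD_PER_MTOKEN = {
    "input": Decimal("4.00"),
    "output": Decimal("16.00"),
    "cached_input": Decimal("0.50"),
    "input_audio": Decimal("32.00"),
    "output_audio": Decimal("64.00"),
    "input_image": Decimal("5.00"),
}

TOKENS_PER_MTOKEN = Decimal(1_000_000)


def estimate_cost_usd(usage: dict[str, int]) -> Decimal:
    """Sum (base price × multiplier × tokens / 1,000,000) over each token kind."""
    total = Decimal("0")
    for kind, tokens in usage.items():
        rate = BASE_USD_PER_MTOKEN[kind] * MULTIPLIER
        total += rate * Decimal(tokens) / TOKENS_PER_MTOKEN
    return total


if __name__ == "__main__":
    # Example: 10,000 input audio tokens and 5,000 output audio tokens.
    print(estimate_cost_usd({"input_audio": 10_000, "output_audio": 5_000}))
```

Under these assumptions, 10,000 input audio tokens cost 32.00 × 1.1 × 0.01 = $0.352, and 5,000 output audio tokens cost 64.00 × 1.1 × 0.005 = $0.352, for a total of $0.704. `Decimal` is used instead of floats to avoid rounding noise in currency amounts.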