A cost-efficient version of GPT Realtime, capable of responding to audio and text inputs in real time over WebRTC, WebSocket, or SIP connections.
Specifications
Context: 32,000 tokens
Max Output: 4,096 tokens
Input: text, audio, image
Output: text, audio
Performance (7-day Average)
Uptime
TPS
RURT
Pricing
Input: $0.60 × 1.1 / MTokens
Output: $2.40 × 1.1 / MTokens
Cached Input: $0.06 × 1.1 / MTokens
Input Audio: $10.00 × 1.1 / MTokens
Output Audio: $20.00 × 1.1 / MTokens
Input Image: $0.80 × 1.1 / MTokens
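
The pricing rows above can be turned into a rough cost estimate. The sketch below assumes that "×1.1" is a multiplier applied to the listed base price and that "MTokens" means one million tokens; neither is stated explicitly on this page. The names `BASE_USD_PER_MTOKEN`, `MULTIPLIER`, and `estimate_cost_usd` are hypothetical and exist only for this illustration.

```python
from decimal import Decimal

# Assumption: "×1.1" on the pricing rows is a multiplier on the base price.
MULTIPLIER = Decimal("1.1")

# Base prices in USD per one million tokens, copied from the pricing rows.
BASE_USD_PER_MTOKEN = {
    "input": Decimal("0.60"),
    "output": Decimal("2.40"),
    "cached_input": Decimal("0.06"),
    "input_audio": Decimal("10.00"),
    "output_audio": Decimal("20.00"),
    "input_image": Decimal("0.80"),
}


def estimate_cost_usd(usage):
    """Estimate the cost of a request from token counts per category.

    `usage` maps a category name (a key of BASE_USD_PER_MTOKEN) to a
    token count. Decimal is used to avoid binary floating-point drift.
    """
    total = Decimal("0")
    for category, tokens in usage.items():
        rate = BASE_USD_PER_MTOKEN[category] * MULTIPLIER
        total += rate * Decimal(tokens) / Decimal(1_000_000)
    return total


# Example: 10,000 text input tokens and 2,000 text output tokens.
example = estimate_cost_usd({"input": 10_000, "output": 2_000})
print(example)
```

Under these assumptions, text input costs $0.66 per million tokens and text output $2.64 per million, so the example request comes to a little over one cent.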