fireworks/models/llama-v3p3-70b-instruct

Common Name: Llama 3.3 70B Instruct

Fireworks
Released on Oct 16, 2025
Supported: Tool Invocation

Llama 3.3 70B Instruct is the December 2024 update of Llama 3.1 70B. It improves on Llama 3.1 70B (released July 2024) with advances in tool calling, multilingual text support, math, and coding. The model achieves industry-leading results in reasoning, math, and instruction following, and delivers performance comparable to Llama 3.1 405B at significantly lower latency and cost.
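Since the listing flags tool invocation as supported, here is a minimal sketch of what a tool-calling request body looks like in the OpenAI-compatible chat-completions format that Fireworks exposes. The `get_weather` tool and its schema are illustrative assumptions, not part of the listing.

```python
import json

# Sketch of a tool-calling request body for the OpenAI-compatible
# chat completions endpoint. The "get_weather" function and its
# parameter schema are made-up examples for illustration.
payload = {
    "model": "accounts/fireworks/models/llama-v3p3-70b-instruct",
    "messages": [
        {"role": "user", "content": "What's the weather in Paris right now?"}
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Look up the current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
}

print(json.dumps(payload, indent=2))
```

When the model decides to use a tool, the response's `tool_calls` field carries the function name and JSON-encoded arguments for your application to execute.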

Specifications

Context: 128K
Input: text
Output: text


Pricing

Input: $0.99 / M tokens
Output: $0.99 / M tokens
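At a flat $0.99 per million tokens for both input and output, per-request cost is a simple linear function of token counts. A quick sketch of that arithmetic, using token counts chosen purely for illustration:

```python
# Listed rates for Llama 3.3 70B Instruct on this provider.
INPUT_PRICE = 0.99   # USD per million input tokens
OUTPUT_PRICE = 0.99  # USD per million output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the listed rates."""
    return (input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE) / 1_000_000

# e.g. a 2,000-token prompt with a 500-token completion:
print(f"${request_cost(2_000, 500):.6f}")  # → $0.002475
```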


Similar Models

$1.10 in / $3.52 out per M tokens · 203K context

Z.ai's state-of-the-art mixture-of-experts model with 40B active parameters out of 744B total. Optimized for complex systems engineering and long-horizon agentic tasks, using Deepseek Sparse Attention for efficient long-context processing.

$1.49 in / $5.94 out per M tokens · 160K context

The 05/28 updated checkpoint of DeepSeek R1. Its overall performance now approaches that of leading models such as o3 and Gemini 2.5 Pro. Compared to the previous version, the upgraded model shows significant improvements on complex reasoning tasks, a reduced hallucination rate, enhanced support for function calling, and a better experience for vibe coding.

$0.66 in / $3.30 out per M tokens · 262K context

Kimi K2.5 is Moonshot AI's flagship agentic model and a new SOTA open model. It unifies vision and text, thinking and non-thinking modes, and single-agent and multi-agent execution into one model. Kimi K2.5 is a mixture-of-experts (MoE) language model with 1 trillion total parameters and a 262K context window.

$0.17 in / $0.66 out per M tokens · 128K context

gpt-oss-120b is part of OpenAI's gpt-oss series of open-weight models designed for powerful reasoning, agentic tasks, and versatile developer use cases. It targets production, general-purpose, high-reasoning workloads and fits on a single H100 GPU.