Qwen: Qwen3 32B

Text input Text output
Author's Description

Qwen3-32B is a dense 32.8B parameter causal language model from the Qwen3 series, optimized for both complex reasoning and efficient dialogue. It supports seamless switching between a "thinking" mode for tasks like math, coding, and logical inference, and a "non-thinking" mode for faster, general-purpose conversation. The model demonstrates strong performance in instruction-following, agent tool use, creative writing, and multilingual tasks across 100+ languages and dialects. It natively handles 32K token contexts and can extend to 131K tokens using YaRN-based scaling.

Key Specifications
Cost
$$$
Context
40K
Parameters
32B
Released
Apr 28, 2025
Speed
Ability
Reliability
Supported Parameters

This model supports the following parameters:

Stop Presence Penalty Temperature Seed Response Format Frequency Penalty Max Tokens Include Reasoning Tool Choice Top P Min P Tools Reasoning
Features

This model supports the following features:

Tools Reasoning Response Format
Performance Summary

Qwen3-32B demonstrates exceptional reliability, consistently providing usable responses with a 100th percentile ranking across benchmarks, indicating virtually no technical failures. While its speed tends to be slower, ranking in the 14th percentile, it offers competitive pricing, placing in the 57th percentile. The model excels in accuracy across several critical domains. It achieved perfect scores in both Ethics and General Knowledge, often being the most accurate model at its price point and among models of similar speed. Its performance in Coding (95.0% accuracy) and Reasoning (98.0% accuracy) is also very strong, again standing out for its accuracy relative to cost. Email Classification shows robust performance at 99.0% accuracy. The primary area for improvement is Instruction Following, where it achieved 55.7% accuracy, placing it in the 62nd percentile, suggesting room for enhancement in handling complex, multi-layered instructions. Overall, Qwen3-32B is a highly reliable and accurate model, particularly strong in knowledge-based and logical tasks, with competitive pricing, though its response times are generally longer.

Model Pricing

Current Pricing

Feature Price (per 1M tokens)
Prompt $0.1
Completion $0.3

Price History

Available Endpoints
Provider Endpoint Name Context Length Pricing (Input) Pricing (Output)
DeepInfra
DeepInfra | qwen/qwen3-32b-04-28 40K $0.1 / 1M tokens $0.3 / 1M tokens
Nebius
Nebius | qwen/qwen3-32b-04-28 40K $0.1 / 1M tokens $0.3 / 1M tokens
Lambda
Lambda | qwen/qwen3-32b-04-28 40K $0.1 / 1M tokens $0.3 / 1M tokens
Novita
Novita | qwen/qwen3-32b-04-28 40K $0.1 / 1M tokens $0.45 / 1M tokens
Parasail
Parasail | qwen/qwen3-32b-04-28 40K $0.018 / 1M tokens $0.072 / 1M tokens
GMICloud
GMICloud | qwen/qwen3-32b-04-28 32K $0.1 / 1M tokens $0.6 / 1M tokens
Nebius
Nebius | qwen/qwen3-32b-04-28 40K $0.2 / 1M tokens $0.6 / 1M tokens
Cerebras
Cerebras | qwen/qwen3-32b-04-28 131K $0.4 / 1M tokens $0.8 / 1M tokens
SambaNova
SambaNova | qwen/qwen3-32b-04-28 32K $0.4 / 1M tokens $0.8 / 1M tokens
Groq
Groq | qwen/qwen3-32b-04-28 131K $0.29 / 1M tokens $0.59 / 1M tokens
Friendli
Friendli | qwen/qwen3-32b-04-28 131K $0.15 / 1M tokens $0.5 / 1M tokens
Chutes
Chutes | qwen/qwen3-32b-04-28 40K $0.018 / 1M tokens $0.072 / 1M tokens
Benchmark Results
Benchmark Category Reasoning Free Executions Accuracy Cost Duration
Other Models by qwen