AllenAI: Olmo 3.1 32B Think

Text input Text output
Author's Description

Olmo 3.1 32B Think is a large-scale, 32-billion-parameter model designed for deep reasoning, complex multi-step logic, and advanced instruction following. Building on the Olmo 3 series, version 3.1 delivers refined reasoning behavior and stronger performance across demanding evaluations and nuanced conversational tasks. Developed by Ai2 under the Apache 2.0 license, Olmo 3.1 32B Think continues the Olmo initiative’s commitment to openness, providing full transparency across model weights, code, and training methodology.

Key Specifications
Cost
$$$$$
Context
65K
Parameters
32B
Released
Dec 16, 2025
Speed
Ability
Reliability
Supported Parameters

This model supports the following parameters:

Reasoning Structured Outputs Frequency Penalty Include Reasoning Top P Seed Min P Temperature Stop Response Format Max Tokens Presence Penalty Logit Bias
Features

This model supports the following features:

Structured Outputs Response Format Reasoning
Performance Summary

AllenAI: Olmo 3.1 32B Think demonstrates exceptional reliability, achieving a 99% success rate across benchmarks, indicating consistent and stable operation. However, its speed performance is a notable weakness, ranking in the 12th percentile with generally longer response times. Pricing is moderate, falling into the 28th percentile. The model exhibits strong performance in several key areas. It achieves high accuracy in Reasoning (90.0%, 79th percentile) and Coding (91.0%, 72nd percentile), aligning with its design for deep reasoning and complex logic. General Knowledge is also robust at 99.0% accuracy (64th percentile). Instruction Following shows solid capability at 64.6% accuracy (69th percentile). While its hallucination rate is relatively low at 91.5% accuracy (45th percentile for not hallucinating), there's room for improvement in consistently acknowledging uncertainty. Mathematics performance is average at 88.4% (57th percentile), and Ethics (98.0%, 38th percentile) and Email Classification (96.8%, 40th percentile) are competent but not top-tier. Its primary weakness lies in its slower processing speeds across all evaluated tasks.

Model Pricing

Current Pricing

Feature Price (per 1M tokens)
Prompt $0.15
Completion $0.5

Price History

Available Endpoints
Provider Endpoint Name Context Length Pricing (Input) Pricing (Output)
Parasail
Parasail | allenai/olmo-3.1-32b-think-20251215 65K $0.15 / 1M tokens $0.5 / 1M tokens
Benchmark Results
Benchmark Category Reasoning Strategy Free Executions Accuracy Cost Duration
Other Models by allenai