OpenAI: GPT-5.1-Codex

Text input Image input Text output
Author's Description

GPT-5.1-Codex is a specialized version of GPT-5.1 optimized for software engineering and coding workflows. It is designed for both interactive development sessions and long, independent execution of complex engineering tasks. The model supports building projects from scratch, feature development, debugging, large-scale refactoring, and code review. Compared to GPT-5.1, Codex is more steerable, adheres closely to developer instructions, and produces cleaner, higher-quality code outputs. Reasoning effort can be adjusted with the `reasoning.effort` parameter. Read the [docs here](https://openrouter.ai/docs/use-cases/reasoning-tokens#reasoning-effort-level) Codex integrates into developer environments including the CLI, IDE extensions, GitHub, and cloud tasks. It adapts reasoning effort dynamically—providing fast responses for small tasks while sustaining extended multi-hour runs for large projects. The model is trained to perform structured code reviews, catching critical flaws by reasoning over dependencies and validating behavior against tests. It also supports multimodal inputs such as images or screenshots for UI development and integrates tool use for search, dependency installation, and environment setup. Codex is intended specifically for agentic coding applications.

Key Specifications
Cost
$$$$$
Context
400K
Released
Nov 13, 2025
Speed
Ability
Reliability
Supported Parameters

This model supports the following parameters:

Seed Tools Structured Outputs Response Format Reasoning Include Reasoning Tool Choice Max Tokens
Features

This model supports the following features:

Structured Outputs Response Format Tools Reasoning
Performance Summary

OpenAI's GPT-5.1-Codex demonstrates competitive response times, ranking in the 54th percentile across benchmarks, and offers moderate pricing, placing it in the 23rd percentile. Notably, it exhibits exceptional reliability with a 100% success rate, indicating minimal technical failures. The model excels in several areas, achieving perfect accuracy in Hallucinations (Baseline), General Knowledge (Baseline), and Ethics (Baseline) tests, often at a competitive price point and speed within those categories. Its specialized coding capabilities are evident with a strong 95.0% accuracy in the Coding (Baseline) benchmark, placing it in the 91st percentile. Instruction Following (Baseline) is also a significant strength, with 85.9% accuracy (94th percentile). While strong in Reasoning (92.0% accuracy) and Mathematics (93.0% accuracy), its Email Classification (98.0% accuracy) performance, though high, is closer to the average for its category (58th percentile). Overall, GPT-5.1-Codex is a highly reliable and accurate model, particularly strong in coding, general knowledge, and ethical reasoning, making it well-suited for its intended agentic coding applications.

Model Pricing

Current Pricing

Feature Price (per 1M tokens)
Prompt $1.25
Completion $10
Input Cache Read $0.125

Price History

Available Endpoints
Provider Endpoint Name Context Length Pricing (Input) Pricing (Output)
OpenAI
OpenAI | openai/gpt-5.1-codex-20251113 400K $1.25 / 1M tokens $10 / 1M tokens
Azure
Azure | openai/gpt-5.1-codex-20251113 400K $1.25 / 1M tokens $10 / 1M tokens
Benchmark Results
Benchmark Category Reasoning Strategy Free Executions Accuracy Cost Duration
Other Models by openai