OpenAI: GPT-5.1-Codex

Text input Image input Text output
Author's Description

GPT-5.1-Codex is a specialized version of GPT-5.1 optimized for software engineering and coding workflows. It is designed for both interactive development sessions and long, independent execution of complex engineering tasks. The model supports building projects from scratch, feature development, debugging, large-scale refactoring, and code review. Compared to GPT-5.1, Codex is more steerable, adheres closely to developer instructions, and produces cleaner, higher-quality code outputs. Reasoning effort can be adjusted with the `reasoning.effort` parameter. Read the [docs here](https://openrouter.ai/docs/use-cases/reasoning-tokens#reasoning-effort-level) Codex integrates into developer environments including the CLI, IDE extensions, GitHub, and cloud tasks. It adapts reasoning effort dynamically—providing fast responses for small tasks while sustaining extended multi-hour runs for large projects. The model is trained to perform structured code reviews, catching critical flaws by reasoning over dependencies and validating behavior against tests. It also supports multimodal inputs such as images or screenshots for UI development and integrates tool use for search, dependency installation, and environment setup. Codex is intended specifically for agentic coding applications.

Key Specifications
Cost
$$$$$
Context
400K
Released
Nov 13, 2025
Speed
Ability
Reliability
Supported Parameters

This model supports the following parameters:

Reasoning Seed Tool Choice Structured Outputs Response Format Max Tokens Tools Include Reasoning
Features

This model supports the following features:

Structured Outputs Response Format Reasoning Tools
Performance Summary

GPT-5.1-Codex demonstrates competitive response times, ranking in the 52nd percentile across 8 benchmarks, indicating it performs among the faster models. Its pricing is moderate, placing it in the 21st percentile, suggesting it offers reasonable cost-effectiveness. The model exhibits exceptional reliability with a 100% success rate across all benchmarks, indicating minimal technical failures and consistent operational stability. A key strength is its perfect accuracy in Hallucinations, General Knowledge, and Ethics benchmarks, showcasing robust factual recall and ethical reasoning. Its specialized coding capabilities are evident with 95.0% accuracy in the Coding benchmark, placing it in the 93rd percentile. Instruction Following and Reasoning also show strong performance at 85.9% and 92.0% accuracy respectively. While Email Classification is solid at 98.0%, it's not a top-tier performance compared to other categories. The model's ability to adjust reasoning effort dynamically and integrate into developer environments for agentic coding applications further enhances its utility.

Model Pricing

Current Pricing

Feature Price (per 1M tokens)
Prompt $1.25
Completion $10
Input Cache Read $0.125

Price History

Available Endpoints
Provider Endpoint Name Context Length Pricing (Input) Pricing (Output)
OpenAI
OpenAI | openai/gpt-5.1-codex-20251113 400K $1.25 / 1M tokens $10 / 1M tokens
Benchmark Results
Benchmark Category Reasoning Strategy Free Executions Accuracy Cost Duration
Other Models by openai