OpenAI: GPT-4o Audio

Text input Audio input Text output Audio output
Author's Description

The gpt-4o-audio-preview model adds support for audio inputs as prompts. This enhancement allows the model to detect nuances within audio recordings and add depth to generated user experiences. Audio outputs are currently not supported. Audio tokens are priced at $40 per million input and $80 per million output audio tokens.

Key Specifications
Cost
$$$$$
Context
128K
Parameters
200B (Rumoured)
Released
Aug 14, 2025
Supported Parameters

This model supports the following parameters:

Top P Tool Choice Logit Bias Stop Tools Seed Temperature Max Tokens Structured Outputs Presence Penalty Frequency Penalty Response Format Logprobs Top Logprobs
Features

This model supports the following features:

Tools Structured Outputs Response Format
Model Pricing

Current Pricing

Feature Price (per 1M tokens)
Prompt $2.5
Completion $10

Price History

Available Endpoints
Provider Endpoint Name Context Length Pricing (Input) Pricing (Output)
OpenAI
OpenAI | openai/gpt-4o-audio-preview 128K $2.5 / 1M tokens $10 / 1M tokens
Benchmark Results
Benchmark Category Reasoning Strategy Free Executions Accuracy Cost Duration
Other Models by openai