OpenAI: GPT-4o Audio

Audio input Text input Text output
Author's Description

The gpt-4o-audio-preview model adds support for audio inputs as prompts. This enhancement allows the model to detect nuances within audio recordings and add depth to generated user experiences. Audio outputs are currently not supported. Audio tokens are priced at $40 per million input audio tokens.

Key Specifications
Cost
$$$$$
Context
128K
Parameters
200B (Rumoured)
Released
Aug 14, 2025
Supported Parameters

This model supports the following parameters:

Top Logprobs Stop Max Tokens Tool Choice Top P Frequency Penalty Structured Outputs Seed Response Format Logprobs Logit Bias Tools Temperature Presence Penalty
Features

This model supports the following features:

Tools Structured Outputs Response Format
Model Pricing

Current Pricing

Feature Price (per 1M tokens)
Prompt $2.5
Completion $10

Price History

Available Endpoints
Provider Endpoint Name Context Length Pricing (Input) Pricing (Output)
OpenAI
OpenAI | openai/gpt-4o-audio-preview 128K $2.5 / 1M tokens $10 / 1M tokens
Benchmark Results
Benchmark Category Reasoning Strategy Free Executions Accuracy Cost Duration
Other Models by openai