Google: Gemini 2.5 Flash Lite

Audio input File input Text input Image input Text output
Author's Description

Gemini 2.5 Flash-Lite is a lightweight reasoning model in the Gemini 2.5 family, optimized for ultra-low latency and cost efficiency. It offers improved throughput, faster token generation, and better performance across common benchmarks compared to earlier Flash models. By default, "thinking" (i.e. multi-pass reasoning) is disabled to prioritize speed, but developers can enable it via the [Reasoning API parameter](https://openrouter.ai/docs/use-cases/reasoning-tokens) to selectively trade off cost for intelligence.

Key Specifications
Cost
$$$
Context
1M
Released
Jul 22, 2025
Speed
Ability
Reliability
Supported Parameters

This model supports the following parameters:

Include Reasoning Stop Tool Choice Top P Temperature Seed Tools Structured Outputs Response Format Reasoning Max Tokens
Features

This model supports the following features:

Tools Reasoning Structured Outputs Response Format
Performance Summary

Gemini 2.5 Flash Lite consistently performs among the fastest models, ranking in the 86th percentile across six benchmarks, and offers competitive pricing, typically providing cost-effective solutions in the 68th percentile. Notably, it demonstrates exceptional reliability with a 100% success rate across all evaluated benchmarks, indicating minimal technical failures. In terms of specific performance, the model achieved perfect accuracy in both Ethics and Email Classification, standing out as the most accurate model at its price point and among models of comparable speed in these categories. It also showed strong performance in General Knowledge with 99% accuracy, again being the most accurate among models of similar speed. However, its performance in Instruction Following (58% accuracy), Coding (79% accuracy), and Reasoning (60% accuracy) was more moderate, suggesting areas where its "lite" nature, particularly with disabled multi-pass reasoning by default, might impact complex task execution. Its strength lies in rapid, accurate responses for well-defined tasks, making it highly suitable for applications prioritizing speed and cost efficiency.

Model Pricing

Current Pricing

Feature Price (per 1M tokens)
Prompt $0.1
Completion $0.4
Input Cache Read $0.025
Input Cache Write $0.183

Price History

Available Endpoints
Provider Endpoint Name Context Length Pricing (Input) Pricing (Output)
Google
Google | google/gemini-2.5-flash-lite 1M $0.1 / 1M tokens $0.4 / 1M tokens
Google AI Studio
Google AI Studio | google/gemini-2.5-flash-lite 1M $0.1 / 1M tokens $0.4 / 1M tokens
Benchmark Results
Benchmark Category Reasoning Free Executions Accuracy Cost Duration
Other Models by google