Google: Gemini 2.5 Flash Lite

Text input Image input File input Audio input Text output
Author's Description

Gemini 2.5 Flash-Lite is a lightweight reasoning model in the Gemini 2.5 family, optimized for ultra-low latency and cost efficiency. It offers improved throughput, faster token generation, and better performance across common benchmarks compared to earlier Flash models. By default, "thinking" (i.e. multi-pass reasoning) is disabled to prioritize speed, but developers can enable it via the [Reasoning API parameter](https://openrouter.ai/docs/use-cases/reasoning-tokens) to selectively trade off cost for intelligence.

Key Specifications
Cost
$$$
Context
1M
Released
Jul 22, 2025
Speed
Ability
Reliability
Supported Parameters

This model supports the following parameters:

Tools Structured Outputs Tool Choice Reasoning Include Reasoning Response Format Stop Seed Top P Max Tokens Temperature
Features

This model supports the following features:

Tools Reasoning Response Format Structured Outputs
Performance Summary

Google's Gemini 2.5 Flash Lite demonstrates strong performance as a lightweight reasoning model, excelling in speed and reliability. It consistently ranks among the fastest models, achieving the 89th percentile across benchmarks, and offers competitive pricing, typically falling within the 69th percentile. Notably, its reliability is exceptional, with a 100% success rate across all evaluated benchmarks, indicating minimal technical failures. The model exhibits perfect accuracy in Email Classification and Ethics, with both benchmarks also highlighting its efficiency as the most accurate model at its price point and among models of comparable speed. It shows strong general knowledge (99.0% accuracy), performing as the most accurate among models of similar speed. While its Instruction Following (58.0% accuracy) and Reasoning (62.0% accuracy) capabilities are moderate, its Hallucinations score (90.0% accuracy) suggests a good ability to acknowledge uncertainty. Coding (79.0% accuracy) and Mathematics (77.0% accuracy) show room for improvement compared to top-tier models. Its primary strength lies in its combination of high speed, cost-effectiveness, and perfect reliability, making it suitable for latency-sensitive and budget-conscious applications.

Model Pricing

Current Pricing

Feature Price (per 1M tokens)
Prompt $0.1
Completion $0.4
Input Cache Read $0.025
Input Cache Write $0.183

Price History

Available Endpoints
Provider Endpoint Name Context Length Pricing (Input) Pricing (Output)
Google
Google | google/gemini-2.5-flash-lite 1M $0.1 / 1M tokens $0.4 / 1M tokens
Google AI Studio
Google AI Studio | google/gemini-2.5-flash-lite 1M $0.1 / 1M tokens $0.4 / 1M tokens
Benchmark Results
Benchmark Category Reasoning Strategy Free Executions Accuracy Cost Duration
Other Models by google