Google: Gemini 2.5 Flash Lite Preview 06-17

Input modalities: audio, file, text, image. Output modality: text.
Author's Description

Gemini 2.5 Flash-Lite is a lightweight reasoning model in the Gemini 2.5 family, optimized for ultra-low latency and cost efficiency. It offers improved throughput, faster token generation, and better performance across common benchmarks compared to earlier Flash models. By default, "thinking" (i.e. multi-pass reasoning) is disabled to prioritize speed, but developers can enable it via the [Reasoning API parameter](https://openrouter.ai/docs/use-cases/reasoning-tokens) to selectively trade off cost for intelligence.
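
As a rough illustration, the sketch below enables thinking for a single call through OpenRouter's chat completions endpoint using that `reasoning` parameter. The field names and token budget are assumptions based on the linked documentation, not values taken from this page.

```python
import os
import requests

# Minimal sketch (assumed request shape): enable "thinking" for one request
# by passing a reasoning token budget. Omitting the "reasoning" block keeps
# the default latency-optimized behavior with thinking disabled.
response = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json={
        "model": "google/gemini-2.5-flash-lite-preview-06-17",
        "messages": [
            {"role": "user", "content": "How many primes are there below 50?"}
        ],
        "reasoning": {"max_tokens": 1024},  # assumed field per the Reasoning docs
    },
    timeout=60,
)
print(response.json()["choices"][0]["message"]["content"])
```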

Key Specifications
Cost: $$$
Context: 1M tokens
Released: Jun 17, 2025
Supported Parameters

This model supports the following parameters:

Include Reasoning, Stop, Tool Choice, Top P, Temperature, Seed, Tools, Structured Outputs, Response Format, Reasoning, Max Tokens
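
To show how a few of these parameters fit together, here is a hedged sketch combining `temperature`, `seed`, `max_tokens`, and a JSON-schema `response_format`; the schema and prompt are invented for the example.

```python
import os
import requests

# Illustrative request combining several supported parameters.
# The classification schema below is made up for this example.
payload = {
    "model": "google/gemini-2.5-flash-lite-preview-06-17",
    "messages": [
        {"role": "user", "content": "Classify this email: 'Your invoice is attached.'"}
    ],
    "temperature": 0.0,
    "seed": 42,
    "max_tokens": 256,
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "email_classification",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "category": {
                        "type": "string",
                        "enum": ["billing", "spam", "personal", "other"],
                    },
                    "confidence": {"type": "number"},
                },
                "required": ["category", "confidence"],
                "additionalProperties": False,
            },
        },
    },
}

response = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json=payload,
    timeout=60,
)
print(response.json()["choices"][0]["message"]["content"])
```
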
Features

This model supports the following features:

Tools, Reasoning, Structured Outputs, Response Format
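
For the Tools feature specifically, a minimal sketch might declare one function and let the model decide whether to call it; the `get_weather` tool below is hypothetical and exists only for illustration.

```python
import os
import requests

# Sketch of tool calling with a single hypothetical function definition.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical tool, for illustration only
            "description": "Look up the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

response = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json={
        "model": "google/gemini-2.5-flash-lite-preview-06-17",
        "messages": [{"role": "user", "content": "What's the weather in Berlin?"}],
        "tools": tools,
        "tool_choice": "auto",
    },
    timeout=60,
)
message = response.json()["choices"][0]["message"]
# The model may answer directly or return a structured tool call.
print(message.get("tool_calls") or message["content"])
```
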
Performance Summary

Gemini 2.5 Flash Lite Preview 06-17 performs strongly overall, with its clearest advantages in speed and reliability. It ranks among the fastest models, in the 77th percentile for speed across benchmarks, and is competitively priced, in the 66th percentile for cost-effectiveness. Reliability is exceptional, at the 100th percentile, indicating a highly stable service with minimal technical failures.

On benchmarks, the model is strongest in classification and knowledge-oriented tasks, scoring 99.0% on Email Classification, 98.5% on General Knowledge, and 99.0% on Ethics. Results are more moderate on Coding (78.5%), Reasoning (68.0%), and Instruction Following (60.0%), suggesting that adherence to complex, multi-step directives is its relative weak point.

The model's design prioritizes speed: with "thinking" disabled by default, it delivers rapid token generation and high throughput, making it well suited to latency-sensitive applications. Developers can still enable multi-pass reasoning for tasks that require more intelligence, at a trade-off in cost.

Model Pricing

Current Pricing

| Feature | Price (per 1M tokens) |
|---|---|
| Prompt | $0.10 |
| Completion | $0.40 |
| Input Cache Read | $0.025 |
| Input Cache Write | $0.183 |
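
To make the per-token arithmetic concrete, the short sketch below estimates the cost of one request from these rates; the token counts are made up, and cache pricing is ignored for simplicity.

```python
# Cost estimate from the per-1M-token rates above (illustrative token counts).
PROMPT_RATE = 0.10 / 1_000_000       # $ per prompt token
COMPLETION_RATE = 0.40 / 1_000_000   # $ per completion token

prompt_tokens = 12_000
completion_tokens = 800

cost = prompt_tokens * PROMPT_RATE + completion_tokens * COMPLETION_RATE
print(f"Estimated request cost: ${cost:.6f}")  # ≈ $0.001520
```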

Price History

Available Endpoints

| Provider | Endpoint Name | Context Length | Pricing (Input) | Pricing (Output) |
|---|---|---|---|---|
| Google | google/gemini-2.5-flash-lite-preview-06-17 | 1M | $0.10 / 1M tokens | $0.40 / 1M tokens |
| Google AI Studio | google/gemini-2.5-flash-lite-preview-06-17 | 1M | $0.10 / 1M tokens | $0.40 / 1M tokens |
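
Since the same model is served by two endpoints, a request can be pinned to a specific provider. OpenRouter documents a `provider` routing preference for this; the sketch below assumes that request shape and uses the provider name shown in the table above.

```python
import os
import requests

# Sketch of provider routing: prefer the Google AI Studio endpoint and
# disallow falling back to other providers. Field names follow OpenRouter's
# provider-routing documentation; treat them as assumptions, not guarantees.
response = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json={
        "model": "google/gemini-2.5-flash-lite-preview-06-17",
        "messages": [{"role": "user", "content": "Hello!"}],
        "provider": {"order": ["Google AI Studio"], "allow_fallbacks": False},
    },
    timeout=60,
)
print(response.json()["choices"][0]["message"]["content"])
```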