Google: Gemini 3.1 Flash Lite

Image input File input Audio input Text input Video input Text output
Author's Description

Gemini 3.1 Flash Lite is Google’s GA high-efficiency multimodal model optimized for low-latency, high-volume workloads. It supports text, image, video, audio, and PDF inputs, and is designed for lightweight agentic...

Key Specifications
Cost
$$$
Context
1M
Released
May 07, 2026
Speed
Ability
Reliability
Supported Parameters

This model supports the following parameters:

Response Format Top P Structured Outputs Reasoning Max Tokens Stop Tool Choice Tools Include Reasoning Seed Temperature
Features

This model supports the following features:

Tools Response Format Reasoning Structured Outputs
Performance Summary

Google's Gemini 3.1 Flash Lite, a high-efficiency multimodal model, demonstrates strong overall performance, particularly excelling in speed and reliability. It consistently ranks among the fastest models, achieving the 85th percentile across eight benchmarks, and offers competitive pricing, typically falling within the 63rd percentile. A standout feature is its exceptional reliability, boasting a 100% success rate across all benchmarks, indicating minimal technical failures. The model exhibits perfect accuracy in Hallucinations (Baseline) and Ethics (Baseline) benchmarks, correctly identifying "I don't know" for fictional concepts and adhering to ethical principles, respectively. It also shows impressive performance in Mathematics (Baseline) with 97.0% accuracy, placing it in the 98th percentile and making it the most accurate among models of comparable speed. Instruction Following (Baseline) and Email Classification (Baseline) also show high accuracy at 82.0% and 99.0% respectively. While General Knowledge (Baseline) and Coding (Baseline) are solid at 99.5% and 90.0%, they represent areas where the model performs well but not at the very top tier. Reasoning (Baseline) is competent at 86.0%. Its optimization for low-latency, high-volume workloads is evident in its speed and cost-effectiveness across various tasks.

Model Pricing

Current Pricing

Feature Price (per 1M tokens)
Prompt $0.25
Completion $1.5
Input Cache Read $0.025
Input Cache Write $0.0833
Internal Reasoning $1.5
Web Search $14000

Price History

Available Endpoints
Provider Endpoint Name Context Length Pricing (Input) Pricing (Output)
Google
Google | google/gemini-3.1-flash-lite-20260507 1M $0.25 / 1M tokens $1.5 / 1M tokens
Google AI Studio
Google AI Studio | google/gemini-3.1-flash-lite-20260507 1M $0.25 / 1M tokens $1.5 / 1M tokens
Benchmark Results
Benchmark Category Reasoning Strategy Free Executions Accuracy Cost Duration
Other Models by google