Google: Gemini 1.5 Flash 8B

Image input Text input Text output Unavailable
Author's Description

Gemini Flash 1.5 8B is optimized for speed and efficiency, offering enhanced performance in small prompt tasks like chat, transcription, and translation. With reduced latency, it is highly effective for real-time and large-scale operations. This model focuses on cost-effective solutions while maintaining high-quality results. [Click here to learn more about this model](https://developers.googleblog.com/en/gemini-15-flash-8b-is-now-generally-available-for-use/). Usage of Gemini is subject to Google's [Gemini Terms of Use](https://ai.google.dev/terms).

Key Specifications
Cost
$
Context
1M
Parameters
500B (Rumoured)
Released
Oct 02, 2024
Speed
Ability
Reliability
Supported Parameters

This model supports the following parameters:

Tools Stop Frequency Penalty Presence Penalty Top P Tool Choice Response Format Temperature Seed Structured Outputs Max Tokens
Features

This model supports the following features:

Response Format Structured Outputs Tools
Performance Summary

Google's Gemini 1.5 Flash 8B demonstrates exceptional performance, consistently ranking among the fastest models and offering highly competitive pricing across various benchmarks. Its reliability is outstanding, achieving a 100% success rate with minimal technical failures. Optimized for speed and efficiency, this model excels in tasks requiring low latency and cost-effectiveness. In terms of specific benchmarks, Gemini 1.5 Flash 8B shows strong performance in Hallucinations (98.0% accuracy), General Knowledge (97.0% accuracy), Ethics (98.0% accuracy), and Email Classification (98.0% accuracy). It is a speed champion in Hallucinations, Mathematics, and Reasoning, often delivering near-perfect accuracy at the highest speeds. Notably, it achieves the best accuracy-to-cost ratio in Mathematics. However, the model exhibits a significant weakness in Instruction Following, with 0.0% accuracy, and shows moderate performance in Mathematics (58.0% accuracy), Reasoning (42.0% accuracy), and Coding (80.0% accuracy) compared to its other strengths. Its core strength lies in its speed, cost-efficiency, and reliability, making it highly suitable for real-time and large-scale operations where these factors are critical.

Model Pricing

Current Pricing

Feature Price (per 1M tokens)
Prompt $0.0375
Completion $0.15
Input Cache Read $0.01
Input Cache Write $0.0583

Price History

Available Endpoints
Provider Endpoint Name Context Length Pricing (Input) Pricing (Output)
Google AI Studio
Google AI Studio | google/gemini-flash-1.5-8b 1M $0.0375 / 1M tokens $0.15 / 1M tokens
Benchmark Results
Benchmark Category Reasoning Strategy Free Executions Accuracy Cost Duration
Other Models by google