Google: Gemini 1.5 Flash 8B

Text input Image input Text output
Author's Description

Gemini Flash 1.5 8B is optimized for speed and efficiency, offering enhanced performance in small prompt tasks like chat, transcription, and translation. With reduced latency, it is highly effective for real-time and large-scale operations. This model focuses on cost-effective solutions while maintaining high-quality results. [Click here to learn more about this model](https://developers.googleblog.com/en/gemini-15-flash-8b-is-now-generally-available-for-use/). Usage of Gemini is subject to Google's [Gemini Terms of Use](https://ai.google.dev/terms).

Key Specifications
Cost
$$
Context
1M
Parameters
500B (Rumoured)
Released
Oct 02, 2024
Speed
Ability
Reliability
Supported Parameters

This model supports the following parameters:

Stop Presence Penalty Tool Choice Top P Temperature Seed Tools Structured Outputs Response Format Frequency Penalty Max Tokens
Features

This model supports the following features:

Tools Structured Outputs Response Format
Performance Summary

Google's Gemini 1.5 Flash 8B consistently ranks among the fastest models available, demonstrating exceptional speed across various benchmarks. It also offers highly competitive pricing, making it a cost-effective solution for a wide range of applications. The model exhibits outstanding reliability with a 100% success rate across all evaluated benchmarks, indicating minimal technical failures and consistent response delivery. In terms of specific performance, Gemini 1.5 Flash 8B shows strong capabilities in Email Classification (98% accuracy) and Ethics (98% accuracy), performing well within the top percentiles for these categories. Its General Knowledge performance is also commendable at 97% accuracy, notably being the most accurate model at its price point and ranking among the top three in speed for this category. While its Coding (Baseline) accuracy is moderate at 80%, its Reasoning (Baseline) accuracy is lower at 45%, suggesting an area for potential improvement in complex logical problem-solving. A significant weakness is observed in Instruction Following, where it achieved 0% accuracy, indicating a limitation in handling multi-layered or highly complex instructions. Despite this, its overall speed and cost-efficiency, coupled with high reliability, position it as a strong contender for real-time and large-scale operations, particularly for tasks like chat, transcription, and translation.

Model Pricing

Current Pricing

Feature Price (per 1M tokens)
Prompt $0.0375
Completion $0.15
Input Cache Read $0.01
Input Cache Write $0.0583

Price History

Available Endpoints
Provider Endpoint Name Context Length Pricing (Input) Pricing (Output)
Google AI Studio
Google AI Studio | google/gemini-flash-1.5-8b 1M $0.0375 / 1M tokens $0.15 / 1M tokens
Benchmark Results
Benchmark Category Reasoning Free Executions Accuracy Cost Duration
Other Models by google