Google: Gemma 4 26B A4B

Text input Image input Video input Text output
Author's Description

Gemma 4 26B A4B IT is an instruction-tuned Mixture-of-Experts (MoE) model from Google DeepMind. Despite 25.2B total parameters, only 3.8B activate per token during inference — delivering near-31B quality at a fraction of the compute cost. Supports multimodal input including text, images, and video (up to 60s at 1fps). Features a 256K token context window, native function calling, configurable thinking/reasoning mode, and structured output support. Released under Apache 2.0.

Key Specifications
Cost
$$
Context
262K
Parameters
26B
Released
Apr 03, 2026
Speed
Ability
Reliability
Supported Parameters

This model supports the following parameters:

Seed Structured Outputs Response Format Reasoning Temperature Presence Penalty Include Reasoning Tools Frequency Penalty Top P Stop Tool Choice Max Tokens Logit Bias
Features

This model supports the following features:

Structured Outputs Response Format Tools Reasoning
Performance Summary

Google's Gemma 4 26B A4B IT, an instruction-tuned Mixture-of-Experts (MoE) model, demonstrates strong performance across key metrics. It consistently ranks among the fastest models, placing in the 82nd percentile for speed, and offers competitive pricing, landing in the 78th percentile for cost-effectiveness. A standout feature is its exceptional reliability, achieving a 100% success rate across benchmarks, indicating minimal technical failures and consistent response delivery. In terms of specific capabilities, the model exhibits a high degree of accuracy in acknowledging uncertainty, scoring 98.0% on the Hallucinations (Baseline) benchmark. This indicates a strong ability to identify and appropriately respond to fictional concepts by selecting "I don't know," a crucial aspect for trustworthy AI. This performance is achieved at a cost of $0.0022 and a duration of 76933ms, both of which are competitive within their respective percentiles. Its architectural design, activating only 3.8B parameters out of 25.2B total, allows it to deliver near-31B quality at a significantly reduced computational cost. The model's support for multimodal input, a 256K token context window, native function calling, and configurable reasoning further enhance its versatility and utility.

Model Pricing

Current Pricing

Feature Price (per 1M tokens)
Prompt $0.13
Completion $0.4

Price History

Available Endpoints
Provider Endpoint Name Context Length Pricing (Input) Pricing (Output)
Parasail
Parasail | google/gemma-4-26b-a4b-it-20260403 262K $0.13 / 1M tokens $0.4 / 1M tokens
Novita
Novita | google/gemma-4-26b-a4b-it-20260403 262K $0.13 / 1M tokens $0.4 / 1M tokens
Benchmark Results
Benchmark Category Reasoning Strategy Free Executions Accuracy Cost Duration
Other Models by google