Z.AI: GLM 4.5 Air

Text input Text output Free Option
Author's Description

GLM-4.5-Air is the lightweight variant of our latest flagship model family, also purpose-built for agent-centric applications. Like GLM-4.5, it adopts the Mixture-of-Experts (MoE) architecture but with a more compact parameter size. GLM-4.5-Air also supports hybrid inference modes, offering a "thinking mode" for advanced reasoning and tool use, and a "non-thinking mode" for real-time interaction. Users can control the reasoning behaviour with the `reasoning` `enabled` boolean. [Learn more in our docs](https://openrouter.ai/docs/use-cases/reasoning-tokens#enable-reasoning-with-default-config)

Key Specifications
Cost
$$$$$
Context
131K
Released
Jul 25, 2025
Speed
Ability
Reliability
Supported Parameters

This model supports the following parameters:

Include Reasoning Temperature Tools Reasoning Max Tokens Tool Choice Top P
Features

This model supports the following features:

Tools Reasoning
Performance Summary

Z.AI's GLM-4.5-Air, a lightweight MoE model designed for agent-centric applications, demonstrates exceptional speed, consistently ranking among the fastest models across nine benchmarks. Its pricing is moderate, placing it in the 24th percentile. Reliability is a significant strength, with a 98% success rate indicating minimal technical failures. The model exhibits strong performance in several key areas. It achieves high accuracy in Reasoning (95.6%, 86th percentile), General Knowledge (99.5%, 75th percentile), and Mathematics (92.9%, 74th percentile), showcasing robust analytical and factual recall capabilities. Instruction Following also performs well at 68.7% accuracy (79th percentile). While its Hallucinations accuracy is 90.0%, placing it in the 35th percentile, it generally acknowledges uncertainty appropriately. A notable weakness is its Email Classification accuracy (80.0%, 9th percentile), suggesting room for improvement in nuanced categorization tasks. The model's "thinking mode" for advanced reasoning and tool use, alongside a "non-thinking mode" for real-time interaction, offers flexible deployment options.

Model Pricing

Current Pricing

Feature Price (per 1M tokens)
Prompt $0.2
Completion $1.1
Input Cache Read $0.03

Price History

Available Endpoints
Provider Endpoint Name Context Length Pricing (Input) Pricing (Output)
Z.AI
Z.AI | z-ai/glm-4.5-air 131K $0.2 / 1M tokens $1.1 / 1M tokens
DeepInfra
DeepInfra | z-ai/glm-4.5-air 131K $0.05 / 1M tokens $0.22 / 1M tokens
GMICloud
GMICloud | z-ai/glm-4.5-air 131K $0.05 / 1M tokens $0.22 / 1M tokens
SiliconFlow
SiliconFlow | z-ai/glm-4.5-air 131K $0.14 / 1M tokens $0.86 / 1M tokens
AtlasCloud
AtlasCloud | z-ai/glm-4.5-air 32K $0.05 / 1M tokens $0.22 / 1M tokens
Nebius
Nebius | z-ai/glm-4.5-air 131K $0.2 / 1M tokens $1.2 / 1M tokens
Novita
Novita | z-ai/glm-4.5-air 131K $0.13 / 1M tokens $0.85 / 1M tokens
Chutes
Chutes | z-ai/glm-4.5-air 131K $0.05 / 1M tokens $0.22 / 1M tokens
Benchmark Results
Benchmark Category Reasoning Strategy Free Executions Accuracy Cost Duration
Other Models by z-ai