Z.AI: GLM 4.5

Modalities: text input → text output
Author's Description

GLM-4.5 is our latest flagship foundation model, purpose-built for agent-based applications. It uses a Mixture-of-Experts (MoE) architecture and supports a context length of up to 128k tokens. GLM-4.5 delivers significantly enhanced capabilities in reasoning, code generation, and agent alignment. It supports a hybrid inference mode with two options: a "thinking mode" designed for complex reasoning and tool use, and a "non-thinking mode" optimized for instant responses. Users can control the reasoning behaviour with the `enabled` boolean of the `reasoning` parameter. [Learn more in our docs](https://openrouter.ai/docs/use-cases/reasoning-tokens#enable-reasoning-with-default-config)
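As a minimal sketch of toggling the two inference modes, the request body below sets the `reasoning.enabled` boolean described above. The exact schema is defined by the OpenRouter docs linked above; the `build_request` helper and prompt text here are illustrative, not part of the official API.

```python
import json

def build_request(prompt: str, thinking: bool) -> dict:
    """Assemble a chat-completions request body (illustrative sketch).

    thinking=True selects the "thinking mode" for complex reasoning and
    tool use; thinking=False selects the instant "non-thinking mode".
    """
    return {
        "model": "z-ai/glm-4.5",
        "messages": [{"role": "user", "content": prompt}],
        # The reasoning object carries the enabled boolean from the docs.
        "reasoning": {"enabled": thinking},
    }

body = build_request("Plan a three-step refactor.", thinking=True)
print(json.dumps(body, indent=2))
```

Sending this body to the chat-completions endpoint (with the usual auth headers) is all that is needed to switch modes per request.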

Key Specifications

Cost: $$$$$
Context: 131K
Released: Jul 25, 2025
Supported Parameters

This model supports the following parameters:

Response Format, Tools, Tool Choice, Reasoning, Include Reasoning, Temperature, Top P, Max Tokens
Features

This model supports the following features:

Reasoning, Tools, Response Format
Performance Summary

Z.AI's GLM-4.5, a flagship foundation model designed for agent-based applications, exhibits exceptional reliability with a 99% success rate, indicating minimal technical failures. However, it tends toward longer response times, ranking in the 10th percentile for speed, and sits at a premium price point, ranking in the 18th percentile for cost.

Despite its slower speed and higher price, GLM-4.5 performs strongly across several critical benchmarks. It achieves perfect accuracy in General Knowledge, making it the most accurate model at its price point and among models of comparable speed. It also excels in Instruction Following (85th percentile), Coding (82nd percentile), Reasoning (81st percentile), and Mathematics (87th percentile), showcasing robust capabilities in complex problem-solving and logical tasks.

Its Hallucinations accuracy is moderate at 94.0% (46th percentile), while its Ethics accuracy is high at 99.0%. A notable weakness is Email Classification: 93.0% accuracy, but only the 23rd percentile, suggesting room for improvement in this domain relative to other models. The hybrid inference mode, with its "thinking" and "non-thinking" options, offers flexibility for diverse application needs.

Model Pricing

Current Pricing

| Feature | Price (per 1M tokens) |
|---|---|
| Prompt | $0.60 |
| Completion | $2.20 |
| Input Cache Read | $0.11 |
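The prices above make per-request cost easy to estimate. A small sketch, assuming cached prompt tokens are billed at the cache-read rate instead of the full prompt rate (the usual convention; check the provider's billing docs):

```python
# Z.AI endpoint prices from the table above, in USD per 1M tokens.
PROMPT_PRICE = 0.60
COMPLETION_PRICE = 2.20
CACHE_READ_PRICE = 0.11

def estimate_cost(prompt_tokens: int, completion_tokens: int,
                  cached_tokens: int = 0) -> float:
    """Estimated USD cost of one call; cached_tokens is the portion of
    the prompt served from cache at the cheaper cache-read rate."""
    fresh = prompt_tokens - cached_tokens
    return (fresh * PROMPT_PRICE
            + completion_tokens * COMPLETION_PRICE
            + cached_tokens * CACHE_READ_PRICE) / 1_000_000

# e.g. a 100k-token prompt (half cached) with a 2k-token reply
print(f"${estimate_cost(100_000, 2_000, cached_tokens=50_000):.4f}")  # → $0.0399
```

Swap in the cheaper per-provider rates from the endpoint table below to compare routing options.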

Available Endpoints

| Provider | Endpoint Name | Context Length | Pricing (Input) | Pricing (Output) |
|---|---|---|---|---|
| Z.AI | z-ai/glm-4.5 | 131K | $0.60 / 1M tokens | $2.20 / 1M tokens |
| Chutes | z-ai/glm-4.5 | 131K | $0.35 / 1M tokens | $1.55 / 1M tokens |
| DeepInfra | z-ai/glm-4.5 | 131K | $0.35 / 1M tokens | $1.55 / 1M tokens |
| Novita | z-ai/glm-4.5 | 131K | $0.35 / 1M tokens | $1.55 / 1M tokens |
| Parasail | z-ai/glm-4.5 | 131K | $0.35 / 1M tokens | $1.55 / 1M tokens |
| GMICloud | z-ai/glm-4.5 | 131K | $0.35 / 1M tokens | $1.55 / 1M tokens |
| AtlasCloud | z-ai/glm-4.5 | 131K | $0.35 / 1M tokens | $1.55 / 1M tokens |
| Mancer 2 | z-ai/glm-4.5 | 131K | $0.35 / 1M tokens | $1.55 / 1M tokens |
| SiliconFlow | z-ai/glm-4.5 | 131K | $0.35 / 1M tokens | $1.55 / 1M tokens |
| WandB | z-ai/glm-4.5 | 131K | $0.55 / 1M tokens | $2.00 / 1M tokens |
| Nebius | z-ai/glm-4.5 | 131K | $0.60 / 1M tokens | $2.20 / 1M tokens |
| Novita | z-ai/glm-4.5 | 131K | $0.60 / 1M tokens | $2.20 / 1M tokens |