Z.ai: GLM 4.6V

Text input Video input Image input Text output
Author's Description

GLM-4.6V is a large multimodal model designed for high-fidelity visual understanding and long-context reasoning across images, documents, and mixed media. It supports up to 128K tokens, processes complex page layouts and charts directly as visual inputs, and integrates native multimodal function calling to connect perception with downstream tool execution. The model also enables interleaved image-text generation and UI reconstruction workflows, including screenshot-to-HTML synthesis and iterative visual editing.

Key Specifications
Cost
$$$$
Context
131K
Released
Dec 08, 2025
Speed
Ability
Reliability
Supported Parameters

This model supports the following parameters:

Temperature Include Reasoning Tool Choice Reasoning Max Tokens Top P Tools
Features

This model supports the following features:

Tools Reasoning
Performance Summary

Z.ai's GLM-4.6V, a large multimodal model designed for visual understanding and long-context reasoning, demonstrates a balanced performance profile with notable strengths in reliability. The model exhibits moderate speed performance, ranking in the 33rd percentile across benchmarks, and offers moderate pricing, placing it in the 37th percentile. A standout feature is its exceptional reliability, achieving a 100% success rate across all benchmarks, indicating minimal technical failures and consistent response delivery. In terms of specific performance, GLM-4.6V excels in ethical reasoning, achieving perfect 100% accuracy, making it the most accurate model at its price point and among models of comparable speed in this category. It also shows strong general knowledge, with 99.5% accuracy, placing it in the 75th percentile. Coding performance is solid at 89.0% accuracy (62nd percentile). The primary area for improvement lies in its ability to acknowledge uncertainty, as indicated by a 90.0% accuracy in the Hallucinations (Baseline) test, suggesting room for refinement in identifying and declining to answer questions based on fictional concepts. Its multimodal capabilities, including processing complex layouts and native function calling, position it well for advanced visual and interactive applications.

Model Pricing

Current Pricing

Feature Price (per 1M tokens)
Prompt $0.3
Completion $0.9
Input Cache Read $0.05

Price History

Available Endpoints
Provider Endpoint Name Context Length Pricing (Input) Pricing (Output)
Z.AI
Z.AI | z-ai/glm-4.6-20251208 131K $0.3 / 1M tokens $0.9 / 1M tokens
DeepInfra
DeepInfra | z-ai/glm-4.6-20251208 131K $0.3 / 1M tokens $0.9 / 1M tokens
Parasail
Parasail | z-ai/glm-4.6-20251208 131K $0.3 / 1M tokens $0.9 / 1M tokens
Chutes
Chutes | z-ai/glm-4.6-20251208 131K $0.3 / 1M tokens $0.9 / 1M tokens
Novita
Novita | z-ai/glm-4.6-20251208 131K $0.3 / 1M tokens $0.9 / 1M tokens
SiliconFlow
SiliconFlow | z-ai/glm-4.6-20251208 131K $0.3 / 1M tokens $0.9 / 1M tokens
Benchmark Results
Benchmark Category Reasoning Strategy Free Executions Accuracy Cost Duration
Other Models by z-ai