Z.ai: GLM 4.6V

Video input Image input Text input Text output
Author's Description

GLM-4.6V is a large multimodal model designed for high-fidelity visual understanding and long-context reasoning across images, documents, and mixed media. It supports up to 128K tokens, processes complex page layouts and charts directly as visual inputs, and integrates native multimodal function calling to connect perception with downstream tool execution. The model also enables interleaved image-text generation and UI reconstruction workflows, including screenshot-to-HTML synthesis and iterative visual editing.

Key Specifications
Cost
$$$$
Context
131K
Released
Dec 08, 2025
Speed
Ability
Reliability
Supported Parameters

This model supports the following parameters:

Tools Reasoning Top P Max Tokens Include Reasoning Tool Choice Temperature
Features

This model supports the following features:

Reasoning Tools
Performance Summary

Z.AI's GLM-4.6V, a large multimodal model created on December 8, 2025, demonstrates a balanced performance profile with notable strengths in reliability and specific accuracy benchmarks. The model exhibits moderate speed performance, ranking in the 30th percentile, indicating it performs at an average pace compared to other models. Its pricing is also moderate, positioned at the 34th percentile, suggesting it offers competitive costs without being the cheapest option. A standout feature is its exceptional reliability, achieving a 100% success rate across all benchmarks, signifying minimal technical failures and consistent response delivery. In terms of benchmark performance, GLM-4.6V excels in ethical reasoning, achieving a perfect 100.0% accuracy, making it the most accurate model at its price point and among models of similar speed. It also shows strong general knowledge with 99.5% accuracy, placing it in the 76th percentile. Coding performance is solid at 89.0% accuracy. The primary area for improvement lies in its handling of hallucinations, where it scored 90.0% accuracy, ranking in the 40th percentile, indicating some room for improvement in acknowledging uncertainty. Its duration for hallucination tests was also relatively slow. Overall, GLM-4.6V is a highly reliable multimodal model with strong ethical and general knowledge capabilities, making it suitable for applications requiring high accuracy and consistent operation, particularly in complex visual and long-context reasoning tasks.

Model Pricing

Current Pricing

Feature Price (per 1M tokens)
Prompt $0.3
Completion $0.9
Input Cache Read $0.05

Price History

Available Endpoints
Provider Endpoint Name Context Length Pricing (Input) Pricing (Output)
Z.AI
Z.AI | z-ai/glm-4.6-20251208 131K $0.3 / 1M tokens $0.9 / 1M tokens
DeepInfra
DeepInfra | z-ai/glm-4.6-20251208 131K $0.3 / 1M tokens $0.9 / 1M tokens
Parasail
Parasail | z-ai/glm-4.6-20251208 131K $0.3 / 1M tokens $0.9 / 1M tokens
Chutes
Chutes | z-ai/glm-4.6-20251208 131K $0.3 / 1M tokens $0.9 / 1M tokens
Novita
Novita | z-ai/glm-4.6-20251208 131K $0.3 / 1M tokens $0.9 / 1M tokens
SiliconFlow
SiliconFlow | z-ai/glm-4.6-20251208 131K $0.3 / 1M tokens $0.9 / 1M tokens
Benchmark Results
Benchmark Category Reasoning Strategy Free Executions Accuracy Cost Duration
Other Models by z-ai