Meituan: LongCat Flash Chat

Text input Text output Free Option
Author's Description

LongCat-Flash-Chat is a large-scale Mixture-of-Experts (MoE) model with 560B total parameters, of which 18.6B–31.3B (≈27B on average) are dynamically activated per input. It introduces a shortcut-connected MoE design to reduce communication overhead and achieve high throughput while maintaining training stability through advanced scaling strategies such as hyperparameter transfer, deterministic computation, and multi-stage optimization. This release, LongCat-Flash-Chat, is a non-thinking foundation model optimized for conversational and agentic tasks. It supports long context windows up to 128K tokens and shows competitive performance across reasoning, coding, instruction following, and domain benchmarks, with particular strengths in tool use and complex multi-step interactions.

Key Specifications
Cost
$$$
Context
131K
Parameters
560B (Rumoured)
Released
Sep 09, 2025
Speed
Ability
Reliability
Supported Parameters

This model supports the following parameters:

Temperature Top P Max Tokens
Performance Summary

Meituan's LongCat Flash Chat, a 560B parameter Mixture-of-Experts (MoE) model, demonstrates a balanced performance profile with notable strengths. While its speed performance is moderate, ranking in the 39th percentile, it offers cost-effective solutions, placing in the 64th percentile for price. A standout feature is its exceptional reliability, achieving a 99% success rate across benchmarks, indicating minimal technical failures. The model excels in foundational knowledge and ethical reasoning, achieving perfect 100% accuracy in Hallucinations, General Knowledge, and Ethics benchmarks. It also shows strong mathematical capabilities with 96.0% accuracy, placing it among the top 3 models in this category. Its performance in Email Classification (99.0% accuracy) and Instruction Following (75.0% accuracy) is also highly competitive. While its Reasoning (80.0% accuracy) and Coding (84.0% accuracy) scores are solid, they represent areas with potential for further enhancement compared to its top-tier performance in other domains. LongCat Flash Chat's ability to handle long context windows up to 128K tokens, combined with its optimized design for conversational and agentic tasks, positions it as a robust foundation model, particularly strong in tool use and complex multi-step interactions.

Model Pricing

Current Pricing

Feature Price (per 1M tokens)
Prompt $0.15
Completion $0.75

Price History

Available Endpoints
Provider Endpoint Name Context Length Pricing (Input) Pricing (Output)
AtlasCloud
AtlasCloud | meituan/longcat-flash-chat 131K $0.15 / 1M tokens $0.75 / 1M tokens
Chutes
Chutes | meituan/longcat-flash-chat 131K $0.15 / 1M tokens $0.75 / 1M tokens
Benchmark Results
Benchmark Category Reasoning Strategy Free Executions Accuracy Cost Duration