Author's Description
ERNIE-4.5-VL-424B-A47B is a multimodal Mixture-of-Experts (MoE) model from Baidu’s ERNIE 4.5 series, featuring 424B total parameters with 47B active per token. It is trained jointly on text and image data using a heterogeneous MoE architecture and modality-isolated routing to enable high-fidelity cross-modal reasoning, image understanding, and long-context generation (up to 131k tokens). Fine-tuned with techniques like SFT, DPO, UPO, and RLVR, this model supports both “thinking” and non-thinking inference modes. Designed for vision-language tasks in English and Chinese, it is optimized for efficient scaling and can operate under 4-bit/8-bit quantization.
Key Specifications
Supported Parameters
This model supports the following parameters:
Features
This model supports the following features:
Performance Summary
Baidu's ERNIE 4.5 VL 424B A47B demonstrates a balanced performance profile, excelling in reliability while showing moderate speed and competitive pricing. Its reliability is a significant strength, with a 92% success rate across benchmarks, indicating consistent and usable responses. Speed performance is moderate, ranking in the 33rd percentile, suggesting it performs adequately but not among the fastest models. Pricing is competitive, placing it in the 43rd percentile. The model exhibits strong accuracy in several key areas. It achieved perfect accuracy in Ethics (Baseline), notably being the most accurate model at its price point and speed. It also performed exceptionally well in Email Classification (99.0% accuracy) and General Knowledge (99.5% accuracy), showcasing robust understanding and recall. Its performance in Coding (89.0% accuracy) is also commendable. However, a notable weakness is its performance in Reasoning (46.9% accuracy), where it ranks in the lower 32nd percentile, indicating challenges with complex logical and abstract problem-solving. Instruction Following also presents room for improvement at 62.0% accuracy. Overall, ERNIE 4.5 VL 424B A47B is a highly reliable multimodal model with strong capabilities in knowledge-based and classification tasks, though its reasoning abilities could be further enhanced.
Model Pricing
Current Pricing
Feature | Price (per 1M tokens) |
---|---|
Prompt | $0.42 |
Completion | $1.25 |
Price History
Available Endpoints
Provider | Endpoint Name | Context Length | Pricing (Input) | Pricing (Output) |
---|---|---|---|---|
Novita
|
Novita | baidu/ernie-4.5-vl-424b-a47b | 123K | $0.42 / 1M tokens | $1.25 / 1M tokens |
Benchmark Results
Benchmark | Category | Reasoning | Free | Executions | Accuracy | Cost | Duration |
---|
Other Models by baidu
|
Released | Params | Context |
|
Speed | Ability | Cost |
---|---|---|---|---|---|---|---|
Baidu: ERNIE 4.5 21B A3B | Aug 12, 2025 | 21B | 120K |
Text input
Text output
|
★★★ | ★★★ | $$ |
Baidu: ERNIE 4.5 VL 28B A3B | Aug 12, 2025 | 28B | 30K |
Text input
Image input
Text output
|
★★★ | ★★★★ | $$$ |
Baidu: ERNIE 4.5 300B A47B | Jun 30, 2025 | 300B | 123K |
Text input
Text output
|
★★ | ★★ | $$$ |