Baidu: ERNIE 4.5 VL 424B A47B

Text input Image input Text output
Author's Description

ERNIE-4.5-VL-424B-A47B is a multimodal Mixture-of-Experts (MoE) model from Baidu’s ERNIE 4.5 series, featuring 424B total parameters with 47B active per token. It is trained jointly on text and image data using a heterogeneous MoE architecture and modality-isolated routing to enable high-fidelity cross-modal reasoning, image understanding, and long-context generation (up to 131k tokens). Fine-tuned with techniques like SFT, DPO, UPO, and RLVR, this model supports both “thinking” and non-thinking inference modes. Designed for vision-language tasks in English and Chinese, it is optimized for efficient scaling and can operate under 4-bit/8-bit quantization.

Key Specifications
Cost
$$$$
Context
123K
Parameters
424B
Released
Jun 30, 2025
Speed
Ability
Reliability
Supported Parameters

This model supports the following parameters:

Include Reasoning Stop Presence Penalty Logit Bias Top P Temperature Seed Min P Reasoning Frequency Penalty Max Tokens
Features

This model supports the following features:

Reasoning
Performance Summary

Baidu's ERNIE 4.5 VL 424B A47B demonstrates a balanced performance profile, excelling in reliability while showing moderate speed and competitive pricing. Its reliability is a significant strength, with a 92% success rate across benchmarks, indicating consistent and usable responses. Speed performance is moderate, ranking in the 33rd percentile, suggesting it performs adequately but not among the fastest models. Pricing is competitive, placing it in the 43rd percentile. The model exhibits strong accuracy in several key areas. It achieved perfect accuracy in Ethics (Baseline), notably being the most accurate model at its price point and speed. It also performed exceptionally well in Email Classification (99.0% accuracy) and General Knowledge (99.5% accuracy), showcasing robust understanding and recall. Its performance in Coding (89.0% accuracy) is also commendable. However, a notable weakness is its performance in Reasoning (46.9% accuracy), where it ranks in the lower 32nd percentile, indicating challenges with complex logical and abstract problem-solving. Instruction Following also presents room for improvement at 62.0% accuracy. Overall, ERNIE 4.5 VL 424B A47B is a highly reliable multimodal model with strong capabilities in knowledge-based and classification tasks, though its reasoning abilities could be further enhanced.

Model Pricing

Current Pricing

Feature Price (per 1M tokens)
Prompt $0.42
Completion $1.25

Price History

Available Endpoints
Provider Endpoint Name Context Length Pricing (Input) Pricing (Output)
Novita
Novita | baidu/ernie-4.5-vl-424b-a47b 123K $0.42 / 1M tokens $1.25 / 1M tokens
Benchmark Results
Benchmark Category Reasoning Free Executions Accuracy Cost Duration
Other Models by baidu