Baidu: ERNIE 4.5 VL 424B A47B

Input: Text, Image | Output: Text
Author's Description

ERNIE-4.5-VL-424B-A47B is a multimodal Mixture-of-Experts (MoE) model from Baidu’s ERNIE 4.5 series, featuring 424B total parameters with 47B active per token. It is trained jointly on text and image data using a heterogeneous MoE architecture and modality-isolated routing to enable high-fidelity cross-modal reasoning, image understanding, and long-context generation (up to 131k tokens). Fine-tuned with techniques like SFT, DPO, UPO, and RLVR, this model supports both “thinking” and non-thinking inference modes. Designed for vision-language tasks in English and Chinese, it is optimized for efficient scaling and can operate under 4-bit/8-bit quantization.

Key Specifications
Cost
$$$$
Context
123K
Parameters
424B
Released
Jun 30, 2025
Supported Parameters

This model supports the following parameters:

Logit Bias, Reasoning, Include Reasoning, Stop, Seed, Min P, Top P, Max Tokens, Frequency Penalty, Temperature, Presence Penalty
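The parameters above map onto the request fields of the OpenAI-compatible chat-completions schema most providers expose. A minimal sketch of a request payload using them is below; the field spellings follow the common OpenAI-style convention and the specific values are illustrative assumptions, not recommendations from this page.

```python
# Sketch of a chat-completions request payload exercising the sampling
# parameters this model supports. Field names assume an OpenAI-compatible
# schema (as commonly exposed by aggregator providers); values are
# illustrative only.

def build_payload(prompt: str) -> dict:
    return {
        "model": "baidu/ernie-4.5-vl-424b-a47b",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 512,          # Max Tokens
        "temperature": 0.7,         # Temperature
        "top_p": 0.9,               # Top P
        "min_p": 0.05,              # Min P
        "frequency_penalty": 0.0,   # Frequency Penalty
        "presence_penalty": 0.0,    # Presence Penalty
        "seed": 42,                 # Seed (for reproducible sampling)
        "stop": ["</answer>"],      # Stop sequences
        "logit_bias": {},           # Logit Bias (token-id -> bias map)
    }

payload = build_payload("Describe this image in one sentence.")
```

The payload would then be POSTed as JSON to the provider's chat-completions endpoint with the usual authorization header.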
Features

This model supports the following features:

Reasoning
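Since the model supports both "thinking" and non-thinking inference modes, reasoning is typically toggled per request. The helper below is a sketch assuming an OpenRouter-style API where a `reasoning` object enables thinking and `include_reasoning` controls whether the trace is returned; these exact field names are an assumption, not confirmed by this page.

```python
# Toggle the model's "thinking" mode on a request payload.
# Assumes OpenRouter-style `reasoning` / `include_reasoning` fields;
# check your provider's docs for the exact spelling.

def with_reasoning(payload: dict, enabled: bool, show_trace: bool = False) -> dict:
    out = dict(payload)  # copy so the original payload is untouched
    if enabled:
        out["reasoning"] = {"enabled": True}
        out["include_reasoning"] = show_trace  # return the trace or not
    return out

base = {"model": "baidu/ernie-4.5-vl-424b-a47b", "messages": []}
thinking = with_reasoning(base, enabled=True, show_trace=True)
```

Non-thinking mode is simply the payload without these fields, which keeps latency and token usage lower.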
Performance Summary

Baidu's ERNIE 4.5 VL 424B A47B model demonstrates moderate speed, ranking in the 35th percentile across benchmarks, and competitive pricing, placing in the 44th percentile. Its reliability is exceptionally strong, with a 94% success rate indicating consistently usable responses.

The model performs well across several categories: perfect accuracy in Ethics, reflecting robust moral reasoning, along with high accuracy in General Knowledge (99.5%) and Email Classification (99.0%). Its Coding (89.0%) and Mathematics (88.0%) scores are also solid, and it acknowledges uncertainty well, scoring 96.0% on the Hallucinations benchmark. Its main weaknesses are Reasoning, at 52.0% accuracy (35th percentile for the category), and Instruction Following, at a moderate 62.0%. Despite these gaps, its multimodal MoE architecture, long-context generation (up to 131k tokens), support for both "thinking" and non-thinking inference modes, and efficient scaling position it as a versatile tool for vision-language tasks in English and Chinese.

Model Pricing

Current Pricing

| Feature    | Price (per 1M tokens) |
|------------|-----------------------|
| Prompt     | $0.42                 |
| Completion | $1.25                 |
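At these rates, the cost of a request can be worked out directly from its token counts:

```python
# Worked cost example from the listed rates:
# $0.42 per 1M prompt tokens, $1.25 per 1M completion tokens.

PROMPT_RATE = 0.42 / 1_000_000      # dollars per prompt token
COMPLETION_RATE = 1.25 / 1_000_000  # dollars per completion token

def request_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Dollar cost of a single request at the listed per-token rates."""
    return prompt_tokens * PROMPT_RATE + completion_tokens * COMPLETION_RATE

# e.g. 1M prompt tokens plus 1M completion tokens:
print(f"${request_cost(1_000_000, 1_000_000):.2f}")  # → $1.67
```

Note that completion tokens cost roughly three times as much as prompt tokens, so long generations dominate the bill.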

Available Endpoints
| Provider | Endpoint Name                        | Context Length | Pricing (Input)   | Pricing (Output)  |
|----------|--------------------------------------|----------------|-------------------|-------------------|
| Novita   | Novita \| baidu/ernie-4.5-vl-424b-a47b | 123K         | $0.42 / 1M tokens | $1.25 / 1M tokens |
Benchmark Results