Baidu: ERNIE 4.5 VL 28B A3B

Text input Image input Text output
Author's Description

A powerful multimodal Mixture-of-Experts chat model featuring 28B total parameters with 3B activated per token, delivering exceptional text and vision understanding through its innovative heterogeneous MoE structure with modality-isolated routing. Built with scaling-efficient infrastructure for high-throughput training and inference, the model leverages advanced post-training techniques including SFT, DPO, and UPO for optimized performance, while supporting an impressive 131K context length and RLVR alignment for superior cross-modal reasoning and generation capabilities.

Key Specifications
Cost
$$$
Context
30K
Parameters
28B
Released
Aug 12, 2025
Speed
Ability
Reliability
Supported Parameters

This model supports the following parameters:

Include Reasoning Stop Presence Penalty Logit Bias Top P Temperature Seed Min P Reasoning Frequency Penalty Max Tokens
Features

This model supports the following features:

Reasoning
Performance Summary

Baidu's ERNIE 4.5 VL 28B A3B, a powerful multimodal Mixture-of-Experts model, demonstrates competitive performance across various metrics. Its speed ranking places it among models with competitive response times, while its price ranking indicates it typically offers cost-effective solutions. Notably, the model exhibits exceptional reliability, boasting a 99% success rate across benchmarks, signifying minimal technical failures and consistent evaluable responses. In terms of specific benchmark results, ERNIE 4.5 VL 28B A3B achieved perfect accuracy in the Ethics (Baseline) benchmark, standing out as the most accurate model at its price point and among models of similar speed. This highlights a significant strength in ethical reasoning. While its Instruction Following and Reasoning capabilities are moderate, scoring 56% and 60% respectively, its Coding performance is solid at 82%. The model also shows strong capabilities in Email Classification and General Knowledge, achieving 96% accuracy in both. Its heterogeneous MoE structure and RLVR alignment appear to contribute to its robust cross-modal reasoning and generation.

Model Pricing

Current Pricing

Feature Price (per 1M tokens)
Prompt $0.14
Completion $0.56

Price History

Available Endpoints
Provider Endpoint Name Context Length Pricing (Input) Pricing (Output)
Novita
Novita | baidu/ernie-4.5-vl-28b-a3b 30K $0.14 / 1M tokens $0.56 / 1M tokens
Benchmark Results
Benchmark Category Reasoning Free Executions Accuracy Cost Duration
Other Models by baidu