NVIDIA: Llama 3.1 Nemotron Ultra 253B v1

Text input Text output
Author's Description

Llama-3.1-Nemotron-Ultra-253B-v1 is a large language model (LLM) optimized for advanced reasoning, human-interactive chat, retrieval-augmented generation (RAG), and tool-calling tasks. Derived from Meta’s Llama-3.1-405B-Instruct, it has been significantly customized using Neural...

Key Specifications
Cost
$$$$$
Context
131K
Parameters
253B
Released
Apr 08, 2025
Speed
Ability
Reliability
Supported Parameters

This model supports the following parameters:

Frequency Penalty Structured Outputs Top P Response Format Reasoning Temperature Presence Penalty Include Reasoning Max Tokens
Features

This model supports the following features:

Structured Outputs Response Format Reasoning
Performance Summary

NVIDIA's Llama 3.1 Nemotron Ultra 253B v1 consistently ranks among the fastest models, demonstrating exceptional speed across various tasks. It offers moderate pricing, making it a competitive option in terms of cost-efficiency. Furthermore, the model exhibits outstanding reliability with a 99% success rate, ensuring consistent and usable responses with minimal technical failures. The model excels in critical areas, achieving perfect 100% accuracy in Hallucinations (Baseline), Email Classification (Baseline), and Ethics (Baseline) benchmarks. These results highlight its strong capability in maintaining factual integrity, categorizing information precisely, and adhering to ethical principles. It also shows robust performance in General Knowledge (97.5% accuracy) and Reasoning (80.0% accuracy), indicating strong analytical and cognitive abilities. In Coding (Baseline), it achieves a high 93.0% accuracy, though this comes with a notably slower duration. A significant area for improvement is Instruction Following, where the model scored 14.1% and 0.0% accuracy in two separate benchmarks, indicating a substantial weakness in processing and executing complex, multi-step instructions. Despite this, its strengths in accuracy for critical classification, ethical, and knowledge-based tasks, combined with its speed and high reliability, position it as a powerful tool for advanced reasoning, RAG, and tool-calling applications, particularly where precise output and minimal errors are paramount.

Model Pricing

Current Pricing

Feature Price (per 1M tokens)
Prompt $0.6
Completion $1.8

Price History

Available Endpoints
Provider Endpoint Name Context Length Pricing (Input) Pricing (Output)
Nebius
Nebius | nvidia/llama-3.1-nemotron-ultra-253b-v1 131K $0.6 / 1M tokens $1.8 / 1M tokens
Benchmark Results
Benchmark Category Reasoning Strategy Free Executions Accuracy Cost Duration
Other Models by nvidia