Nous: Hermes 3 405B Instruct

Text input Text output
Author's Description

Hermes 3 is a generalist language model with many improvements over Hermes 2, including advanced agentic capabilities, much better roleplaying, reasoning, multi-turn conversation, long context coherence, and improvements across the board. Hermes 3 405B is a frontier-level, full-parameter finetune of the Llama-3.1 405B foundation model, focused on aligning LLMs to the user, with powerful steering capabilities and control given to the end user. The Hermes 3 series builds and expands on the Hermes 2 set of capabilities, including more powerful and reliable function calling and structured output capabilities, generalist assistant capabilities, and improved code generation skills. Hermes 3 is competitive, if not superior, to Llama-3.1 Instruct models at general capabilities, with varying strengths and weaknesses attributable between the two.

Key Specifications
Cost
$$$$
Context
131K
Parameters
405B
Released
Aug 15, 2024
Speed
Ability
Reliability
Supported Parameters

This model supports the following parameters:

Stop Presence Penalty Top P Temperature Seed Min P Response Format Frequency Penalty Max Tokens
Features

This model supports the following features:

Response Format
Performance Summary

Nous: Hermes 3 405B Instruct, a frontier-level finetune of Llama-3.1 405B, demonstrates a strong overall performance profile. In terms of speed, it exhibits competitive response times, ranking in the 47th percentile across benchmarks. Its pricing is also competitive, placing in the 46th percentile. A standout feature is its exceptional reliability, achieving a perfect 100th percentile, indicating minimal technical failures and consistent usable responses. Across benchmark categories, Hermes 3 shows particular strength in classification and ethical reasoning, achieving perfect 100% accuracy in both Email Classification and Ethics. It also performs very well in General Knowledge (98.5%) and Coding (85.0%). While its Instruction Following (63.0%) and Reasoning (70.0%) scores are solid, they represent areas with some room for improvement compared to its top performances. Notably, in Email Classification and Ethics, it is highlighted as the most accurate model at its price point and among models of similar speed. Its advanced agentic capabilities, improved roleplaying, multi-turn conversation, and long context coherence are key differentiators, building on the Hermes 2 series with enhanced function calling and structured output.

Model Pricing

Current Pricing

Feature Price (per 1M tokens)
Prompt $0.7
Completion $0.8

Price History

Available Endpoints
Provider Endpoint Name Context Length Pricing (Input) Pricing (Output)
DeepInfra
DeepInfra | nousresearch/hermes-3-llama-3.1-405b 131K $0.7 / 1M tokens $0.8 / 1M tokens
Lambda
Lambda | nousresearch/hermes-3-llama-3.1-405b 131K $0.8 / 1M tokens $0.8 / 1M tokens
Nebius
Nebius | nousresearch/hermes-3-llama-3.1-405b 131K $1 / 1M tokens $3 / 1M tokens
Benchmark Results
Benchmark Category Reasoning Free Executions Accuracy Cost Duration
Other Models by nousresearch