Nous: Hermes 4 405B

Text input Text output
Author's Description

Hermes 4 is a large-scale reasoning model built on Meta-Llama-3.1-405B and released by Nous Research. It introduces a hybrid reasoning mode, where the model can choose to deliberate internally with <think>...</think> traces or respond directly, offering flexibility between speed and depth. Users can control the reasoning behaviour with the `reasoning` `enabled` boolean. [Learn more in our docs](https://openrouter.ai/docs/use-cases/reasoning-tokens#enable-reasoning-with-default-config) The model is instruction-tuned with an expanded post-training corpus (~60B tokens) emphasizing reasoning traces, improving performance in math, code, STEM, and logical reasoning, while retaining broad assistant utility. It also supports structured outputs, including JSON mode, schema adherence, function calling, and tool use. Hermes 4 is trained for steerability, lower refusal rates, and alignment toward neutral, user-directed behavior.

Key Specifications
Cost
$$$$
Context
131K
Parameters
405B
Released
Aug 26, 2025
Speed
Ability
Reliability
Supported Parameters

This model supports the following parameters:

Include Reasoning Top Logprobs Stop Logprobs Max Tokens Top P Frequency Penalty Reasoning Logit Bias Seed Temperature Presence Penalty
Features

This model supports the following features:

Reasoning
Performance Summary

Nous: Hermes 4 405B, built on Meta-Llama-3.1-405B, demonstrates a balanced performance profile with exceptional reliability. It performs competitively in terms of speed, ranking in the 56th percentile across benchmarks, and offers moderate pricing, placing in the 34th percentile. Notably, the model achieves a perfect 100% success rate across all 8 benchmarks, indicating outstanding technical reliability with no reported failures. Hermes 4 excels in specific areas, achieving perfect accuracy in both Hallucinations (100.0%) and Ethics (100.0%) benchmarks, making it the most accurate model at its price point and speed for these categories. It also shows strong performance in General Knowledge (99.5%) and Email Classification (99.0%), ranking in the 77th and 81st percentiles respectively. Its instruction-tuned nature, with an expanded post-training corpus emphasizing reasoning traces, is evident in its solid 72.0% accuracy in Reasoning. While its Coding (83.0%) and Mathematics (79.0%) scores are respectable, they fall closer to the average compared to its top-tier performance in other areas. The model's hybrid reasoning mode and support for structured outputs like JSON mode and function calling further enhance its versatility.

Model Pricing

Current Pricing

Feature Price (per 1M tokens)
Prompt $1
Completion $3

Price History

Available Endpoints
Provider Endpoint Name Context Length Pricing (Input) Pricing (Output)
Nebius
Nebius | nousresearch/hermes-4-405b 131K $1 / 1M tokens $3 / 1M tokens
Chutes
Chutes | nousresearch/hermes-4-405b 131K $0.3 / 1M tokens $1.2 / 1M tokens
Benchmark Results
Benchmark Category Reasoning Strategy Free Executions Accuracy Cost Duration
Other Models by nousresearch