Cogito V2 Preview Llama 109B

Image input · Text input · Text output
Author's Description

An instruction-tuned, hybrid-reasoning Mixture-of-Experts model built on Llama-4-Scout-17B-16E. Cogito v2 can answer directly or engage an extended "thinking" phase, with alignment guided by Iterated Distillation & Amplification (IDA). It targets coding, STEM, instruction following, and general helpfulness, with stronger multilingual, tool-calling, and reasoning performance than size-equivalent baselines. The model supports long-context use (up to 10M tokens in the base architecture, though the endpoint listed below serves a 32K window) and standard Transformers workflows. Users can control the reasoning behaviour with the `reasoning` `enabled` boolean. [Learn more in our docs](https://openrouter.ai/docs/use-cases/reasoning-tokens#enable-reasoning-with-default-config)
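As a sketch of how the reasoning toggle might look in practice, the snippet below builds an OpenRouter chat-completions request body with the `reasoning.enabled` field described above. The endpoint URL and field names follow OpenRouter's public docs, but treat the exact payload shape as an assumption and verify against the linked documentation before use.

```python
import json

# Hypothetical sketch, not an official client. The URL and model slug come
# from the listing; the payload shape follows OpenRouter's documented
# OpenAI-compatible chat-completions API.
OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"
MODEL_SLUG = "deepcogito/cogito-v2-preview-llama-109b-moe"

def build_payload(prompt: str, think: bool) -> dict:
    """Return a request body that enables or disables the extended thinking phase."""
    return {
        "model": MODEL_SLUG,
        "messages": [{"role": "user", "content": prompt}],
        # Hybrid-reasoning control: True engages the "thinking" phase,
        # False makes the model answer directly.
        "reasoning": {"enabled": think},
    }

payload = build_payload("Prove that the sum of two even numbers is even.", think=True)
print(json.dumps(payload, indent=2))
# To send, POST this body to OPENROUTER_URL with an
# "Authorization: Bearer <your key>" header, e.g. via requests.post(...).
```

Setting `think=False` yields the same body with `"reasoning": {"enabled": false}`, which requests a direct answer without the thinking phase.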

Key Specifications
Cost
$$$
Context
32K
Parameters
109B
Released
Sep 02, 2025
Ability
Reliability
Supported Parameters

This model supports the following parameters:

Stop, Top P, Min P, Frequency Penalty, Tool Choice, Max Tokens, Reasoning, Tools, Presence Penalty, Include Reasoning, Logit Bias, Temperature
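To illustrate how several of the listed parameters could be combined, here is one request body using the OpenAI-compatible field names that OpenRouter documents. The values are arbitrary examples, not recommendations, and the exact field names should be checked against the API reference.

```python
# Illustrative only: a single request body exercising most of the supported
# parameters listed above. Field names follow OpenAI-compatible conventions;
# values are placeholders chosen for demonstration.
sampling_request = {
    "model": "deepcogito/cogito-v2-preview-llama-109b-moe",
    "messages": [{"role": "user", "content": "Summarize RFC 2119 in one line."}],
    "temperature": 0.7,         # Temperature
    "top_p": 0.9,               # Top P
    "min_p": 0.05,              # Min P
    "frequency_penalty": 0.1,   # Frequency Penalty
    "presence_penalty": 0.0,    # Presence Penalty
    "max_tokens": 256,          # Max Tokens
    "stop": ["\n\n"],           # Stop sequences
    "logit_bias": {},           # Logit Bias (token-id -> bias map)
    "include_reasoning": True,  # Include Reasoning (return thinking tokens)
}
print(sampling_request["model"])
```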
Features

This model supports the following features:

Tools Reasoning
Performance Summary

Cogito V2 Preview Llama 109B, provided by deepcogito, demonstrates exceptional speed, consistently ranking among the fastest models across six benchmarks. Cost data was not recorded during benchmarking. However, a significant concern is its reliability, which is critically low: frequent technical failures produced a 0% success rate across all evaluated benchmarks, rendering the model unusable for practical applications despite its speed.

Across all benchmark categories (Ethics, Instruction Following, Coding, Reasoning, Email Classification, and General Knowledge) the model achieved 0.0% accuracy. This indicates a complete failure to return correct responses, likely due to the pervasive technical issues rather than a lack of inherent capability. In summary, the model's primary strength is its speed, while its overwhelming weakness is its unreliability and resulting inability to produce any accurate output in these evaluations.

Model Pricing

Current Pricing

| Feature | Price (per 1M tokens) |
|---|---|
| Prompt | $0.18 |
| Completion | $0.59 |

Price History

Available Endpoints
| Provider | Endpoint Name | Context Length | Pricing (Input) | Pricing (Output) |
|---|---|---|---|---|
| Together | deepcogito/cogito-v2-preview-llama-109b-moe | 32K | $0.18 / 1M tokens | $0.59 / 1M tokens |
Benchmark Results
Benchmark Category Reasoning Free Executions Accuracy Cost Duration
Other Models by deepcogito