Goliath 120B

Text input Text output
Author's Description

A large LLM created by combining two fine-tuned Llama 70B models into one 120B model. Combines Xwin and Euryale. Credits to - [@chargoddard](https://huggingface.co/chargoddard) for developing the framework used to merge the model - [mergekit](https://github.com/cg123/mergekit). - [@Undi95](https://huggingface.co/Undi95) for helping with the merge ratios. #merge

Key Specifications
Cost
$$$$$
Context
6K
Parameters
120B
Released
Nov 09, 2023
Speed
Ability
Reliability
Supported Parameters

This model supports the following parameters:

Logit Bias Structured Outputs Response Format Stop Seed Min P Top P Max Tokens Frequency Penalty Temperature Presence Penalty
Features

This model supports the following features:

Response Format Structured Outputs
Performance Summary

Goliath 120B, an alpindale model created by merging two fine-tuned Llama 70B models (Xwin and Euryale), demonstrates exceptional performance in terms of speed and cost-efficiency. It consistently ranks among the fastest models and offers among the most competitive pricing across all evaluated benchmarks. However, its accuracy across various benchmarks presents a mixed picture. In Ethics and Mathematics, Goliath 120B shows significant weaknesses, achieving only 44.0% and 3.0% accuracy respectively, placing it in the lower percentiles for these categories. Similarly, its performance in Coding is modest at 22.0% accuracy. A notable weakness is observed in Instruction Following, where the model recorded 0.0% accuracy, indicating a substantial challenge in processing and executing complex directives. Conversely, the model exhibits a relative strength in Email Classification, achieving 95.0% accuracy, which, while not top-tier, is a respectable performance. The model's reliability is not explicitly provided in the rankings, preventing a specific comment on this aspect. Overall, Goliath 120B stands out for its operational efficiency but requires significant improvement in its ethical reasoning, mathematical capabilities, and instruction following.

Model Pricing

Current Pricing

Feature Price (per 1M tokens)
Prompt $8
Completion $10

Price History

Available Endpoints
Provider Endpoint Name Context Length Pricing (Input) Pricing (Output)
Mancer 2
Mancer 2 | alpindale/goliath-120b 6K $8 / 1M tokens $10 / 1M tokens
NextBit
NextBit | alpindale/goliath-120b 6K $4 / 1M tokens $5.5 / 1M tokens
Benchmark Results
Benchmark Category Reasoning Strategy Free Executions Accuracy Cost Duration
Other Models by alpindale