Goliath 120B

Text input Text output
Author's Description

A large LLM created by combining two fine-tuned Llama 70B models into one 120B model. Combines Xwin and Euryale. Credits to - [@chargoddard](https://huggingface.co/chargoddard) for developing the framework used to merge...

Key Specifications
Cost
$$$$$
Context
6K
Parameters
120B
Released
Nov 09, 2023
Speed
Ability
Reliability
Supported Parameters

This model supports the following parameters:

Seed Frequency Penalty Top P Logprobs Min P Response Format Temperature Stop Presence Penalty Max Tokens Logit Bias Top Logprobs
Features

This model supports the following features:

Response Format
Performance Summary

Goliath 120B, a merged model from alpindale, demonstrates exceptional speed and cost efficiency. It consistently ranks among the fastest models and offers highly competitive pricing across all evaluated benchmarks. However, its performance in accuracy-based tasks is generally low. In Ethics and Mathematics, it achieved 44.0% and 3.0% accuracy respectively, placing it in the lower percentiles. Similarly, its Coding accuracy was 22.0%. A significant weakness is observed in Instruction Following, where it scored 0.0% accuracy, indicating a substantial challenge in processing and executing complex directives. The model's strongest performance was in Email Classification, achieving 95.0% accuracy, which suggests a capability for specific classification tasks. Overall, Goliath 120B excels in operational efficiency (speed and cost) but struggles significantly with tasks requiring nuanced understanding, ethical reasoning, mathematical problem-solving, and precise instruction adherence.

Model Pricing

Current Pricing

Feature Price (per 1M tokens)
Prompt $3.75
Completion $7.5

Price History

Available Endpoints
Provider Endpoint Name Context Length Pricing (Input) Pricing (Output)
Mancer 2
Mancer 2 | alpindale/goliath-120b 6K $3.75 / 1M tokens $7.5 / 1M tokens
NextBit
NextBit | alpindale/goliath-120b 6K $3.75 / 1M tokens $7.5 / 1M tokens
Benchmark Results
Benchmark Category Reasoning Strategy Free Executions Accuracy Cost Duration
Other Models by alpindale