Anthropic: Claude 3.5 Sonnet

Inputs: file, text, image. Output: text.
Author's Description

New Claude 3.5 Sonnet delivers better-than-Opus capabilities and faster-than-Sonnet speeds at the same Sonnet prices. Sonnet is particularly good at:

- Coding: scores ~49% on SWE-Bench Verified, higher than the previous best score, without any elaborate prompt scaffolding
- Data science: augments human data science expertise; navigates unstructured data while using multiple tools for insights
- Visual processing: excels at interpreting charts, graphs, and images, accurately transcribing text to derive insights beyond the text alone
- Agentic tasks: exceptional tool use, making it well suited to complex, multi-step problem-solving tasks that require engaging with other systems

#multimodal
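Given the image-input support noted above, here is a minimal request sketch. It assumes an OpenAI-compatible gateway that exposes this model under the anthropic/claude-3.5-sonnet slug; the base URL, API key, and chart.png file are illustrative placeholders, not part of this listing.

```python
import base64
from openai import OpenAI

# Assumption: an OpenAI-compatible gateway exposing this model under the
# "anthropic/claude-3.5-sonnet" slug; base URL, key, and file are placeholders.
client = OpenAI(base_url="https://api.example-gateway.com/v1", api_key="YOUR_KEY")

# Encode a local image as a base64 data URL for the multimodal message.
with open("chart.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="anthropic/claude-3.5-sonnet",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Summarize the trend shown in this chart."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)
```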

Key Specifications
Cost: $$$$$
Context: 200K tokens
Parameters: 400B (rumoured)
Released: Oct 21, 2024
Supported Parameters

This model supports the following parameters:

Stop, Tool Choice, Top P, Temperature, Tools, Max Tokens
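As a sketch of how these parameters map onto a request, the snippet below uses the same assumed OpenAI-compatible gateway as above; the parameter values are illustrative, not recommendations. Tools and Tool Choice are shown separately under Features.

```python
from openai import OpenAI

# Same assumed OpenAI-compatible gateway as above; values are illustrative.
client = OpenAI(base_url="https://api.example-gateway.com/v1", api_key="YOUR_KEY")

response = client.chat.completions.create(
    model="anthropic/claude-3.5-sonnet",
    messages=[{"role": "user", "content": "List three uses for a paperclip."}],
    temperature=0.7,   # sampling randomness
    top_p=0.9,         # nucleus-sampling cutoff
    max_tokens=256,    # cap on generated tokens
    stop=["\n\n"],     # optional stop sequence(s)
)
print(response.choices[0].message.content)
```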
Features

This model supports the following features:

Tools
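To illustrate the tool-use feature, here is a hedged sketch against the same assumed gateway; the get_weather tool schema is hypothetical and only shows the shape of a function-calling request.

```python
import json
from openai import OpenAI

# Same assumed gateway; the get_weather tool schema is hypothetical.
client = OpenAI(base_url="https://api.example-gateway.com/v1", api_key="YOUR_KEY")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="anthropic/claude-3.5-sonnet",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
    tool_choice="auto",  # let the model decide whether to call the tool
)

msg = response.choices[0].message
if msg.tool_calls:  # the model requested a tool invocation
    call = msg.tool_calls[0]
    print(call.function.name, json.loads(call.function.arguments))
else:               # the model answered directly in text
    print(msg.content)
```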
Performance Summary

Anthropic's Claude 3.5 Sonnet, released October 21, 2024, shows a strong overall performance profile. Its response times rank in the 48th percentile across benchmarks, placing it near the middle of the field for speed. Pricing sits at a premium, in the 15th percentile, but reliability is exceptional: a perfect 100th percentile, meaning the model consistently returns usable responses with minimal technical failures.

Benchmark results highlight several strengths. Claude 3.5 Sonnet achieved perfect accuracy on both the Ethics and General Knowledge benchmarks, and was notably the most accurate model at its price point and speed in those categories. It also performed well on Instruction Following (77%, 94th percentile) and Reasoning (82%, 80th percentile), showing it can handle complex directives and logical problems. Its Coding accuracy of 82% is solid and consistent with the author's description of a high SWE-Bench score, and 98% accuracy on email classification underscores strong performance on classification tasks.

While overall speed is competitive, individual benchmark durations vary. The primary weakness is premium pricing, which may be a consideration for cost-sensitive applications, though the model's high reliability and strong accuracy in key areas offer significant value.

Model Pricing

Current Pricing

Feature              Price (per 1M tokens)
Prompt               $3.00
Completion           $15.00
Input Cache Read     $0.30
Input Cache Write    $3.75
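For back-of-the-envelope budgeting from the table above, here is a small sketch; the token counts in the usage example are hypothetical.

```python
# Published rates for Claude 3.5 Sonnet, in USD per million tokens.
PROMPT_RATE = 3.00        # uncached input
COMPLETION_RATE = 15.00   # output
CACHE_READ_RATE = 0.30    # input read from the prompt cache
CACHE_WRITE_RATE = 3.75   # input written to the prompt cache

def request_cost(prompt_tokens: int, completion_tokens: int,
                 cache_read_tokens: int = 0, cache_write_tokens: int = 0) -> float:
    """Estimate the USD cost of one request from its token counts."""
    return (
        prompt_tokens * PROMPT_RATE
        + completion_tokens * COMPLETION_RATE
        + cache_read_tokens * CACHE_READ_RATE
        + cache_write_tokens * CACHE_WRITE_RATE
    ) / 1_000_000

# Hypothetical example: 2,000 uncached input tokens and a 500-token reply
# -> 2000 * $3/1M + 500 * $15/1M = $0.006 + $0.0075 = $0.0135
print(f"${request_cost(2_000, 500):.4f}")
```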


Available Endpoints
Provider         Endpoint Name                  Context Length   Pricing (Input)   Pricing (Output)
Anthropic        anthropic/claude-3.5-sonnet    200K             $3 / 1M tokens    $15 / 1M tokens
Google           anthropic/claude-3.5-sonnet    200K             $3 / 1M tokens    $15 / 1M tokens
Amazon Bedrock   anthropic/claude-3.5-sonnet    200K             $3 / 1M tokens    $15 / 1M tokens
Benchmark Results
Columns: Benchmark, Category, Reasoning, Free Executions, Accuracy, Cost, Duration (per-benchmark rows not available)