Shisa AI: Shisa V2 Llama 3.3 70B

Text input Text output Free Option
Author's Description

Shisa V2 Llama 3.3 70B is a bilingual Japanese-English chat model fine-tuned by Shisa.AI on Meta’s Llama-3.3-70B-Instruct base. It prioritizes Japanese language performance while retaining strong English capabilities. The model was optimized entirely through post-training, using a refined mix of supervised fine-tuning (SFT) and DPO datasets including regenerated ShareGPT-style data, translation tasks, roleplaying conversations, and instruction-following prompts. Unlike earlier Shisa releases, this version avoids tokenizer modifications or extended pretraining. Shisa V2 70B achieves leading Japanese task performance across a wide range of custom and public benchmarks, including JA MT Bench, ELYZA 100, and Rakuda. It supports a 128K token context length and integrates smoothly with inference frameworks like vLLM and SGLang. While it inherits safety characteristics from its base model, no additional alignment was applied. The model is intended for high-performance bilingual chat, instruction following, and translation tasks across JA/EN.

Key Specifications
Cost
$
Context
32K
Parameters
70B
Released
Apr 15, 2025
Speed
Ability
Reliability
Supported Parameters

This model supports the following parameters:

Stop Presence Penalty Logit Bias Top P Temperature Seed Min P Frequency Penalty Logprobs Max Tokens Top Logprobs
Performance Summary

Shisa V2 Llama 3.3 70B, from shisa-ai, demonstrates a strong overall performance profile, particularly excelling in cost-efficiency and reliability. The model consistently offers among the most competitive pricing, ranking in the 94th percentile across six benchmarks. Its reliability is exceptional, boasting a 96% success rate, indicating minimal technical failures and consistent response delivery. In terms of speed, the model shows competitive response times, ranking in the 44th percentile. Benchmark analysis reveals distinct strengths and areas for improvement. Shisa V2 achieves perfect accuracy in "Instruction Following (Baseline)" and "Email Classification (Baseline)," with the former also demonstrating top-tier speed. It is notably accurate and cost-effective in classification tasks. The model also performs well in "Ethics (Baseline)" and "General Knowledge (Baseline)," achieving high accuracy. However, its performance in "Coding (Baseline)" and "Reasoning (Baseline)" is moderate, with accuracy in the 32nd and 38th percentiles respectively. Despite these variations, its overall reliability and cost-effectiveness make it a compelling option for high-performance bilingual chat, instruction following, and translation tasks, especially given its optimized post-training for Japanese and English.

Model Pricing

Current Pricing

Feature Price (per 1M tokens)
Prompt $0.02
Completion $0.08

Price History

Available Endpoints
Provider Endpoint Name Context Length Pricing (Input) Pricing (Output)
Chutes
Chutes | shisa-ai/shisa-v2-llama3.3-70b 32K $0.02 / 1M tokens $0.08 / 1M tokens
Benchmark Results
Benchmark Category Reasoning Free Executions Accuracy Cost Duration