Model and API Provider Analysis | Artificial Analysis

Independent analysis of AI language models and API providers

Understand the AI landscape and choose the best model and API provider for your use-case

Highlights

Quality

Quality Index; Higher is better

Speed

Output Tokens per Second; Higher is better

Price

USD per 1M Tokens; Lower is better

Language Models Comparison Highlights

Quality Comparison by Ability

Varied metrics by ability categorization; Higher is better

[Chart panels: General Ability (Chatbot Arena), Reasoning & Knowledge (MMLU), Reasoning & Knowledge (MT-Bench), Coding (HumanEval)]

Different use-cases warrant different evaluation tests. Chatbot Arena is a good evaluation of communication abilities, while MMLU tests reasoning and knowledge more comprehensively.

Median across providers: Figures represent the median (P50) across all providers that support the model.
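As a minimal sketch of how a median across providers could be computed, the Python below takes the P50 of a single metric over every provider serving a model. The provider names and speed values are hypothetical placeholders, not measurements from this page.

```python
from statistics import median

# Hypothetical output speeds (tokens per second) for one model,
# keyed by provider. These are illustrative values only.
provider_speeds = {
    "provider_a": 48.0,
    "provider_b": 62.5,
    "provider_c": 310.0,
    "provider_d": 55.0,
}


def median_across_providers(values_by_provider):
    """Return the median (P50) of a metric across all providers of a model."""
    if not values_by_provider:
        raise ValueError("at least one provider is required")
    return median(values_by_provider.values())


model_speed = median_across_providers(provider_speeds)
print(model_speed)
```

Using the median rather than the mean keeps a single very fast (or very slow) provider, such as the hypothetical `provider_c` here, from dominating the figure reported for the model.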

Quality vs. Throughput

Quality: General reasoning index; Output Speed: Output Tokens per Second; Price: USD per 1M Tokens

[Chart: quality vs. output speed, with a "Most attractive quadrant" marked. Size represents Price (USD per 1M Tokens). Models shown: GPT-4o, GPT-4 Turbo, GPT-3.5 Turbo, Gemini 1.5 Flash, Gemini 1.5 Pro, Llama 3 (70B), Llama 3 (8B), Mixtral 8x22B, Mistral Large, Mixtral 8x7B, Mistral 7B, Claude 3 Opus, Claude 3 Haiku, Command-R+, DBRX]

There is a trade-off between model quality and output speed: higher-quality models typically have lower output speed.

Quality: Index represents normalized average relative performance across Chatbot Arena, MMLU & MT-Bench.

Output Speed: Tokens per second received while the model is generating tokens (i.e. after the first chunk has been received from the API).

Price: Price per token, represented as USD per million Tokens. Price is a blend of Input & Output token prices (3:1 ratio).

Median across providers: Figures represent the median (P50) across all providers that support the model.
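The blended price can be illustrated with a short Python sketch. It assumes the 3:1 ratio means three parts input price to one part output price, combined as a weighted average; the page does not spell out the exact formula, and the example prices are hypothetical.

```python
def blended_price(input_price, output_price, input_weight=3, output_weight=1):
    """Blend input and output prices (USD per 1M tokens) into one figure.

    Assumption: the 3:1 ratio is input:output and the blend is a
    weighted average of the two prices.
    """
    total_weight = input_weight + output_weight
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return (input_weight * input_price + output_weight * output_price) / total_weight


# Hypothetical prices: USD 5.00 per 1M input tokens, USD 15.00 per 1M output tokens.
example_blend = blended_price(5.0, 15.0)
print(example_blend)
```

A 3:1 weighting of this kind suits workloads that send more tokens than they receive; passing different weights to the hypothetical `blended_price` helper adapts the blend to other token mixes.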

Quality vs. Price

Quality: General reasoning index; Price: USD per 1M Tokens

[Chart: quality vs. price, with a "Most attractive quadrant" marked. Models shown: GPT-4o, GPT-4 Turbo, GPT-3.5 Turbo, Gemini 1.5 Flash, Gemini 1.5 Pro, Llama 3 (70B), Llama 3 (8B), Mixtral 8x22B, Mistral Large, Mixtral 8x7B, Mistral 7B, Claude 3 Opus, Claude 3 Haiku, Command-R+, DBRX]

While higher-quality models are typically more expensive, they do not all follow the same price-quality curve.

Quality: Index represents normalized average relative performance across Chatbot Arena, MMLU & MT-Bench.

Median across providers: Figures represent the median (P50) across all providers that support the model.

Output Speed

Output Tokens per Second; Higher is better

Median across providers: Figures represent the median (P50) across all providers that support the model.
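This page defines output speed as tokens per second received while the model is generating, measured after the first chunk arrives from the API. The Python below is a rough sketch of that calculation from a list of streamed chunks. The chunk format (arrival time in seconds, token count) and the choice to exclude the first chunk's tokens are assumptions made for illustration, not the site's documented method.

```python
def output_speed(chunks):
    """Estimate output tokens per second from streamed chunks.

    chunks: list of (arrival_time_seconds, tokens_in_chunk) in arrival order.
    Assumption: timing starts when the first chunk arrives, so the first
    chunk's tokens are excluded from the count.
    """
    if len(chunks) < 2:
        raise ValueError("need at least two chunks to measure generation speed")
    first_time = chunks[0][0]
    last_time = chunks[-1][0]
    elapsed = last_time - first_time
    if elapsed <= 0:
        raise ValueError("chunks must span a positive time interval")
    tokens_after_first = sum(tokens for _, tokens in chunks[1:])
    return tokens_after_first / elapsed


# Hypothetical stream: first chunk at 0.5 s, last chunk at 2.5 s.
example_chunks = [(0.5, 1), (1.0, 20), (1.5, 20), (2.5, 40)]
speed = output_speed(example_chunks)
print(speed)
```

Starting the clock at the first chunk separates generation speed from the time spent waiting for the response to begin, which is a different latency measure.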

Pricing: Input and Output Prices

USD per 1M Tokens

[Chart legend: Input price, Output price]

Prices vary considerably, including between input and output tokens. Prices can differ by more than an order of magnitude (>10X) between the most expensive and cheapest models.

Input price: Price per token included in the request/message sent to the API, represented as USD per million Tokens.

Output price: Price per token generated by the model (received from the API), represented as USD per million Tokens.

Median across providers: Figures represent the median (P50) across all providers that support the model.

API Provider Highlights: Llama 3 Instruct (70B)

Output Speed vs. Price: Llama 3 Instruct (70B)

Output Speed: Output Tokens per Second; Price: USD per 1M Tokens

[Chart: output speed vs. price, with a "Most attractive quadrant" marked. Providers shown: Microsoft Azure, Amazon Bedrock, Groq, Together.ai, Perplexity, Fireworks, Deepinfra, Replicate, OctoAI]

Smaller, emerging providers are offering high output speed at competitive prices.

Median: Figures represent the median (P50) measurement over the past 14 days.

Variance data is available on the model and API provider pages among the detailed performance metrics. See 'Compare Models' and 'Compare API Providers' in the navigation menu for further analysis.

Pricing (Input and Output Prices): Llama 3 Instruct (70B)

Price: USD per 1M Tokens; Lower is better

[Chart legend: Input price, Output price]

Providers typically charge different prices for input and output tokens. Because use-cases differ in their mix of input and output tokens, the ratio between input and output token prices may significantly impact overall costs.

Input price: Price per token included in the request/message sent to the API, represented as USD per million Tokens.

Output price: Price per token generated by the model (received from the API), represented as USD per million Tokens.
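To make the effect of separate input and output prices concrete, the Python sketch below costs two workloads against two providers. All provider names, prices, and token counts are hypothetical and chosen only to show how the cheaper option can flip with the token mix.

```python
def workload_cost(input_tokens, output_tokens, input_price, output_price):
    """Total cost in USD, given prices expressed in USD per 1M tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical providers: (input price, output price) in USD per 1M tokens.
providers = {
    "provider_a": (0.50, 1.50),  # cheap input, expensive output
    "provider_b": (0.90, 0.90),  # flat pricing
}

# Hypothetical workloads: (input tokens, output tokens).
workloads = {
    "summarization": (10_000_000, 1_000_000),  # input-heavy
    "generation": (1_000_000, 10_000_000),     # output-heavy
}

costs = {
    (workload, provider): workload_cost(*tokens, *prices)
    for workload, tokens in workloads.items()
    for provider, prices in providers.items()
}

for (workload, provider), cost in costs.items():
    print(f"{workload:>13} on {provider}: ${cost:.2f}")
```

In this made-up example `provider_a` is cheaper for the input-heavy workload and `provider_b` is cheaper for the output-heavy one, which is why a single blended price may not reflect the cost of a specific use-case.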

Output Speed, Over Time: Llama 3 Instruct (70B)

Output Tokens per Second; Higher is better

[Chart: output speed over time. Providers shown: Microsoft Azure, Amazon Bedrock, Groq, Together.ai, Perplexity, Fireworks, Deepinfra, Replicate, OctoAI]

Smaller, emerging providers offer high output speed, though precise speeds delivered vary day-to-day.

Output Speed: Tokens per second received while the model is generating tokens (i.e. after the first chunk has been received from the API).

Over time measurement: Median measurement per day, based on 8 measurements each day at different times. Labels represent start of week's measurements.
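The over-time measurement described above (a daily median over 8 measurements) could be computed along these lines. This is a sketch only: the timestamps, the 3-hour spacing, and the speed values are hypothetical, and the page does not state how its measurements are actually scheduled.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import median


def daily_medians(measurements):
    """Group (timestamp, value) pairs by calendar day and take each day's median."""
    by_day = defaultdict(list)
    for timestamp, value in measurements:
        by_day[timestamp.date()].append(value)
    return {day: median(values) for day, values in sorted(by_day.items())}


# Hypothetical data: 8 output-speed measurements per day, 3 hours apart.
day_one = datetime(2024, 6, 3)
day_two = datetime(2024, 6, 4)
speeds_one = [50, 52, 49, 51, 55, 47, 53, 50]
speeds_two = [60, 58, 61, 59, 62, 57, 60, 63]
measurements = (
    [(day_one + timedelta(hours=3 * i), s) for i, s in enumerate(speeds_one)]
    + [(day_two + timedelta(hours=3 * i), s) for i, s in enumerate(speeds_two)]
)

result = daily_medians(measurements)
for day, value in result.items():
    print(day, value)
```

Taking several measurements at different times of day and reporting the median smooths out short-lived slowdowns, so the daily figure reflects typical rather than peak or worst-case speed.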

See more information on any of our supported models

Model Name	Creator	License	Context Window

GPT-4o	OpenAI	Proprietary	128k
GPT-4 Turbo	OpenAI	Proprietary	128k
GPT-4	OpenAI	Proprietary	8k
GPT-3.5 Turbo	OpenAI	Proprietary	16k
GPT-3.5 Turbo Instruct	OpenAI	Proprietary	4k

Gemini 1.5 Flash	Google	Proprietary	1m
Gemini 1.5 Pro	Google	Proprietary	1m
Gemini 1.0 Pro	Google	Proprietary	33k
Gemma 7B Instruct	Google	Open	8k

Llama 3 Instruct (70B)	Meta	Open	8k
Llama 3 Instruct (8B)	Meta	Open	8k
Code Llama Instruct (70B)	Meta	Open	16k
Llama 2 Chat (70B)	Meta	Open	4k
Llama 2 Chat (13B)	Meta	Open	4k
Llama 2 Chat (7B)	Meta	Open	4k

Mixtral 8x22B Instruct	Mistral	Open	65k
Mistral Large	Mistral	Proprietary	33k
Mistral Medium	Mistral	Proprietary	33k
Mistral Small	Mistral	Proprietary	33k
Mixtral 8x7B Instruct	Mistral	Open	33k
Mistral 7B Instruct	Mistral	Open	33k

Claude 3 Opus	Anthropic	Proprietary	200k
Claude 3 Sonnet	Anthropic	Proprietary	200k
Claude 3 Haiku	Anthropic	Proprietary	200k
Claude 2.0	Anthropic	Proprietary	100k
Claude 2.1	Anthropic	Proprietary	200k
Claude Instant	Anthropic	Proprietary	100k

Qwen2 Instruct (72B)	Alibaba	Open	128k

Command Light	Cohere	Proprietary	4k
Command	Cohere	Proprietary	4k
Command-R+	Cohere	Open	128k
Command-R	Cohere	Open	128k

OpenChat 3.5 (1210)	OpenChat	Open	8k

DBRX Instruct	Databricks	Open	33k

DeepSeek-V2-Chat	DeepSeek	Open	128k

Arctic Instruct	Snowflake	Open	4k
