This is a bilingual snapshot of https://artificialanalysis.ai/ saved by the user on 2024-6-17 16:46, with bilingual support provided by Immersive Translate.

Independent analysis of AI language models and API providers

Understand the AI landscape and choose the best model and API provider for your use-case

Highlights

Quality
Quality Index; Higher is better
GPT-4o: 100
GPT-4 Turbo: 94
Claude 3 Opus: 94
Gemini 1.5 Pro: 93
Llama 3 (70B): 88
Gemini 1.5 Flash: 83
Mixtral 8x22B: 78
Mistral Large: 75
Claude 3 Haiku: 72
GPT-3.5 Turbo: 65
Llama 3 (8B): 65
Mixtral 8x7B: 65
Speed
Output Tokens per Second; Higher is better
Gemini 1.5 Flash: 140
Llama 3 (8B): 121
Claude 3 Haiku: 117
Mixtral 8x7B: 92
GPT-4o: 69
Mixtral 8x22B: 65
GPT-3.5 Turbo: 63
Gemini 1.5 Pro: 63
Llama 3 (70B): 53
Mistral Large: 32
GPT-4 Turbo: 27
Claude 3 Opus: 24
Price
USD per 1M Tokens; Lower is better
Llama 3 (8B): 0.2
Gemini 1.5 Flash: 0.5
Mixtral 8x7B: 0.5
Claude 3 Haiku: 0.5
GPT-3.5 Turbo: 0.8
Llama 3 (70B): 0.9
Mixtral 8x22B: 1.2
Gemini 1.5 Pro: 5.3
Mistral Large: 6
GPT-4o: 7.5
GPT-4 Turbo: 15
Claude 3 Opus: 30

Language Models Comparison Highlights

Quality Comparison by Ability

Varied metrics by ability categorization; Higher is better
General Ability (Chatbot Arena)
GPT-4o: 1287
Gemini 1.5 Pro: 1265
GPT-4 Turbo: 1256
Claude 3 Opus: 1249
Gemini 1.5 Flash: 1231
Llama 3 (70B): 1207
Command-R+: 1189
Claude 3 Haiku: 1178
Mistral Large: 1156
Llama 3 (8B): 1153
Mixtral 8x22B: 1146
Mixtral 8x7B: 1114
GPT-3.5 Turbo: 1107
DBRX: 1103
Mistral 7B: 1008
Reasoning & Knowledge (MMLU)
GPT-4o: 89%
Claude 3 Opus: 87%
GPT-4 Turbo: 86%
Gemini 1.5 Pro: 86%
Llama 3 (70B): 82%
Mistral Large: 81%
Gemini 1.5 Flash: 79%
Mixtral 8x22B: 78%
Command-R+: 76%
Claude 3 Haiku: 75%
DBRX: 74%
Mixtral 8x7B: 71%
GPT-3.5 Turbo: 70%
Llama 3 (8B): 68%
Mistral 7B: 63%
Reasoning & Knowledge (MT Bench)
GPT-4 Turbo: 9.3
GPT-3.5 Turbo: 8.4
Mixtral 8x7B: 8.3
Mistral 7B: 6.8
Coding (HumanEval)
GPT-4o: 90.2
GPT-4 Turbo: 85.4
Gemini 1.5 Pro: 84.1
Llama 3 (70B): 81.7
Gemini 1.5 Flash: 74.3
GPT-3.5 Turbo: 73.2
DBRX: 70.1
Llama 3 (8B): 62.2
Different use-cases warrant considering different evaluation tests. Chatbot Arena is a good evaluation of communication abilities while MMLU tests reasoning and knowledge more comprehensively.
Median across providers: Figures represent median (P50) across all providers which support the model.
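The "median across providers" note can be illustrated with a minimal Python sketch. The provider names and speed values below are hypothetical placeholders, not measurements from this page; the sketch only shows how a P50 across providers would be taken for one model.

```python
from statistics import median


def median_across_providers(measurements):
    """Return the median (P50) of per-provider values for a single model."""
    if not measurements:
        raise ValueError("need at least one provider measurement")
    return median(measurements.values())


# Hypothetical per-provider output speeds (tokens per second) for one model.
example_speeds = {"provider_a": 48.0, "provider_b": 53.0, "provider_c": 120.0}
p50 = median_across_providers(example_speeds)
```

Using the median rather than the mean keeps one unusually fast or slow provider from dominating the figure reported for the model.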

Quality vs. Throughput

Quality: General reasoning index, Output Speed: Output Tokens per Second, Price: USD per 1M Tokens
Most attractive quadrant
Size represents Price (USD per 1M Tokens)
[Scatter plot (ArtificialAnalysis.ai): Quality (General ability index) vs. Output Speed (Output Tokens per Second). Models shown: Mistral Large, Mistral 7B, GPT-3.5 Turbo, Command-R+, Claude 3 Opus, GPT-4 Turbo, Mixtral 8x22B, DBRX, Llama 3 (70B), Gemini 1.5 Pro, Mixtral 8x7B, GPT-4o, Claude 3 Haiku, Llama 3 (8B), Gemini 1.5 Flash.]
There is a trade-off between model quality and output speed, with higher quality models typically having lower output speed.
Quality: Index represents normalized average relative performance across Chatbot Arena, MMLU & MT-Bench.
Output Speed: Tokens per second received while the model is generating tokens (i.e. after the first chunk has been received from the API).
Price: Price per token, represented as USD per million Tokens. Price is a blend of Input & Output token prices (3:1 ratio).
Median across providers: Figures represent median (P50) across all providers which support the model.

Quality vs. Price

Quality: General reasoning index, Price: USD per 1M Tokens
Most attractive quadrant
[Scatter plot (ArtificialAnalysis.ai): Quality (General ability index) vs. Price (USD per 1M Tokens). Models shown: Mistral 7B, Llama 3 (8B), GPT-3.5 Turbo, Mixtral 8x7B, Claude 3 Haiku, DBRX, Command-R+, Mistral Large, Mixtral 8x22B, Gemini 1.5 Flash, Llama 3 (70B), Gemini 1.5 Pro, GPT-4 Turbo, Claude 3 Opus, GPT-4o.]
While higher quality models are typically more expensive, they do not all follow the same price-quality curve.
Quality: Index represents normalized average relative performance across Chatbot Arena, MMLU & MT-Bench.
Price: Price per token, represented as USD per million Tokens. Price is a blend of Input & Output token prices (3:1 ratio).
Median across providers: Figures represent median (P50) across all providers which support the model.
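The blended price described above can be sketched in Python. This assumes the 3:1 ratio means three parts input price to one part output price, taken as a weighted average; that reading is consistent with the figures charted on this page (for example GPT-4o at 5 input, 15 output, 7.5 blended), but the function name and default weights are illustrative only.

```python
def blended_price(input_price, output_price, input_weight=3.0, output_weight=1.0):
    """Weighted average of input and output prices, in USD per 1M tokens.

    Assumption: a "3:1 ratio" weights the input price three times as
    heavily as the output price.
    """
    total_weight = input_weight + output_weight
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return (input_weight * input_price + output_weight * output_price) / total_weight


# Input and output prices as charted on this page (USD per 1M tokens).
gpt4o_blended = blended_price(5.0, 15.0)
opus_blended = blended_price(15.0, 75.0)
```

A workload with a different mix of input and output tokens can pass its own weights instead of relying on the 3:1 default.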

Output Speed

Output Tokens per Second; Higher is better
Gemini 1.5 Flash: 140
Llama 3 (8B): 121
Claude 3 Haiku: 117
Mixtral 8x7B: 92
Mistral 7B: 80
DBRX: 71
GPT-4o: 69
Mixtral 8x22B: 65
GPT-3.5 Turbo: 63
Gemini 1.5 Pro: 63
Command-R+: 60
Llama 3 (70B): 53
Mistral Large: 32
GPT-4 Turbo: 27
Claude 3 Opus: 24
Output Speed: Tokens per second received while the model is generating tokens (i.e. after the first chunk has been received from the API).
Median across providers: Figures represent median (P50) across all providers which support the model.
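As a rough illustration of the output speed definition, the Python sketch below divides the tokens received after the first chunk by the time elapsed since that chunk arrived. The function and parameter names are hypothetical, and whether the first chunk's own tokens are excluded from the count is an assumption, since the page does not say.

```python
def output_tokens_per_second(first_chunk_time, last_chunk_time, tokens_after_first_chunk):
    """Tokens per second while the model is streaming.

    Timing starts when the first chunk arrives, so the wait before the
    first chunk (time to first token) is excluded from the measurement.
    Times are in seconds on the same clock.
    """
    elapsed = last_chunk_time - first_chunk_time
    if elapsed <= 0:
        raise ValueError("last chunk must arrive after the first chunk")
    return tokens_after_first_chunk / elapsed


# Hypothetical timings: first chunk at 0.5 s, last chunk at 2.5 s, 200 tokens in between.
speed = output_tokens_per_second(0.5, 2.5, 200)
```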

Pricing: Input and Output Prices

USD per 1M Tokens
Llama 3 (8B): input 0.2, output 0.2
Mistral 7B: input 0.2, output 0.2
Claude 3 Haiku: input 0.25, output 1.25
Gemini 1.5 Flash: input 0.35, output 1.05
GPT-3.5 Turbo: input 0.5, output 1.5
Mixtral 8x7B: input 0.5, output 0.5
Llama 3 (70B): input 0.9, output 1
Mixtral 8x22B: input 1.2, output 1.2
DBRX: input 1.4, output 1.4
Command-R+: input 3, output 15
Gemini 1.5 Pro: input 3.5, output 10.5
Mistral Large: input 4, output 12
GPT-4o: input 5, output 15
GPT-4 Turbo: input 10, output 30
Claude 3 Opus: input 15, output 75
Prices vary considerably, including between input and output token prices. Prices can vary by orders of magnitude (>10X) between the most expensive and the cheapest models.
Input price: Price per token included in the request/message sent to the API, represented as USD per million Tokens.
Output price: Price per token generated by the model (received from the API), represented as USD per million Tokens.
Median across providers: Figures represent median (P50) across all providers which support the model.
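The remark about prices differing by orders of magnitude can be checked with a short Python sketch. The two blended prices are the lowest and highest values charted in the Highlights section of this page; the helper name is hypothetical.

```python
import math


def price_spread(prices):
    """Return (ratio, orders_of_magnitude) between the highest and lowest price."""
    cheapest = min(prices.values())
    priciest = max(prices.values())
    if cheapest <= 0:
        raise ValueError("prices must be positive")
    ratio = priciest / cheapest
    return ratio, math.log10(ratio)


# Blended prices (USD per 1M tokens) as charted on this page.
blended = {"Llama 3 (8B)": 0.2, "Claude 3 Opus": 30.0}
ratio, magnitude = price_spread(blended)
```

For these two models the ratio comes out at roughly 150x, a little over two orders of magnitude.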

API Provider Highlights: Llama 3 Instruct (70B)

Output Speed vs. Price: Llama 3 Instruct (70B)

Output Speed: Output Tokens per Second, Price: USD per 1M Tokens
Most attractive quadrant
[Scatter plot (ArtificialAnalysis.ai): Output Speed (Output Tokens per Second) vs. Price (USD per 1M Tokens). Providers shown: Microsoft Azure, Amazon Bedrock, Groq, Together.ai, Perplexity, Fireworks, Deepinfra, Replicate, OctoAI.]
Smaller, emerging providers are offering high output speed at competitive prices.
Price: Price per token, represented as USD per million Tokens. Price is a blend of Input & Output token prices (3:1 ratio).
Output Speed: Tokens per second received while the model is generating tokens (i.e. after the first chunk has been received from the API).
Median: Figures represent median (P50) measurement over the past 14 days.
Variance data is available on the model and API provider pages among the detailed performance metrics. See 'Compare Models' and 'Compare API Providers' in the navigation menu for further analysis.

Pricing (Input and Output Prices): Llama 3 Instruct (70B)

Price: USD per 1M Tokens; Lower is better
Input price
Output price
ArtificialAnalysis.ai0.590.590.60.650.90.91