
Large Enough

Today, we are announcing Mistral Large 2, the new generation of our flagship model. Compared to its predecessor, Mistral Large 2 is significantly more capable in code generation, mathematics, and reasoning. It also provides much stronger multilingual support and advanced function-calling capabilities.

  • July 24, 2024
  • Mistral AI team

This latest generation continues to push the boundaries of cost efficiency, speed, and performance. Mistral Large 2 is exposed on la Plateforme and enriched with new features to facilitate building innovative AI applications.

Mistral Large 2

Mistral Large 2 has a 128k context window and supports dozens of languages including French, German, Spanish, Italian, Portuguese, Arabic, Hindi, Russian, Chinese, Japanese, and Korean, along with 80+ coding languages including Python, Java, C, C++, JavaScript, and Bash.

Mistral Large 2 is designed for single-node inference with long-context applications in mind; its size of 123 billion parameters allows it to run at large throughput on a single node. We are releasing Mistral Large 2 under the Mistral Research License, which allows usage and modification for research and non-commercial purposes. For commercial usage of Mistral Large 2 requiring self-deployment, a Mistral Commercial License must be acquired by contacting us.
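The single-node claim can be sanity-checked with some back-of-the-envelope arithmetic. The sketch below is illustrative only: the 8×80 GB node configuration is an assumption, not a stated requirement of the model.

```python
# Rough memory arithmetic for single-node inference of a 123B-parameter
# model. The node size (8 GPUs x 80 GB each) is an assumption for
# illustration, not a stated requirement.

PARAMS = 123e9  # 123 billion parameters

def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Memory needed just for the weights, in gigabytes."""
    return num_params * bytes_per_param / 1024**3

fp16_gb = weight_memory_gb(PARAMS, 2)  # 16-bit weights
fp8_gb = weight_memory_gb(PARAMS, 1)   # 8-bit quantized weights

print(f"fp16 weights: ~{fp16_gb:.0f} GB")  # ~229 GB
print(f"fp8 weights:  ~{fp8_gb:.0f} GB")   # ~115 GB

# An 8-GPU node with 80 GB per GPU offers 640 GB of device memory, so the
# fp16 weights fit on one node with headroom for activations and the
# 128k-token KV cache.
node_memory_gb = 8 * 80
assert fp16_gb < node_memory_gb
```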

General performance

Mistral Large 2 sets a new frontier in terms of performance / cost of serving on evaluation metrics. In particular, on MMLU, the pretrained version achieves an accuracy of 84.0%, and sets a new point on the performance/cost Pareto front of open models.

Code & Reasoning

Following our experience with Codestral 22B and Codestral Mamba, we trained Mistral Large 2 on a very large proportion of code. Mistral Large 2 vastly outperforms the previous Mistral Large, and performs on par with leading models such as GPT-4o, Claude 3 Opus, and Llama 3 405B.


A significant effort was also devoted to enhancing the model’s reasoning capabilities. One of the key focus areas during training was to minimize the model’s tendency to “hallucinate” or generate plausible-sounding but factually incorrect or irrelevant information. This was achieved by fine-tuning the model to be more cautious and discerning in its responses, ensuring that it provides reliable and accurate outputs.

Additionally, the new Mistral Large 2 is trained to acknowledge when it cannot find solutions or does not have sufficient information to provide a confident answer. This commitment to accuracy is reflected in the improved model performance on popular mathematical benchmarks, demonstrating its enhanced reasoning and problem-solving skills:


Performance accuracy on code generation benchmarks (all models were benchmarked through the same evaluation pipeline)


Performance accuracy on MultiPL-E (all models were benchmarked through the same evaluation pipeline, except for the "paper" row)


Performance accuracy on GSM8K (8-shot) and MATH (0-shot, no CoT) generation benchmarks (all models were benchmarked through the same evaluation pipeline)
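For readers unfamiliar with the "8-shot" setting, the sketch below shows how such an evaluation prompt is typically assembled: eight solved question/answer pairs precede the unsolved test question. The exemplars are invented placeholders, not real benchmark data or Mistral's actual evaluation pipeline.

```python
# A hedged sketch of an n-shot evaluation prompt, as used for benchmarks
# like GSM8K (8-shot). "0-shot" (as in the MATH setting) would mean the
# prompt contains only the test question with no exemplars.

def build_few_shot_prompt(exemplars, question):
    """Concatenate solved exemplars, then the unsolved test question."""
    parts = [f"Question: {q}\nAnswer: {a}" for q, a in exemplars]
    parts.append(f"Question: {question}\nAnswer:")
    return "\n\n".join(parts)

# Placeholder exemplars standing in for eight solved problems.
shots = [(f"example problem {i}", f"example solution {i}") for i in range(8)]
prompt = build_few_shot_prompt(shots, "What is 12 * 7?")

assert prompt.count("Question:") == 9  # 8 shots + 1 test question
assert prompt.endswith("Answer:")      # the model completes from here
```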

Instruction following & Alignment

We drastically improved the instruction-following and conversational capabilities of Mistral Large 2. The new Mistral Large 2 is particularly better at following precise instructions and handling long multi-turn conversations. Below we report the performance on MT-Bench, Wild Bench, and Arena Hard benchmarks:


Performance on general alignment benchmarks (all models were benchmarked through the same evaluation pipeline)

On some benchmarks, generating lengthy responses tends to improve the scores. However, in many business applications, conciseness is paramount – short model generations facilitate quicker interactions and are more cost-effective for inference. This is why we spent a lot of effort to ensure that generations remain succinct and to the point whenever possible. The graph below reports the average length of generations of different models on questions from the MT Bench benchmark:
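The metric behind this comparison can be sketched simply: the mean length of a model's responses over a set of benchmark questions. Real MT-Bench tooling would count tokens with the model's tokenizer; the self-contained sketch below counts whitespace-separated words, and the sample responses are invented for illustration.

```python
# Minimal sketch of an average-generation-length metric. Word counting
# stands in for tokenizer-based counting to keep the example dependency-free.

def average_generation_length(generations: list[str]) -> float:
    """Mean number of words across a model's responses."""
    return sum(len(g.split()) for g in generations) / len(generations)

# Invented sample responses for illustration only.
concise_model = ["Paris.", "Use a hash map for O(1) lookups."]
verbose_model = [
    "The capital of France is Paris, a city known for many things.",
    "One common approach that many engineers recommend is a hash map, "
    "because it gives constant-time lookups on average.",
]

assert average_generation_length(concise_model) < average_generation_length(verbose_model)
```

A lower average at equal benchmark scores is the desirable outcome the text describes: the same quality delivered with fewer generated tokens, hence faster and cheaper inference.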

Average length of generations on the MT Bench benchmark
Language diversity

A large fraction of business use cases today involve working with multilingual documents. While the majority of models are English-centric, the new Mistral Large 2 was trained on a large proportion of multilingual data. In particular, it excels in English, French, German, Spanish, Italian, Portuguese, Dutch, Russian, Chinese, Japanese, Korean, Arabic, and Hindi. Below are the performance results of Mistral Large 2 on the multilingual MMLU benchmark, compared to the previous Mistral Large, Llama 3.1 models, and to Cohere’s Command R+.


Performance on Multilingual MMLU (measured on the base pretrained model)

Tool Use & Function Calling

Mistral Large 2 is equipped with enhanced function-calling and retrieval skills, and has been trained to proficiently execute both parallel and sequential function calls, enabling it to power complex business applications.
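To make the parallel/sequential distinction concrete, the sketch below shows what a parallel tool call looks like at the API level, using the OpenAI-style JSON tool schema that Mistral's chat API also accepts. The tool name, arguments, and call IDs are invented for illustration; only the payload shape is built and inspected here.

```python
# A hedged sketch of parallel function calling: the model returns several
# tool calls in a single assistant turn, which the application can execute
# concurrently before sending results back. Sequential calling would chain
# turns instead, issuing the next request only after the previous tool's
# result is appended to the conversation.
import json

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical tool for illustration
            "description": "Look up current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

# Shape of an assistant turn containing two parallel tool calls.
model_turn = {
    "role": "assistant",
    "tool_calls": [
        {"id": "call_1", "function": {"name": "get_weather",
                                      "arguments": json.dumps({"city": "Paris"})}},
        {"id": "call_2", "function": {"name": "get_weather",
                                      "arguments": json.dumps({"city": "Tokyo"})}},
    ],
}

# The application decodes each call's arguments and dispatches the tools.
cities = [json.loads(c["function"]["arguments"])["city"]
          for c in model_turn["tool_calls"]]
assert cities == ["Paris", "Tokyo"]
```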

Try Mistral Large 2 on la Plateforme

You can use Mistral Large 2 today via la Plateforme under the name mistral-large-2407, and test it on le Chat. It is available under version 24.07 (a YY.MM versioning system that we are applying to all our models), with the API name mistral-large-2407. Weights for the instruct model are available and are also hosted on HuggingFace.
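As a minimal sketch, here is the shape of a chat request to the model over Mistral's HTTP API (the `https://api.mistral.ai/v1/chat/completions` endpoint). Only the request payload is built and inspected; actually sending it requires an API key, and the prompt content is a placeholder.

```python
# Build the JSON body of a chat completion request to mistral-large-2407.
import json

payload = {
    "model": "mistral-large-2407",  # API name from the announcement
    "messages": [
        {"role": "user", "content": "Summarize the Mistral Large 2 release."}
    ],
    "max_tokens": 256,  # cap the response length
}

body = json.dumps(payload)
assert json.loads(body)["model"] == "mistral-large-2407"
```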

We are consolidating the offering on la Plateforme around two general-purpose models, Mistral Nemo and Mistral Large, and two specialist models, Codestral and Embed. As we progressively deprecate older models on la Plateforme, all Apache models (Mistral 7B, Mixtral 8x7B and 8x22B, Codestral Mamba, Mathstral) remain available for deployment and fine-tuning using our SDKs mistral-inference and mistral-finetune.

Starting today, we are extending fine-tuning capabilities on la Plateforme: those are now available for Mistral Large, Mistral Nemo and Codestral.

Access Mistral models through cloud service providers

We are proud to partner with leading cloud service providers to bring the new Mistral Large 2 to a global audience. In particular, today we are expanding our partnership with Google Cloud Platform to bring Mistral AI’s models on Vertex AI via a Managed API. Mistral AI’s best models are now available on Vertex AI, in addition to Azure AI Studio, Amazon Bedrock and IBM watsonx.ai.

Availability timeline of Mistral AI models