这是用户在 2024-7-27 9:43 为 https://deepmind.google/discover/blog/ai-solves-imo-problems-at-silver-medal-level/ 保存的双语快照页面,由 沉浸式翻译 提供双语支持。了解如何保存?

Research 研究

AI achieves silver-medal standard solving International Mathematical Olympiad problems
AI 达到解决国际数学奥林匹克问题的银牌标准

Published
Authors

AlphaProof and AlphaGeometry teams
AlphaProof 和 AlphaGeometry 团队

A blue background with faint white outlines of a cube, sphere, and mathematical symbols surrounding a central glowing sphere with lines crisscrossing through it.

Breakthrough models AlphaProof and AlphaGeometry 2 solve advanced reasoning problems in mathematics
突破性模型 AlphaProof 和 AlphaGeometry 2 解决了数学中的高级推理问题

Artificial general intelligence (AGI) with advanced mathematical reasoning has the potential to unlock new frontiers in science and technology.
具有高级数学推理能力的通用人工智能(AGI)有可能开启科学和技术的新前沿。

We’ve made great progress building AI systems that help mathematicians discover new insights, novel algorithms and answers to open problems. But current AI systems still struggle with solving general math problems because of limitations in reasoning skills and training data.
我们在构建帮助数学家发现新见解、新算法和解决开放问题的 AI 系统方面取得了巨大进展。但由于推理能力和训练数据的限制,当前的 AI 系统在解决一般数学问题时仍然存在困难。

Today, we present AlphaProof, a new reinforcement-learning based system for formal math reasoning, and AlphaGeometry 2, an improved version of our geometry-solving system. Together, these systems solved four out of six problems from this year’s International Mathematical Olympiad (IMO), achieving the same level as a silver medalist in the competition for the first time.
今天,我们推出了 AlphaProof,一个基于强化学习的新系统用于形式数学推理,以及 AlphaGeometry 2,我们的几何求解系统的改进版本。这些系统共同解决了今年国际数学奥林匹克(IMO)中的六个问题中的四个,首次达到了比赛中银牌得主的水平。

Breakthrough AI performance solving complex math problems
突破性的人工智能性能解决复杂的数学问题

The IMO is the oldest, largest and most prestigious competition for young mathematicians, held annually since 1959.
IMO 是历史最悠久、规模最大、最负盛名的青年数学家竞赛,自 1959 年以来每年举办一次。

Each year, elite pre-college mathematicians train, sometimes for thousands of hours, to solve six exceptionally difficult problems in algebra, combinatorics, geometry and number theory. Many of the winners of the Fields Medal, one of the highest honors for mathematicians, have represented their country at the IMO.
每年,精英高中数学家们有时会训练数千小时,以解决代数、组合学、几何和数论中六个极其困难的问题。许多菲尔兹奖得主(数学家的最高荣誉之一)都曾代表他们的国家参加 IMO。

More recently, the annual IMO competition has also become widely recognised as a grand challenge in machine learning and an aspirational benchmark for measuring an AI system’s advanced mathematical reasoning capabilities.
最近,年度 IMO 竞赛也被广泛认为是机器学习的重大挑战和衡量人工智能系统高级数学推理能力的理想基准。

This year, we applied our combined AI system to the competition problems, provided by the IMO organizers. Our solutions were scored according to the IMO’s point-awarding rules by prominent mathematicians Prof Sir Timothy Gowers, an IMO gold medalist and Fields Medal winner, and Dr Joseph Myers, a two-time IMO gold medalist and Chair of the IMO 2024 Problem Selection Committee.
今年,我们将结合的人工智能系统应用于 IMO 组织者提供的竞赛问题。我们的解决方案由著名数学家、IMO 金牌得主和菲尔兹奖得主蒂莫西·高尔斯教授和两次 IMO 金牌得主、IMO 2024 问题选择委员会主席约瑟夫·迈尔斯博士根据 IMO 的评分规则进行评分。

The fact that the program can come up with a non-obvious construction like this is very impressive, and well beyond what I thought was state of the art.
程序能够提出这样一个不明显的构造,这非常令人印象深刻,远远超出了我认为的技术水平。

Prof Sir Timothy Gowers, 提摩太·高尔斯教授爵士,
IMO gold medalist and Fields Medal winner
国际数学奥林匹克金牌得主和菲尔兹奖得主

First, the problems were manually translated into formal mathematical language for our systems to understand. In the official competition, students submit answers in two sessions of 4.5 hours each. Our systems solved one problem within minutes and took up to three days to solve the others.
首先,问题被手动翻译成正式的数学语言,以便我们的系统理解。在正式比赛中,学生在两个 4.5 小时的会话中提交答案。我们的系统在几分钟内解决了一个问题,其他问题则花了最多三天时间解决。

AlphaProof solved two algebra problems and one number theory problem by determining the answer and proving it was correct. This included the hardest problem in the competition, solved by only five contestants at this year’s IMO. AlphaGeometry 2 proved the geometry problem, while the two combinatorics problems remained unsolved.
AlphaProof 解决了两个代数问题和一个数论问题,通过确定答案并证明其正确性。这包括了今年 IMO 比赛中只有五名参赛者解决的最难问题。AlphaGeometry 2 证明了几何问题,而两个组合问题仍未解决。

Each of the six problems can earn seven points, with a total maximum of 42. Our system achieved a final score of 28 points, earning a perfect score on each problem solved — equivalent to the top end of the silver-medal category. This year, the gold-medal threshold starts at 29 points, and was achieved by 58 of 609 contestants at the official competition.
每个问题可以获得七分,总分为 42 分。我们的系统最终得分为 28 分,每个解决的问题都获得了满分——相当于银牌类别的最高分。今年,金牌门槛从 29 分开始,609 名参赛者中有 58 名在正式比赛中达到了这一分数。

Colored graph showing our AI system’s performance relative to human competitors earning bronze, silver and gold at IMO 2024. Our system earned 28 out of 42 total points, achieving the same level as a silver medalist in the competition and nearly reaching the gold-medal threshold starting at 29 points.

Graph showing performance of our AI system relative to human competitors at IMO 2024. We earned 28 out of 42 total points, achieving the same level as a silver medalist in the competition.
显示我们 AI 系统在 IMO 2024 中相对于人类竞争者的表现的图表。我们获得了 42 分中的 28 分,达到了比赛中银牌得主的水平。

AlphaProof: a formal approach to reasoning
AlphaProof:一种形式化推理方法

AlphaProof is a system that trains itself to prove mathematical statements in the formal language Lean. It couples a pre-trained language model with the AlphaZero reinforcement learning algorithm, which previously taught itself how to master the games of chess, shogi and Go.
AlphaProof 是一个训练自己在形式语言 Lean 中证明数学陈述的系统。它将预训练的语言模型与 AlphaZero 强化学习算法结合在一起,后者之前已经自学掌握了国际象棋、将棋和围棋的游戏。

Formal languages offer the critical advantage that proofs involving mathematical reasoning can be formally verified for correctness. Their use in machine learning has, however, previously been constrained by the very limited amount of human-written data available.
形式语言提供了一个关键优势,即涉及数学推理的证明可以形式化验证其正确性。然而,由于可用的人类编写数据量非常有限,它们在机器学习中的使用以前一直受到限制。

In contrast, natural language based approaches can hallucinate plausible but incorrect intermediate reasoning steps and solutions, despite having access to orders of magnitudes more data. We established a bridge between these two complementary spheres by fine-tuning a Gemini model to automatically translate natural language problem statements into formal statements, creating a large library of formal problems of varying difficulty.
相比之下,基于自然语言的方法可能会产生看似合理但不正确的中间推理步骤和解决方案,尽管它们可以访问数量级更多的数据。我们通过微调 Gemini 模型在自然语言问题陈述和形式陈述之间建立了一座桥梁,创建了一个包含不同难度的形式问题的大型库。

When presented with a problem, AlphaProof generates solution candidates and then proves or disproves them by searching over possible proof steps in Lean. Each proof that was found and verified is used to reinforce AlphaProof’s language model, enhancing its ability to solve subsequent, more challenging problems.
当遇到问题时,AlphaProof 会生成解决方案候选,然后通过在 Lean 中搜索可能的证明步骤来证明或反驳它们。每个找到并验证的证明都用于强化 AlphaProof 的语言模型,增强其解决后续更具挑战性问题的能力。

We trained AlphaProof for the IMO by proving or disproving millions of problems, covering a wide range of difficulties and mathematical topic areas over a period of weeks leading up to the competition. The training loop was also applied during the contest, reinforcing proofs of self-generated variations of the contest problems until a full solution could be found.
我们通过证明或反驳数百万个问题来训练 AlphaProof,涵盖了广泛的难度和数学主题领域,持续数周直到比赛前。训练循环也在比赛期间应用,通过强化自生成的比赛问题变体的证明,直到找到完整的解决方案。

Process infographic of AlphaProof’s reinforcement learning training loop: Around one million informal math problems are translated into a formal math language by a formalizer network. Then a solver network searches for proofs or disproofs of the problems, progressively training itself via the AlphaZero algorithm to solve more challenging problems

Process infographic of AlphaProof’s reinforcement learning training loop: Around one million informal math problems are translated into a formal math language by a formalizer network. Then a solver network searches for proofs or disproofs of the problems, progressively training itself via the AlphaZero algorithm to solve more challenging problems.
AlphaProof 的强化学习训练循环的流程信息图:大约一百万个非正式数学问题由一个形式化网络翻译成正式的数学语言。然后,一个求解器网络搜索这些问题的证明或反证,并通过 AlphaZero 算法逐步训练自己以解决更具挑战性的问题。

A more competitive AlphaGeometry 2
更具竞争力的 AlphaGeometry 2

AlphaGeometry 2 is a significantly improved version of AlphaGeometry. It’s a neuro-symbolic hybrid system in which the language model was based on Gemini and trained from scratch on an order of magnitude more synthetic data than its predecessor. This helped the model tackle much more challenging geometry problems, including problems about movements of objects and equations of angles, ratio or distances.
AlphaGeometry 2 是 AlphaGeometry 的显著改进版本。它是一个神经符号混合系统,其中语言模型基于 Gemini 并从头开始在数量级更多的合成数据上进行训练。这帮助模型解决了更具挑战性的几何问题,包括关于物体运动和角度、比例或距离方程的问题。

AlphaGeometry 2 employs a symbolic engine that is two orders of magnitude faster than its predecessor. When presented with a new problem, a novel knowledge-sharing mechanism is used to enable advanced combinations of different search trees to tackle more complex problems.
AlphaGeometry 2 采用了比其前身快两个数量级的符号引擎。当遇到新问题时,使用一种新颖的知识共享机制来实现不同搜索树的高级组合,以解决更复杂的问题。

Before this year’s competition, AlphaGeometry 2 could solve 83% of all historical IMO geometry problems from the past 25 years, compared to the 53% rate achieved by its predecessor. For IMO 2024, AlphaGeometry 2 solved Problem 4 within 19 seconds after receiving its formalization.
在今年的比赛之前,AlphaGeometry 2 能够解决过去 25 年中所有历史 IMO 几何问题的 83%,而其前身的解决率为 53%。在 2024 年 IMO 比赛中,AlphaGeometry 2 在接收到其形式化后 19 秒内解决了问题 4。

A geometric diagram featuring a triangle ABC inscribed in a larger circle, with various points, lines, and another smaller circle intersecting the triangle. Point A is the apex, with lines connecting it to points L and K on the larger circle, and point E inside the triangle. Points T1 and T2 lie on the lines AB and AC respectively. The smaller circle is centered at point I, the incenter of triangle ABC, and intersects the larger circle at points L and K. Points X, D, and Y lie on lines AB, BC, and AC, respectively, and a blue angle is formed at point P, below the triangle. The diagram is labeled with the letters A, B, C, D, E, I, K, L, O, P, T1, T2, X, and Y.

Illustration of Problem 4, which asks to prove the sum of ∠KIL and ∠XPY equals 180°. AlphaGeometry 2 proposed to construct E, a point on the line BI so that ∠AEB = 90°. Point E helps give purpose to the midpoint L of AB, creating many pairs of similar triangles such as ABE ~ YBI and ALE ~ IPC needed to prove the conclusion.
问题 4 的插图,要求证明∠KIL 和∠XPY 的和等于 180°。AlphaGeometry 2 建议在 BI 线上构造点 E,使得∠AEB=90°。点 E 有助于确定 AB 的中点 L,创建许多类似的三角形对,如 ABE ~ YBI 和 ALE ~ IPC,以证明结论。

New frontiers in mathematical reasoning
数学推理的新前沿

As part of our IMO work, we also experimented with a natural language reasoning system, built upon Gemini and our latest research to enable advanced problem-solving skills. This system doesn’t require the problems to be translated into a formal language and could be combined with other AI systems. We also tested this approach on this year’s IMO problems and the results showed great promise.
作为我们 IMO 工作的一部分,我们还尝试了一个基于 Gemini 和我们最新研究的自然语言推理系统,以实现高级问题解决能力。该系统不需要将问题翻译成形式语言,并且可以与其他 AI 系统结合。我们还在今年的 IMO 问题上测试了这种方法,结果显示出很大的前景。

Our teams are continuing to explore multiple AI approaches for advancing mathematical reasoning and plan to release more technical details on AlphaProof soon.
我们的团队正在继续探索多种 AI 方法以推进数学推理,并计划很快发布有关 AlphaProof 的更多技术细节。

We’re excited for a future in which mathematicians work with AI tools to explore hypotheses, try bold new approaches to solving long-standing problems and quickly complete time-consuming elements of proofs — and where AI systems like Gemini become more capable at math and broader reasoning.
我们对未来充满期待,在这个未来中,数学家们可以使用 AI 工具来探索假设,尝试解决长期存在问题的新方法,并快速完成证明中耗时的部分——而像 Gemini 这样的 AI 系统在数学和更广泛的推理方面变得更有能力。