We’re Entering Uncharted Territory for Math

October 04

Annotated by howie.serious

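howie.serious: In "We're Entering Uncharted Territory for Math," the author examines the fundamental differences between AI and humans in learning and problem-solving. AI performs well on complex tasks, but its thinking resembles that of a "mediocre research assistant" that lacks real understanding and creativity. Human learning, by contrast, is not only the accumulation of knowledge but also the ability to apply that knowledge to new situations, which drives innovation and discovery. AI can complete routine tasks efficiently, but it cannot replace human learning driven by creative thinking and emotion. By combining AI's computational power with humans' distinctive creativity, we may open up unprecedented possibilities in the emerging field of "industrial-scale mathematics." All of this is a reminder that AI is an aid, not a replacement, and that future success depends on how we use this tool to improve our own ability to learn and create.

Background: the impact of GPT on human learning and what it suggests

🧠 AI and humans differ fundamentally in how they learn and solve problems, so AI can be treated as a complementary tool. Combining the strengths of AI and humans offers more promising ways to complete tasks.
💡 AI can handle routine tasks but lacks creativity and imagination. We should recognize its limitations, especially in areas that require innovation and deep understanding.
📊 In mathematical and scientific tasks, AI can imitate human thought processes, but its "reasoning" ability is still limited. We need to use these tools carefully and be clear about their role and their shortcomings in order to guide research and learning.

### newsletter

In "We're Entering Uncharted Territory for Math," Terence Tao discusses the fundamental differences between how AI and humans learn and solve problems. He notes that although AI performs well on routine tasks, its creativity and understanding still fall short, which makes it more like a "mediocre, but not completely incompetent" research assistant. In the future, collaboration between AI and humans will open new possibilities for mathematical research and advance "industrial-scale mathematics."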

Terence Tao, a mathematics professor at UCLA, is a real-life superintelligence. The “Mozart of Math,” as he is sometimes called, is widely considered the world’s greatest living mathematician. He has won numerous awards, including the equivalent of a Nobel Prize for mathematics, for his advances and proofs. Right now, AI is nowhere close to his level.

But technology companies are trying to get it there. Recent, attention-grabbing generations of AI—even the almighty ChatGPT—were not built to handle mathematical reasoning. They were instead focused on language: When you asked such a program to answer a basic question, it did not understand and execute an equation or formulate a proof, but instead presented an answer based on which words were likely to appear in sequence. For instance, the original ChatGPT can’t add or multiply, but has seen enough examples of algebra to solve x + 2 = 4: “To solve the equation x + 2 = 4, subtract 2 from both sides …” Now, however, OpenAI is explicitly marketing a new line of “reasoning models,” known collectively as the o1 series, for their ability to problem-solve “much like a person” and work through complex mathematical and scientific tasks and queries. If these models are successful, they could represent a sea change for the slow, lonely work that Tao and his peers do.
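
To make "which words were likely to appear in sequence" concrete, here is a toy sketch of next-word prediction from word-pair counts. It is only an illustration of the idea: the corpus, the function names `train_bigrams` and `most_likely_next`, and the bigram approach itself are assumptions made for this example, and real language models are vastly larger and work differently in detail.

```python
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count, for every word, which words follow it in the text."""
    counts = defaultdict(Counter)
    words = text.split()
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts


def most_likely_next(counts, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    followers = counts.get(word)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# A made-up miniature "training set" of algebra phrasing.
corpus = (
    "to solve the equation subtract 2 from both sides "
    "to solve the inequality subtract 2 from both sides "
    "to solve the equation divide both sides by 3"
)
model = train_bigrams(corpus)

print(most_likely_next(model, "solve"))  # the
print(most_likely_next(model, "the"))    # equation
```

The model never represents an equation; it only reproduces the phrasing it has seen most often, which is the limitation the paragraph above describes.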

After I saw Tao post his impressions of o1 online—he compared it to a “mediocre, but not completely incompetent” graduate student—I wanted to understand more about his views on the technology’s potential. In a Zoom call last week, he described a kind of AI-enabled, “industrial-scale mathematics” that has never been possible before: one in which AI, at least in the near future, is not a creative collaborator in its own right so much as a lubricant for mathematicians’ hypotheses and approaches. This new sort of math, which could unlock terrae incognitae of knowledge, will remain human at its core, embracing how people and machines have very different strengths that should be thought of as complementary rather than competing.

This conversation has been edited for length and clarity.

Matteo Wong: What was your first experience with ChatGPT?

Terence Tao: I played with it pretty much as soon as it came out. I posed some difficult math problems, and it gave pretty silly results. It was coherent English, it mentioned the right words, but there was very little depth. Anything really advanced, the early GPTs were not impressive at all. They were good for fun things—like if you wanted to explain some mathematical topic as a poem or as a story for kids. Those are quite impressive.

Wong: OpenAI says o1 can “reason,” but you compared the model to “a mediocre, but not completely incompetent” graduate student.

Tao: That initial wording went viral, but it got misinterpreted. I wasn’t saying that this tool is equivalent to a graduate student in every single aspect of graduate study. I was interested in using these tools as research assistants. A research project has a lot of tedious steps: You may have an idea and you want to flesh out computations, but you have to do it by hand and work it all out.

Wong: So it’s a mediocre or incompetent research assistant.

Tao: Right, it’s the equivalent, in terms of serving as that kind of an assistant. But I do envision a future where you do research through a conversation with a chatbot. Say you have an idea, and the chatbot went with it and filled out all the details.

It’s already happening in some other areas. AI famously conquered chess years ago, but chess is still thriving today, because it’s now possible for a reasonably good chess player to speculate what moves are good in what situations, and they can use the chess engines to check 20 moves ahead. I can see this sort of thing happening in mathematics eventually: You have a project and ask, “What if I try this approach?” And instead of spending hours and hours actually trying to make it work, you guide a GPT to do it for you.

With o1, you can kind of do this. I gave it a problem I knew how to solve, and I tried to guide the model. First I gave it a hint, and it ignored the hint and did something else, which didn’t work. When I explained this, it apologized and said, “Okay, I’ll do it your way.” And then it carried out my instructions reasonably well, and then it got stuck again, and I had to correct it again. The model never figured out the most clever steps. It could do all the routine things, but it was very unimaginative.

One key difference between graduate students and AI is that graduate students learn. You tell an AI its approach doesn’t work, it apologizes, it will maybe temporarily correct its course, but sometimes it just snaps back to the thing it tried before. And if you start a new session with AI, you go back to square one. I’m much more patient with graduate students because I know that even if a graduate student completely fails to solve a task, they have potential to learn and self-correct.

Wong: The way OpenAI describes it, o1 can recognize its mistakes, but you’re saying that’s not the same as sustained learning, which is what actually makes mistakes useful for humans.

Tao: Yes, humans have growth. These models are static—the feedback I give to GPT-4 might be used as 0.00001 percent of the training data for GPT-5. But that’s not really the same as with a student.

AI and humans have such different models for how they learn and solve problems—I think it’s better to think of AI as a complementary way to do tasks. For a lot of tasks, having both AIs and humans doing different things will be most promising.

Wong: You’ve also said previously that computer programs might transform mathematics and make it easier for humans to collaborate with one another. How so? And does generative AI have anything to contribute here?

Tao: Technically they aren’t classified as AI, but proof assistants are useful computer tools that check whether a mathematical argument is correct or not. They enable large-scale collaboration in mathematics. That’s a very recent advent.

Math can be very fragile: If one step in a proof is wrong, the whole argument can collapse. If you make a collaborative project with 100 people, you break your proof in 100 pieces and everybody contributes one. But if they don’t coordinate with one another, the pieces might not fit properly. Because of this, it’s very rare to see more than five people on a single project.

With proof assistants, you don’t need to trust the people you’re working with, because the program gives you this 100 percent guarantee. Then you can do factory production–type, industrial-scale mathematics, which doesn’t really exist right now. One person focuses on just proving certain types of results, like a modern supply chain.
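
As a small illustration of pieces fitting together without trust, here is a sketch in Lean 4, one widely used proof assistant. The theorem names and the split into two "contributors" are hypothetical, and the snippet assumes a recent Lean 4 toolchain in which the `omega` tactic is available.

```lean
-- Piece proved by one contributor.
theorem piece_a (n : Nat) : n ≤ n + 1 := by
  omega

-- Piece proved independently by another contributor.
theorem piece_b (n : Nat) : n + 1 ≤ n + 2 := by
  omega

-- The combined result compiles only if the two statements line up exactly.
-- If piece_b had been stated about a different quantity, Lean would reject
-- this line, so nobody has to take the other pieces on faith.
theorem combined (n : Nat) : n ≤ n + 2 :=
  Nat.le_trans (piece_a n) (piece_b n)
```

The same mechanical check applies whether there are two pieces or thousands, which is what makes the supply-chain picture plausible.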

The problem is these programs are very fussy. You have to write your argument in a specialized language—you can’t just write it in English. AI may be able to do some translation from human language to the programs. Translating one language to another is almost exactly what large language models are designed to do. The dream is that you just have a conversation with a chatbot explaining your proof, and the chatbot would convert it into a proof-system language as you go.
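
To give a sense of what that specialized language looks like, here is the simple equation from earlier in this article written as a Lean 4 theorem. This is a minimal sketch: the theorem name is made up, the statement is restricted to natural numbers as a simplifying assumption, and the `omega` tactic is assumed to be available in the toolchain.

```lean
-- English: "If x + 2 = 4, then x = 2: subtract 2 from both sides."
-- Lean: the hypothesis h and the goal must be spelled out precisely.
theorem solve_x_plus_two (x : Nat) (h : x + 2 = 4) : x = 2 := by
  omega  -- a decision procedure for linear arithmetic over Nat and Int
```

Even here, the formal version forces choices that the English sentence leaves implicit, such as what kind of number x is. Producing that translation is the step a chatbot might one day take over.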

Wong: So the chatbot isn’t a source of knowledge or ideas, but a way to interface.

Tao: Yes, it could be a really useful glue.

Wong: What are the sorts of problems that this might help solve?

Tao: The classic idea of math is that you pick some really hard problem, and then you have one or two people locked away in the attic for seven years just banging away at it. The types of problems you want to attack with AI are the opposite. The naive way you would use AI is to feed it the most difficult problem that we have in mathematics. I don’t think that’s going to be super successful, and also, we already have humans that are working on those problems.

The type of math that I’m most interested in is math that doesn’t really exist. The project that I launched just a few days ago is about an area of math called universal algebra, which is about whether certain mathematical statements or equations imply that other statements are true. The way people have studied this in the past is that they pick one or two equations and they study them to death, like how a craftsperson used to make one toy at a time, then work on the next one. Now we have factories; we can produce thousands of toys at a time. In my project, there’s a collection of about 4,000 equations, and the task is to find connections between them. Each is relatively easy, but there’s a million implications. There’s like 10 points of light, 10 equations among these thousands that have been studied reasonably well, and then there’s this whole terra incognita.
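
One way to picture a single implication question is a brute-force search for a small counterexample: an operation table that satisfies the first equation but not the second. The sketch below is an assumption-laden illustration, not the project's actual tooling. The four laws, the helper names, and the size limit of 3 are chosen only for this example.

```python
from itertools import product


def all_magmas(n):
    """Yield every binary operation on {0, ..., n-1} as an n-by-n table."""
    for flat in product(range(n), repeat=n * n):
        yield [list(flat[i * n:(i + 1) * n]) for i in range(n)]


def satisfies(table, law, num_vars):
    """Check that `law` holds for every assignment of its variables."""
    n = len(table)

    def op(a, b):
        return table[a][b]

    return all(law(op, *vals) for vals in product(range(n), repeat=num_vars))


def find_counterexample(premise, conclusion, max_size=3):
    """Return a small table obeying `premise` but not `conclusion`, or None."""
    for n in range(1, max_size + 1):
        for table in all_magmas(n):
            if satisfies(table, *premise) and not satisfies(table, *conclusion):
                return table
    return None


# Each law is (predicate, number of variables). These four are illustrative.
commutative = (lambda op, x, y: op(x, y) == op(y, x), 2)
associative = (lambda op, x, y, z: op(op(x, y), z) == op(x, op(y, z)), 3)
left_projection = (lambda op, x, y: op(x, y) == x, 2)
idempotent = (lambda op, x: op(x, x) == x, 1)

# Commutativity does not imply associativity: a 2-element table refutes it.
print(find_counterexample(commutative, associative))

# x*y = x does imply x*x = x, so no counterexample exists at any size.
print(find_counterexample(left_projection, idempotent))  # None
```

A returned table settles a non-implication for good. A `None` result only says that nothing small was found, so a true implication still needs an actual proof. Repeating this kind of check across every pair of equations is where the scale comes from.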

There are other fields where this transition has happened, like in genetics. It used to be that if you wanted to sequence a genome of an organism, this was an entire Ph.D. thesis. Now we have these gene-sequencing machines, and so geneticists are sequencing entire populations. You can do different types of genetics that way. Instead of narrow, deep mathematics, where an expert human works very hard on a narrow scope of problems, you could have broad, crowdsourced problems with lots of AI assistance that are maybe shallower, but at a much larger scale. And it could be a very complementary way of gaining mathematical insight.

Wong: It reminds me of how an AI program made by Google DeepMind, called AlphaFold, figured out how to predict the three-dimensional structure of proteins, which was for a long time something that had to be done one protein at a time.

Tao: Right, but that doesn’t mean protein science is obsolete. You have to change the problems you study. A hundred and fifty years ago, mathematicians’ primary usefulness was in solving partial differential equations. There are computer packages that do this automatically now. Six hundred years ago, mathematicians were building tables of sines and cosines, which were needed for navigation, but these can now be generated by computers in seconds.

I’m not super interested in duplicating the things that humans are already good at. It seems inefficient. I think at the frontier, we will always need humans and AI. They have complementary strengths. AI is very good at converting billions of pieces of data into one good answer. Humans are good at taking 10 observations and making really inspired guesses.