Exclusive

New Competitors Chase OpenAI in Reasoning AI Race

Art by Clark Miller.

After OpenAI sparked a race this fall by releasing groundbreaking artificial intelligence known as reasoning AI, it seemed like the ChatGPT owner might run away with the market.

Both Google and Microsoft scrambled to catch up with the technology, which aims to help users answer complex, multi-step questions, according to two people involved in the effort at the companies. But in the past week or so, the dynamics of the reasoning race appear to have changed: A little-known startup, a Chinese quant trading firm, and Chinese e-commerce firm Alibaba Group each released reasoning models that seem to score well against OpenAI’s.

The Takeaway

• Reasoning models are easier to develop than advanced LLMs

• Microsoft has struggled to replicate OpenAI’s o1 reasoning model

• The Lawrence Livermore National Lab has used o1 in laser research

Fireworks AI, a Redwood City, Calif.–based startup that helps developers run open-source models, released an AI system that combines several open-source models and outperformed models such as OpenAI’s GPT-4o and Anthropic’s Claude 3.5 Sonnet in some math and coding evaluations, which serve as a proxy for reasoning capabilities.

A few days later, Chinese quant trading firm High-Flyer Capital Management announced a reasoning model, DeepSeek-R1-Lite-Preview, that outperformed OpenAI’s reasoning model, o1-preview, on certain math and coding evaluations. And on Friday, another Chinese firm, Alibaba, released an open-source reasoning model, Marco-o1, that generated positive buzz among founders and on social media.

Together, the launches highlight how the rise of reasoning models could provide a way for smaller AI developers—both within and outside the U.S.—to catch up with OpenAI, which has a significant head start in building the large language models that power ChatGPT and other conversational AI.

In developing alternatives to OpenAI, the new entrants appear to have benefited from papers that researchers at Stanford University, Google, Meta Platforms and OpenAI itself have published about reasoning in recent years. (Spokespeople for High-Flyer and Alibaba did not respond to requests for comment.)

Reasoning models are also less expensive to develop than traditional LLMs like GPT-4o, which require spending hundreds of millions of dollars on computing resources and training data—plus agreements to acquire that data legally—to build them from scratch.

The new models could aid OpenAI and its rivals in developing coding assistants that take on difficult projects. Enterprise software firms such as Microsoft and Salesforce could use them to improve agents that take actions on behalf of customers, such as scheduling appointments.

Chain of Thought

Researchers can bake reasoning capabilities into existing LLMs such as Meta’s Llama, which is freely available to almost any developer.

They do this by getting other models to generate the thought processes they went through to solve problems (otherwise known as chains of thought) and then training an LLM only on the ones that arrive at a correct answer. (Google first came up with the concept of chain of thought in 2022.) Other techniques include teaching models how to reflect on their own mistakes or decide which problem-solving approaches seem most promising.

These steps happen during a process called post-training, which occurs after the model is initially trained on billions of words of text and other data that help it make sense of the world and the connections between different concepts.
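
The filtering step described above (sample chains of thought, keep only those that reach a correct answer) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not any company's actual pipeline: `Problem`, `Trace`, `build_training_set` and `toy_generator` are hypothetical names, and the toy generator stands in for a real model call.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class Problem:
    question: str
    answer: str  # known-correct reference answer used for checking


@dataclass(frozen=True)
class Trace:
    question: str
    chain_of_thought: str
    final_answer: str


def normalize(answer: str) -> str:
    """Compare answers loosely: ignore case and surrounding whitespace."""
    return answer.strip().lower()


def build_training_set(
    problems: Iterable[Problem],
    generate: Callable[[str, int], Tuple[str, str]],
    samples_per_problem: int = 4,
) -> List[Trace]:
    """Sample several chains of thought per problem; keep only correct ones."""
    kept: List[Trace] = []
    for problem in problems:
        for attempt in range(samples_per_problem):
            chain, final = generate(problem.question, attempt)
            if normalize(final) == normalize(problem.answer):
                kept.append(Trace(problem.question, chain, final))
    return kept


def toy_generator(question: str, attempt: int) -> Tuple[str, str]:
    """Hypothetical stand-in for a model: correct only on even attempts."""
    a, b = (int(x) for x in question.split(" + "))
    total = a + b if attempt % 2 == 0 else a + b + 1
    return f"Add {a} and {b} to get {total}.", str(total)


problems = [Problem("2 + 2", "4"), Problem("3 + 5", "8")]
traces = build_training_set(problems, toy_generator, samples_per_problem=4)
```

The surviving `traces` would then serve as fine-tuning data in post-training. The check here is an exact match against a reference answer, which only works in fields with verifiable answers such as math or coding.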

Some researchers have also made reasoning-focused datasets available for free to other developers. For instance, Alibaba said it used data from one such research group, Open O1, to build the reasoning model it released last week.

Alibaba researchers said they spent a significant amount of time training the model on solving problems in more subjective and open-ended fields such as writing prose or translating slang, in addition to areas with verifiable answers such as math, physics and coding.

In developing reasoning models, competitors of OpenAI are “not as much at a disadvantage versus training general models,” said Ion Stoica, a co-founder of AI startups Anyscale and Databricks.

“You don’t need all the data in the world, which is a barrier for many companies” in competing with OpenAI on LLMs such as GPT-4o, he said.

A spokesperson for OpenAI declined to comment.

To be sure, it takes more than having good AI to win. OpenAI has kept its rivals at bay because its application programming interface is easy for customers to use, and because it has continually cut prices to make free, open-source LLMs less attractive.

OpenAI and other developers have seen a slowdown in the rate of improvement in AI models based on traditional methods. But reasoning models have become a promising alternative—and a potential way to justify the billions of dollars customers, investors, cloud providers and chipmakers are pouring into the industry.

OpenAI’s o1 model stemmed from a breakthrough last year that helped the company’s models answer math problems they had never seen before. It took the better part of a year to ready the technology—which OpenAI researchers called Q* and later Strawberry—for public consumption.

In September, OpenAI released two o1 reasoning models, which were able to answer more complex, multistep questions in fields such as math and coding by spending more time “thinking” when users asked them questions, a technique known as test-time compute.
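
One simple, publicly documented way to spend extra compute at answer time is to sample several answers and return the most common one (often called self-consistency or majority voting). The sketch below illustrates that general idea only. OpenAI has not disclosed how o1 works, so this is not a description of its method, and `answer_with_budget` and `noisy_solver` are hypothetical names.

```python
from collections import Counter
from typing import Callable, Sequence


def majority_vote(answers: Sequence[str]) -> str:
    """Return the answer that occurs most often among the samples."""
    return Counter(answers).most_common(1)[0][0]


def answer_with_budget(
    question: str,
    sample: Callable[[str, int], str],
    budget: int,
) -> str:
    """Draw `budget` independent answers, then vote; more budget, more 'thinking'."""
    if budget < 1:
        raise ValueError("budget must be at least 1")
    answers = [sample(question, attempt) for attempt in range(budget)]
    return majority_vote(answers)


def noisy_solver(question: str, attempt: int) -> str:
    """Hypothetical stand-in for a model that errs on every third attempt."""
    return "5" if attempt % 3 == 0 else "4"


quick = answer_with_budget("2 + 2", noisy_solver, budget=1)
careful = answer_with_budget("2 + 2", noisy_solver, budget=5)
```

With a budget of one the toy solver's first, wrong sample is returned. With a budget of five the correct answer wins the vote, at five times the cost. That is the basic trade-off: accuracy bought with time and compute per question.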

Helping Microsoft

At first, it seemed that OpenAI was racing far ahead of rivals, and the secretive methods it used to develop o1 were keeping it safe from prying eyes. Even Microsoft, which has access to the code for OpenAI’s reasoning model through a multibillion-dollar partnership with the startup, struggled to recreate it, according to two people who were involved in the situation.

In an effort to help their counterparts at Microsoft, OpenAI researchers scheduled daily sessions with Microsoft researchers to explain aspects of the model to them, such as how OpenAI was able to use other AI models to generate training data for o1, one of the people said. It isn’t clear when Microsoft plans to launch a reasoning model. A spokesperson for the company declined to comment.

Fireworks started work on its reasoning model in the second quarter of the year, said Lin Qiao, the company’s co-founder and CEO.

“The whole entire open-source community…is going to move superfast” in launching reasoning models, she said.

Meanwhile, Google in July announced new models, AlphaProof and AlphaGeometry 2, which together scored at a silver-medalist level on the International Mathematical Olympiad.

After OpenAI launched o1-preview, Google stepped up its reasoning efforts. Google has increased the size of its team working on reasoning models to around 200 from the several dozen it comprised before the o1-preview launch. Google also gave the team more computing resources, according to a person who has been involved with it.

Unlike o1-preview, High-Flyer’s new reasoning model reveals its chain of thought to the people who use it. That has allowed outside researchers to validate the effectiveness of the model, and it could also help them train similar AI. (OpenAI has said it hides o1-preview’s chain of thought for competitive and safety reasons.)

High-Flyer said it would release an open-source version of the model so anyone could use it in their products.

It’s not clear how useful reasoning models like o1 are to the average chatbot customer. Only a small percentage of ChatGPT customers regularly use o1-preview, according to a person with knowledge of recent usage.

That might be because OpenAI is still limiting how much its customers can use o1. And for the businesses that build apps with OpenAI models using its application programming interface, o1-preview costs at least six times more than other LLMs the company sells.

But o1 has been particularly useful in deep scientific research. Researchers at Lawrence Livermore National Laboratory, for example, have used the reasoning model to answer doctorate-level questions.

AI for Lasers

A focus of the lab, based in Livermore, Calif., is using high-powered lasers on small fuel capsules to generate energy in a nuclear fusion reaction. In one case, researchers used OpenAI’s o1-preview to calculate what the temperature and pressure of the capsule would be with a laser of a certain strength, and to work out how strong a laser would be needed to achieve a certain temperature and pressure, according to a person involved in the experiments.

Laser research at the Lawrence Livermore National Laboratory. Photo via the Lawrence Livermore National Laboratory.

The OpenAI reasoning model typically takes 10 to 60 seconds to answer such questions. That can save researchers anywhere from 30 minutes to several hours or days, the time it would take to figure out the answers themselves, the person said.

Researchers could use future reasoning models to generate and test scientific hypotheses in fields such as biology, physics and manufacturing, especially if those models could control the tools running and analyzing the results of experiments, the person said.

Such applications of reasoning models could enable AI firms to charge customers more money for their services.

At one point, OpenAI executives discussed high-priced subscriptions for chatbots based on more advanced models like o1, with potential prices ranging up to $2,000 per month. It isn’t clear whether the company will move forward with the plan.

Aaron Holmes contributed to this article.

Stephanie Palazzolo is a reporter at The Information covering artificial intelligence. She previously worked at Business Insider covering AI and at Morgan Stanley as an investment banker. Based in New York, she can be reached at stephanie@theinformation.com or on Twitter at @steph_palazzolo.

Erin Woo is a San Francisco-based reporter covering Google and Alphabet for The Information. Contact her at @erinkwoo.07 on Signal, erin@theinformation.com and at @erinkwoo on X.

Conversation

1 comment
One more data point to say that AI models (this time reasoning ones) have very little moat. Why are people still funding model companies? Or, have the business models of the model companies changed and not yet publicized?