OpenAI Shifts Strategy as Rate of ‘GPT’ AI Improvements Slows

Art by Clark Miller

By Stephanie Palazzolo, Erin Woo and Amir Efrati

Nov 9, 2024, 1:47pm PST

The number of people using ChatGPT and other artificial intelligence products is soaring. The rate of improvement for the basic building blocks underpinning them appears to be slowing down, though.

The situation has prompted OpenAI, which makes ChatGPT, to cook up new techniques for boosting those building blocks, known as large language models, to make up for the slowdown.

The Takeaway

The increase in quality of OpenAI’s next flagship model was less than the quality jump between the last two flagship models
The industry is shifting its effort to improving models after their initial training
OpenAI has created a foundations team to figure out how to deal with the dearth of training data

The challenges OpenAI is experiencing with its upcoming flagship model, code-named Orion, show what the company is up against. In May, OpenAI CEO Sam Altman told staff he expected Orion, which the startup’s researchers were training, would likely be significantly better than the last flagship model, released a year earlier.

Though OpenAI had only completed 20% of the training process for Orion, it was already on par with GPT-4 in terms of intelligence and abilities to fulfill tasks and answer questions, Altman said, according to a person who heard the comment.

While Orion’s performance ended up exceeding that of prior models, the increase in quality was far smaller compared with the jump between GPT-3 and GPT-4, the last two flagship models the company released, according to some OpenAI employees who have used or tested Orion.

Some researchers at the company believe Orion isn’t reliably better than its predecessor in handling certain tasks, according to the employees. Orion performs better at language tasks but may not outperform previous models at tasks such as coding, according to an OpenAI employee. That could be a problem, as Orion may be more expensive for OpenAI to run in its data centers compared to other models it has recently released, one of those people said.

The Orion situation could test a core assumption of the AI field, known as scaling laws: that LLMs would continue to improve at the same pace as long as they had more data to learn from and additional computing power to facilitate that training process.
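
The article does not give a formula for these scaling laws. As a rough illustration only, the sketch below assumes a power-law relationship between training compute and model loss, a form often fitted in scaling-law research. The function names and every constant (`irreducible`, `scale`, `exponent`) are invented for the example and do not describe any OpenAI model. Under this assumed form, each tenfold increase in compute removes a fixed share of the remaining reducible loss, so the absolute gain shrinks with every step.

```python
# Illustrative only: a power-law loss curve with made-up constants.
def loss(compute, irreducible=1.7, scale=10.0, exponent=0.1):
    """Hypothetical loss as a function of training compute (arbitrary units)."""
    return irreducible + scale * compute ** (-exponent)


def gains_per_tenfold(start=1e3, steps=5):
    """Absolute loss reduction from each successive 10x increase in compute."""
    gains = []
    compute = start
    for _ in range(steps):
        gains.append(loss(compute) - loss(compute * 10))
        compute *= 10
    return gains


if __name__ == "__main__":
    for step, gain in enumerate(gains_per_tenfold(), start=1):
        print(f"10x step {step}: loss falls by {gain:.3f}")
```

Each printed gain is smaller than the one before it, which is one way the same relative pace can still feel like a slowdown in practice.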

In response to the recent challenge to training-based scaling laws posed by slowing GPT improvements, the industry appears to be shifting its effort to improving models after their initial training, potentially yielding a different type of scaling law.

Some CEOs, including Meta Platforms’ Mark Zuckerberg, have said that in a worst-case scenario, there would still be a lot of room to build consumer and enterprise products on top of the current technology even if it doesn’t improve.

At OpenAI, for instance, the company is busy baking more code-writing capabilities into its models to head off a major threat from rival Anthropic. And it’s developing software that can take over a person’s computer to complete white-collar tasks involving web browser activity or applications by performing clicks, cursor movements, text typing and other actions humans perform as they work with different apps.

Those products, part of a movement toward AI agents that handle multistep tasks, could prove just as revolutionary as the initial launch of ChatGPT.

Furthermore, Zuckerberg, Altman and CEOs of other AI developers also publicly say they haven’t hit the limits of traditional scaling laws yet.

That’s likely why companies including OpenAI are still developing expensive, multibillion-dollar data centers to eke out as many performance gains from pretrained models as they can.

However, OpenAI researcher Noam Brown said at the TEDAI conference last month that more-advanced models could become financially unfeasible to develop.
“After all, are we really going to train models that cost hundreds of billions of dollars or trillions of dollars?” Brown said. “At some point, the scaling paradigm breaks down.”

OpenAI has yet to finish the lengthy process of testing the safety of Orion before its public release. When OpenAI releases Orion by early next year, it may diverge from its traditional “GPT” naming convention for flagship models, further underscoring the changing nature of LLM improvements, employees said. (An OpenAI spokesperson did not comment on the record.)

Hitting a Data Wall

One reason for the GPT slowdown is a dwindling supply of high-quality text and other data that LLMs can process during pretraining to make sense of the world and the relationships between different concepts so they can solve problems such as drafting blog posts or solving coding bugs, OpenAI employees and researchers said.

In the past few years, LLMs used publicly available text and other data from websites, books and other sources for the pretraining process, but developers of the models have largely squeezed as much out of that type of data as they can, these people said.

Sculpting a Model

The training and testing process large language models go through before release

Source: The Information reporting

In response, OpenAI has created a foundations team, led by Nick Ryder, who previously ran pretraining, to figure out how to deal with the dearth of training data and how long the scaling law will continue to apply, they said.

Orion was trained in part on AI-generated data, produced by other OpenAI models, including GPT-4 and recently released reasoning models, according to an OpenAI employee. However, such synthetic data, as it is known, is leading to a new problem in which Orion may end up resembling those older models in certain aspects, the employee said.

OpenAI researchers are using other tools to improve LLMs during the post-training process by improving how they handle specific tasks. The researchers do so by asking the models to learn from a large sample of problems, such as math or coding problems, that have been solved correctly, in a process known as reinforcement learning.

They also ask human evaluators to test the pretrained models on specific coding or problem-solving tasks and rate the answers, which helps the researchers tweak the models to improve their answers to certain types of requests, such as writing or coding. That process, called reinforcement learning with human feedback, has aided older AI models as well.

To handle these evaluations, OpenAI and other AI developers typically rely on startups such as Scale AI and Turing to manage thousands of contractors.

In OpenAI’s case, researchers have also developed a type of reasoning model, named o1, that takes more time to “think” about data the LLM trained on before spitting out an answer, a concept known as test-time compute.

That means the quality of o1’s responses can continue to improve when the model is provided with additional computing resources while it’s answering user questions, even without making changes to the underlying model. And if OpenAI can keep improving the quality of the underlying model, even at a slower rate, it can result in a much better reasoning result, said one person who has knowledge of the process.
“This opens up a completely new dimension for scaling,” Brown said during the TEDAI conference. Researchers can improve model responses by going from “spending a penny per query to 10 cents per query,” he said.
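
The article does not describe how o1 spends its extra compute, so the sketch below should not be read as OpenAI's method. It swaps in a different and much simpler technique, majority voting over repeated samples, to illustrate the general idea that paying for more computation per query can improve answers without changing the underlying model. The `noisy_solver` function, its 60% hit rate and the answer values are all hypothetical stand-ins for a real model call.

```python
import random
from collections import Counter


def noisy_solver(rng, correct_answer=42, p_correct=0.6):
    """Hypothetical stand-in for one model sample: right 60% of the time."""
    if rng.random() < p_correct:
        return correct_answer
    return rng.choice([7, 13, 99])  # scattered wrong answers


def answer_with_budget(rng, samples):
    """Draw `samples` answers and return the most common one."""
    votes = Counter(noisy_solver(rng) for _ in range(samples))
    return votes.most_common(1)[0][0]


def accuracy(samples, trials=2000, seed=0):
    """Share of trials in which the voted answer is the correct one (42)."""
    rng = random.Random(seed)
    hits = sum(answer_with_budget(rng, samples) == 42 for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    for budget in (1, 5, 15):
        print(f"{budget:>2} samples per query: accuracy {accuracy(budget):.2f}")
```

In this toy setup the same solver becomes more reliable purely because more samples are drawn per question, at a proportionally higher cost per query.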

Altman, too, has emphasized the importance of OpenAI’s reasoning models, which can be combined with LLMs.
“I hope reasoning will unlock a lot of the things that we’ve been waiting years to do: the ability for models like this to, for example, contribute to new science, help write a lot more very difficult code,” Altman said in October at an event for app developers.

Pushing the AI Envelope

This chart from OpenAI shows how its new ‘o1’ reasoning model got better at solving Math Olympiad (AIME) problems on the first try (pass@1), depending on how much time the model had to work on the problem (test-time compute). This type of improvement is known as log-linear compute scaling.

Chart: OpenAI
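
To make the two terms in the caption concrete, here is a minimal sketch. `pass_at_1` computes the share of problems solved on the first attempt, and `log_linear_accuracy` models accuracy that rises by a fixed amount for every tenfold increase in compute. The intercept and slope are invented for illustration and are not read off OpenAI's chart.

```python
import math


def pass_at_1(first_attempt_correct):
    """Fraction of problems solved on the first attempt."""
    return sum(first_attempt_correct) / len(first_attempt_correct)


def log_linear_accuracy(compute, intercept=0.20, slope=0.15):
    """Hypothetical accuracy that is linear in log10(compute), capped at 1."""
    return min(1.0, intercept + slope * math.log10(compute))


if __name__ == "__main__":
    print("pass@1 on four problems:", pass_at_1([True, False, True, True]))
    for compute in (1, 10, 100, 1000):
        print(f"compute {compute:>4}: accuracy {log_linear_accuracy(compute):.2f}")
```

Under this assumed curve, every equal step in accuracy costs ten times more compute than the last, which is what a straight line on a logarithmic compute axis implies.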

In a recent interview with Y Combinator CEO Garry Tan, Altman said, “We basically know what to go do” to achieve artificial general intelligence (technology that is on par with human abilities), and part of it involves “using current models in creative ways.”

Mathematicians and other scientists have said o1 has been beneficial to their work by acting as a companion that can provide feedback or ideas. But the model is currently priced six times higher than nonreasoning models, and as a result it doesn’t have a broad base of customers, said two employees with knowledge of the situation.

‘Breaking Through the Asymptote’

Some investors who have poured tens of millions of dollars into AI developers have wondered whether the rate of improvement of LLMs is beginning to plateau.

Ben Horowitz, whose venture capital firm is both an OpenAI shareholder and a direct investor in rivals such as Mistral and Safe Superintelligence, said in a YouTube video that “we’re increasing [the number of graphics processing units used to train AI] at the same rate, but we’re not getting the intelligent improvements at all out of it.” (He didn’t elaborate.)

Horowitz’s colleague, Marc Andreessen, said in the same video that there were “lots of smart people working on breaking through the asymptote, figuring out how to get to higher levels of reasoning capability.”

It’s possible that the performance of LLMs has plateaued in certain ways but not others, said Ion Stoica, a co-founder and chair of enterprise software firm Databricks and a co-developer of a website that allows app developers to evaluate different LLMs.

While AI has continued to improve in tasks like coding and solving complex, multistep problems, progress appears to have slowed in AI models’ ability to carry out general-purpose tasks like analyzing the sentiment of a tract of text or describing the symptoms of a medical issue, Stoica said.
“For general-knowledge questions, you could argue that for now we are seeing a plateau in the performance of LLMs. We need [more] factual data, and synthetic data does not help as much,” he said.

Aaron Holmes, Kalley Huang and Kevin McLaughlin also contributed to this article.

Stephanie Palazzolo is a reporter at The Information covering artificial intelligence. She previously worked at Business Insider covering AI and at Morgan Stanley as an investment banker. Based in New York, she can be reached at stephanie@theinformation.com or on Twitter at @steph_palazzolo.

Erin Woo is a San Francisco-based reporter covering Google and Alphabet for The Information. Contact her at @erinkwoo.07 on Signal, erin@theinformation.com and at @erinkwoo on X.

Amir Efrati is executive editor at The Information, which he helped to launch in 2013. Previously he spent nine years as a reporter at the Wall Street Journal, reporting on white-collar crime and later about technology. He can be reached at amir@theinformation.com and is on Twitter @amir
