Cam Pedersen
Engineer, Runner, Optimist

What would an LLM OS look like?

November 29, 2023

Andrej Karpathy's YouTube channel is fantastic. He just published an Intro to Large Language Models video which is a great overview of the subject. In this video, he presents a concept of an "LLM OS", which hasn't gotten enough discussion.

I don't want to speak for Andrej, but his ideas are most naturally interpreted in the context of ChatGPT and ways to improve its functionality (he currently works at OpenAI). But there is another interesting way to look at it, outside of ChatGPT.

Large Language Models can do a lot of things, but a prompt-based interface limits their usefulness. We are in the dark ages of manually prompting LLMs. The future will be agentic.

agentic: individuals or groups who have the ability to take initiative, make decisions, and exert control over their own actions and outcomes

Through ReAct and similar prompting methods, agentic reasoning behavior is possible. There is a lot of research happening around this, but like Autogen most of it is focused on a chat interface.
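The ReAct pattern is simple at its core: the model alternates Thought, Action, and Observation steps until it can answer. Here is a minimal sketch of that loop — the model, the `lookup` tool, and the `Action: tool[argument]` syntax are all illustrative stand-ins, not any particular library's API:

```python
# Minimal sketch of a ReAct-style loop. `fake_model` stands in for an
# LLM call; TOOLS is a hypothetical tool registry.

def fake_model(transcript):
    # A real model would be prompted with the transcript so far and
    # decide the next step; this stub is deterministic for illustration.
    if "Observation:" not in transcript:
        return "Action: lookup[capital of France]"
    return "Answer: Paris"

TOOLS = {"lookup": lambda q: "Paris" if "France" in q else "unknown"}

def react(question, max_steps=5):
    transcript = f"Question: {question}"
    for _ in range(max_steps):
        step = fake_model(transcript)
        if step.startswith("Answer:"):
            return step.removeprefix("Answer:").strip()
        # Parse "Action: tool[argument]" and execute the tool.
        name, arg = step.removeprefix("Action:").strip().rstrip("]").split("[", 1)
        observation = TOOLS[name](arg)
        transcript += f"\n{step}\nObservation: {observation}"
    return None

print(react("What is the capital of France?"))  # -> Paris
```

The key point is that the loop, not the chat window, is the interface: anything that can feed observations in and execute actions out can drive it.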

Autogen

MemGPT is another important step: it begins to abstract away from the chat interface by treating user input as an event. However, the end goal of the system is to increase the quality of conversational agents (by improving the context window by managing memory).
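The "user input as an event" framing is worth dwelling on. Once a chat message is just one event type among many, the same agent loop can react to timers, file changes, or anything else. A toy sketch of that shape (my own illustration, not MemGPT's actual architecture):

```python
# Sketch: the agent consumes one queue of typed events instead of a
# chat-only loop. Event kinds and handlers here are hypothetical.
from collections import deque

def handle(event):
    kind, payload = event
    if kind == "user_message":
        return f"reply:{payload}"       # chat is just one case
    if kind == "timer":
        return f"heartbeat:{payload}"
    if kind == "file_changed":
        return f"reindex:{payload}"
    return "ignored"

queue = deque([
    ("user_message", "hello"),
    ("timer", "t1"),
    ("file_changed", "notes.txt"),
])

results = [handle(e) for e in queue]
# -> ['reply:hello', 'heartbeat:t1', 'reindex:notes.txt']
```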

MemGPT

Looking at Andrej's concept of an LLM OS, we can see it is broader in scope. It could be applied as a modular architecture for agentic behavior. And in this architecture, user conversations and multi-agent systems are just a small part of the picture.

Karpathy's LLM OS - screenshot from the video linked above

The files are inside the computer

Chat was a great introductory interface for LLMs, but it's only one UX paradigm that can be applied, and quite low bandwidth. There is a lot of potential for LLMs to be used in other ways, and I think that's what Karpathy is dreaming about with his LLM OS concept.

Computers are tools for humans to interact with information, and current mobile and desktop OSes are the best way we've found to do that as a species. But manipulating them depends on slow user input. In the same way ChatGPT has made knowledge workers more productive, integrating LLMs into the OS could multiply the productivity of every user, by making the computer itself more useful.

Steve Jobs famously said that computers are like a bicycle for the mind. Autonomous semi-intelligent agents could turn computers into a spaceship for the mind.

LLMs on the edge

As anyone who has used the GPT API could tell you, it can be slow and expensive af. For this "OS" to work, it would need to be fast and cheap. This means running the model on the edge, and not in the cloud.

As much as corporations would like everything to be rented in the cloud, we each have very capable devices in our pockets and on our desks. These devices have the keys to our digital castle, and using LLMs locally would be a great way to keep it that way.

Apple has been pushing the idea of "on-device" machine learning for a while, and it may pay off in the long run. Not only is their "neural engine" compatible with transformers, but local inference is quite possible even without it. Because of the unified memory architecture, the GPU can access the CPU's RAM, so high-end consumer GPUs aren't necessary. This allows Macs with 32GB of RAM to run 70B models. We can expect Moore's Law and derivatives to continue.
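The back-of-envelope arithmetic on why quantization makes this plausible is simple — weight memory scales linearly with bits per weight. This estimate covers weights only; real runtimes add overhead for the KV cache, activations, and the runtime itself:

```python
# Rough weight-memory estimate for a quantized model (illustrative
# arithmetic only, ignoring KV cache and activation memory).

def weight_gib(params_billion, bits_per_weight):
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 2**30

fp16 = weight_gib(70, 16)  # ~130 GiB: far beyond consumer RAM
q4 = weight_gib(70, 4)     # ~32.6 GiB: within reach of a high-RAM Mac
```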

The LLM "Kernel"

Running a tool-enabled model as a sidecar service would be a great start. Imagine the common use case of writing a small Python tool with ChatGPT. Instead of copy/pasting the code from a chat window, the sidecar process could instead create a new project in VS Code, open it for you, save the file and run it.
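The sidecar flow is mundane plumbing, which is exactly the point — nothing exotic is needed to go from generated code to a running project. A minimal sketch, where the model output, paths, and the editor invocation are all assumptions for illustration:

```python
# Hypothetical sidecar flow: take code produced by a model, scaffold a
# project, optionally open it in an editor, and run it.
import subprocess
import sys
import tempfile
from pathlib import Path

def scaffold_and_run(code: str, name: str = "tool") -> str:
    project = Path(tempfile.mkdtemp()) / name
    project.mkdir()
    script = project / "main.py"
    script.write_text(code)
    # Hand off to the editor if desired (VS Code ships a `code` CLI):
    # subprocess.run(["code", str(project)])
    result = subprocess.run(
        [sys.executable, str(script)], capture_output=True, text=True
    )
    return result.stdout

out = scaffold_and_run('print("hello from the sidecar")')
```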

This seems to be where Microsoft is headed with Copilot, but it's still chat-based. The true magic here would be to allow the LLM to interact with the OS directly, and autonomously. User chat is just one (very important) input to the functionality of the system. For this reason, it's valuable to think of the LLM as a kernel, and not just a sidecar service.

The kernel is a computer program at the core of a computer's operating system and generally has complete control over everything in the system.

If the LLM were available globally as a system API, we could allow userland programs to register functions for the LLM to use. This would allow using the existing app stores to distribute new functionality, while users continue to use their favorite apps. The UX transition would be less jarring than hoping everyone will want to write chat messages in a web browser all day.
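In shape, such a system API could look like the tool/function-calling registries that already exist, just OS-wide. A sketch of the registration and dispatch half, with entirely hypothetical names — the real API surface would be whatever the OS vendor ships:

```python
# Sketch of a system-wide function registry: userland apps register
# callables, and the "kernel" dispatches the model's tool calls to
# them. All names here are hypothetical.

REGISTRY = {}

def register(name, description):
    """Decorator an app would use to expose a function to the LLM."""
    def wrap(fn):
        REGISTRY[name] = {"fn": fn, "description": description}
        return fn
    return wrap

@register("calendar.next_event", "Return the user's next calendar event")
def next_event():
    return "Dentist, Tuesday 10:00"

def dispatch(tool_call):
    # The model would emit something like {"name": ..., "args": {...}}.
    entry = REGISTRY[tool_call["name"]]
    return entry["fn"](**tool_call.get("args", {}))

print(dispatch({"name": "calendar.next_event"}))
```

The descriptions in the registry double as the tool documentation the model is prompted with, which is how app-store distribution of "new functionality" could work without any new packaging format.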

With access to all of the user's information, including session tokens, RAG will become mind-bogglingly useful. Imagine being able to ask your computer "what was that article I read last week about LLMs?" and it could find it for you without your browsing history leaving your machine.
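Even a toy version of on-device retrieval shows the shape of it. Here is a deliberately tiny sketch using word-overlap similarity over a fake history store — a real system would use embeddings and a vector index, still entirely on-device:

```python
# Toy local retrieval over a browsing-history store (illustrative
# only; the URLs and documents are made up).
from collections import Counter
import math

HISTORY = [
    ("https://example.com/llm-os", "what an LLM OS might look like"),
    ("https://example.com/bread", "sourdough starter maintenance tips"),
]

def similarity(a, b):
    # Cosine similarity over bag-of-words counts.
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    na = math.sqrt(sum(v * v for v in va.values()))
    nb = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

def find_article(query):
    return max(HISTORY, key=lambda item: similarity(query, item[1]))[0]

find_article("article about the llm os")  # -> https://example.com/llm-os
```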

Time rhymes

Bonzi Buddy was a desktop assistant from the early 2000s. It was a cute little purple gorilla that would help you with your computer. It was a fun idea, but it was too early.

Bonzi Buddy

Not only was the wizard behind the curtain, but incentives were different: back then, the only monetization path was to get users to click on ads. Today, users are much more willing to pay for software. Additionally, device manufacturers are competing on software features to sell hardware. And the wizard is now a transformer model.

Incentives are aligned for a new generation of not only desktop assistants, but improved operating systems - and the technology has appeared at the right time.

Pondering the orb

What might the future look like? There are obvious use cases of LLMs - but here are some non-obvious ways OS-level LLMs could be used:

  • Privacy-Enhanced Search: Perform personalized searches like "What article did I read about dachshunds last week?" using local browsing history, maintaining data privacy.
  • Local File Management: Assist with content in local files, e.g., "Insert sales figures from last Tuesday's Excel file," using local data access.
  • Intuitive Troubleshooting: When a user encounters a system error or technical issue, the LLM can provide a plain-language explanation and step-by-step troubleshooting guidance, tailored to the user's technical expertise level. This could be especially helpful for my mom.
  • Emotional Tone Detection: While composing emails or messages, the LLM could analyze the text's emotional tone and suggest modifications to align with the user's intended sentiment. I don't want my draft text to be sent to the cloud for this.
  • Health-Related Adjustments: If a user shows an elevated heart rate after a challenging meeting, the LLM could suggest taking a break or automatically dim the screen. It could also reword follow-up emails to be more diplomatic if the user is feeling stressed. Health data should stay local.
  • Smart Local Reminders: Set reminders and manage calendar events based on content from emails or messages, all processed locally for security. "What am I forgetting to follow up on?"
  • Advanced Voice Commands: Control devices with complex voice commands, e.g., "Open and start slideshow from slide 10 of yesterday’s presentation," using local data processing.
  • Personalized Recommendations: For example, if a user is planning a trip, the LLM could suggest relevant travel itineraries or packing lists stored on the user’s device, based on their past travel planning documents and preferences. Knowing from past messages that the user is afraid of flying, it could evaluate routes and suggest a train trip instead.
  • Complex Automation: "Get me Taylor Swift tickets somewhere I can drive." (local calendar + location access) "Let me know when the food my dog likes is back in stock." (local email access)

There has been speculation on Twitter that OpenAI has something big to release soon. I don't think this is it - but I would be surprised if they aren't playing around with something like this.

In the same way mobile changed the way we interact with computers, LLMs may change the way we interact with information. Only time will tell, and at the rate things are moving, it may not be long.

The LLM OS is a fun concept, and if you have any ideas about it, I'd love to hear them!

Want to know when I post?