
AI Agents & LLMs: Scaling the Next Wave of Automation

Summary

The panelists demystify AI agents and LLMs. They define agentic AI, detail architectural components, and share real-world use cases and limitations. The panel explores how AI transforms the SDLC, addresses concerns about accuracy and bias, and discusses the Model Context Protocol (MCP) and future predictions for AI's impact.

Bio

Govind Kamtamneni is Technical Director, App Innovation Global Black Belt, at Microsoft. Hien Luu is a Sr. Engineering Manager @Zoox and author of MLOps with Ray. Karthik Ramgopal is a Distinguished Engineer and Tech Lead of the Product Engineering Team @LinkedIn. Moderated by: Srini Penchikala, InfoQ Editor.

About the conference

InfoQ Live is a virtual event designed for you, the modern software practitioner. Take part in facilitated sessions with world-class practitioners. Hear from software leaders at our optional InfoQ Roundtables.

Transcript

Srini Penchikala: Welcome to today's InfoQ Live roundtable webinar, titled, "AI Agents and LLMs: Scaling the Next Wave of Automation". In this webinar, we will discuss the latest advancements in Large Language Models, or LLMs, and the recent trend of AI agents and agentic AI architectures. I am Srini Penchikala. I serve as the lead editor of the AI, ML, and Data Engineering Community at infoq.com.

For you today, we have an excellent panel of subject matter experts and practitioners from different specializations and organizations in the AI and ML space. Our panelists are Hien Luu from Zoox, Karthik Ramgopal from LinkedIn, and Govind Kamtamneni from Microsoft.

Would you like to introduce yourself and tell our audience what you've been working on?

Hien Luu: My name is Hien Luu. I currently work at Zoox. We are in the autonomy space. I lead the ML platform team there. I've been working in the ML infrastructure area for many years now. One of my hobbies is teaching. I've been teaching generative AI at an extension school. It's been really fun. Right now is a very exciting time. Everybody knows we're in the midst of an AI revolution; so much innovation and advancement is going on at the moment. It's really fun.

Govind Kamtamneni: I'm part of our Global Black Belt team, basically an extension of engineering here. I'm a technical director. I've worked with close to 200 customers since we launched the Azure OpenAI service. As Hien noted, it's obviously been a revolution. Every experience and application is evolving to be AI native: essentially, AI personalizes or automates parts of your workflows or your experiences. There's a lot that we'll discuss here, things that are working for customers, but also things that aren't.

Karthik Ramgopal: My name is Karthik Ramgopal. I'm a distinguished engineer at LinkedIn. I'm the tech lead of our product engineering team, which is responsible for all of our member- and enterprise-facing products. I'm also the overall tech lead for all of our generative AI initiatives, platform and product, as well as, increasingly, internal productivity use cases. Everyone said a huge revolution is going on right now. In my opinion, this is going to be the biggest transformation of society in general that we have seen since the industrial revolution. As with anything else, there is a lot of advancement, but also a lot of hype and noise. What I'm hoping we will see through the panel today is how to cut through some of that and focus on real-world insights, and examples of what you can actually do and what you will potentially be able to do in the future with these technologies.

Srini Penchikala: As I always say, definitely, we want to highlight the AI innovations that our audience should be hyped about and not the hype of the AI innovations.

What's an AI Agent?

There are so many different definitions and claims about agentic AI solutions. To set a common base of understanding for this discussion and for this forum, can you define what an AI agent is, and how AI agent-based applications differ from traditional AI applications that have some type of job scheduler, workflow, and decision-making capability built in? How is this different from those other apps?

Hien Luu: It's good to establish some kind of a baseline. When I first learned about AI agents, probably about a year ago when Andrew Ng introduced the idea, I got really confused. What does it mean? The term is hard to digest in terms of what it encompasses. For me, after reading a lot of blogs and learning, the thing that really helped me understand was to step back and understand what the agentic part of AI really means in terms of its own definition. I looked it up, and according to the dictionary, it's the ability to act independently and achieve outcomes through self-directed actions. When I read that and mapped it to all the discussion about what AI agent systems are, it started to make a lot of sense. Hopefully that makes it a little bit easier for other folks who are confused about what AI agents are. With that, we can map it to what AI agents are.

Essentially, these are applications or systems designed to solve certain complex tasks or achieve certain specific goals. They're typically centered around LLMs like o1. LLMs are now extremely smart, so you can use them to help with reasoning and planning, and tool use is now part of our tool set for building LLM applications. These systems use these tools; they can plan, they can reason, they can interact with environments, and they maintain and control how they accomplish tasks, essentially. That's where the agentic part comes in: using these smart LLMs that can now reason and plan. That's my way of interpreting what this means.

I think the three concepts I learned that helped with understanding are these. The first is the autonomy aspect. This is one aspect that's different from your classical AI and ML systems, which are designed to solve specific tasks: here the system has the autonomy to figure out the process, the steps to solve the problem. The second one is adaptability: they can now reason and plan, and they can adapt the steps based on the tools they use in the active environment. The last one is goal orientation, another part of these AI agent systems. I'll end with one definition that I saw, which I really like in simple terms: an AI system using LLMs in a loop. You might have heard that before. I want to comment on one more thing, though. Andrew Ng said something like, instead of arguing over the definition, we can acknowledge that there are actually different degrees of being agentic in our systems.
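
As a rough sketch of that "LLMs in a loop" definition, here is a minimal Python example. The `call_llm` function is a scripted, hypothetical stand-in for a real model API, and the tool names are invented for illustration; the point is only the control flow, where the model proposes an action, the system executes it, and the observation is fed back until the model declares the goal done.

```python
import json

# Scripted stand-in for a model API; purely illustrative, not a vendor SDK.
_SCRIPT = iter([
    '{"type": "tool", "tool": "run_tests", "args": []}',
    '{"type": "final", "answer": "One test fails; fix the off-by-one in the parser."}',
])

def call_llm(messages):
    """Hypothetical LLM call; a real system would send `messages` to a model API."""
    return next(_SCRIPT)

# Tools the agent may invoke; each maps a name to a plain Python function.
TOOLS = {
    "search_docs": lambda query: f"top results for {query!r}",
    "run_tests": lambda: "3 passed, 1 failed",
}

def run_agent(goal, max_steps=10):
    messages = [{"role": "user", "content": goal}]
    for _ in range(max_steps):
        # The model replies with JSON: either a tool call or a final answer.
        reply = json.loads(call_llm(messages))
        if reply["type"] == "final":
            return reply["answer"]
        observation = TOOLS[reply["tool"]](*reply.get("args", []))
        # Feed the observation back so the model can plan its next step.
        messages.append({"role": "assistant", "content": json.dumps(reply)})
        messages.append({"role": "user", "content": f"observation: {observation}"})
    return "stopped after max_steps"

print(run_agent("Why is the build failing?"))
```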

Agentic AI vs. AI Agents: What's the Difference?

Srini Penchikala: AI agents versus agentic AI, these two terminologies, how do you define those?

Karthik Ramgopal: Agentic AI refers to the entire system. If you look at the system, it comprises AI agents, but it also comprises orchestrators to orchestrate across these AI agents. It comprises tools which you call in order to interact with the real world to do something, and systems for coordinating with these tools, like MCP, which we'll get into a bit later in this discussion. You also have registries of various kinds to announce things and to access things. That end-to-end system is what is called an agentic AI system. An agent is just one small component of this system.

Govind Kamtamneni: I think that's well put. In fact, Berkeley has been coining the term Compound AI System for the products we're going to ship eventually, because there will be aspects of a workflow that require agency. For example, let's say you have an email automation system. I'm actually working with a customer who's doing this at scale. Let's say you send a tracking number, and it automatically replies; that's what they're doing there. Part of that, of course, is that it has to access tools, but when it drafts that email and sends it to the end user, it will factor in the tool's response. This is where the LLM's stochasticity comes into play. It can personalize the response: based on, say, the reading comprehension of the person who sent the request, it could adjust the response to be more terse and short, or more lengthy. I think you can have workflows that are more predefined.

Obviously, we have those; we've been doing business process automation for the last 20 years. You can introduce agency steps within the workflow. It's a spectrum. There are some SWE agents, we'll probably get to this later, software engineering agents that can be more goal oriented, that can have a lot more agency in how they create that orchestration and create code paths autonomously. It is definitely a spectrum. The gist of it is how much you use the LLM in the control flow: if you use the LLM to drive more of the control flow, then it becomes a lot more agentic, which means it also introduces a lot more uncertainty, let's put it that way. You can put it on rails, and you can have more control over the code path and the control flow; that makes it more of a workflow with some agency.

Karthik Ramgopal: I think one key difference from the earlier systems is what Govind was alluding to, which is the emerging ability of these systems to autonomously learn, and often to generate tools to solve the problem. It's called metacognition, which is learning how to learn. This is still an emergent ability. We are seeing in some of these cases that these agents are able to author snippets of code, or are able to orchestrate across tools, for example, to unblock themselves to solve a particular task. Again, it sometimes goes off the rails. Sometimes these agents get stuck. These are still very nascent capabilities. That is a key difference from the systems of yore: the level of agency, the level of cognition, and hence the level of autonomy enabled is significantly higher.

Agentic AI Use Cases: Overkill or Fit-for-Purpose?

Srini Penchikala: Agentic AI architectures are new and still evolving. Like you mentioned, they will have to go through some growing pains before their value outweighs the biases and hallucinations, which we'll talk more about later in the discussion. Staying at the use case level, what are some use cases or applications you are seeing where AI agents are a really good fit? Also, on the other side of that question, what are the use cases where AI agents and agentic AI solutions are either overkill or not recommended?

Karthik Ramgopal: I think any sort of knowledge work automation is where AI agents are excelling right now. I think it's only going to be a matter of time before somebody hooks up an AI model to a set of actuators to interact with the real world, so they can do physical tasks as well. At least I haven't heard of anything happening at scale like that so far. Right now, it's mostly limited to these forms of knowledge work: anything which involves a computer, anything which involves software, anything which involves calling an API, and anything which involves orchestrating a bunch of different things to make it happen. That is what it's pretty good at. What are some of the impediments there? The first impediment is that it goes off the rails and quality suffers.

The second is that evaluating whether it is doing the job correctly is hard. The third is that, even with all the cost reductions, these models are still incredibly expensive, and GPU capacity is fairly constrained at scale on account of a variety of factors. There are some organic limitations in terms of cost, compute availability, and in some cases latency, which are also inhibiting crazy adoption. Where are they not good? Obviously, if you have any workflow where you want a lot of tight control and you do not want agency, where effectively you're following algorithmic business logic, AI agents are overkill. They're going to be expensive, and they're also unnecessary.

One thing you can still do is use AI to help you generate that logic or that code beforehand, before you deploy it. It's like moving it further left in the chain. You do it at build time instead of at runtime, and you still have a human verify it before you go and deploy it.

How Reliable/Accurate Should AI Agents Be?

Srini Penchikala: How reliable does an agentic AI system need to be in order to be useful? For example, 90% accuracy sounds pretty good, but picking up the pieces after 10% wrong results sounds like a lot of work. Again, where are we right now, and where are we going with this?

Govind Kamtamneni: Obviously, the classical answer is, it depends. It depends on your use case and your risk appetite. I have some digital-native customers who are pushing the frontier. Think of, for example, one of the traditional ML systems rather than GenAI, since Hien is here. Tesla, for example, pushes the frontier and issues software updates for all kinds of FSD features. I was actually experiencing FSD, and it's getting better, but there are those 10% of cases where it does go off the rails. As a brand, Tesla is fine taking that risk on. I think it depends on the experience you want to provide to the end user. Obviously, we're all here to serve their needs at the end of the day. If you think your end user wants the latest and greatest, you can caveat that experience. Even when ChatGPT launched, and I think even now, at the bottom it says responses might be inaccurate, or something like that. Just keep in mind that it will have implications for your brand, like trust. You don't want to tolerate too much risk or push that risk onto the end user.

Hien Luu: I think, like most new technology, there's a certain amount of learning and there's a certain amount of risk. The technology will get better. It's basically a journey for all of us to be on and learn from. You want to be smart about applying it first in use cases where the consequence of a wrong decision is not too bad. We can learn from that experience, then build on top of it and apply it to more complex use cases in the future. That's the current state of where we are. There are definitely a lot of use cases where AI or agentic AI can be really tremendous, like we saw with deep research. That use case is tremendous in terms of the ability to save time doing research.

Karthik Ramgopal: I think it's really hard to talk about these quality percentages as a whole, because in most of the use cases where AI is applied today, the product is fairly complex. If it weren't, why would you use AI? You'd just code it up. It's important to talk about components of the system and your level of risk tolerance in each of these components. There's also this concept of building confidence models and invoking a human in the loop.

At least with the capabilities of AI systems right now, full autonomy is mostly a pipe dream, except for very few use cases. A lot of AI system architectures need to be designed for human in the loop, because you cannot put SLAs on humans. You can put SLAs on systems where you actively invoke the human to ask for clarification, approval, or input before you undertake a sensitive action. That is, again, a pattern which you can fundamentally integrate. It's also important to note that you have to be very careful with the definition of quality and correctness if you're doing tasks in a loop or in a chain. Because what happens is that if you have 90% accuracy everywhere, the error rates build up. First it'll be 90%, then it'll be 81%, and so on. Progressively, it could result in a much worse error rate than what you anticipated at the beginning.
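
To make that compounding concrete: if each step in a chain is 90% accurate and failures are independent, the whole chain succeeds with probability 0.9 raised to the number of steps. A minimal sketch:

```python
# Probability an n-step chain succeeds when each step is 90% accurate
# and failures are independent.
for n in (1, 2, 5, 10):
    print(f"{n} steps: {0.9 ** n:.3f}")
# 1 steps: 0.900 / 2 steps: 0.810 / 5 steps: 0.590 / 10 steps: 0.349
```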

Srini Penchikala: Also, on the other hand, we can use these solutions to iteratively train and learn. Like you said, they can learn about learning. You can use that to your advantage.

Karthik Ramgopal: It's very important to mirror the real world in the way you design these applications. I can give you a classic example: coding agents. How do coding agents correct themselves? Just like humans do. You feed them the compiler error, they look at it, and then they know, here is where I messed up. Without that integration with the tool which gives them access to the compiler error, they are equally in the dark, just like you would be if you weren't seeing error logs show up in your CI/CD system.
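
A sketch of that feedback loop, assuming a hypothetical `ask_model` helper rather than any specific coding agent's API: compile the file, and if compilation fails, hand the compiler's stderr back to the model and retry.

```python
import subprocess

def ask_model(prompt):
    """Hypothetical LLM call that returns a corrected source file."""
    raise NotImplementedError  # stand-in; wire up a real model API here

def self_correct(source_path, attempts=3):
    for _ in range(attempts):
        # Try to compile; capture the compiler's error output. Assumes gcc.
        result = subprocess.run(
            ["gcc", source_path, "-o", "a.out"],
            capture_output=True, text=True,
        )
        if result.returncode == 0:
            return "compiled cleanly"
        with open(source_path) as f:
            code = f.read()
        # Feed the error back, exactly as a human developer would read it.
        fixed = ask_model(
            f"This C code fails to compile:\n{code}\n"
            f"Compiler error:\n{result.stderr}\nReturn the corrected file."
        )
        with open(source_path, "w") as f:
            f.write(fixed)
    return "still failing after retries"
```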

Govind Kamtamneni: I agree a hundred percent. We do see a lot of systems like Cognition's Devin, for example, that do a good job. Even some of the coding agents, when they're misaligned, do ask the human to provide clarification. Sometimes they also reward hack. As a human, you have to validate the response that was generated. For example, some of these software engineering agents will meet your acceptance criteria, but they might have hardcoded parts of it. We see that as well. Obviously, humans are very much in the loop. We just have to work with these agents at a much higher level of abstraction. The end result is that the output per human goes up. The human is very much in the loop.

The Architectures of Agentic AI (Technical Details)

Srini Penchikala: Let's talk about the technical details of these architectures. What do agentic AI application architectures comprise? What are the key components, and how do they interact with each other? In other words, what additional tools and infrastructure do we need to develop and deploy these applications?

Govind Kamtamneni: I actually have a huge blog post about this, with a decision tree and all that. You can probably search my name along with "how to build AI agents fast". Essentially, it's almost the same as building the earlier cloud-native microservices we built. You need an orchestration layer or component; then, obviously, the model is doing the reasoning. You need round trips with the model. The most important thing with all AI agents, and with the round trips with the model, is the context.

Ultimately, you're going to be spending most of your time organizing that information layer and the information retrieval aspect. Providing the right context is literally everything. That layer is also very important. The ways you organize information can be more semantic. There are a lot of vector databases, and pretty much every database now supports vector index capabilities. That allows you to improve the quality of your retrieval. Then you can, of course, stitch them together, as Karthik was saying, but be careful: when you have too many multi-agents interacting with each other, the error rate can compound. More important are the quality attributes of the system. At the end of the day, if you want reliable, highly available systems, you have to make sure there's a gateway in front of your system that handles authentication and authorization for agents as well.
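
A minimal sketch of that retrieval layer, with a hypothetical `embed` function standing in for a real embedding model and an in-memory list standing in for a vector database:

```python
import numpy as np

def embed(text):
    """Hypothetical embedding call; a real system would use an embedding model."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(8)  # toy 8-dimensional vector

DOCS = ["laptop issuance policy", "VPN setup guide", "expense reporting FAQ"]
INDEX = [(doc, embed(doc)) for doc in DOCS]  # a vector DB would hold this index

def retrieve(query, k=2):
    q = embed(query)

    # Rank documents by cosine similarity to the query vector.
    def score(vec):
        return np.dot(q, vec) / (np.linalg.norm(q) * np.linalg.norm(vec))

    ranked = sorted(INDEX, key=lambda item: -score(item[1]))
    return [doc for doc, _ in ranked[:k]]

print(retrieve("how do I get a laptop?"))
```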

Entitlements are very important for agents. You'll see a lot of the identity providers, including Microsoft with Entra ID, move into this space, where we're going to facilitate agents having their own identities, because we do envision a world where a lot of flows with agents accessing data or tools are on-behalf-of-user flows. We do envision a world where agents are going to be more event-driven. They're going to act independently, and they'll then ask the human or someone for permissions. In that case, entitlements are very important. More of that is coming soon, but for now, just make sure that you have proper fine-grained authorization for the resources that the agents, or the orchestration layer, are going to access.

Then, even between the orchestration and the model layer, we usually recommend some L7 gateway that can handle failovers or things of that nature, because these models are ultimately doing a lot of matrix multiplications and they very much introduce latency into the experience. They are not super reliable. There are a lot of capacity problems. You want to handle that scenario at scale. Then, yes, just package it up like you would package any microservice and deploy it to any container-based solution; that's all the same. I think the one most important thing here is evals. Evaluations are key.
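
The failover behavior such a gateway provides can be approximated in a few lines. This is a sketch with invented endpoint names, not a particular gateway product:

```python
import time

# Hypothetical model deployments sitting behind an L7 gateway.
ENDPOINTS = ["https://primary.example/llm", "https://secondary.example/llm"]

def call_endpoint(url, prompt):
    """Stand-in transport: pretend the primary region is out of capacity."""
    if "primary" in url:
        raise TimeoutError("429: capacity exhausted")
    return f"completion from {url}"

def call_with_failover(prompt, retries=2, backoff=0.5):
    for url in ENDPOINTS:  # fail over across deployments or regions
        for attempt in range(retries):
            try:
                return call_endpoint(url, prompt)
            except TimeoutError:
                time.sleep(backoff * (2 ** attempt))  # exponential backoff
    raise RuntimeError("all model endpoints unavailable")

print(call_with_failover("hello"))  # served by the secondary deployment
```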

Obviously, the models are saturating a lot of benchmarks that showcase reasoning. Yes, they're highly capable polymaths that can reason well, but the evals that you care about are, at the end of the day, about whatever experience you're stitching together: create a benchmark for that, create evaluations that are idiosyncratic to your use case. Then there are a lot of libraries; we have one from Azure AI, but there are eval libraries out there, such as Ragas. Use those eval libraries, do continuous monitoring, and make sure you collect user feedback. That data is also potentially very rich if you ever want to fine-tune and do things like that, so make sure to store it. There's a lot there. The existing ways of doing cloud-native, twelve-factor apps still apply, but then there are all these nuances with GenAI stuff.
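
A use-case-specific eval can start very small. The sketch below assumes a hypothetical `agent` function under test and a hand-built set of labeled cases; real eval libraries add LLM-based judges, datasets, and reporting on top of the same idea:

```python
# Hand-built, use-case-specific test cases (inputs plus expectations).
CASES = [
    {"input": "Where is my package 1Z999?", "must_contain": "tracking"},
    {"input": "Cancel order 42", "must_contain": "cancelled"},
]

def agent(prompt):
    """Hypothetical system under test."""
    return "Your tracking status is: in transit."

def run_evals():
    passed = 0
    for case in CASES:
        output = agent(case["input"]).lower()
        ok = case["must_contain"] in output
        passed += ok
        print("PASS" if ok else "FAIL", "-", case["input"])
    print(f"{passed}/{len(CASES)} passed")

run_evals()
```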

Hien Luu: That's very true. A lot of people tend to focus on the new ways of building these AI systems. It's similar to how we thought about building classical AI and ML systems, where you saw that famous picture in which the model is just a small piece of the overall bigger system. An AI system is similar in a way: a lot of the pieces we learned over the last 20 or 30 years of building microservices are all still applicable. The AI part is still just a small part of the overall system. There are some new parts, like Govind mentioned, that are very specific to the nature of these stochastic systems, with evaluations and all that.

Karthik Ramgopal: The rules never change. Good systems design is still good systems design. You have to account for the increased level of non-determinism in these systems, and that shows up in a variety of ways. First is observability. A lot of the observability systems we have don't work here for these stochastic systems. You do not have predefined paths. By definition, there is a lot of agency, and there are a lot of possibilities and various paths which can be taken. You need to invest in an observability solution which will give you intelligible data about what's happening in production in the system. That's the first.

The second is from a hardware resource capacity planning perspective, where this non-determinism also gets in. Have you provisioned enough resources, or do you have enough safeguards built into your system to throttle workloads, execute them asynchronously when capacity is available, and prioritize them appropriately? Again, these problems happen at a certain amount of scale. That's important. I think the third thing is that a lot of these systems, in addition to being slow, also fail, because you have so many moving components and so much non-determinism that you may not get the right output every time. How robust are your error handling, graceful degradation, escalation to a human in the loop, all these things? These are not just systems design; they have to be reflected everywhere, from your UI to your AI, because some of these also manifest themselves in the UI in how you respond to users. That end-to-end picture is super important.

Srini Penchikala: I'm glad you all mentioned these additional topics that we need to be cognizant of: AI context, authentication, authorization, API gateway routing, circuit breakers, and observability, which Karthik mentioned. I was envisioning we would be talking about these more like next year, as a phase two of AI agents. I'm glad that we're all talking about them now, because these should not be afterthoughts. They should be built into the systems right from the beginning.

Leveraging AI Agents in the SDLC (Software Development Life Cycle)

Most of our audience are senior technical leaders in their organizations. They would like to know how they can benefit from AI agents in their day-to-day work tasks, which is basically software development and so on. How can we leverage AI agents in different parts of the SDLC process? What is the new role of a software developer, with more and more development tasks being managed by AI programs? How can we stay relevant, and not necessarily fear AI agents, but embrace them and use them to better ourselves?

Karthik Ramgopal: I think what you said at the end is very important. You should not fear it. You should embrace it and see how you can use it best. I can talk a bit about how I use AI in my day-to-day development. The first thing is, I don't vibe code. There is so much hype about vibe coding, and I still don't believe the hype. It may be good for prototyping small things, but for building anything serious, which is what I think software developers end up doing in professional environments, it's not so great yet. Maybe it will get there. What I do use is a bunch of these AI-native IDEs. LinkedIn is part of the Microsoft family, so we use GitHub Copilot quite a bit. GitHub Copilot is getting better of late with agent mode and all these things, especially in VS Code.

Again, I use GitHub Copilot quite a bit to understand the code base, ask it questions which I'd normally have to research manually, as well as ask it to make changes. It still makes mistakes, although it's getting better. The onus is still on me to understand, first, how do I use the tool? How do I prompt it best? How do I ask the follow-up questions in the right way? What model do I choose? Because you have the reasoning models, and you have the regular generative models, which are non-reasoning; certain models are good for certain kinds of tasks. Again, you get to know some of these things as you start interacting with these tools more and more. The second thing is, how do I review the output it produces? One of the challenges I'm facing already is that AI is a great productivity accelerator, but it can produce reams and volumes of code way faster than a human can. I need to keep up with the ability to review it, because it can still make mistakes.

More importantly, the mistakes it makes are often very subtle in nature. It's not very obvious. You have to have an extra eye for detail. If anything, your conceptual knowledge, as well as your general ability to understand large pieces of information and synthesize results from it, needs to get better in order for all of us to keep up in this new world. That is one area: coding assistants and things like that. You can also use it for unit test generation of various kinds. You don't need to write pesky tests yourself, although they are still very important for quality. Then there are also aspects of using AI further right, where you can use it to understand anomalies in your traffic patterns, incident root-causing, deployment failures, and debugging issues. That is one entire category of things. You can also move further left, where even during the design process you can use AI quite a bit.

For example, we have integrated Glean at LinkedIn. I end up using Glean quite a bit because it's connected to our entire corpus of documents, Office 365, Google Docs, and our internal wikis, which contain a ton of information. If I'm doing some research, for example, to write a design doc, I will start with a Glean chat prompt, which essentially saves me a bunch of grunt work: going and finding that information, putting references in appropriately, and crafting the design. Again, that initial outline which is produced, I will refine later with my HI, human intelligence, in order to get it into its final shape.

Srini Penchikala: In what other parts of the SDLC are you seeing AI agents being used?

Hien Luu: I think testing is an area where these models can help quite a bit. Most software engineers probably don't like writing tests that much. I think that's an area where we can leverage these tools to help. Not just writing tests, but also asking insightful questions about edge cases or the other things that Karthik brought up. I would love to use that. Stepping back, I don't code as much anymore, but if I were an engineer and that were 100% of my daily tasks, I would probably give more thought to what kinds of questions to ask, because these LLMs are there to answer our questions.

If we can come up with smart, intelligent questions that are relevant to our software engineering tasks, I would spend a lot more time thinking about what kind of question I should ask so that I can improve whatever task I'm doing, whether it's improving the robustness of the microservice, or handling throttling, or whatever it is. I think that's a different mindset, and a mindset that software engineers need to start building more of: how to think about what kinds of thoughtful questions would be useful for their tasks.

Govind Kamtamneni: Yes, a hundred percent. I just want to add that the answers are all there. It's the questions that we ask, and how we ask them, that matter. What I'm doing a lot of is actually reading. GitHub Copilot obviously has this developer inner-loop experience, but there's also the outer-loop experience called Padawan. It's similar to Cognition's Devin, and there are others out there that do that. What I'm using it for is generation; it's kind of like deep research on your code base: when you go to a new code base, have it generate Mermaid diagrams and the like, and really take a systems-thinking approach. I'm using my Kindle a lot more now, because I'm reading and understanding what Hien and Karthik are saying, so that I can ask the better question and pass the context that is needed to solve the very focused problem or use case that needs to be implemented. We're also going to launch an SRE agent, and there are a bunch of these agentic experiences that will augment you.

At the same time, the onus is ultimately on the human. I think it all goes back to asking the right question, because the answers are all there. With these models, like o3 with high reasoning, if you think about just the benchmarks, you are interacting with a polymath that is one of the smartest coders out there. They're saturating all kinds of coding benchmarks, and it's only going to get better. Ultimately, it's up to you to know the user need you're solving, map it throughout the SDLC process, and leverage these models throughout.

Karthik Ramgopal: I want to give a very crude example. When IDEs came about, we saw a transformation in the development process with respect to how people were coding, compared to plain text editors. You had the ability to do structured find and replace. You had the ability to open files side by side. You had syntax highlighting. You had debugging within the IDE. Developer productivity went up so much. This is similar. It's just an additional tool, which gives you even more power over your code. And don't look only at code. Code is an important, critical aspect of the SDLC, but there are a lot of other aspects too. There are AI tools which help you automate end-to-end, as Govind and Hien pointed out.

Govind Kamtamneni: I was going to add one quote by Sam Altman. I think it's really catchy. It's like, "Don't be a collector of facts, be a connector of dots".

Srini Penchikala: The facts can be relative. No, I agree with you as well. I'm definitely more interested in the shift-left side of the SDLC process. Automatically writing tests for the code is important, but how much of that code is really relevant to your requirements, really relevant to your design? Again, we can use AI agents to solve problems the right way, and we can also use them to solve the right problems. I think it goes both ways. Govind, I'm looking forward to all those different products you mentioned. Padawan sounds like a Star Wars connection there, for the outer-loop experience.

Accuracy, Hallucination, and Bias with AI Agents

One thing we've been concerned about with AI programs in general is the accuracy of their output, and the hallucinations. Now we bring AI agents into the mix, automate as many tasks as possible, and try to let the agents do what humans have been doing. How do you see this whole space of accuracy, hallucinations, and biases transforming with the agents? How much worse will it get?

Hien Luu: The short answer is, it's a fact of life, so we have to deal with it. This uncertainty has been with us since day one of building AI systems. They're nondeterministic; they give you probabilities for a prediction, whether it's spam or not spam. That's a measure we've been taking, but it's more exaggerated with hallucination; the natural language aspect is still more challenging than a numeric value. It's definitely a challenge. A lot of enterprises are concerned about applying this to their real-world use cases, where the consequences are not just content generation; it could impact their users or cause damage. It's definitely a big concern for enterprises. I think there's study after study on why enterprises are way behind in adopting these technologies.

In general, I think these LLMs are getting smarter. These AI frontier labs spend a lot of effort on reducing hallucination, but nevertheless, it's still there. The question is, what can we do as we build these agentic systems? First, understand, for your particular use case or use cases, what the causes of the hallucinations might be. If you're building a very domain-specific agentic system, and the models were not trained on your specific domain, that may be an indication that you have to do something about the knowledge cutoff or the limits of the model. Understanding what the underlying causes might be for your use cases is the first thing to do.

In terms of what actions, methods, or strategies you can employ, I think there are sets of good practices out there now. Start with grounding. It's a very common technique now: grounding the prompt in terms of the context. Go back to what Govind said, it's all about the context. Ground the prompt in the relevant information. That's why RAG has become very powerful: the ability to ground with relevant content in the prompting context. This is something a lot of people don't spend a whole lot of effort on.
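
A sketch of what grounding looks like mechanically, assuming retrieved passages come from something like the hypothetical `retrieve` function sketched earlier: the passages are pasted into the prompt, and the model is instructed to answer only from them.

```python
def build_grounded_prompt(question, passages):
    # Paste retrieved passages into the prompt and constrain the model to them;
    # this reduces (but does not eliminate) hallucination.
    context = "\n\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer using ONLY the sources below. "
        "If the answer is not in the sources, say you don't know.\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}\nAnswer:"
    )

print(build_grounded_prompt(
    "What is the laptop policy?",
    ["New hires receive a laptop within 3 business days."],
))
```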

At the end of the day, we're interacting with LLMs through prompting, and what we say in the prompt matters. Well-crafted prompt engineering is still extremely valuable and relevant in helping with these kinds of challenges. Be specific, along with all the other good prompt engineering practices. Another thing you can do beyond that is the guardrails people mentioned earlier. Human in the loop, too; that's another aspect you can build into your system, to involve the human at the proper time when a response seems suspicious and doesn't pass the smell test.

Then, evaluation, evaluation, evaluation. The knowledge is all out there now. Are people actually doing it or not? That's a different question. Building evaluation requires a lot of upfront investment. I think you want to do it iteratively and incrementally as well. It's not something where you can come up with a whole set and then you're done; it needs to be dealt with in an incremental manner. At the end of the day, there are techniques to help with reducing hallucination, but as for eliminating it, I think we're not there yet, as far as my understanding goes. I'd love to hear from Govind and Karthik about their experiences actually working with their customers and building agentic AI systems at LinkedIn.

Karthik Ramgopal: I can share some of the other techniques which we use. The first is a technique called critique loops, where effectively you have another observer critique the response against the desired outcomes and see if it meets the smell test. Again, this is a technique you use in evals as well.

At runtime, though, you cannot use this very heavily. You cannot use a powerful model, and you cannot have very complex critique logic, because, again, it's going to consume compute cycles and add latency. There's always this tradeoff between quality, latency, and compute cost. In your offline evals, you can actually use a more powerful model, in order to evaluate the responses of your system and understand where it failed. I think there was a question about, how do I do evaluation? Should I do random sampling? Random sampling is a good starter, but it first starts with the definition of accuracy. Do you have an objective definition of what being accurate means? It's actually quite hard in these non-deterministic systems to get a comprehensive definition of that.
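
A sketch of a critique loop follows, with `generate` and `critique` as hypothetical model calls; at runtime the critic would be a cheap model or simple checks, while offline evals can afford a more powerful judge.

```python
def generate(prompt):
    """Hypothetical generator model call."""
    return "Draft reply: your order shipped yesterday."

def critique(prompt, draft):
    """Hypothetical critic; returns (ok, feedback). Kept cheap at runtime."""
    has_greeting = draft.lower().startswith(("hi", "hello", "dear"))
    return has_greeting, "Add a greeting before the answer."

def generate_with_critique(prompt, max_rounds=2):
    draft = generate(prompt)
    for _ in range(max_rounds):
        ok, feedback = critique(prompt, draft)
        if ok:
            return draft
        # Regenerate with the critic's feedback folded into the prompt.
        draft = generate(f"{prompt}\nRevise per feedback: {feedback}")
    return draft  # best effort; a real system might escalate to a human here

print(generate_with_critique("Where is order 42?"))
```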

Once you have that, you can decide how to pick and choose, because, ideally, you do not want to sample purely at random. You want a representative variation of responses, and to ensure that you did well across them. Again, that requires some analysis of your data itself. Something a lot of folks do is capture traces. Of course, they anonymize these traces to ensure that personal information is not emitted in any way. After that, they feed these traces offline to their more powerful model, which then runs the eval and tries to figure out what to do. The other interesting technique is that sometimes you really do not need AI, as I said before, because you do not need reasoning.

In those cases, don't use AI, or use a less powerful model. For example, for a bunch of classification tasks, you could use a much simpler AI model. You don't need an LLM. Of course, it's easier to do with an LLM, but is it the most efficient? Is it the most reliable? Probably not. It's not going to be the cheapest model. In some cases, you can just fall back to business logic. Last but not least, and I say this very carefully, sometimes, when all else fails, you can also apply various kinds of fine-tuning techniques to create fine-tuned models. I say when all else fails because people prematurely jump to it without trying all they can do with prompt engineering, RAG, and the right systems architecture, and it's expensive, since the foundation models keep advancing.
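
That escalation order can be expressed as a simple router. This sketch uses a keyword heuristic as a stand-in for a small classifier and a stubbed `call_llm` fallback; the names are invented for illustration:

```python
import re

def cheap_classifier(text):
    """Stand-in for a small, cheap classification model."""
    return "billing" if "invoice" in text.lower() else None

def call_llm(text):
    """Hypothetical LLM fallback for genuinely ambiguous inputs."""
    return "general_inquiry"

def route(ticket):
    # 1. Deterministic business logic: cheapest and fully predictable.
    if re.search(r"\border\s+#?\d+\b", ticket, re.IGNORECASE):
        return "order_status"
    # 2. A simpler, cheaper model for routine classification.
    label = cheap_classifier(ticket)
    if label:
        return label
    # 3. Only now pay for the LLM.
    return call_llm(ticket)

print(route("Where is order #42?"))     # order_status, no model call
print(route("My invoice looks wrong"))  # billing, cheap model only
```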

As long as you have a fine-tuned model, you have to maintain it. You have to ensure that when the task specification changes, you haven't had a loss in generalization, which results in worse performance. Pick your poison carefully, but that is also a technique which is useful in some cases.

Govind Kamtamneni: Prompt rewrite is another one that is pretty much baked in now, for example, in our search service. Things like that can help a lot.

Srini Penchikala: Yes, definitely. I agree with you all. As you all mentioned, with AI there are a lot more options available. As a developer, and as an end user, I would like to have more options I can pick from rather than fewer. With more options come more evaluation and more discipline. That's what it is.

Model Context Protocol, (MCP) and the Evolution of AI Solutions

Regarding the next topic, we can jump into probably the biggest recent development in this space: the Model Context Protocol, MCP. I've been seeing several publications and articles about this on a daily basis. They claim it's an open protocol that standardizes how applications provide context to LLMs. They also say that it will help you build agents and complex workflows on top of LLMs. That's the definition on their website. Can you share your experience of how you see MCP and where it fits, overall, in the evolution of AI solutions? How can it help? What can we use it for? Where do you see this going? I think it's a good standard that we have now, but like any standard, it will probably have to evolve.

Govind Kamtamneni: Actually, let's think about a world without any standards or protocols, which is how things are actually developed right now. If you have an agentic system, let's pick user onboarding for the sake of it. Let's say you're building a Compound AI system that does user onboarding and, hopefully, has some agency in some of the workflow tasks. Let's say a new employee joins. The developer building this orchestration layer has to understand SuccessFactors or ServiceNow APIs and deterministically code against those API specs, and also handle the day-two aspect of that.

If ServiceNow changes their interface, this developer has to change the orchestration layer. If you think about employee onboarding, maybe you first have to create a new employee record in SuccessFactors, or something like that. Then you may have to issue a laptop; that could be a ticket in ServiceNow with all the details. Then maybe you even notify the manager or get the manager's approval; that could be a Teams message or something like that. You need to know all these API specs and maintain them and all that stuff. I think going forward, it's becoming clear that we're not just developing systems for humans, but actually for AI agents to consume. What if all these systems instead conformed to a standard protocol, and hopefully a standard transport layer? That's what MCP is basically proposing. Anthropic started it. It's fair to say that everyone is embracing it. Let's see how it goes, because the governance layer is still shaky.

For now, at least everyone is building essentially a simple facade proxy on their existing SuccessFactors, or ServiceNow, or GitHub, this MCP server. The host of MCP, let's pick VS Code, for example, or GitHub Copilot, or this onboarding agentic system that could then work with these MCP servers and they have to follow the standard. The client developer now doesn't have to know SuccessFactors' idiosyncratic APIs, and implement them and maintain them. It's the same MCP standard for all these other systems. Ultimately for these models, again, they're very great at reasoning, but for them to be economically useful, they have to work with a lot of systems in the real world, for example, this onboarding workflow, to be successful. It has to integrate with all these systems. I think going forward it's great that we have some standard, finally, at least for now.

Then, of course, there's also the Agent-to-Agent standard that is emerging. There's a new standard I was just reading about that is a superset of Agent-to-Agent; I think it's called NANDA. We'll see which one wins out. It's good that the industry is rallying. This is nothing new. In the past, obviously, we standardized on communicating over HTTP, a standard established by the internet standards bodies. There's the CNCF, and Kubernetes is pretty much the standard orchestration layer now. I think standards bodies are good. It was just a matter of time, especially for a platform shift this big; I think it's bigger than the internet. MCP, specifically, is a standard for agents to talk to resources and tools.

Primarily it's tools, but there are also prompts and other primitives in there. There's also, hopefully, a standard that will emerge where agents can talk to other agents and have a registry. That's what A2A is trying to do. Then there's that other standard for swarms of agents, the new one from MIT, that's also out there. There's a lot there, but it's where we are right now. I do think it's an evolving space.

Karthik Ramgopal: I think MCP is still an evolving standard. It's a great way to connect externally. It still has some gaps, primarily around security, authentication, and authorization, which are pretty critical, and which is why we haven't deployed it internally yet. I'm sure it's just a matter of time before the community solves these problems. The important thing to note here is that MCP is primarily a protocol for calling tools. It isn't a protocol for Agent-to-Agent communication, because Agent-to-Agent isn't a synchronous RPC, or even a streaming RPC. It's way more complicated: you have asynchronous handoffs, humans in the loop, and multi-interaction patterns, which is why protocols like A2A, AGNTCY, and the other emerging ones will, I think, be better fits. Right now, amidst the hype, everyone is trying to force-fit MCP everywhere. Be thoughtful about where you use it and where you don't.
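
To make that scope concrete, here is roughly what a tool call looks like on the wire, sketched as Python dicts. MCP runs over JSON-RPC 2.0, and a tool call is a single request/response exchange; the tool name and arguments below are hypothetical, carried over from the earlier sketches. That one-shot shape is exactly why longer-lived, multi-turn agent handoffs call for a different protocol.

```python
# The approximate wire shape of MCP's tool-calling exchange (JSON-RPC 2.0),
# shown as Python dicts. Tool name and arguments are hypothetical.

# Ask the server which tools it exposes.
list_request = {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}

# Invoke one tool by name with structured arguments.
call_request = {
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/call",
    "params": {
        "name": "create_employee_record",
        "arguments": {"name": "Ada Lovelace", "department": "Engineering"},
    },
}

# A typical response: a content list tied to one request ID. It's a single
# synchronous exchange, not a long-lived multi-turn conversation.
call_response = {
    "jsonrpc": "2.0",
    "id": 2,
    "result": {
        "content": [
            {"type": "text", "text": "employee-record-ada-lovelace"}
        ]
    },
}
```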

Hien Luu: It makes a lot of sense. I think everybody agrees that it has its place for tool use. If it becomes a standard, then it opens up a whole slew of possibilities and makes tool use easy to integrate. An example from the past came to mind: we have something called REST, a convention for building APIs over HTTP, and this is something similar, but specifically for LLMs. Let's see where it's going. I see an explosion of MCP servers out there. Let's see where it goes. I think it makes a lot of sense.

The Future of AI Agents - Looking into the Crystal Ball

Srini Penchikala: To conclude this webinar, I would like to put you all on the spot. What do you predict will happen in the next 12 months? If we were to have a similar discussion 12 months from now, what do you think we should be excited about?

Karthik Ramgopal: I am feeling like an oracle today, so I will predict one thing. I think that the transformer architecture, and LxMs in general, will start getting increasingly applied to traditional relevance surfaces like search and recommendation systems, because right now the cost curve as well as the technological advancement are at a point where it is starting to become feasible. I think we will see an improvement in the quality of these surfaces as well, apart from the agentic applications. That's my prediction.

Govind Kamtamneni: My prediction is around AGI. I take Satya's approach there, which is: if it can be economically useful, and I think his number is $100 billion of economic value, then it's AGI. We see a lot of benchmarks, and they're absolutely saturating. At the end of the day, can it improve human well-being? That shows up in GDP. Whether it's through MCP or whatever, we can start actually giving actuators to these reasoners. The ultimate goal is output per human going up, and we're starting to see that. Cursor was the first product - and I'll say that even as a competitor - to hit $100 million ARR, the fastest ever to $100 million in recurring revenue. Hopefully we'll see that in other domains, not just software engineering, and human well-being will actually improve with everything we're doing. Hopefully, that'll happen in the next two years.

Hien Luu: This is probably less of a prediction than something I would love to see, especially in the enterprise: how AI agents are being applied in enterprise scenarios. We'd love to see more practical use cases out there and see how they really work out in the enterprise. I think the exciting part is Agent-to-Agent, multi-agent systems. There's a lot of discussion about that, and it seems pretty fascinating: you get these agents talking to each other and doing their own things. I would love to see how that manifests in real-world, useful use cases as well. Hopefully we'll see those in the next 12 months.

Srini Penchikala: I'm kind of the same way. I think these agents will help us have less thrashing at work and in our personal lives, so we can focus on more important things, whatever those may be, enjoy our lives, and also help the community.

Govind Kamtamneni: I just wanted to add: shared prosperity. Wherever this happens, people should not be afraid. There should be a positive outcome for everyone.

Srini Penchikala: To quote Spock.

Recorded at:

Jul 09, 2025
