
🦸🏻#5: Building Blocks of Agentic Systems

What powers an AI agent?

Intro

We took an unconventional approach to launching our Agentic Workflow series, beginning with open-endedness – a concept that lays the foundation for understanding the dynamic potential of agentic systems. From there, we introduced the essential vocabulary, along with real and potential examples of agents, to build a solid base. Now, it’s time to dive into the building blocks of agentic systems and explore the core components that bring these systems to life. What does it take to "wire together" an agent to make it work?

In today’s episode:

  1. Different frameworks for agentic systems

  2. Critical components (Profiling, Knowledge, Memory, Reasoning & Planning, Reflection, Actions)

  3. Human-AI communication as a trend (review of “Mutual Theory of Mind for Human-AI Communication”)

  4. Conclusion

  5. Bonus: Resources for diving deeper

In their early days, agents were more like isolated bots, each designed for a narrow set of tasks. Fast forward to today, and it’s all about creating interconnected autonomous systems that fully leverage the ever-growing capabilities of AI.

As a quick reminder, autonomous agents are entities that perceive their environment, make decisions, and act to achieve goals. These agents vary in their learning capability, physicality, specialization, and task complexity.

When it comes to describing the core components of these systems, there are multiple approaches. For instance, some frameworks are more detailed, like this one:

Image Credit: User Behavior Simulation with Large Language Model based Agents by L. Wang et al.

Others, like LangChain’s Harrison Chase, take a more schematic and simplified approach:

Image Credit: Harrison Chase

Regardless of the framework, successful implementation of AI agents boils down to a few critical components →

Profiling

Profiling is where an agent is assigned roles that shape its behavior. It defines the agent’s identity, objectives, and boundaries, setting guidelines for interactions with users and systems. By adapting to specific tasks or user preferences, profiling ensures the agent stays aligned with its purpose.

For example, a "Coder Agent" can handle repetitive programming tasks, debug code, or create new scripts, similar to GitHub Copilot. A "Content Creator Agent" might draft articles or design graphics, while a "Project Manager Agent" could prioritize tasks and manage workflows. These roles help agents focus on their domains, improving efficiency and easing workloads for users.
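A profile like this can be made concrete as a small data structure that renders into a system prompt. The sketch below is a minimal illustration, not any particular framework's API; the `AgentProfile` class and its fields are hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field

@dataclass
class AgentProfile:
    """Hypothetical profile capturing an agent's identity, objectives, and boundaries."""
    role: str
    objectives: list
    constraints: list = field(default_factory=list)

    def to_system_prompt(self) -> str:
        """Render the profile as a system prompt for an LLM backend."""
        lines = [f"You are a {self.role}."]
        lines += [f"Objective: {o}" for o in self.objectives]
        lines += [f"Constraint: {c}" for c in self.constraints]
        return "\n".join(lines)

coder = AgentProfile(
    role="Coder Agent",
    objectives=["debug code", "write new scripts"],
    constraints=["never execute untrusted code"],
)
prompt = coder.to_system_prompt()
```

Keeping the profile as structured data, rather than a hand-written prompt string, makes it easy to swap roles or tighten constraints without touching the rest of the agent.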

Knowledge

The agent’s knowledge provides domain-specific expertise, helping it understand tasks and make decisions based on factual data. This is often achieved using pre-trained AI models, structured knowledge bases, and mechanisms for continuous learning.

For example, models like GPT or Llama enable natural language understanding, as seen in tools like IBM Watson Health. Knowledge bases offer organized references, such as legal databases for case law. Continuous learning allows agents to adapt and stay relevant, ensuring they handle industry-specific tasks effectively.
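At its simplest, a structured knowledge base is an indexed store the agent queries before answering. The sketch below uses naive keyword overlap as a stand-in for a real retriever (which would typically use embeddings and vector search); the `KnowledgeBase` class and its sample facts are hypothetical.

```python
class KnowledgeBase:
    """Minimal keyword-indexed knowledge base (stand-in for a real retriever)."""

    def __init__(self):
        self.entries = []  # list of (keyword set, fact) pairs

    def add(self, keywords, fact):
        self.entries.append((set(keywords), fact))

    def retrieve(self, query):
        """Return facts whose keywords overlap the query terms."""
        terms = set(query.lower().split())
        return [fact for kws, fact in self.entries if kws & terms]

kb = KnowledgeBase()
kb.add(["python", "gil"], "CPython uses a global interpreter lock.")
kb.add(["http", "status"], "HTTP 404 means the resource was not found.")
facts = kb.retrieve("What does the Python interpreter lock do?")
```

The retrieved facts would then be injected into the agent's context so its answers are grounded in the domain data rather than the model's parameters alone.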

Memory

At first glance, you might think memory is simply part of knowledge. While they are closely related, memory is a distinct component with its own research foundation. An agent’s knowledge base typically consists of semantic memory – general facts, concepts, and rules about the world, as well as instructions for handling queries.

Memory, on the other hand, extends beyond semantic knowledge. It includes the ability to store and retrieve interaction-specific data, such as user input from previous queries, past experiences, and their outcomes. This allows agents to adapt and improve over time. Memory systems, encompassing short-term, long-term, episodic, and semantic components, enable agents to retain and reuse information from past interactions.

Effective memory implementation ensures that critical data is saved and accessible, empowering agents to inform current decisions with past insights, maintain continuity in conversations, and enhance user interactions.
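The short-term/long-term split described above can be sketched as two simple stores: a bounded window of recent turns and an append-only episode log with episodic recall. This is a toy illustration with hypothetical names; production memory systems usually add summarization and semantic retrieval on top.

```python
from collections import deque

class AgentMemory:
    """Sketch of layered memory: a short-term window plus a long-term episode log."""

    def __init__(self, short_term_size=5):
        self.short_term = deque(maxlen=short_term_size)  # only recent turns survive
        self.episodes = []  # long-term record of past tasks and outcomes

    def remember_turn(self, user_input, agent_reply):
        self.short_term.append((user_input, agent_reply))

    def record_episode(self, task, outcome):
        self.episodes.append({"task": task, "outcome": outcome})

    def recall_episodes(self, keyword):
        """Episodic recall: outcomes of past tasks matching a keyword."""
        return [e for e in self.episodes if keyword in e["task"]]

mem = AgentMemory(short_term_size=2)
mem.remember_turn("hi", "hello")
mem.remember_turn("deploy the app", "working on it")
mem.remember_turn("status?", "healthy")  # oldest turn is evicted here
mem.record_episode("deploy the app", "succeeded after one retry")
past = mem.recall_episodes("deploy")
```

The bounded deque keeps conversational context cheap, while the episode log is what lets the agent inform current decisions with past outcomes.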

Reasoning and planning

To effectively navigate tasks and achieve goals, an agent relies on its reasoning and planning capabilities. This involves breaking down tasks (task decomposition), analyzing them to identify the best course of action, and orchestrating the necessary steps toward success. By applying logical reasoning, the agent uses AI algorithms and heuristics to tackle complex situations, facilitating problem-solving, task decomposition, and strategic planning. Goal management plays a critical role, allowing the agent to set, prioritize, and adapt goals based on importance and feasibility.

Reasoning and decision-making are integral to an agent’s functionality, where logical rules and algorithms help draw conclusions and make decisions based on its knowledge base. Before responding to a query, the agent generates a sequence of actions to ensure a reliable outcome. Planning techniques such as reflection, self-critique, chain-of-thought reasoning, and subgoal decomposition are often employed, enabling agents to operate with precision and adaptability.
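Goal management and task decomposition can be sketched as a two-stage planner: order goals by priority, then expand each into subgoals. In the toy version below the decomposition is a hard-coded lookup table; a real agent would ask an LLM to produce the subgoal list. All names here are hypothetical.

```python
def decompose(task):
    """Toy task decomposition: map a goal to ordered subgoals.
    A real agent would prompt an LLM to generate this plan."""
    plans = {
        "write blog post": ["outline sections", "draft each section", "edit draft"],
    }
    return plans.get(task, [task])  # unknown tasks stay atomic

def plan(goals):
    """Goal management: order goals by priority, then expand into steps."""
    ordered = sorted(goals, key=lambda g: g["priority"])
    steps = []
    for goal in ordered:
        steps.extend(decompose(goal["task"]))
    return steps

steps = plan([
    {"task": "write blog post", "priority": 2},
    {"task": "check email", "priority": 1},
])
```

Even in this stripped-down form, the structure mirrors subgoal decomposition: a high-level goal becomes an ordered sequence of executable steps.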

Reflection

Although we mentioned reflection as a technique in the previous section, Andrew Ng highlights its importance as a standalone category in agentic workflows. He identifies it as one of four key design patterns poised to drive major progress in AI workflows this year and the next. Here’s why:

Reflection enables agents to process feedback and learn from their experiences. It’s a powerful yet straightforward design pattern that boosts performance by allowing models to critique and refine their outputs. Instead of depending solely on user feedback, the model evaluates its responses critically, identifies improvements, and revises accordingly. For instance, when generating code, the model can check its output for correctness, style, and efficiency, then refine it to produce better results. Iterative reflection often leads to significant enhancements.

Beyond self-reflection, external tools like unit tests or web searches can further validate outputs, helping models identify and resolve errors. Multi-agent frameworks take this concept even further – one agent generates content, while another critiques it, enabling collaborative improvement. Reflection has consistently shown its value across various tasks, including coding, writing, and question answering.
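The generate–critique–revise cycle described above can be sketched as a short loop. Here the generator and critic are deterministic stubs standing in for LLM calls, and the stopping rule is "no issues found"; the function names are hypothetical.

```python
def generate(task):
    """Stand-in generator: a real agent would call an LLM here."""
    return "def add(a, b): return a+b"

def critique(draft):
    """Stand-in critic: returns a list of issues, empty when satisfied.
    A second agent or a unit-test run could play this role instead."""
    issues = []
    if "a+b" in draft:
        issues.append("add spaces around operators")
    return issues

def refine(draft, issues):
    """Apply the critic's feedback to produce a revised draft."""
    return draft.replace("a+b", "a + b") if issues else draft

def reflect_loop(task, max_rounds=3):
    """Reflection pattern: generate, self-critique, revise until clean."""
    draft = generate(task)
    for _ in range(max_rounds):
        issues = critique(draft)
        if not issues:
            break
        draft = refine(draft, issues)
    return draft

result = reflect_loop("write an add function")
```

Swapping the critic stub for a unit-test runner or a second agent turns this same loop into the externally-validated and multi-agent variants mentioned above.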

Actions

Finally, the agent's ability to take actions bridges its internal reasoning with the external world, enabling it to achieve its goals through precise execution. The core mechanism here is function calling: the agent interacts with APIs, software, or hardware to perform tasks, integrating with external tools and services. By invoking these functions, the agent can communicate with users, systems, or other agents while following through on its planned steps. Effective action means calling the appropriate methods, accessing the right services or databases, and monitoring outcomes to guide future decisions, so the agent operates efficiently and effectively.
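The function-calling mechanism typically works like this: the model emits a structured call (often JSON naming a tool and its arguments), and the runtime dispatches it to a registered function and returns the observation. The sketch below is a minimal dispatcher with a hypothetical `get_weather` stub, not any specific vendor's API.

```python
import json

TOOLS = {}

def tool(fn):
    """Decorator registering a function the agent is allowed to call."""
    TOOLS[fn.__name__] = fn
    return fn

@tool
def get_weather(city: str) -> str:
    """Stub standing in for a real weather API call."""
    return f"Sunny in {city}"

def execute(call_json):
    """Dispatch a model-emitted call like
    {"name": "get_weather", "arguments": {"city": "Paris"}}."""
    call = json.loads(call_json)
    fn = TOOLS.get(call["name"])
    if fn is None:
        return f"Unknown tool: {call['name']}"  # surfaced back to the model
    return fn(**call["arguments"])

observation = execute('{"name": "get_weather", "arguments": {"city": "Paris"}}')
```

The explicit registry doubles as a safety boundary: the model can only invoke functions the developer has deliberately exposed.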

Connecting these components requires a unified framework that supports all aspects of the agent's functionality. Seamless data exchange between components ensures that, for example, memory feeds into reasoning, and reflection informs future planning. Building components as modular units allows for independent updates and improvements. The operational flow typically starts with profiling to define the agent's role, followed by knowledge to provide foundational information. Memory retains experiences, reasoning and planning devise strategies, actions execute plans, and reflection evaluates outcomes to influence future reasoning and behavior.
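The operational flow above can be wired together as one loop that threads the components in order: profile and knowledge set the context, the planner produces steps, actions execute them, reflection reviews each outcome, and memory stores the episode. Every argument below is a hypothetical stand-in for the richer components sketched earlier.

```python
def run_agent(task, profile, knowledge, memory, planner, act, reflect):
    """Operational flow: profile -> knowledge -> plan -> act -> reflect -> memory.
    All parameters are placeholder callables/objects for illustration."""
    context = {
        "role": profile,                   # profiling defines the agent's role
        "facts": knowledge(task),          # knowledge grounds the task
        "history": memory.get(task, []),   # memory supplies past experience
    }
    results = []
    for step in planner(task, context):    # reasoning & planning devise steps
        outcome = act(step)                # actions execute the plan
        outcome = reflect(step, outcome)   # reflection evaluates the outcome
        results.append(outcome)
    memory[task] = results                 # the episode informs future runs
    return results

# Minimal stubs wiring the flow together:
memory = {}
out = run_agent(
    "greet",
    profile="Helper Agent",
    knowledge=lambda t: [],
    memory=memory,
    planner=lambda t, ctx: [f"say hello for task '{t}'"],
    act=lambda step: step.upper(),
    reflect=lambda step, outcome: outcome,
)
```

Because each stage is an injected callable, any component can be upgraded independently, which is exactly the modularity the paragraph above argues for.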

Human-AI Communication

Often overlooked in agentic architecture, communication plays a crucial role in connecting agents with their environments. As AI agents become more integrated into our daily lives, the field of human-AI interaction will soon be on everyone's mind.

An intriguing perspective on this topic comes from the paper Mutual Theory of Mind for Human-AI Communication. Researchers from Georgia Tech introduce the Mutual Theory of Mind (MToM) framework, offering a novel approach to enhancing communication between humans and AI. This framework moves beyond traditional human-computer interactions, emphasizing a collaborative model where AI systems adapt and engage in more meaningful and intuitive ways.

Image Credit: The original paper

Inspired by human Theory of Mind, the MToM framework highlights how humans and AI can build and refine mental models of each other. By focusing on mutual interpretation and feedback, it enables AI systems to adjust their responses to better meet user needs. Through stages of construction, recognition, and revision, these systems develop a more intuitive understanding of their role in interactions.

The researchers tested this approach in practical settings, such as online learning, where AI teaching assistants adapted their behavior based on student input. They also studied user responses to AI errors, especially when the AI misrepresented personal traits. Their findings underline the importance of trust and clear communication in bridging gaps between user expectations and AI capabilities.

This work offers valuable insights into designing AI systems as collaborative partners. By improving understanding between humans and AI, these systems could support applications in areas like education and personal assistance, while fostering more responsible and inclusive designs.

Conclusion

By wiring together these core components – profiling, knowledge, memory, reasoning/planning, reflection, and actions – you create an AI agent capable of sophisticated autonomous behavior. Each component plays a critical role, and their integration ensures the agent can perceive, decide, and act effectively within its environment. In the following episodes of this series, we will explore each of the core elements of the agentic workflow in a separate edition, providing you with the latest practical and theoretical insights.

The key to a successful agentic system lies in how well these components communicate and support one another to achieve the desired objectives. As we explore the technologies that bring these agents to life, the focus should expand beyond what makes an agent work to how they thrive within a connected ecosystem – and how we, as humans, will communicate and collaborate with them.

Bonus: Resources to dive deeper into Agentic workflows
