Applying Large Language Models for Intelligent Industrial Automation
From Theory to Application: Towards Autonomous Systems with Large Language Models
Yuchen Xia, Nasser Jazdi, Michael Weyrich
Institute of Industrial Automation and Software Engineering, University of Stuttgart
Abstract: This paper explores the transformative potential of Large Language Models (LLMs) in industrial automation, presenting a comprehensive framework for their integration into complex industrial systems. We begin with a theoretical overview of LLMs, elucidating their pivotal capabilities such as interpretation, task automation, and autonomous agent functionality. A generic methodology for integrating LLMs into industrial applications is outlined, explaining how to apply LLMs to task-specific applications. Four case studies demonstrate the practical use of LLMs across different industrial environments: transforming unstructured data into structured data in the form of an Asset Administration Shell model, assisting risk analysis and management, planning and controlling industrial operations autonomously, and interacting with simulation models to determine process parametrization. The studies illustrate the ability of LLMs to manage versatile tasks and interface with digital twins and automation systems. The results and findings indicate that efficiency and productivity improvements can be achieved by strategically deploying LLM technologies in industrial settings.
Large Language Models / Generative AI / Autonomous Systems / Intelligent Multi-Agent System
1 Introduction
Large Language Models (LLMs) are changing industrial automation by leveraging advanced natural language processing capabilities. In this rapidly evolving landscape, LLMs offer unprecedented potential to streamline processes, assist decision-making, and enhance productivity in industrial settings, signaling an emerging trend toward more intelligent systems. Our comprehensive study investigates the array of capabilities exhibited by LLMs, from fundamental text interpretation to task execution, advancing toward autonomous systems. We explore how their unique capabilities can be applied to enhance efficiency and the degree of automation in industrial applications.
This paper begins by introducing a systematic theoretical framework that explains the critical aspects of LLMs for their downstream applications (Section 2). Following this foundational overview, we present a generic methodology for integrating LLMs into industrial applications, demonstrating how these capabilities can be applied to building intelligent systems (Section 3). The applications of these theories and this methodology are then illustrated through four case studies that showcase the real-world implementation of LLM multi-agent systems in various industrial settings (Section 4). Finally, we draw a conclusion and highlight the vision that the purposeful and methodical integration of large language models, digital twins, and automation systems potentially leads to future industrial autonomous systems (Section 5).
2 Overview of important LLM capabilities
Our investigation identifies several pivotal capabilities of LLMs that have significant implications for their practical impact, ranging from text interpretation to complex task execution and autonomous agent functionality, as shown in Figure 1. A fundamental capability among these is text understanding, which is essential for progressing to more complex functionalities. The advanced capabilities allow LLMs to perform more intelligent tasks, thereby enhancing their ability to act as autonomous agents in complex environments. In this section, we introduce these capabilities through essential key points.
Figure 1 The spectrum of large language model capabilities: from interpretation to autonomous agent
2.1 Text understanding
Text understanding is a foundational capability for LLMs, essential for grasping the semantics and concepts conveyed by text. This capability is rooted in the neural structure of LLMs, where semantics and meanings are processed [1,2]. LLMs develop this capability during pre-training, updating neuron weights to improve next-token prediction accuracy. Three impactful aspects are multilingual proficiency, multi-domain representation, and the encoding of human knowledge.
1) Multi-lingual
The multi-lingual capability has implications beyond mere translation. It enables an understanding of the various expressions of concepts and their relationships in different languages and cultures. LLMs can learn semantics and knowledge from one language and apply them in another [3][4].
2) Multi-domain representation
Building on their linguistic adaptability, LLMs can be trained on how knowledge is represented and modeled in different domains, using diverse textual forms such as sentences, conversations, programming code, and structured models. This training enables LLMs to integrate and synthesize domain-specific knowledge, improving their overall performance in complex tasks across different domains. For instance, researchers conclude that training LLMs on programming code can significantly enhance their structured reasoning capability.
3) Human knowledge
Encoded within text is a profound repository of human knowledge. LLMs are developed by learning from vast amounts of collective human knowledge in text-based training data, which allows them to access the knowledge that has been acquired and accumulated over human history, thus significantly broadening the scope of their applications and deepening the insights they can provide.
2.2 Instruction following
After pre-training, LLMs have encoded knowledge but may not effectively utilize it. Although LLMs can generate readable text, this text is not always practically useful. Correct text does not automatically equate to useful text. The utility of the generated text depends on its alignment with specific user needs and the fulfillment of an implicit goal. To actively apply this learned knowledge, LLMs require further training to follow instructions and address user intents effectively.
To ensure that LLMs produce not just correct but also useful text, models usually undergo instruction fine-tuning to better align with user needs. Reinforcement Learning from Human Feedback (RLHF) improves models by integrating human judgment into the training [6]. Furthermore, the strategies and policies governing how information shall be organized and communicated play a crucial role in answering questions and producing useful text. Proximal policy optimization [7] and direct preference optimization [8] aim to address this problem. These fine-tuning methods enhance the LLMs' ability to generate text that aligns with user instructions and intents in practical applications.
The instruction-following capability lays the groundwork for three advanced functionalities: prompting, reasoning, and problem-solving, detailed in the next sections.
1) Prompting (In-context learning)
The initial function of most generative LLMs is text continuation, generating ongoing content based on previous input. Different LLMs can have different behaviors and styles when generating content, and changing them would require a costly re-training process with the risk of catastrophic forgetting. This limitation is overcome by the paradigm of in-context learning: the input is no longer treated as merely content, but also as instruction guiding the LLM's behavior in a specific context. This allows dynamic adaptation of a general LLM to perform specific tasks without the need for retraining the model.
Consequently, prompting has become an essential tool for efficiently utilizing LLMs. It allows users to shape the model's behavior through prompts, transforming a general-purpose LLM into a specialized LLM without extensive re-training. This adaptability through prompting reduces the need for constant fine-tuning during LLM application development, though it does increase token consumption during inferencing.
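The in-context learning paradigm described above can be sketched in a few lines: an instruction and a handful of input/output examples are composed into a single prompt that specializes a general model at inference time. The prompt layout, the classification task, and the example readings below are illustrative assumptions, not the paper's actual prompts.

```python
def build_specialized_prompt(task_instruction, few_shot_examples, user_input):
    """Adapt a general LLM to a specific task without retraining:
    instruction + in-context examples + the new input to complete."""
    lines = [f"Instruction: {task_instruction}", ""]
    for example_input, example_output in few_shot_examples:
        lines.append(f"Input: {example_input}")
        lines.append(f"Output: {example_output}")
        lines.append("")
    lines.append(f"Input: {user_input}")
    lines.append("Output:")  # the LLM continues the text from here
    return "\n".join(lines)

# Hypothetical industrial classification task for illustration:
prompt = build_specialized_prompt(
    "Classify the machine status as NORMAL or FAULT.",
    [("Temperature 65 C, vibration low", "NORMAL"),
     ("Temperature 140 C, vibration high", "FAULT")],
    "Temperature 72 C, vibration low",
)
```

The resulting `prompt` string would be sent to any chat or completion endpoint; the few-shot examples steer both the behavior and the output format without any weight updates.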
2) Reasoning
Reasoning is a foundational capability of LLMs, bridging the gap between understanding knowledge and applying it to solve specific tasks. Research shows that training LLMs to follow complex instructions can significantly improve their reasoning performance in problem-solving [9].
Unlike traditional rule-based systems, LLMs engage in "conceptual reasoning", which depends on understanding semantic relationships and patterns. This advanced form of reasoning marks a departure from conventional expert systems that depend on exhaustive reasoning rule sets, offering enhanced generalization and intelligence by internalizing vast amounts of knowledge patterns during training. A prevalent theory explaining this intelligence is the compression theory: it posits that an LLM's intelligence derives from its ability to compress large data volumes into abstract patterns encoded within its neural architecture. Researchers suggest viewing LLMs not just as information processors, but also as information compressors [10]. This also implies that intelligence is closely related to prediction, compression, abstraction and generalization, which also means that hallucinations are inherently inevitable and sometimes even desired.
3) Problem-solving
Problem-solving in LLMs involves devising a sequence of reasoning steps that lead to a solution. This capability is especially critical for scenarios ranging from puzzles and high-school exams to engineering problems. A typical approach is "chain-of-thought" [11], which enhances task performance by breaking down complex problems into logical, intermediate steps, simplifying the problem-solving process and increasing explainability and precision. Furthermore, other researchers propose the "tree-of-thought" [12] methodology: by instructing the LLM to search and evaluate multiple potential reasoning pathways dynamically, the method determines the most viable pathway(s) for problem-solving.
These approaches embody pragmatism in philosophy, which treats language as an instrument for prediction and decision-making to reach a goal, rather than just as description and representation. These frameworks allow the reasoning text produced by LLMs to be seen as steps towards a concrete, actionable, and verifiable solution that can be assessed against a problem.
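The chain-of-thought pattern can be made concrete with a minimal host-side sketch: the prompt requests intermediate steps, and the host keeps the reasoning for explainability while parsing only the final, verifiable answer. The suffix wording, the `fake_output` string, and the parsing convention are illustrative assumptions rather than the method of [11].

```python
COT_SUFFIX = "Let's solve this step by step, then state the result after 'Answer:'."

def chain_of_thought_prompt(question):
    """Frame the task so the model emits intermediate reasoning steps."""
    return f"{question}\n{COT_SUFFIX}"

def extract_answer(model_output):
    """Keep the reasoning text for inspection, but parse only the final answer line."""
    for line in reversed(model_output.splitlines()):
        if line.startswith("Answer:"):
            return line.split("Answer:", 1)[1].strip()
    return None  # no parsable answer: treat the output as a failed attempt

# Illustrative model output following the requested format:
fake_output = ("Step 1: 12 machines * 3 shifts = 36 slots.\n"
               "Step 2: 4 slots are idle, 36 - 4 = 32.\n"
               "Answer: 32")
print(extract_answer(fake_output))  # → 32
```

Separating the reasoning trace from the parsed answer is what lets an outer evaluation mechanism check the solution against the problem, in line with the pragmatist framing above.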
2.3 Autonomous agent
An autonomous agent is a software-enabled entity that can perform tasks independently in a dynamic environment, making decisions and performing actions. In this framework, LLMs act as the "brain" of the agent, providing the intelligence for information processing and decision-making. However, LLMs typically do not have connections to external systems or direct physical interaction with the environment.
In this context, data and control interfaces are the pivotal links that fill this gap, enabling LLMs to reach beyond the digital space and interact with the physical world. By creating virtual replicas of physical entities and environments, digital twins can provide LLMs with "eyes" to observe the real-world context and "hands" to influence the physical environment.
In this section, we point out three critical aspects of this advanced functionality.
2.3.1 Interact with APIs (LLM uses tools)
Designing autonomous agents involves empowering LLMs to interact with external software systems through APIs, enhancing their capabilities in data retrieval, information analysis, and operational execution. These interactions can be categorized into three types:
Data sources: These include databases, search engines, document repositories, and data integration middleware for sensors or IoT devices. They provide LLMs with timely and relevant information, as seen in systems like Retrieval Augmented Generation (RAG) [13], which integrate localized data sources to improve LLM applications.
Specialized algorithms: These connect the LLM to the interfaces of software algorithms that offer superior performance in specific tasks. Examples include computational tools like calculators for math tasks or efficient pathfinding algorithms for navigation.
Control interfaces: These interfaces can be used for parametrized process invocation, enabling the LLM to initiate actionable operations. The LLM can influence the external system by triggering control commands through these interfaces.
The interaction between an LLM and APIs can be achieved by training the LLM to generate parameterized interface calls and integrating the outcomes into its text generation process [14], or by using prompt engineering to generate the interface calls, as demonstrated in [15].
This approach of connecting an LLM with external software is often referred to as "tool usage"; it enhances the LLM's functionality and reasoning capabilities, much like a human using specialized tools to achieve higher precision and efficiency in their work.
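The tool-usage loop can be sketched from the host side: the model is prompted to emit a parameterized call, the host parses and executes it, and the result is merged back into the generation context. The JSON call format, the tool registry, and the calculator example below are illustrative assumptions, not the protocol of any specific framework.

```python
import json

# Hypothetical tool registry; eval() here is for demonstration only and
# would need proper sandboxing in any real deployment.
TOOLS = {
    "calculator": lambda expr: str(eval(expr, {"__builtins__": {}})),
}

def run_tool_call(model_output):
    """Parse a parameterized call emitted by the model,
    e.g. {"tool": "calculator", "arg": "6*7"}, execute it,
    and return text that is appended to the LLM's context."""
    call = json.loads(model_output)
    result = TOOLS[call["tool"]](call["arg"])
    return f'Tool {call["tool"]} returned: {result}'

print(run_tool_call('{"tool": "calculator", "arg": "6*7"}'))
# → Tool calculator returned: 42
```

The returned string is fed back into the conversation so the model can continue reasoning with the externally computed result, which is exactly the merge step described in [14].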
2.3.2 Observe / think / decide
Designing an LLM for a specific task typically requires manually crafting and refining prompts to ensure the LLM agent behaves as instructed and achieves the desired results. This prompting process can become chaotic if not managed properly. To address this, we apply a universal "Observe/Think/Decide" framework to systematically apply LLMs to specific tasks, structuring the problem-solving process into three critical steps:
1) Observe
In the observation phase, LLM agents collect and pre-process data from data sources. This could include real-time data from APIs, stored data from databases, or direct user inputs. The focus is on identifying and interpreting the inputs, as well as prioritizing relevant information while filtering out noise or abstracting away unnecessary details. The observation phase lays the groundwork for all subsequent reasoning processes and decision-making.
2) Think
Once the data is gathered, the agent moves to the thinking phase, where it analyzes and synthesizes the information. During this stage, the LLM performs reasoning steps to solve a defined task. The LLM can either perform inherent reasoning or interact with external software components:
inherent reasoning: the model generates text with the aim of reaching a conclusion. The text in a reasoning step is treated as an argument or hypothesis, which means the generated text is not always treated as true, and its validity and plausibility are always subject to an evaluation mechanism outside the thinking process. The viewpoint of pragmatism in philosophy is helpful here, viewing the truth of a belief in terms of its efficacy and the practical outcomes it enables. Generalization and abstraction are desired in creative thinking and heuristic problem-solving. Sometimes, hallucinations are not necessarily detrimental.
external software interaction: if specialized software is superior in some specific tasks, external results (e.g., calculator results, simulation outputs) can be merged with the generated text for a more efficient and reliable reasoning process.
3) Decision-making
The final phase is action, where the LLM derives a decision based on the texts previously generated in the reasoning steps. Decisions may vary from generating textual outputs to making API calls that interact with external systems or create responses to users.
The "Observe/Think/Decide" framework offers a structured and reproducible approach for designing task-specific LLM systems. It can also simplify the identification of system failures by pinpointing whether performance issues stem from the observed data, the reasoning process, or the action phase. This clarity helps develop, test, and improve LLM-powered systems.
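The three phases can be separated into independently testable functions, which is what makes failure localization straightforward. The sensor data, the anomaly rule, and the stubbed LLM call below are illustrative assumptions; `stub_llm` stands in for a real model invocation.

```python
def observe(raw_readings):
    """Observation phase: pre-process inputs and filter out noise
    (here: readings with missing values)."""
    return [r for r in raw_readings if r["value"] is not None]

def think(observations, llm):
    """Thinking phase: hand the prepared observations to the LLM
    for reasoning. The prompt wording is an assumption."""
    prompt = f"Analyze these readings and flag anomalies: {observations}"
    return llm(prompt)

def decide(analysis):
    """Decision phase: map the reasoning outcome to a concrete action,
    e.g. an API call, an alert, or a report for the HMI."""
    return {"action": "alert" if "anomaly" in analysis else "log",
            "detail": analysis}

readings = [{"sensor": "T1", "value": 142.0}, {"sensor": "T2", "value": None}]
stub_llm = lambda prompt: "anomaly: T1 exceeds limit"  # illustrative stub
decision = decide(think(observe(readings), stub_llm))
```

Because each phase has its own input and output, a wrong decision can be traced to bad observations, a flawed reasoning step, or a faulty action mapping, matching the debugging benefit described above.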
2.3.3 Multi-agent design
Some tasks involve a complexity too high for a single LLM to tackle. This is where the design of LLM multi-agent systems becomes necessary. The design approach begins with task decomposition, where a complex task is broken down into smaller, more manageable sub-tasks that can be independently processed. Following this, the specialization of agents is required. This specialization involves applying tailored LLM agents (through prompting, fine-tuning, or both) to perform specific sub-tasks and implementing them as software components (agents) with different responsibilities. The general methodology for agent design is presented in Section 3.
For instance, a simple LLM multi-agent system might include these agents:
Data retrieval agent (observe): This agent focuses on sourcing and pre-processing information from various data points and databases. It is responsible for ensuring that the information used in reasoning is current and relevant.
Analysis agent (think): Dedicated to processing and analyzing the data retrieved, this agent interprets the data, performs reasoning, and generates insights.
Interaction agent (decide): This agent uses the insights generated by the analysis agent to respond to users and is responsible for user interaction.
Some repetitive sub-tasks that other software algorithms can execute more reliably and effectively do not necessarily require an LLM agent.
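The three-agent example above can be sketched as a simple pipeline in which each agent wraps its own (stubbed) LLM call and passes its output to the next agent. The agent names follow the list above; the lambda stubs and the pump scenario are illustrative assumptions.

```python
class Agent:
    """Minimal LLM agent wrapper: one responsibility, one (stubbed) model call."""
    def __init__(self, name, llm):
        self.name = name
        self.llm = llm  # stand-in for a real, prompt-specialized LLM call

    def run(self, payload):
        return self.llm(payload)

def pipeline(agents, initial_input):
    """Chain agents so each agent's output becomes the next agent's input."""
    data = initial_input
    for agent in agents:
        data = agent.run(data)
    return data

retrieval = Agent("data retrieval", lambda q: f"records for '{q}': temp=98C, vib=low")
analysis = Agent("analysis", lambda d: f"insight: temperature near limit ({d})")
interaction = Agent("interaction", lambda i: f"Reply to user: {i}")

report = pipeline([retrieval, analysis, interaction], "pump P-101 status")
```

A deterministic sub-task (say, unit conversion) would be implemented as a plain function in this chain rather than as another `Agent`, reflecting the note above that not every sub-task needs an LLM.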
3 General methodology for integrating LLMs into industrial automation and software systems as LLM agents
For incorporating LLMs into industrial automation and their software systems, our methodology pivots around creating seamless interfaces that link LLM multi-agent systems with digital twins and automation systems, as shown in Figure 2. This integration facilitates intelligent decision-making by equipping LLM agents with the ability to observe, reason, and decide, thus acting as the system's "brain". Industrial automation systems, equipped with perception and execution capabilities, function as the "eyes" and "hands". Meanwhile, digital twins serve as high-fidelity digital replicas providing synchronized information sources, action interfaces, and simulation models. The digital twin system acts as a mediator to connect the LLM system with the automation system. Together, they form an autonomous system capable of automated adaptation with less human intervention.
Figure 2 Autonomous system enabled with LLM multi-agent system, digital twins and automation system
In the subsequent sections, we delineate the general method for strategically deploying LLMs to establish a multi-agent system aimed at performing automation tasks (3.1 and 3.2). The interaction between LLMs, digital twins, and automation systems is demonstrated later in the case studies in 4.3 and 4.4.
3.1 From large language model to LLM agent
Fundamentally, an LLM predicts the next token based on the preceding text, generating a text continuation. To harness this capability for specific tasks, a task-specific prompt can be crafted to provide context and direct the model to produce the desired output. This approach transforms the LLM into an information processing component, creating an LLM agent, as shown in Figure 3.
Figure 3 Design of an LLM agent as an information processor via prompting
Here, we introduce the concept of an LLM agent and define it as:
An LLM agent is a software component designed to perform specific tasks by processing textual inputs and leveraging interpreting, inferential reasoning, and instruction-following capabilities to generate outputs. It undertakes a defined (sub-)task within a system, aiming to achieve a goal through information processing.
A vital element of the LLM agent is the prompt. In our research, we apply a structured prompt template with key element specifications to effectively guide the LLM toward producing the desired task results, as shown in Figure 4. For instance, we apply this prompt in our LLM application for an autonomous production system (cf. Section 4.3).
Figure 4 A structured prompt template for designing the information processing of an LLM agent
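A structured prompt template of this kind can be expressed as a string template with named fields. The exact fields of the template in Figure 4 are not reproduced here; the Role/Context/Task/Output-format slots and the production-planning example below are common choices and are assumptions for illustration.

```python
from string import Template

# Hypothetical key-element specification for an LLM agent's prompt.
AGENT_PROMPT = Template(
    "Role: $role\n"
    "Context: $context\n"
    "Task: $task\n"
    "Output format: $output_format\n"
    "Input: $input"
)

prompt = AGENT_PROMPT.substitute(
    role="Production planning agent",
    context="Modular production plant with transport and processing modules",
    task="Produce a step-by-step operation plan for the given order",
    output_format="Numbered list of operations, one module action per line",
    input="Order: fill and cap 10 bottles",
)
```

Fixing the template and varying only the fields is what makes the agent's information processing reproducible: every invocation of the agent differs only in the `$input` slot, while role, task, and output format stay under version control.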
3.2 From LLM agent to LLM multi-agent system
In practical scenarios, the complexity of tasks may surpass what a single LLM can handle. Consider a control task (cf. Section 4.4), which involves the interpretation of initial raw information, analysis, decision-making for actions, and the creation of summarized reports for the human-machine interface. Such a task becomes too complicated for a single LLM and should be decomposed into subtasks. For each subtask, a dedicated LLM agent can be designed with its own specialized responsibility. Connecting these LLM agents together results in a system of LLM agents in which multiple LLM agents collaborate towards an overall task objective.
Here, we introduce the concept of the LLM multi-agent system and define it as:
An LLM multi-agent system is an integrated software system that harnesses multiple LLM agents to accomplish a set of associated sub-tasks. These agents collaboratively function within the system's architecture, undertaking their respective responsibilities to achieve overarching goals through integrated information processing.
Figure 5 The integration of individual LLM agents into a collaborative LLM multi-agent system
4 Case studies
4.1 Case study 1: generation of a structured Asset Administration Shell model from unstructured data
In today's digital landscape, the exponential growth of unstructured data presents a significant challenge, leading to the underutilization of valuable information. In contrast, structured data enables quicker and more accurate retrieval of information, supports seamless interoperability within technical systems, and serves as the foundation for advanced analytics and data-driven applications. The Asset Administration Shell (AAS) exemplifies a structured data model designed specifically for industrial applications, encapsulating comprehensive digital models that manage and store asset data.
Traditionally, converting unstructured data into structured formats has been a labor-intensive and error-prone process, heavily reliant on significant human effort. Furthermore, rule-based automated processes suffer from limited applicability across diverse data types and contexts. The text interpretation and semantic understanding capabilities of LLMs offer a promising solution to these challenges.
Figure 6 The application "AASbyLLM": generating an AAS instance model from unstructured text data [16]
Leveraging LLMs, we developed an application [16] to analyze unstructured information in technical documents describing technical details of machines and devices. The LLM agent system extracts the semantic essence (referred to as a "semantic node") from the data and translates it into a structured format in the form of an Asset Administration Shell. The results are evaluated by human evaluators.
Figure 7 System components diagram of the application "AASbyLLM"
The system component diagram illustrates the automated process for generating AAS models using LLMs. Users input unstructured text containing technical knowledge about devices via a web-based user interface. This unstructured text is then processed by an LLM-agent system to generate an AAS instance model for the device. The first LLM agent is designed for the extraction of semantic details, and it forms a data structure to store the intermediate result. Another semantic search agent retrieves semantically similar data entries from a local database, enriching and aligning the information in the intermediate result with standardized semantics. Then, a synthesis LLM agent analyzes the intermediate results from the other agents and generates a complete set of textual information that can be used to form a structured AAS instance model according to industrial standards. Finally, the generated AAS instance model is returned to the user.
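The extraction → enrichment → synthesis pipeline described above can be sketched with deterministic stubs in place of the LLM agents. The "semantic node" dict shape, the colon-based extraction, and the IRDI-style identifiers below are illustrative placeholders, not the application's actual data model or real ECLASS entries.

```python
def extract_semantic_nodes(unstructured_text):
    """Extraction agent (stubbed): pull property-like statements out of
    free text into intermediate 'semantic node' records."""
    nodes = []
    for line in unstructured_text.splitlines():
        if ":" in line:
            key, value = line.split(":", 1)
            nodes.append({"idShort": key.strip(), "value": value.strip()})
    return nodes

def enrich_with_standard_semantics(nodes, semantic_dictionary):
    """Semantic-search agent (stubbed): align each node with a
    standardized semantic identifier from a local database."""
    for node in nodes:
        node["semanticId"] = semantic_dictionary.get(node["idShort"], "unknown")
    return nodes

def synthesize_aas_submodel(nodes):
    """Synthesis agent (stubbed): assemble the enriched nodes into an
    AAS-like submodel structure."""
    return {"submodel": "TechnicalData", "submodelElements": nodes}

text = "MaxTorque: 2.5 Nm\nSupplyVoltage: 24 V"
# Placeholder IRDI-style identifiers, for illustration only:
semantic_db = {"MaxTorque": "0173-1#02-AAM123", "SupplyVoltage": "0173-1#02-AAB456"}
model = synthesize_aas_submodel(
    enrich_with_standard_semantics(extract_semantic_nodes(text), semantic_db))
```

In the real application each stub is an LLM agent (or a semantic search over a local database), but the hand-off of intermediate results between stages is the same.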
Our initial prototype implementation can generate error-free and informative AAS data elements (structured data) from unstructured data. This indicates that a significant proportion of the effort in the labor-intensive data curation process can be converted into a simpler verification process, substantially reducing labor effort and costs in data management. Open-source LLMs deployed on local servers also reach a satisfactory level of performance. We are optimistic that this initial result can be further improved with prompt adjustment or model fine-tuning. This approach introduces a novel method of data cleansing and management, elevating the value of data by ensuring its accessibility and utility in a more structured and meaningful format. Details and demos of this work are published in [16], with a hosted demo on GitHub.
4.2 Case Study 2: LLM assisted Failure Mode and Effects Analysis 4.2 案例研究 2:LLM辅助故障模式和影响分析
Failure Mode and Effects Analysis (FMEA) is a instrumental method for identifying potential failures in products or processes to ensure safety, quality, and reliability. Traditionally, FMEA requires extensive expert input and is time-consuming. This case study demonstrates how LLMs can automate and enhance the FMEA process, significantly reducing manual effort and the quality of the analysis. 失效模式及影响分析(FMEA)是一种工具性方法,用于识别产品或流程中的潜在失效,以确保安全、质量和可靠性。传统上,FMEA 需要大量的专家意见,而且非常耗时。本案例研究展示了 LLMs 如何实现 FMEA 流程自动化并增强其功能,从而显著减少人工操作并提高分析质量。
We provide two key demonstrations of this application, which are being updated in a public GitHub repository:
- FMEA for a safety-critical product: This demo uses the example of a Brake-by-Wire system to showcase the application of the LLM in analyzing reliability and safety issues in complex systems. The LLM system suggests failure modes and risk mitigation measures, as shown in Figure 8.
- FMEA in manufacturing processes: This demo focuses on enhancing the safety and quality of industrial production processes. Here, the LLM leverages detailed data and documentation of manufacturing processes to assist with context-specific FMEA analysis.
The application's main feature allows users to work directly with an FMEA table with the assistance of LLM. As users input or edit data, the LLM system actively analyzes the input and generates a list of suggestions for potential failures and risks relevant to the current context. These recommendations are then presented to users, who can easily select relevant content, streamlining the process of identifying and managing risks, as well as their countermeasures.
Figure 8 The user interface of the developed LLM application for FMEA
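The suggestion flow described above can be sketched as follows. This is a hypothetical illustration: the function names and the canned Brake-by-Wire suggestions are assumptions, and `suggest_failure_modes` stands in for a prompt to an LLM over the current table state and retrieved documents.

```python
def suggest_failure_modes(component: str, function: str) -> list:
    """Stub for the LLM call: given the row context the user is editing,
    return candidate failure modes. The real system would prompt an LLM
    with the current FMEA table state and company documentation."""
    canned = {
        "brake actuator": ["loss of actuation force",
                           "delayed response",
                           "sensor signal drift"],
    }
    return canned.get(component.lower(), ["no suggestion available"])

def update_fmea_row(row: dict) -> dict:
    # As the user edits a row, attach LLM suggestions for review; the user
    # then selects relevant items rather than writing them from scratch.
    row["suggested_failure_modes"] = suggest_failure_modes(
        row["component"], row["function"])
    return row

row = update_fmea_row({"component": "Brake actuator",
                       "function": "apply braking force"})
```

The key design choice is that the LLM only proposes content; the expert's role shifts from authoring failure modes to selecting and validating them.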
These demonstrations illustrate the significant advantages of integrating LLMs into FMEA processes:
Enhanced risk coverage: The LLM's ability to generate and suggest potential new failure modes increases the thoroughness of risk assessments.
Time savings for experts: By automating the generation of FMEA content, the system frees up expert time previously spent on extensive brainstorming sessions. The complicated content creation effort is converted into easier validation effort.
Cross-disciplinary knowledge communication: The LLM system interprets the data and documents from various fields available within a company, synthesizing these to perform analysis. It helps bridge communication gaps within organizations, facilitating better knowledge sharing. This is particularly crucial in FMEA analysis, where interdisciplinary collaboration is essential in complex environments.
To enhance the relevance and accuracy of the suggestions, our application also incorporates a Retrieval Augmented Generation (RAG) data processing pipeline, as shown in Figure 9. This feature enables the LLM system to access and utilize internal company documents, ensuring that the generated content is specifically tailored to the company's processes and past experiences.
Figure 9 System design overview of the RAG-based application
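The retrieval step of such a RAG pipeline can be illustrated with a toy bag-of-words retriever. This is a sketch under simplifying assumptions: a production pipeline would use a sentence-embedding model and a vector database, and the document snippets here are invented examples.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy "embedding": bag-of-words term counts.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, documents: list, k: int = 2) -> list:
    # Rank internal documents by similarity to the query; keep the top k.
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, documents: list) -> str:
    # Prepend the retrieved company documents to the LLM prompt so the
    # generated FMEA content reflects company-specific processes.
    context = "\n".join(retrieve(query, documents))
    return "Context:\n" + context + "\n\nTask: " + query

docs = ["welding station torch maintenance log",
        "paint shop ventilation checklist",
        "welding robot failure history"]
prompt = build_prompt("failure modes of the welding robot", docs)
```

Only the two most relevant documents end up in the prompt; the unrelated paint-shop checklist is filtered out before generation.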
The interactive FMEA applications demonstrate how LLM systems can transform traditional engineering processes. Combining intelligent text generation with a user-friendly FMEA editing application not only enhances the depth and breadth of the FMEA but also significantly improves the quality of the final analysis, leading to safer and more reliable products and processes.
4.3 Case Study 3: Autonomous production planning and control
The intelligence of large language models can also be applied in industrial production scenarios to create more efficient and intelligent automation systems, addressing limitations that traditional systems fail to overcome:
Fixed operation: Traditional automation systems are often rigid, designed for specific tasks with a fixed set of codes and instructions. They lack flexible decision-making capabilities for new information or scenarios that are too hard to pre-define during system development.
Training cost: To reconfigure the automation system for a new production process, human operators typically need extensive knowledge of the complicated technical system. The need for specialized training to manage and control automation systems can be a significant barrier.
Labor cost and reaction time: Automation systems rely on human operators for supervision, decision-making, and intervention, especially when unexpected events arise. Humans may not detect production events quickly or react in time with informed plans and actions.
To overcome these limitations, we designed an autonomous production system powered by large language model agents that can plan and control the operations of the automation system. LLM agents propose parametrization settings for machines in real-time for varying events or environmental conditions, enhancing process efficiency and reducing reaction time.
At the laboratory of the Institute of Industrial Automation and Software Engineering (IAS) at the University of Stuttgart, we have retrofitted an automated production system composed of various modules for flexible production applications. When given a task instruction from a user or on identifying a triggering event, the LLM agents propose a production plan consisting of a sequence of atomic operations to complete the task or react to the event. As shown in Figure 10, these production modules are capable of executing diverse production and logistics operations. Each module provides a data and control interface through a run-time environment, enabling real-time data retrieval and control operations via these interfaces.
Figure 10 Cyber-physical modular production system at IAS
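A module's run-time data and control interface can be sketched as a small class. The names and shape below are assumptions for illustration, not the institute's actual API: `skills` exposes high-level operations, while `state` exposes real-time data for the agents to read.

```python
from dataclasses import dataclass

@dataclass
class ProductionModule:
    """Minimal sketch of one module's run-time interface (hypothetical)."""
    name: str
    skills: dict   # skill name -> callable executing the operation
    state: dict    # real-time process data

    def read(self, key: str):
        # Data interface: real-time retrieval of a state variable.
        return self.state[key]

    def execute(self, skill: str) -> str:
        # Control interface: trigger a high-level skill and report the result.
        return self.skills[skill]()

drilling = ProductionModule(
    name="drilling",
    skills={"drill": lambda: "hole drilled"},
    state={"spindle_speed_rpm": 0, "workpiece_present": True},
)
result = drilling.execute("drill")
```

An LLM agent would be given the list of skill names and readable state keys as part of its prompt, and would act on the physical system only through these two methods.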
Utilizing this run-time environment, we have designed LLM agents that interpret data and control the physical system through the provided control interfaces. These agents function at different levels within the automation pyramid, facilitating autonomous planning and control of flexible production processes, as shown in Figure 11.
The executable interfaces for production control are provided on two levels, as shown in Figure 11. High-level skills with coarse granularity are performed by automation modules, typically covering one process segment such as transport, drilling, or inspection; low-level functionalities with fine granularity are executed within an automation module, typically for actuation such as conveyor movement, stopper on/off, and gate switch on/off.
Figure 11 LLM agents in automation pyramid
Then, we designed several operator LLM agents that are responsible for controlling the automation modules and one manager LLM agent for coordinating the automation modules. As shown in Figure 12, production planning is also carried out on two levels. First, the manager LLM agent interprets the triggering events or a user text input. Then, it arranges the skills of the automation modules to form a general plan for the production process. Next, the generated skill command is given to an operator LLM agent that actually controls the automation module. The operator LLM agent then generates a detailed plan of concrete operation procedures to control the actuator movements. By executing these, the production operation is fulfilled in the physical environment.
Figure 12 The designed autonomous production planning and control system consists of LLM agents, digital twins and automation system
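The two-level planning described above can be sketched as follows. Both agents are stubbed with canned outputs; in the real system each function would be an LLM call whose prompt contains the available skills or actuator operations of the respective level. All skill and operation names are illustrative assumptions.

```python
def manager_agent(event: str) -> list:
    # Stub for the manager LLM: maps a triggering event or user request
    # to a sequence of module-level skills (coarse granularity).
    if "drill" in event.lower():
        return ["transport workpiece to drilling module", "drill hole",
                "transport workpiece to inspection module", "inspect hole"]
    return []

def operator_agent(skill: str) -> list:
    # Stub for an operator LLM: expands one skill into low-level
    # actuator operations (fine granularity) within its module.
    expansions = {
        "drill hole": ["stopper on", "clamp workpiece", "spindle on",
                       "feed down", "feed up", "spindle off",
                       "unclamp workpiece", "stopper off"],
    }
    return expansions.get(skill, ["execute '" + skill + "'"])

def plan(event: str) -> dict:
    # Level 1: general plan of skills; level 2: detailed operation plan
    # per skill, produced by the operator agent of the executing module.
    skills = manager_agent(event)
    return {s: operator_agent(s) for s in skills}

production_plan = plan("Customer order: drill a 5 mm hole")
```

The resulting nested plan is what would be presented to the user for validation and approval before the run-time environment executes it.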
Our implemented prototype demonstrates the ability to handle tasks that are not pre-defined. The LLM agents interpret the user input or triggering events, utilize the knowledge patterns learned by the LLM as heuristics, and strategically plan production processes dynamically. The digital twin system provides the data and control interface, allowing the planned process to be executed and controlled in the physical environment. A screenshot of the UI is shown in Figure 13. The developed prototype assists in quickly reacting to events and order information, rapidly generating an operation plan for user validation and approval under safety considerations. This research underscores the potential of integrating large language models (LLMs) into industrial automation systems, particularly in the context of smart factories, enhancing agility, flexibility, and adaptability in production processes.
Figure 13 The user interface and the live demonstration of operation planning and control
4.4 Case Study 4: LLM agents interact with the simulation models in digital twins
In Case Study 3, we explained how LLMs can be used for planning and controlling processes in general. Process planning and control involve defining the process segments and steps, as well as monitoring and adjusting the process parameters to ensure they meet desired outcomes. To allow for more precise and reliable planning and control in detail, as well as to predict the optimal outcomes of the process, a more sophisticated design of LLM systems is required.
This sophistication can be achieved by integrating simulation models into the LLM agent system. By establishing the connection between LLM and simulation models, LLMs can access a virtual experimenting tool to evaluate alternative solutions before implementation in real-world settings. The simulation models allow LLMs to anticipate the result of process parametrization, enabling detailed process analyses and exploration of how variations in parameters might affect outcomes. This helps LLMs identify optimal process settings and configurations, enhancing precision and reliability in process planning and control.
Figure 14 Overview diagram showcasing the integration of LLM agents with a simulation model
In work-in-progress research, we designed an LLM agent system that interacts with a simulation model. The simulation model exposes an input interface to set the parameter inputs and provides the simulation output via runtime interfaces. Several agents are designed to interact with the simulation model. First, an observation agent observes the simulation state data and generates insights. Then, a reasoning and decision-making agent performs heuristic reasoning and updates the input parameters for the simulation with the objective of achieving satisfactory outcomes. Here, a local database can be integrated into the agent system to inject existing insights into the reasoning process through a retrieval augmented generation mechanism. Based on iterative interactions between the LLM agents and the simulation model, a set of parameters for setting the simulated process with satisfactory outcomes can be determined. The interactive parameter setting process between the LLM agent system and the simulation model is recorded and summarized by another agent, and the results are presented to the user through a UI.
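The iterative interaction loop can be sketched as follows. Everything here is a toy stand-in: the quadratic `simulate` function replaces the simulation model, and the observation and decision agents use trivial rules where the real system would use LLM reasoning over the observed state (plus retrieved insights).

```python
def simulate(feed_rate: float) -> float:
    # Toy stand-in for the simulation model: a quality score that peaks
    # at feed_rate = 3.0 (a hypothetical optimum).
    return 1.0 - (feed_rate - 3.0) ** 2 / 10.0

def observation_agent(score: float) -> str:
    # Stub: the real agent would generate textual insights from state data.
    return "satisfactory" if score > 0.95 else "needs improvement"

def decision_agent(feed_rate: float, step: float = 0.5) -> float:
    # Stub heuristic: probe both directions and move toward improvement.
    up = simulate(feed_rate + step)
    down = simulate(feed_rate - step)
    return feed_rate + step if up >= down else feed_rate - step

feed_rate, history = 1.0, []
for _ in range(10):  # iterative agent-simulation interaction
    score = simulate(feed_rate)
    history.append((feed_rate, score))  # record for the summarizing agent
    if observation_agent(score) == "satisfactory":
        break
    feed_rate = decision_agent(feed_rate)
```

Starting from a poor parameter value, the loop converges to a setting the observation agent accepts; the `history` list plays the role of the record that the summarizing agent would condense for the user.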
The integration of simulation models with LLM multi-agent systems offers significant potential for optimizing process planning and control. This combination enables risk-free experimentation and validation, allowing LLM agents to simulate various scenarios and proactively adjust production plans and process parametrization before implementation. Additionally, enabling LLMs to interact with simulation models facilitates more precise and effective task planning. This allows LLM agents to autonomously determine the optimal solutions for user-defined problems, enhancing overall process efficiency and adaptability.
5 Conclusion
This paper underscores the transformative potential of integrating Large Language Models (LLMs) into industrial automation, highlighting their capacity to automate complex tasks, enhance efficiency, and foster productivity across various applications. We presented a systematic approach for developing LLM-powered systems for a range of industrial applications, from basic text interpretation, content generation for risk analysis and management, complex process planning and control, to interaction with simulation models in digital twins. The synergy between LLM agents, digital twins, and automation systems leads to the development of autonomous systems. Such a system in a smart factory context would be capable of self-optimization, self-diagnosis, and self-configuration, minimizing human intervention and enabling more responsive manufacturing processes. This evolution in industrial automation represents a major leap towards autonomous systems. Our ongoing research and application-oriented case studies demonstrate the promising impact of integrating LLMs into industrial automation, achieving higher levels of intelligence and autonomy.
In the future, our research will focus on broadening and refining our methodologies. We aim to more thoroughly explore the specific boundary conditions that are critical for LLM-based applications in industrial automation. Moreover, we plan to conduct further comprehensive and systematic testing and evaluation of these applications. This will deepen our understanding of their limitations and guide improvements in their practical implementation, aiming to maximize their pragmatic value in real-world industrial settings.
6 Acknowledgements
This work was supported by Stiftung der Deutschen Wirtschaft (SDW) and the Ministry of Science, Research and the Arts of the State of Baden-Wuerttemberg within the support of the projects of the Exzellenzinitiative II.
7 References
[1] Gurnee, W., Nanda, N., Pauly, M., Harvey, K., Troitskii, D., et al., 2023, Finding Neurons in a Haystack: Case Studies with Sparse Probing.
[2] Bills, S., Cammarata, N., Mossing, D., Tillman, H., Gao, L., et al., 2023, Language models can explain neurons in language models, URL https://openaipublic.blob.core.windows.net/neuron-explainer/paper/index.html.
[3] Conneau, A., Wu, S., Li, H., Zettlemoyer, L., Stoyanov, V., 2020, Emerging Cross-lingual Structure in Pretrained Language Models, in Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 6022-6034.
[4] Shi, F., Suzgun, M., Freitag, M., Wang, X., Srivats, S., et al., 2022, Language Models are Multilingual Chain-of-Thought Reasoners.
[5] Yang, K., Liu, J., Wu, J., Yang, C., Fung, Y. R., et al., 2024, If LLM Is the Wizard, Then Code Is the Wand: A Survey on How Code Empowers Large Language Models to Serve as Intelligent Agents.
[6] Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C. L., et al., 2022, Training language models to follow instructions with human feedback.
[8] Rafailov, R., Sharma, A., Mitchell, E., Ermon, S., Manning, C. D., et al., 2023, Direct Preference Optimization: Your Language Model is Secretly a Reward Model.
[9] Xu, C., Sun, Q., Zheng, K., Geng, X., Zhao, P., et al., 2023, WizardLM: Empowering Large Language Models to Follow Complex Instructions.
[10] Deletang, G., Ruoss, A., Duquenne, P.-A., Catt, E., Genewein, T., et al., 2024, Language Modeling Is Compression, in The Twelfth International Conference on Learning Representations.
[11] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., et al., 2022, Chain-of-Thought Prompting Elicits Reasoning in Large Language Models, in Advances in Neural Information Processing Systems, pp. 24824-24837.
[12] Yao, S., Yu, D., Zhao, J., Shafran, I., Griffiths, T., et al., 2023, Tree of Thoughts: Deliberate Problem Solving with Large Language Models, in Advances in Neural Information Processing Systems, pp. 11809-11822.
[13] Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., et al., 2020, Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks, in Advances in Neural Information Processing Systems, pp. 9459-9474.
[14] Paranjape, B., Lundberg, S., Singh, S., Hajishirzi, H., Zettlemoyer, L., et al., 2023, ART: Automatic multi-step reasoning and tool-use for large language models.
[15] Xia, Y., Shenoy, M., Jazdi, N., Weyrich, M., 2023, Towards autonomous system: flexible modular production system enhanced with large language model agents, in 2023 IEEE 28th International Conference on Emerging Technologies and Factory Automation (ETFA), pp. 1-8.
[16] Xia, Y., Xiao, Z., Jazdi, N., Weyrich, M., 2024, Generation of Asset Administration Shell with Large Language Model Agents: Interoperability in Digital Twins with Semantic Node.
Authors

Mr. Yuchen Xia
Universität Stuttgart
Institut für Automatisierungstechnik und Softwaresysteme
Pfaffenwaldring 47, 70550 Stuttgart
Telefon: +4971168567307
E-Mail: yuchen.xia@ias.uni-stuttgart.de

Yuchen Xia (born 1992) has been an academic staff member at the Institute of Industrial Automation and Software Engineering (IAS) at the University of Stuttgart since 2021. His research centers on large language models, digital twins, and autonomous systems. Mr. Xia earned dual bachelor's degrees in Mechanical Design and Automation from Wuhan University, China, and Automotive and Engine Technology from the University of Stuttgart, Germany, in 2017. He also completed his Master of Science degree at the University of Stuttgart in 2019. Stiftung der Deutschen Wirtschaft has been supporting his research funding for entrepreneurship support since 2021.
Dr.-Ing. Nasser Jazdi (born 1975) is the deputy head and academic director of the Institute of Industrial Automation and Software Engineering at the University of Stuttgart. He obtained his Diploma in Electrical Engineering in 1997 and completed his Ph.D. (Thesis: Remote Diagnosis and Maintenance of Embedded Systems) in 2003 at the University of Stuttgart, Germany. Dr. Jazdi is also a Visiting Professor at Anhui University. His research delves into software reliability within IoT, learning aptitude for industrial automation, and AI in industrial automation. He holds senior memberships with IEEE and VDE Association, and contributes to VDI-GPP Software Reliability Group and Berkeley Initiative in Soft Computing (BISC).
Dr. Nasser Jazdi
Universität Stuttgart
Institut für Automatisierungstechnik und Softwaresysteme
Prof. Dr.-Ing. Michael Weyrich (born 1967) has been the Head of the Institute for Automation Technology and Software Engineering (IAS) at the University of Stuttgart since 2013. His expertise lies in industrial automation, with over a decade of industry experience, including roles at Daimler AG and Siemens AG. Prof. Weyrich's research focuses on intelligent automation systems, AI-based system safety, and complexity management. He serves as Chairman of the Board of Directors for the VDI/VDE Society for Measurement and Automation Technology (2022-2025) and received an honorary doctorate from Donetsk National Technical University in 2018. He has also been actively involved in various committees and editorial boards related to automation technology and factory automation.
Prof. Dr.-Ing. Dr. h. c. Michael Weyrich
Universität Stuttgart
Institut für Automatisierungstechnik und Softwaresysteme