# Full length article 全文Distributed multi-robot source term estimation with coverage control and information theoretic based coordination

基于覆盖控制和信息理论协调的分布式多机器人源项估计

EI检索SCI升级版 计算机科学1区SCI基础版 工程技术1区IF 14.7 如果 14.7SWJTU A++ 西南交通大学A++SWUFE A 西南 A## Keywords 关键字

自主搜索分散式多传感器融合顺序蒙特卡罗模拟传感器控制

## 1. Introduction 1. 引言

In emergency situations involving an unknown release source of airborne chemical, biological, radiological or nuclear (CBRN) particulates, swiftly pinpointing the source and identifying its release parameters is vital for mitigating the commercial, environmental, and societal impacts. Deploying autonomous robots to assess and locate the source of the release has become increasingly favoured, as it reduces human exposure to harmful CBRN environments [1], [2], [3]. A crucial technical component of the autonomous search mission is the path planning functionality, which should ensure that the robot collects the most informative samples possible, and thus leading to more accurate and precise estimation of the release source term.

在涉及空气中化学、生物、放射性或核 （CBRN） 颗粒物释放源的紧急情况下，迅速查明来源并确定其释放参数对于减轻商业、环境和社会影响至关重要。部署自主机器人来评估和定位释放源变得越来越受欢迎，因为它减少了人类暴露于有害的CBRN环境[1]，[2]，[3]。自主搜索任务的一个关键技术组成部分是路径规划功能，它应确保机器人收集尽可能多的信息样本，从而更准确和精确地估计发布源项。

There are various works in the literature that proposed path planning algorithms for a single robot in order to efficiently gather concentration measurements to support the estimation of source parameters [4], [5], [6], [7], [8], [9], [10], [11], [12], [13], [14], [15]. Some of these algorithms employed a concentration gradient based techniques, often referred to as *chemotaxis*, that mimicked the behaviour of various organisms in nature searching for food [6], [7]. Alternatively, information-theoretic based techniques [8], [9], [10], [11], [12], [13], [16], [17], initially referred to as *infotaxis* [4] or cognitive search [10], have been designed to choose the next sampling location so that a feasibly maximum reduction in the uncertainty of the estimated source term can be achieved. The uncertainty reduction of the estimated source parameters can be quantified by using information measures (e.g., entropy, mutual information), following a sequential Bayesian estimation framework that incorporates new measurements to update the belief about the source parameters. Information-theoretic based search algorithms outperformed the chemotaxis-based methods specifically in turbulent environments, where concentration gradient of the CBRN particulates tends to be erroneous and sensor readings intermittent [14]. This is evidenced that most of the recent robotic release source search works with experimental validation (e.g., [15], [16], [17], [18]) follow the information theoretic framework.

文献中有各种工作提出了单个机器人的路径规划算法，以便有效地收集浓度测量值以支持源参数的估计[4]、[5]、[6]、[7]、[8]、[9]、[10]、[11]、[12]、[13]、[14]、[15]。其中一些算法采用了基于浓度梯度的技术，通常称为趋化性，该技术模仿了自然界中各种生物寻找食物的行为[6]，[7]。或者，基于信息理论的技术[8]、[9]、[10]、[11]、[12]、[13]、[16]、[17]，最初被称为信息交通[4]或认知搜索[10]，被设计用于选择下一个采样位置，以便能够最大限度地降低估计源项的不确定性。估计源参数的不确定性降低可以通过使用信息度量（例如，熵、互信息）来量化，遵循顺序贝叶斯估计框架，该框架结合了新的测量值来更新对源参数的信念。特别是在湍流环境中，基于信息理论的搜索算法优于基于趋化性的方法，其中CBRN颗粒的浓度梯度往往是错误的，传感器读数是间歇性的[14]。这证明，最近的大多数机器人发布源搜索都与实验验证（例如，[15]、[16]、[17]、[18]）遵循信息理论框架。

The aforementioned search strategies are primarily designed for a single robot. The source search task can be significantly improved by employing multiple robots that work in conjunction with each other. An efficient multi-robot source term estimation (STE) strategy relies on two key aspects, namely — the collective estimation or belief update protocol, and the collaborative path planning solution, both for the networked robots. The design of these two aspects along with the interaction and coupling between them poses a challenge in designing a fully distributed coordination strategy for multi-robot STE tasks. The coupling primarily stems from the design requirement that collective path planning should ensure the connectivity of robot network in order to facilitate successful distributed estimation or belief sharing. Many existing works on multi-robot information gathering (see e.g., [19]) assume a fully connected robot network and the global belief can be directly accessed. While some works like [20], [21] have used hierarchical methods to ensure scalability for efficient decision-making between robots, and [22] proposed an event-triggered cooperative design to reduce overall communication costs in achieving multi-robot cooperation, there is still a limited number of multi-robot coordination frameworks that ensure a distributed and efficient design which is scalable while limiting the necessary inter-robot network communication costs [23]. Furthermore, additional network challenges like the limited communication range along with constraints on the onboard computational power also motivate a robust design that ensures overall network connectivity with minimal computational costs, while achieving the multi-robot STE task at hand.

上述搜索策略主要是为单个机器人设计的。通过使用多个相互协同工作的机器人，可以显着改善源搜索任务。高效的多机器人源项估计（STE）策略依赖于两个关键方面，即集体估计或信念更新协议，以及协作路径规划解决方案，两者都适用于联网机器人。这两个方面的设计以及它们之间的交互和耦合为多机器人STE任务设计完全分布式协调策略带来了挑战。这种耦合主要源于设计要求，即集体路径规划应确保机器人网络的连通性，以促进成功的分布式估计或信念共享。许多关于多机器人信息收集的现有工作（参见例如，[19]）假设一个完全连接的机器人网络，并且可以直接访问全球信念。虽然像[20]、[21]这样的一些工作已经使用分层方法来确保机器人之间高效决策的可扩展性，并且[22]提出了一种事件触发的协作设计，以降低实现多机器人协作的总体通信成本，但仍有有限数量的多机器人协调框架，这些框架确保了分布式和高效的设计，该设计具有可扩展性，同时限制了必要的机器人间网络通信成本[23]。此外，通信范围有限以及机载计算能力受到限制等其他网络挑战也促使我们采用稳健的设计，以最低的计算成本确保整体网络连接，同时实现手头的多机器人 STE 任务。

In the next subsection, we will discuss some related works for each of the above mentioned design aspects to highlight important design gaps in the literature.

在下一节中，我们将讨论上述每个设计方面的一些相关工作，以突出文献中重要的设计差距。

### 1.1. Related works 1.1. 相关作品

The collective estimation of the belief of the source term across a sensor/robot network can be performed either in a centralised or a distributed manner. A distributed estimator design would provide statistical quantities like the global posterior belief or likelihood functions with additional robustness and scalability to a large robot network as compared to a centralised protocol. There have been various Bayesian filters in the literature that can achieve a distributed estimation across a sensor network, as summarised in [24]. A commonly employed technique is measurement-dissemination based protocol aimed at establishing access to all the sampled measurements for each sensing agent, i.e., the robot, in the network through multi-hop constrained flooding techniques (for e.g. [25], [26], [27], [28]). Therefore, each robot, can update its belief using all available measurements across the network to construct their respective posterior beliefs which may not be similar to each other. Measurement-dissemination based protocols can result in high communication costs and poor scalability, which could be problematic for a robot based sensor network, especially as the connectivity and network topology is dynamic. On the contrary, distributed diffusion filters (e.g., [29]) can provide improved scalability by focusing on local dissemination of parameterised statistics of the posterior beliefs. Furthermore, enhanced by the belief-consensus based protocols, the distributed diffusion filter can fuse local posterior beliefs only from the neighbouring robots over multiple iterations to achieve a common belief across the entire network as presented in [30], [31], [32], [33]. A consensus-based distributed particle filter (DPF) only requires local communications between neighbouring robots, therefore, no global knowledge about the communication topology required, lending higher robustness to changes in network topology.

在传感器/机器人网络中，源项的置信度的集体估计可以以集中式或分布式方式进行。与集中式协议相比，分布式估计器设计将为大型机器人网络提供全局后验置信或似然函数等统计量，并具有额外的鲁棒性和可扩展性。文献中有各种贝叶斯滤波器可以跨传感器网络实现分布式估计，如[24]中所述。一种常用的技术是基于测量传播的协议，旨在通过多跳约束泛洪技术（例如[25]、[26]、[27]、[28]）建立对网络中每个传感代理（即机器人）的所有采样测量值的访问。因此，每个机器人都可以使用网络上所有可用的测量来更新其信念，以构建它们各自的后验信念，这些信念可能彼此不相似。基于测量传播的协议可能导致高通信成本和较差的可扩展性，这对于基于机器人的传感器网络来说可能是个问题，尤其是在连接和网络拓扑是动态的情况下。相反，分布式扩散滤波器（例如，[29]）可以通过专注于后验信念的参数化统计的局部传播来提供改进的可扩展性。此外，通过基于信念共识的协议的增强，分布式扩散滤波器可以在多次迭代中仅融合来自相邻机器人的局部后验信念，以实现整个网络的共同信念，如[30]、[31]、[32]、[33]所示。 基于共识的分布式粒子过滤器 （DPF） 只需要相邻机器人之间的本地通信，因此不需要关于通信拓扑的全局知识，从而为网络拓扑的变化提供了更高的鲁棒性。

In the context of multi-robot STE, having each robot possess a common global belief of the source term would be critical in designing a collaborative search behaviour. Thus, a consensus-based DPF renders it as a superior design choice to frameworks with measurement-dissemination protocol as was also presented in [31]. However, consensus-based DPFs may require a higher number of consensus steps for a larger multi-robot network to quickly converge to an consensus global belief. To this end, we present a modified consensus-DPF aimed to mitigate this issue.

在多机器人STE的背景下，让每个机器人都拥有对源项的共同全球信念对于设计协作搜索行为至关重要。因此，基于共识的 DPF 使其成为具有测量传播协议的框架的优越设计选择，正如 [31] 中所介绍的那样。然而，基于共识的 DPF 可能需要更多的共识步骤，才能使更大的多机器人网络快速收敛到共识的全球信念。为此，我们提出了一个修改后的共识-DPF，旨在缓解这个问题。

In literature, there are various multi-robot path planning strategies, such as [34], [35], [36], [37], [38], [39], [40], [41], [42] that proposed collaborative decision making to achieve distributed multi-robot STE. In [34], [42], multiple active and passive coordination strategies using Infotaxis or its variants based reward functions were presented to choose an appropriate control action for each robot in the network. The strategies of [34], [42] optimised the overall estimated information gain over a finite action set per robot by sequentially negotiating individual robot actions. In [37], [39], a weighted social Bayesian estimation protocol was adopted to update the source term belief for each robot, followed by each robot choosing a traditional infotaxis reward based control action from a finite set of discrete actions. Furthermore, a decentralised formation control strategy was presented in [35] where each robot first locally computes its infotaxis reward based control action, followed by performing consensus over the control action to preserve the network formation.

文献中，有多种多机器人路径规划策略，如[34]、[35]、[36]、[37]、[38]、[39]、[40]、[41]、[42]，提出了协同决策来实现分布式多机器人STE。在[34]，[42]中，提出了使用Infotaxis或其基于奖励函数的多种主动和被动协调策略，为网络中的每个机器人选择适当的控制动作。[34]，[42]的策略通过依次协商单个机器人动作，优化了每个机器人有限动作集的总体估计信息增益。在[37]和[39]中，采用加权社会贝叶斯估计协议来更新每个机器人的源项信念，然后每个机器人从一组有限的离散动作中选择传统的基于信息出租车奖励的控制动作。此外，在[35]中提出了一种分散的编队控制策略，其中每个机器人首先在本地计算其基于信息出租车奖励的控制动作，然后对控制动作进行共识，以保持网络编队。

Majority of the strategies in [34], [35], [36], [37], [38], [39], [40], [41], [42] evaluated an information theoretic metric over a *finite set of discrete robot actions*, due to the computational complexity to make the decision over the increased action space of all the robots. Only a few path planning strategies account of practical constraints like limited communication ranges. However, a finite set of feasible actions per robot along with spatio-communication constraints can significantly limit the spatial diversity of the overall set of candidate locations to take measurements. Furthermore, it should be noted that the existing collaborative strategies may result in high communication costs over a large network to converge onto the respective sampling location for each robot, which is not a desirable feature. Thus, in the proposed work, we optimise for the control actions of each robot over a continuous action space first, followed by an informative sampling interval choice, consequently resulting in lower inter-robot communication requirements. It should also be noted that most of the aforementioned strategies also employed *measurement dissemination protocols* which adds to the overall communication costs for the mission and may lead to an over-confident estimate of the source term.

[34]、[35]、[36]、[37]、[38]、[39]、[40]、[41]、[42]中的大多数策略都评估了一组有限的离散机器人动作上的信息理论度量，因为计算复杂，要对所有机器人增加的动作空间做出决策。只有少数路径规划策略考虑了通信范围有限的实际约束。然而，每个机器人的一组有限的可行操作以及空间通信约束可能会显着限制整个候选位置集的空间多样性。此外，应该注意的是，现有的协作策略可能会导致大型网络上的高通信成本，以收敛到每个机器人的相应采样位置，这不是一个理想的功能。因此，在拟议的工作中，我们首先在连续的动作空间上优化每个机器人的控制动作，然后选择信息采样间隔，从而降低机器人之间的通信要求。还应当指出，上述大多数战略还采用了测量传播协议，这增加了特派团的总体通信费用，并可能导致对源术语的估计过于自信。

Incorporating the collective belief update into the design of path planning strategy leads to effective inference of the source term in a distributed manner. However, the key design challenge, which is often overlooked in the multi-robot STE literature, is to ensure connectivity across the robot network. In works like [42], network connectivity of a distributed network is achieved by ensuring the algebraic connectivity (or Fiedler eigenvalue) of the Laplacian matrix associated with the robot-network is positive. However, the computation of algebraic connectivity requires estimation of the Laplacian matrix of the entire network as presented in [43], [44]. Such a real-time estimation of the algebraic connectivity requires several iteration steps consequently resulting in proportionate communication, computational and memory costs per robot and iteration [44], which can prove to be costly in larger robot/sensor networks. In contrast, we introduce a proximity based control action based on a scalable pre-defined subgraph structure to ensure a connected network, rather than replying on dynamic estimation of the network topology.

将集体信念更新纳入路径规划策略的设计中，可以以分布式方式有效地推断源项。然而，在多机器人 STE 文献中经常被忽视的关键设计挑战是确保整个机器人网络的连接。在像[42]这样的工作中，分布式网络的网络连通性是通过确保与机器人网络相关的拉普拉斯矩阵的代数连通性（或Fiedler特征值）为正来实现的。然而，代数连通性的计算需要估计整个网络的拉普拉斯矩阵，如[43]、[44]所示。这种对代数连通性的实时估计需要几个迭代步骤，因此导致每个机器人和迭代的通信、计算和内存成本成比例[44]，这在大型机器人/传感器网络中可能被证明是昂贵的。相比之下，我们引入了基于可扩展的预定义子图结构的基于邻近的控制操作，以确保网络连接，而不是根据网络拓扑的动态估计进行响应。

### 1.2. Technical contributions

1.2. 技术贡献

In this paper, we propose a distributed multi-robot STE strategy following a co-design process which jointly considers the estimation, connectivity and coordination requirements for the first time in a unified framework. We employ a modified *consensus based belief update protocol* to ensure a common source term belief across a robot network. Coordination path planning is designed based on a combination of a coverage control objective function for sensing performance and a proximity objective function to meet communication range constraints, where the control action is obtained by optimising the objective function over continuous actions between sampling instants, in contrast to some information-theoretic based strategies based on a finite action set. The coverage control objective function allows each robot to be effectively placed at the best sensing location with respect to its network neighbours based on the post-consensus belief of the source location. The proximity-based objective function inspired by works of [45], [46] is to ensure connectivity of the robot network with lower communication requirements as compared to the method in [42]. The proximity control by each robot is enforced with respect to a limited set of robots based on a spanning subgraph, thus, addition of new robots to the group can be robustly accommodated by locally updating the proximity set of the relevant robots. Furthermore, as the proposed overall control action for each robot only requires one-hop communication to their network neighbours, the resulting path planner can be scaled to larger networks easily to cover wider areas.

在本文中，我们提出了一种分布式多机器人STE策略，该策略遵循协同设计过程，首次在统一框架中共同考虑了估计、连接和协调需求。我们采用改进的基于共识的信念更新协议，以确保整个机器人网络的共同源术语信念。协调路径规划的设计基于用于感知性能的覆盖控制目标函数和满足通信范围约束的接近目标函数的组合，其中控制动作是通过优化目标函数在采样时刻之间的连续动作中获得的，这与一些基于有限动作集的信息理论策略形成鲜明对比。覆盖控制目标功能允许每个机器人根据源位置的共识后信念，有效地放置在相对于其网络邻居的最佳传感位置。受[45]，[46]启发的基于邻近的目标函数，与[42]中的方法相比，旨在确保机器人网络的连通性，通信要求更低。每个机器人的接近控制都是针对基于跨越子图的有限机器人集强制执行的，因此，可以通过本地更新相关机器人的接近集来稳健地适应向组添加新机器人。此外，由于每个机器人的整体控制动作只需要与其网络邻居进行一跳通信，因此生成的路径规划器可以很容易地扩展到更大的网络以覆盖更广泛的区域。

We also notice that due to the time sensitive nature of the application of multi-robot STE, performing environmental sampling *along the trajectories* resulted from the proposed control action (instead of the converged locations) can be beneficial. To this end, two alternative designs for the choice of a suitable sampling interval are proposed, namely — fixed sampling interval and information theoretic sampling interval. The latter considers multiple candidates for sampling intervals and locally evaluates the potential information gain corresponding to each candidate in order to choose the best sampling interval, followed by achieving consensus over the locally chosen best sampling interval candidates. After agreeing over the desired sampling interval, the robots move along the proposed control action and perform environmental sampling after the agreed upon sampling interval.

我们还注意到，由于多机器人STE应用的时间敏感性，沿着所提出的控制动作（而不是收敛位置）产生的轨迹进行环境采样可能是有益的。为此，提出了两种选择合适采样间隔的备选设计，即固定采样间隔和信息论采样间隔。后者考虑抽样间隔的多个候选者，并局部评估与每个候选者相对应的潜在信息增益，以便选择最佳抽样间隔，然后就局部选择的最佳抽样间隔候选者达成共识。在商定所需的采样间隔后，机器人沿着建议的控制动作移动，并在商定的采样间隔后执行环境采样。

The proposed consensus belief protocol and coverage based path planner is able to substantiate the accuracy and robustness of the source term estimation with multi-robots, while maintaining network connectivity throughout the mission. Such a framework could potentially be extended to applications like information gathering of a relevant phenomenon in an unknown environment or conducting search and rescue operations over an large region (see e.g., [47], [48]) by modifying the underlying coverage objective function using an appropriate event probability function.

所提出的基于共识信念协议和覆盖范围的路径规划器能够证实多机器人源项估计的准确性和鲁棒性，同时在整个任务中保持网络连接。这种框架有可能扩展到应用，例如在未知环境中收集相关现象的信息，或通过使用适当的事件概率函数修改基础覆盖目标函数（参见例如，[47]，[48]）在大区域进行搜索和救援行动。

### 1.3. Paper outline 1.3. 论文大纲

The rest of the paper is organised as follows. Section 2 first discusses motion, sensor and measurement models, along with the limited-communication range design constraint, followed by formally presenting the problem statement addressed in the paper. Section 3 presents a modified consensus-based distributed belief update protocol, followed by a collaborative and distributed path planning algorithm designed to find an optimal control action is discussed and analysed in Section 4. In Section 5, we discuss multiple protocols for choosing a fixed or collaborative interval to perform sampling along the trajectories resulting from the control action designed in Section 4. Finally, we present an extensive simulation study to demonstrate the advantages of the proposed algorithm, followed by concluding the paper with its key features in Section 7.

本文的其余部分组织如下。第2节首先讨论了运动、传感器和测量模型，以及有限的通信范围设计约束，然后正式介绍了本文中解决的问题陈述。第3节介绍了一种改进的基于共识的分布式信念更新协议，随后是第4节讨论和分析的协作和分布式路径规划算法，该算法旨在找到最佳控制行动。在第 5 节中，我们讨论了多种协议，用于选择固定或协作间隔，以沿着第 4 节中设计的控制操作产生的轨迹执行采样。最后，我们进行了广泛的仿真研究，以证明所提算法的优点，然后在第 7 节中总结了该算法的主要特征。

## 2. Preliminaries and problem formulation

2. 前期和问题表述

In this section, we begin by introducing mathematical models to describe the robot kinematics, the plume model for the onboard gas sensors, and, communication capabilities and limitations. Some basics of coverage control techniques required for coordination protocol design are also discussed for completeness. Finally, the problem is formally formulated to state the objectives to be achieved for a multi-robot source term estimation task.

在本节中，我们首先介绍描述机器人运动学的数学模型、车载气体传感器的羽流模型以及通信功能和局限性。为了完整起见，还讨论了协调协议设计所需的覆盖控制技术的一些基础知识。最后，正式提出问题，说明多机器人源项估计任务要实现的目标。

*Notations:* The notation $\u301aa,b\u301b$ denotes all the integers from $a$ to $b$ including the boundaries.

### 2.1. Motion model 2.1. 运动模型

Consider a set of $n$ homogeneous robots in a closed operating region denoted by $\mathcal{D}\subset {\mathbb{R}}^{2}$. The operating region is assumed to be a convex obstacle free open space. Furthermore, the boundary of $\mathcal{D}$ is denoted by $\partial \mathcal{D}$. Let the position of the $i$th robot platform at time ${t}_{k}$ be denoted as ${\mathit{p}}_{k}^{\left(i\right)}={\left[{x}_{k}^{\left(i\right)}\phantom{\rule{1em}{0ex}}\phantom{\rule{1em}{0ex}}{y}_{k}^{\left(i\right)}\right]}^{T}$. The motion of each robot platform is modelled as the following single integrator model (1)${\mathit{p}}_{k+1}^{\left(i\right)}={\mathit{p}}_{k}^{\left(i\right)}+{\int}_{{t}_{k}}^{{t}_{k+1}}{\mathit{u}}_{\tau}^{\left(i\right)}d\tau ;$where ${\mathit{u}}_{\tau}^{\left(i\right)}$ denotes the velocity control action at the $\tau \in [{t}_{k},{t}_{k+1}]$ instant.

### 2.2. Sensor and measurement model

2.2. 传感器和测量模型

Each robot platform is assumed to be equipped with a chemical sensor that measures the gas concentration at the robot location. For a source emitting gas release from an unknown location, denoted by $\mathit{s}={\left[{x}_{o}\phantom{\rule{1em}{0ex}}\phantom{\rule{1em}{0ex}}{y}_{o}\right]}^{T}$, with a release rate of ${Q}_{o}$, the expected gas concentration over a steady plume at a location $\mathit{q}$ can be expressed using the following plume model: (2)$C(\mathit{q},\mathit{s})=\frac{{Q}_{o}}{2\pi D}exp\left[\frac{\mathit{w}\cdot (\mathit{q}-\mathit{s})}{2D}\right]{K}_{0}\left(\frac{{\Vert \mathit{q}-\mathit{s}\Vert}_{2}}{\lambda}\right)\phantom{\rule{0.16667em}{0ex}},$where ${K}_{0}$ is the zero-order modified Bessel’s equation, and, (3)$\lambda =\sqrt{\frac{\xi D}{1+\frac{{w}^{2}\xi}{4D}}}$The modelling parameters ${Q}_{o}$, $\xi $, $D$, and $w$ denote the release rate, average particle lifetime, the diffusivity, and the wind speed, respectively. The expression $\mathit{w}\cdot (\mathit{q}-\mathit{s})$ refers to the dot product between the wind vector $\mathit{w}$ and $\mathit{q}-\mathit{s}$. The wind vector can be expressed as $\mathit{w}={[wcos\Phi \phantom{\rule{1em}{0ex}}\phantom{\rule{1em}{0ex}}wsin\Phi ]}^{T}$ where $\Phi $ is the wind direction.

Without loss of generality, it is assumed that the sensor reading (e.g., voltage from a MOX sensor) is proportional to the gas concentration [15], if the gas is detected by the sensor. In this case, the associated measurement noise can be modelled as a Gaussian distribution $\nu \in \mathcal{N}(0,{\sigma}_{\nu}^{2})$, where the variance is normally modelled as a percentage of the concentration reading. Furthermore, for a low-cost sensor, the background white noise in the absence of the released gas also needs to be modelled, which can be denoted by $\stackrel{\u0304}{\nu}\in \mathcal{N}(0,{\sigma}_{\stackrel{\u0304}{\nu}}^{2})$. The variance associated with the background noise can be experimentally determined. To minimise the false detection due to background noise, the measurement is only considered as a detection event if the sensor reading is above a certain threshold, denoted by ${z}_{\mathrm{thr}}$. When the sensor readings are below ${z}_{\mathrm{thr}}$ and subsequently forced to be zero, it is denoted as a non-detection event, which may occur when the gas concentration exists but is not picked up by the sensor due to complex nature of gas dispersion, or the concentration is indistinguishable from background noise readings. We denote the probability of successfully detecting the gas be denoted by ${P}_{d}$, thus, the non-detection probability is given by $1-{P}_{d}$.

Following the plume model and the sensor model, we have the vector of measurements sampled by $i$th robot until time ${t}_{k}$ denoted as ${\mathit{Z}}_{1:k}^{\left(i\right)}={[{z}_{1}^{\left(i\right)}\dots \phantom{\rule{1em}{0ex}}\phantom{\rule{1em}{0ex}}{z}_{k}^{\left(i\right)}]}^{T}$, where the sensor measurement obtained by $i$th robot at a given sampling instant ${t}_{k}$ can be expressed as (4)${z}_{k}^{\left(i\right)}\left({\mathit{p}}_{k}^{\left(i\right)}\right)=\left\{\begin{array}{cc}C({\mathit{p}}_{k}^{\left(i\right)},\mathit{s})+{\nu}_{k}\phantom{\rule{1em}{0ex}}\hfill & \text{if detection event}\hfill \\ 0\phantom{\rule{1em}{0ex}}\hfill & \text{if non-detection event}\hfill \end{array}\right.\phantom{\rule{0.16667em}{0ex}}.$

Similar models have been shown to perform efficiently in experimental studies like [15], [49].

类似的模型在实验研究中被证明可以有效地执行，如[15]，[49]。

### 2.3. Communication model 2.3. 通信模型

In this paper, we assume that each robot can communicate with another robot if their mutual distance is below ${R}_{c,max}$. Furthermore, a communication graph topology at some time $t$ denoted by ${\mathsf{G}}_{t}=\{\mathcal{E},\mathcal{W}\}$, where $\{\mathcal{E},\mathcal{W}\}$ are the edges and vertices of the graph, is defined for the group of robots deployed for the entire mission. The edge set $\mathcal{E}=\left\{{\mathcal{E}}_{ij}\right\}$ contains an edge for every robot pair $(i,j)$ for which their mutual distance satisfies $d({\mathit{p}}_{t}^{\left(i\right)},{\mathit{p}}_{t}^{\left(j\right)})\le {R}_{c,max}$. The set ${\mathsf{N}}_{c,t}^{\left(i\right)}\subset \mathcal{W}$ denotes the set containing itself and all *network neighbours* of the $i$th robot at time $t$. The network neighbours at any given time, for $i$th robot, are all robots $j$ for which there exists an edge ${\mathcal{E}}_{ij}\in \mathcal{E}$.

### 2.4. Problem statement 2.4. 问题陈述

In this paper, the problem of localisation of an unknown release using multiple robot platforms with network connectivity constraints is addressed. The unknown source term to be estimates is denoted as a parameter vector ${\Theta}_{o}={\left[{\mathit{s}}^{T}\phantom{\rule{1em}{0ex}}\phantom{\rule{1em}{0ex}}\phantom{\rule{1em}{0ex}}{Q}_{o}\right]}^{T}$, assuming other source parameters are known a priori. The group of robots is required to achieve the aforementioned mission in a distributed manner by efficiently choosing locations to sample in $\mathcal{D}$ and collaborating with their network neighbour as per the predefined communication topology. Let the estimated source term as per $i$th robot at time ${t}_{k}$ be denoted as ${\stackrel{\u02c6}{\Theta}}_{k}^{\left(i\right)}$. Hence, the mission objectives at time ${t}_{k}$ can be generally categorised as follows:

- (a) （一）
Using the measurements ${\left\{{\mathit{Z}}_{1:k}^{\left(i\right)}\right\}}_{i=1}^{n}$, obtained until ${t}_{k}$, first arrive on a common belief about the estimated source term between all the robots, that is, $p\left({\Theta}_{k}^{\left(i\right)}\right)\approx p\left({\Theta}_{k}^{\left(j\right)}\right)\phantom{\rule{1em}{0ex}}\phantom{\rule{1em}{0ex}}\forall \phantom{\rule{1em}{0ex}}\phantom{\rule{1em}{0ex}}i\ne j$, and

- (b) （二）
Next, using the common belief at time ${t}_{k}$, collectively find the next sampling locations of the robots for the time instant ${t}_{k+1}$, such that consequent posterior belief about the source term, after incorporating the new measurements ${z}_{k+1}^{\left(i\right)}\phantom{\rule{1em}{0ex}}\forall \phantom{\rule{1em}{0ex}}i$, is improved while ensuring the communication topology is preserved.

We make the following mild assumptions, which are commonly used in the literature, before proposing a methodology for achieving the aforementioned mission objectives:

在提出实现上述任务目标的方法之前，我们做了以下文献中常用的温和假设：

- (i) （一）
The robots need to stop at each sampling location to gather their local concentration measurements, which is common in atmospheric sensing.

机器人需要在每个采样地点停下来收集其局部浓度测量值，这在大气传感中很常见。 - (ii) （二）
There exists a single static source of chemical or biological release source in the area of interest $\mathcal{D}$.

- (iii) （三）
The sensor measurements at any given instant across robots are conditionally independent from each other.

机器人在任何给定时刻的传感器测量值在条件上都是相互独立的。 - (iv) （四）
The state (position, velocity, etc.) of the $i$th robot at time step ${t}_{k}$ is known, that is, each robot is equipped with localisation capabilities with respect to the global reference frame.

- (v) （五）
Sensor observations and position of $i$th robot are only communicated to its network neighbours for data fusion and belief update.

- (vi) （六）
Furthermore, it is assumed that each robot in the network performs a belief update at the same time, that is, in synchronisation with each other.

此外，假设网络中的每个机器人同时执行信念更新，即彼此同步。

## 3. Distributed sequential estimation

3. 分布式顺序估计

In this section, we discuss the distributed sequential Bayesian estimation algorithm to achieve the consensus of beliefs among all the nodes of a network structure. We employ a modified version of the posterior consensus-based DPF proposed in [30] to fuse the locally sampled measurements with the beliefs across neighbouring robots in the multi-robot communication network.

在本节中，我们将讨论分布式顺序贝叶斯估计算法，以实现网络结构所有节点之间的信念共识。我们采用了[30]中提出的基于后验共识的DPF的修改版本，将局部采样的测量结果与多机器人通信网络中相邻机器人的信念融合在一起。

To improve the performance shortcomings associated with consensus-based DPF as discussed in Section 1, we modified the consensus DPF proposed in [30] with the *adapt* step of ATC distributed diffusion filters [29]. The aforementioned modification ensures quicker local diffusion information across the network, and thus requiring less number of consensus iterations.

为了改善第 1 节中讨论的与基于共识的 DPF 相关的性能缺陷，我们修改了 [30] 中提出的共识 DPF，并采用了 ATC 分布式扩散滤波器的适应步骤 [29]。上述修改确保了整个网络中更快的局部扩散信息，因此需要更少的共识迭代次数。

Let the posterior belief about the unknown source term at ${t}_{k}$ time instant in possession of the $i$th robot be denoted by $p\left({\Theta}_{k}^{\left(i\right)}\right|{\mathcal{I}}_{k})$ where ${I}_{k}$ denotes all the information available to the $i$th robot till time ${t}_{k}$ (which includes all local measurements, network neighbour measurements, and information received through consensus iterations).

Recall from Section 2.2 about the sensor and measurement model, the likelihood function for the $i$th robot based on the sensor and measurement model in (4) can be expressed as (5)$p\left({z}_{k}^{\left(i\right)}\right|{\Theta}_{k}^{\left(i\right)})=\left\{\begin{array}{cc}\mathcal{N}\left({z}_{k}^{\left(i\right)}-C({\mathit{p}}_{k}^{\left(i\right)},\mathit{s}),{\sigma}_{\nu}^{2}\right)\phantom{\rule{1em}{0ex}}\hfill & \text{if}{z}_{k}^{\left(i\right)}\ge {z}_{\mathrm{thr}}\hfill \\ (1-{P}_{d})+\frac{{P}_{d}}{2}\left[1+\mathsf{erf}\left(\frac{{z}_{\mathrm{thr}}}{\sqrt{2}{\sigma}_{\stackrel{\u0304}{\nu}}}\right)\right]\phantom{\rule{1em}{0ex}}\hfill & \text{if}{z}_{k}^{\left(i\right)}=0\hfill \end{array}\right.$

Using the aforementioned notations, in the subsequent subsections, we will discuss the design and structure of the proposed DPF in detail.

使用上述符号，在随后的小节中，我们将详细讨论拟议的 DPF 的设计和结构。

### 3.1. Local posterior generation

3.1. 局部后代

In this subsection, we discuss the calculation of a posterior belief using locally available measurements for each robot. Following the Monte Carlo based representation of probability distribution function (PDF), the local belief about the source term for the $i$th robot, that is, $p\left({\Theta}_{k}^{\left(i\right)}\right|{\mathcal{I}}_{k})$, can be represented as (6)$p\left({\Theta}_{k}^{\left(i\right)}\right|{\mathcal{I}}_{k})\approx \sum _{{n}_{p}=1}^{{N}_{p}}{w}_{k,{n}_{p}}^{\left(i\right)}\delta ({\Theta}_{k}^{\left(i\right)}-{\Theta}_{k,{n}_{p}}^{\left(i\right)})\phantom{\rule{0.16667em}{0ex}},$where $\delta \left(\cdot \right)$ denotes the Dirac-delta function, ${\{{w}_{k,{n}_{p}}^{\left(i\right)},{\Theta}_{k,{n}_{p}}^{\left(i\right)}\}}_{{n}_{p}=1}^{{N}_{p}}$ denotes the set of ${N}_{p}$ weighted random samples generated using importance sampling to approximate $p\left({\Theta}_{k}^{\left(i\right)}\right|{\mathcal{I}}_{k})$ and thus ${\sum}_{{n}_{p}=1}^{{N}_{p}}{w}_{k,{n}_{p}}^{\left(i\right)}=1$.

For time ${t}_{k+1}$ and the $i$th robot, the locally available sensor measurements are ${\mathit{Z}}_{k+1}^{\left(i\right)}=\left\{{z}_{k+1}^{\left(j\right)}\right\}\phantom{\rule{1em}{0ex}}\forall \phantom{\rule{1em}{0ex}}j\in {\mathsf{N}}_{c,t}^{\left(i\right)}$, including the robot’s own measurement and those from all neighbouring robots (see Fig. 1). Using Bayes’ rule and noting that the source term is static, i.e., ${\Theta}_{k+1}^{\left(i\right)}={\Theta}_{k}^{\left(i\right)}$, the local posterior distribution for the $i$th robot at time ${t}_{k+1}$ can be calculated as, (7)$p\left({\Theta}_{k+1}^{\left(i\right)}\right|{\mathit{Z}}_{k+1}^{\left(i\right)},{\mathcal{I}}_{k})\propto p\left({\mathit{Z}}_{k+1}^{\left(i\right)}\right|{\Theta}_{k+1}^{\left(i\right)})\cdot p\left({\Theta}_{k+1}^{\left(i\right)}\right|{\mathcal{I}}_{k})\phantom{\rule{0.16667em}{0ex}},$where the likelihood $p\left({\mathit{Z}}_{k+1}^{\left(i\right)}\right|{\Theta}_{k+1}^{\left(i\right)})$ can be expressed, using (5) and Assumption (iii), as (8)$p({\mathit{Z}}_{k+1}^{\left(i\right)}\left|{\Theta}_{k+1}^{\left(i\right)}\right)=\prod _{j\in {\mathsf{N}}_{c,t}^{\left(i\right)}}p\left({z}_{k+1}^{\left(j\right)}\right|{\Theta}_{k+1}^{\left(i\right)})\phantom{\rule{0.16667em}{0ex}}.$Under the Monte Carlo approximation, we can express the unnormalised weights of the posterior PDF in the left-hand side (LHS) of (7) as (9)${\stackrel{\u02c6}{w}}_{k+1,{n}_{p}}^{\left(i\right)}={w}_{k,{n}_{p}}^{\left(i\right)}\cdot \prod _{j\in {\mathsf{N}}_{c,t}^{\left(i\right)}}p\left({z}_{k+1}^{\left(j\right)}\right|{\Theta}_{k+1,{n}_{p}}^{\left(i\right)})\phantom{\rule{0.16667em}{0ex}},$which needs to be normalised as ${w}_{k+1,{n}_{p}}^{\left(i\right)}=\frac{{\stackrel{\u02c6}{w}}_{k+1,{n}_{p}}^{\left(i\right)}}{\sum _{{n}_{p}=1}^{{N}_{p}}{\stackrel{\u02c6}{w}}_{k+1,{n}_{p}}^{\left(i\right)}}.$Note that the posterior probability distribution $p\left({\Theta}_{k+1}^{\left(i\right)}\right|{\mathit{Z}}_{k+1}^{\left(i\right)},{\mathcal{I}}_{k})$ computed through (7)–(9) for $i$th robot is only dependent on the concentration measurements obtained by itself and its network neighbours. Compared to the DPF presented in [30] using just the locally sampled measurements of the $i$th robot, the proposed procedure of constructing the local posterior PDF ensures a quicker information diffusion across the network by using extra measurements from network neighbours.

On constructing the local posterior PDF, the next step is to achieve a common belief of the source term across all robots in the network as it allows for a robust and simpler path planning strategy later.

在构建本地后验 PDF 时，下一步是在网络中的所有机器人中实现对源项的共同信念，因为它允许以后制定稳健且更简单的路径规划策略。

### 3.2. Kullback–Leibler average based consensuses posterior generation

3.2. 基于Kullback-Leibler平均的共识后生成

In order to achieve consensus of the posterior distribution in a distributed manner, we use the Kullback–Leibler average (KLA) based consensus technique. The mathematical results associated with the KLA relevant to the DPF algorithm are summarised in Lemma 1.

为了以分布式方式实现后验分布的共识，我们使用了基于Kullback-Leibler平均值（KLA）的共识技术。引理 1 总结了与 DPF 算法相关的 KLA 相关的数学结果。

**Lemma 1 引理 1**

## [50]

*For M local posterior probability distributions* ${m}_{i}\left(\cdot \right)$ *and relative weights* ${\eta}_{i}$ *such that* ${\eta}_{i}>0$ *and* ${\sum}_{1}^{M}{\eta}_{i}=1$*, the weighted KLA, defined as* (10)$\stackrel{\u0304}{m}\left(\cdot \right)=arg\underset{m}{inf}\left[\sum _{1}^{M}{\eta}_{i}\int m\left(x\right)log\frac{m\left(x\right)}{{m}_{i}\left(x\right)}dx\right]\phantom{\rule{0.16667em}{0ex}},$*is given by* (11)$\stackrel{\u0304}{m}\left(\cdot \right)=\left[\prod _{i=1}^{M}{\left[{m}_{i}\left(\cdot \right)\right]}^{{\eta}_{i}}\right]{\left[\int \prod _{i=1}^{M}{\left[{m}_{i}\left(x\right)\right]}^{{\eta}_{i}}dx\right]}^{-1}\phantom{\rule{0.16667em}{0ex}}.$

The result of Lemma 1 presents an informative way averaging multiple posterior distributions which is leveraged to fuse locally available posterior beliefs to update the posterior belief for the $i$th robot. This informative averaging of posterior beliefs can be iteratively performed over multiple consensus steps to ensure that the each robot in the network has an approximately same posterior distribution of the source term. Let ${m}_{k,l}^{\left(i\right)}\left(\cdot \right)$ denote the PDF of the source term computed by the $i$th robot at the $l$th consensus step corresponding to time ${t}_{k}$ and ${\eta}_{ij}>0$ with ${\sum}_{j\in {\mathsf{N}}_{c,t}^{\left(i\right)}}{\eta}_{ij}=1$. The posterior PDF belief computed by the $i$th robot at the $l+1$ consensus step is can be written as (12)${m}_{k,l+1}^{\left(i\right)}=\frac{\prod _{j\in {\mathsf{N}}_{c,t}^{\left(i\right)}}{\left[{m}_{k,l}^{\left(j\right)}\right]}^{{\eta}_{ij}}}{\int \prod _{j\in {\mathsf{N}}_{c,t}^{\left(i\right)}}{\left[{m}_{k,l}^{\left(j\right)}\left(\Theta \right)\right]}^{{\eta}_{ij}}d\Theta}\phantom{\rule{0.16667em}{0ex}}.$

Using (12), each node can iteratively update their local belief using the informative averaging of local posterior beliefs from each neighbour over an iteration. At the beginning of the consensus procedure, we initiate ${m}_{k,0}^{\left(i\right)}=p\left({\Theta}_{k}^{\left(i\right)}\right|{\mathit{Z}}_{k}^{\left(i\right)},{\mathcal{I}}_{k-1})$ and perform the local updates as per (12). However, the process of fusing the local beliefs as per (12) is not straightforward as the posterior beliefs at any given time step ${t}_{k}$ are expressed using the Monte Carlo representation. This issue was resolved in [30] by using Gaussian mixture (GM) model to compactly represent and transmit the Monte Carlo representation of ${m}_{k,l}^{\left(i\right)}$ at each consensus step. The GM model was constructed using regularised particles (see [51]) to ensure equal particle weights. A new importance sampling procedure is also included to improve the support from all locally available GM models at each robot during the fusion process.

A GM model based representation would only require each robot to share the modelling parameters instead of the particle representation to its network neighbours to communicate its current belief, which is more efficient and thus has been widely used in multi-sensor fusion algorithms [52]. In summary, after sampling the environment, each robot performs two actions:

基于GM模型的表示只需要每个机器人与其网络邻居共享建模参数而不是粒子表示，以传达其当前的信念，这更有效，因此已广泛用于多传感器融合算法[52]。总之，在对环境进行采样后，每个机器人执行两个操作：

- (a) （一）
first shares its local measurement with its network neighbours and constructs a local posterior belief using available measurements, and

首先与其网络邻居共享其本地测量值，并使用可用的测量值构建本地后验信念，以及 - (b) （二）
second, performs a consensus procedure to informatively fuse local posterior beliefs until consensus over the posterior beliefs of the source term is achieved throughout the network.

其次，执行共识程序以信息方式融合局部后验信念，直到在整个网络中就源项的后验信念达成共识。

The KLA based posterior consensus procedure [30] performed by each robot after sampling the region is summarised in Algorithm 1, along with the data transmission process shown in Fig. 1.

算法 1 总结了每个机器人在对区域进行采样后执行的基于 KLA 的后验共识程序 [30]，以及图 1 所示的数据传输过程。

**Remark 1 备注 1**

It should be noted that the performance of the proposed consensus particle filter would depend on the communication topology ${\mathsf{G}}_{t}$ and the fusion weights ${\eta}_{ij}$. The convergence of distributed belief consensus algorithm, for a given connected graph topology under some common weight choices, is established in an asymptotic sense as shown in [31], [53].

In practice, a limited number of consensus steps can be implemented to ensure belief convergence within acceptable tolerances. The *width* of the communication network graph ${\mathsf{G}}_{t}$ (longest path length between two nodes) can be used as a heuristic upper limit for the required number of consensus step. For example, the number of consensus step in [30] is one more than the graph width. In this work, as we also utilise the measurements obtained by network neighbours to construct the local posterior belief before consensus iterations, the convergence rate of the belief can be improved. This is primarily because a relatively informed prior is used to initialise the consensus procedure, leading to quicker information diffusion across the network.

**Remark 2 备注2**

Recall that the proposed consensus filter assumes that each robot performs the belief update in synchronisation with each other. In order to practically implement the proposed DPF, each robot could share the iteration labels and current belief update status, while exchanging measurements and Gaussian mixture parameters.

回想一下，建议的共识过滤器假设每个机器人彼此同步执行信念更新。为了实际实现所提出的DPF，每个机器人都可以共享迭代标签和当前信念更新状态，同时交换测量值和高斯混合参数。

In the next section, we will develop the informative, distributed, and coordinated path planning strategy employed between robots to improve the efficiency of choosing their next sampling locations.

在下一节中，我们将开发机器人之间采用的信息化、分布式和协调路径规划策略，以提高选择下一个采样位置的效率。

## 4. Distributed path planning algorithm

4. 分布式路径规划算法

Establishing a common belief about the source term via the consensus process allows the robots to collectively plan their paths leading to the next sampling locations. In this section, we introduce the inter-robot coordination strategy that will spatially distribute the robots to collective more informative samples. This strategy mimics an exploratory and exploitative behaviour for information gathering, which helps the DPF to quickly and efficiently converge onto a source term estimate.

通过共识过程建立关于源项的共同信念，使机器人能够共同规划通往下一个采样位置的路径。在本节中，我们将介绍机器人之间的协调策略，该策略将在空间上将机器人分布到集合中，以收集信息量更大的样本。该策略模仿了信息收集的探索性和利用性行为，这有助于 DPF 快速有效地收敛到源项估计值。

### 4.1. Coordination protocol between robots

4.1. 机器人之间的协调协议

We design the coordination protocol using a distributed optimisation framework with coverage control-like objective functions corresponding to the sensing performance of each robot. Constructing such objective functions requires all the robots to get access to the same underlying PDF of the variable of interest, which in traditional coverage control is static and known a priori. In this work, we leverage the consensus based DPF to provide the common belief of the source term to construct local objective functions for each robot.

我们使用分布式优化框架设计协调协议，该框架具有与每个机器人的传感性能相对应的类似覆盖控制的目标函数。构建这样的目标函数需要所有机器人访问感兴趣变量的相同底层 PDF，这在传统的覆盖控制中是静态的，并且是先验已知的。在这项工作中，我们利用基于共识的DPF来提供源项的共同信念，为每个机器人构建局部目标函数。

Let us first define the *locally induced Voronoi cell* (LIVC) corresponding to the $i$th robot, denoted by ${\mathcal{V}}_{i}$, based on its network neighbours at any time instant $t$. Mathematically expressed as (13)${\mathcal{V}}_{i}=\left\{\mathit{q}\in \mathcal{D}\phantom{\rule{1em}{0ex}}\left|\phantom{\rule{1em}{0ex}}\Vert {\mathit{p}}_{t}^{\left(i\right)}-\mathit{q}\Vert \le \Vert {\mathit{p}}_{t}^{\left(j\right)}-\mathit{q}\Vert \phantom{\rule{1em}{0ex}}\phantom{\rule{1em}{0ex}}\forall \phantom{\rule{1em}{0ex}}j\ne {\mathsf{N}}_{c,t}^{\left(i\right)}\setminus i\right.\right\}\phantom{\rule{0.16667em}{0ex}}.$

Note that the construction of the LIVC ${\mathcal{V}}_{i}$ can be carried out in a distributed manner with communication requirements of position information of neighbours in ${\mathsf{N}}_{c,t}^{\left(i\right)}$. The LIVC for an example case for two different graph structure are presented in Fig. 2. It can be seen from Fig. 2(a) that the robot 5 is only within communication range of robot 4. Thus, the LIVC corresponding to robot 5 has an overlap with Voronoi cells of robot 2 and 3. Furthermore, it can be guaranteed that with a complete graph topology of the communication network between the robots, as shown in Fig. 2(b), the LIVCs are identical to the exact Voronoi tessellation of $\mathcal{D}$. Therefore, the design of the LIVCs, as per (13), ensure that a *region-of-dominance* can be constructed for each robot where each point in that region is closest to it. Thus, utilising such an region of dominance for finding a locally optimal control action to find the next sampling instant is desirable.

Consider a local objective function, ${\mathcal{H}}_{i}$, to be minimised with respect to the position of $i$th robot for $t\in [{t}_{k},{t}_{k+1}]$ given by (14)${\mathcal{H}}_{i}\left({\mathit{p}}_{t}^{\left(i\right)}\right)={\int}_{{\mathcal{V}}_{i}}{\Vert {\mathit{p}}_{t}^{\left(i\right)}-\mathit{q}\Vert}^{2}{\varphi}_{k}\left(\mathit{q}\right)d\mathit{q}\phantom{\rule{0.16667em}{0ex}}.$The construction of ${\mathcal{H}}_{i}$ indicates that ${\mathcal{H}}_{i}$ is minimised when the robot location ${\mathit{p}}_{t}^{\left(i\right)}$ at some time $t$ at the centroid of the LIVC corresponding to ${\varphi}_{k}\left(\mathit{q}\right)$. Furthermore, recall that ${\varphi}_{k}$ refers to the PDF of the spatial distribution associated with the source term belief at the ${t}_{k}^{th}$ time instant. Thus, minimising ${\mathcal{H}}_{i}$ effectively mimics an exploitative behaviour towards high probability region within each robot’s respective LIVC, ${\mathcal{V}}_{i}$, as shown in Fig. 3.

It is worthy to note that the objective function (14) is constructed using only local positional information as per the communication topology and the local density function ${\varphi}_{k}$. The structure of ${\mathcal{H}}_{i}$ is similar to the coverage control objective function as described in [54], but in this paper only the local objective function was used instead of a global positional optimisation index of [54] to support fully distributed implementation. Moreover, the density function is usually known, however, for the application at hand the density function ${\varphi}_{k}$ is constructed based on the post-consensus belief about the source term calculated at the previous sampling instant.

A control velocity vector along the gradient descent direction is derived and shown to minimises the local optimisation index (14) in Theorem 1.

推导了沿梯度下降方向的控制速度向量，并表明该定理 1 中的局部优化指数 （14） 最小化。

**Theorem 1 定理 1**

*For* ${k}_{g}>0$ *and robotic agents governed by* (1)*, the control action given by* (15)${\mathit{u}}_{t,cov}^{\left(i\right)}=-2{k}_{g}{\int}_{{\mathcal{V}}_{i}}({\mathit{p}}_{t}^{\left(i\right)}-\mathit{q}){\varphi}_{k}\left(\mathit{q}\right)d\mathit{q}$$\phantom{\rule{1em}{0ex}}\phantom{\rule{1em}{0ex}}\phantom{\rule{1em}{0ex}}-{k}_{g}\sum _{j\in {\mathsf{N}}_{c,t}^{\left(i\right)}\setminus i}\left[{\int}_{\partial {\mathcal{V}}_{ij}}\left(\frac{{\Vert {\mathit{p}}_{t}^{\left(i\right)}-\mathit{q}\Vert}^{2}}{2\Vert {\mathit{p}}_{t}^{\left(j\right)}-{\mathit{p}}_{t}^{\left(i\right)}\Vert}\right)\left(\mathit{q}-{\mathit{p}}_{t}^{\left(i\right)}\right){\varphi}_{k}\left(\mathit{q}\right)d\mathit{q}\right]\phantom{\rule{0.16667em}{0ex}}.$*drives the agents towards the local minima of* (14)*.*

**Proof 证明**

Refer to Appendix A for the proof. □

有关证明，请参阅附录 A□

To implement ${\mathit{u}}_{t,cov}^{\left(i\right)}$, the $i$th robot requires knowledge of the density function. Using the posterior belief $p\left({\Theta}_{k}^{\left(i\right)}\right|{\mathcal{I}}_{k})$ computed after environment sampling at time ${t}_{k}$, we define the density function ${\varphi}_{k}^{\left(i\right)}$ as the probability distribution function at some point $\mathit{q}\in \mathcal{D}$ using the mixture model constructed at the end of the consensus procedure, respectively, as (16)${\varphi}_{k}^{\left(i\right)}\left(\mathit{q}\right)={\mathcal{G}}_{k}^{(i,L)}\left(\mathit{q},{\stackrel{\u0304}{\Theta}}_{k}^{\left(i\right)}\left(Q\right)\right)\phantom{\rule{0.16667em}{0ex}}.$where ${\mathcal{G}}_{k}^{(i,L)}$ is the Gaussian mixture model and ${\stackrel{\u0304}{\Theta}}_{k}^{\left(i\right)}\left(Q\right)$ is the mean estimate of the release rate at the ${t}_{k}$ instant as computed by the $i$th robot after completion of consensus based belief update. As the updated belief of the source term is computed using a consensus protocol, each robot computes the density function for $\mathit{q}\in \mathcal{D}$ resulting in ${\varphi}_{k}={\varphi}_{k}^{\left(1\right)}\approx {\varphi}_{k}^{\left(2\right)}\approx \cdots \approx {\varphi}_{k}^{\left(n\right)}$.

**Remark 3 备注 3**

It should be noted that as $p\left({\Theta}_{k}^{\left(i\right)}\right|{\mathcal{I}}_{k})$ is the post-consensus belief across the network, the resulting ${\varphi}_{k}^{\left(i\right)}$ is similar for all robots. Therefore, the control direction ${\mathit{u}}_{t,cov}^{\left(i\right)}$ computed as per (15) minimises their local objective function using a similar global belief. Furthermore, the LIVC construction can be preformed in a completely distributed fashion as it only requires spatial information of $j\in {\mathsf{N}}_{c,t}^{\left(i\right)}$. Therefore, the control action ${\mathit{u}}_{t,cov,}^{\left(i\right)}$ can be computed in a distributed fashion.

We have so far discussed the design of control direction that ensures a coordination based control action to achieve spatial diversity of the robots while enforcing exploitative behaviour towards local predicted high probability regions based on the belief of the source term at the previous sampling instant. Following this, we now introduce the complementary control action to ensure a set of proximity constraints are satisfied to achieve a connected communication network topology.

到目前为止，我们已经讨论了控制方向的设计，该方向确保基于协调的控制动作，以实现机器人的空间多样性，同时根据对上一个采样时刻的源项的信念，对局部预测的高概率区域实施剥削行为。在此之后，我们现在引入互补控制操作，以确保满足一组邻近约束，以实现连接的通信网络拓扑。

### 4.2. Proximity control action for network constraints

4.2. 网络约束的接近控制操作

As discussed in the Introduction section, estimation of the algebraic connectivity can result in computational and network burdens for each robot resulting in time expensive control action which is undesirable. Therefore, in this paper, we utilise the proximity function based control action to ensure network connectivity between robots [45], [46] with minimal communication requirements during transit between two sampling locations.

正如简介部分所讨论的，代数连通性的估计可能会导致每个机器人的计算和网络负担，从而导致耗时的控制操作，这是不可取的。因此，在本文中，我们利用基于邻近函数的控制动作来确保机器人之间的网络连接[45]，[46]在两个采样地点之间的传输过程中具有最小的通信要求。

Let ${v}_{p}^{ij}({\mathit{p}}_{t}^{\left(i\right)},{\mathit{p}}_{t}^{\left(j\right)})$ denote the proximity function between the $i$th and $j$th robots and mathematically expressed by (17)${v}_{p}^{ij}({\mathit{p}}_{t}^{\left(i\right)},{\mathit{p}}_{t}^{\left(j\right)})={\left[max\left\{0,\frac{\underset{c,min}{\overset{2}{R}}-{d}_{i,j}^{2}}{\underset{i,j}{\overset{2}{d}}-{R}_{c,max}^{2}}\right\}\right]}^{2}\phantom{\rule{0.16667em}{0ex}},$where ${d}_{i,j}=\Vert {\mathit{p}}_{t}^{\left(i\right)}-{\mathit{p}}_{t}^{\left(j\right)}\Vert $ and ${R}_{c,min}$ denotes the threshold distance at which the proximity control action is activated. Note that the ${\mathit{p}}_{t}^{\left(i\right)}$ and ${\mathit{p}}_{t}^{\left(j\right)}$ refers to the positions of $i$th and $j$th robot at some time $t$ between two sampling instants, say $k$ and $k+1$.

The gradient descent direction of the proximity function ${v}_{p}^{ij}$ can be expressed on differentiating w.r.t. ${\mathit{p}}_{t}^{\left(i\right)}$ as (18)$\frac{d{v}_{ij}^{p}}{d{\mathit{p}}_{t}^{\left(i\right)}}=\left\{\begin{array}{cc}0\phantom{\rule{1em}{0ex}}\hfill & \text{if}{d}_{i,j}\le {R}_{c,min}\hfill \\ 4\frac{\left({R}_{c,max}^{2}-{R}_{c,min}^{2}\right)\left({R}_{c,min}^{2}-{d}_{i,j}^{2}\right)}{{\left({d}_{i,j}^{2}-{\mathit{R}}_{max}^{2}\right)}^{3}}\left[{\mathit{p}}_{t}^{\left(i\right)}-{\mathit{p}}_{t}^{\left(j\right)}\right]\phantom{\rule{1em}{0ex}}\hfill & \text{if}{R}_{c,min}\le {d}_{i,j}\le {R}_{c,max}\hfill \\ \text{not defined}\phantom{\rule{1em}{0ex}}\hfill & \text{if}{d}_{i,j}={R}_{c,max}\hfill \\ 0\phantom{\rule{1em}{0ex}}\hfill & \text{if}{d}_{i,j}\ge {R}_{c,max}\hfill \end{array}\right.$Furthermore, it can be noted from the construction of (17) that ${v}_{ij}^{p}={v}_{ji}^{p}$ and (19)$\frac{\partial {v}_{p}^{ij}}{\partial {\mathit{p}}_{t}^{\left(i\right)}}=-\frac{\partial {v}_{p}^{ij}}{\partial {\mathit{p}}_{t}^{\left(j\right)}}=\frac{\partial {v}_{p}^{ji}}{\partial {\mathit{p}}_{t}^{\left(i\right)}}=-\frac{\partial {v}_{p}^{ji}}{\partial {\mathit{p}}_{t}^{\left(j\right)}}.$Note that network connectivity across the robot network can be ensured if a spanning subgraph can be preserved throughout the mission. Therefore, consider a spanning subgraph ${\mathsf{G}}^{\star}=\{{\mathcal{E}}^{\star},\mathcal{W}\}$ with the robots are its vertices be chosen before the start of the mission based on the robot starting locations. Additionally, ${\mathcal{E}}^{\star}=\left\{{\mathcal{E}}_{ij}^{\star}\right\}$ is set of all edges in ${\mathsf{G}}^{\star}$ with ${\mathcal{E}}_{ij}^{\star}$ denotes the edge between the $i$th and $j$th robots. Hence, for a ${\mathsf{G}}^{\star}$ of a multiagent system, the proximity objective function for each $i$th robot can be expressed as (20)${v}_{p}^{\left(i\right)}=\sum _{j\in {\mathcal{E}}_{ij}^{\star}}{v}_{p}^{ij}({\mathit{p}}_{t}^{\left(i\right)},{\mathit{p}}_{t}^{\left(j\right)})\phantom{\rule{0.16667em}{0ex}}.$It can be noted from the construction of ${v}_{p}^{ij}$, (17), and ${v}_{p}^{\left(i\right)}$, (20), that minimising ${v}_{p}^{\left(i\right)}$ for each node ensures that the $i$th node always remains within a radius of ${R}_{c,max}$ relative to its network neighbours. Therefore, we choose design the proximity control action as the gradient descent direction that minimises ${v}_{p}^{\left(i\right)}$, mathematically expressed, as (21)${\mathit{u}}_{t,px}^{\left(i\right)}=-{k}_{p}\sum _{j\in {\mathcal{E}}_{ij}^{\star}}\frac{\partial {v}_{p}^{ij}}{d{\mathit{p}}_{t}^{\left(i\right)}}\phantom{\rule{0.16667em}{0ex}},$where ${k}_{p}>0$ is the proximity control gain. It is interesting to observe that the proximity action is not active if the relative distance between a robot an its network neighbour is below ${R}_{c,min}$. In order to implement a predefined spanning subgraph based proximity action in a distributed manner, each robot would have to only store a limited list of its spanning subgraph neighbours, which has relatively low memory requirements even for large networks. Furthermore, the construction of the spanning subgraph is performed before the mission begins. Thus, under no failure assumptions, scaling this design for the purpose of proximity control would require minimal effort. We will demonstrate the same through a simulation study.

### 4.3. Convergence analysis of the overall control action

4.3. 整体控制动作的收敛分析

In this subsection, we discuss the convergence of the robot trajectories subject to the overall control action ${\mathit{u}}_{t}^{\left(i\right)}\phantom{\rule{1em}{0ex}}\phantom{\rule{1em}{0ex}}\phantom{\rule{1em}{0ex}}\forall \phantom{\rule{1em}{0ex}}\phantom{\rule{1em}{0ex}}t\in [{t}_{k-1},{t}_{k}]$, given by (22)${\mathit{u}}_{t}^{\left(i\right)}=\left\{\begin{array}{cc}{\mathit{u}}_{t,cov}^{\left(i\right)}+{\mathit{u}}_{t,px}^{\left(i\right)}\phantom{\rule{1em}{0ex}}\hfill & \text{if}\Vert {\mathit{u}}_{t,cov}^{\left(i\right)}+{\mathit{u}}_{t,px}^{\left(i\right)}\Vert \le \alpha \hfill \\ \frac{\alpha \left({\mathit{u}}_{t,cov}^{\left(i\right)}+{\mathit{u}}_{t,px}^{\left(i\right)}\right)}{\Vert {\mathit{u}}_{t,cov}^{\left(i\right)}+{\mathit{u}}_{t,px}^{\left(i\right)}\Vert}\phantom{\rule{1em}{0ex}}\hfill & \text{if}\Vert {\mathit{u}}_{t,cov}^{\left(i\right)}+{\mathit{u}}_{t,px}^{\left(i\right)}\Vert >\alpha \hfill \end{array}\right.\phantom{\rule{0.16667em}{0ex}},$where