Elsevier

Information Fusion 信息融合

Volume 111, November 2024, 102503
卷 111, November 2024, 102503
Information Fusion

Full length article 全文
Distributed multi-robot source term estimation with coverage control and information theoretic based coordination
基于覆盖控制和信息理论协调的分布式多机器人源项估计

EI检索SCI升级版 计算机科学1区SCI基础版 工程技术1区IF 14.7 如果 14.7SWJTU A++ 西南交通大学A++SWUFE A 西南 A
https://doi.org/10.1016/j.inffus.2024.102503 Get rights and content 获取权限和内容
Under a Creative Commons license
根据知识共享许可
open access 开放获取

Highlights 突出

  • Fully distributed multi-robot coordination to search an unknown release source.
    完全分布式多机器人协调,以搜索未知的释放源。

  • Consensus-based belief update rule to ensure a unified belief across the robots.
    基于共识的信念更新规则,以确保机器人之间的统一信念。

  • Spatial diversity and collision avoidance embedded in the coordination strategy.
    协调策略中嵌入的空间多样性和避免碰撞。

  • Control action designed to maintain network connectivity and minimise communication.
    旨在保持网络连接并最大限度地减少通信的控制操作。

Abstract 抽象

In this paper, we introduce a novel coordination strategy for a group of autonomous robots tasked with estimating the source term of an airborne chemical release. This strategy integrates distributed Bayesian filtering, coverage control, information-theoretic sampling, and proximity constraint handling, forming an efficient and fully distributed coordination protocol. In the proposed framework, each robot employs a consensus-based belief update rule, allowing it to adaptively incorporate information from neighbouring robots to ensure a unified belief across the network. The overall control action is designed to maximise information gain while maintaining network connectivity and minimising communication requirements during movement between sampling points. Extensive numerical simulations are conducted to analyse the performance of the proposed strategy, which demonstrate significant performance improvements compared to popular filtering practices and advanced path-planning strategies. The simulation study is also designed to substantiate the design choices of the proposed coordination strategy and to emphasise its advantages.
在本文中,我们为一组自主机器人介绍了一种新的协调策略,这些机器人的任务是估计空气中化学物质释放的源项。该策略集成了分布式贝叶斯滤波、覆盖控制、信息论采样和邻近约束处理,形成了高效、完全分布式的协调协议。在拟议的框架中,每个机器人都采用基于共识的信念更新规则,使其能够自适应地整合来自相邻机器人的信息,以确保整个网络的统一信念。整体控制动作旨在最大限度地提高信息增益,同时保持网络连接,并最大限度地减少采样点之间移动期间的通信要求。进行了广泛的数值模拟,以分析所提出的策略的性能,与流行的滤波实践和高级路径规划策略相比,该策略显示出显着的性能改进。仿真研究还旨在证实所提出的协调策略的设计选择,并强调其优势。

Keywords 关键字

Autonomous search
Decentralised multi-sensor fusion
Sequential Monte Carlo simulation
Sensor control

自主搜索分散式多传感器融合顺序蒙特卡罗模拟传感器控制

1. Introduction 1. 引言

In emergency situations involving an unknown release source of airborne chemical, biological, radiological or nuclear (CBRN) particulates, swiftly pinpointing the source and identifying its release parameters is vital for mitigating the commercial, environmental, and societal impacts. Deploying autonomous robots to assess and locate the source of the release has become increasingly favoured, as it reduces human exposure to harmful CBRN environments [1], [2], [3]. A crucial technical component of the autonomous search mission is the path planning functionality, which should ensure that the robot collects the most informative samples possible, and thus leading to more accurate and precise estimation of the release source term.
在涉及空气中化学、生物、放射性或核 (CBRN) 颗粒物释放源的紧急情况下,迅速查明来源并确定其释放参数对于减轻商业、环境和社会影响至关重要。部署自主机器人来评估和定位释放源变得越来越受欢迎,因为它减少了人类暴露于有害的CBRN环境[1],[2],[3]。自主搜索任务的一个关键技术组成部分是路径规划功能,它应确保机器人收集尽可能多的信息样本,从而更准确和精确地估计发布源项。

There are various works in the literature that proposed path planning algorithms for a single robot in order to efficiently gather concentration measurements to support the estimation of source parameters [4], [5], [6], [7], [8], [9], [10], [11], [12], [13], [14], [15]. Some of these algorithms employed a concentration gradient based techniques, often referred to as chemotaxis, that mimicked the behaviour of various organisms in nature searching for food [6], [7]. Alternatively, information-theoretic based techniques [8], [9], [10], [11], [12], [13], [16], [17], initially referred to as infotaxis [4] or cognitive search [10], have been designed to choose the next sampling location so that a feasibly maximum reduction in the uncertainty of the estimated source term can be achieved. The uncertainty reduction of the estimated source parameters can be quantified by using information measures (e.g., entropy, mutual information), following a sequential Bayesian estimation framework that incorporates new measurements to update the belief about the source parameters. Information-theoretic based search algorithms outperformed the chemotaxis-based methods specifically in turbulent environments, where concentration gradient of the CBRN particulates tends to be erroneous and sensor readings intermittent [14]. This is evidenced that most of the recent robotic release source search works with experimental validation (e.g., [15], [16], [17], [18]) follow the information theoretic framework.
文献中有各种工作提出了单个机器人的路径规划算法,以便有效地收集浓度测量值以支持源参数的估计[4]、[5]、[6]、[7]、[8]、[9]、[10]、[11]、[12]、[13]、[14]、[15]。其中一些算法采用了基于浓度梯度的技术,通常称为趋化性,该技术模仿了自然界中各种生物寻找食物的行为[6],[7]。或者,基于信息理论的技术[8]、[9]、[10]、[11]、[12]、[13]、[16]、[17],最初被称为信息交通[4]或认知搜索[10],被设计用于选择下一个采样位置,以便能够最大限度地降低估计源项的不确定性。估计源参数的不确定性降低可以通过使用信息度量(例如,熵、互信息)来量化,遵循顺序贝叶斯估计框架,该框架结合了新的测量值来更新对源参数的信念。特别是在湍流环境中,基于信息理论的搜索算法优于基于趋化性的方法,其中CBRN颗粒的浓度梯度往往是错误的,传感器读数是间歇性的[14]。这证明,最近的大多数机器人发布源搜索都与实验验证(例如,[15]、[16]、[17]、[18])遵循信息理论框架。

The aforementioned search strategies are primarily designed for a single robot. The source search task can be significantly improved by employing multiple robots that work in conjunction with each other. An efficient multi-robot source term estimation (STE) strategy relies on two key aspects, namely — the collective estimation or belief update protocol, and the collaborative path planning solution, both for the networked robots. The design of these two aspects along with the interaction and coupling between them poses a challenge in designing a fully distributed coordination strategy for multi-robot STE tasks. The coupling primarily stems from the design requirement that collective path planning should ensure the connectivity of robot network in order to facilitate successful distributed estimation or belief sharing. Many existing works on multi-robot information gathering (see e.g., [19]) assume a fully connected robot network and the global belief can be directly accessed. While some works like [20], [21] have used hierarchical methods to ensure scalability for efficient decision-making between robots, and [22] proposed an event-triggered cooperative design to reduce overall communication costs in achieving multi-robot cooperation, there is still a limited number of multi-robot coordination frameworks that ensure a distributed and efficient design which is scalable while limiting the necessary inter-robot network communication costs [23]. Furthermore, additional network challenges like the limited communication range along with constraints on the onboard computational power also motivate a robust design that ensures overall network connectivity with minimal computational costs, while achieving the multi-robot STE task at hand.
上述搜索策略主要是为单个机器人设计的。通过使用多个相互协同工作的机器人,可以显着改善源搜索任务。高效的多机器人源项估计(STE)策略依赖于两个关键方面,即集体估计或信念更新协议,以及协作路径规划解决方案,两者都适用于联网机器人。这两个方面的设计以及它们之间的交互和耦合为多机器人STE任务设计完全分布式协调策略带来了挑战。这种耦合主要源于设计要求,即集体路径规划应确保机器人网络的连通性,以促进成功的分布式估计或信念共享。许多关于多机器人信息收集的现有工作(参见例如,[19])假设一个完全连接的机器人网络,并且可以直接访问全球信念。虽然像[20]、[21]这样的一些工作已经使用分层方法来确保机器人之间高效决策的可扩展性,并且[22]提出了一种事件触发的协作设计,以降低实现多机器人协作的总体通信成本,但仍有有限数量的多机器人协调框架,这些框架确保了分布式和高效的设计,该设计具有可扩展性,同时限制了必要的机器人间网络通信成本[23]。此外,通信范围有限以及机载计算能力受到限制等其他网络挑战也促使我们采用稳健的设计,以最低的计算成本确保整体网络连接,同时实现手头的多机器人 STE 任务。

In the next subsection, we will discuss some related works for each of the above mentioned design aspects to highlight important design gaps in the literature.
在下一节中,我们将讨论上述每个设计方面的一些相关工作,以突出文献中重要的设计差距。

1.1. Related works 1.1. 相关作品

The collective estimation of the belief of the source term across a sensor/robot network can be performed either in a centralised or a distributed manner. A distributed estimator design would provide statistical quantities like the global posterior belief or likelihood functions with additional robustness and scalability to a large robot network as compared to a centralised protocol. There have been various Bayesian filters in the literature that can achieve a distributed estimation across a sensor network, as summarised in [24]. A commonly employed technique is measurement-dissemination based protocol aimed at establishing access to all the sampled measurements for each sensing agent, i.e., the robot, in the network through multi-hop constrained flooding techniques (for e.g. [25], [26], [27], [28]). Therefore, each robot, can update its belief using all available measurements across the network to construct their respective posterior beliefs which may not be similar to each other. Measurement-dissemination based protocols can result in high communication costs and poor scalability, which could be problematic for a robot based sensor network, especially as the connectivity and network topology is dynamic. On the contrary, distributed diffusion filters (e.g., [29]) can provide improved scalability by focusing on local dissemination of parameterised statistics of the posterior beliefs. Furthermore, enhanced by the belief-consensus based protocols, the distributed diffusion filter can fuse local posterior beliefs only from the neighbouring robots over multiple iterations to achieve a common belief across the entire network as presented in [30], [31], [32], [33]. A consensus-based distributed particle filter (DPF) only requires local communications between neighbouring robots, therefore, no global knowledge about the communication topology required, lending higher robustness to changes in network topology.
在传感器/机器人网络中,源项的置信度的集体估计可以以集中式或分布式方式进行。与集中式协议相比,分布式估计器设计将为大型机器人网络提供全局后验置信或似然函数等统计量,并具有额外的鲁棒性和可扩展性。文献中有各种贝叶斯滤波器可以跨传感器网络实现分布式估计,如[24]中所述。一种常用的技术是基于测量传播的协议,旨在通过多跳约束泛洪技术(例如[25]、[26]、[27]、[28])建立对网络中每个传感代理(即机器人)的所有采样测量值的访问。因此,每个机器人都可以使用网络上所有可用的测量来更新其信念,以构建它们各自的后验信念,这些信念可能彼此不相似。基于测量传播的协议可能导致高通信成本和较差的可扩展性,这对于基于机器人的传感器网络来说可能是个问题,尤其是在连接和网络拓扑是动态的情况下。相反,分布式扩散滤波器(例如,[29])可以通过专注于后验信念的参数化统计的局部传播来提供改进的可扩展性。此外,通过基于信念共识的协议的增强,分布式扩散滤波器可以在多次迭代中仅融合来自相邻机器人的局部后验信念,以实现整个网络的共同信念,如[30]、[31]、[32]、[33]所示。 基于共识的分布式粒子过滤器 (DPF) 只需要相邻机器人之间的本地通信,因此不需要关于通信拓扑的全局知识,从而为网络拓扑的变化提供了更高的鲁棒性。

In the context of multi-robot STE, having each robot possess a common global belief of the source term would be critical in designing a collaborative search behaviour. Thus, a consensus-based DPF renders it as a superior design choice to frameworks with measurement-dissemination protocol as was also presented in [31]. However, consensus-based DPFs may require a higher number of consensus steps for a larger multi-robot network to quickly converge to an consensus global belief. To this end, we present a modified consensus-DPF aimed to mitigate this issue.
在多机器人STE的背景下,让每个机器人都拥有对源项的共同全球信念对于设计协作搜索行为至关重要。因此,基于共识的 DPF 使其成为具有测量传播协议的框架的优越设计选择,正如 [31] 中所介绍的那样。然而,基于共识的 DPF 可能需要更多的共识步骤,才能使更大的多机器人网络快速收敛到共识的全球信念。为此,我们提出了一个修改后的共识-DPF,旨在缓解这个问题。

In literature, there are various multi-robot path planning strategies, such as [34], [35], [36], [37], [38], [39], [40], [41], [42] that proposed collaborative decision making to achieve distributed multi-robot STE. In [34], [42], multiple active and passive coordination strategies using Infotaxis or its variants based reward functions were presented to choose an appropriate control action for each robot in the network. The strategies of [34], [42] optimised the overall estimated information gain over a finite action set per robot by sequentially negotiating individual robot actions. In [37], [39], a weighted social Bayesian estimation protocol was adopted to update the source term belief for each robot, followed by each robot choosing a traditional infotaxis reward based control action from a finite set of discrete actions. Furthermore, a decentralised formation control strategy was presented in [35] where each robot first locally computes its infotaxis reward based control action, followed by performing consensus over the control action to preserve the network formation.
文献中,有多种多机器人路径规划策略,如[34]、[35]、[36]、[37]、[38]、[39]、[40]、[41]、[42],提出了协同决策来实现分布式多机器人STE。在[34],[42]中,提出了使用Infotaxis或其基于奖励函数的多种主动和被动协调策略,为网络中的每个机器人选择适当的控制动作。[34],[42]的策略通过依次协商单个机器人动作,优化了每个机器人有限动作集的总体估计信息增益。在[37]和[39]中,采用加权社会贝叶斯估计协议来更新每个机器人的源项信念,然后每个机器人从一组有限的离散动作中选择传统的基于信息出租车奖励的控制动作。此外,在[35]中提出了一种分散的编队控制策略,其中每个机器人首先在本地计算其基于信息出租车奖励的控制动作,然后对控制动作进行共识,以保持网络编队。

Majority of the strategies in [34], [35], [36], [37], [38], [39], [40], [41], [42] evaluated an information theoretic metric over a finite set of discrete robot actions, due to the computational complexity to make the decision over the increased action space of all the robots. Only a few path planning strategies account of practical constraints like limited communication ranges. However, a finite set of feasible actions per robot along with spatio-communication constraints can significantly limit the spatial diversity of the overall set of candidate locations to take measurements. Furthermore, it should be noted that the existing collaborative strategies may result in high communication costs over a large network to converge onto the respective sampling location for each robot, which is not a desirable feature. Thus, in the proposed work, we optimise for the control actions of each robot over a continuous action space first, followed by an informative sampling interval choice, consequently resulting in lower inter-robot communication requirements. It should also be noted that most of the aforementioned strategies also employed measurement dissemination protocols which adds to the overall communication costs for the mission and may lead to an over-confident estimate of the source term.
[34]、[35]、[36]、[37]、[38]、[39]、[40]、[41]、[42]中的大多数策略都评估了一组有限的离散机器人动作上的信息理论度量,因为计算复杂,要对所有机器人增加的动作空间做出决策。只有少数路径规划策略考虑了通信范围有限的实际约束。然而,每个机器人的一组有限的可行操作以及空间通信约束可能会显着限制整个候选位置集的空间多样性。此外,应该注意的是,现有的协作策略可能会导致大型网络上的高通信成本,以收敛到每个机器人的相应采样位置,这不是一个理想的功能。因此,在拟议的工作中,我们首先在连续的动作空间上优化每个机器人的控制动作,然后选择信息采样间隔,从而降低机器人之间的通信要求。还应当指出,上述大多数战略还采用了测量传播协议,这增加了特派团的总体通信费用,并可能导致对源术语的估计过于自信。

Incorporating the collective belief update into the design of path planning strategy leads to effective inference of the source term in a distributed manner. However, the key design challenge, which is often overlooked in the multi-robot STE literature, is to ensure connectivity across the robot network. In works like [42], network connectivity of a distributed network is achieved by ensuring the algebraic connectivity (or Fiedler eigenvalue) of the Laplacian matrix associated with the robot-network is positive. However, the computation of algebraic connectivity requires estimation of the Laplacian matrix of the entire network as presented in [43], [44]. Such a real-time estimation of the algebraic connectivity requires several iteration steps consequently resulting in proportionate communication, computational and memory costs per robot and iteration [44], which can prove to be costly in larger robot/sensor networks. In contrast, we introduce a proximity based control action based on a scalable pre-defined subgraph structure to ensure a connected network, rather than replying on dynamic estimation of the network topology.
将集体信念更新纳入路径规划策略的设计中,可以以分布式方式有效地推断源项。然而,在多机器人 STE 文献中经常被忽视的关键设计挑战是确保整个机器人网络的连接。在像[42]这样的工作中,分布式网络的网络连通性是通过确保与机器人网络相关的拉普拉斯矩阵的代数连通性(或Fiedler特征值)为正来实现的。然而,代数连通性的计算需要估计整个网络的拉普拉斯矩阵,如[43]、[44]所示。这种对代数连通性的实时估计需要几个迭代步骤,因此导致每个机器人和迭代的通信、计算和内存成本成比例[44],这在大型机器人/传感器网络中可能被证明是昂贵的。相比之下,我们引入了基于可扩展的预定义子图结构的基于邻近的控制操作,以确保网络连接,而不是根据网络拓扑的动态估计进行响应。

1.2. Technical contributions
1.2. 技术贡献

In this paper, we propose a distributed multi-robot STE strategy following a co-design process which jointly considers the estimation, connectivity and coordination requirements for the first time in a unified framework. We employ a modified consensus based belief update protocol to ensure a common source term belief across a robot network. Coordination path planning is designed based on a combination of a coverage control objective function for sensing performance and a proximity objective function to meet communication range constraints, where the control action is obtained by optimising the objective function over continuous actions between sampling instants, in contrast to some information-theoretic based strategies based on a finite action set. The coverage control objective function allows each robot to be effectively placed at the best sensing location with respect to its network neighbours based on the post-consensus belief of the source location. The proximity-based objective function inspired by works of [45], [46] is to ensure connectivity of the robot network with lower communication requirements as compared to the method in [42]. The proximity control by each robot is enforced with respect to a limited set of robots based on a spanning subgraph, thus, addition of new robots to the group can be robustly accommodated by locally updating the proximity set of the relevant robots. Furthermore, as the proposed overall control action for each robot only requires one-hop communication to their network neighbours, the resulting path planner can be scaled to larger networks easily to cover wider areas.
在本文中,我们提出了一种分布式多机器人STE策略,该策略遵循协同设计过程,首次在统一框架中共同考虑了估计、连接和协调需求。我们采用改进的基于共识的信念更新协议,以确保整个机器人网络的共同源术语信念。协调路径规划的设计基于用于感知性能的覆盖控制目标函数和满足通信范围约束的接近目标函数的组合,其中控制动作是通过优化目标函数在采样时刻之间的连续动作中获得的,这与一些基于有限动作集的信息理论策略形成鲜明对比。覆盖控制目标功能允许每个机器人根据源位置的共识后信念,有效地放置在相对于其网络邻居的最佳传感位置。受[45],[46]启发的基于邻近的目标函数,与[42]中的方法相比,旨在确保机器人网络的连通性,通信要求更低。每个机器人的接近控制都是针对基于跨越子图的有限机器人集强制执行的,因此,可以通过本地更新相关机器人的接近集来稳健地适应向组添加新机器人。此外,由于每个机器人的整体控制动作只需要与其网络邻居进行一跳通信,因此生成的路径规划器可以很容易地扩展到更大的网络以覆盖更广泛的区域。

We also notice that due to the time sensitive nature of the application of multi-robot STE, performing environmental sampling along the trajectories resulted from the proposed control action (instead of the converged locations) can be beneficial. To this end, two alternative designs for the choice of a suitable sampling interval are proposed, namely — fixed sampling interval and information theoretic sampling interval. The latter considers multiple candidates for sampling intervals and locally evaluates the potential information gain corresponding to each candidate in order to choose the best sampling interval, followed by achieving consensus over the locally chosen best sampling interval candidates. After agreeing over the desired sampling interval, the robots move along the proposed control action and perform environmental sampling after the agreed upon sampling interval.
我们还注意到,由于多机器人STE应用的时间敏感性,沿着所提出的控制动作(而不是收敛位置)产生的轨迹进行环境采样可能是有益的。为此,提出了两种选择合适采样间隔的备选设计,即固定采样间隔和信息论采样间隔。后者考虑抽样间隔的多个候选者,并局部评估与每个候选者相对应的潜在信息增益,以便选择最佳抽样间隔,然后就局部选择的最佳抽样间隔候选者达成共识。在商定所需的采样间隔后,机器人沿着建议的控制动作移动,并在商定的采样间隔后执行环境采样。

The proposed consensus belief protocol and coverage based path planner is able to substantiate the accuracy and robustness of the source term estimation with multi-robots, while maintaining network connectivity throughout the mission. Such a framework could potentially be extended to applications like information gathering of a relevant phenomenon in an unknown environment or conducting search and rescue operations over an large region (see e.g., [47], [48]) by modifying the underlying coverage objective function using an appropriate event probability function.
所提出的基于共识信念协议和覆盖范围的路径规划器能够证实多机器人源项估计的准确性和鲁棒性,同时在整个任务中保持网络连接。这种框架有可能扩展到应用,例如在未知环境中收集相关现象的信息,或通过使用适当的事件概率函数修改基础覆盖目标函数(参见例如,[47],[48])在大区域进行搜索和救援行动。

1.3. Paper outline 1.3. 论文大纲

The rest of the paper is organised as follows. Section 2 first discusses motion, sensor and measurement models, along with the limited-communication range design constraint, followed by formally presenting the problem statement addressed in the paper. Section 3 presents a modified consensus-based distributed belief update protocol, followed by a collaborative and distributed path planning algorithm designed to find an optimal control action is discussed and analysed in Section 4. In Section 5, we discuss multiple protocols for choosing a fixed or collaborative interval to perform sampling along the trajectories resulting from the control action designed in Section 4. Finally, we present an extensive simulation study to demonstrate the advantages of the proposed algorithm, followed by concluding the paper with its key features in Section 7.
本文的其余部分组织如下。第2节首先讨论了运动、传感器和测量模型,以及有限的通信范围设计约束,然后正式介绍了本文中解决的问题陈述。第3节介绍了一种改进的基于共识的分布式信念更新协议,随后是第4节讨论和分析的协作和分布式路径规划算法,该算法旨在找到最佳控制行动。在第 5 节中,我们讨论了多种协议,用于选择固定或协作间隔,以沿着第 4 节中设计的控制操作产生的轨迹执行采样。最后,我们进行了广泛的仿真研究,以证明所提算法的优点,然后在第 7 节中总结了该算法的主要特征。

2. Preliminaries and problem formulation
2. 前期和问题表述

In this section, we begin by introducing mathematical models to describe the robot kinematics, the plume model for the onboard gas sensors, and, communication capabilities and limitations. Some basics of coverage control techniques required for coordination protocol design are also discussed for completeness. Finally, the problem is formally formulated to state the objectives to be achieved for a multi-robot source term estimation task.
在本节中,我们首先介绍描述机器人运动学的数学模型、车载气体传感器的羽流模型以及通信功能和局限性。为了完整起见,还讨论了协调协议设计所需的覆盖控制技术的一些基础知识。最后,正式提出问题,说明多机器人源项估计任务要实现的目标。

Notations: The notation a,b denotes all the integers from a to b including the boundaries.

2.1. Motion model 2.1. 运动模型

Consider a set of n homogeneous robots in a closed operating region denoted by DR2. The operating region is assumed to be a convex obstacle free open space. Furthermore, the boundary of D is denoted by D. Let the position of the ith robot platform at time tk be denoted as pk(i)=[xk(i)yk(i)]T. The motion of each robot platform is modelled as the following single integrator model (1)pk+1(i)=pk(i)+tktk+1uτ(i)dτ;where uτ(i) denotes the velocity control action at the τ[tk,tk+1] instant.

2.2. Sensor and measurement model
2.2. 传感器和测量模型

Each robot platform is assumed to be equipped with a chemical sensor that measures the gas concentration at the robot location. For a source emitting gas release from an unknown location, denoted by s=[xoyo]T, with a release rate of Qo, the expected gas concentration over a steady plume at a location q can be expressed using the following plume model: (2)C(q,s)=Qo2πDexpw(qs)2DK0qs2λ,where K0 is the zero-order modified Bessel’s equation, and, (3)λ=ξD1+w2ξ4DThe modelling parameters Qo, ξ, D, and w denote the release rate, average particle lifetime, the diffusivity, and the wind speed, respectively. The expression w(qs) refers to the dot product between the wind vector w and qs. The wind vector can be expressed as w=[wcosΦwsinΦ]T where Φ is the wind direction.

Without loss of generality, it is assumed that the sensor reading (e.g., voltage from a MOX sensor) is proportional to the gas concentration [15], if the gas is detected by the sensor. In this case, the associated measurement noise can be modelled as a Gaussian distribution νN(0,σν2), where the variance is normally modelled as a percentage of the concentration reading. Furthermore, for a low-cost sensor, the background white noise in the absence of the released gas also needs to be modelled, which can be denoted by ν̄N(0,σν̄2). The variance associated with the background noise can be experimentally determined. To minimise the false detection due to background noise, the measurement is only considered as a detection event if the sensor reading is above a certain threshold, denoted by zthr. When the sensor readings are below zthr and subsequently forced to be zero, it is denoted as a non-detection event, which may occur when the gas concentration exists but is not picked up by the sensor due to complex nature of gas dispersion, or the concentration is indistinguishable from background noise readings. We denote the probability of successfully detecting the gas be denoted by Pd, thus, the non-detection probability is given by 1Pd.

Following the plume model and the sensor model, we have the vector of measurements sampled by ith robot until time tk denoted as Z1:k(i)=[z1(i)zk(i)]T, where the sensor measurement obtained by ith robot at a given sampling instant tk can be expressed as (4)zk(i)(pk(i))=C(pk(i),s)+νkif detection event0if non-detection event.

Similar models have been shown to perform efficiently in experimental studies like [15], [49].
类似的模型在实验研究中被证明可以有效地执行,如[15],[49]。

2.3. Communication model 2.3. 通信模型

In this paper, we assume that each robot can communicate with another robot if their mutual distance is below Rc,max. Furthermore, a communication graph topology at some time t denoted by Gt={E,W}, where {E,W} are the edges and vertices of the graph, is defined for the group of robots deployed for the entire mission. The edge set E={Eij} contains an edge for every robot pair (i,j) for which their mutual distance satisfies d(pt(i),pt(j))Rc,max. The set Nc,t(i)W denotes the set containing itself and all network neighbours of the ith robot at time t. The network neighbours at any given time, for ith robot, are all robots j for which there exists an edge EijE.

2.4. Problem statement 2.4. 问题陈述

In this paper, the problem of localisation of an unknown release using multiple robot platforms with network connectivity constraints is addressed. The unknown source term to be estimates is denoted as a parameter vector Θo=sTQoT, assuming other source parameters are known a priori. The group of robots is required to achieve the aforementioned mission in a distributed manner by efficiently choosing locations to sample in D and collaborating with their network neighbour as per the predefined communication topology. Let the estimated source term as per ith robot at time tk be denoted as Θˆk(i). Hence, the mission objectives at time tk can be generally categorised as follows:

  • (a) (一)

    Using the measurements {Z1:k(i)}i=1n, obtained until tk, first arrive on a common belief about the estimated source term between all the robots, that is, p(Θk(i))p(Θk(j))ij, and

  • (b) (二)

    Next, using the common belief at time tk, collectively find the next sampling locations of the robots for the time instant tk+1, such that consequent posterior belief about the source term, after incorporating the new measurements zk+1(i)i, is improved while ensuring the communication topology is preserved.

We make the following mild assumptions, which are commonly used in the literature, before proposing a methodology for achieving the aforementioned mission objectives:
在提出实现上述任务目标的方法之前,我们做了以下文献中常用的温和假设:

  • (i) (一)

    The robots need to stop at each sampling location to gather their local concentration measurements, which is common in atmospheric sensing.
    机器人需要在每个采样地点停下来收集其局部浓度测量值,这在大气传感中很常见。

  • (ii) (二)

    There exists a single static source of chemical or biological release source in the area of interest D.

  • (iii) (三)

    The sensor measurements at any given instant across robots are conditionally independent from each other.
    机器人在任何给定时刻的传感器测量值在条件上都是相互独立的。

  • (iv) (四)

    The state (position, velocity, etc.) of the ith robot at time step tk is known, that is, each robot is equipped with localisation capabilities with respect to the global reference frame.

  • (v) (五)

    Sensor observations and position of ith robot are only communicated to its network neighbours for data fusion and belief update.

  • (vi) (六)

    Furthermore, it is assumed that each robot in the network performs a belief update at the same time, that is, in synchronisation with each other.
    此外,假设网络中的每个机器人同时执行信念更新,即彼此同步。

  1. Download: Download high-res image (227KB)
    下载:下载高分辨率图像 (227KB)
  2. Download: Download full-size image
    下载:下载全尺寸图像

Fig. 1. Flow of information throw the consensus filter: (a) first fusing locally available measurements for ith robot to construct the local posterior distribution as per (7), followed by (b) Consensus iteration l+1 to locally fuse the posterior belief computed at the l consensus step.

3. Distributed sequential estimation
3. 分布式顺序估计

In this section, we discuss the distributed sequential Bayesian estimation algorithm to achieve the consensus of beliefs among all the nodes of a network structure. We employ a modified version of the posterior consensus-based DPF proposed in [30] to fuse the locally sampled measurements with the beliefs across neighbouring robots in the multi-robot communication network.
在本节中,我们将讨论分布式顺序贝叶斯估计算法,以实现网络结构所有节点之间的信念共识。我们采用了[30]中提出的基于后验共识的DPF的修改版本,将局部采样的测量结果与多机器人通信网络中相邻机器人的信念融合在一起。

To improve the performance shortcomings associated with consensus-based DPF as discussed in Section 1, we modified the consensus DPF proposed in [30] with the adapt step of ATC distributed diffusion filters [29]. The aforementioned modification ensures quicker local diffusion information across the network, and thus requiring less number of consensus iterations.
为了改善第 1 节中讨论的与基于共识的 DPF 相关的性能缺陷,我们修改了 [30] 中提出的共识 DPF,并采用了 ATC 分布式扩散滤波器的适应步骤 [29]。上述修改确保了整个网络中更快的局部扩散信息,因此需要更少的共识迭代次数。

Let the posterior belief about the unknown source term at tk time instant in possession of the ith robot be denoted by p(Θk(i)|Ik) where Ik denotes all the information available to the ith robot till time tk (which includes all local measurements, network neighbour measurements, and information received through consensus iterations).

Recall from Section 2.2 about the sensor and measurement model, the likelihood function for the ith robot based on the sensor and measurement model in (4) can be expressed as (5)p(zk(i)|Θk(i))=Nzk(i)C(pk(i),s),σν2ifzk(i)zthr(1Pd)+Pd21+erfzthr2σν̄ifzk(i)=0

Using the aforementioned notations, in the subsequent subsections, we will discuss the design and structure of the proposed DPF in detail.
使用上述符号,在随后的小节中,我们将详细讨论拟议的 DPF 的设计和结构。

3.1. Local posterior generation
3.1. 局部后代

In this subsection, we discuss the calculation of a posterior belief using locally available measurements for each robot. Following the Monte Carlo based representation of probability distribution function (PDF), the local belief about the source term for the ith robot, that is, p(Θk(i)|Ik), can be represented as (6)p(Θk(i)|Ik)np=1Npwk,np(i)δ(Θk(i)Θk,np(i)),where δ() denotes the Dirac-delta function, {wk,np(i),Θk,np(i)}np=1Np denotes the set of Np weighted random samples generated using importance sampling to approximate p(Θk(i)|Ik) and thus np=1Npwk,np(i)=1.

For time tk+1 and the ith robot, the locally available sensor measurements are Zk+1(i)={zk+1(j)}jNc,t(i), including the robot’s own measurement and those from all neighbouring robots (see Fig. 1). Using Bayes’ rule and noting that the source term is static, i.e., Θk+1(i)=Θk(i), the local posterior distribution for the ith robot at time tk+1 can be calculated as, (7)p(Θk+1(i)|Zk+1(i),Ik)p(Zk+1(i)|Θk+1(i))p(Θk+1(i)|Ik),where the likelihood p(Zk+1(i)|Θk+1(i)) can be expressed, using (5) and Assumption (iii), as (8)p(Zk+1(i)|Θk+1(i))=jNc,t(i)p(zk+1(j)|Θk+1(i)).Under the Monte Carlo approximation, we can express the unnormalised weights of the posterior PDF in the left-hand side (LHS) of (7) as (9)wˆk+1,np(i)=wk,np(i)jNc,t(i)p(zk+1(j)|Θk+1,np(i)),which needs to be normalised as wk+1,np(i)=wˆk+1,np(i)np=1Npwˆk+1,np(i).Note that the posterior probability distribution p(Θk+1(i)|Zk+1(i),Ik) computed through (7)(9) for ith robot is only dependent on the concentration measurements obtained by itself and its network neighbours. Compared to the DPF presented in [30] using just the locally sampled measurements of the ith robot, the proposed procedure of constructing the local posterior PDF ensures a quicker information diffusion across the network by using extra measurements from network neighbours.

On constructing the local posterior PDF, the next step is to achieve a common belief of the source term across all robots in the network as it allows for a robust and simpler path planning strategy later.
在构建本地后验 PDF 时,下一步是在网络中的所有机器人中实现对源项的共同信念,因为它允许以后制定稳健且更简单的路径规划策略。

3.2. Kullback–Leibler average based consensuses posterior generation
3.2. 基于Kullback-Leibler平均的共识后生成

In order to achieve consensus of the posterior distribution in a distributed manner, we use the Kullback–Leibler average (KLA) based consensus technique. The mathematical results associated with the KLA relevant to the DPF algorithm are summarised in Lemma 1.
为了以分布式方式实现后验分布的共识,我们使用了基于Kullback-Leibler平均值(KLA)的共识技术。引理 1 总结了与 DPF 算法相关的 KLA 相关的数学结果。

Lemma 1 引理 1

[50]

For M local posterior probability distributions mi() and relative weights ηi such that ηi>0 and 1Mηi=1, the weighted KLA, defined as (10)m̄()=arginfm1Mηim(x)logm(x)mi(x)dx,is given by (11)m̄()=i=1M[mi()]ηii=1M[mi(x)]ηidx1.

The result of Lemma 1 presents an informative way averaging multiple posterior distributions which is leveraged to fuse locally available posterior beliefs to update the posterior belief for the ith robot. This informative averaging of posterior beliefs can be iteratively performed over multiple consensus steps to ensure that the each robot in the network has an approximately same posterior distribution of the source term. Let mk,l(i) denote the PDF of the source term computed by the ith robot at the lth consensus step corresponding to time tk and ηij>0 with jNc,t(i)ηij=1. The posterior PDF belief computed by the ith robot at the l+1 consensus step is can be written as (12)mk,l+1(i)=jNc,t(i)[mk,l(j)]ηijjNc,t(i)[mk,l(j)(Θ)]ηijdΘ.

Using (12), each node can iteratively update their local belief using the informative averaging of local posterior beliefs from each neighbour over an iteration. At the beginning of the consensus procedure, we initiate mk,0(i)=p(Θk(i)|Zk(i),Ik1) and perform the local updates as per (12). However, the process of fusing the local beliefs as per (12) is not straightforward as the posterior beliefs at any given time step tk are expressed using the Monte Carlo representation. This issue was resolved in [30] by using Gaussian mixture (GM) model to compactly represent and transmit the Monte Carlo representation of mk,l(i) at each consensus step. The GM model was constructed using regularised particles (see [51]) to ensure equal particle weights. A new importance sampling procedure is also included to improve the support from all locally available GM models at each robot during the fusion process.

A GM model based representation would only require each robot to share the modelling parameters instead of the particle representation to its network neighbours to communicate its current belief, which is more efficient and thus has been widely used in multi-sensor fusion algorithms [52]. In summary, after sampling the environment, each robot performs two actions:
基于GM模型的表示只需要每个机器人与其网络邻居共享建模参数而不是粒子表示,以传达其当前的信念,这更有效,因此已广泛用于多传感器融合算法[52]。总之,在对环境进行采样后,每个机器人执行两个操作:

  • (a) (一)

    first shares its local measurement with its network neighbours and constructs a local posterior belief using available measurements, and
    首先与其网络邻居共享其本地测量值,并使用可用的测量值构建本地后验信念,以及

  • (b) (二)

    second, performs a consensus procedure to informatively fuse local posterior beliefs until consensus over the posterior beliefs of the source term is achieved throughout the network.
    其次,执行共识程序以信息方式融合局部后验信念,直到在整个网络中就源项的后验信念达成共识。

The KLA based posterior consensus procedure [30] performed by each robot after sampling the region is summarised in Algorithm 1, along with the data transmission process shown in Fig. 1.

  1. Download: Download high-res image (816KB)
  2. Download: Download full-size image

算法 1 总结了每个机器人在对区域进行采样后执行的基于 KLA 的后验共识程序 [30],以及图 1 所示的数据传输过程。
  1. Download: Download high-res image (816KB)
  2. Download: Download full-size image

Remark 1 备注 1

It should be noted that the performance of the proposed consensus particle filter would depend on the communication topology Gt and the fusion weights ηij. The convergence of distributed belief consensus algorithm, for a given connected graph topology under some common weight choices, is established in an asymptotic sense as shown in [31], [53].

In practice, a limited number of consensus steps can be implemented to ensure belief convergence within acceptable tolerances. The width of the communication network graph Gt (longest path length between two nodes) can be used as a heuristic upper limit for the required number of consensus step. For example, the number of consensus step in [30] is one more than the graph width. In this work, as we also utilise the measurements obtained by network neighbours to construct the local posterior belief before consensus iterations, the convergence rate of the belief can be improved. This is primarily because a relatively informed prior is used to initialise the consensus procedure, leading to quicker information diffusion across the network.

Remark 2 备注2

Recall that the proposed consensus filter assumes that each robot performs the belief update in synchronisation with each other. In order to practically implement the proposed DPF, each robot could share the iteration labels and current belief update status, while exchanging measurements and Gaussian mixture parameters.
回想一下,建议的共识过滤器假设每个机器人彼此同步执行信念更新。为了实际实现所提出的DPF,每个机器人都可以共享迭代标签和当前信念更新状态,同时交换测量值和高斯混合参数。

In the next section, we will develop the informative, distributed, and coordinated path planning strategy employed between robots to improve the efficiency of choosing their next sampling locations.
在下一节中,我们将开发机器人之间采用的信息化、分布式和协调路径规划策略,以提高选择下一个采样位置的效率。

4. Distributed path planning algorithm
4. 分布式路径规划算法

Establishing a common belief about the source term via the consensus process allows the robots to collectively plan their paths leading to the next sampling locations. In this section, we introduce the inter-robot coordination strategy that will spatially distribute the robots to collective more informative samples. This strategy mimics an exploratory and exploitative behaviour for information gathering, which helps the DPF to quickly and efficiently converge onto a source term estimate.
通过共识过程建立关于源项的共同信念,使机器人能够共同规划通往下一个采样位置的路径。在本节中,我们将介绍机器人之间的协调策略,该策略将在空间上将机器人分布到集合中,以收集信息量更大的样本。该策略模仿了信息收集的探索性和利用性行为,这有助于 DPF 快速有效地收敛到源项估计值。

4.1. Coordination protocol between robots
4.1. 机器人之间的协调协议

We design the coordination protocol using a distributed optimisation framework with coverage control-like objective functions corresponding to the sensing performance of each robot. Constructing such objective functions requires all the robots to get access to the same underlying PDF of the variable of interest, which in traditional coverage control is static and known a priori. In this work, we leverage the consensus based DPF to provide the common belief of the source term to construct local objective functions for each robot.
我们使用分布式优化框架设计协调协议,该框架具有与每个机器人的传感性能相对应的类似覆盖控制的目标函数。构建这样的目标函数需要所有机器人访问感兴趣变量的相同底层 PDF,这在传统的覆盖控制中是静态的,并且是先验已知的。在这项工作中,我们利用基于共识的DPF来提供源项的共同信念,为每个机器人构建局部目标函数。

Let us first define the locally induced Voronoi cell (LIVC) corresponding to the ith robot, denoted by Vi, based on its network neighbours at any time instant t. Mathematically expressed as (13)Vi=qDpt(i)qpt(j)qjNc,t(i)i.

Note that the construction of the LIVC Vi can be carried out in a distributed manner with communication requirements of position information of neighbours in Nc,t(i). The LIVC for an example case for two different graph structure are presented in Fig. 2. It can be seen from Fig. 2(a) that the robot 5 is only within communication range of robot 4. Thus, the LIVC corresponding to robot 5 has an overlap with Voronoi cells of robot 2 and 3. Furthermore, it can be guaranteed that with a complete graph topology of the communication network between the robots, as shown in Fig. 2(b), the LIVCs are identical to the exact Voronoi tessellation of D. Therefore, the design of the LIVCs, as per (13), ensure that a region-of-dominance can be constructed for each robot where each point in that region is closest to it. Thus, utilising such an region of dominance for finding a locally optimal control action to find the next sampling instant is desirable.

  1. Download: Download high-res image (246KB)
    下载:下载高分辨率图像 (246KB)
  2. Download: Download full-size image
    下载:下载全尺寸图像

Fig. 2. Examples of locally induced Voronoi cells for different network topologies.
图 2.不同网络拓扑结构的局部诱导 Voronoi 细胞示例。

Consider a local objective function, Hi, to be minimised with respect to the position of ith robot for t[tk,tk+1] given by (14)Hi(pt(i))=Vipt(i)q2ϕk(q)dq.The construction of Hi indicates that Hi is minimised when the robot location pt(i) at some time t at the centroid of the LIVC corresponding to ϕk(q). Furthermore, recall that ϕk refers to the PDF of the spatial distribution associated with the source term belief at the tkth time instant. Thus, minimising Hi effectively mimics an exploitative behaviour towards high probability region within each robot’s respective LIVC, Vi, as shown in Fig. 3.

  1. Download: Download high-res image (229KB)
    下载: 下载高分辨率图像 (229KB)
  2. Download: Download full-size image
    下载:下载全尺寸图像

Fig. 3. Graphical representation of robots being locally driven towards high density regions, where × marker denote the ϕkcentroid under LIVC and the arrows denote the local control action.

It is worthy to note that the objective function (14) is constructed using only local positional information as per the communication topology and the local density function ϕk. The structure of Hi is similar to the coverage control objective function as described in [54], but in this paper only the local objective function was used instead of a global positional optimisation index of [54] to support fully distributed implementation. Moreover, the density function is usually known, however, for the application at hand the density function ϕk is constructed based on the post-consensus belief about the source term calculated at the previous sampling instant.

A control velocity vector along the gradient descent direction is derived and shown to minimises the local optimisation index (14) in Theorem 1.
推导了沿梯度下降方向的控制速度向量,并表明该定理 1 中的局部优化指数 (14) 最小化。

Theorem 1 定理 1

For kg>0 and robotic agents governed by (1), the control action given by (15)ut,cov(i)=2kgVi(pt(i)q)ϕk(q)dqkgjNc,t(i)iVijpt(i)q22pt(j)pt(i)qpt(i)ϕk(q)dq.drives the agents towards the local minima of (14).

Proof 证明

Refer to Appendix A for the proof. □
有关证明,请参阅附录 A□

To implement ut,cov(i), the ith robot requires knowledge of the density function. Using the posterior belief p(Θk(i)|Ik) computed after environment sampling at time tk, we define the density function ϕk(i) as the probability distribution function at some point qD using the mixture model constructed at the end of the consensus procedure, respectively, as (16)ϕk(i)(q)=Gk(i,L)q,Θ̄k(i)(Q).where Gk(i,L) is the Gaussian mixture model and Θ̄k(i)(Q) is the mean estimate of the release rate at the tk instant as computed by the ith robot after completion of consensus based belief update. As the updated belief of the source term is computed using a consensus protocol, each robot computes the density function for qD resulting in ϕk=ϕk(1)ϕk(2)ϕk(n).

Remark 3 备注 3

It should be noted that as p(Θk(i)|Ik) is the post-consensus belief across the network, the resulting ϕk(i) is similar for all robots. Therefore, the control direction ut,cov(i) computed as per (15) minimises their local objective function using a similar global belief. Furthermore, the LIVC construction can be preformed in a completely distributed fashion as it only requires spatial information of jNc,t(i). Therefore, the control action ut,cov,(i) can be computed in a distributed fashion.

We have so far discussed the design of control direction that ensures a coordination based control action to achieve spatial diversity of the robots while enforcing exploitative behaviour towards local predicted high probability regions based on the belief of the source term at the previous sampling instant. Following this, we now introduce the complementary control action to ensure a set of proximity constraints are satisfied to achieve a connected communication network topology.
到目前为止,我们已经讨论了控制方向的设计,该方向确保基于协调的控制动作,以实现机器人的空间多样性,同时根据对上一个采样时刻的源项的信念,对局部预测的高概率区域实施剥削行为。在此之后,我们现在引入互补控制操作,以确保满足一组邻近约束,以实现连接的通信网络拓扑。

4.2. Proximity control action for network constraints
4.2. 网络约束的接近控制操作

As discussed in the Introduction section, estimation of the algebraic connectivity can result in computational and network burdens for each robot resulting in time expensive control action which is undesirable. Therefore, in this paper, we utilise the proximity function based control action to ensure network connectivity between robots [45], [46] with minimal communication requirements during transit between two sampling locations.
正如简介部分所讨论的,代数连通性的估计可能会导致每个机器人的计算和网络负担,从而导致耗时的控制操作,这是不可取的。因此,在本文中,我们利用基于邻近函数的控制动作来确保机器人之间的网络连接[45],[46]在两个采样地点之间的传输过程中具有最小的通信要求。

Let vpij(pt(i),pt(j)) denote the proximity function between the ith and jth robots and mathematically expressed by (17)vpij(pt(i),pt(j))=max0,Rc,min2di,j2di,j2Rc,max22,where di,j=pt(i)pt(j) and Rc,min denotes the threshold distance at which the proximity control action is activated. Note that the pt(i) and pt(j) refers to the positions of ith and jth robot at some time t between two sampling instants, say k and k+1.

The gradient descent direction of the proximity function vpij can be expressed on differentiating w.r.t. pt(i) as (18)dvijpdpt(i)=0ifdi,jRc,min4Rc,max2Rc,min2Rc,min2di,j2di,j2Rmax23pt(i)pt(j)ifRc,mindi,jRc,maxnot definedifdi,j=Rc,max0ifdi,jRc,maxFurthermore, it can be noted from the construction of (17) that vijp=vjip and (19)vpijpt(i)=vpijpt(j)=vpjipt(i)=vpjipt(j).Note that network connectivity across the robot network can be ensured if a spanning subgraph can be preserved throughout the mission. Therefore, consider a spanning subgraph G={E,W} with the robots are its vertices be chosen before the start of the mission based on the robot starting locations. Additionally, E={Eij} is set of all edges in G with Eij denotes the edge between the ith and jth robots. Hence, for a G of a multiagent system, the proximity objective function for each ith robot can be expressed as (20)vp(i)=jEijvpij(pt(i),pt(j)).It can be noted from the construction of vpij, (17), and vp(i), (20), that minimising vp(i) for each node ensures that the ith node always remains within a radius of Rc,max relative to its network neighbours. Therefore, we choose design the proximity control action as the gradient descent direction that minimises vp(i), mathematically expressed, as (21)ut,px(i)=kpjEijvpijdpt(i),where kp>0 is the proximity control gain. It is interesting to observe that the proximity action is not active if the relative distance between a robot an its network neighbour is below Rc,min. In order to implement a predefined spanning subgraph based proximity action in a distributed manner, each robot would have to only store a limited list of its spanning subgraph neighbours, which has relatively low memory requirements even for large networks. Furthermore, the construction of the spanning subgraph is performed before the mission begins. Thus, under no failure assumptions, scaling this design for the purpose of proximity control would require minimal effort. We will demonstrate the same through a simulation study.

4.3. Convergence analysis of the overall control action
4.3. 整体控制动作的收敛分析

In this subsection, we discuss the convergence of the robot trajectories subject to the overall control action ut(i)t[tk1,tk], given by (22)ut(i)=ut,cov(i)+ut,px(i)ifut,cov(i)+ut,px(i)ααut,cov(i)+ut,px(i)ut,cov(i)+ut,px(i)ifut,cov(i)+ut,px(i)>α,where α is the maximum speed of each robot in the network. The following convergence study demonstrates that the control action results into a optimal spatial configuration of the robot configuration such that the coordination objective H is minimised while ensuring network connectivity objective.

Consider a Lyapunov function candidate Vl as (23)Vl=β2vp(i)+Hi(pt(i))=β2jEijvpij(pt(i),pt(j))+Hi(pt(i))The time derivative of Vl can be expressed and simplified, using (19), as (24)V̇l=Hipt(i)ut(i)+β2jEijvpijpt(i)ut(i)+jEijvpijpt(j)ut(j)=Hipt(i)ut(i)+β2jEijvpijpt(i)ut(i)+jEijvpjipt(j)ut(j)=Hipt(i)ut(i)+β2jEijvpijpt(i)ut(i)+jEijvpijpt(i)ut(i)=Hipt(i)ut(i)+βjEijvpijpt(i)ut(i) For the case ut,cov(i)+ut,px(i)α and using (15), (21), we get (25)V̇l=1kgut,cov(i)2+ut,cov(i)ut,px(i)+βkput,px(i)2+ut,cov(i)ut,px(i)=ut,cov(i)2kg+βut,px(i)2kp+kp+βkgkgkput,cov(i)ut,px(i) On choosing β=kp/kg and further simplifying, (25) can be rewritten as (26)V̇l=1kgut,cov(i)+ut,px(i)20Similarly, for the case of ut,cov(i)+ut,px(i)>α, one can show that (27)V̇l=αkgut,cov(i)+ut,px(i)0Therefore, from (26), (27), we can see that V̇l<0pt(i) such that ut(i)(pt(i))=0. Moreover, V̇l=0 only for ut(i)=0i which corresponding to an optimum spatial configuration of the multi-robot network that minimises Vl. Hence, using Lyapunov theory of stability, it can be concluded that the proposed control action ut(i)i drives the ith robot position to the optimum configuration that minimises Vl. Furthermore, the choosing β=kp/kg intuitively implies that for a sufficiently large choice of kp w.r.t kg, the control action ut(i) can guarantee that ith robot remain connected to all its communication network neighbours during the time of deployment. In other words, based on the mission requirements, one can choose a large proximity gain for a conservative spatially diverse movements of each robot.

Remark 4 备注4

The action ut,cov(i) drives the robot to the centroid of their respective LIVC. When ut,prox(i) is implemented along with ut,cov(i), the boundedness of each robot to their respective LIVC may not be guaranteed. Therefore, an explicit inter-robot collision avoidance behaviour can be enforced by appending an artificial potential field based inter-robot collision avoidance action to ut(i). Moreover, a similar action can also be incorporated to account for unexpected or dynamic obstacles that can be sensed using onboard sensors.

Remark 5 备注 5

The choice of Rc,min may affect the spatial diversity achieved by ut,cov(i) due to ut,prox(i) ensuring the network communication constraints. By increasing the activation radius Rc,min, the restrictive proximity control would not be activated for relatively longer period as compared to a smaller Rc,min, allowing ut,cov(i) to diversely spread the robots encouraging exploratory behaviour over D. However, choosing a Rc,min too close to the communication limit Rc,max can reduce the effectiveness of ut,prox(i) in ensuring network connectivity. Therefore, Rc,min along with kp needs to be chosen to ensure sufficient exploratory behaviour and network connectivity. Additionally, the choice of the underlying spanning subgraph G would also affect the overall exploratory behaviour of the proposed algorithm as it is designed to maintain the edges of G to ensure overall network connectivity. Therefore, it is recommended that the G be chosen such that the overall number of edges are minimised while ensuring sufficient exploratory behaviour.

In this section, we discussed the convergence property of the robot trajectories resulting from coordination and network constraints adhering control direction. However, it is possible that the waiting for each robot to converge onto the previously discussed optimum configuration may incur larger time penalties. As the STE goal is a time-sensitive from a practical standpoint, it is desirable to further discuss the choice of time interval between successive sampling instants along the trajectories resulting from application of ut(i). We propose two different choices of sampling intervals decision protocols in the next subsection.

5. Choice of sampling interval
5. 采样间隔的选择

In this section, we discuss two sampling interval choice methods called fixed-interval and informative-interval based sampling strategies. Let Δ̄k denotes the sampling interval between sampling instants tk and tk+1.

5.1. Fixed-interval based sampling strategy
5.1. 基于固定间隔的采样策略

A fixed interval based sampling strategy assumes a common pre-specified fixed sampling interval, denoted by Δ̄k=Δ̄. Each robot, while implementing the control action given by ut(i), stops every Δ̄ time interval to sample the environment and perform the belief consensus update step. A fixed sampling interval structure in theory does not require inter-robot communication to synchronise the sampling action. However, it is important to note that choosing the sampling interval to be small would lead to higher number of sampling instants which would result in longer mission time periods. Whereas, for a choice of large sampling interval, the fixed-interval sampling strategy maybe result in longer mission times and potentially insufficient sampled data for an accurate estimation. Furthermore, the choice of the sampling interval is effectively upper bounded as for a sufficiently larger sampling intervals the robots would just converge to the locally optimum configuration due to the convergent properties of the resultant robot trajectories as discussed in previous section.

5.2. Informative-interval based sampling strategy
5.2. 基于信息区间的抽样策略

In this subsection, we proposed a distributed informative interval strategy for sampling the environments along the resultant trajectories from ut,cov(i), to eliminate the requirement of choosing an appropriate sampling interval a priori. We define an informative sampling interval choice by examining M potential sampling intervals and computing the predicted information gain corresponding to each candidate of sampling interval. We use Shannon’s entropy as the index of evaluating information gain.

Let {Δ1,,Δm,,ΔM} denote the set of M potential sampling interval choices to examine. The ith robot acquires the positional information of its spatial neighbours jNc,t(i)i at the current sampling instant, say tk, followed by generating trajectory solutions for all jNc,t(i) using the robot kinematics (1) along with the control action (15) and propagating the robot positions until time tk+ΔM, while storing the positional information of jNc,t(i) at times tk+Δmm{1,,M} as shown in Fig. 4. Note that the propagated trajectories are generated using ut,cov(i) instead of ut(i) as computing ut,prox(j) for all jNc,t(i)i would require positional information of robots that may not be belong to the set Nc,t(i). The term ut,cov(j) for all jNc,t(i) is computed under the assumption that predicted spatial trajectories of all jNc,t(i) are only affected by the robots jNc,t(i).

  1. Download: Download high-res image (278KB)
    下载:下载高分辨率图像 (278KB)
  2. Download: Download full-size image
    下载:下载全尺寸图像

Fig. 4. Local trajectory propagation and storing of predicted positions of robots A, B, and C for m{1,2,3,4,5} by the robot B. Robots A and C are the detected neighbours of robot B at time tk.

Remark 6 备注 6

Increasing the number of interval candidates, M, and potentially increasing the maximum sampling interval, ΔM, can improve the proposed sampling method’s performance as it improves the chances of finding the best interval choice. However, generating a predicted posterior for each M sampling interval candidates and computing their respective information gain increases storage and processing time. The choice of ΔM and M needs to be made based on the available computational authority.

Let the predicted (or propagated) position of the jthNc,t(i) robot at time tk+Δm computed by the ith robot be denoted by pk+Δm(i,j). Therefore, the predicted concentrations readings at pk+Δm(i,j) can be computed using the posterior belief of the source term obtained at tk, p(Θk(i)|Ik), as (28)zk+Δm(i,j)=Cpk+Δm(i,j),np=1Npwk,np(i)Θk,np(i).Furthermore, on computing zk+Δm(i,j) jNc,t(i) and m1,M, the ith robot can generate the predicted resultant posterior belief, p(Θk+Δm(i)|Ik,Zk+Δm(1:n)), for a m1,M as (29)wˆk+Δm,np(i)=wk,np(i)jNc,t(i)pzk+Δm(i,j)|Θk,np(i)wk+Δm,np(i)=wˆk+Δm,np(i)np=1Npwˆk+Δm,np(i).Finally, we can compute the Shannon’s entropy corresponding to each sampling interval Δm as per the following (30)Hs(Θk+Δm(i))=np=1Npwk+Δm,np(i)log2(wk+Δm,np(i)).Based on the computed Shannon’s entropy corresponding to each sampling interval Δm, we choice of the best local sampling interval for the ith robot as (31)Δ̄k(i)=minm1,MHs(Θk+Δm(i)).

Remark 7 备注7

For the case with multiple sampling interval candidates result in the minimum Shannon’s entropy, the smallest candidate is chosen in order to trigger the sampling action as quickly as possible while ensuring the maximum information gain from all the potential candidates, Δm.

On computing the appropriate informative local sampling interval choice, each robot in the network performs a distributed average-consensus algorithm (ACA) as presented in [55]. The ACA allows each robot to fuse their belief of a parameter, in a weighted and distributed manner, with the beliefs of their neighbour in order to converge onto a weighted average belief of the consensus variable. The weights chosen by each robot are based on the maximum degree associated them in the network. Therefore, the post-consensus choice of the informative sampling interval Δ̄k as (32)Δ̄k=ACA(Δ̄k(1:n)).It should be noted that the ACA algorithm is time and network efficient as the consensus is performed over a scalar variable and in a distributed fashion. The post-consensus informative sampling interval Δ̄k is used by each robot as the horizon for the application of the control action (22). The sampling action for each robot is triggered after the passage of Δ̄k time, followed by the belief update procedure and the choice of next informative sampling interval.

  1. Download: Download high-res image (715KB)
    下载:下载高分辨率图像 (715KB)
  2. Download: Download full-size image
    下载:下载全尺寸图像

Fig. 5. Distributed multi-robot estimation and path planning algorithm with informative sampling interval protocol for STE of an unknown gas leak in an open environment.
图 5.分布式多机器人估计和路径规划算法,具有信息采样间隔协议,用于开放环境中未知气体泄漏的STE。

Remark 8 备注 8

It should be noted that the predicted trajectories pk+Δm(i,j)jNc,t(i) generated for the selection of the informative sampling interval may not necessary be the actual trajectories the robots travel over the horizon Δ̄k. However, as the control action (15) is designed based on locally available information, the actual trajectories of each robot are expected to be within the neighbourhood of the predicted sampling location candidates. Furthermore, the average consensus over the sampling interval is expected to mitigate some effects of the error between the predicted and actual trajectories of the robots.

5.3. Overall architecture flowchart and algorithmic summary
5.3. 整体架构流程图和算法总结

The proposed distributed multi-robot path planner (DMrPP) for STE is summarised using its architectural flowchart and an algorithmic summary. Fig. 5 summarises the flowchart of information to be performed by the ith robot to achieve the objective of STE over an distributed multi-robot network and Algorithm 2 codifies the same in a pseudo code.

The condition for termination of search can be chosen based on the design requirements of the mission. One such termination condition which can be evaluated in a distributed manner is the evaluating the certainty of the post-consensus posterior belief. The normed weighted standard deviation of the posterior belief, σest,k,i, reaching below a tolerable threshold, σthres is used as a measure for termination of the algorithm. Another termination criterion can be the a prespecified maximum mission time budget or a combination of a max mission time budget along with σest,k,i. The completely distributive property of the proposed multiagent STE algorithm renders itself to be robust in scenarios where new robots are added to the network during the mission.

  1. Download: Download high-res image (696KB)
  2. Download: Download full-size image

6. Simulation study 6. 模拟研究

To analyse the performance of the proposed algorithm, DMrPP, we first present a few illustrative scenarios demonstrating the path planning and belief update processes. Subsequently, we discuss a Monte Carlo (MC) study comparing DMrPP with its various ablations along with an existing distributed multiagent STE algorithm presented in the literature [35]. The MC comparison study is designed to evaluate the importance and performance features of each key component of the proposed algorithm. The key features of the proposed algorithm are namely - (a) the distributed consensus filtering based belief update, (b) coverage control based coordination protocol, and (c) information-theoretic choice of sampling intervals.
为了分析所提出的算法DMrPP的性能,我们首先提出了一些说明性场景,演示了路径规划和信念更新过程。随后,我们讨论了蒙特卡洛(Monte Carlo, MC)研究,该研究将DMrPP与其各种消融以及文献中介绍的现有分布式多药STE算法进行了比较[35]。MC 比较研究旨在评估所提出算法的每个关键组件的重要性和性能特征。该算法的主要特点是:(a)基于分布式共识滤波的信念更新,(b)基于覆盖控制的协调协议,以及(c)采样区间的信息理论选择。

The first ablation of the proposed algorithm, denoted by A1, replaces the consensus based belief update with a measurement flooding based belief update [35], [42]. The flooding based design aims to communicate all measurements obtained at the current instance across the network, followed by each robot constructing a posterior belief using all currently sampled measurements. The objective of comparing the proposed algorithm with A1 is to establish the convergence properties of the STE error and its associated uncertainty over mission. The second ablation of the proposed algorithm, denoted by A2, replaces the information-theoretic sampling interval choice with a fixed sampling interval.

Finally, we compare the proposed algorithm with distributed multi-robot strategy for STE presented in [35], hereby denoted as BR20. The BR20 algorithm employs a controlled flooding protocol to achieve measurement dissemination across the robot network followed by each robot independently sequentially updates its belief using a Rao-Blackwellised particle filter. Furthermore, a formation-based path planning protocol is employed in BR20, where local control actions optimally chosen over a discrete action set such that the predicted entropy-reduction is maximised, followed by a distributed average consensus protocol to converge on a control action that ensures the group-formation is maintained (consequently the connectivity of the network). We have presented a comparison study with the path planner BR20 as it is the most similar path planning technique in the literature that is designed to perform coordinated action among the accounts, while accounting for range limited communication along with examining multiple sampling interval candidates. To ensure a fair comparison study, we implemented the BR20 algorithm with traditional particle filter (instead of a Rao-Blackwellised filter) for constructing local posterior updates and a vehicle motion model described by (1) (instead of the unicycle-like motion model). Consequently, the feasible action set U used for optimisation is defined as U:V×U×T, where V={α} is the linear speed set, the motion direction set U={eiγπ/8}γ=0,1,,7, and the duration of motion set T is chosen same as the T={Δm}m=1M.

Table 1. Simulated environmental parameters.
表 1.模拟环境参数。

Environmental parameters 环境参数Value 价值
Search domain, D50×50m2
Wind speed, w4 m/s 4 米/秒
Wind direction, ϕ270° (along the negative Y-axis)
270°(沿负Y轴)
Diffusivity, D1 m2/s
Average particle lifetime, ξ5 s 5 秒

Table 2. Consensus particle filter parameters and priors.
表 2.共识粒子滤波器参数和先验。

Empty CellConsensus filter properties
共识筛选器属性
Value 价值
Parameters 参数Number of particles, Np10 000
Number of consensus steps, L3
Probability of detection, Pd85%
Priors 前科Source xo-position,U(0,50)
Source yo-position,U(0,50)
Source release rate, 源释放率,Γ(2,4)

Table 3. Robot operational parameters.
表 3.机器人操作参数。

Operation parameters 操作参数Value 价值
Max. robot speed, α2 m/s 2 米/秒
Sensor threshold, zthr1 μg/m3
Coverage Control gain, kg1000
Proximity gain, kp0.1
Communication range, Rc,max25% of 50 m 50米的25%
Proximity distance threshold, Rc,min15% of 50 m 50米的15%
Control update time, dt100 ms 100 毫秒

The simulated environmental parameters, filter priors and parameters, and robot operational parameters for the presented study are provided in Tables 1, 2, and 3, respectively. Each robot is assumed to be equipped with a single fast response chemical sensor capable of detecting down to a minimum concentrations of 1 μg/m3. The gas measurements are synthetically simulated by adding Gaussian noise with σν equal to 40% of the true concentration at the sampling location. Additionally, M=3 sampling interval candidates are chosen for the computation of the information-theoretic sampling interval. These sampling interval candidates are chosen as Δm=m0.2|D|Mα with m1,M and |D| denoting the largest side length of the search domain. Thus, the maximum sampling interval ΔM=0.2|D|α and such a design of the sampling interval candidates allows an Dappropriate choice of sampling horizon. Furthermore, the robots start near the bottom left corner of D (denoted as the origin in subsequent plots) in an equilateral and regular pentagonal formation centred at (5,5) with a side length of 2 m. Furthermore, the spanning tree G chosen for ensuring network connectivity constraints is shown in Fig. 6. The ground truth about the source will be appropriately discussed in the upcoming subsections. The uncertainty threshold σthres is chosen to be 3 units. The choice of σthres=3 is based on a acceptable tolerance of 4% of |D| in positional estimate and a tolerance of 1 g/s in the estimated release rate. Furthermore, it is assumed that each robot takes tsampling=5 s during each environmental sampling action to collect a measurement and update its belief.

6.1. Illustrative run 6.1. 说明性运行

In this section, we illustrate the performance of the proposed distributed multi-robot STE algorithm DMrPP using a scenario with a 5-robot network. The chemical release source is considered to located 35 m east and 45 m north of the origin and releasing chemical at a release rate of 5 g/s. The simulation results obtained from an illustrative run of the proposed algorithm are presented in Fig. 7, which contain the robot trajectories, history of sampled locations, current robot locations and belief at different time instants. The previously sampled locations are indicated with a filled marker, whereas, the current robot locations are indicated with a slightly large unfilled marker corresponding to each robot. Furthermore, the positional estimates of source location as predicted by each robot is marked with a filled star marker, whereas, the release rate estimates are marked in the with a vertical line.

The mission successfully completed by the 5-robot network within 11 sampling instances and total mission time of 91.6 s, where 41.6 s were spent in transit between sampling locations. The final source term estimates for each robot are tabulated in Table 4. It can be seen that the final converged estimated source term as predicted by each robot lies within a bound of 0.5 units from ground truth. Additionally, the sampling intervals Δ̄k chosen over the mission are plotted in Fig. 8(a). It can be noted from Fig. 8(a) that the robot network chooses larger sampling intervals at the early stage of the mission to quickly cover the search area and uses smaller intervals towards the end of the mission to refine the estimation results.

  1. Download: Download high-res image (2MB)
    下载: 下载高分辨率图像 (2MB)
  2. Download: Download full-size image
    下载:下载全尺寸图像

Fig. 7. An illustrative run for the proposed algorithm at different times.
图 7.所提出的算法在不同时间的说明性运行。

Furthermore, the history of all the graph edges for a complete 5-robot network is presented in Fig. 8(b). The periods of active status for an edge is marked by a coloured patch, whereas, an inactive status is denoted by a white patch. For example, the edge between the robots indexed 1 and 4, denoted by E14, was active in the beginning of the mission, however, it became inactive as the mission progressed and the inter-robot distance grew larger than the communication range Rc,max. It is worthy to note that the graph edges that also belonged to the proximity enforcing spanning subgraph, as shown in Fig. 6, remain active throughout the mission. The edges that are part of G are denoted by a dark orange patch in Fig. 8(b). Additionally, the edge E15 between robots 1 and 5 also remains active throughout the mission providing a redundant edge to ensure network connectivity. Thus, Fig. 8(b) demonstrates that the connectivity of the robot network was ensured by the proposed control action (22) by ensuring the active status of EijG.

  1. Download: Download high-res image (322KB)
    下载:下载高分辨率图像 (322KB)
  2. Download: Download full-size image
    下载:下载全尺寸图像

Fig. 8. (a) Information-theoretic interval, Δ̄k, chosen over the mission. (b) Graph edge activity throughout the mission. The edges highlighted with orange colour also belong to the proximity enforcing subgraph G.

Table 4. Source term estimates by each robot at the end of the mission, where true source term is Θ=[35m45m5g/s].

Robot index 机器人索引Θˆ(i)(xs)mΘˆ(i)(ys)mΘˆ(i)(Qs)g/s
135.028045.13355.4636
235.032045.14305.4415
335.028045.13345.4605
435.039845.13215.4431
535.033545.13795.4592

6.2. Scalability illustration: addition of a new robot

In order to demonstrate the robustness of the proposed framework to addition of a new robot, we present a simulation study that considers a larger field size and total number of robots along with addition of a new robot at some user defined time at a reasonably chosen robot location in the region of interest. For the simulation study, we consider total of 10 robots, with 9 robots start at the beginning of the mission starting in a 20×20m box in the bottom left corner of square search domain of side-length 100m. The 10th robot is added at the 35 s mark at {50,20}m in an open region of interest. The communication range is assumed to be 25m between two robots. The spanning subgraph, G, over the course of the mission is presented in Fig. 9. The source is located at 70 m east and 80 m north from the bottom left corner with a release rate of 10 g/s.

Recall that G is locally stored on each robot before the start of the mission just in the form of a list of robot ids as shown in the first row of Table 5. As the mission progresses as per the proposed strategy, the 10th robot is activated at the 35 s mark and is within communication range of robots 1 and 2. Therefore, robots 1, 2 and 10 update their proximity neighbour lists accordingly as shown in the second row of Table 5.

  1. Download: Download high-res image (210KB)
  2. Download: Download full-size image

Fig. 9. Spanning Subgraph, G over the course of mission. The left subgraph is chosen before the start of the mission and is appropriately updated after the 10th robot is added to the network after 35 s.

  1. Download: Download high-res image (2MB)
  2. Download: Download full-size image

Fig. 10. An illustrative run for the proposed algorithm when a new robot is added after the 35 s mark during the mission. An animated video of the same can be found in the supplementary material.

The simulation results for the aforementioned scenario are presented in Fig. 10. The final estimation error of the source location and release rate, averaged over all the robots, are 0.97m and 0.0995g/s, respectively. The robot network needed a total of 8 sampling instant, out of which the network only had 9 robots for the first three instances. It can be seen from Fig. 10(a) that Robot 10 is not connected to any other robot in the communication graph as it is currently inactive. Whereas, after the 35 s mark, Robot 10 is within communication range of Robots 1, 2, 3, 4, and 5 as seen in Fig. 10(b). As part of the G neighbour update protocol, the robot 10 adds any two neighbouring robots at random to its own list and communicates the same to the randomly chosen robots in order for them to update their list. For the illustrative case, the robots 1 and 2 were randomly chosen as part of the update procedure as presented in Fig. 9 and Table 5. The overall mission time was 93 s, where 58 s were spent in transit. Thus, it can be concluded that the addition of new robots can be robustly handled by the proposed framework. Furthermore, this consequently substantiates the scalability of the path planner discussed in Section 4.

Table 5. List of G neighbours for each robot during the mission.

# Robot12345678910
G neighbour21,32,43,54,65,76,87,98
ids (t<35s)
G neighbour2,101,3,2,43,54,65,76, 8,7,981,2
ids (t35s)10

6.3. Monte Carlo study

In this subsection, we present a Monte Carlo (MC) simulation study to evaluate the performance of the proposed algorithm along with comparing it to its ablations, A1 and A2, and Algorithm BR20 in a statistically relevant sense.

The MC study is designed using 25 randomly chosen source locations and 8 discretely chosen release rates. The source locations are generated by uniformly sampling the upwind region of the search domain D, and the chosen release rates are Qo1=6g/s, Qo2=8g/s, Qo3=10g/s, Qo4=12g/s, Qo5=14g/s, Qo6=16g/s, Qo7=18g/s, and Qo8=20g/s. Therefore, a total of 200 simulations per algorithm are generated using a combination of the aforementioned source locations and release rates. All other simulation and operational parameters are the same as previously mentioned in Table 1, Table 2, Table 3. The performance index employed to perform the comparison study are the following:

  • Averaged Estimation Error (AER): the AER at any time t[tk,tk+1] is defined as the 2-norm of the difference between the estimated source term vector and the ground truth source term averaged over all the robots. Mathematically, AER can be expressed as (33)AER(t)=(1/n)iΘˆk(i)Θ2where Θˆk(i)=np=1Npwk,np(i)Θk,np(i) and Θ=[xs,ys,Qs] is the ground truth vector.

  • Effective Weighted Standard Deviation: The norm of weighted standard deviation vector, denoted by σest,t at t[tk,tk+1], can be interpreted as the effective certainty of the estimated source term averaged over the robot network, and can be computed as (34)σest,t=(1/n)inpwk,np(i)Θk,np(i)Θˆk(i)22.

  • Mission Time (MT): The time of σest,k dropping below σthres or a maximum of 175 s. A maximum mission time criterion is added to bound the simulation run time and examine the performance of various algorithms under limited time constraints.

  • Success Ratio (SR): Ratio of successful mission to total missions, where successful mission are defined as runs where the estimated source term vector, Θˆk, is within a 1-σthres bound around the ground truth. The SR demonstrates the effectiveness of an algorithm to accurately estimate the source term.

6.3.1. Comparison with A1 and A2

In this subsection, we present the MC study data comparing the proposed algorithm with the ablations A1 and A2, where results for Algo. A2 are provided for three different fixed sampling intervals. The overall time profiles of AER and σest,k averaged over the 200 simulation runs for the DMrPP algorithm and, A1 and A2 are presented in Fig. 11, Fig. 12, respectively. Furthermore, the overall mission times and success ratio of the MC study are tabulated in 6. To provide additional insight, the mission times and success ratio are further categorised for low-release rates and high release rates, where Qo1 to Qo4 are classified as low-release rates and Qo5 to Qo8 classified as high-release rates.

It can be noted from Fig. 11(a) that the proposed algorithm outperforms A1 and results in a lower overall estimation error for the robot network. The relatively larger estimation error achieved by A1 as compared to DMrPP can be attributed to the overconfidence in the STE resulting from a measurement flooding based belief update, as can be seen in Fig. 11(b). As measurement flooding based belief update uses all measurements available across the robot network, the resulting updated belief gives equal importance to each measurement. Therefore, the resulting posterior belief is sensitive to the faulty or anomalous measurements (e.g., miss-detection). However, the proposed consensus based belief update design tends to construct a KL weighted average of local posterior beliefs resulting in reduced sensitivity faulty or anomalous measurements and/or local beliefs, which is a desirable feature. Additionally, a measurement flooding based belief update design can incur large network communication costs, especially for larger networks. However, the posterior consensus based belief update design only requires limited communications to local neighbours, resulting in higher scalability to larger networks. Moreover, it can be noted from Table 6 that DMrPP algorithm has a 11% higher success ratio than its A1 ablation, however, A1 has a relatively lower mission times for both low and high release rates as well as the overall study. This further substantiates the previously discussed over-confident behaviour exhibited by measurement flooding based belief update.

  1. Download: Download high-res image (283KB)
  2. Download: Download full-size image

Fig. 11. Monte Carlo Study results for (DMrPP) and A1 algorithms averaged over 200 runs.

  1. Download: Download high-res image (455KB)
  2. Download: Download full-size image

Fig. 12. Monte Carlo Study results for DMrPP and A2 averaged over 200 runs, with A2 simulated with three different fixed sampling intervals of 1 s, 4 s and 7 s.

  1. Download: Download high-res image (407KB)
  2. Download: Download full-size image

Fig. 13. Monte Carlo Study results for DMrPP and BR20 averaged over 200 runs.

The comparison study with the ablation A2 is performed for three fixed sampling intervals of 1 s, 4 s and 7 s. It can be seen from Fig. 12, Fig. 12 that the DMrPP algorithm significantly outperforms A2 with a fixed sampling interval of 1 s. Whereas, the averaged AER and σest,k for sampling intervals of 4 and 7 s reduces at a marginally similar rates as achieved by the DMrPP algorithm. The final averaged estimation error at the end of the mission by DMrPP is marginally lower than the cases of Δ̄=4 s and Δ̄=7 s.

It can be noted from Table 6 that the DMrPP algorithm has an overall 15% higher SR as compared to the 1 s sampling interval case, along with an overall 4% higher SR than the 4 s and 7 s cases of A2 ablation. Furthermore, the average mission times for DMrPP algorithm are 51.9 and 6.6 s lower than the 1 s and 4 s fixed sampling cases. Similar trends can be seen for both low and high release rate scenarios, with a fixed 4 s sampling interval achieving a percentage point over information theoretic sampling interval in the high release rate cases. Note that the sampling interval candidates for the DMrPP algorithm are 1.67 s, 3.34 s and 5.1 s. Therefore, the similar performance of DMrPP to A2 for the cases of Δ̄ = 4 s is primarily due to a curated choice of the fixed sampling intervals as per the size of the region of the interest. Moreover, the marginally poor performance of DMrPP against A2(Δ̄=7s) is most likely due to its candidate set being limited to 5.1 s. Moreover, The information theoretic sampling interval offers superior design flexibility compared to manual tuning of a fixed sampling interval (Δ̄) to achieve optimal performance under the A2 ablation. The informative sampling interval candidate set adapts to factors such as search domain size and robot speed while delivering superior performance, as shown in Table 6.

Table 6. Overall of average mission times and success ratio for the proposed algorithm DMrPP and its ablations, A1 and A2.

Empty CellEmpty CellLow releaseHigh releaseOverall
Empty CellEmpty Cellrates (Qo14)rates (Qo58)Empty Cell
MT (s)DMrPP96.0137.8116.9
A172.694.383.5
A2 (Δ̄=1 s)164.7172.8168.8
A2 (Δ̄=4 s)101.3145.7123.5
A2 (Δ̄=7 s)87.1131.8109.5
SR (%)DMrPP927584
A1865973
A2 (Δ̄=1 s)696869
A2 (Δ̄=4 s)847680
A2 (Δ̄=7 s)887180

6.3.2. Comparison with BR20

The multi robot source term estimation strategy BR20 uses a formation based path planning technique to compute the next sampling location such that predicted information gain is maximised. Therefore, we perform the comparison study between BR20 and DMrPP using 2 different initial configurations. The first starting locations of the robot network, denoted by SL1, the robots start near the bottom left corner of D in an equilateral and regular pentagonal formation centred at (5,5) with a side length of 2 m. The second starting location, denoted by SL2, is also an equilateral and regular pentagonal formation centred at (6,6) with a side length of 5 m.

The resulting profiles of the AER and σest,k for the DMrPP and BR20, with SL1 and SL2 spatial configurations of the robot-network at the beginning, are presented in Fig. 13. It can be seen that the proposed algorithm outperforms the BR20 strategy by an healthy margin on both, AER and σest,k, performance indexes. In other words, the DMrPP algorithm can estimate the source term vector more accurately and precisely than the BR20 strategy. Similar conclusions can be made from Table 7 which contains the mean SR and MT for the MC comparison study. The DMrPP algorithm has a minimum 30% overall higher success ratio against the BR20, with a significantly better performance for high release rate simulation runs. Furthermore, the DMrPP strategy may require a relatively higher mission time as compared to BR20, but its substantially higher SR compensates for it.

Table 7. Overall of average mission times and success ratio for DMrPP and BR20.

Empty CellEmpty CellEmpty CellLow releaseHigh releaseOverall
Empty CellEmpty CellEmpty Cellrates (Qo14)rates (Qo58)Empty Cell
MT (s)SL1DMrPP96.0137.8116.9
BR2074.779.577.1
SL1DMrPP81.8128.6105.2
BR2074.068.271.1
SR (s)SL2DMrPP927584
BR20582541.5
SL1DMrPP957485
BR20703954.5

7. Conclusion

In this paper, we presented a fully-distributed multi-robot source term estimation strategy that will empower the chemical sensing robots in large-scale search missions. The proposed method uses a gradient-descent based control action designed to move each robot towards a locally optimum location with respect to its neighbours for a common belief, while ensuring network connectivity. Furthermore, each robot chooses an information theoretic sampling interval candidate, followed by performing consensus over it across the network. On sampling the environment, each robot communicates only to its neighbours for a limited iteration to fuse their local beliefs to generate a KLD-averaged common belief across the network. The simulation study presents an illustrative scenarios to demonstrate the features of the proposed strategy, followed by an extensive Monte Carlo study designed to emphasis on the key advantage of the consensus-based belief update over measurement-dissemination strategies. It was also demonstrated that the information theoretic choice of the sampling interval exhibited design flexibility and a performance advantage over fixed sampling interval choices. The DMrPP strategy also outperformed the benchmark multi-robot STE strategy by a significant margin.Extending the proposed solution to be robust against robot failure and accounting for presence of obstacles in the environment are some interesting directions of future research. Furthermore, optimising for an ideal subgraph G that ensures sufficient exploratory behaviour can be an interesting direction of future research.

CRediT authorship contribution statement

Rohit V. Nanavati: Writing – original draft, Software, Methodology, Investigation. Matthew J. Coombes: Writing – review & editing, Supervision, Funding acquisition. Cunjia Liu: Writing – review & editing, Supervision, Methodology, Funding acquisition, Conceptualization.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Appendix A. Coordination protocol stability proof

In order to find the condition of optimality that minimises the cost function Hi w.r.t the robot location at time t[tk,tk+1], we set the partial derivative of (14) with respect to pt(i) to zero. Differentiating (14) and using Leibniz rule [56], one can simplify, we get (A.1)Hipt(i)=2Vi(pt(i)q)ϕk(q)dq+Vipt(i)q2ϕk(q)qpt(i)nˆi(q)dqwhere nˆi(q) denotes the outward normal vector to the boundary at point q. Furthermore, the product qpt(i)nˆi(q) denotes the velocity of the boundary at q. The boundary Vi consists of edges on either D or Vij=ViVj for all jNc,t(i)i. Therefore, (A.1) can be rewritten as (A.2)Hipt(i)=2Vi(pt(i)q)ϕk(q)dq+ViDpt(i)q2ϕk(q)×qpt(i)nˆi(q)dq+jNc,t(i)iVijpt(i)q2ϕk(q)qpt(i)nˆi(q)dq.Considering an infinitesimal motion of the ith robot, it holds that qpt(i)=0qViD, rendering the second term of (A.2) as zero. Furthermore, as the equality of (13) holds for qViVj, one can simplify the equality case as (A.3)pt(i)q2=pt(j)q2pt(i)2pt(j)2=2pt(i)pt(j)q.On differentiating (A.3) w.r.t pt(i), one can obtain (A.4)qpt(i)=2pt(j)pt(i)qpt(i).Recall that the shared boundaries of the locally induced Voronoi cells of the ith and jth robots are perpendicular bisectors of the line joining the two robots, that is, we can write nˆi=pt(j)pt(i)pt(j)pt(i). Therefore substituting the same in (A.4), we can write (A.5)qpt(i)nˆi(q)=qpt(i)2pt(j)pt(i)qVij.Therefore, using (A.5), the partial derivative Hipt(i) can be written as (A.6)Hipt(i)=2Vi(pt(i)q)ϕk(q)dq+jNc,t(i)iVijpt(i)q22pt(j)pt(i)qpt(i)ϕk(q)dq.=ut,cov(i)kg.Furthermore, choosing ut(i)=ut,cov(i), the Ḣ can be expressed as (A.7)dHidt=Hipt(i)Tṗt(i)=ut,cov(i)2kg0.It can be concluded from (A.7) that for ut(i)=ut,cov(i), the robots are driven in the direction that minimises their respective local optimising index (14) in a non-increasing fashion.

Appendix B. Supplementary data

The following is the Supplementary material related to this article.

Download: Download video (6MB)

Video S1. .

Data availability

Data will be made available on request.

References

Cited by (0)