
Access Point Deployment for Localizing Accuracy and User Rate in Cell-Free Systems

Fanfei Xu
Southeast University, School of Information Science and Engineering
Nanjing 210096, China

Shangqing Shi
Southeast University, School of Information Science and Engineering
Nanjing 210096, China

Shengheng Liu*
Southeast University, School of Information Science and Engineering
Nanjing 210096, China
s.liu@seu.edu.cn

Zihuan Mao
Southeast University, School of Information Science and Engineering
Nanjing 210096, China

Dazhuan Xu
Purple Mountain Laboratories
Nanjing 211111, China

Dongming Wang
Southeast University, School of Information Science and Engineering
Nanjing 210096, China

Yongming Huang*
Southeast University, School of Information Science and Engineering
Nanjing 210096, China

Abstract

Evolving next-generation mobile networks are designed to provide ubiquitous coverage and networked sensing. With its multi-view sensing and multi-node joint transmission capabilities, the cell-free architecture is a promising technique to realize this prospect. This paper tackles the problem of access point (AP) deployment in cell-free systems to balance sensing accuracy and user rate. By merging the D-optimality criterion with the Euclidean distance criterion, a novel integrated metric is proposed as the objective function for both max-sum and max-min problems, which respectively guarantee the overall and the lowest performance in a multi-user communication and target tracking scenario. To solve the resulting high-dimensional, non-convex multi-objective problem, soft actor-critic (SAC) is utilized to avoid the risk of converging to a locally optimal result. Numerical results demonstrate that the proposed SAC-based AP deployment method achieves a 20% improvement in overall performance and a 120% improvement in the lowest performance.

CCS CONCEPTS

  • Networks → Network performance modeling; Wireless access points, base stations and infrastructure; • Computing methodologies → Policy iteration; • Hardware → Wireless integrated network sensors.

    *Both authors are also affiliated with Purple Mountain Laboratories, Nanjing 211111, China.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

ACM MobiCom '24, November 18-22, 2024, Washington D.C., DC, USA

© 2024 Copyright held by the owner/author(s). Publication rights licensed to ACM. ACM ISBN 979-8-4007-0489-5/24/11

https://doi.org/10.1145/3636534.3698221

KEYWORDS

Cell-free, integrated sensing and communication, access point deployment, soft actor-critic, deep reinforcement learning

ACM Reference Format:

Fanfei Xu, Shengheng Liu, Zihuan Mao, Shangqing Shi, Dazhuan Xu, Dongming Wang, and Yongming Huang. 2024. Access Point Deployment for Localizing Accuracy and User Rate in Cell-Free Systems. In The 30th Annual International Conference on Mobile Computing and Networking (ACM MobiCom '24), November 18-22, 2024, Washington D.C., DC, USA. ACM, New York, NY, USA, 7 pages. https://doi.org/10.1145/3636534.3698221

1 INTRODUCTION

Integrated sensing and communication (ISAC) is expected to become a key usage scenario of future sixth-generation (6G) networks, as noted in the latest International Telecommunication Union (ITU) recommendation on the framework and objectives of future networks [10]. Telecommunication base stations, as widely deployed infrastructure, can significantly enhance situational awareness when equipped with radar sensing capabilities, enabling a variety of novel applications such as low-altitude drone intrusion detection and accurate virtual environment construction for digital twins [12]. Moreover, as wireless communication networks continue to evolve towards higher frequency bands, more spectrum resources become available to satisfy the high bandwidth requirements of sensing functions. However, high-frequency signals suffer from poor diffraction and high attenuation, so the performance of traditional single-station sensing is severely limited by the shadowing of obstructions.
Cell-free networks [9, 14], an innovative architecture for 6G networks, may tackle this issue. Unlike conventional cellular networks, where each user equipment (UE) is served by a single base station within a designated cell, cell-free networks deploy a large number of distributed access points (APs) that collaboratively serve all users within the area. Widely spread APs connected to a central processing unit (CPU) enable seamless sensing and ubiquitous coverage through key techniques such as dynamic user association and AP deployment [1]. Because the AP locations strongly affect the channel conditions, investigating the optimal AP deployment is essential to fully exploit the advantages of cell-free ISAC systems, such as multi-view sensing information and joint transmission.
Mathematical approaches such as vector quantization and gradient descent [2, 4, 13] have been proposed to optimize the two-dimensional (2D) locations of APs and improve spectral efficiency in cell-free networks. On-demand service capability in terms of sum throughput was achieved in [7] by solving the AP deployment problem with a multiple linear regression model. Moreover, AP deployment affects not only communication performance but also sensing performance. The Cramér-Rao lower bound (CRLB) for target velocity estimation in multiple-input multiple-output (MIMO) radar was developed in [6], showing that antenna placement significantly affects estimation accuracy. Also for MIMO radar systems, [3] developed the CRLB for target localization under both coherent and non-coherent processing. Additionally, based on the derived best linear unbiased estimator (BLUE), a closed-form localization estimate was provided that reveals the relationship among antenna locations, target location, and localization accuracy. Furthermore, the geometry gain of antenna deployment for target localization was analyzed for MIMO radar systems in [11]. In summary, substantial research has separately analyzed the impact of AP deployment on communication and on sensing performance. However, jointly considering both remains largely unexplored.
Thus, to measure sensing and communication performance simultaneously, we propose a unified evaluation metric for cell-free ISAC systems that merges the user rate, derived from the Euclidean distance criterion, with the localization accuracy [8], derived from the D-optimal criterion, and we use it as the objective function for AP deployment optimization. In addition, considering fairness among all UEs and localization accuracy along the entire target trajectory, we provide deployment results for both the max-sum and the max-min problems. Owing to the non-convex and high-dimensional nature of the original multi-objective problem, it is difficult to solve with conventional mathematical optimization algorithms. We therefore propose an AP deployment method based on soft actor-critic (SAC) [5], a deep reinforcement learning (DRL) algorithm whose additional entropy-maximization term over the AP deployment policy helps avoid local optima. Numerical results show that the proposed SAC-based deployment method outperforms other DRL algorithms such as deep deterministic policy gradient (DDPG) and twin-delayed DDPG (TD3).

2 ISAC SIGNAL MODEL AND PROBLEM FORMULATION

As illustrated in Fig. 1, we consider a cell-free ISAC system consisting of $M$ single-antenna transmitting APs and $N$ single-antenna receiving APs, which collaboratively serve $K$ single-antenna UEs and estimate the position of one target moving along a specific trajectory. Their 2D positions are denoted by $\mathbf{t}_{m}=[x_{m}^{\mathrm{t}}, y_{m}^{\mathrm{t}}]$, $\mathbf{r}_{n}=[x_{n}^{\mathrm{r}}, y_{n}^{\mathrm{r}}]$, $\mathbf{u}_{k}=[x_{k}, y_{k}]$, and $\mathbf{p}=[x^{\mathrm{p}}, y^{\mathrm{p}}]$, respectively, where $m \in \mathbb{M}=\{1, \ldots, M\}$, $n \in \mathbb{N}=\{1, \ldots, N\}$, and $k \in \mathbb{K}=\{1, \ldots, K\}$.

2.1 Fisher Information

Figure 1: Cell-free ISAC systems.

When the system executes the sensing function, the $M$ transmitting APs send probe signals and the $N$ receiving APs capture the echoes from the target. In this distributed detection system, the low-pass equivalent of the narrowband signal transmitted from the $i$-th AP at time $t$ is represented as $\sqrt{E/M}\, s_{i}(t)$, where $E$ denotes the total transmitted energy. We assume that the signals transmitted by different APs are mutually uncorrelated for any time delay $\tau$:
$$\int_{T} s_{i}(t)\, s_{j}^{*}(t-\tau)\, dt \approx \begin{cases} 1, & \text{if } i=j \\ 0, & \text{if } i \neq j \end{cases}$$
where $(\cdot)^{*}$ denotes the complex conjugate. In addition, each signal is normalized over the whole signal processing interval $T$, i.e., $\int_{T}|s_{i}(t)|^{2}\, dt=1$.
We adopt non-coherent processing, which is more practical than coherent processing because it imposes a lower time-synchronization requirement. The echo signal received at the $n$-th receiving AP is
$$y_{n}(t)=\sum_{m=1}^{M} \eta_{mn}\, s_{m}(t-\tau_{mn})+w_{n}(t)$$
where $w_{n}(t) \stackrel{\text{i.i.d.}}{\sim} \mathcal{CN}(0,1)$ represents additive Gaussian noise, $\boldsymbol{\eta}=[\eta_{11}, \eta_{12}, \ldots, \eta_{mn}, \ldots, \eta_{MN}]^{\mathrm{T}}$ is the coefficient vector accounting for target reflection and channel propagation fading, and $\tau_{mn}$ denotes the time delay of signal transmission and reflection between the $m$-th transmitting AP and the $n$-th receiving AP:
$$\tau_{mn}=\frac{d_{m}+d_{n}}{c}$$
where $c$ is the speed of light, and $d_{m}=\|\mathbf{t}_{m}-\mathbf{p}\|_{2}$ and $d_{n}=\|\mathbf{r}_{n}-\mathbf{p}\|_{2}$ denote the distances between the target $\mathbf{p}$ and the transmitting and receiving APs, respectively.
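The bistatic delay model can be evaluated for all transmitter-receiver pairs at once. The following is a minimal NumPy sketch; the function name `bistatic_delays` and the example coordinates are illustrative and not taken from the paper.

```python
import numpy as np

C = 299_792_458.0  # speed of light c in m/s

def bistatic_delays(tx, rx, p):
    """Return the M x N matrix with entries tau[m, n] = (d_m + d_n) / c.

    tx: (M, 2) transmitting-AP positions t_m
    rx: (N, 2) receiving-AP positions r_n
    p:  (2,)   target position
    """
    d_tx = np.linalg.norm(tx - p, axis=1)  # d_m = ||t_m - p||_2
    d_rx = np.linalg.norm(rx - p, axis=1)  # d_n = ||r_n - p||_2
    # Broadcasting adds every transmitter distance to every receiver distance.
    return (d_tx[:, None] + d_rx[None, :]) / C

tx = np.array([[300.0, 0.0], [0.0, 400.0]])
rx = np.array([[-300.0, 0.0]])
p = np.array([0.0, 0.0])
tau = bistatic_delays(tx, rx, p)
print(tau * C)  # bistatic path lengths d_m + d_n in metres (600 and 700)
```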
For localization accuracy, we use the D-optimal criterion to evaluate parameter estimation performance. This criterion minimizes the product of the eigenvalues (i.e., the determinant) of the CRLB matrix, which is equivalent to minimizing the area of the error ellipse (elliptical error probable). Since the Fisher information matrix (FIM) is the inverse of the CRLB matrix, minimizing the determinant of the CRLB matrix is equivalent to maximizing the determinant of the FIM. The FIM for estimating the parameter vector $\vartheta$ from the measurement vector $\mathbf{x}$ is
$$\boldsymbol{\Phi}=\mathrm{E}\left\{\left[\frac{\partial}{\partial \vartheta} \ln f(\mathbf{x} \mid \vartheta)\right]\left[\frac{\partial}{\partial \vartheta} \ln f(\mathbf{x} \mid \vartheta)\right]^{\mathrm{T}}\right\}.$$
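A short worked step, using only standard Gaussian-estimation facts rather than anything specific to this paper, links the D-optimal criterion to the error ellipse for the 2D position. If $\mathbf{C}=\boldsymbol{\Phi}^{-1}$ is the CRLB matrix with eigenvalues $\lambda_1$ and $\lambda_2$, the one-sigma error ellipse has semi-axes $\sqrt{\lambda_1}$ and $\sqrt{\lambda_2}$, so its area is

```latex
\mathcal{E} = \{\mathbf{e} : \mathbf{e}^{\mathrm{T}}\mathbf{C}^{-1}\mathbf{e} \le 1\}, \qquad
\operatorname{area}(\mathcal{E}) = \pi\sqrt{\lambda_{1}\lambda_{2}}
  = \pi\sqrt{|\mathbf{C}|}
  = \frac{\pi}{\sqrt{|\boldsymbol{\Phi}|}}
```

Maximizing $|\boldsymbol{\Phi}|$ therefore minimizes the ellipse area; the square root is monotone and does not change the maximizer.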
Specifically, when the estimation error is Gaussian, the non-coherent sensing FIM for target localization in the cell-free ISAC system is
$$\boldsymbol{\Phi}=\begin{bmatrix} \phi_{11} & \phi_{12} \\ \phi_{21} & \phi_{22} \end{bmatrix}=\mathbf{J}_{0}^{\mathrm{T}} \boldsymbol{\Sigma}^{-1} \mathbf{J}_{0}$$
where $\mathbf{J}_{0}$ is the Jacobian matrix evaluated at the target position $\mathbf{p}=[x^{\mathrm{p}}, y^{\mathrm{p}}]$ and $\boldsymbol{\Sigma}$ is the covariance matrix of the measurement error:
$$\mathbf{J}_{0}=\left[\alpha_{1}^{t}+\alpha_{1}^{r},\ \alpha_{1}^{t}+\alpha_{2}^{r},\ \cdots,\ \alpha_{1}^{t}+\alpha_{N}^{r},\ \cdots,\ \alpha_{M}^{t}+\alpha_{N}^{r}\right]^{\mathrm{T}}$$
where $\alpha_{m}^{t}=[\cos\theta_{m}^{t}, \sin\theta_{m}^{t}]^{\mathrm{T}}$, $\alpha_{n}^{r}=[\cos\theta_{n}^{r}, \sin\theta_{n}^{r}]^{\mathrm{T}}$, and $\theta$ represents the bearing angle of the AP relative to the target, measured from the horizontal axis. According to [11], the determinant of the angular FIM is
$$\begin{aligned} |\boldsymbol{\Phi}| = {} & \sum_{m=1}^{M} \sum_{n=1}^{N}\left(\cos\theta_{m}^{t}+\cos\theta_{n}^{r}\right)^{2} \sum_{m=1}^{M} \sum_{n=1}^{N}\left(\sin\theta_{m}^{t}+\sin\theta_{n}^{r}\right)^{2} \\ & -\left[\sum_{m=1}^{M} \sum_{n=1}^{N}\left(\cos\theta_{m}^{t}+\cos\theta_{n}^{r}\right)\left(\sin\theta_{m}^{t}+\sin\theta_{n}^{r}\right)\right]^{2} \end{aligned}$$
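The closed form is simply $\phi_{11}\phi_{22}-\phi_{12}^{2}$ for $\boldsymbol{\Phi}=\mathbf{J}_{0}^{\mathrm{T}}\mathbf{J}_{0}$. A minimal NumPy check, assuming $\boldsymbol{\Sigma}=\mathbf{I}$ (which the expression above implicitly does) and using hypothetical helper names and random angles:

```python
import numpy as np

def angular_jacobian(theta_t, theta_r):
    """Stack alpha_m^t + alpha_n^r (m outer, n inner) into an (M*N) x 2 matrix J_0."""
    a_t = np.stack([np.cos(theta_t), np.sin(theta_t)], axis=1)  # (M, 2)
    a_r = np.stack([np.cos(theta_r), np.sin(theta_r)], axis=1)  # (N, 2)
    return (a_t[:, None, :] + a_r[None, :, :]).reshape(-1, 2)

def fim_det_closed_form(theta_t, theta_r):
    """|Phi| from the angular closed form, assuming Sigma = I."""
    cx = (np.cos(theta_t)[:, None] + np.cos(theta_r)[None, :]).ravel()
    sy = (np.sin(theta_t)[:, None] + np.sin(theta_r)[None, :]).ravel()
    return np.sum(cx**2) * np.sum(sy**2) - np.sum(cx * sy) ** 2

rng = np.random.default_rng(0)
theta_t = rng.uniform(0.0, 2 * np.pi, size=3)  # M = 3 transmitting APs
theta_r = rng.uniform(0.0, 2 * np.pi, size=4)  # N = 4 receiving APs
J0 = angular_jacobian(theta_t, theta_r)
Phi = J0.T @ J0  # Phi = J_0^T Sigma^{-1} J_0 with Sigma = I
print(np.isclose(np.linalg.det(Phi), fim_det_closed_form(theta_t, theta_r)))  # True
```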
Next, we transform the determinant of the angular FIM into two-dimensional Cartesian coordinates and determine the optimal AP deployment positions by maximizing the FIM determinant.
$$\begin{aligned} |\boldsymbol{\Phi}| = {} & \sum_{m=1}^{M}\sum_{n=1}^{N}\left(\frac{x^{\mathrm{p}}-x_{m}^{\mathrm{t}}}{\|\mathbf{p}-\mathbf{t}_{m}\|_{2}}+\frac{x^{\mathrm{p}}-x_{n}^{\mathrm{r}}}{\|\mathbf{p}-\mathbf{r}_{n}\|_{2}}\right)^{2} \sum_{m=1}^{M}\sum_{n=1}^{N}\left(\frac{y^{\mathrm{p}}-y_{m}^{\mathrm{t}}}{\|\mathbf{p}-\mathbf{t}_{m}\|_{2}}+\frac{y^{\mathrm{p}}-y_{n}^{\mathrm{r}}}{\|\mathbf{p}-\mathbf{r}_{n}\|_{2}}\right)^{2} \\ & -\left[\sum_{m=1}^{M}\sum_{n=1}^{N}\left(\frac{x^{\mathrm{p}}-x_{m}^{\mathrm{t}}}{\|\mathbf{p}-\mathbf{t}_{m}\|_{2}}+\frac{x^{\mathrm{p}}-x_{n}^{\mathrm{r}}}{\|\mathbf{p}-\mathbf{r}_{n}\|_{2}}\right)\left(\frac{y^{\mathrm{p}}-y_{m}^{\mathrm{t}}}{\|\mathbf{p}-\mathbf{t}_{m}\|_{2}}+\frac{y^{\mathrm{p}}-y_{n}^{\mathrm{r}}}{\|\mathbf{p}-\mathbf{r}_{n}\|_{2}}\right)\right]^{2} \end{aligned}$$
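The Cartesian form lets $|\boldsymbol{\Phi}|$ be computed directly from AP and target coordinates. A minimal NumPy sketch (the name `fim_det` is hypothetical; $\boldsymbol{\Sigma}=\mathbf{I}$ and constant factors are dropped, as in the expression above). The two example geometries show that spreading the receivers around the target gives a positive determinant, whereas placing all APs on one line through the target makes the FIM singular:

```python
import numpy as np

def fim_det(p, tx, rx):
    """|Phi| at target position p for transmitting APs tx (M, 2) and receiving APs rx (N, 2).

    Sigma = I and constant factors are dropped; p must not coincide with an AP.
    """
    u_t = (p - tx) / np.linalg.norm(p - tx, axis=1, keepdims=True)  # rows (cos, sin) of theta_m^t
    u_r = (p - rx) / np.linalg.norm(p - rx, axis=1, keepdims=True)  # rows (cos, sin) of theta_n^r
    s = (u_t[:, None, :] + u_r[None, :, :]).reshape(-1, 2)  # one row per (m, n) pair
    sx, sy = s[:, 0], s[:, 1]
    return np.sum(sx**2) * np.sum(sy**2) - np.sum(sx * sy) ** 2

p = np.array([0.0, 0.0])
tx = np.array([[-1.0, 0.0]])
rx_spread = np.array([[0.0, -1.0], [0.0, 1.0]])      # receivers on either side of the target
rx_collinear = np.array([[-2.0, 0.0], [-3.0, 0.0]])  # all APs on one line through the target
print(fim_det(p, tx, rx_spread))     # 4.0
print(fim_det(p, tx, rx_collinear))  # 0.0: the y-coordinate is unobservable
```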

2.2 User Sum Rate

A cell-free uplink communication model is studied in this section, where all $M+N$ APs simultaneously receive signals from all $K$ UEs. The uplink signal received at the $l$-th AP is
$$y_{l}=\sum_{k=1}^{K} \sqrt{\rho}\, h_{lk}\, x_{k}+w_{l}$$
where $\rho$ and $x_{k}$ are the transmit power and the data symbol of the $k$-th UE, respectively, and $l \in \{1, 2, \ldots, M+N\}$. The signals received at all $M+N$ APs can be represented as
$$\mathbf{y}=\sqrt{\rho}\, \mathbf{H}\mathbf{x}+\mathbf{w}$$
where $\mathbf{x}=[x_{1}, x_{2}, \ldots, x_{K}]^{\mathrm{T}}$, $\mathbf{w}=[w_{1}, w_{2}, \ldots, w_{M+N}]^{\mathrm{T}}$, and $\mathbf{H} \in \mathbb{C}^{(M+N) \times K}$, with $\mathbf{H}[l, k]=h_{lk}$, is the channel coefficient matrix. The narrowband fading channel coefficient is written as $h_{lk}=\sqrt{\beta_{lk}}\, g_{lk}$, where $g_{lk} \stackrel{\text{i.i.d.}}{\sim} \mathcal{CN}(0,1)$ is the small-scale fading coefficient and $\beta_{lk}$ is the large-scale fading coefficient:
$$\beta_{lk}=\begin{cases} \dfrac{d_{0}}{\|\mathbf{t}_{l}-\mathbf{u}_{k}\|_{2}^{2}}, & \text{if } l \leq M \\ \dfrac{d_{0}}{\|\mathbf{r}_{l-M}-\mathbf{u}_{k}\|_{2}^{2}}, & \text{if } M<l \leq M+N \end{cases}$$
where $d_{0}$ is a constant denoting the reference distance.

In cell-free systems, the signals received by APs distributed at different geographic positions can be jointly processed at the CPU. For instance, zero-forcing (ZF) reception can be employed to suppress inter-user interference. The processed received signal is then
$$\hat{\mathbf{y}}=\left(\mathbf{H}^{\mathrm{H}}\mathbf{H}\right)^{-1}\mathbf{H}^{\mathrm{H}}\mathbf{y}$$
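A minimal sketch of the ZF step, assuming NumPy and randomly drawn i.i.d. Rayleigh channels. In the noiseless case it recovers the transmitted symbols exactly. `zf_snr` evaluates the standard post-ZF SNR $\rho/[(\mathbf{H}^{\mathrm{H}}\mathbf{H})^{-1}]_{kk}$ for unit-power symbols and $\mathcal{CN}(0,1)$ noise, a textbook result rather than a formula from the paper. Function names, sizes, and symbols are illustrative.

```python
import numpy as np

def zf_detect(H, y):
    """ZF estimate (H^H H)^{-1} H^H y, computed with a linear solve instead of an explicit inverse."""
    Hh = H.conj().T
    return np.linalg.solve(Hh @ H, Hh @ y)

def zf_snr(H, rho=1.0):
    """Per-UE post-ZF SNR rho / [(H^H H)^{-1}]_{kk} for unit-power symbols and CN(0, 1) noise."""
    G_inv = np.linalg.inv(H.conj().T @ H)
    return rho / np.real(np.diag(G_inv))

rng = np.random.default_rng(1)
L, K = 8, 3  # L = M + N APs, K UEs (illustrative sizes)
H = (rng.standard_normal((L, K)) + 1j * rng.standard_normal((L, K))) / np.sqrt(2)
x = np.array([1 + 1j, -1 + 1j, 1 - 1j]) / np.sqrt(2)  # hypothetical QPSK symbols
y = H @ x  # noiseless case: rho = 1, w = 0
print(np.allclose(zf_detect(H, y), x))  # True
```

Solving the normal equations avoids forming the inverse for detection; the inverse is only needed for its diagonal when computing the SNR.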
The asymptotic SNR can be expressed as
$$\frac{1}{M+N}\mathrm{SNR}_{k} \xrightarrow[(M+N) \to \infty]{\text{a.s.}} \rho\, \bar{\beta}_{k}$$
where $\bar{\beta}_{k} \triangleq \lim_{(M+N)\to\infty} \frac{1}{M+N}\sum_{l}\beta_{lk}$. To simplify the derivation, we set the transmit power of all UEs to unity, assume no shadow fading in the large-scale fading coefficient $\beta_{lk}$, and set the constant $d_{0}=1$. These assumptions do not affect the conclusions of the subsequent methods. Under them, the SNR of the $k$-th UE can be expressed through the Euclidean distance criterion:
$$\mathrm{SNR}_{k}=\sum_{m=1}^{M}\frac{1}{\|\mathbf{u}_{k}-\mathbf{t}_{m}\|_{2}^{2}}+\sum_{n=1}^{N}\frac{1}{\|\mathbf{u}_{k}-\mathbf{r}_{n}\|_{2}^{2}}$$
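The asymptotic limit behind this distance-based SNR can be checked numerically. The Monte Carlo sketch below draws hypothetical large-scale coefficients $\beta_{lk}$ (uniform on $[0.5, 2]$, an arbitrary choice), builds $h_{lk}=\sqrt{\beta_{lk}}\,g_{lk}$, and compares the post-ZF SNR scaled by $1/(M+N)$ with $\rho\bar{\beta}_k$ evaluated at finite $M+N$. The post-ZF SNR expression is the standard one for unit-power symbols and $\mathcal{CN}(0,1)$ noise, an assumption of this sketch.

```python
import numpy as np

rng = np.random.default_rng(2)
L, K, rho = 4000, 2, 1.0  # L = M + N, chosen large to approach the limit
beta = rng.uniform(0.5, 2.0, size=(L, K))  # hypothetical large-scale coefficients beta_lk
g = (rng.standard_normal((L, K)) + 1j * rng.standard_normal((L, K))) / np.sqrt(2)  # g_lk ~ CN(0, 1)
H = np.sqrt(beta) * g  # h_lk = sqrt(beta_lk) * g_lk

# Post-ZF SNR for unit-power symbols and CN(0, 1) noise.
snr_zf = rho / np.real(np.diag(np.linalg.inv(H.conj().T @ H)))
lhs = snr_zf / L               # SNR_k / (M + N)
rhs = rho * beta.mean(axis=0)  # rho * beta_bar_k, evaluated at finite M + N
rel_err = np.max(np.abs(lhs / rhs - 1))
print(rel_err)  # small relative error; it shrinks as L grows
```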
The sum communication rate of cell-free systems is
$$\begin{aligned} \sum_{k=1}^{K} R_{k} & =\sum_{k=1}^{K} \log\left(1+\mathrm{SNR}_{k}\right) \\ & =\sum_{k=1}^{K} \log\left(1+\sum_{m=1}^{M}\frac{1}{\|\mathbf{u}_{k}-\mathbf{t}_{m}\|_{2}^{2}}+\sum_{n=1}^{N}\frac{1}{\|\mathbf{u}_{k}-\mathbf{r}_{n}\|_{2}^{2}}\right) \end{aligned}$$
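Under the simplifications above (unit transmit power, $d_0=1$, no shadowing), the per-UE rates depend only on geometry. A minimal NumPy sketch; base-2 logarithms are an assumption since the text only writes $\log$, and the function names are hypothetical.

```python
import numpy as np

def snr_from_positions(users, tx, rx):
    """SNR_k = sum_m 1/||u_k - t_m||^2 + sum_n 1/||u_k - r_n||^2 (rho = d_0 = 1)."""
    aps = np.vstack([tx, rx])  # all M + N AP positions
    d2 = np.sum((users[:, None, :] - aps[None, :, :]) ** 2, axis=2)  # (K, M + N) squared distances
    return np.sum(1.0 / d2, axis=1)

def user_rates(users, tx, rx):
    """R_k = log2(1 + SNR_k); base 2 is an assumption, the text only writes log."""
    return np.log2(1.0 + snr_from_positions(users, tx, rx))

tx = np.array([[1.0, 0.0]])
rx = np.array([[-1.0, 0.0]])
users = np.array([[0.0, 0.0], [0.0, 1.0]])
R = user_rates(users, tx, rx)  # SNRs are 2 and 1, so R = [log2(3), 1.0]
print(R.sum(), R.min())        # sum rate and worst-user rate
```

The sum and the minimum of `R` are exactly the two rate terms used in the max-sum and max-min objectives.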

2.3 Deployment Problem Formulation

In the context of cell-free ISAC systems, the comprehensive consideration of communication and sensing performance can be modeled as a multi-objective optimization problem
$$\begin{aligned} \max_{\mathbf{t}, \mathbf{r}} \quad & U(\mathbf{t}, \mathbf{r})=(U_{1}, U_{2}) \\ \text{s.t.} \quad & x_{\min} \leq x_{i} \leq x_{\max}, \quad i=1, \ldots, M+N \\ & y_{\min} \leq y_{i} \leq y_{\max}, \quad i=1, \ldots, M+N \end{aligned} \tag{16}$$
The constraints in (16) require all APs to be deployed within a specified rectangular area.
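If a learning agent proposes AP coordinates outside the admissible area, one simple way to satisfy box constraints such as (16) is to project each coordinate back into its interval, which for a box is plain clipping. This is a hypothetical helper for illustration, not necessarily how the paper enforces the constraints.

```python
import numpy as np

def project_to_area(aps, x_min, x_max, y_min, y_max):
    """Euclidean projection of AP positions (rows [x, y]) onto the box [x_min, x_max] x [y_min, y_max]."""
    lower = np.array([x_min, y_min])
    upper = np.array([x_max, y_max])
    return np.clip(aps, lower, upper)  # bounds broadcast across all rows

aps = np.array([[-5.0, 50.0], [120.0, 30.0], [40.0, 60.0]])
print(project_to_area(aps, 0.0, 100.0, 0.0, 100.0))  # rows [0, 50], [100, 30], [40, 60]
```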
Directly solving for the Pareto front of a multi-objective problem is highly challenging. We therefore transform the original problem into a single-objective optimization problem.
Because localization accuracy and communication rate differ in physical dimension and scale, traditional weighted-sum approaches struggle to balance the two. We therefore adopt a multiplicative formulation to transform the original problem, which achieves a better trade-off between communication and sensing performance.
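A toy illustration, with made-up numbers that are not from the paper, of why a product is insensitive to the units of either objective while an equally weighted sum is not: rescaling one objective by a positive constant rescales every product by the same factor, so the maximizer does not move, whereas the weighted-sum winner can change.

```python
import numpy as np

# Made-up (sum rate, FIM determinant) values for three candidate deployments.
rate = np.array([10.0, 8.0, 6.0])
fim = np.array([1.0, 2.0, 3.5])

def best_weighted_sum(u1, u2, w=0.5):
    """Index of the candidate maximizing w * u1 + (1 - w) * u2."""
    return int(np.argmax(w * u1 + (1 - w) * u2))

def best_product(u1, u2):
    """Index of the candidate maximizing u1 * u2."""
    return int(np.argmax(u1 * u2))

for scale in (1.0, 100.0):  # e.g. |Phi| expressed in different units
    print(scale, best_weighted_sum(rate, scale * fim), best_product(rate, scale * fim))
# 1.0 0 2
# 100.0 2 2
```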
We first consider maximizing the sum communication capacity and the sum sensing accuracy along the target trajectory. In addition, to ensure uniform service for all UEs and consistent sensing accuracy throughout the entire trajectory, we also consider maximizing the minimum capacity and the minimum accuracy. The objective function can then be written as
$$U(\mathbf{t}, \mathbf{r})= \begin{cases} \sum_{k=1}^{K} R_{k}/Q \cdot \sum_{\mathbf{p}(\epsilon)}|\boldsymbol{\Phi}|/Q, & \text{for max-sum} \\ \min_{k \in \mathbb{K}} R_{k} \cdot \min_{\mathbf{p}(\epsilon)}|\boldsymbol{\Phi}|, & \text{for max-min} \end{cases}$$
where $\mathbf{p}(\epsilon)$ denotes the parametric representation of the moving target's trajectory, and $Q = \operatorname{card}(\epsilon)$ is the number of sample points on the trajectory.
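Both cases of the objective can be evaluated in a few lines of NumPy. This is a minimal sketch, assuming the per-UE rates $R_k$ and the per-sample values $|\boldsymbol{\Phi}|$ along $\mathbf{p}(\epsilon)$ have already been computed elsewhere; the function name `isac_utility` and its arguments are hypothetical.

```python
import numpy as np


def isac_utility(rates, phi_dets, mode="max-sum"):
    """Evaluate U(t, r) for the max-sum or max-min formulation.

    rates:    shape (K,), achievable rate R_k of each UE
    phi_dets: shape (Q,), sensing metric |Phi| at each trajectory sample p(epsilon)
    """
    rates = np.asarray(rates, dtype=float)
    phi_dets = np.asarray(phi_dets, dtype=float)
    q = phi_dets.size  # Q = card(epsilon), number of trajectory samples
    if mode == "max-sum":
        # (sum_k R_k / Q) * (sum_p |Phi| / Q)
        return float((rates.sum() / q) * (phi_dets.sum() / q))
    if mode == "max-min":
        # worst-case UE rate times worst-case sensing metric on the trajectory
        return float(rates.min() * phi_dets.min())
    raise ValueError(f"unknown mode: {mode!r}")
```

For example, `isac_utility([1, 2, 3], [4, 6])` gives $(6/2)(10/2) = 15$, while the max-min variant gives $1 \cdot 4 = 4$.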

3 SOFT ACTOR CRITIC BASED AP DEPLOYMENT

Because the original problem, whose objective is a product of multiple objective functions, is a complex non-convex optimization problem, traditional mathematical methods struggle to find the optimal solution. We therefore first reformulate it as a Markov decision process (MDP) and solve it with SAC. Compared with traditional deep reinforcement learning (DRL) algorithms, SAC adds a maximum-entropy term to its objective, which encourages exploration and mitigates the risk of converging to a locally optimal solution. This makes SAC particularly well suited to the complex multi-objective AP deployment problem.

3.1 MDP Formulation

Our MDP is represented by the tuple $(\mathbb{S}, \mathbb{A}, P, R)$. Compared with traditional SAC, both the state space $\mathbb{S}$ and the action space $\mathbb{A}$ are discretized here to reduce the size of the exploration space and accelerate training. $P: \mathbb{S} \times \mathbb{S} \times \mathbb{A} \rightarrow [0, \infty)$ is the state transition probability density of the next state $\mathbf{s}_{i+1} \in \mathbb{S}$ given the current state $\mathbf{s}_{i} \in \mathbb{S}$ and action $\mathbf{a}_{i} \in \mathbb{A}$. After taking an action, the agent receives the reward $r(\mathbf{s}_{i}, \mathbf{a}_{i}) \in \mathbb{R}$ according to the reward function $R: \mathbb{S} \times \mathbb{A} \rightarrow \mathbb{R}$. The specific designs of the states, actions, and rewards in our AP deployment problem are as follows.
  • State: Since the optimal AP positions depend strongly on the positions of the UEs and the target, the state is defined by their two-dimensional coordinates:
$$\mathbf{s}_{i} = \left\{\mathbf{u}_{1}, \ldots, \mathbf{u}_{K}, \mathbf{p}\right\}$$
  • Action: As described in Section 2, in our cell-free ISAC network both the communication capacity and the sensing accuracy depend on the AP locations. We therefore represent the action by the 2D coordinates of all $M+N$ APs:
$$\mathbf{a}_{i} = \left\{\mathbf{t}_{1}^{i}, \ldots, \mathbf{t}_{M}^{i}, \mathbf{r}_{1}^{i}, \ldots, \mathbf{r}_{N}^{i}\right\}$$
  • Reward: The reward function follows directly from the objective function:
$$
r(\mathbf{s}_{i}, \mathbf{a}_{i}) =
\begin{cases}
\sum_{k=1}^{K} R_{k} \cdot \sum_{\mathbf{p}(\epsilon)} |\boldsymbol{\Phi}|, & \text{for max-sum}, \\
\min_{k \in \mathbb{K}} R_{k} \cdot \min_{\mathbf{p}(\epsilon)} |\boldsymbol{\Phi}|, & \text{for max-min}.
\end{cases}
$$
For max-sum, this reward equals $Q^{2}$ times the objective $U$, so both have the same maximizer.
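The state, action, and reward design above can be wrapped in a small environment class. This is a minimal sketch under several assumptions not stated in the paper: the action is discretized by snapping AP coordinates to a uniform grid of step `grid`, the box constraints of (16) are enforced by clipping, the UE and target positions are held fixed within a step, and the ISAC reward is computed by a caller-supplied `utility_fn`. The class `APDeploymentEnv` and all of its arguments are hypothetical.

```python
import numpy as np


class APDeploymentEnv:
    """Hypothetical MDP wrapper for the cell-free ISAC AP deployment problem."""

    def __init__(self, ue_pos, target_pos, num_tx, num_rx, bounds, grid, utility_fn):
        self.ue_pos = np.asarray(ue_pos, dtype=float)          # (K, 2) UE coordinates u_k
        self.target_pos = np.asarray(target_pos, dtype=float)  # (2,) target coordinate p
        self.num_aps = num_tx + num_rx                          # M + N
        self.x_min, self.x_max, self.y_min, self.y_max = bounds
        self.grid = grid                                        # assumed grid step
        self.utility_fn = utility_fn  # (ue_pos, target_pos, ap_pos) -> reward

    def state(self):
        # s_i = {u_1, ..., u_K, p}, flattened into one vector
        return np.concatenate([self.ue_pos.ravel(), self.target_pos.ravel()])

    def project_action(self, raw_action):
        # a_i = {t_1, ..., t_M, r_1, ..., r_N}: one 2-D coordinate per AP,
        # first num_tx rows are t_1..t_M, remaining rows are r_1..r_N
        coords = np.asarray(raw_action, dtype=float).reshape(self.num_aps, 2)
        # Snap to the grid, then enforce the rectangular constraints of (16)
        coords = np.round(coords / self.grid) * self.grid
        coords[:, 0] = np.clip(coords[:, 0], self.x_min, self.x_max)
        coords[:, 1] = np.clip(coords[:, 1], self.y_min, self.y_max)
        return coords

    def step(self, raw_action):
        ap_pos = self.project_action(raw_action)
        reward = float(self.utility_fn(self.ue_pos, self.target_pos, ap_pos))
        # Simplification: UE/target positions are fixed, so the next state equals the current one
        return self.state(), reward, ap_pos
```

Clipping after snapping keeps every AP inside the deployment area even when the actor proposes coordinates outside it.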

3.2 SAC for Deployment

Traditional DRL aims to learn the optimal policy $\pi$ that maximizes the expected cumulative reward, whereas SAC adds a policy entropy term to the DRL objective function:
$$
\mathrm{C}(\pi) = \sum_{i} \mathrm{E}_{(\mathbf{s}_{i}, \mathbf{a}_{i}) \sim \rho_{\pi}}\left[r(\mathbf{s}_{i}, \mathbf{a}_{i}) + \omega \mathcal{H}\left(\pi(\cdot \mid \mathbf{s}_{i})\right)\right]
$$
where $\pi$ is the policy generated by the actor network, $\rho_{\pi}$ is the distribution of state-action pairs visited under policy $\pi$, and $\omega$ is a weighting parameter, called the temperature coefficient, that balances exploration against exploitation of past experience during training. $\mathcal{H}(\pi(\cdot \mid \mathbf{s}_{i})) = \mathrm{E}_{\mathbf{a} \sim \pi(\cdot \mid \mathbf{s}_{i})}\left[-\log \pi(\mathbf{a} \mid \mathbf{s}_{i})\right]$ is the entropy of the action policy at state $\mathbf{s}_{i}$.
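To make the entropy-augmented objective concrete, the sketch below computes a single-episode estimate of $\mathrm{C}(\pi)$. It assumes a discrete action set (consistent with the discretization above), so that $\pi(\cdot \mid \mathbf{s}_i)$ is a categorical distribution given as a probability vector; the function name `soft_return` is hypothetical.

```python
import numpy as np


def soft_return(rewards, action_probs, omega):
    """Single-episode estimate of C(pi) = sum_i [r_i + omega * H(pi(.|s_i))].

    rewards:      shape (T,), rewards r(s_i, a_i) along one episode
    action_probs: shape (T, |A|), categorical policy pi(.|s_i) at each step
    omega:        temperature coefficient
    """
    rewards = np.asarray(rewards, dtype=float)
    probs = np.asarray(action_probs, dtype=float)
    # H(pi(.|s)) = -sum_a pi(a|s) log pi(a|s); zero-probability actions contribute 0
    safe = np.where(probs > 0, probs, 1.0)
    entropy = -np.sum(probs * np.log(safe), axis=1)
    return float(np.sum(rewards + omega * entropy))
```

A uniform policy over two actions earns an entropy bonus of $\omega \ln 2$ per step, while a deterministic policy earns none, which is what pushes SAC toward continued exploration.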

Figure 2: Architecture of SAC-based cell-free ISAC APs deployment system.
As shown in Fig. 2, the architecture of the SAC-based cell-free ISAC AP deployment system contains three main components. The first is the training environment, which receives and executes the actions produced by the actor network. After execution, it generates the reward for the current state-action pair, transitions to the next state, and stores the corresponding transition in the replay buffer $\mathcal{B}$ for neural network updates. The second is the SAC actor network, parameterized by $\mu$, whose loss function is defined as
$$
L_{\pi}(\mu) = \mathrm{E}_{\mathbf{s}_{i} \sim \mathcal{B}}\left[\mathrm{E}_{\mathbf{a}_{i} \sim \pi_{\mu}}\left[\omega \log \pi_{\mu}(\mathbf{a}_{i} \mid \mathbf{s}_{i}) - Q_{\varphi}(\mathbf{s}_{i}, \mathbf{a}_{i})\right]\right]
$$
The actor network maps each input state vector to an action vector and feeds it back to the environment for execution. The critic network also requires this action vector when computing the Bellman update.
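For a discrete action set, the inner expectation over $\mathbf{a}_i \sim \pi_\mu$ in the actor loss can be computed exactly by summing over all actions, as in discrete-action SAC variants. The sketch below evaluates the loss value for a minibatch drawn from $\mathcal{B}$; it assumes the critic outputs a Q-value for every action, and the function name `actor_loss` is hypothetical. In practice the loss would be minimized with an automatic-differentiation framework rather than plain NumPy.

```python
import numpy as np


def actor_loss(action_probs, q_values, omega):
    """Discrete-action estimate of L_pi(mu) over a minibatch of states from B.

    action_probs: shape (B, |A|), pi_mu(.|s_i) for each sampled state
    q_values:     shape (B, |A|), Q_phi(s_i, a) for every action a
    omega:        temperature coefficient
    """
    probs = np.asarray(action_probs, dtype=float)
    q = np.asarray(q_values, dtype=float)
    log_probs = np.log(np.clip(probs, 1e-12, 1.0))  # avoid log(0)
    # Inner expectation over a ~ pi_mu, computed exactly by summing over actions
    per_state = np.sum(probs * (omega * log_probs - q), axis=1)
    # Outer expectation over s_i ~ B approximated by the minibatch mean
    return float(per_state.mean())
```

Lowering this loss raises the probability of high-Q actions while the $\omega \log \pi_\mu$ term penalizes collapsing onto a single action too early.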
The third component is the critic network, parameterized by $\varphi$, whose loss function is
$$
\begin{aligned}
L_{Q}(\varphi) = {} & \mathrm{E}_{(\mathbf{s}_{i}, \mathbf{a}_{i}) \sim \mathcal{B}}\Big[\tfrac{1}{2}\Big(Q_{\varphi}(\mathbf{s}_{i}, \mathbf{a}_{i}) - \big(r(\mathbf{s}_{i}, \mathbf{a}_{i}) \\
& + \gamma\big(Q_{\bar{\varphi}}(\mathbf{s}_{i+1}, \mathbf{a}_{i+1}) - \omega \log \pi_{\mu}(\mathbf{a}_{i+1} \mid \mathbf{s}_{i+1})\big)\big)\Big)^{2}\Big]
\end{aligned}
$$
where $\gamma$ is the discount factor and $\mathbf{a}_{i+1} \sim \pi_{\mu}(\cdot \mid \mathbf{s}_{i+1})$ is sampled from the current policy.
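The critic update regresses $Q_{\varphi}$ toward a soft Bellman target. The sketch below computes that target and the resulting minibatch loss, assuming the standard SAC form in which the temperature $\omega$ scales the next-step log-probability, and assuming no terminal states (episode termination is not described here). The target is treated as a constant; the function names are hypothetical.

```python
import numpy as np


def soft_td_target(reward, q_next_target, log_prob_next, gamma, omega):
    """Soft Bellman target r + gamma * (Q_target(s', a') - omega * log pi(a'|s'))."""
    return reward + gamma * (q_next_target - omega * log_prob_next)


def critic_loss(q_pred, reward, q_next_target, log_prob_next, gamma, omega):
    """Minibatch estimate of L_Q(phi): mean of 0.5 * squared soft TD error."""
    target = soft_td_target(np.asarray(reward, dtype=float),
                            np.asarray(q_next_target, dtype=float),
                            np.asarray(log_prob_next, dtype=float),
                            gamma, omega)
    td_error = np.asarray(q_pred, dtype=float) - target
    return float(np.mean(0.5 * td_error ** 2))
```

With $\gamma = 0.98$ (Table 1), a reward of 1, a target Q-value of 2, and $\log \pi = -1$ at $\omega = 0.5$, the target is $1 + 0.98 \times 2.5 = 3.45$.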

The critic network takes a state-action pair as input and outputs a Q-value that assesses the quality of the action taken in the current state; a higher Q-value indicates a greater expected cumulative reward. Unlike the online critic network $\varphi$, the target critic network $\bar{\varphi}$ evaluates the next state-action pair, whose value is used in the Bellman update. To mitigate training instability caused by overestimated Q-values, two networks are maintained for both $\varphi_{j}$ and $\bar{\varphi}_{j}$, $j \in \{1, 2\}$, and the smaller Q-value is used for the network updates. The target critics are updated with the soft update $\bar{\varphi} \leftarrow \tau \varphi + (1 - \tau) \bar{\varphi}$, where $\tau \ll 1$.
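The two stabilization mechanisms just described, taking the smaller of two critic estimates and softly tracking the online critic with the target critic, reduce to two short functions. This sketch represents network parameters as lists of NumPy arrays (an assumption for illustration), uses $\tau = 0.005$ from Table 1 as the default, and the function names are hypothetical.

```python
import numpy as np


def clipped_double_q(q1, q2):
    """Use the smaller of the two critic estimates to curb overestimation."""
    return np.minimum(np.asarray(q1, dtype=float), np.asarray(q2, dtype=float))


def soft_update(target_params, online_params, tau=0.005):
    """Polyak averaging phi_bar <- tau * phi + (1 - tau) * phi_bar, per parameter array."""
    return [tau * np.asarray(p, dtype=float) + (1.0 - tau) * np.asarray(tp, dtype=float)
            for tp, p in zip(target_params, online_params)]
```

Because $\tau \ll 1$, each update moves the target critic only slightly, so the Bellman target changes slowly between gradient steps.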
In addition to these three main components, another important parameter, the temperature coefficient $\omega$, is automatically updated by minimizing its own loss function
$$
L(\omega) = \mathrm{E}_{\mathbf{a}_{i} \sim \pi_{\mu}}\left[-\omega \log \pi_{\mu}(\mathbf{a}_{i} \mid \mathbf{s}_{i}) - \omega \overline{\mathcal{H}}\right]
$$
where $\overline{\mathcal{H}} = -|\mathbb{A}|_{\dim}$, the negative of the action-space dimension, is the predefined lower bound on the action policy entropy.
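The temperature update can be sketched as follows. The gradient of $L(\omega)$ with respect to $\omega$ is $\mathrm{E}[-\log \pi_\mu(\mathbf{a}_i \mid \mathbf{s}_i) - \overline{\mathcal{H}}]$, so $\omega$ shrinks when the policy entropy exceeds the target and grows when it falls below. This is a minimal sketch assuming plain gradient descent directly on $\omega$ with a non-negativity clamp (many implementations instead optimize $\log \omega$), and reusing Table 1's learning rate as the default step size; the function names are hypothetical.

```python
import numpy as np


def temperature_loss(log_probs, omega, target_entropy):
    """Minibatch estimate of L(omega) = E[-omega * log pi(a|s) - omega * H_bar]."""
    log_probs = np.asarray(log_probs, dtype=float)
    return float(np.mean(-omega * log_probs - omega * target_entropy))


def temperature_step(log_probs, omega, target_entropy, lr=1e-5):
    """One gradient-descent step on omega, using dL/domega = E[-log pi - H_bar]."""
    grad = float(np.mean(-np.asarray(log_probs, dtype=float) - target_entropy))
    return max(omega - lr * grad, 0.0)  # keep the temperature non-negative
```

If each AP contributes two coordinates to the action, $|\mathbb{A}|_{\dim} = 2(M+N)$ and the target entropy would be $-2(M+N)$ (an assumption about how the dimension is counted).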

4 SIMULATION

Table 1: Parameters in DRL model

| Parameter | Value |
| :---: | :---: |
| Hidden layers | $64 \times 32$ |
| Learning rate | $10^{-5}$ |
| Buffer size | $2^{21}$ |
| Batch size | $2^{9}$ |
| Discount factor $\gamma$ | 0.98 |
| Soft update coefficient of target critic network $\tau$ | 0.005 |
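For reproducibility, the settings in Table 1 can be collected in a single configuration object. This is a sketch only: the field names are illustrative, and reading "$64 \times 32$" as two hidden layers of 64 and 32 units is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SACConfig:
    """DRL hyperparameters from Table 1 (field names are illustrative)."""
    hidden_layers: tuple = (64, 32)  # assumed: two hidden layers of 64 and 32 units
    learning_rate: float = 1e-5
    buffer_size: int = 2 ** 21       # capacity of the replay buffer B
    batch_size: int = 2 ** 9
    gamma: float = 0.98              # discount factor
    tau: float = 0.005               # soft-update coefficient of the target critics


cfg = SACConfig()
```

Making the dataclass frozen prevents hyperparameters from being changed accidentally during a run.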
In this section, we evaluate the performance of the proposed SAC-based AP deployment method in cell-free ISAC systems. We demonstrate the superiority of SAC in this scenario by comparing it with several traditional DRL algorithms. In addition, to validate the effectiveness of the joint communication and sensing metric proposed in this work, we compare it with optimizing sensing only, optimizing communication only, and optimizing a weighted sum of communication and sensing. The configuration of the DRL model is shown in Table 1. In our cell-free system, the target moves along a predefined circular trajectory, and three UEs are distributed around the target according to Gaussian distributions with a variance of 2. In sensing tasks, the receiving APs capture the echoes of the signals sent by the transmitting APs and reflected by the target. In communication tasks, all APs jointly receive the uplink signals from all UEs.
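The simulated geometry can be generated along the following lines. The circle's center and radius, the number of trajectory samples $Q$, and the point around which the UEs are drawn are not specified in the text, so they are left as parameters here; parameterizing $\mathbf{p}(\epsilon)$ by the angle $\epsilon$ is likewise an assumption. Only the circular trajectory, the three UEs, and the variance of 2 come from the setup above, and the function names are hypothetical.

```python
import numpy as np


def circular_trajectory(center, radius, num_samples):
    """Sample p(epsilon) at num_samples angles on a circle (epsilon = angle, assumed)."""
    eps = np.linspace(0.0, 2.0 * np.pi, num_samples, endpoint=False)
    center = np.asarray(center, dtype=float)
    return center + radius * np.stack([np.cos(eps), np.sin(eps)], axis=1)


def sample_ues(center, num_ues=3, variance=2.0, rng=None):
    """Draw UE positions from an isotropic 2-D Gaussian with the given per-axis variance."""
    rng = np.random.default_rng() if rng is None else rng
    return rng.normal(loc=np.asarray(center, dtype=float),
                      scale=np.sqrt(variance), size=(num_ues, 2))


traj = circular_trajectory(center=[0.0, 0.0], radius=5.0, num_samples=8)  # Q = 8 here
ues = sample_ues(center=[0.0, 0.0], rng=np.random.default_rng(0))
```

Passing an explicit `numpy.random.Generator` makes the UE layout reproducible across random seeds, which matters when comparing algorithms as in Fig. 3.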
As shown in Fig. 3, in cell-free ISAC systems, the final converged result of the proposed SAC-based algorithm significantly surpasses those of the traditional DRL algorithms in both the max-min and max-sum problems. SAC is also more stable across different random seeds than the other algorithms. The performance of DDPG and TD3 is highly sensitive to their hyperparameters; for a complex optimization problem such as this one, finding hyperparameters that allow them to exceed SAC is typically difficult. According to Table 2, ISAC performance improves consistently as the number of APs increases, demonstrating a clear advantage of cell-free systems in this regard.
To be more specific, Fig. 4 and Fig. 5 depict the AP deployment results of the cell-free ISAC system obtained with different algorithms for the max-sum and max-min problems. According to the D-optimal criterion, optimal sensing performance is achieved when the transmitting and receiving APs lie on a straight line and form specific angles with the target. According to the Euclidean distance criterion derived from zero-forcing reception, communication performance improves as the APs move closer to the UEs. The SAC-based AP deployment satisfies both the sensing and communication criteria well. Furthermore, the max-min deployment in Fig. 5(a) shows that, to ensure fair service among UEs, each UE is surrounded by several nearby APs to achieve a higher SNR. In contrast, in the max-sum deployment in Fig. 4(a), all APs are concentrated around the UEs closest to the optimal sensing position.

Figure 3: Accumulative ISAC rewards of different DRL algorithms. (a) Max-sum problem. (b) Max-min problem.

Table 2: ISAC value for different numbers of APs

| Number of APs $(M+N)$ | 2 | 4 | 6 | 8 |
| :--- | :---: | :---: | :---: | :---: |
| $\sum_{k=1}^{K} R_{k} / Q \cdot \sum_{\mathbf{p}(\epsilon)} \lvert\boldsymbol{\Phi}\rvert / Q$ | 41.07 | 775.73 | 3804.47 | 11614.07 |
| $\min_{k \in \mathbb{K}} R_{k} \cdot \min_{\mathbf{p}(\epsilon)} \lvert\boldsymbol{\Phi}\rvert$ | 0.042 | 10.48 | 86.06 | 257.28 |
Table 3 lists the communication and sensing values under the optimal AP deployment for different objective functions. In both the max-min and max-sum problems, the communication values of our ISAC optimization are almost equal to those obtained by optimizing communication alone and far higher than those obtained by optimizing sensing alone, while the sensing values of the ISAC optimization are better than those of the communication-only method but lower than those of the sensing-only method. This is because the optimal AP positions for sensing are typically aligned on a straight line at specific angles with the target, whereas the UE distribution usually does not have this structure. Therefore, to balance the communication rates of the UEs, the APs cannot be deployed at the optimal sensing positions, which sacrifices some sensing performance. In addition, because sensing accuracy and communication rate differ greatly in magnitude, it is difficult to find a balanced weight for their sum as an objective function. As Table 3 shows, the result of the weighted-sum method is nearly identical to that of optimizing sensing alone.

Figure 4: AP deployment results of the max-sum problem.

Table 3: Communication rate and localizing accuracy for different objectives

| Objective | ISAC | Sensing only | Comm. only | Weighted sum |
| :--- | :---: | :---: | :---: | :---: |
| $\sum_{k=1}^{K} R_{k} / Q$ | 1.13 | 0.014 | 1.26 | 0.015 |
| $\sum_{\mathbf{p}(\epsilon)} \lvert\boldsymbol{\Phi}\rvert / Q$ | 688.97 | 842.92 | 642.08 | 842.92 |
| $\min_{k \in \mathbb{K}} R_{k}$ | 0.042 | 0.0013 | 0.043 | 0.0013 |
| $\min_{\mathbf{p}(\epsilon)} \lvert\boldsymbol{\Phi}\rvert$ | 0.25 | 696.89 | 188.40 | 696.89 |

Figure 5: AP deployment results of the max-min problem.
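The trade-off described above can be quantified directly from the max-sum rows of Table 3. The short script below only re-uses the published values; the variable names are illustrative. It shows that the ISAC deployment retains roughly 90% of the communication-only rate and roughly 82% of the sensing-only accuracy metric, and that the weighted-sum result coincides with the sensing-only one.

```python
# Max-sum rows of Table 3
comm = {"ISAC": 1.13, "Sensing only": 0.014, "Comm. only": 1.26, "Weighted sum": 0.015}
sens = {"ISAC": 688.97, "Sensing only": 842.92, "Comm. only": 642.08, "Weighted sum": 842.92}

# Fraction of the single-objective optimum retained by the joint ISAC deployment
rate_vs_comm_only = comm["ISAC"] / comm["Comm. only"]
sensing_vs_sensing_only = sens["ISAC"] / sens["Sensing only"]

print(f"rate retained:    {rate_vs_comm_only:.3f}")        # rate retained:    0.897
print(f"sensing retained: {sensing_vs_sensing_only:.3f}")  # sensing retained: 0.817
print(sens["Weighted sum"] == sens["Sensing only"])        # True
```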
Fig. 4(f) and Fig. 5(f) present the deployment results when only sensing is optimized, and Fig. 4(e) and Fig. 5(e) those when only communication is optimized. As analyzed above, for optimal sensing performance the APs form a line with the target at specific angles. For optimal communication performance, the APs move as close to the UEs as possible to achieve a higher SNR.

5 CONCLUSION

In this work, we first investigated the AP deployment problem for maximizing user rate and localizing accuracy in cell-free ISAC systems. We then designed a unified evaluation metric that merges the D-optimal criterion with the Euclidean distance criterion, enabling simultaneous optimization of communication and sensing performance. Finally, we employed SAC, a DRL algorithm augmented with a maximum action-policy entropy term, to solve the original non-convex, high-dimensional problem. Numerical results demonstrated that the proposed method converges to better solutions than other DRL methods in terms of both overall system performance and fairness.

ACKNOWLEDGMENTS

This work was supported in part by the Fundamental Research Funds for the Central Universities under Grant Nos. 2242022k60002 and 2242023R40005.

REFERENCES

[1] Hussein A. Ammar, Raviraj Adve, Shahram Shahbazpanahi, Gary Boudreau, and Kothapalli Venkata Srinivas. 2022. User-Centric Cell-Free Massive MIMO Networks: A Survey of Opportunities, Challenges and Solutions. IEEE Communications Surveys and Tutorials 24, 1 (2022), 611-652. https://doi.org/10.1109/COMST.2021.3135119

[2] Carles Diaz-Vilor, Angel Lozano, and Hamid Jafarkhani. 2024. Cell-Free UAV Networks With Wireless Fronthaul: Analysis and Optimization. IEEE Transactions on Wireless Communications 23, 3 (2024), 2054-2069.

[3] Hana Godrich, Alexander M. Haimovich, and Rick S. Blum. 2010. Target Localization Accuracy Gain in MIMO Radar-Based Systems. IEEE Transactions on Information Theory 56, 6 (2010), 2783-2803. https://doi.org/10.1109/TIT.2010.2046246

[4] Govind R. Gopal and Bhaskar D. Rao. 2024. Vector Quantization Methods for Access Point Placement in Cell-Free Massive MIMO Systems. IEEE Transactions on Wireless Communications 23, 6 (2024), 5425-5440. https://doi.org/10.1109/TWC.2023.3326453

[5] Tuomas Haarnoja, Aurick Zhou, Kristian Hartikainen, George Tucker, Sehoon Ha, Jie Tan, Vikash Kumar, Henry Zhu, Abhishek Gupta, Pieter Abbeel, et al. 2018. Soft actor-critic algorithms and applications. arXiv preprint arXiv:1812.05905 (2018).

[6] Qian He, Rick S. Blum, Hana Godrich, and Alexander M. Haimovich. 2008. Cramér-Rao bound for target velocity estimation in MIMO radar with widely separated antennas. In 2008 42nd Annual Conference on Information Sciences and Systems. 123-127. https://doi.org/10.1109/CISS.2008.4558507

[7] Jing Jiang, Fengyang Yan, Yinghui Ye, Worakrin Sutthiphan, Jiayi Zhang, and Bo Ai. 2023. Traffic Demand-Oriented Cell-Free Massive MIMO Network. IEEE Wireless Communications Letters 12, 11 (2023), 1861-1865. https://doi.org/10.1109/LWC.2023.3296527

[8] An Liu, Zhe Huang, Min Li, Yubo Wan, Wenrui Li, Tony Xiao Han, Chenchen Liu, Rui Du, Danny Kai Pin Tan, Jianmin Lu, Yuan Shen, Fabiola Colone, and Kevin Chetty. 2022. A Survey on Fundamental Limits of Integrated Sensing and Communication. IEEE Communications Surveys & Tutorials 24, 2 (2022), 994-1034. https://doi.org/10.1109/COMST.2022.3149272

[9] Hien Quoc Ngo, Alexei Ashikhmin, Hong Yang, Erik G. Larsson, and Thomas L. Marzetta. 2017. Cell-Free Massive MIMO Versus Small Cells. IEEE Transactions on Wireless Communications 16, 3 (2017), 1834-1850.

[10] ITU-R Recommendation. 2023. Framework and overall objectives of the future development of IMT for 2030 and beyond. International Telecommunication Union (ITU) Recommendation (ITU-R) (2023).

[11] Mohammad Sadeghi, Fereidoon Behnia, Rouhollah Amiri, and Alfonso Farina. 2021. Target Localization Geometry Gain in Distributed MIMO Radar. IEEE Transactions on Signal Processing 69 (2021), 1642-1652. https://doi.org/10.1109/TSP.2021.3062197

[12] Zhiqing Wei, Yucong Du, Qixun Zhang, Wangjun Jiang, Yanpeng Cui, Zeyang Meng, Huici Wu, and Zhiyong Feng. 2024. Integrated Sensing and Communication Driven Digital Twin for Intelligent Machine Network. IEEE Internet of Things Magazine 7, 4 (2024), 60-67. https://doi.org/10.1109/IOTM.001.2300214

[13] Fanfei Xu, Yuhan Ruan, and Yongzhao Li. 2023. Soft Actor-Critic Based 3D Deployment and Power Allocation in Cell-Free Unmanned Aerial Vehicle Networks. IEEE Wireless Communications Letters 12, 10 (2023), 1692-1696. https://doi.org/10.1109/LWC.2023.3288273

[14] Xiaohu You, Yongming Huang, Shengheng Liu, et al. 2023. Toward 6G TKμ extreme connectivity: Architecture, key technologies and experiments. IEEE Wireless Communications 30, 3 (2023), 86-95. https://doi.org/10.1109/MWC.004.2200482
Received XX XXXX 2024; revised XX XXXX 2024; accepted XX XXXX 2024