Introduction

Recent advances in spatially resolved single-cell (SRSC) technologies make it possible to profile cellular gene expression within its tissue context, enabling comprehensive spatial characterization of a variety of systems [1-7]. Spatial domains are higher-order functional units that recur across tissue space; they are coordinated by distinct cell states with varying gene expression patterns and are closely tied to tissue physiology [8,9]. In complex diseases such as cancer, mounting evidence points to pivotal roles for specific spatial domains in disease diagnosis and monitoring [10-12]. Given the ever-growing volume of SRSC data [13,14], many computational methods have been developed to identify spatial domains [15-17].

In a typical SRSC dataset, the spatial coordinates and gene expression profile of each cell are measured. This representation naturally forms a spatial graph with cells as nodes and gene expression as node attributes, which has motivated the two major modeling paradigms in this field: Graph Neural Networks (GNNs) [18-20] and Bayesian Networks (BNs) [21-23]. Along the developmental paths of both paradigms, the vast majority of methods have sought better performance by increasing model complexity. GNN-based methods have introduced dedicated neural modules, loss functions, and network architectures. BN-based methods have added hidden variables, variable dependencies, and specific priors. Although increasingly complex models often perform better, some recent studies report diminishing marginal returns [24]. Moreover, additional model complexity may burden the algorithms with non-trivial parameter tuning, low time efficiency, and/or reduced generalizability. Together, these issues call for a new paradigm to break through the developmental bottlenecks in this field.
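
To make the spatial-graph representation concrete, the following is a minimal sketch, not the construction used by any particular method cited here. It assumes a k-nearest-neighbor rule on Euclidean distance to define edges; the names `SpatialGraph` and `build_knn_graph`, the choice of k, and the toy coordinates and expression values are all hypothetical and serve only as illustration.

```python
import math
from dataclasses import dataclass, field


@dataclass
class SpatialGraph:
    """Cells as nodes, expression vectors as node attributes (illustrative only)."""
    coords: list[tuple[float, float]]   # spatial position of each cell
    expression: list[list[float]]       # per-cell gene expression vector
    neighbors: dict[int, list[int]] = field(default_factory=dict)  # directed kNN edges


def build_knn_graph(coords, expression, k=2):
    """Link each cell to its k spatially nearest cells (assumed edge rule)."""
    if len(coords) != len(expression):
        raise ValueError("coords and expression must describe the same cells")
    if not 0 < k < len(coords):
        raise ValueError("k must be between 1 and the number of cells minus 1")
    neighbors = {}
    for i, ci in enumerate(coords):
        others = [j for j in range(len(coords)) if j != i]
        # Sort by Euclidean distance; break ties by cell index for determinism.
        others.sort(key=lambda j: (math.dist(ci, coords[j]), j))
        neighbors[i] = others[:k]
    return SpatialGraph(list(coords), list(expression), neighbors)


# Toy data: four cells, three genes (made-up values).
coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.5), (5.0, 5.0)]
expression = [[3, 0, 1], [2, 1, 0], [0, 4, 1], [0, 0, 7]]
graph = build_knn_graph(coords, expression, k=2)
```

The brute-force neighbor search here is quadratic in the number of cells; for realistic datasets a spatial index such as a k-d tree would be the more usual choice. Other edge rules, such as a fixed radius or a Delaunay triangulation, would fit the same node and attribute layout.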