Introduction

Recent advances in spatially resolved single-cell (SRSC) technologies allow cellular gene expression to be profiled in its tissue context, enabling comprehensive spatial characterization of various systems1,2,3,4,5,6,7. Coordinated by different cell states with varying gene expression patterns, spatial domains are higher-order functional units that recur across tissue space and are closely related to tissue physiology8,9. In complex diseases such as cancer, mounting evidence suggests that specific spatial domains play pivotal roles in disease diagnosis and monitoring10,11,12. Given the ever-increasing volume of SRSC data13,14, many computational methods have been developed to identify spatial domains15,16,17.

In a typical SRSC dataset, the spatial coordinates and gene expression profile of each cell are measured. Such a data representation naturally forms a spatial graph with cells as nodes and gene expression as node attributes, which has motivated the two major modeling paradigms in this field, i.e., Graph Neural Network (GNN)18,19,20 and Bayesian Network (BN)21,22,23. Along the developmental paths of both paradigms, the vast majority of methods were designed to improve performance by increasing model complexity. GNN-based methods introduced dedicated neural modules, loss functions, and network architectures. BN-based methods added hidden variables, variable dependencies, and specified priors. Although increasingly complex models often lead to better performance, some recent studies report diminishing marginal returns24. Moreover, additional model complexity may subject the algorithms to non-trivial parameter tuning, low time efficiency, and/or reduced generalizability. These issues call for a new paradigm to break through the developmental bottlenecks in this field.

In this work, we evaluate the advantages and disadvantages of existing state-of-the-art methods to identify the bottleneck problems that a new method must solve. We then analyze the cellular neighborhood structure across 24 datasets from 8 different spatial technologies and tissue systems and observe that it is consistent. Based on this, we design a biology-driven cellular context representation, which yields consistent prediction improvements over current state-of-the-art GNN models in 3 different supervised learning settings on 13 spatial datasets across different technologies. Inspired by these analyses, we present Multi-range cEll coNtext DEciphereR (MENDER) for unsupervised spatial domain identification. MENDER addresses 3 issues that are considered major bottlenecks of existing methods: (1) multi-slice spatial domain identification, which challenges many advanced methods; (2) scalability to million-level datasets; and (3) improved running time efficiency without the need for a GPU. Comprehensive benchmark analyses show MENDER’s substantial improvements in accuracy, continuity, and running time over complex GNN and BN models on various datasets of increasing difficulty. On the million-level brain spatial atlas, MENDER is the only method that successfully delineates major brain domains largely consistent with the established Allen brain reference, without any human intervention. On a model of mouse brain aging, MENDER identifies subdomains that are consistent across 3 aging stages as well as domains specific to young mice. On a 40-patient triple-negative breast cancer (TNBC) dataset, MENDER can differentiate three subtypes of TNBC by explaining differences in cellular spatial organization. We also extend MENDER’s application to a wider range of spatial data types, showing its generalizability.

Results

Motivation and overview

Limitations of existing methods

We first explain why a new method for spatial domain identification is still needed given the many methods that already exist in the field. We select 8 existing methods published in the last two years and evaluate them against 6 criteria: support for multi-slice analysis, stability, interpretability, scalability, speed, and availability of a cell context representation (Supplementary Fig. 1A). The definition of each criterion is briefly explained in Supplementary Fig. 1A and explained in detail in the “Six aspects to view existing methods” section of “Methods”. These methods include 4 GNN-based methods (SpaGCN18, STAGATE19, CCST25, and SpaceFlow20) and 4 BN-based methods (BayesSpace21, BASS23, SpatialPCA26, and SOTIP27).

One can observe that most evaluation criteria are strongly associated with the underlying principle of each method (Supplementary Fig. 1A). All GNN-based methods have better scalability and speed (conditional on a GPU) than BN-based methods, and they can also output context representations for cells. The common limitations of GNN-based methods are the lack of stability and interpretability inherited from general deep-learning models. BN-based methods, in contrast, have better output stability and interpretability than GNN-based methods, since they are generally built on well-defined probabilistic variable dependencies. However, they cannot guarantee good scalability to large datasets with short running times, and they generally do not output cell context representations (SpatialPCA26 is an exception). These evaluations were also verified in recent studies20,23 and in the subsequent benchmark analysis of this manuscript.

In particular, some criteria in the above analysis are critical with the advent of the big data era of spatial omics. Many large consortium efforts have generated spatial datasets containing millions of cells collected from many slices28,29,30. In such scenarios, scalability to large datasets, short running time, and support for multi-slice analysis are especially needed. Although there are many methods for spatial domain identification, further innovation is still needed to meet as many of the above criteria as possible.

Consistent neighborhood structures across different data

We analyze the distance between neighboring cells across different spatial technologies and different tissue systems (Supplementary Fig. 1B). To do so, we collect 24 datasets generated by 6 different spatial technologies from SODB31 [https://gene.ai.tencent.com/SpatialOmics/]. These spatial data cover most of the current mainstream technologies14, including MERFISH29,30,32, DARTFISH14, BaristaSeq33, STARmap34, osmFISH4, and seqFISH35 (see “Methods”). For each dataset, we first construct a 1-NN graph of all cells based on each cell’s spatial coordinates; the distance between each cell and its directly connected cell is then recorded to form a distribution. One can observe that the major mass (between the first and third quartiles) of the distribution lies between 10 and 20 μm, concentrating around 15 μm (Supplementary Fig. 1B), even though these datasets are from distinct spatial technologies and tissue systems.
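The nearest-neighbor distance analysis described above can be sketched as follows. This is only a minimal illustration under stated assumptions: the function name and the toy coordinates (a jittered 15 μm lattice) are hypothetical and not taken from the study, and a brute-force search stands in for whatever nearest-neighbor structure was actually used.

```python
import math
import random
import statistics

def nearest_neighbor_distances(coords):
    """Return, for each cell, the distance to its closest other cell (1-NN)."""
    dists = []
    for i, p in enumerate(coords):
        best = math.inf
        for j, q in enumerate(coords):
            if i != j:
                best = min(best, math.dist(p, q))
        dists.append(best)
    return dists

# Hypothetical toy slice: cells on a 15 um lattice with +/-2 um jitter.
random.seed(0)
coords = [(15.0 * x + random.uniform(-2, 2), 15.0 * y + random.uniform(-2, 2))
          for x in range(12) for y in range(12)]

nn = nearest_neighbor_distances(coords)
# Quartiles of the 1-NN distance distribution (first, median, third).
q1, median, q3 = statistics.quantiles(nn, n=4)
print(f"Q1={q1:.1f} um, median={median:.1f} um, Q3={q3:.1f} um")
```

The brute-force loop is quadratic in the number of cells, so for real slices a spatial index such as a k-d tree would be the more practical choice.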

Multi-range cEll coNtext DEciphereR (MENDER)

Previous studies have used the cell type composition within the cellular neighborhood around each index cell as its context representation, followed by clustering on the representation (termed cellular neighborhood clustering, CNC)12. However, this approach only considers the context information in one spatial range, and therefore ignores cellular relationships across multiple ranges. Motivated by CNC, and given the consistent neighborhood structures, we present Multi-range cEll coNtext DEciphereR (MENDER), which builds the cell state composition of multiple ranges into the cellular context representation (Fig. 1, see “Methods”). MENDER takes spatial omics data as input (Fig. 1A), then constructs the spatial graph based on the cell distance matrix and defines cell states based on the gene expression profiles (Fig. 1B). The cell state frequencies across multiple ranges are then recorded to form the cell context representation (Fig. 1C). The applications of MENDER include identifying spatial domains from multiple slices, i.e., multi-slice analysis (Fig. 1D), detecting condition-specific spatial signatures (Fig. 1E), decoding the MENDER representation into cell spatial organization (Fig. 1F), and extending to large-scale datasets (Fig. 1G). Note that MENDER does not rely on accurately annotated cell clusters; instead, cell clustering is obtained by simply applying the standard Leiden algorithm. We have tested MENDER’s robustness to different cell clustering methods and parameters, as well as to noisy cell cluster labels (Supplementary Figs. 2–5, see “Cell state” in “Methods”).
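The idea of recording cell state frequencies over multiple ranges can be illustrated with a small sketch. It is not the MENDER implementation: the exact range definition is given in “Methods” and is not reproduced here, so the sketch assumes concentric rings of fixed width `step`, and the function name, parameters, and toy data are all hypothetical.

```python
import math

def multi_range_context(coords, states, n_states, n_ranges=3, step=15.0):
    """Count neighbor cell states in concentric rings around each cell.

    Assumption: ring r (0-based) holds neighbors at distance d with
    r * step < d <= (r + 1) * step. Each cell gets one flat vector of
    length n_ranges * n_states (ring-major, then state).
    """
    features = []
    for i, p in enumerate(coords):
        vec = [0] * (n_ranges * n_states)
        for j, q in enumerate(coords):
            if i == j:
                continue
            d = math.dist(p, q)
            ring = max(0, math.ceil(d / step) - 1)  # coincident cells go to ring 0
            if ring < n_ranges:
                vec[ring * n_states + states[j]] += 1
        features.append(vec)
    return features

# Hypothetical toy example: 5 cells on a line, 2 cell states, 2 ranges.
coords = [(0.0, 0.0), (10.0, 0.0), (20.0, 0.0), (30.0, 0.0), (40.0, 0.0)]
states = [0, 1, 0, 1, 1]
features = multi_range_context(coords, states, n_states=2, n_ranges=2, step=15.0)
print(features[0])  # [0, 1, 1, 1]: ring 0 has one state-1 cell; ring 1 has one of each
```

Setting `n_ranges=1` reduces the vector to a single-range composition, which corresponds in spirit to the CNC-style representation that the multi-range design extends. Spatial domains would then be obtained by clustering these vectors, a step not shown here.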

Fig. 1: Overview of MENDER.

A The input of MENDER is spatial omics data, containing a gene expression matrix and a spatial coordinate matrix. B, C The main body of MENDER. The cell distance matrix is computed using the spatial coordinate matrix and the cell state is determined by the gene expression matrix (B). The cell state frequencies are recorded across multiple ranges away from each cell (C). Applications of MENDER. MENDER can perform multi-slice spatial domain identification (D), identify condition-specific spatial signatures (E), interpret the context representation to biological entities (F), and scale to large datasets (G).

Better prediction power than graph neural network models

To evaluate the power of MENDER’s representation for spatial domain prediction and compare it with alternative methods, we employed a supervised learning strategy to compare the prediction accuracy of different methods (Fig. 2). The compared methods included SpaGCN18, SpaceFlow20, and STAGATE19, which are considered state-of-the-art spatial domain identification methods that can output a cellular context representation. We also included SingleRange (the single-range version of MENDER) and CNC (cellular neighborhood clustering)12. We collected 13 SRSC datasets (see “Methods”) with spatial domain annotations regarded as ground truth. We reported the classification accuracy under tenfold cross-validation using 3 different classifiers, i.e., Linear Support Vector Machine (SVM) (Fig. 2A), Radial Basis Function (RBF) SVM (Fig. 2B), and Random Forest (Fig. 2C). The cellular context representation obtained by MENDER showed a consistent improvement over modern GNN models, independent of the classifier chosen (Fig. 2A–C). We also analyzed the influence of MENDER’s parameters: the prediction accuracy of MENDER saturated once the number of ranges reached 6, and the cell state clustering parameter did not affect the classification accuracy (Supplementary Fig. 6).
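The evaluation protocol above (tenfold cross-validation of a fixed representation against domain labels) can be sketched as follows. To stay self-contained, a simple nearest-centroid classifier stands in for the SVM and Random Forest classifiers used in the study. All function names and the toy data are hypothetical.

```python
import math
import random

def kfold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]

def nearest_centroid_predict(train_x, train_y, test_x):
    """Stand-in classifier: assign each test vector to the closest class mean."""
    sums, counts = {}, {}
    for x, y in zip(train_x, train_y):
        acc = sums.setdefault(y, [0.0] * len(x))
        for d, v in enumerate(x):
            acc[d] += v
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: [s / counts[y] for s in acc] for y, acc in sums.items()}
    return [min(centroids, key=lambda y: math.dist(x, centroids[y])) for x in test_x]

def cross_val_accuracy(features, labels, k=10, seed=0):
    """Mean held-out accuracy over k folds."""
    folds = kfold_indices(len(features), k, seed)
    scores = []
    for f, test_idx in enumerate(folds):
        train_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
        preds = nearest_centroid_predict(
            [features[i] for i in train_idx],
            [labels[i] for i in train_idx],
            [features[i] for i in test_idx])
        correct = sum(p == labels[i] for p, i in zip(preds, test_idx))
        scores.append(correct / len(test_idx))
    return sum(scores) / k

# Hypothetical toy representation: two well-separated "domains".
features = ([[float(i % 3), 0.0] for i in range(20)]
            + [[10.0 + i % 3, 5.0] for i in range(20)])
labels = [0] * 20 + [1] * 20
acc = cross_val_accuracy(features, labels, k=10)
print(f"tenfold accuracy: {acc:.2f}")
```

Because the classifier is held fixed while only the input representation changes, differences in held-out accuracy can be attributed to the representation, which is the logic behind comparing methods across three different classifiers.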

Fig. 2: Prediction accuracy of different spatial methods.

Linear SVM (A), RBF SVM (B), and Random Forest (C) are applied on top of each spatial representation method across 13 datasets. Each method is run 5 times. The accuracy is measured using tenfold cross-validation. Error bars are based on mean and 95% confidence interval. Source data are provided as a Source Data file.

Evaluation

The strength of MENDER over complex models in supervised learning situations was evident from earlier analyses. As spatial domain identification is typically an unsupervised learning process that doesn’t use training data, we compared MENDER with other methods in unsupervised contexts (Fig. 3).

Fig. 3: Benchmarking of multi-slice spatial domain identifications.

A–F Benchmarking on STARmap dataset. The dataset is from the mouse prelimbic area (A) from 3 slices (B). NMI and PAS are used to evaluate different methods for each slice separately (C, E), and jointly (D, F). Error bars are based on mean and 95% confidence interval. G–L Benchmarking on BaristaSeq dataset. The dataset is from the mouse primary visual area (G) from 3 slices (H). NMI and PAS are used to evaluate different methods for each slice separately (I, K), and jointly (J, L). Error bars are based on mean and 95% confidence interval. M–R Benchmarking on MERFISH dataset. The dataset is from the mouse frontal cortex area from 31 slices (M–P). NMI and PAS are used to evaluate different methods for each slice (Supplementary Fig. 7), and jointly (Q, R). Error bars are based on mean and 95% confidence interval. Source data are provided as a Source Data file.

The significance of multi-slice spatial domain identification has been highlighted by the growing number of large-scale studies that collect samples from various tissue sections or individuals. Performing multi-slice analysis can ensure consistent cell labeling and uniform clustering granularity across different slices. We benchmarked MENDER against 4 spatial methods that support multi-slice analysis (i.e., STAGATE19, BASS23, CNC12, and SOTIP27), 2 non-spatial algorithms (i.e., Louvain and Leiden36), and the single-range version of MENDER (i.e., SingleRange). The 3 benchmark datasets contain 3, 3, and 31 slices measured by STARmap, BaristaSeq, and MERFISH, respectively, each containing expert-annotated spatial domain labels regarded as ground truth (see “Methods”). Consistent with previous studies20,25,26, the evaluation metrics (see “Methods”) included Normalized Mutual Information (NMI, a metric for accuracy; higher is better) and Percentage of Abnormal Spots (PAS, a metric for continuity; lower is better).
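The two metrics can be sketched in a few lines. The exact definitions used in this study are given in “Methods” and are not reproduced here, so the sketch rests on two labeled assumptions: NMI is normalized by the arithmetic mean of the two label entropies, and PAS follows a commonly used definition in which a spot is abnormal when its label differs from at least 6 of its 10 nearest spatial neighbors. Function names and toy data are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (natural log) of a label sequence."""
    n = len(labels)
    return -sum(c / n * math.log(c / n) for c in Counter(labels).values())

def nmi(truth, pred):
    """Normalized mutual information (assumed: arithmetic-mean normalization)."""
    n = len(truth)
    joint = Counter(zip(truth, pred))
    pt, pp = Counter(truth), Counter(pred)
    mi = sum(c / n * math.log(c * n / (pt[t] * pp[p]))
             for (t, p), c in joint.items())
    denom = (entropy(truth) + entropy(pred)) / 2
    return mi / denom if denom > 0 else 1.0

def pas(coords, pred, k=10, min_diff=6):
    """Fraction of spots whose label differs from >= min_diff of their k nearest spots."""
    abnormal = 0
    for i, p in enumerate(coords):
        order = sorted((j for j in range(len(coords)) if j != i),
                       key=lambda j: math.dist(p, coords[j]))
        differing = sum(pred[j] != pred[i] for j in order[:k])
        if differing >= min_diff:
            abnormal += 1
    return abnormal / len(coords)

# Hypothetical toy slice: 8 x 8 grid, two domains split at x = 4.
coords = [(float(x), float(y)) for x in range(8) for y in range(8)]
truth = [0 if x < 4 else 1 for x, _ in coords]
pred = list(truth)
pred[1 * 8 + 3] = 1  # mislabel one isolated spot deep inside domain 0

print(f"NMI={nmi(truth, pred):.3f}, PAS={pas(coords, pred):.4f}")
```

In this toy case the single mislabeled spot lowers NMI slightly below 1 and is the only spot counted as abnormal by PAS, which shows how the two metrics capture agreement with annotation and spatial continuity, respectively.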

Evaluation on STARmap dataset

We applied the 8 different methods (i.e., Louvain, Leiden, STAGATE, BASS, SOTIP, SingleRange, CNC, and MENDER) to a spatial transcriptomics dataset of the mouse prelimbic area (Fig. 3A) measured by STARmap, containing 3190 cells from 3 slices (Fig. 3B), with expression measurements for 166 genes34. Across 5 replicate runs of each method, the multi-slice spatial domain identification performance of MENDER was consistently the best among all methods on every slice, both in accuracy (Fig. 3C, 5 runs for each bar) and in continuity (Fig. 3E, 5 runs for each bar). The aggregated performance across all slices also indicated MENDER’s best performance (Fig. 3D, F, 15 runs for each bar).

Evaluation on BaristaSeq dataset

We next evaluated the methods on a larger dataset with more spatial domains, collected from the mouse primary visual area (Fig. 3G) and measured by BaristaSeq, containing 11,426 cells from 3 slices (Fig. 3H), with expression measurements for 79 genes33. Quantitative results of 5 replicate runs showed that MENDER consistently achieved the best accuracy across slices (Fig. 3I), and that MENDER and BASS achieved comparable continuity, substantially better than the other methods (Fig. 3K). The aggregated performance across all slices also indicated that MENDER had the best accuracy in terms of NMI (Fig. 3J, 15 runs for each bar) and the second-highest continuity in terms of PAS, comparable with BASS (Fig. 3L, 15 runs for each bar).

Fig. 4: Scalability and speed.

A–H Application of MENDER on a mouse brain atlas using MERSCOPE. A The dataset contains 9 slices from 3 different brain positions. For each slice, we show the MENDER result (top). We also highlight one slice for each position to compare with the Allen brain reference (bottom). Specific brain domains are shown, including Pontine Gray (B), Isocortex Layers (C), Corpus Callosum (D), Hippocampal region (E), Thalamus (F), and Caudoputamen (G). H The UMAP embedding on top of the MENDER cellular context representation. I Summary of benchmark datasets for unsupervised spatial domain identification. J Running time comparison of different spatial methods across the 4 datasets. Error bars are based on mean and 95% confidence interval. Source data are provided as a Source Data file.

Evaluation on MERFISH dataset

We challenged these methods on an even more complex dataset, collected from the mouse frontal cortex area and measured by MERFISH, containing 378,918 cells from 31 slices of 3 ages, with expression measurements for 374 genes (Fig. 3M–P). Compared with the former datasets, this MERFISH dataset contained 33-fold more cells, organized into more complex tissue structures, and measured more genes. More importantly, the 31 slices were collected from multiple individuals at 3 different aging stages, potentially leading to spatial domains that are not shared across slices. Due to the large data size, SOTIP and BASS ran into running time issues, so we only compared Louvain, Leiden, STAGATE, SingleRange, CNC, and MENDER. Quantitative results of 5 replicate runs showed that MENDER achieved substantial improvements in both accuracy and continuity across the 31 slices (Supplementary Fig. 7). The aggregated performance over all slices also showed that MENDER had the best accuracy (Fig. 3Q, 155 runs for each bar) and the highest continuity (Fig. 3R, 155 runs for each bar).

Evaluating single-slice versus multi-slice analysis

The benefit of multi-slice analysis over single-slice analysis, as emphasized in recent studies23,27, is its ability to perform spatial domain identifications across multiple slices simultaneously. This facilitates the comparison of identified results across slices. Utilizing single-slice analysis for multiple slices separately introduces challenges, such as the need for additional domain matching, especially when the number of slices increases, and the risk of inconsistent clustering granularity across slices. We further assessed whether multi-slice analysis offers per-slice improvement. We conducted single-slice analysis for each slice across three datasets, resulting in a total of 37 single-slice analyses. We then compared the accuracy (in terms of NMI) of single-slice and multi-slice analyses for each slice (see “Methods”). Although statistically significant, the per-slice improvement of multi-slice over single-slice analysis is relatively small (Supplementary Fig. 8).

Extended evaluation

We conducted supplementary studies to evaluate MENDER’s performance outside its primary scope. Specifically, we tested MENDER’s capacity for cell type identification, a task distinct from spatial domain identification, in both supervised and unsupervised settings (see Supplementary Notes, “Extended analysis on cell type identification task”). Our supplementary analyses demonstrated that MENDER, as well as other state-of-the-art spatial domain identification methods such as STAGATE and SpaceFlow, aligns more closely with Domain annotations than with Cell Type annotations. Quantitative measures, such as Normalized Mutual Information (NMI), showed that MENDER’s performance decreased when evaluated against Cell Type annotations instead of Domain annotations. Furthermore, in supervised cell type identification, the accuracy of MENDER’s representation was approximately 50%, indicating a substantial number of mislabeled cells. This pattern was also observed in other leading spatial domain identification methods, suggesting that these methods are not ideally suited for cell type identification tasks.

Scalability on million-level brain atlas

Another distinct feature of MENDER is its scalability to large datasets, which stems from its deterministic recording of a few spatial neighbors for each cell, in contrast with other complex models that require repeated access to the large spatial graph during the iterations of stochastic optimization. We tested MENDER on a single-cell spatial transcriptomics dataset of whole mouse brain sections measured by MERSCOPE [https://info.vizgen.com/mouse-brain-data], containing 734,696 cells from 9 slices of 3 different brain positions (Fig. 4A), with expression measurements for 483 genes. The challenges of this dataset include a large cell count, the complex spatial structure of the whole brain, and inconsistent domains between different positions. The advantage of this dataset is that each position has three replicates, which helps verify the consistency of the spatial domains identified by multi-slice analysis. The Allen brain map37 [https://mouse.brain-map.org/static/atlas] provided an additional reference for comparison. Owing to these challenges, MENDER was the only spatial method able to handle this dataset.

We closely examined the 9-slice joint analysis results provided by MENDER. A comprehensive assessment of all 9 slices revealed visual consistency within the same position, supporting the method’s reliability (Fig. 4A, top). Closer inspection of the detailed results for each of the 3 positions revealed a clear correspondence between MENDER’s predicted results and the Allen brain map (Fig. 4A, bottom). For instance, MENDER accurately outlined the laminar patterns in the Isocortex in all three positions (Fig. 4A, C), and also identified domains that are shared by the 3 positions but morphologically distinct, such as the Corpus Callosum (Fig. 4A, D) and the Hippocampal region (Fig. 4A, E). Additionally, domains not shared across positions were also detected. Examples include the Pontine Gray (Position 1, Fig. 4A, B), the Thalamus (Positions 1 and 2, Fig. 4A, F), and the Caudoputamen (Positions 2 and 3, Fig. 4A, G). The UMAP embedding of the MENDER-induced cell context representation (MENDER-UMAP, Fig. 4H) exhibited both continuous tissue patterns and clear boundaries between distinct tissue compartments. These analyses demonstrate MENDER’s scalability on million-level spatial datasets and its potential for condition-specific domain discovery with multi-slice analysis.

Running time

Another important feature of MENDER is its speed. We summarized the running time of different spatial domain methods on the previous datasets (Fig. 4I) and identified a substantial improvement of MENDER over other methods across the 4 datasets (Fig. 4J). Specifically, MENDER was 9.4-fold, 9.3-fold, and 139.1-fold faster than the second-fastest method on the STARmap dataset (Dataset1), BaristaSeq dataset (Dataset2), and MERFISH dataset (Dataset3), respectively (Fig. 4J). On the last MERSCOPE dataset (Dataset4), containing 734,696 cells, MENDER was the only applicable method.

Generalizability

We further tested MENDER’s generalizability on more spatial data modalities, particularly in light of the rapid emergence of new spatial technologies. We evaluated MENDER and other methods on additional datasets (Supplementary Fig. 9). These datasets include single-cell resolution data (Stereo-seq38, osmFISH4, and STARmapPLUS39) and non-single-cell resolution data (Spatial Transcriptomics40, 10x Visium, and Slide-seq6). With the non-single-cell resolution data, we aimed to assess whether MENDER could identify the expected tissue structures, even though it was not initially designed for such data.

Firstly, we used two Slide-seq datasets (Supplementary Fig. 10). The first dataset originates from the mouse cerebellum and contains 23,096 genes measured on 39,496 beads. We compared the results from different methods (Supplementary Fig. 10B) with the tissue structure reference (Supplementary Fig. 10A). It was evident that all methods consistently identified major cerebellum structures like Molecular Layer (ML) and Granule Layer (GL). However, only MENDER pinpointed the Purkinje Layer (PL), which other methods overlooked. These identified structures were then matched using known structural markers (Supplementary Fig. 10C). Marker genes Gpm6b (ML marker, Supplementary Fig. 10C 3rd column), Calb1 (PL marker, Supplementary Fig. 10C 2nd column), and Cblb3 (GL marker, Supplementary Fig. 10C 1st column) were specifically upregulated in the expected structures. The overlay of these genes also displayed patterns consistent with the expected structures and MENDER’s results (Supplementary Fig. 10C 4th column).

The second dataset, derived from the mouse hippocampus, covered 23,264 genes measured on 53,208 beads. By comparing the results of various methods (Supplementary Fig. 10E) with H&E images and structure annotations from the Allen reference atlas (Supplementary Fig. 10D), it became clear that all methods identified the overall structure of the hippocampal region. However, BASS and SOTIP had difficulty differentiating the finer structures of Cornu Ammonis (CA) and Dentate Gyrus (DG), whereas STAGATE, CNC, and MENDER succeeded in differentiating the sub-structures of CA (CA1 and CA3) and DG (Supplementary Fig. 10E). Validation of these identified structures using known structural markers (Supplementary Fig. 10F) revealed that Wfs1, a marker gene of CA1, was notably enriched in the region annotated by MENDER (Supplementary Fig. 10F 1st column). The same was true for Chgb in CA3 (Supplementary Fig. 10F 2nd column) and C1ql2 in DG (Supplementary Fig. 10F 3rd column). The overlay of these marker genes (Supplementary Fig. 10F 4th column) matched well with the predicted structures.

We also collected mouse olfactory bulb (MOB) datasets from four distinct spatial technologies: Spatial Transcriptomics (ST), 10x Visium, Slide-seq, and Stereo-seq. The resolutions of these datasets span from single-cell (Stereo-seq) to nearly single-cell (Slide-seq) to tissue level (10x Visium and ST), enabling a thorough assessment of method generalizability across different spatial resolutions. The MOB tissue exhibited well-structured layer patterns with known layer markers, shown for each dataset (Supplementary Fig. 11C, F, I, L). For ST and 10x Visium data, we provided the histological images from the original publications40,41 (Supplementary Fig. 11A, D). From the ST and 10x Visium results, MENDER consistently identified major MOB layers, including Granule Cell Layer (GCL), Mitral Cell Layer (MCL), Glomerular Layer (GL), and Olfactory Nerve Layer (ONL) (Supplementary Fig. 11B, E). Other methods, such as BASS, SOTIP, and STAGATE, also detected some of these structures (Supplementary Fig. 11B, E). From Slide-seq and Stereo-seq results, MENDER highlighted finer tissue structures, including the Rostral Migratory Stream (RMS) and Internal Plexiform Layer (IPL), leveraging the enhanced spatial resolution (Supplementary Fig. 11H, K). Other methods, particularly SOTIP and STAGATE, also detected some expected structures (Supplementary Fig. 11H, K).

Subsequently, we collected four datasets of brain cortex tissue from two distinct spatial technologies: 10x Visium and osmFISH. While osmFISH offers single-cell resolution, 10x Visium data is at the spot level. The cortical structures facilitated an evaluation of method performance. Similar to our prior analysis, we provided layer markers for reference (Supplementary Fig. 12C, F, I). Paired histological images were available for 10x Visium data (Supplementary Fig. 12A, D), and structure annotations from the original publication were available for osmFISH data (Supplementary Fig. 12G). For the two 10x Visium datasets, BASS and MENDER results yielded the best laminar structures compared to other methods (Supplementary Fig. 12B, E). For the osmFISH data, referencing the tissue anatomy (Supplementary Fig. 12G), all methods identified expected layers such as Pia, Layer1-6, white matter, and hippocampus. Yet, MENDER achieved sharper layer boundaries (Supplementary Fig. 12H).

Next, we collected eight datasets obtained by a newer spatial technology, STARmapPLUS39 (Supplementary Fig. 13). The assayed tissue includes both the cortex and hippocampus regions (Supplementary Fig. 13). Since these datasets are of high quality and sourced from standard mouse brain coronal sections, a comparison between method results and the Allen reference atlas readily reveals which methods better identify expected tissue structures. Across these eight samples, all methods differentiated between the cortex and the hippocampal formation (HPF). Focusing on HPF sub-structures, almost all methods detected CA1, CA3, and DG, but only MENDER consistently identified CA2 (Supplementary Fig. 13). Regarding cortex sub-structures, MENDER delineated the clearest layer boundaries compared to other methods (Supplementary Fig. 13).

We have developed an online webpage [https://mender-tutorial.readthedocs.io/], which provides essential guidance on applying MENDER to various data types.

Identify age-consistent and age-specific spatial domains

We again analyzed the MERFISH dataset (Fig. 3M–P) since it contains spatial single-cell data at different aging stages (i.e., 4 weeks, 24 weeks, and 20 months)42, which might lead to age-associated biological insights. MENDER-UMAP showed agreement between the low-dimensional embedding and ground truth labels, and also implied the existence of subdomains ignored by the original annotation (Fig. 5A, annotated by red, green, pink, and orange dashed circles). We therefore sub-clustered the data into 9 domains using MENDER (Fig. 5A, right). Compared with MENDER results before sub-clustering (i.e., the analysis in Fig. 3M–R, where the number of clusters was set to 8), the new clustering result achieved significantly improved accuracy, consistently across 31 slices (Fig. 5B).

Fig. 5: Discovery of age-related spatial domains.

A MENDER-UMAP of the MERFISH aging dataset, colored by the expert annotation from the original publication (left) and MENDER results (right). B The ARI and NMI before and after MENDER sub-clustering. Each point is a slice (the number of slices in total is 31). The green line indicates improved performance after sub-clustering and the orange line indicates decreased performance after sub-clustering. The original annotation and MENDER’s annotation were plotted on the three stages, 4 weeks (C), 24 weeks (D), and 20 months (E), respectively. The aging stage labels are annotated on the MENDER-UMAP plot both on a single plot (F) and separately (G). H The stage distribution across different domains from the original annotation (top) and MENDER (bottom), respectively. I The domain distribution across different slices from 4-week-old mice. J Spatial distribution of MENDER-annotated domains across different aging stages. Different colors indicate different spatial domains. K The spatial signature analysis of MENDER-annotated domains across slices from all aging stages. Each row is a feature of MENDER-computed context representation. The cell states are indicated by different colors, and the ranges are indicated by the number beside the cell state label. Each column shows the spatial signature of an identified domain. The values of the context representation matrix are reflected by the size and colors of the dots. Source data are provided as a Source Data file.

We sought to examine the discrepancy between the original annotation and MENDER prediction (Fig. 5A). We first focused on the Olfactory region (OLF) (Fig. 5A, left), which was sub-clustered into two domains, D6 (pink dashed circle) and D7 (orange dashed circle), by MENDER (Fig. 5A). The spatial single-cell plot displayed not only the consistent existence of the two subdomains across different aging stages but also similar localization relationships between them. Spatial signature analysis (see “Methods”) showed a dominant distribution of InN-Olf (Inhibitory Neurons) and ExN-Olf (Excitatory Neurons) in the two subdomains of OLF (i.e., OLF1 and OLF2), respectively, across ranges 0–4 (Fig. 5K, purple dashed boxes), indicating distinct neuron activities between the two subdomains of OLF that are conserved across ages. This analysis demonstrated MENDER’s potential to identify biologically meaningful subdomains overlooked by previous analyses.

We next explored an apparent discrepancy between the original annotation and MENDER’s prediction, specifically concerning D4 and D8 (Fig. 5A, right, red and green dashed circles). Both of these were initially annotated as Corpus Callosum (CC) (Fig. 5A, left). The UMAP embedding of MENDER (MENDER-UMAP) revealed an uneven distribution of the three stage labels (i.e., 4 weeks, 24 weeks, and 20 months) across the CC in feature space (Fig. 5F, G). Quantitatively, D8 was almost entirely concentrated in the 4-week stage, while D4 was mostly populated by the 24-week and 20-month stages (Fig. 5H, bottom), in contrast to the original annotation (Fig. 5H, top). The consistent distribution of D8 across 10 replicates of 4-week mice suggested that this result was not an artifact (Fig. 5I). Together, these analyses identified two subdomains within the CC and characterized their distribution across time points.

Advancing from the discovery of these sub-CC domains, we further investigated their spatial distribution. One of the advantages of MENDER’s multi-slice analysis is that it allows for a direct comparison of domain labels across different slices. We selected representative tissue slices from the three aging stages and presented the spatial distribution of MENDER-identified spatial domains for comparison (Fig. 5J). The resulting spatial map exhibited highly similar laminar structures (from outermost to innermost layers: Pia mater, Layer II/III, Layer V, Layer VI, CC, and Striatum) and sharp boundaries, as expected. Notably, the CC domain exhibited different colors (corresponding to different domain labels) between stages, i.e., green at 4 weeks and red at 24 weeks and 20 months. The spatial signature analysis revealed the primary difference between CC (4w) and CC (24w & 20 m) was the distinct distributions of oligodendrocyte subtypes (Fig. 5K, red boxes). These observations underscore MENDER’s potential to identify condition-specific domains and illustrate the spatial relationships between these domains across different stages.

Differentiate spatial-related subtypes of breast cancer

Having benchmarked the performance and scalability of MENDER and demonstrated the biological insights that MENDER-identified spatial domains provide, our next objective was to investigate whether MENDER could identify spatial domains with biomedical significance. For this purpose, we applied the multi-slice analysis of MENDER to a large-scale MIBI-TOF spatial proteomics dataset11 of 40 Triple-Negative Breast Cancer (TNBC) patients, comprising approximately 200,000 cells in total (see Fig. 6A). This large volume of data poses a challenge to other related methods (as demonstrated in previous sections), but can be handled readily thanks to the high computational efficiency of MENDER.

Fig. 6: Differentiate TNBC subtypes.

A Information on the datasets. The datasets comprise 3 patient groups, namely cold, mixed and compartmentalized, consisting of 6, 19 and 15 patients, respectively. B The cell type annotations and MENDER-predicted domains on representative patients from different groups are shown. PCA plots of the patients characterized by the proportions of cell type (fine), termed CT-fine repr (C), cell type (coarse), termed CT-coarse repr (D), and MENDER-identified domains, termed MENDER repr (E), respectively. N = 6 independent samples for cold, N = 19 for mixed, and N = 15 for compartmentalized. Boxplot setting: the lower and upper hinges show the first and third quartiles (the 25th and 75th percentiles); the center lines correspond to the median; the upper whisker extends from the upper hinge to the largest value no further than 1.5× the interquartile range from the hinge, and the lower whisker extends from the lower hinge to the smallest value no further than 1.5× the interquartile range from the hinge. F The separability of patients by 3 different proportions is quantified by the classification accuracy of 2 supervised classifiers, i.e., KNN and SVM. The y-axis shows the classification accuracy (ACC) reported by fivefold cross-validation, using either KNN or SVM as classifiers. The patient-level labels are cold, mixed and compartmentalized. The p-values (one-sided t-test) indicate the significance of the difference between CT-fine repr and MENDER repr, and between CT-coarse repr and MENDER repr, respectively. Error bars are based on mean and 95% confidence interval. G Similar to F, except that the features of these proportions are PCA-reduced to the same dimensions before classification. The number of PCs ranges from 2 to 17. The y-axis also shows the classification accuracy (ACC) reported by fivefold cross-validation. N = 5 independent experiments. Error bars are based on mean and 95% confidence interval. Source data are provided as a Source Data file.

The patients were categorized into three TNBC subtypes based on the accompanying metadata, namely the cold group, mixed group, and compartmentalized group (see Fig. 6A). The three groups were reported to have significantly different survival outcomes11. Notably, the original study had previously shown that differences between the three subtypes could not be explained by cell type abundance alone. Given that MENDER-identified spatial domains integrate the spatial relationships of diverse cell types, we posited that the abundance of spatial domains across patients may be better suited to distinguish between the three TNBC subtypes.

In this dataset, we observed distinct topological variations across the TNBC subtypes via the spatial domains identified by MENDER (Fig. 6B). We proceeded to represent each patient using the proportion of cell type and domain. The original publication supplied both fine and coarse cell classifications. In subsequent discussions, we define patient representation using MENDER domain proportion as “MENDER repr” and patient representation deploying cell type (fine/coarse) as “CT-fine repr” and “CT-coarse repr”. Aligning with prior findings, neither CT-fine repr nor CT-coarse repr could effectively distinguish among the three subtypes (Fig. 6C, D).
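The proportion-based patient representation described above can be illustrated with a minimal sketch. The function name `proportion_repr`, the domain names, and the toy labels are hypothetical and chosen only for illustration; the sketch assumes each patient is summarized by the fraction of their cells assigned to each category (domain or cell type), in a fixed category order.

```python
from collections import Counter


def proportion_repr(labels_per_patient, categories):
    """Return, per patient, the fraction of cells in each category (fixed order)."""
    reprs = {}
    for patient, labels in labels_per_patient.items():
        counts = Counter(labels)
        total = len(labels)
        # Categories absent from a patient get a proportion of 0.
        reprs[patient] = [counts[c] / total for c in categories]
    return reprs


# Toy input: hypothetical domain labels for the cells of two patients.
domains = ["D0", "D1", "D2"]
cells = {
    "patient_1": ["D0", "D0", "D1", "D2"],
    "patient_2": ["D1", "D1", "D1", "D2"],
}
mender_repr = proportion_repr(cells, domains)
```

Passing cell type labels instead of domain labels would give the analogous "CT-fine repr" or "CT-coarse repr" under the same assumption.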

In contrast, MENDER repr successfully differentiated the subtypes and captured the progressive prognosis from the cold group, to the mixed group, and ultimately to the compartmentalized group, as shown in the PCA plot (Fig. 6E). In quantitative terms, for both CT-fine repr and CT-coarse repr, PC1 could significantly differentiate compartmentalized from cold/mixed, but PC2 was unable to distinguish between the three groups (Fig. 6C, D). Conversely, for MENDER repr, PC1 could significantly differentiate cold from mixed/compartmentalized, and PC2 could significantly differentiate compartmentalized from cold/mixed (Fig. 6E). By combining the two main PCs, MENDER repr could easily tell apart the three TNBC subtypes (Fig. 6E).

The PCA analysis underscored the visual separability of the patient groups when MENDER was deployed. In order to evaluate the differentiating ability of different patient representations, we adopted the procedure used in representation learning literature43,44,45 and constructed classification tasks using different representations, reporting the classification accuracy as a measure of the differentiating power of each representation (see “Methods”). For the three representations—MENDER repr, CT-fine repr, and CT-coarse repr—we applied two supervised classifiers: K-nearest neighbors (KNN) and Support Vector Machines (SVM). The results underscored that MENDER repr clearly outperformed CT-fine repr and CT-coarse repr in classifier accuracy, whether KNN (Fig. 6F left) or SVM (Fig. 6F right) was employed as the classifier, suggesting a significantly higher ease of classification for the patient groups using MENDER-derived representations. To control for the effects of varying feature numbers, we also evaluated the classification accuracy in three feature spaces by projecting the same number of principal components (see “Methods”). The findings affirmed that the superior predictive capacity of the MENDER-identified domains remained unaffected by feature dimensionality (Fig. 6G).
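The evaluation idea above, scoring a representation by how well a simple classifier separates the patient groups under fivefold cross-validation, can be sketched in plain Python. This is an illustrative stand-in rather than the procedure used in the paper: the value k = 3, the unstratified shuffled folds, and the toy proportion vectors are all assumptions.

```python
import math
import random
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training points closest to `query`."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def cross_val_accuracy(x, y, n_folds=5, k=3, seed=0):
    """Mean KNN accuracy over n_folds folds of a shuffled (unstratified) split."""
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::n_folds] for f in range(n_folds)]
    accs = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in idx if i not in held_out]
        tx = [x[i] for i in train_idx]
        ty = [y[i] for i in train_idx]
        correct = sum(knn_predict(tx, ty, x[i], k) == y[i] for i in test_idx)
        accs.append(correct / len(test_idx))
    return sum(accs) / n_folds


# Toy patient representations: three well-separated groups of proportion vectors.
x = (
    [[0.8 - 0.01 * j, 0.1 + 0.01 * j, 0.1] for j in range(5)]
    + [[0.1, 0.8 - 0.01 * j, 0.1 + 0.01 * j] for j in range(5)]
    + [[0.1 + 0.01 * j, 0.1, 0.8 - 0.01 * j] for j in range(5)]
)
y = ["cold"] * 5 + ["mixed"] * 5 + ["compartmentalized"] * 5
acc = cross_val_accuracy(x, y)
```

A representation whose groups overlap in feature space would yield a lower accuracy under the same protocol, which is what makes the score usable as a measure of differentiating power.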

Discussion

Spatial domain identification is a crucial task in spatial biology and is an important intersection of the machine learning and spatial omics fields. For this task, new methods often followed the established paradigms and conducted incremental developments by increasing model complexity. But whether complex models could deliver consistent gains has not been discussed. To this end, our analysis hinted that a simple model might bring better performance over modern complex models, thus inspiring a new paradigm to break through current bottlenecks.

There are primarily two factors that can influence the determination of spatial domain labels. The first factor is cellular context because MENDER relies on the representation of cellular context to determine spatial domain labels. However, it’s important to note that the presence of the same spatial domains doesn’t necessarily imply the absence of cellular context variations. For instance, consider the original spatial domain region in Fig. 6B, which can still contain cellular context variations, as demonstrated by the color variations in the Rh region in Supplementary Fig. 35A. Here, we used UMAP-reduced cellular context representation and mapped it to the CIELAB color space for each cell to illustrate these variations. The second factor is the Leiden clustering resolution. When we increased the clustering resolution, we observed that the Rl region generated different spatial domain labels (Supplementary Fig. 35C). Conversely, when we decreased the Leiden resolution, we noticed that the domain labels within Rl became more homogeneous (Supplementary Fig. 35B).

Our analytical contributions are twofold. First, we identified consistent neighborhood statistics across different spatial technologies in different tissue systems. Second, we found that simple cellular context analysis might outperform state-of-the-art complex models (e.g., Graph Neural Networks and Bayesian Networks) in both supervised and unsupervised settings. Our practical contributions are also twofold. First, we addressed multi-slice analysis in the spatial domain identification task, which was rarely considered by previous methods. Second, we addressed the scalability and running time problems, which were the main obstacles to applying previous methods to million-level datasets. We conducted a memory usage comparison between MENDER and other competing methods (Supplementary Table 1). We recorded the peak memory usage for each method on every dataset. The results indicated that SingleRange, CNC, and MENDER exhibit the best memory efficiency, as they only require the maintenance of one fixed spatial graph and context representation in memory. It’s worth noting that even on the MERSCOPE dataset with over 700,000 cells, MENDER only requires 25 min and over 80 GB of memory, showcasing its potential capability to handle datasets of million-level scale.

MENDER’s innovation is best understood within the computational community of spatial transcriptomics, where the mainstream methodological paradigm of spatial domain identifications (also known as spatial clustering) follows a two-step approach8,17. The first step involves encoding the cellular context information into a context-aware representation, and the second step involves clustering the context-aware representation to obtain the spatial domain labels.

Regarding the first step, some methods use graph convolutional networks (GCN) to obtain the context-aware representation18,19,46, while others use probabilistic graphical models (PGM)47,48. As an alternative to GCN and PGM methods, MENDER utilizes a new concept: a descriptor of how cell states are spatially organized within the local context. We avoided GCN and PGM methods due to their limitations: GCN methods were reported to have unstable outputs across different runs on different machines25, and PGM methods were reported to have longer running times23. Additionally, existing GCN and PGM methods both lack scalability to large datasets and fail when applied to datasets larger than 10^6 cells.

For the second step, MENDER uses Leiden to cluster the context-aware representation, as is common in the second step of other spatial clustering methods. Although some methods use other clustering algorithms for the second step (e.g., STAGATE uses mclust when the number of classes is known19), they are fundamentally similar to MENDER, as they all use existing clustering algorithms to cluster the context-aware representation to obtain the spatial domain labels.

Due to the relatively early developmental stage of the spatial omics field, most available datasets are from the brain and other healthy tissues, and there is still a lack of complex disease or tumor data. In such disease cases, the cell spatial organization and effect range (radius) might be more complex and vary across tissue positions, progression stages, or patients. How to design an adaptive effect range is a future challenge. Given the complex tissue structures to be studied, especially in diseases, identifying hierarchical tissue structures is another important direction. A straightforward solution is to embed hierarchical clustering methods into current spatial clustering methods. A recent innovation solved this problem in a different way, extracting spatial structure through co-expression hotspots, enabling the identification of multi-scale, multi-layer, interpretable organizational structures49.

Methods

MENDER

MENDER takes multiple slices of spatial omics data as input (see “Input” in the following and Fig. 1A), which contains two matrices, i.e., the gene expression matrix and the spatial coordinate matrix. Next, MENDER uses the gene expression matrix to determine the cell states (see “Cell Group Computation” in the following), and uses the spatial coordinate matrix to obtain the multi-range neighborhood of each cell (see “Multi-range neighborhood Representation Computation” in the following) (Fig. 1B). Then for each index cell, MENDER records the number of each cell state in each range and concatenates them to get the context representation of the index cell (Fig. 1C), which is finally clustered to get the spatial domains (see “MENDER-UMAP Visualization and MENDER Spatial Domains Computation” in the following) (Fig. 1D-G). For technical details, please refer to Supplementary Fig. 14 while reading the following descriptions. Data elements (including input, output, and intermediate data), boxed in purple in Supplementary Fig. 14, are emphasized using “Double quotes” in the following descriptions.

Input

The input of MENDER is multiple slices of spatially resolved single-cell data. “Multiple slices” means these tissue slices are collected from multiple tissue sections and don’t share a common spatial coordinate system. MENDER processes a spatial dataset containing G genes measured on N cells from S slices, with three pieces of input information: (1) The “Gene expression matrix” (N rows × G columns); (2) The “Spatial matrix” (N rows × 2 columns for 2D data, or N rows × 3 columns for 3D data); (3) The slice ID identifier (a vector of length N), specifying the origin slice of each cell. In order to be compatible with common single-cell and spatial omics analysis packages50,51, the data accepted by MENDER is prepared in AnnData format. The multi-slice data should be merged into the same AnnData object, using the keyword “slice_id” in AnnData.obs to identify different slices.

Cell group computation

Before constructing the cell context representation, MENDER determines each cell’s cell state (i.e., “Cell group” in Supplementary Fig. 14) based on the gene expression matrix. In our practice, if batch effects exist across slices, Harmony52 (scanpy.external implementation, default setting) is used for data integration, followed by neighborhood searching and Leiden clustering (scanpy.tools implementation, with resolution = 2) on the “Harmonized embedding”. If no batch effect is present, the PCA-reduced gene expression profiles are directly subjected to neighborhood searching and Leiden clustering.

Some resource papers provide expert annotations of cell types. If such reliable annotation is available as prior knowledge, the above procedure can be bypassed and the annotation used directly as the “Cell group”. This approach has the potential to enhance accuracy and shorten the running time. In real-world applications, particularly when confronted with complex disease cases and possible batch effects, we highly recommend using reliable cell type annotation across multiple slices to avoid potential inaccuracies.

We evaluated the robustness of MENDER with respect to various cell clustering methods and parameters across multiple datasets (Supplementary Figs. 2–4). Specifically, we assessed four single-cell clustering methods: UMAP + KMeans, Louvain, Leiden, and SC3s53. We then reported MENDER’s performance (quantified in terms of NMI) in relation to diverse clustering parameters associated with each method. As clustering granularity increased, MENDER’s performance rapidly approached its peak and was maintained without significant degradation. This behavior demonstrates MENDER’s robustness to variations in cell clustering methods and their corresponding parameters across datasets. Broadening our exploration, we examined MENDER’s robustness against low-quality cell clusters (Supplementary Fig. 5). To achieve this, we introduced varying noise levels to cell cluster labels. Our results indicate that MENDER’s performance suffers only a slight drop when the noise level remains under 0.5, emphasizing MENDER’s robustness even in the presence of noisy cell group labels.
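One way to picture the label-noise experiment is the sketch below. The text above does not define the noise level precisely, so this example assumes it is the fraction of cells whose cell group label is reassigned to a different, randomly chosen label; the function name `add_label_noise` and the toy labels are hypothetical.

```python
import random


def add_label_noise(labels, noise_level, seed=0):
    """Reassign a fraction `noise_level` of labels to a different random label."""
    rng = random.Random(seed)
    categories = sorted(set(labels))
    noisy = list(labels)
    n_flip = round(noise_level * len(labels))
    # Pick distinct cells to corrupt, then give each a label other than its own.
    for i in rng.sample(range(len(labels)), n_flip):
        noisy[i] = rng.choice([c for c in categories if c != labels[i]])
    return noisy


clean = ["A"] * 50 + ["B"] * 30 + ["C"] * 20
noisy = add_label_noise(clean, noise_level=0.3)
changed = sum(a != b for a, b in zip(clean, noisy))
```

The corrupted labels would then replace the “Cell group” input, and the downstream domain accuracy would be compared across noise levels.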
我们评估了 MENDER 在多个数据集中的各种细胞聚类方法和参数方面的稳健性(补充图 2-4)。具体来说,我们评估了四种单细胞聚类方法:UMAP + KMeans、Louvain、Leiden 和 SC3s 53 。然后,我们报告了 MENDER 与每种方法相关的不同聚类参数的性能(以 NMI 量化)。随着聚类粒度的增加,MENDER 的性能迅速接近峰值并保持不变,没有明显下降。这种行为证明了 MENDER 对细胞聚类方法及其跨数据集的相应参数的变化的鲁棒性。为了扩大我们的探索范围,我们检查了 MENDER 针对低质量细胞簇的鲁棒性(补充图 5)。为了实现这一目标,我们向细胞簇标签引入了不同的噪声水平。我们的结果表明,当噪声水平保持在 0.5 以下时,MENDER 的性能仅略有下降,这强调了即使存在噪声细胞组标签,MENDER 的稳健性。

In more general cases, decoupling the cell state clustering step from the rest of the workflow could provide a divide-and-conquer solution for low-quality spatial data. Besides the above approach to determining the “Cell Group”, one can also apply reference-based54 cell type annotation using methods such as scArches55, Tangram56, and Spatial-ID57 to determine the cell state labels. This feature makes MENDER a highly flexible framework that can effectively integrate the rich resources of bioinformatics tools.

Multi-range neighborhood Representation Computation

The “Spatial Matrix”, “Slice ID”, and the “Cell Group” (obtained previously) are used to calculate the “Multi-range Neighborhood Representation”. The “Slice ID” separates the “Spatial Matrix” into multiple matrices, one per slice, to prevent cells from different slices from becoming spatial neighbors. For each per-slice spatial matrix, spatial neighborhood searching (Squidpy51 implementation, default parameters) is performed independently, with slices processed in parallel through a multi-process pool (Python multiprocessing implementation).

For each slice, S ranges of spatial neighborhoods are created around every cell, forming S ring areas around the central cell (the radius is set to 15 µm by default). The setting of S depends on the spatial resolution, as noted in Supplementary Fig. 14: 2 for 10X Visium/ST, 4 for Slide-seq, and 6 for single-cell resolution technologies. The indices of the cells located within each ring area are recorded for each central cell. Note that although 15 µm was used consistently across this manuscript, it may not hold if a broader range of spatial data is obtained from a larger variety of tissues in the future. Therefore, we have developed a function module called estimate_radius that enables the evaluation of distance distributions for new datasets.

Using the “Cell Group” computed earlier, the frequency of each cell state within each ring of a central cell’s multi-range neighborhood is recorded. These frequencies are then concatenated across rings to form the “Multi-range Neighborhood Representation” of that cell.

Formally, suppose the total number of cells across all input slices is $N$. The first step partitions all cells into $C$ distinct cell states, i.e., “Cell group” in Supplementary Fig. 14, noted as $G=\{g_c\},\ c\in[1,C]$. The cell state of the $i$-th cell is noted as $cell_i,\ i\in[1,N]$. The origin slice of the $i$-th cell is noted as $slice_i,\ i\in[1,N]$. The spatial coordinate of the $i$-th cell is noted as $(x_i,y_i),\ i\in[1,N]$. The number of ranges is set to $S$. The radius is set to $R$. Then the multi-range neighborhood representation matrix is $M\in\mathbb{Z}_{+}^{N\times(S\times C)}$, in which the $i$-th row is the context-aware representation of the $i$-th cell:

$$M_{i,(s-1)\times C+c}=\left|\{j \mid (s-1)\times R\le \mathrm{Dist}(i,j)<s\times R\}\cap\{j \mid slice_j=slice_i\}\cap\{j \mid cell_j=g_c\}\right| \tag{1}$$

where:

$$\mathrm{Dist}(i,j)=\sqrt{(x_j-x_i)^2+(y_j-y_i)^2}$$

$$i,j\in[1,N],\quad s\in[1,S],\quad c\in[1,C]$$
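As an illustration, Eq. (1) can be sketched in NumPy as follows. This is a minimal brute-force sketch, not the MENDER package implementation: the function name `multi_range_representation` and its arguments are hypothetical, cell states are assumed to be integer-coded from 0 to C−1 (so the column index becomes ring × C + state), and all pairwise distances within a slice are computed directly instead of using the Squidpy neighborhood search. Following Eq. (1) as written, the central cell itself falls into the first ring because Dist(i,i) = 0.

```python
import numpy as np

def multi_range_representation(coords, slice_ids, cell_states,
                               n_states, n_ranges, radius):
    """Count cells of each state in each ring around every cell (Eq. 1).

    coords      : (N, 2) array-like of x, y positions
    slice_ids   : (N,) array-like, slice of origin of each cell
    cell_states : (N,) integer array-like with values in 0..n_states-1
    Returns an (N, n_ranges * n_states) integer matrix M.
    """
    coords = np.asarray(coords, dtype=float)
    slice_ids = np.asarray(slice_ids)
    cell_states = np.asarray(cell_states)
    width = n_ranges * n_states
    M = np.zeros((coords.shape[0], width), dtype=np.int64)

    # Handle each slice separately so cells never neighbor across slices.
    for sl in np.unique(slice_ids):
        idx = np.flatnonzero(slice_ids == sl)
        xy = coords[idx]
        states = cell_states[idx]
        # Brute-force pairwise Euclidean distances (O(n^2) memory per slice).
        diff = xy[:, None, :] - xy[None, :, :]
        dist = np.sqrt((diff ** 2).sum(axis=-1))
        # 0-based ring index r such that r*R <= dist < (r+1)*R.
        ring = np.floor(dist / radius).astype(int)
        for a, i in enumerate(idx):
            keep = ring[a] < n_ranges          # discard cells beyond ring S
            cols = ring[a][keep] * n_states + states[keep]
            M[i] = np.bincount(cols, minlength=width)
    return M
```

For real datasets a spatial index (for example a k-d tree) would replace the dense distance matrix; the dense version is used here only because it mirrors the set notation of Eq. (1) most directly.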

MENDER-UMAP visualization and MENDER spatial domains computation

Using the “Multi-range Neighborhood Representation” of each cell, dimension reduction and clustering are executed to generate the “MENDER-UMAP Visualization” and the “MENDER Spatial Domains”. To create the “MENDER-UMAP Visualization”, a neighborhood graph (scanpy.pp.neighbors) is constructed on the normalized and PCA-reduced “Multi-range Neighborhood Representation” (implemented by scanpy.pp.normalize_total, then scanpy.pp.log1p and scanpy.pp.pca). UMAP (implemented by scanpy.tl.umap) is then applied to the neighborhood graph. To generate the “MENDER Spatial Domains”, Leiden clustering is applied to the same neighborhood graph. The clustering resolution of Leiden is set in the following manner: if the expected number of domains is known, a function is implemented to automatically estimate a suitable Leiden resolution. This is done by executing run_clustering_normal with a positive value as the parameter, giving the expected number of domains. Conversely, if the expected number of domains is not available (as in exploratory studies), the Leiden resolution defaults to 0.5. This is achieved by running run_clustering_normal with a negative value, which is interpreted as the clustering resolution.

Evaluation and biomedical applications

Finally, once the spatial domains are obtained, one might want to evaluate the accuracy of the identified domains and carry out biomedical applications. For evaluation purposes, MENDER includes Compute_NMI, a tool for comparing the similarity between predicted domains and ground truth domains using Normalized Mutual Information (NMI). Compute_PAS is also provided to assess the spatial coherence of the predicted domains.

For biomedical applications, it is suggested to use the proportions of domains of each patient as their representation. In the case of unsupervised analysis, Principal Component Analysis (PCA) is applied to the patient representation to embed each patient in a low-dimensional space. This process can automatically partition patients into different groups with significantly different outcomes. More details can be found in “Evaluation of patient representations” in “Methods”.

Determine the optimal clustering resolution

The challenge of determining the appropriate resolution or number of regions in spatial clustering is a common hurdle in the field. To address this challenge, we introduced the “res_search” method in MENDER. This approach enables users to iteratively search for the optimal Leiden resolution, given the expected number of regions (Supplementary Fig. 36). To demonstrate, Supplementary Fig. 37 highlights the effectiveness of the “res_search” method in resolution selection. Using a MERSCOPE brain dataset, we showed that MENDER, with default resolution settings, identifies fine-grained structures. However, when applying “res_search” with an expected number of regions set to 5, MENDER accurately discerns broader brain regions, aligning with the Allen Brain Atlas.

Six aspects to view existing methods

Support for multi-slice analysis

Multi-slice analysis is defined in contrast to single-slice analysis23. It aims to identify spatial domains on multiple slices at the same time, so that the resulting labels can be compared across slices. Analyzing multiple slices separately with single-slice analysis leads to two problems. First, the labeling result of each slice is independent, which means that domain A in one slice and domain A in another slice do not necessarily refer to the same domain. Additional domain matching is therefore needed, which becomes more challenging as the number of slices increases. Second, single-slice analysis of multiple slices may result in inconsistent clustering granularity across slices. For example, domain A in one slice may be split into domain C and domain D in another slice.

The importance of multi-slice analysis was highlighted in recent studies23,27, yet only 3 existing methods provide an interface for users to perform multi-slice analysis: STAGATE19, BASS23, and SOTIP27. Tutorials for multi-slice analysis are available for STAGATE at [https://stagate.readthedocs.io/en/latest/AT1.html], for SOTIP at [https://github.com/TencentAILabHealthcare/SOTIP/tree/master/SOTIP_analysis/multi_sample], and for BASS at [https://zhengli09.github.io/BASS-Analysis/].

We can categorize the methods that support multi-slice analysis into three paradigms: “early-support”, “late-support”, and “data-end-support”, based on the point in the workflow where the integration occurs.

The “early-support” paradigm typifies methods such as SOTIP and BASS, which perform data integration early in the procedure. In other words, they initially harmonize gene expressions across different slices before proceeding to construct spatial graphs independently for each slice. The final modeling is then performed on all slices jointly, yielding spatial domain results comparable between slices.

On the other hand, STAGATE exemplifies the “late-support” paradigm, where single-slice analysis of STAGATE is performed independently for each slice. This process generates a context-aware representation for each slice. The data integration operation is then performed on these representations, with the final clustering carried out in the integrated embedding space.

The third paradigm, “data-end-support”, is exemplified by BayesSpace. This paradigm shifts the spatial coordinates of the different slices into a shared coordinate system while maintaining a substantial gap between slices, so that spots from different slices are not neighbors. The algorithm then proceeds as in single-slice analysis, yielding spatial clustering results.

At present, SpaGCN, CCST, SpaceFlow, and SpatialPCA lack the functionality for multi-slice analysis in their code and documentation. However, with modifications to their code, they could be adapted to multi-slice analysis using either the “early-support” or the “late-support” paradigm discussed above. On the other hand, BayesSpace’s methodology does not generate context-aware cell representations, making it unsuitable for extension via the “late-support” paradigm.

Stability

Stability is the ability of a computational method to produce outputs that are as similar as possible across different runs when given identical input58. Reproducibility has received widespread attention in the scientific community in recent years, and method stability is an important part of it. The randomness and non-convexity of modern deep-learning models make it difficult for them to produce stable results across different runs on different machines. Probabilistic graphical models have been reported to be relatively more stable, although they also cannot guarantee complete stability.

Interpretability

Interpretability describes whether the parameters and variables involved in the model can be mapped to biological entities or relationships. Good interpretability can lead to more meaningful model outputs and better means of model diagnosis. Deep-learning models have inherent disadvantages in interpretability compared to Bayesian models, which are generally built with biological entities as variables and their dependencies as conditional probabilities.

Scalability and speed

Since computational methods in spatial omics must additionally consider the spatial relationships of cells, they are more challenging to scale to large datasets than single-cell (non-spatial) methods such as single-cell clustering. At present, most existing spatial methods have only been applied and benchmarked on relatively small datasets (<100,000 cells). GNN-based methods tend to be more scalable than BN-based methods since they can rely on mature deep-learning communities and tools. The scalability bottleneck of existing methods is usually considered to be the ability to generate output within a reasonable running time. For even larger datasets, however, memory may be the next bottleneck (GNN may be at a disadvantage since large VRAM is less accessible than RAM).

Availability of cell context representation

This feature enables the output of both cell context representation (the cellular context information is encoded) and the spatial domain label for each cell. The context representation is a fixed-length vector for each cell, which may be useful for additional downstream analysis like those in single-cell analysis. The additional analysis might include pseudo-space analysis20,59 (similar to the pseudo-time analysis in single-cell analysis60,61,62), data visualization (for example using t-SNE63, UMAP64, and PHATE65), differential expression analysis66,67 (this requires the representation to be biologically meaningful, i.e., interpretable), and other analysis implemented in single-cell packages like SCANPY50 or Seurat68. GNN-based methods generally output both the context representation and spatial domain labels, and the latter is generally obtained by clustering on the former. Among BN-based methods, those based on Markov random fields (e.g., BayesSpace and BASS) generally do not output the context representation.

Analyzing the distance of neighboring cells

Datasets

The MERFISH primary motor cortex dataset is from Ref. 29, the MERFISH hypothalamic preoptic region dataset is from Ref. 32, the MERFISH nucleus accumbens dataset is from Ref. 30, the DARTFISH occipital cortex dataset is from Ref. 14, the BaristaSeq primary visual area dataset is from Ref. 33, the STARmap primary visual cortex dataset is from Ref. 34, the osmFISH somatosensory cortex dataset is from Ref. 4, and the seqFISH embryo dataset is from Ref. 35. These datasets cover the major spatial technologies with single-cell-level resolution. We did not analyze datasets from other single-cell-level technologies such as slide-seq5 or slide-seqV26,69,70 because their sequencing units (i.e., beads) are distributed in an array-like pattern and thus do not reflect real cell distances.

Analysis

For each slice in each dataset, we recorded the distance from each cell to its nearest cell, and the distances of all cells in the slice were collected as a distribution (i.e., boxplot in Fig. 1B). Specifically, for each slice, we first computed the pairwise distances of all cells to get a distance matrix. This was done using the “pdist” and “squareform” functions of scipy71. To avoid the zero values along the diagonal, we set the diagonal of the distance matrix to infinity (“fill_diagonal” function of numpy72). Then, for each row, we recorded its minimum value as the shortest distance. Some datasets contain many slices; to save space, we randomly selected only 5 slices to show the distribution.
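The nearest-cell distance computation described above can be sketched as follows. The function name `nearest_neighbor_distances` is hypothetical; only the “pdist”, “squareform”, and “fill_diagonal” calls come from the text.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def nearest_neighbor_distances(coords):
    """Distance from each cell to its nearest other cell within one slice.

    coords : (n, 2) array-like of x, y positions for a single slice.
    """
    coords = np.asarray(coords, dtype=float)
    # Condensed pairwise distances, expanded to a square (n, n) matrix.
    dist = squareform(pdist(coords))
    # The diagonal holds zero self-distances; mask it with infinity.
    np.fill_diagonal(dist, np.inf)
    # Row-wise minimum is the distance to the nearest other cell.
    return dist.min(axis=1)
```

The dense matrix needs memory quadratic in the number of cells, so for very large slices a nearest-neighbor query on a spatial index may be preferable.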

Boxplot

The lower and upper hinges show the first and third quartiles (the 25th and 75th percentiles); the center lines correspond to the median. Distance ranges from 10 μm to 20 μm were highlighted with orange, and distance of 15 μm was indicated with the red dashed line (Supplementary Fig. 1B). Boxplots were generated using Seaborn [https://seaborn.pydata.org/].

Supervised learning settings

General settings

To compare the spatial domain prediction performance between Graph Neural Network models and our simple cellular context representation, we used 3 state-of-the-art methods, i.e., SpaGCN18, SpaceFlow20, and STAGATE19, that can output a cell context representation to serve as the input for prediction. For a fair comparison, we set the number of neurons of the hidden layer (i.e., the number of dimensions of the context representation) of each method to 50 and the number of epochs to 500. Each method was run 5 times on each dataset.

SpaGCN

For SpaGCN, we followed the tutorial in [https://github.com/jianhuupenn/SpaGCN/blob/master/tutorial/tutorial.ipynb]. The difference is that we set the “histology” parameter to false since no H&E images are available. SpaGCN involves a PCA step (reducing the gene expression vector to 50 dimensions) prior to the GNN network, which raises an error if the number of genes is smaller than 50. We thus modified the original code so that the “n_comp” of the PCA is set to the smaller of 50 and the number of measured genes. The resulting cell context representation was obtained from the “embed” attribute of the SpaGCN object.

SpaceFlow

For SpaceFlow, the authors provided tutorials for single-cell spatial transcriptomics data [https://github.com/hongleir/SpaceFlow]. We followed their recommended parameters and obtained the context representation from the “embedding” attribute of the SpaceFlow object.

STAGATE

For STAGATE, the authors nicely provided multiple tutorials for different spatial data types, including Slide-seqV2, 10X Visium, stereo-seq, and STARmap datasets. Since our test datasets’ characteristics were most similar to STARmap dataset, we used the recommended steps in [https://stagate.readthedocs.io/en/latest/T9_STARmap.html]. We also tuned the “rad_cutoff” parameter of STAGATE for best performance (in practice, this parameter is best tuned so that the number of neighbors per cell on average is around 10).

Evaluation

We used three classifiers to evaluate the representation power of SpaGCN, SpaceFlow, STAGATE, and MENDER. The three classifiers were Linear SVM, RBF SVM, and Random Forest, chosen as standard representatives of linear, non-linear, and tree-based classifiers, respectively. We recorded the median classification accuracy across tenfold cross-validation. The classifier and cross-validation implementations were taken from Scikit-learn73.
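The evaluation protocol above can be sketched with Scikit-learn as follows. This is a sketch under assumptions rather than the exact benchmark code: the function name `evaluate_representation` is hypothetical, and all classifier hyperparameters are Scikit-learn defaults because the text does not specify them.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def evaluate_representation(X, y, n_folds=10, seed=0):
    """Median cross-validated accuracy of three classifiers.

    X : (n_cells, n_features) cell context representation
    y : (n_cells,) ground-truth spatial domain labels
    """
    classifiers = {
        "Linear SVM": SVC(kernel="linear"),
        "RBF SVM": SVC(kernel="rbf"),
        "Random Forest": RandomForestClassifier(random_state=seed),
    }
    scores = {}
    for name, clf in classifiers.items():
        # One accuracy value per fold (stratified folds for classifiers).
        fold_acc = cross_val_score(clf, X, y, cv=n_folds, scoring="accuracy")
        scores[name] = float(np.median(fold_acc))
    return scores
```

The same function can be called once per method, passing the representation produced by SpaGCN, SpaceFlow, STAGATE, or MENDER as `X` with identical labels `y`, so that only the representation differs between comparisons.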

Unsupervised learning settings

The unsupervised task in this study is multi-slice analysis for spatial domain identification. We benchmarked MENDER against other methods, including 4 spatial methods that were available for multi-slice analysis (STAGATE, BASS, CNC, and SOTIP), and 2 non-spatial methods (Louvain and Leiden). We used two evaluation metrics as done by previous benchmarks26.

Normalized Mutual Information (NMI) evaluates the accuracy of the predicted spatial domain labels against the ground truth; a higher NMI means better performance. NMI quantifies the similarity between two label assignments, denoted as P and T, of the same set of objects. With H(P) and H(T) denoting their entropies and MI(P,T) their mutual information, NMI is computed as:

$$\mathrm{NMI}=\frac{\mathrm{MI}(P,T)}{\sqrt{H(P)\,H(T)}} \tag{2}$$

Percentage of Abnormal Spots (PAS) can be used to evaluate the spatial continuity of predicted domain labels given the spatial coordinates, and low PAS means good performance. PAS was calculated as the percentage of cells whose spatial domain label differed from at least 6 of its 10 neighbors.
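Both metrics can be sketched in NumPy as follows. This is an illustrative sketch, not MENDER’s Compute_NMI or Compute_PAS: the function names `nmi` and `pas` are hypothetical, the NMI normalization by the geometric mean of the two entropies is an assumption, and PAS is returned as a fraction in [0, 1] (multiply by 100 for a percentage) using brute-force nearest-neighbor search.

```python
import numpy as np

def nmi(pred, truth):
    """Normalized mutual information, MI / sqrt(H(P) * H(T)) (assumed form)."""
    _, p_idx = np.unique(np.asarray(pred).ravel(), return_inverse=True)
    _, t_idx = np.unique(np.asarray(truth).ravel(), return_inverse=True)
    # Joint label distribution as a contingency table of probabilities.
    joint = np.zeros((p_idx.max() + 1, t_idx.max() + 1))
    np.add.at(joint, (p_idx, t_idx), 1)
    joint /= joint.sum()
    p, t = joint.sum(axis=1), joint.sum(axis=0)
    nz = joint > 0
    mi = (joint[nz] * np.log(joint[nz] / np.outer(p, t)[nz])).sum()
    h_p = -(p * np.log(p)).sum()
    h_t = -(t * np.log(t)).sum()
    denom = np.sqrt(h_p * h_t)
    # Degenerate case (a single label on either side): report 0 here.
    return float(mi / denom) if denom > 0 else 0.0

def pas(coords, labels, k=10, min_diff=6):
    """Fraction of cells whose label differs from >= min_diff of k neighbors.

    Requires more than k cells.
    """
    coords = np.asarray(coords, dtype=float)
    labels = np.asarray(labels)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)            # a cell is not its own neighbor
    neighbors = np.argsort(dist, axis=1)[:, :k]
    n_diff = (labels[neighbors] != labels[:, None]).sum(axis=1)
    return float((n_diff >= min_diff).mean())
```

Reporting both values side by side reflects the point made in the text: PAS captures only spatial continuity, whereas NMI compares directly against the ground truth.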

Given the complexity and heterogeneity inherent in spatial biological data, a single evaluation metric may not sufficiently capture the performance of spatial domain identification methods. While metrics like PAS, Local Inverse Simpson’s Index (LISI), and spatial chaos score (CHAOS) offer insights into the spatial continuity of the predicted domains25,26, higher spatial continuity does not necessarily mean better spatial domain prediction. These metrics must be interpreted carefully and in the context of other performance measures such as NMI or ARI, which directly compare the predicted labels against the ground truth. Considering these metrics together provides a more nuanced understanding of a method’s performance and helps to avoid the misinterpretations that may arise when they are considered in isolation.

Multi-slice analysis of STAGATE

We used the tutorial provided in [https://stagate.readthedocs.io/en/latest/AT1.html]. The difference is that the tutorial used 10X Visium datasets, which differ from our single-cell resolution datasets. We therefore pre-processed our datasets as done in [https://stagate.readthedocs.io/en/latest/T9_STARmap.html] and used the multi-slice code to construct the spatial graph for each slice separately. As before, we tuned the “rad_cutoff” parameter so that the average number of neighbors per cell is around 10, to get the best performance.

Multi-slice analysis of BASS

The authors of BASS provided code for multi-slice analysis on single-cell resolution spatial data [https://zhengli09.github.io/BASS-Analysis/STARmap.html], so we followed their steps directly.

Multi-slice analysis of SOTIP

The authors provided the code for multi-slice analysis in [https://github.com/TencentAILabHealthcare/SOTIP/tree/master/SOTIP_analysis/multi_sample]. We directly adopted the code for benchmark analysis.

Parameter setting

Please refer to Supplementary Fig. 9. We have also provided the corresponding reproducibility code.

Elaborating on the parameters, the ‘scale’ refers to the number of ranges employed during the construction of multi-range neighborhoods. For technologies of single-cell resolution, such as STARmap, BaristaSeq, MERFISH, MERSCOPE, Stereo-seq, osmFISH, ExSeq, and STARmapPlus, we have consistently set the ‘scale’ to 6.

The parameters ‘nn’ and ‘radius’ dictate the size of each range. ‘nn’ is used for spatial technologies with an array-like spot distribution, such as 10X Visium and ST. Specifically, ‘nn’ is set to 6 for 10X Visium, reflecting the six nearest neighbors of each spot, and 4 for ST, corresponding to the four nearest neighbors of each spot. On the other hand, for spatial technologies with non-array-like distributions, we set the ‘radius’ parameter to a consistent 15 µm. We also provide a function in the MENDER package called estimate_radius to recommend a suitable ‘radius’ for potential users.

Leiden clustering is employed twice, initially to define the ‘Cell group’ (Supplementary Fig. 14) and finally to obtain ‘MENDER spatial domains’ (Supplementary Fig. 14). For the initial clustering, we consistently apply Leiden with a resolution of 2. The final Leiden clustering requires one parameter, ‘k’. If ‘k’ is positive, MENDER’s ‘res_search’ function is executed to automatically ascertain the optimal Leiden resolution that will yield ‘k’ domains. This is particularly useful when the expected number of domains is known a priori, such as in our benchmark studies with ground truth domain annotations. If ‘k’ is negative, Leiden clustering is executed with a resolution equal to the absolute value of ‘k’. This is preferred when the number of domains is not provided as prior knowledge to the method user. In such scenarios, multiple different resolutions should be tested.
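The handling of ‘k’ described above can be sketched as follows. This is a hypothetical illustration and not the actual ‘res_search’ code: the function names, the bisection strategy, and the search bounds are assumptions, and `count_clusters` stands for any callable that runs Leiden at a given resolution and returns the number of resulting domains. Bisection further assumes that the number of domains does not decrease as the resolution grows, which usually holds approximately for Leiden but is not guaranteed.

```python
def search_resolution(count_clusters, target_k, low=0.0, high=5.0, max_iter=30):
    """Bisect the resolution until count_clusters(res) equals target_k.

    count_clusters : callable mapping a resolution to a number of clusters
                     (hypothetical wrapper around a Leiden run).
    Returns the last midpoint tried if no exact match is found.
    """
    mid = (low + high) / 2.0
    for _ in range(max_iter):
        mid = (low + high) / 2.0
        k = count_clusters(mid)
        if k == target_k:
            return mid
        if k < target_k:
            low = mid       # too few clusters: raise the resolution
        else:
            high = mid      # too many clusters: lower the resolution
    return mid

def resolve_resolution(k, count_clusters):
    """Positive k: search for k domains. Negative k: use |k| as resolution."""
    if k > 0:
        return search_resolution(count_clusters, int(k))
    return abs(k)
```

With this convention, a call with k = −0.5 runs Leiden once at resolution 0.5, whereas k = 7 triggers a search that may invoke Leiden several times.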

Per-slice performance comparison between multi-slice and single-slice analysis

Single-slice analysis is conducted using the MENDER.MENDER_single module, while multi-slice analysis is performed using the MENDER.MENDER module in the MENDER package. The parameter settings for both analyses are (scale = 6 | radius = 15 µm | k = #domains). The NMI is compared as follows (using the STARmap dataset (Fig. 3B) as an example). For single-slice analysis, the three slices of the dataset are analyzed independently, each with 10 replicates, resulting in 10 NMI values per slice. For multi-slice analysis, the dataset (three slices jointly) is used for joint spatial clustering, also for 10 replicates. Therefore, each slice has 10 versions of predicted domains and 10 NMI values. Hence, both multi-slice and single-slice analyses of the STARmap dataset yield 30 NMI values, as compared in Supplementary Fig. 8. The p-value is obtained using a one-sided Wilcoxon rank-sum test. The same approach was applied to the BaristaSeq and MERFISH datasets.
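The statistical comparison can be sketched with SciPy as follows. The function name `one_sided_ranksum` is hypothetical, and the direction of the alternative hypothesis (multi-slice NMI greater than single-slice NMI) is an assumption, since the text states only that the test is one-sided. The `alternative` argument of `scipy.stats.ranksums` requires SciPy 1.7 or later.

```python
from scipy.stats import ranksums

def one_sided_ranksum(multi_nmi, single_nmi):
    """One-sided Wilcoxon rank-sum p-value.

    multi_nmi, single_nmi : sequences of NMI values (e.g., 30 values each
    for the STARmap dataset: 3 slices x 10 replicates).
    Alternative hypothesis (assumed): multi-slice values tend to be larger.
    """
    _, p_value = ranksums(multi_nmi, single_nmi, alternative="greater")
    return float(p_value)
```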

Evaluation of patient representations

In our final application, we assessed how well three distinct patient-level representations separate patient groups. Patient-level annotations are available in the original manuscript11, with three distinct labels: cold, compartmentalized, and mixed. The three patient representations evaluated are all proportions of cell-level labels within each patient. The first two representations, termed CT-fine repr and CT-coarse repr, derive from the cell type labels within each patient, as annotated by the original paper, but with different cell type granularities (i.e., fine and coarse). The third representation, termed MENDER repr, is derived from the spatial domain labels identified by MENDER (default parameters: radius = 15 µm, scale = 6, k = −0.5). Each representation matrix has one row per patient and one column per unique label across the dataset.
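A proportion-based patient representation of this kind can be sketched as follows. The function name `patient_representation` is hypothetical, and the sketch assumes one row per patient, which is how we read “proportions of cell-level labels within each patient”. The labels can be fine or coarse cell types, or MENDER spatial domains.

```python
import numpy as np

def patient_representation(patient_ids, labels):
    """Per-patient proportions of cell-level labels.

    patient_ids : (n_cells,) patient of origin for each cell
    labels      : (n_cells,) cell-level label (cell type or spatial domain)
    Returns (patients, label_names, proportions), where proportions has
    shape (n_patients, n_labels) and every row sums to 1.
    """
    patients, p_idx = np.unique(np.asarray(patient_ids).ravel(),
                                return_inverse=True)
    label_names, l_idx = np.unique(np.asarray(labels).ravel(),
                                   return_inverse=True)
    # Count cells per (patient, label) pair.
    counts = np.zeros((len(patients), len(label_names)))
    np.add.at(counts, (p_idx, l_idx), 1)
    # Normalize each patient's counts to proportions.
    proportions = counts / counts.sum(axis=1, keepdims=True)
    return patients, label_names, proportions
```

The resulting matrix can then be passed to PCA or to a classifier, with one sample per patient.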

Unsupervised analysis (Fig. 6C–E) was performed by initially applying PCA to the three patient representations, followed by mapping all patients into the resultant three PCA spaces (the top two principal components). We used a two-sided Student’s t-test to assess differences between patient groups for each top principal component. PCA was conducted using Scanpy’s default settings, and the Student’s t-test was conducted using Scipy’s implementation.

The supervised analysis (Fig. 6F, G) consisted of two parts. The first part (Fig. 6F) performed supervised classification in the raw feature space of the three patient representations (CT-fine repr, CT-coarse repr, and MENDER repr). The second part (Fig. 6G) performed supervised classification in the PCA-reduced feature space (top 2 to 17 principal components) of the same representations. In the first part, we used a K-nearest-neighbor (KNN) classifier and reported classification accuracy (using the sklearn.metrics.accuracy_score implementation) under fivefold cross-validation. The results are shown in Fig. 6F (left), where the y-axis represents the KNN classification accuracy. The p-value was calculated using SciPy’s implementation of the Student’s t-test. The procedure for Fig. 6F (right) was the same, except that the KNN classifier was replaced with an SVM classifier. In the second part (Fig. 6G), the same KNN and SVM classifiers were applied to the PCA-reduced versions of the three representations. Specifically, for Fig. 6G (top), each representation was reduced to its top k principal components (k = 2 to 17), and for each k we reported the KNN classification accuracy under fivefold cross-validation. The SVM analysis (Fig. 6G, bottom) followed the same approach.
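A minimal sketch of this evaluation with scikit-learn is given below. Several choices are assumptions rather than details stated in the text: the classifiers use scikit-learn defaults, folds are stratified and shuffled with a fixed seed, and PCA is fitted inside each training fold via a pipeline. The helper `cv_accuracy` and the synthetic data are hypothetical.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC


def cv_accuracy(X, y, clf, n_pcs=None, seed=0):
    """Fivefold cross-validated accuracy, optionally on PCA-reduced features."""
    steps = []
    if n_pcs is not None:
        steps.append(PCA(n_components=n_pcs, random_state=seed))
    steps.append(clf)
    model = make_pipeline(*steps)
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    return cross_val_score(model, X, y, cv=cv, scoring="accuracy")


# Synthetic patient representations (hypothetical): 3 groups x 20 patients, 20 labels.
rng = np.random.default_rng(0)
n_labels, per_group = 20, 20
X_parts, y = [], []
for g, name in enumerate(["cold", "compartmentalized", "mixed"]):
    alpha = np.ones(n_labels)
    alpha[g * 5:(g + 1) * 5] = 10.0  # each group is enriched in a different label block
    X_parts.append(rng.dirichlet(alpha, size=per_group))
    y += [name] * per_group
X, y = np.vstack(X_parts), np.array(y)

# Part 1: raw feature space.
knn_raw = cv_accuracy(X, y, KNeighborsClassifier())
svm_raw = cv_accuracy(X, y, SVC())

# Part 2: PCA-reduced feature space, top 2 to 17 principal components.
knn_by_k = {k: cv_accuracy(X, y, KNeighborsClassifier(), n_pcs=k).mean() for k in range(2, 18)}
svm_by_k = {k: cv_accuracy(X, y, SVC(), n_pcs=k).mean() for k in range(2, 18)}
print(knn_raw.mean(), svm_raw.mean())
```

Fitting PCA inside the pipeline keeps the held-out fold out of the dimensionality reduction. If PCA is instead computed once on all patients, the reduced matrix can be passed directly with `n_pcs=None`.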

Annotations

All domain annotations were produced manually in previously published studies. The STARmap data was obtained from Ref. 34 and manually annotated by Xiang Zhou’s group23, based on the Allen reference map and the gene expression patterns of the prelimbic area. The data and annotations can be downloaded from https://github.com/zhengli09/BASS-Analysis/tree/master/data. The BaristaSeq data was manually annotated by the SpaceTx Consortium14, which comprises 13 labs from 11 universities and institutes. The primary aim of the Consortium is to provide data resources for benchmarking computational methods in the field of spatial transcriptomics. The data and annotations can be downloaded from https://spacetx.github.io. The MERFISH data was annotated in its original study42. Each slice was first clustered into small patches based on gene expression and spatial locations. These patches were then manually merged with reference to brain anatomical structures and known gene expression patterns. The data and annotations can be downloaded from https://cellxgene.cziscience.com/collections/31937775-0602-4e52-a799-b6acdd2bac2e.

Running time

For all computational experiments, we used the Python library “time” to record the running time. The reported duration for each method includes its data preprocessing step as well as the main body of the method. For deep-learning methods (STAGATE), running time depends strongly on the number of epochs, so we set the number of epochs to 500 as indicated in the tutorials. More epochs might improve accuracy at the cost of a longer running time.
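A minimal sketch of this timing scheme is shown below. The text names only the `time` library, so the use of `time.time()` is an assumption, and `preprocess` and `cluster` are hypothetical placeholders for a method's preprocessing step and main body.

```python
import time


def timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed wall-clock seconds)."""
    start = time.time()
    result = fn(*args, **kwargs)
    return result, time.time() - start


# Hypothetical placeholders for a method's two stages.
def preprocess(values):
    return [v / max(values) for v in values]


def cluster(values):
    return [int(v > 0.5) for v in values]


def run_method(values):
    # Preprocessing and the main body are timed together.
    return cluster(preprocess(values))


labels, seconds = timed(run_method, [1.0, 4.0, 2.0, 8.0])
print(labels, f"{seconds:.6f} s")
```

For short runs, `time.perf_counter()` offers a higher-resolution clock with the same calling pattern.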

Spatial signature analysis

MENDER is interpretable in that the representation vector it produces can be mapped to biological entities, such as cell states and distances to the index cell. This makes it possible to identify the spatial differences between one domain (that is, a group of cells clustered by the context representation) and another. To do this, we applied the Wilcoxon rank-sum test between two groups, selected the top 5 features with the highest scores for each domain (the score is computed by Scanpy’s gene-ranking implementation using both the p-value and expression levels), and plotted them as dot plots in Fig. 5K. In the dot plot, each row represents a feature of the MENDER cell context representation. Each feature encodes two levels of information: the cell state and the distance range from the index cell (each range is a 15 µm radius ring centered at the index cell). The dot plot thus displays the spatial signature of every spatial domain, showing which specific spatial organization of cells is associated with each domain.
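The ranking step can be sketched as follows. This is an illustration only: it uses `scipy.stats.ranksums` and ranks features by the test statistic, whereas the analysis above relies on Scanpy's implementation (for example `scanpy.tl.rank_genes_groups` with `method="wilcoxon"`). The helper `top_features`, the feature names of the form `state{s}_ring{r}`, and the synthetic counts are hypothetical.

```python
import numpy as np
from scipy.stats import ranksums


def top_features(X, domains, target, reference, names, n_top=5):
    """Rank context features by Wilcoxon rank-sum statistic (target vs reference)."""
    X = np.asarray(X, dtype=float)
    domains = np.asarray(domains)
    a = X[domains == target]
    b = X[domains == reference]
    stats = np.array([ranksums(a[:, j], b[:, j]).statistic for j in range(X.shape[1])])
    order = np.argsort(-stats)[:n_top]  # largest statistic first
    return [(names[j], float(stats[j])) for j in order]


# Hypothetical features: 3 cell states x 2 distance rings around the index cell.
names = [f"state{s}_ring{r}" for s in range(3) for r in range(2)]

# Synthetic neighbor counts: domain A is enriched in state0 within the first ring.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.poisson([10, 3, 3, 3, 3, 3], size=(50, 6)),
    rng.poisson([1, 3, 3, 3, 3, 3], size=(50, 6)),
])
domains = np.array(["A"] * 50 + ["B"] * 50)

top = top_features(X, domains, "A", "B", names, n_top=5)
for name, score in top:
    print(name, round(score, 2))
```

Because each feature name carries both a cell state and a distance ring, the top-ranked entries can be read directly as a spatial signature of the target domain.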

Computational resource

CPU: Intel(R) Xeon(R) Gold 5218R CPU @ 2.10 GHz 80 cores. Memory: 263724496 kB in total.

Statistics & reproducibility

No statistical method was used to predetermine sample size. No data were excluded from the analyses. Randomization was achieved by setting random seeds. The algorithm developer and the data analyst were the same person, so complete blinding was not possible.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.