Spatio-Temporal Self-Supervised Learning for Traffic Flow Prediction

Jiahao Ji¹, Jingyuan Wang^1,2,3, Chao Huang⁴, Junjie Wu³, Boren Xu¹, Zhenhe Wu¹, Junbo Zhang^5,6, Yu Zheng^5,6
Corresponding author: jywang@buaa.edu.cn

Abstract

Robust prediction of citywide traffic flows at different time periods plays a crucial role in intelligent transportation systems. While previous work has made great efforts to model spatio-temporal correlations, existing methods still suffer from two key limitations: $i$) Most models collectively predict all regions’ flows without accounting for spatial heterogeneity, i.e., different regions may have skewed traffic flow distributions. $ii$) These models fail to capture the temporal heterogeneity induced by time-varying traffic patterns, as they typically model temporal correlations with a shared parameterized space for all time periods. To tackle these challenges, we propose a novel Spatio-Temporal Self-Supervised Learning (ST-SSL¹) traffic prediction framework which enhances the traffic pattern representations to be reflective of both spatial and temporal heterogeneity, with auxiliary self-supervised learning paradigms. Specifically, our ST-SSL is built over an integrated module with temporal and spatial convolutions for encoding the information across space and time. To achieve the adaptive spatio-temporal self-supervised learning, our ST-SSL first performs the adaptive augmentation over the traffic flow graph data at both attribute- and structure-levels. On top of the augmented traffic graph, two SSL auxiliary tasks are constructed to supplement the main traffic prediction task with spatial and temporal heterogeneity-aware augmentation. Experiments on four benchmark datasets demonstrate that ST-SSL consistently outperforms various state-of-the-art baselines. Since spatio-temporal heterogeneity widely exists in practical datasets, the proposed framework may also cast light on other spatial-temporal applications. Model implementation is available at https://github.com/Echo-Ji/ST-SSL.

¹ The paper was done when Jiahao Ji was an intern at JD Intelligent Cities Research under the supervision of Junbo Zhang (msjunbozhang@outlook.com).

1 Introduction

Robust traffic flow prediction across different spatial regions at different time periods is crucial for advancing intelligent transportation systems (Zhang et al. 2020). For example, accurate traffic prediction results can not only enable effective traffic controls in a timely manner, but also mitigate tragedies caused by the sudden traffic flow spike. In general, traffic prediction aims to forecast the traffic volume (e.g., inflow and outflow of each region at a given time), from past traffic observations. Recent advances have significantly boosted the research of traffic flow prediction with various deep learning techniques, e.g., convolutional neural networks over region grids (Zhang, Zheng, and Qi 2017), graph neural networks for spatial dependency modeling (Zhang et al. 2021), and attention mechanism for spatial information aggregation (Zheng et al. 2020). Although significant efforts have been made to improve the traffic flow prediction results, existing models still face two key shortcomings.

Figure 1: Illustration of our motivation, i.e., the spatial and temporal heterogeneity of traffic flow data.

The first limitation is the lack of modeling of the spatial heterogeneity exhibited by skewed traffic distributions across different regions. Taking Fig. 1(a) as an example, A and B are two real-world regions in Beijing with different urban functions, namely a residential area and a transportation hub. Fig. 1(b) shows that their traffic flow distributions differ considerably. However, most existing models ignore such spatial heterogeneity and are easily biased towards popular regions with higher traffic volume, which makes them insufficient for learning quality citywide traffic pattern representations. While some studies attempt to capture the heterogeneous flow distributions with multiple parameter sets over different regions (Pan et al. 2019b; Bai et al. 2020), the large number of parameters involved may lead to suboptimal solutions on skew-distributed traffic data. Worse still, the high computational and memory cost of these methods makes them infeasible for handling large-scale traffic data in practical urban scenarios. In addition, meta-learning has been used in recent approaches (Pan et al. 2019a; Ye et al. 2022) to account for the differences among region traffic distributions. However, the effectiveness of those models largely relies on handcrafted region spatial characteristics, e.g., nearby points of interest and density of road networks, which limits the generalization ability of the learned representations.

Furthermore, current traffic prediction methods model the temporal dynamics with a shared parameter space for all time periods, which can hardly preserve the temporal heterogeneity in the latent embedding space. In real-life scenarios, traffic patterns of different regions vary over time, e.g., from morning to evening, which results in the temporal heterogeneity shown in Fig. 1(c). Nevertheless, the parameter space differentiation strategy adopted in (Song et al. 2020; Li and Zhu 2021) assumes that the temporal heterogeneity is static across all time periods, which does not always hold, e.g., evening traffic patterns can differ significantly between workdays and holidays, as shown in Fig. 1(c).

To effectively model both spatial and temporal heterogeneity, we present a novel Spatio-Temporal Self-Supervised Learning framework for predicting traffic flow. To encode spatial-temporal traffic patterns, our ST-SSL is built over a graph neural network which integrates temporal and spatial convolutions for information aggregation. To capture the spatial heterogeneity, we design a spatial self-supervised learning paradigm to augment the traffic flow graph at both data-level and structure-level, which is adaptive to the heterogeneous region traffic distributions. Then, the auxiliary self-supervision with a soft clustering paradigm is introduced to be aware of the diverse spatial patterns among different regions. To inject the temporal heterogeneity into our latent representation space, we empower ST-SSL to maintain dedicated representations of temporal traffic dynamics with temporal self-supervised learning paradigm. We summarize the key contributions of this work as follows:

• To the best of our knowledge, we are the first to propose a self-supervised learning framework that models spatial and temporal heterogeneity in traffic flow prediction. This paradigm may shed light on other practical spatio-temporal applications, such as air quality prediction.
• We propose an adaptive heterogeneity-aware data augmentation scheme over the spatial-temporal graph, which is robust against noise perturbation.
• Two self-supervised learning tasks are incorporated to supplement the main traffic prediction task by enforcing the model discrimination ability with the awareness of both spatial and temporal traffic heterogeneity.
• Extensive experiments are conducted on four real-world public datasets to show the consistent performance superiority achieved by our ST-SSL across various settings.

2 Preliminaries

Definition 1 (Spatial Region).

We partition a city into $N=I\times J$ disjoint geographical grids, in which each grid is considered as a spatial region $r_{n}(1\leq n\leq N)$ . We use $\mathcal{V}=\{r_{1},\dots,r_{N}\}$ to denote the spatial region set in a city.


Definition 2 (Traffic Flow Graph (TFG)).

A traffic flow graph is defined as $\mathcal{G}=\left(\mathcal{V},\mathcal{E},\bm{A},\mathcal{X}_{t-T:t}\right)$, where $\mathcal{V}$ is the set of spatial regions (nodes) of size $|\mathcal{V}|=N$, and $\mathcal{E}$ is the set of edges, each connecting two spatially adjacent regions in $\mathcal{V}$. The adjacency matrix of our traffic flow graph is denoted as $\bm{A}\in\mathbb{R}^{N\times N}$. We represent the citywide traffic inflow and outflow data over the previous $T$ time steps with a traffic tensor $\mathcal{X}_{t-T:t}=\left(\bm{X}_{t-T},\dots,\bm{X}_{t}\right)\in\mathbb{R}^{T\times N\times 2}$. The traffic volume information of all regions $\mathcal{V}$ at the $t$-th time slot is denoted as $\bm{X}_{t}\in\mathbb{R}^{N\times 2}$.

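As a rough illustration of Definitions 1 and 2, the sketch below builds the adjacency matrix $\bm{A}$ for an $I\times J$ grid of regions. The function name `grid_adjacency` and the choice of an 8-neighbourhood (cells that touch, including diagonally) are assumptions made for this example; the text only states that edges connect spatially adjacent regions.

```python
def grid_adjacency(I, J):
    """Return the N x N adjacency matrix (N = I * J) of an I x J region grid.

    Assumption: two regions are adjacent when their grid cells touch,
    including diagonally (8-neighbourhood).
    """
    N = I * J
    A = [[0] * N for _ in range(N)]
    for i in range(I):
        for j in range(J):
            n = i * J + j  # row-major index of region r_n
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    if di == 0 and dj == 0:
                        continue  # no self-loop
                    a, b = i + di, j + dj
                    if 0 <= a < I and 0 <= b < J:
                        A[n][a * J + b] = 1
    return A


A = grid_adjacency(2, 3)  # 6 regions; the corner region 0 has 3 neighbours
```

With this indexing, the traffic tensor $\mathcal{X}_{t-T:t}$ would be stored as $T$ slices, each holding an `[inflow, outflow]` pair for every one of the $N$ regions.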

Problem Statement. Given the historical traffic flow graph $\mathcal{G}$ till the current time step, we aim to learn a predictive function which accurately estimates the traffic volume of all regions at the future time step $t+1$ , i.e., $\bm{X}_{t+1}\in\mathbb{R}^{N\times 2}$ .

3 Methodology

This section elaborates on the technical details of our ST-SSL model with the overall architecture shown in Fig. 2.

3.1 Spatio-Temporal Encoder

We first propose a spatio-temporal (ST) encoder to preserve the ST contextual information over the traffic flow graph, so as to jointly model the sequential patterns of traffic data across different time steps and the geographical correlations among spatial regions. Towards this end, we integrate the temporal convolutional component with the graph convolutional propagation network as the backbone for spatial-temporal relational representation.

For encoding the temporal traffic patterns, we adopt the 1-D causal convolution along the time dimension with a gated mechanism (Yu, Yin, and Zhu 2018). Specifically, our temporal convolution (TC) takes the traffic flow tensor as the input and outputs a time-aware embedding for each region:

$$\left(\bm{B}_{t-T_{out}},\dots,\bm{B}_{t}\right)=\mathrm{TC}\left(\bm{X}_{t-T},\dots,\bm{X}_{t}\right), \tag{1}$$

where $\bm{B}_{t}\in\mathbb{R}^{N\times D}$ denotes the region embedding matrix at the time step $t$ . The $n$ -th row $\bm{b}_{t,n}\in\mathbb{R}^{D}$ corresponds to the embedding of region $r_{n}$ . Here, $D$ denotes the embedding dimensionality. $T_{out}$ is the length of the output embedding sequence after convolutional operations in TC encoder.
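To make the gated causal convolution of Eq. (1) more concrete, here is a minimal single-region sketch. It assumes the gate is a GLU of the form $P\odot\sigma(Q)$, which is one common reading of the gated mechanism cited above; the function name `gated_temporal_conv`, the kernel layout `[K][C_in][D]`, and the absence of bias terms are illustrative choices, not details taken from the paper.

```python
import math


def gated_temporal_conv(x, w_p, w_q):
    """Gated 1-D causal convolution for one region (illustrative sketch).

    x:   list of T input vectors, each with C_in channels (e.g. inflow, outflow)
    w_p: kernel of shape [K][C_in][D] producing the candidate values P
    w_q: kernel of shape [K][C_in][D] producing the gate pre-activations Q
    Returns T - K + 1 vectors of size D computed as P * sigmoid(Q).
    """
    K, D = len(w_p), len(w_p[0][0])
    out = []
    for t in range(K - 1, len(x)):  # output at t only reads x[t-K+1 .. t]
        p = [0.0] * D
        q = [0.0] * D
        for k in range(K):
            x_k = x[t - K + 1 + k]
            for c, value in enumerate(x_k):
                for d in range(D):
                    p[d] += value * w_p[k][c][d]
                    q[d] += value * w_q[k][c][d]
        out.append([p[d] / (1.0 + math.exp(-q[d])) for d in range(D)])
    return out
```

Because no padding is used, each such layer shortens the sequence by $K-1$ steps, which is consistent with the output length $T_{out}$ being smaller than the input length.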

For capturing the region-wise spatial correlations, we design our spatial convolution (SC) encoder based on a graph-based message passing mechanism:

$$\bm{E}_{t}=\mathrm{SC}\left(\bm{B}_{t},\bm{A}\right). \tag{2}$$

$\bm{A}$ is the region adjacency matrix of $\mathcal{G}$ . After our SC encoder, we can obtain the refined embeddings $\left(\bm{E}_{t-T_{out}},\dots,\bm{E}_{t}\right)$ of all regions by injecting the geographical context.
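The exact form of SC is not given in this section, so the following is only a sketch of one simple message passing rule that fits Eq. (2): each region averages its own embedding with those of its neighbours. The mean aggregation, the added self-loop, and the omission of learnable weights are all assumptions made for illustration.

```python
def spatial_conv(B, A):
    """One round of mean message passing over the region graph (sketch).

    B: N x D region embeddings at one time step
    A: N x N 0/1 adjacency matrix
    Each region averages its own embedding with those of its neighbours.
    """
    N, D = len(B), len(B[0])
    E = []
    for m in range(N):
        nbrs = [n for n in range(N) if A[m][n] or n == m]  # self-loop included
        E.append([sum(B[n][d] for n in nbrs) / len(nbrs) for d in range(D)])
    return E


# Path graph 0 - 1 - 2 with one-dimensional embeddings.
E = spatial_conv([[1.0], [2.0], [6.0]], [[0, 1, 0], [1, 0, 1], [0, 1, 0]])
```

A trained model would typically also apply a learnable linear map and a nonlinearity after the aggregation.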

Our ST encoder is built with a “sandwich” block structure, in which each individual block is TC $\to$ SC $\to$ TC. By stacking multiple blocks, we obtain a sequence of embedding matrices $\left(\bm{H}_{t-T^{\prime}},\dots,\bm{H}_{t}\right)$ with temporal dimension $T^{\prime}$ after several convolutions. After ST encoder-based embedding propagation and aggregation, the temporal dimension $T^{\prime}$ reduces to zero, i.e., the sequence collapses to a single matrix, and we generate the final embedding matrix $\bm{H}\in\mathbb{R}^{N\times D}$ for our ST encoder, in which each row $\bm{h}_{n}\in\mathbb{R}^{D}$ denotes the final embedding of region $r_{n}$.

In the following subsections, we perform the adaptive augmentation over the output $\left(\bm{B}_{t-T},\dots,\bm{B}_{t}\right)$ of the first TC encoder layer (Sec 3.2), and self-supervised learning with the spatial-temporal heterogeneity modeling based on the final region embedding matrix $\bm{H}$ (Sec 3.3-Sec 3.4).

3.2 Adaptive Graph Augmentation on TFG

We devise two phases of graph augmentation on the TFG $\mathcal{G}=\left(\mathcal{V},\mathcal{E},\bm{A},\mathcal{X}_{t-T:t}\right)$, namely traffic-level data augmentation and graph topology-level structure augmentation, which are adaptive to the learned heterogeneity-aware region dependencies in terms of their traffic regularities.

Region-wise Heterogeneity Measurement.

For a region $r_{n}$, its embedding sequence $(\bm{b}_{t-T,n},\dots,\bm{b}_{t,n})$ within $T$ time steps, taken from the rows of $\left(\bm{B}_{t-T},\dots,\bm{B}_{t}\right)$, is used to generate an overall embedding as:

$$\bm{u}_{n}=\sum_{\tau=t-T}^{t}p_{\tau,n}\cdot\bm{b}_{\tau,n},\quad\text{where}\quad p_{\tau,n}=\bm{b}_{\tau,n}^{\top}\cdot\bm{w}_{0}. \tag{3}$$

$\bm{u}_{n}$ is the aggregated representation of region $r_{n}$’s embedding sequence across different time steps, based on the derived aggregation weight $p_{\tau,n}$. Here, $\tau$ indexes the time steps from $t-T$ to $t$. The aggregation weight $p_{\tau,n}$ reflects the relevance between the time step-specific traffic pattern ($\bm{b}_{\tau,n}$) and the overall traffic transitional regularities ($\bm{u}_{n}$). $\bm{b}_{\tau,n}$ is region $r_{n}$’s embedding at time step $\tau$ and $\bm{w}_{0}\in\mathbb{R}^{D}$ is a learnable parameter vector for transformation.

In our ST-SSL model, we propose to estimate the heterogeneity degree between two regions, to be reflective of their traffic distribution difference over time, as follows:

$$q_{m,n}=\frac{\bm{u}_{m}^{\top}\bm{u}_{n}}{\|\bm{u}_{m}\|\|\bm{u}_{n}\|}. \tag{4}$$

Note that a larger $q_{m,n}$ score indicates stronger traffic pattern dependencies between regions $r_{m}$ and $r_{n}$, and thus a lower heterogeneity degree.
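Eqs. (3) and (4) translate almost directly into code. The sketch below is a plain-Python rendering for illustration; the helper names `aggregate_region` and `similarity` are hypothetical, and $\bm{w}_{0}$ is passed in as a fixed vector rather than learned.

```python
import math


def aggregate_region(b_seq, w0):
    """Eq. (3): weights p_tau = b_tau . w0 and summary u = sum_tau p_tau * b_tau.

    b_seq: list of embeddings b_{tau,n} of one region over time (each of size D)
    w0:    parameter vector of size D (learnable in the model, fixed here)
    Returns (u_n, [p_{tau,n} for each tau]).
    """
    p = [sum(bi * wi for bi, wi in zip(b, w0)) for b in b_seq]
    u = [sum(p_t * b[d] for p_t, b in zip(p, b_seq)) for d in range(len(w0))]
    return u, p


def similarity(u_m, u_n):
    """Eq. (4): cosine similarity q_{m,n}; larger means lower heterogeneity."""
    dot = sum(a * b for a, b in zip(u_m, u_n))
    norm_m = math.sqrt(sum(a * a for a in u_m))
    norm_n = math.sqrt(sum(b * b for b in u_n))
    return dot / (norm_m * norm_n)


u, p = aggregate_region([[1.0, 0.0], [0.0, 2.0]], [1.0, 1.0])
# p holds one relevance weight per time step; u is the weighted sum of embeddings
```

Note that $q_{m,n}$ lies in $[-1, 1]$ and $p_{\tau,n}$ is unbounded, so using either as a Bernoulli parameter presupposes some rescaling that this section does not spell out.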

Heterogeneity-guided Data Augmentation.

In our ST-SSL, we propose to perform data augmentation from both the traffic-level and graph topology-level elaborated below:

Traffic-level Augmentation. Inspired by the data augmentation strategy in (Zhu et al. 2021), we design an augmentation operator over the constructed traffic tensor $\mathcal{X}_{t-T:t}$, which is adaptive to the learned time-aware traffic pattern dependencies of each region. In particular, we aim to mask the less relevant traffic volume of region $r_{n}$ at the $\tau$-th time step against noise perturbation, based on a derived mask probability ${\rho}_{\tau,n}$ drawn from a Bernoulli distribution, i.e., ${\rho}_{\tau,n}\sim\mathrm{Bern}(1-p_{\tau,n})$. A higher ${\rho}_{\tau,n}$ value indicates that region $r_{n}$’s traffic volume $\bm{x}_{\tau,n}$ at the $\tau$-th time step is more likely to be masked, due to its lower relevance to the overall traffic regularities of region $r_{n}$. The data after traffic-level augmentation is denoted as $\tilde{\mathcal{X}}_{t-T:t}$.
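A minimal sketch of the traffic-level masking for a single region follows. Two things here are assumptions rather than details from the text: masking is implemented as zeroing the `[inflow, outflow]` pair, and the relevance weight $p_{\tau,n}$ is clipped to $[0, 1]$ so that it can serve as a Bernoulli parameter. The name `mask_traffic` is hypothetical.

```python
import random


def mask_traffic(x_seq, p, rng):
    """Mask time steps of one region with rho_tau ~ Bern(1 - p_tau) (sketch).

    x_seq: list of T vectors [inflow, outflow] for one region
    p:     relevance weights p_{tau,n}, one per time step
    rng:   a random.Random instance, so that runs are reproducible
    Assumptions: masked entries are set to zero; p is clipped to [0, 1].
    """
    augmented = []
    for x, p_t in zip(x_seq, p):
        keep_prob = min(1.0, max(0.0, p_t))
        masked = rng.random() < 1.0 - keep_prob  # rho = 1 with prob. 1 - p
        augmented.append([0.0] * len(x) if masked else list(x))
    return augmented


x_aug = mask_traffic([[5.0, 3.0], [7.0, 2.0]], [1.0, 0.0], random.Random(0))
# fully relevant step is kept, fully irrelevant step is masked
```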

Graph Topology-level Augmentation. In addition to the traffic-level augmentation, we propose to further perform the topology-level augmentation over the region traffic flow graph $\mathcal{G}$. By doing so, ST-SSL can not only debias the region connections with low inter-correlated traffic patterns, but also capture the long-range region dependencies with the global urban context. Towards this end, $i$) given two spatially adjacent regions $r_{m}$ and $r_{n}$, their connection edge $(r_{m},r_{n})\in\mathcal{E}$ will be masked if they are not highly dependent in terms of their traffic regularities, i.e., if their heterogeneity degree is high (a low score $q_{m,n}$). The mask probability ${\rho}_{m,n}$ is drawn from a Bernoulli distribution, i.e., ${\rho}_{m,n}\sim\mathrm{Bern}(1-q_{m,n})$. $ii$) Given two non-adjacent regions, a low heterogeneity degree (a high score $q_{m,n}$) will result in adding an edge between $r_{m}$ and $r_{n}$, with the adding probability similarly drawn from a Bernoulli distribution, $\mathrm{Bern}(q_{m,n})$.
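The two topology rules can be sketched as a single pass over region pairs. As before, this is an illustration under stated assumptions: the graph is treated as undirected, the similarity score $q_{m,n}$ is clipped to $[0, 1]$ before being used as a probability, and the function name `augment_topology` is hypothetical.

```python
import random


def augment_topology(A, q, rng):
    """Drop or add edges according to the similarity scores q (sketch).

    A:   N x N 0/1 adjacency matrix (treated as undirected)
    q:   N x N similarity scores from Eq. (4)
    rng: a random.Random instance
    Existing edges are dropped with probability 1 - q; missing edges are
    added with probability q. Assumption: q is clipped to [0, 1].
    """
    N = len(A)
    A_new = [row[:] for row in A]
    for m in range(N):
        for n in range(m + 1, N):  # visit each unordered pair once
            s = min(1.0, max(0.0, q[m][n]))
            if A[m][n]:
                value = 0 if rng.random() < 1.0 - s else 1  # Bern(1 - q) mask
            else:
                value = 1 if rng.random() < s else 0  # Bern(q) addition
            A_new[m][n] = A_new[n][m] = value
    return A_new
```

In the extreme cases, an existing edge with $q_{m,n}=0$ is always removed and a missing edge with $q_{m,n}=1$ is always added, which matches the intent of rules $i$) and $ii$).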

After two augmentation phases, we obtain the augmented TFG $\tilde{\mathcal{G}}=\left(\mathcal{V},\tilde{\mathcal{E}},\tilde{\bm{A}},\tilde{\mathcal{X}}_{t-T:t}\right)$ , with the debiased traffic volume input $\tilde{\mathcal{X}}_{t-T:t}$ (traffic-level augmentation) and structure denoising $\tilde{\mathcal{E}},\tilde{\bm{A}}$ (graph topology-level augmentation).

3.3 SSL for Spatial Heterogeneity Modeling

Given the heterogeneity-aware augmented TFG, we aim to enable the region embeddings to effectively preserve the spatial heterogeneity with auxiliary self-supervised signals.

To achieve this goal, we design a soft clustering-based self-supervised learning (SSL) task over regions, to map them into multiple latent representation spaces corresponding to diverse urban region functionalities (e.g., residential zone, shopping mall, transportation hub). Specifically, we generate $K$ cluster embeddings $\{\bm{c}_{1},\dots,\bm{c}_{K}\}$ (indexed by $k$ ) as latent factors for region clustering. Formally, the clustering process is performed with $\tilde{z}_{n,k}=\bm{c}_{k}^{\top}\tilde{\bm{h}}_{n}$ . Here, $\tilde{\bm{h}}_{n}\in\mathbb{R}^{D}$ is the region embedding of region $r_{n}$ encoded from the augmented TFG $\tilde{\mathcal{G}}$ . $\tilde{z}_{n,k}$ represents the estimated relevance score between region $r_{n}$ ’s embedding and the embedding $\bm{c}_{k}$ of the $k$ -th cluster. Afterwards, the cluster assignment of region $r_{n}$ is generated with $\tilde{\bm{z}}_{n}=(\tilde{z}_{n,1},\dots,\tilde{z}_{n,K})^{\top}$ .

To provide self-supervised signals based on the heterogeneity-aware soft clustering paradigm for augmentation, the auxiliary learning task is designed to predict the cluster assignment using the region embedding $\bm{h}_{n}$ encoded from the original TFG $\mathcal{G}$ as $\hat{z}_{n,k}=\bm{c}_{k}^{\top}\bm{h}_{n}$, where $\hat{z}_{n,k}$ is the predicted assignment score for $\tilde{z}_{n,k}$. The self-supervised augmented task is optimized as follows:

$$\ell({\bm{h}}_{n},\tilde{\bm{z}}_{n})=-\sum_{k}\tilde{z}_{n,k}\log\frac{\exp\left(\hat{z}_{n,k}/\gamma\right)}{\sum_{j}\exp\left(\hat{z}_{n,j}/\gamma\right)}, \tag{5}$$

where $\gamma$ is the temperature parameter that controls the smoothness of the softmax output. The overall self-supervised objective over all regions is defined as follows:

$$\mathcal{L}_{s}=\sum_{n=1}^{N}\ell({\bm{h}}_{n},\tilde{\bm{z}}_{n}). \tag{6}$$
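Eqs. (5) and (6) amount to a cross-entropy between the target assignment $\tilde{\bm{z}}_{n}$ and a temperature-scaled softmax over the predicted scores $\hat{z}_{n,k}$. The sketch below computes it in plain Python with a numerically stable log-softmax; the function name `spatial_ssl_loss` and the default `gamma=0.5` are illustrative assumptions, not values from the paper.

```python
import math


def spatial_ssl_loss(H, Z_tilde, C, gamma=0.5):
    """Eqs. (5)-(6): soft-clustering cross-entropy summed over regions (sketch).

    H:       N x D embeddings h_n from the original graph
    Z_tilde: N x K target assignments from the augmented graph
    C:       K x D cluster embeddings c_k
    gamma:   softmax temperature (the default is an arbitrary example value)
    """
    total = 0.0
    for h, z_target in zip(H, Z_tilde):
        # predicted scores z_hat_{n,k} = c_k . h_n, scaled by the temperature
        scores = [sum(ci * hi for ci, hi in zip(c, h)) / gamma for c in C]
        top = max(scores)
        log_norm = top + math.log(sum(math.exp(s - top) for s in scores))
        total -= sum(zt * (s - log_norm) for zt, s in zip(z_target, scores))
    return total
```

When all cluster embeddings are zero the softmax is uniform, so each region with a one-hot target contributes $\log K$ to the loss.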

By supervising ${\bm{h}}_{n}$ with the heterogeneity-aware region cluster assignment $\tilde{\bm{z}}_{n}$, we make the region embedding ${\bm{h}}_{n}$ reflect the spatial heterogeneity within the global urban space.

Distribution Regularization for Region Clustering. In our heterogeneity-aware region clustering paradigm, we generate the cluster assignment matrix $\tilde{\bm{Z}}=(\tilde{\bm{z}}_{1},\dots,\tilde{\bm{z}}_{N})^{\top}\in\mathbb{R}^{N\times K}$ as self-supervised signals for generative data augmentation. However, two issues need to be addressed to fit the true distribution of regional characteristics in urban space: $i$) Since $\tilde{\bm{Z}}$ is produced by matrix multiplication, there is no guarantee that each region’s cluster assignment sums to 1, i.e., that $\tilde{\bm{Z}}\bm{1}_{K}=\bm{1}_{N}$, where $\bm{1}_{N}$ denotes an $N$-dimensional vector of all ones. $ii$) To avoid the trivial solution in which every region has the same assignment, we employ the principle of maximum entropy, i.e., $\tilde{\bm{Z}}^{\top}\bm{1}_{N}=\frac{N}{K}\bm{1}_{K}$. This encourages all regions to be equally partitioned among the clusters. To tackle these two issues, we define a feasible solution set as:

$$\tilde{\mathcal{Z}}=\left\{\tilde{\bm{Z}}\in\mathbb{R}_{+}^{N\times K}\middle|\tilde{\bm{Z}}\bm{1}_{K}=\bm{1}_{N},\tilde{\bm{Z}}^{\top}\bm{1}_{N}=\frac{N}{K}\bm{1}_{K}\right\}. \tag{7}$$

For any assignment $\tilde{\bm{Z}}\in\tilde{\mathcal{Z}}$, we can use it to map the embedding matrix $\tilde{\bm{H}}=(\tilde{\bm{h}}_{1},\dots,\tilde{\bm{h}}_{N})^{\top}\in\mathbb{R}^{N\times D}$ into the cluster matrix $\bm{C}=(\bm{c}_{1},\dots,\bm{c}_{K})^{\top}\in\mathbb{R}^{K\times D}$. Thus, we search for the optimal solution by maximizing the similarity between the embeddings and the clusters, i.e.,

$$\max_{\tilde{\bm{Z}}\in\tilde{\mathcal{Z}}}\mathrm{tr}\left(\tilde{\bm{Z}}\bm{C}\tilde{\bm{H}}^{\top}\right)+\epsilon H(\tilde{\bm{Z}}), \tag{8}$$

where $\mathrm{tr}(\cdot)$ is the trace operator that sums elements on the main diagonal of a square matrix, $H(\tilde{\bm{Z}})$ is the entropy function defined as $-\sum_{n,k}\tilde{z}_{n,k}\log\tilde{z}_{n,k}$ , and $\epsilon$ is a parameter that controls the smoothness of the assignment. Finally, the original assignment in Eq. (6) is replaced with the optimal solution. Refer to the Appendix for the solution procedure.
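The Appendix is not part of this section, so the authors' solution procedure is not reproduced here. Problems of the form of Eq. (8), a linear objective plus an entropy term under row and column sum constraints, are commonly solved with Sinkhorn-Knopp iterations, and the sketch below shows that standard approach as an assumption about what such a solver could look like. The name `sinkhorn_assign`, the defaults for `eps` and `n_iters`, and the input `S` (with `S[n][k]` equal to $\bm{c}_{k}^{\top}\tilde{\bm{h}}_{n}$) are all illustrative.

```python
import math


def sinkhorn_assign(S, eps=0.5, n_iters=50):
    """Approximate the maximizer of Eq. (8) by Sinkhorn-Knopp scaling (sketch).

    S:       N x K score matrix with S[n][k] = c_k . h_n
    eps:     entropy weight epsilon (example default)
    n_iters: number of alternating normalization rounds (example default)
    Returns Z (N x K) with rows summing to 1 and columns close to N / K.
    Note: very small eps can underflow exp(); this sketch does not guard it.
    """
    N, K = len(S), len(S[0])
    s_max = max(max(row) for row in S)
    Z = [[math.exp((s - s_max) / eps) for s in row] for row in S]
    for _ in range(n_iters):
        for k in range(K):  # rescale each column to sum to N / K
            col = sum(Z[n][k] for n in range(N))
            for n in range(N):
                Z[n][k] *= (N / K) / col
        for n in range(N):  # rescale each row to sum to 1
            row = sum(Z[n])
            Z[n] = [z / row for z in Z[n]]
    return Z
```

A larger `eps` yields smoother, more uniform assignments, while a smaller one pushes each row towards a one-hot assignment, which mirrors the role of $\epsilon$ described above.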

3.4 SSL for Temporal Heterogeneity Modeling

In this component, we further design a self-supervised learning (SSL) task to inject the temporal heterogeneity into time-aware region embeddings, by enforcing the divergence among time step-specific traffic pattern representations.

Specifically, we first fuse the encoded time-aware region embeddings from both the original and augmented TFGs:

$$\bm{v}_{t,n}=\bm{w}_{1}\odot\bm{h}_{t,n}+\bm{w}_{2}\odot\tilde{\bm{h}}_{t,n}, \tag{9}$$

where $\odot$ is the element-wise product and $\bm{w}_{1},\bm{w}_{2}$ are learnable parameters. After that, we generate the city-level representation $\bm{s}_{t}$ at the time step $t$ by aggregating the embeddings of all regions ($\sigma$ is the sigmoid function):

$$\bm{s}_{t}=\sigma\left(\frac{1}{N}\sum_{n=1}^{N}\bm{v}_{t,n}\right). \tag{10}$$
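Eqs. (9) and (10) can be sketched as a fusion step followed by a mean readout. In this illustration the vectors $\bm{w}_{1}$ and $\bm{w}_{2}$ are passed in as fixed lists rather than learned, and the function name `city_representation` is hypothetical.

```python
import math


def city_representation(H_t, H_t_aug, w1, w2):
    """Eqs. (9)-(10): fuse two views per region, then pool to a city vector.

    H_t, H_t_aug: N x D region embeddings from the original / augmented TFG
    w1, w2:       length-D weight vectors (learnable in the model, fixed here)
    Returns (V, s): fused embeddings V (N x D) and city-level vector s (D).
    """
    V = [[a * h + b * g for a, b, h, g in zip(w1, w2, h_n, g_n)]
         for h_n, g_n in zip(H_t, H_t_aug)]
    N, D = len(V), len(w1)
    s = [1.0 / (1.0 + math.exp(-sum(v[d] for v in V) / N)) for d in range(D)]
    return V, s


H = [[1.0, -1.0], [-1.0, 1.0]]
V, s = city_representation(H, H, [0.5, 0.5], [0.5, 0.5])
# the two regions cancel in the mean, so the sigmoid readout sits at its midpoint
```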

To enhance the representation discrimination ability among different time steps, we treat the region-level and city-level embeddings ($\bm{v}_{t,n},\bm{s}_{t}$) from the same time step as the positive pairs in our SSL task, and the embeddings from different time steps as negative pairs. With this design, the auxiliary supervision of positive pairs will encourage the consistency of time-specific citywide traffic trends (e.g., rush hours, weather factors), while the negative pairs help in capturing the temporal heterogeneity across different time steps. Formally, the temporal heterogeneity-enhanced SSL task is optimized with the following cross-entropy loss:

$$\mathcal{L}_{t}=-\left(\sum_{n=1}^{N}\log g\left(\bm{v}_{t,n},\bm{s}_{t}\right)+\sum_{n=1}^{N}\log\left(1-g\left(\bm{v}_{t^{\prime},n},\bm{s}_{t}\right)\right)\right), \tag{11}$$

where $t$ and $t^{\prime}$ denote two different time steps. $g$ is a criterion function defined as $g\left(\bm{v}_{t,n},\bm{s}_{t}\right)=\sigma\left(\bm{v}_{t,n}^{\top}\bm{W}_{3}\bm{s}_{t}\right)$. Since $\bm{v}_{t,n}$ and $\bm{s}_{t}$ are both $D$-dimensional, $\bm{W}_{3}\in\mathbb{R}^{D\times D}$ is the learnable transformation matrix.
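The loss in Eq. (11) can be sketched as follows, taking $\bm{W}_{3}$ as a $D\times D$ matrix so that the bilinear form $\bm{v}^{\top}\bm{W}_{3}\bm{s}$ is well defined for $D$-dimensional $\bm{v}$ and $\bm{s}$. The function names are hypothetical, and how the negative time step $t^{\prime}$ is chosen is left to the caller because this section does not specify it.

```python
import math


def g(v, W3, s):
    """Criterion g(v, s) = sigmoid(v^T W3 s), with W3 of shape D x D."""
    W3_s = [sum(w * sj for w, sj in zip(row, s)) for row in W3]
    score = sum(vi * wi for vi, wi in zip(v, W3_s))
    return 1.0 / (1.0 + math.exp(-score))


def temporal_ssl_loss(V_t, V_t_prime, s_t, W3):
    """Eq. (11): positives pair V_t with s_t, negatives pair V_t' with s_t.

    V_t:       N x D fused region embeddings at time step t
    V_t_prime: N x D fused region embeddings at a different time step t'
    s_t:       city-level representation at time step t (size D)
    """
    loss = 0.0
    for v_pos, v_neg in zip(V_t, V_t_prime):
        loss -= math.log(g(v_pos, W3, s_t))
        loss -= math.log(1.0 - g(v_neg, W3, s_t))
    return loss
```

With an all-zero $\bm{W}_{3}$ every pair scores $0.5$, giving a loss of $2N\log 2$; the loss falls below that once positives score higher than negatives.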

3.5 Model Training

In the learning process of our ST-SSL, we feed the embedding $\bm{h}_{n}\in\bm{H}$ of each region $r_{n}$ into an MLP structure to enable the traffic flow prediction at the future time step $t+1$ as:

$$\hat{\bm{x}}_{t+1,n}=\mathrm{MLP}(\bm{h}_{n}), \tag{12}$$

where $\hat{\bm{x}}_{t+1,n}$ is the predicted result. The model is optimized by minimizing the loss function below:

$$\mathcal{L}_{p}=\sum_{n=1}^{N}\lambda\left|x_{t+1,n}^{(0)}-\hat{x}_{t+1,n}^{(0)}\right|+(1-\lambda)\left|x_{t+1,n}^{(1)}-\hat{x}_{t+1,n}^{(1)}\right|, \tag{13}$$

where $x_{t+1,n}^{(0)},x_{t+1,n}^{(1)}$ denote the ground truth of inflow and outflow respectively. $\lambda$ is a parameter to balance the influence of each type of traffic flow.
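As a small worked instance of Eq. (13), the sketch below computes the weighted absolute error over inflow (index 0) and outflow (index 1). The function name `prediction_loss` and the default `lam=0.5` are illustrative assumptions.

```python
def prediction_loss(X_true, X_pred, lam=0.5):
    """Eq. (13): lambda-weighted absolute error on inflow and outflow.

    X_true, X_pred: N x 2 lists of [inflow, outflow] per region
    lam:            balance parameter lambda (example default)
    """
    return sum(
        lam * abs(t[0] - p[0]) + (1.0 - lam) * abs(t[1] - p[1])
        for t, p in zip(X_true, X_pred)
    )


# Two regions: errors are (2, 1) and (0, 2) for (inflow, outflow).
loss = prediction_loss([[10.0, 4.0], [6.0, 8.0]], [[8.0, 5.0], [6.0, 6.0]])
# 0.5*2 + 0.5*1 + 0.5*0 + 0.5*2 = 2.5
```

Setting `lam=1.0` would count only the inflow errors and `lam=0.0` only the outflow errors.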

Finally, we obtain the overall loss by incorporating the self-supervised spatial and temporal heterogeneity modeling losses in Eqs. (6) and (11) into the joint learning objective:

$$\mathcal{L}_{joint}=\mathcal{L}_{p}+\mathcal{L}_{s}+\mathcal{L}_{t}. \tag{14}$$

Our model can be trained via the back-propagation algorithm. The entire training procedure can be summarized in four stages: $i$) Given a TFG $\mathcal{G}$, we generate a region embedding matrix $\bm{H}$ with the ST encoder. $ii$) Meanwhile, we perform adaptive augmentation to refine $\mathcal{G}$ into $\tilde{\mathcal{G}}$, which is fed into the shared ST encoder to output $\tilde{\bm{H}}$. $iii$) Using $\bm{H}$ and $\tilde{\bm{H}}$, we calculate the losses $\mathcal{L}_{s}$, $\mathcal{L}_{t}$, and $\mathcal{L}_{p}$, which are summed into the joint loss $\mathcal{L}_{joint}$. $iv$) We employ the back-propagation algorithm to train ST-SSL until $\mathcal{L}_{joint}$ converges.
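The forward part of this procedure (stages $i$ to $iii$) can be sketched as a single function. This is only a structural outline: every callable passed in is a hypothetical placeholder for the corresponding component, the choice of which embeddings each loss receives is an assumption, and stage $iv$ (back-propagation) is omitted because it needs an automatic differentiation library.

```python
def training_step(graph, target, encoder, augment,
                  spatial_loss, temporal_loss, pred_loss):
    """Compute the joint loss of Eq. (14) for one batch.

    All callables are hypothetical stand-ins, not the paper's modules.
    """
    h = encoder(graph)               # stage i: embeddings H from the ST encoder
    graph_aug = augment(graph, h)    # stage ii: adaptive augmentation of the TFG
    h_aug = encoder(graph_aug)       # stage ii: shared encoder gives H-tilde
    l_s = spatial_loss(h, h_aug)     # stage iii: spatial SSL loss
    l_t = temporal_loss(h, h_aug)    # stage iii: temporal SSL loss
    l_p = pred_loss(h, target)       # stage iii: prediction loss
    return l_p + l_s + l_t           # joint loss, Eq. (14)


# Toy stand-ins, chosen only so that the sketch runs end to end.
joint = training_step(
    graph=[[1.0, 2.0], [3.0, 4.0]],
    target=[0.0, 0.0],
    encoder=lambda g: [sum(row) for row in g],
    augment=lambda g, h: [[0.9 * v for v in row] for row in g],
    spatial_loss=lambda h, h_aug: 0.1,
    temporal_loss=lambda h, h_aug: 0.2,
    pred_loss=lambda h, y: sum(abs(a - b) for a, b in zip(h, y)),
)
```

The same encoder object is called twice, which mirrors the statement that the augmented graph is fed into the shared ST encoder.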

4 Experiments

In this section, we evaluate ST-SSL through a series of experiments on several real-world datasets, organized to answer the following research questions:

$\bullet$ RQ1: How does the overall traffic prediction performance of ST-SSL compare with various baselines?

$\bullet$ RQ2: How do the different designed sub-modules contribute to the model performance?

$\bullet$ RQ3: How does ST-SSL perform with regard to heterogeneous spatial regions and different time periods?

$\bullet$ RQ4: How do the augmented graph and learned representations benefit the model?

4.1 Experimental Settings

Data Description.

We evaluate our model on two types of public real-world traffic datasets summarized in Tab. 1.

The first kind consists of bike rental records in New York City. NYCBike1 (Zhang, Zheng, and Qi 2017) spans from 04/01/2014 to 09/30/2014, and NYCBike2 (Yao et al. 2019) spans from 07/01/2016 to 08/29/2016. Both are measured every 30 minutes. The second kind consists of taxi GPS trajectories. NYCTaxi (Yao et al. 2019) spans from 01/01/2015 to 03/01/2015. Its time interval is half an hour. BJTaxi (Zhang, Zheng, and Qi 2017), collected in Beijing, spans from 03/01/2015 to 06/30/2015 on an hourly basis.

For all datasets, previous 2-hour flows as well as previous 3-day flows around the predicted time are used to predict the flows for the next time step. We use a sliding window strategy to generate samples, and then split each dataset into the training, validation, and test sets with a ratio of 7:1:2.
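The sample construction described above can be sketched as follows for a single region. Two points are assumptions rather than details given in the text: the "previous 3-day flows around the predicted time" are approximated by one step at the same time of day on each previous day, and the 7:1:2 split is taken chronologically. The function names are hypothetical.

```python
def make_samples(series, steps_per_day, closeness_hours=2, n_days=3):
    """Slide over a per-step flow series and emit (inputs, target) pairs.

    inputs = [same-time-of-day flow for each of the previous n_days days]
             + [flows of the previous closeness_hours hours]
    """
    steps_per_hour = steps_per_day // 24
    n_close = closeness_hours * steps_per_hour
    first_t = max(n_close, n_days * steps_per_day)  # earliest usable target
    samples = []
    for t in range(first_t, len(series)):
        daily = [series[t - d * steps_per_day] for d in range(n_days, 0, -1)]
        recent = series[t - n_close:t]
        samples.append((daily + recent, series[t]))
    return samples


def chronological_split(samples, ratios=(7, 1, 2)):
    """Split time-ordered samples into train, validation, and test sets."""
    total = sum(ratios)
    n_train = len(samples) * ratios[0] // total
    n_val = len(samples) * ratios[1] // total
    train = samples[:n_train]
    val = samples[n_train:n_train + n_val]
    test = samples[n_train + n_val:]
    return train, val, test


# Ten days of a 30-minute series (48 steps per day); values are just indices.
series = list(range(48 * 10))
samples = make_samples(series, steps_per_day=48)
train, val, test = chronological_split(samples)
```

With a 30-minute interval, the 2-hour window contributes four steps and the three daily steps bring each input to seven values.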

Evaluation Metrics & Baselines.

In our experiments, two common metrics are used for evaluation: Mean Absolute Error (MAE) and Mean Absolute Percentage Error (MAPE). We compare our proposed ST-SSL with 8 baselines that fall into three categories.

Traditional Time Series Prediction Approaches:
$\bullet$ ARIMA (Kumar and Vanajakshi 2015): it is a classical time series prediction model.
$\bullet$ SVR (Castro-Neto et al. 2009): it is a regression model widely used for time series analysis.

Spatial-Temporal Traffic Prediction Methods:
$\bullet$ ST-ResNet (Zhang, Zheng, and Qi 2017): it is a convolution-based model that constructs multiple traffic time series to capture the temporal dependencies and utilizes residual convolution to model the spatial correlations.
$\bullet$ STGCN (Yu, Yin, and Zhu 2018): it is a graph convolution-based model that combines graph convolution with 1D convolution to capture spatial and temporal correlations, respectively.
$\bullet$ GMAN (Zheng et al. 2020): it is an attention-based predictive model that adopts an encoder-decoder architecture.

Spatial-Temporal Methods Considering Heterogeneity:
$\bullet$ AGCRN (Bai et al. 2020): it enhances the traditional graph convolution with adaptive modules and combines them into recurrent networks to capture spatial-temporal correlations.
$\bullet$ STSGCN (Song et al. 2020): it captures the complex localized spatial-temporal correlations through a spatial-temporal synchronous modeling mechanism.
$\bullet$ STFGNN (Li and Zhu 2021): it integrates an STFGN module with a novel gated CNN module, and captures hidden spatial dependencies through a data-driven graph that is further fused with the given spatial graphs.

Methods in the last category model the traffic heterogeneity by using multiple parameter spaces.
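The two evaluation metrics, MAE and MAPE, can be written out as a short sketch. Skipping entries whose true value is (near) zero in MAPE is an assumption made here to avoid division by zero; the text does not say how such entries are handled.

```python
def mae(y_true, y_pred):
    """Mean of the absolute errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true, y_pred, min_true=1e-8):
    """Mean absolute percentage error, in percent.

    Entries with |true value| <= min_true are skipped (an assumption).
    """
    terms = [abs(t - p) / abs(t)
             for t, p in zip(y_true, y_pred) if abs(t) > min_true]
    if not terms:
        raise ValueError("no entries with a non-zero true value")
    return 100.0 * sum(terms) / len(terms)


# Made-up values for illustration; the third entry is skipped by mape().
y_true = [10.0, 20.0, 0.0, 40.0]
y_pred = [12.0, 18.0, 1.0, 40.0]
mae_value = mae(y_true, y_pred)
mape_value = mape(y_true, y_pred)
```

Because MAPE divides by the true value, low-traffic regions weigh more heavily in it than in MAE.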

| Dataset | NYCBike1 | NYCBike2 | NYCTaxi | BJTaxi |
|---|---|---|---|---|
| Data type | Bike rental | Bike rental | Taxi GPS | Taxi GPS |
| Time interval | 1 hour | 30 min | 30 min | 30 min |
| # regions | $16\times 8$ | $10\times 20$ | $10\times 20$ | $32\times 32$ |
| # taxis/bikes | 6.8k+ | 2.6m+ | 22m+ | 34k+ |

Table 1: Statistics of Datasets.

| Dataset | Metric | Type | ARIMA | SVR | ST-ResNet | STGCN | GMAN | AGCRN | STSGCN | STFGNN | ST-SSL |
|---|---|---|---|---|---|---|---|---|---|---|---|
| NYCBike1 | MAE | In | 10.66 | 7.27 | 5.53±0.06 | 5.33±0.02 | 6.77±3.42 | 5.17±0.03 | 5.81±0.04 | 6.53±0.10 | 4.94±0.02 |
| NYCBike1 | MAE | Out | 11.33 | 7.98 | 5.74±0.07 | 5.59±0.03 | 7.17±3.61 | 5.47±0.03 | 6.10±0.04 | 6.79±0.08 | 5.26±0.02 |
| NYCBike1 | MAPE | In | 33.05 | 25.39 | 25.46±0.20 | 26.92±0.08 | 31.72±12.29 | 25.59±0.22 | 26.51±0.32 | 32.14±0.23 | 23.69±0.11 |
| NYCBike1 | MAPE | Out | 35.03 | 27.42 | 26.36±0.50 | 27.69±0.14 | 34.74±17.04 | 26.63±0.30 | 27.56±0.39 | 32.88±0.19 | 24.60±0.27 |
| NYCBike2 | MAE | In | 8.91 | 12.82 | 5.63±0.14 | 5.21±0.02 | 5.24±0.13 | 5.18±0.03 | 5.25±0.03 | 5.80±0.10 | 5.04±0.03 |
| NYCBike2 | MAE | Out | 8.70 | 11.48 | 5.26±0.08 | 4.92±0.02 | 4.97±0.14 | 4.79±0.04 | 4.94±0.05 | 5.51±0.11 | 4.71±0.02 |
| NYCBike2 | MAPE | In | 28.86 | 46.52 | 32.17±0.85 | 27.73±0.16 | 27.38±1.13 | 27.14±0.14 | 29.26±0.13 | 30.73±0.49 | 22.54±0.10 |
| NYCBike2 | MAPE | Out | 28.22 | 41.91 | 30.48±0.86 | 26.83±0.21 | 26.75±1.14 | 26.17±0.22 | 28.02±0.23 | 29.98±0.46 | 21.17±0.13 |
| NYCTaxi | MAE | In | 20.86 | 52.16 | 13.48±0.14 | 13.12±0.04 | 15.09±0.61 | 12.13±0.11 | 13.69±0.11 | 16.25±0.38 | 11.99±0.12 |
| NYCTaxi | MAE | Out | 16.80 | 41.71 | 10.78±0.25 | 10.35±0.03 | 12.06±0.39 | 9.87±0.04 | 10.75±0.17 | 12.47±0.25 | 9.78±0.09 |
| NYCTaxi | MAPE | In | 21.49 | 65.10 | 24.83±0.55 | 21.01±0.18 | 22.73±1.20 | 18.78±0.04 | 22.91±0.44 | 24.01±0.30 | 16.38±0.10 |
| NYCTaxi | MAPE | Out | 21.23 | 64.06 | 24.42±0.52 | 20.78±0.16 | 21.97±0.86 | 18.41±0.21 | 22.37±0.16 | 23.28±0.47 | 16.86±0.23 |
| BJTaxi | MAE | In | 21.48 | 52.77 | 12.12±0.11 | 12.34±0.09 | 13.13±0.43 | 12.30±0.06 | 12.72±0.03 | 13.83±0.04 | 11.31±0.03 |
| BJTaxi | MAE | Out | 21.60 | 52.74 | 12.16±0.12 | 12.41±0.08 | 13.20±0.43 | 12.38±0.06 | 12.79±0.03 | 13.89±0.04 | 11.40±0.02 |
| BJTaxi | MAPE | In | 23.12 | 65.51 | 15.50±0.26 | 16.66±0.21 | 18.67±0.99 | 15.61±0.15 | 17.22±0.17 | 19.29±0.07 | 15.03±0.13 |
| BJTaxi | MAPE | Out | 20.67 | 65.51 | 15.57±0.26 | 16.76±0.22 | 18.84±1.04 | 15.75±0.15 | 17.35±0.17 | 19.41±0.07 | 15.19±0.15 |

Table 2: Model comparison on four datasets in terms of MAE and MAPE (%). In and Out represent the inflow and outflow.

Parameter Settings.

ST-SSL is implemented with PyTorch. The embedding dimension $D$ is set to 64. Both the temporal and spatial convolution kernel sizes of the ST encoder are set to 3. The perturbation ratios for both traffic-level and topology-level augmentations are set to 0.1. Training uses the Adam optimizer with a batch size of 32. The baselines are evaluated with their released code on the LibCity (Wang et al. 2021) platform.
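For reference, these settings can be collected into one configuration object. The class and field names below are hypothetical and are not taken from the released code; only the values come from the paragraph above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class STSSLConfig:
    """Hyperparameters stated in the text; field names are illustrative."""
    embed_dim: int = 64                  # embedding dimension D
    temporal_kernel_size: int = 3        # temporal convolution kernel size
    spatial_kernel_size: int = 3         # spatial convolution kernel size
    traffic_perturb_ratio: float = 0.1   # traffic-level augmentation ratio
    topology_perturb_ratio: float = 0.1  # topology-level augmentation ratio
    optimizer: str = "adam"
    batch_size: int = 32


config = STSSLConfig()
```

Values the text does not state, such as the learning rate or the number of epochs, are deliberately left out.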

4.2 Performance Comparison (RQ1)

Table 2 shows the comparison results of all methods. We run all deep learning models with 5 different seeds and report the average performance and their standard deviations.
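Entries of the form "mean±std" in Table 2 can be produced from per-seed scores as in the sketch below. The scores are made up for illustration, and the use of the sample standard deviation (dividing by $n-1$) is an assumption, since the text does not say which estimator is used.

```python
import statistics


def summarize_runs(scores):
    """Format per-seed scores as 'mean±std' with two decimals."""
    mean = statistics.mean(scores)
    std = statistics.stdev(scores)  # sample standard deviation (assumption)
    return f"{mean:.2f}±{std:.2f}"


# Hypothetical MAE values from five seeds (not results from the paper).
mae_runs = [5.0, 5.2, 4.8, 5.1, 4.9]
summary = summarize_runs(mae_runs)
```

A large spread across seeds, as in the GMAN column on NYCBike1, shows up directly in the second number.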

Performance Superiority of ST-SSL.

According to Student’s $t$-test at level 0.01, ST-SSL significantly outperforms the competing baselines on both metrics over all datasets. This demonstrates the effectiveness of ST-SSL in jointly modeling spatial and temporal heterogeneity in a self-supervised manner. Fig. 3 visualizes the prediction error ($|\hat{x}_{n}-x_{n}|/x_{n}$) of ST-SSL and the two best-performing baselines on the BJTaxi dataset, where a brighter pixel means a larger error. The superiority of our model can still be observed, which is consistent with the quantitative results in Table 2. Interestingly, ST-SSL exhibits a significant improvement in the suburban areas (green boxes in Fig. 3), which justifies the effectiveness of spatial heterogeneity modeling that transfers information among globally similar regions.
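To make the size of the gap concrete, the relative improvement over the strongest baseline can be computed from a row of Table 2. The sketch below uses the mean values of the NYCBike2 MAPE (In) row; the helper function is hypothetical and is not part of the paper's evaluation protocol.

```python
def relative_improvement(baseline_errors, model_error):
    """Return the best baseline and the error reduction over it, in percent."""
    best_name = min(baseline_errors, key=baseline_errors.get)
    best = baseline_errors[best_name]
    return best_name, 100.0 * (best - model_error) / best


# Mean MAPE (In) on NYCBike2, copied from Table 2.
nycbike2_mape_in = {
    "ARIMA": 28.86, "SVR": 46.52, "ST-ResNet": 32.17, "STGCN": 27.73,
    "GMAN": 27.38, "AGCRN": 27.14, "STSGCN": 29.26, "STFGNN": 30.73,
}
best_name, gain = relative_improvement(nycbike2_mape_in, model_error=22.54)
```

On this row the strongest baseline is AGCRN, and ST-SSL reduces its MAPE by roughly 17%.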

Performance Comparison between Baselines.

Spatio-temporal prediction methods outperform time series approaches in most cases, which suggests the necessity to capture spatial dependencies. The methods that take into account the heterogeneity of traffic data usually perform better than those that use shared parameters across different regions and time periods, indicating the rationality of learning spatial and temporal heterogeneity in traffic prediction.

4.3 Ablation Study (RQ2)

To analyze the effects of the sub-modules in our ST-SSL framework, we perform ablation studies with four variants:

$\bullet$ ST-SSL-sa: This variant replaces heterogeneity-guided structure augmentation on graph topology with random edge removal and addition augmentations.

$\bullet$ ST-SSL-ta: This variant replaces heterogeneity-guided traffic-level augmentation with random traffic volume masking augmentations.

$\bullet$ ST-SSL-sh: This variant disables spatial heterogeneity modeling in the joint framework.

$\bullet$ ST-SSL-th: This variant disables temporal heterogeneity modeling in the joint framework.
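The random augmentations used by the first two variants can be sketched as below. This is one plausible reading, not the released implementation: it assumes an undirected 0/1 adjacency matrix, removes and adds the same number of edges, and treats masking as setting a reading to zero. The default ratio of 0.1 follows the perturbation ratio in the parameter settings.

```python
import random


def random_edge_perturbation(adj, ratio=0.1, seed=0):
    """Drop a random fraction of edges and add as many random new ones."""
    rng = random.Random(seed)
    n = len(adj)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    edges = [(i, j) for i, j in pairs if adj[i][j]]
    non_edges = [(i, j) for i, j in pairs if not adj[i][j]]
    k = min(int(round(ratio * len(edges))), len(non_edges))
    new_adj = [row[:] for row in adj]
    for i, j in rng.sample(edges, k):       # random edge removal
        new_adj[i][j] = new_adj[j][i] = 0
    for i, j in rng.sample(non_edges, k):   # random edge addition
        new_adj[i][j] = new_adj[j][i] = 1
    return new_adj


def random_traffic_masking(flows, ratio=0.1, seed=0):
    """Zero out a random fraction of the (region, time) traffic readings."""
    rng = random.Random(seed)
    cells = [(r, t) for r in range(len(flows)) for t in range(len(flows[r]))]
    masked = [row[:] for row in flows]
    for r, t in rng.sample(cells, int(round(ratio * len(cells)))):
        masked[r][t] = 0.0
    return masked


# A ring of 10 regions and a 4 x 5 block of unit flows as toy inputs.
ring = [[1 if abs(i - j) in (1, 9) else 0 for j in range(10)] for i in range(10)]
perturbed = random_edge_perturbation(ring)
masked = random_traffic_masking([[1.0] * 5 for _ in range(4)])
```

Unlike the heterogeneity-guided augmentation, both functions choose what to perturb uniformly at random, which is the contrast the ablation is designed to test.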

The results are presented in Fig. 4. We can observe that ST-SSL beats the variants with random augmentation, indicating the effectiveness of our adaptive heterogeneity-guided data augmentation at both traffic-level and graph structure-level. Moreover, ST-SSL consistently outperforms ST-SSL-sh and ST-SSL-th, which justifies the necessity to jointly model the spatial and temporal heterogeneity. In summary, each designed sub-module has a positive effect on performance improvement.

4.4 Robustness Analysis (RQ3)

To explore the robustness of our ST-SSL, we perform traffic prediction for spatial regions with heterogeneous data distributions and time periods with different patterns on BJTaxi. Specifically, we cluster regions by using traffic data statistics, i.e., the $(\text{mean}, \text{median}, \text{standard deviation})$ of their historical traffic flow. As shown in Fig. 5(a), regions with smaller cluster ids (next to the color bar) are usually located in suburbs that are less popular and thus have lower traffic. Fig. 5(b) exhibits the prediction performance for different clusters. Our ST-SSL surpasses other baselines by a significant margin, particularly for less popular regions (marked by black circles), which is consistent with the results in Fig. 3. This also verifies the robustness of ST-SSL in accurately predicting the traffic flows of different types of spatial regions.
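The region grouping can be sketched as a feature extraction step followed by a clustering step. The text does not name the clustering algorithm, so the small k-means below is an assumption chosen for illustration, as are the function names and the toy data.

```python
import statistics


def region_features(history):
    """(mean, median, standard deviation) of one region's historical flow."""
    return (statistics.mean(history),
            statistics.median(history),
            statistics.pstdev(history))


def kmeans(points, k, n_iter=20):
    """Plain k-means with a deterministic, sorted-order initialization."""
    ordered = sorted(points)
    centers = [ordered[(2 * c + 1) * len(ordered) // (2 * k)] for c in range(k)]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assign each point to its nearest center (squared Euclidean distance).
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels


# Two low-traffic and two high-traffic toy regions.
histories = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0],
             [50.0, 60.0, 70.0], [55.0, 65.0, 75.0]]
features = [region_features(h) for h in histories]
labels = kmeans(features, k=2)
```

In this toy case the low-traffic regions share one cluster and the high-traffic regions the other.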

For temporal heterogeneity, according to urban traffic rhythms (Wang et al. 2019a), we partition a workday into four time periods and a holiday (weekend included) into two time periods, whose categories are given in Fig. 5(c). Fig. 5(d) presents the evaluation performance. Our ST-SSL beats the baselines in terms of every category. Furthermore, ST-SSL shows a significant improvement in categories 3 and 5 that denote the nighttime of workdays and holidays. During these times, traffic flow data are typically sparse, making it difficult for baselines to produce accurate predictions. ST-SSL can handle this situation because we inject the temporal heterogeneity into the time-aware region embeddings.

4.5 Qualitative Study (RQ4)

In Fig. 6, we investigate the heterogeneity-guided graph topology-level augmentation on BJTaxi. Our augmentation method adaptively removes connections between adjacent regions with heterogeneous traffic patterns, e.g., Zuojiazhuang Residential Zone and Sanyuan Bridge (a transportation hub). Meanwhile, it builds connections between distant regions with similar latent urban functions, e.g., Xizhimen Bridge and Sanyuan Bridge, which are both transportation hubs. In this way, our ST-SSL can not only debias the region connections with weakly correlated traffic patterns, but also capture the long-range region dependencies within the global urban context.

To further explore why the embeddings obtained by ST-SSL can deliver more accurate traffic prediction than AGCRN, we visualize them on BJTaxi by t-SNE (Van der Maaten and Hinton 2008). We plot the learned embeddings of all regions with ground truth classes the same as Fig. 5(a). As shown in Fig. 7, samples in the same class are more compact and those of different classes are significantly better separated for ST-SSL. This enables ST-SSL to be aware of spatial heterogeneity and transfer information between regions in the same class, which facilitates predictions.

5 Related Work

Deep Learning for Traffic Prediction.

Many efforts have been devoted to developing traffic prediction techniques based on various neural networks. RNN (Wang et al. 2019b; Ji et al. 2020) and 1D CNN (Wang et al. 2022, 2016) are applied to capture the temporal dependencies in traffic series. CNN (Zhang, Zheng, and Qi 2017; Yao et al. 2019), GNN (Zhang et al. 2020; Ji et al. 2022), and attention mechanisms (Zheng et al. 2020) are introduced to incorporate the spatial information. However, most of them neglect the spatio-temporal heterogeneity problem. Recently, some works model the heterogeneity by using multiple models (Yuan, Zhou, and Yang 2018) or multiple sets of parameters (Bai et al. 2020; Li and Zhu 2021), and some use meta learning to generate different weights based on static features of different regions (Pan et al. 2019a; Ye et al. 2022). However, these methods either introduce a large number of parameters that may cause overfitting or require external data that may not be available. To overcome these limitations, we incorporate self-supervised learning into traffic prediction to explore spatial and temporal heterogeneity.

Self-Supervised Learning for Representation Learning.

Self-supervised learning aims to extract useful information from input data to improve the representation quality (Hendrycks et al. 2019). The general paradigm is to augment the input data and then design pretext tasks as pseudo-labels for representation learning. It has achieved great success with text (Kenton and Toutanova 2019), image (Chen et al. 2020), and audio data (Oord, Li, and Vinyals 2018). Motivated by these works, we develop an adaptive data augmentation method for spatio-temporal graph data and introduce two pretext tasks to learn representations that are robust to spatio-temporal heterogeneity, which has not been well explored in existing traffic flow prediction methods.

6 Conclusion and Future Work

This work investigated the traffic prediction problem by proposing a novel spatio-temporal self-supervised learning (ST-SSL) framework. Specifically, we integrated temporal and spatial convolutions to encode spatial-temporal traffic patterns. Then, we devised $i$ ) a spatial self-supervised learning paradigm that consists of an adaptive graph augmentation and a clustering-based generative task, and $ii$ ) a temporal self-supervised learning paradigm that relies on a time-aware contrastive task, to supplement the main traffic flow prediction task with spatial and temporal heterogeneity-aware self-supervised signals. Comprehensive experiments on four traffic flow datasets demonstrated the robustness of ST-SSL. The future work lies in extending our spatial-temporal SSL framework to a model-agnostic paradigm.

Acknowledgments

This work was supported by the National Key R&D Program of China (2019YFB2101804). Prof. Wang’s work was supported by the National Natural Science Foundation of China (No. 72222022, 82161148011, 72171013), the Fundamental Research Funds for the Central Universities (YWF-22-L-838) and the DiDi Gaia Collaborative Research Funds. Dr. Zhang’s work was supported by the National Natural Science Foundation of China (No. 62172034) and the Beijing Nova Program (Z201100006820053).

References

Bai et al. (2020) Bai, L.; Yao, L.; Li, C.; Wang, X.; and Wang, C. 2020. Adaptive graph convolutional recurrent network for traffic forecasting. NeurIPS, 33: 17804–17815.
Castro-Neto et al. (2009) Castro-Neto, M.; Jeong, Y.-S.; Jeong, M.-K.; and Han, L. D. 2009. Online-SVR for short-term traffic flow prediction under typical and atypical traffic conditions. Expert Systems with Applications, 36(3): 6164–6173.
Chen et al. (2020) Chen, T.; Kornblith, S.; Norouzi, M.; and Hinton, G. 2020. A simple framework for contrastive learning of visual representations. In ICML, 1597–1607.
Hendrycks et al. (2019) Hendrycks, D.; Mazeika, M.; Kadavath, S.; and Song, D. 2019. Using self-supervised learning can improve model robustness and uncertainty. NeurIPS, 32.
Ji et al. (2022) Ji, J.; Wang, J.; Jiang, Z.; Jiang, J.; and Zhang, H. 2022. STDEN: Towards physics-guided neural networks for traffic flow prediction. In AAAI, volume 36, 4048–4056.
Ji et al. (2020) Ji, J.; Wang, J.; Jiang, Z.; Ma, J.; and Zhang, H. 2020. Interpretable spatiotemporal deep learning model for traffic flow prediction based on potential energy fields. In ICDM, 1076–1081.
Kenton and Toutanova (2019) Kenton, J. D. M.-W. C.; and Toutanova, L. K. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics, 4171–4186.
Kumar and Vanajakshi (2015) Kumar, S. V.; and Vanajakshi, L. 2015. Short-term traffic flow prediction using seasonal ARIMA model with limited input data. European Transport Research Review, 7(3): 1–9.
Li and Zhu (2021) Li, M.; and Zhu, Z. 2021. Spatial-temporal fusion graph neural networks for traffic flow forecasting. In AAAI, volume 35, 4189–4196.
Oord, Li, and Vinyals (2018) Oord, A.; Li, Y.; and Vinyals, O. 2018. Representation learning with contrastive predictive coding. CoRR, abs/1807.03748.
Pan et al. (2019a) Pan, Z.; Liang, Y.; Wang, W.; Yu, Y.; Zheng, Y.; and Zhang, J. 2019a. Urban traffic prediction from spatio-temporal data using deep meta learning. In ACM SIGKDD, 1720–1730.
Pan et al. (2019b) Pan, Z.; Wang, Z.; Wang, W.; Yu, Y.; Zhang, J.; and Zheng, Y. 2019b. Matrix factorization for spatio-temporal neural networks with applications to urban flow prediction. In CIKM, 2683–2691.
Song et al. (2020) Song, C.; Lin, Y.; Guo, S.; and Wan, H. 2020. Spatial-temporal synchronous graph convolutional networks: A new framework for spatial-temporal network data forecasting. In AAAI, volume 34, 914–921.
Van der Maaten and Hinton (2008) Van der Maaten, L.; and Hinton, G. 2008. Visualizing data using t-SNE. Journal of Machine Learning Research, 9(11).
Wang et al. (2016) Wang, J.; Gu, Q.; Wu, J.; Liu, G.; and Xiong, Z. 2016. Traffic speed prediction and congestion source exploration: A deep learning method. In ICDM, 499–508.
Wang et al. (2022) Wang, J.; Ji, J.; Jiang, Z.; and Sun, L. 2022. Traffic Flow Prediction Based on Spatiotemporal Potential Energy Fields. IEEE Transactions on Knowledge and Data Engineering, 1–14.
Wang et al. (2021) Wang, J.; Jiang, J.; Jiang, W.; Li, C.; and Zhao, W. X. 2021. LibCity: An open library for traffic prediction. In Proceedings of the 29th International Conference on Advances in Geographic Information Systems, 145–148.
Wang et al. (2019a) Wang, J.; Wu, J.; Wang, Z.; Gao, F.; and Xiong, Z. 2019a. Understanding urban dynamics via context-aware tensor factorization with neighboring regularization. IEEE Transactions on Knowledge and Data Engineering, 32(11): 2269–2283.
Wang et al. (2019b) Wang, J.; Wu, N.; Zhao, W. X.; Peng, F.; and Lin, X. 2019b. Empowering A* search algorithms with neural networks for personalized route recommendation. In ACM SIGKDD, 539–547.
Yao et al. (2019) Yao, H.; Tang, X.; Wei, H.; Zheng, G.; and Li, Z. 2019. Revisiting spatial-temporal similarity: A deep learning framework for traffic prediction. In AAAI, volume 33, 5668–5675.
Ye et al. (2022) Ye, X.; Fang, S.; Sun, F.; Zhang, C.; and Xiang, S. 2022. Meta graph transformer: A novel framework for spatial-temporal traffic prediction. Neurocomputing, 491: 544–563.
Yu, Yin, and Zhu (2018) Yu, B.; Yin, H.; and Zhu, Z. 2018. Spatio-temporal graph convolutional networks: a deep learning framework for traffic forecasting. In IJCAI, 3634–3640.
Yuan, Zhou, and Yang (2018) Yuan, Z.; Zhou, X.; and Yang, T. 2018. Hetero-ConvLSTM: A deep learning approach to traffic accident prediction on heterogeneous spatio-temporal data. In ACM SIGKDD, 984–992.
Zhang, Zheng, and Qi (2017) Zhang, J.; Zheng, Y.; and Qi, D. 2017. Deep spatio-temporal residual networks for citywide crowd flows prediction. In AAAI, volume 31, 1655–1661.
Zhang et al. (2020) Zhang, X.; Huang, C.; Xu, Y.; and Xia, L. 2020. Spatial-temporal convolutional graph attention networks for citywide traffic flow forecasting. In Proceedings of the 29th ACM International Conference on Information and Knowledge Management, 1853–1862.
Zhang et al. (2021) Zhang, X.; Huang, C.; Xu, Y.; Xia, L.; Dai, P.; Bo, L.; Zhang, J.; and Zheng, Y. 2021. Traffic flow forecasting with spatial-temporal graph diffusion network. In AAAI, volume 35, 15008–15015.
Zheng et al. (2020) Zheng, C.; Fan, X.; Wang, C.; and Qi, J. 2020. GMAN: A graph multi-attention network for traffic prediction. In AAAI, volume 34, 1234–1241.
Zhu et al. (2021) Zhu, Y.; Xu, Y.; Yu, F.; Liu, Q.; Wu, S.; and Wang, L. 2021. Graph contrastive learning with adaptive augmentation. In Proceedings of the Web Conference 2021, 2069–2080.
