
Spatio-Temporal Meta-Graph Learning for Traffic Forecasting

Renhe Jiang*1,2, Zhaonan Wang*2, Jiawei Yong3, Puneet Jeph2, Quanjun Chen2,

Yasumasa Kobayashi3, Xuan Song2, Shintaro Fukushima3, Toyotaro Suzumura1
Abstract

Traffic forecasting, as a canonical task of multivariate time series forecasting, has been a significant research topic in the AI community. To address the spatio-temporal heterogeneity and non-stationarity implied in the traffic stream, in this study we propose Spatio-Temporal Meta-Graph Learning as a novel Graph Structure Learning mechanism on spatio-temporal data. Specifically, we implement this idea in the Meta-Graph Convolutional Recurrent Network (MegaCRN) by plugging a Meta-Graph Learner, powered by a Meta-Node Bank, into a GCRN encoder-decoder. We conduct a comprehensive evaluation on two benchmark datasets (i.e., METR-LA and PEMS-BAY) and a new large-scale traffic speed dataset called EXPY-TKY that covers 1,843 expressway road links in Tokyo. Our model outperforms the state-of-the-art models on all three datasets. Besides, through a series of qualitative evaluations, we demonstrate that our model can explicitly disentangle road links and time slots with different patterns and robustly adapt to anomalous traffic situations. Code and datasets are available at https://github.com/deepkashiwa20/MegaCRN.

* Corresponding authors who contributed equally.

Introduction

Spatio-temporal data, streamed by sensor networks, are widely studied in both academia and industry given their various real-world applications. Traffic forecasting (Yu, Yin, and Zhu 2018; Li et al. 2018; Zheng et al. 2020; Bai et al. 2020; Lee et al. 2022), as one canonical task, has been receiving increasing attention with the rapid development of Graph Convolutional Networks (GCNs) (Defferrard, Bresson, and Vandergheynst 2016; Kipf and Welling 2017; Veličković et al. 2017). This spatio-temporal modeling task can be formulated similarly to multivariate time series (MTS) forecasting (Wu et al. 2020; Cao et al. 2020; Shang, Chen, and Bi 2021), but with extra prior knowledge from the geographic space (e.g. sensor locations, road networks) to imply the dependency among sensor signals. Compared with ordinary MTS, traffic data (e.g. traffic speed and flow) potentially contain spatio-temporal heterogeneity, as traffic conditions differ over roads (e.g. local road, highway, interchange) and time (e.g. off-peak and rush hours). Moreover, non-stationarity makes the task even more challenging when X factors, including accidents and congestion, are present.

Figure 1: Progression of Graph Structure Learning for Spatio-Temporal Modeling

For more effective traffic forecasting, the existing works have made tremendous progress by modeling latent spatial correlation among sensors and temporal autocorrelation within time series. Since these two relationships can be naturally represented by graph and sequence respectively, the mainstream models handle them by leveraging GCN-based modules (Diao et al. 2019; Guo et al. 2019; Geng et al. 2019; Zhang et al. 2021) and sequence models, such as Recurrent Neural Networks (RNNs) (Li et al. 2018; Bai et al. 2020; Ye et al. 2021), WaveNet (Wu et al. 2019), and Transformer (Zheng et al. 2020; Wang et al. 2020; Xu et al. 2020). Particularly, to perform convolution-like operations on graphs, GCNs require an auxiliary input that characterizes the topology of the underlying spatial dependency. In early works, this essential part is defined based on certain metrics, such as an inverse-distance Gaussian kernel (Yu, Yin, and Zhu 2018; Li et al. 2018) or cosine similarity (Geng et al. 2019).

However, this pre-defined graph not only relies on empirical laws (e.g. Tobler's first law of geography), which do not necessarily indicate an optimal solution, but also ignores the dynamic nature of traffic networks. This twofold limitation has stimulated explorations in two lines of research. The first aims to find the optimal graph structure that facilitates the forecasting task. GW-Net (Wu et al. 2019) pioneered this direction by treating the adjacency matrix as free variables (i.e. a parameterized node embedding E) to train, which generates a so-called adaptive graph (in Figure 1). Models including MTGNN (Wu et al. 2020), AGCRN (Bai et al. 2020), and GTS (Shang, Chen, and Bi 2021) fall into this category, integrating MTS and traffic forecasting with Graph Structure Learning (GSL) (Zhu et al. 2021). In the other line of research, attempts have been made to tackle network dynamics using matrix or tensor decomposition (Diao et al. 2019; Ye et al. 2021) and attention mechanisms (Guo et al. 2019; Zheng et al. 2020). Motivated by GSL, recent models like SLCNN (Zhang et al. 2020) and StemGNN (Cao et al. 2020) further try to learn a time-variant graph structure from observational data. The spatio-temporal graph (STG) derived in this way is essentially input-conditioned, in which parameters denoted by W project observations into node embeddings (termed momentary graph in Figure 1).

Thus far, while spatio-temporal regularities have been studied systematically, spatio-temporal heterogeneity and non-stationarity have not been tackled properly. Although the heterogeneity issue can be alleviated to some extent by applying attention over space and time (Guo et al. 2019; Zheng et al. 2020), sensor signals of different natures are still left entangled, not to mention that incidents are simply untreated. Therefore, we are motivated to propose a novel spatio-temporal meta-graph learning framework. The term meta-graph is coined to describe the generation of node embeddings (similarly to the adaptive and momentary graphs) for GSL. Specifically, our STG learning consists of two steps: (1) querying node-level prototypes from a Meta-Node Bank; (2) reconstructing node embeddings with a Hyper-Network (Ha, Dai, and Le 2016). This localized memorizing capability empowers our modularized Meta-Graph Learner to essentially distinguish traffic patterns on different roads over time, which is even generalizable to incident situations. Our contributions are highlighted as follows:

  • We propose a novel Meta-Graph Learner for spatio-temporal graph (STG) learning, to explicitly disentangle the heterogeneity in space and time.

  • We present a generic Meta-Graph Convolutional Recurrent Network (MegaCRN), which simply relies on observational data to be robust and adaptive to any traffic situation, from normal to non-stationary.

  • We validate MegaCRN quantitatively and qualitatively over a group of state-of-the-art models on two benchmarks (METR-LA and PEMS-BAY) and our newly published dataset (EXPY-TKY) that has larger scale and more complex incident situations.

Related Work

Traffic Forecasting. Traffic forecasting has long been a significant research problem in transportation engineering (Huang et al. 2014; Lv et al. 2014; Ma et al. 2015b). As a canonical case of multivariate time series forecasting (Lai et al. 2018), it has also drawn a lot of attention from machine learning researchers. At the very beginning, statistical models including the autoregressive model (AR) (Hamilton and Susmel 1994), vector autoregression (VAR) (Stock and Watson 2001), and autoregressive integrated moving average (ARIMA) (Pan, Demiryurek, and Shahabi 2012) were applied. Then deep learning methods came to dominate time series prediction by automatically extracting non-linear complex features from the data. First, LSTM (Hochreiter and Schmidhuber 1997) and GRU (Chung et al. 2014) demonstrated superior performance in traffic modeling (Ma et al. 2015a; Lv et al. 2018; Li et al. 2018; Zhao et al. 2019; Wang et al. 2020; Bai et al. 2020; Ye et al. 2021; Shang, Chen, and Bi 2021; Lee et al. 2022) as well as multivariate time series forecasting (Lai et al. 2018; Shih, Sun, and Lee 2019). Second, instead of RNNs, Temporal Convolution (Yu and Koltun 2016) and WaveNet (Oord et al. 2016), with long receptive fields, were also utilized as the core component for temporal modeling in (Yu, Yin, and Zhu 2018; Wu et al. 2019, 2020; Lu et al. 2020; Deng et al. 2021). Third, motivated by (Vaswani et al. 2017), a series of traffic transformers (Zheng et al. 2020; Xu et al. 2020) and time series transformers (Li et al. 2019; Zhou et al. 2021; Xu et al. 2021) were proposed for long-sequence time series modeling. Due to the space limitation, we refer readers to the recent surveys (Jiang et al. 2021; Jiang and Luo 2021; Li et al. 2021) on traffic forecasting with deep learning.

Graph Structure Learning. Besides sequence modeling, research efforts have been made to capture the correlations among variables (road links in traffic data) via generic graph structures (Kipf et al. 2018). Early methods either rely on the natural topology of the road network (i.e., a binary adjacency graph) or on graphs pre-defined in certain metrics (e.g., Euclidean distance) (Li et al. 2018; Yu, Yin, and Zhu 2018). Then, GW-Net (Wu et al. 2019) first proposed to use two learnable embedding matrices to automatically build an adaptive graph based on the input traffic data. Following GW-Net (Wu et al. 2019), MTGNN (Wu et al. 2020) and GTS (Shang, Chen, and Bi 2021) further proposed to learn a parameterized k-degree discrete graph, while AGCRN (Bai et al. 2020) introduced node-specific convolution filters according to the node embedding and CCRNN (Ye et al. 2021) learned multiple adaptive graphs for multi-layer graph convolution. StemGNN (Cao et al. 2020) took the self-attention (Vaswani et al. 2017) learned from the input as the latent graph. Our work distinguishes itself from these methods by augmenting spatio-temporal graph learning with a memory network (the Meta-Node Bank) to discover latent node-level prototypes and construct memory-tailored node embeddings.

Problem Definition

Without loss of generality, we formulate our problem as a multi-step-to-multi-step forecasting task as follows:

[X_{t-(\alpha-1)},\ldots,X_{t}]\ \xrightarrow[\theta]{\ \mathbb{F}(\cdot)\ }\ [X_{t+1},\ldots,X_{t+\beta}]   (1)

where X_i ∈ ℝ^{N×C}, N is the number of spatial units (i.e., nodes, grids, regions, road links, etc.), and C is the number of information channels. In our case, C equals 1 since we only forecast traffic speed, and the spatial unit is the road link. For simplicity, we omit C in the rest of the paper. Given the previous α steps of observations [X_{t-(α-1)}, …, X_{t-1}, X_t], we aim to infer the next β horizons [X_{t+1}, X_{t+2}, …, X_{t+β}] by training a forecasting model 𝔽 with parameters θ.
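To make this formulation concrete, the sliding-window sketch below (plain NumPy; the helper name and shapes are ours, not from the paper) slices a (T, N) speed matrix into α-step inputs and β-step targets:

```python
import numpy as np

def make_windows(series, alpha, beta):
    """Slice a (T, N) multivariate series into model samples.

    Returns inputs of shape (S, alpha, N) and targets of shape
    (S, beta, N), where S = T - alpha - beta + 1.
    """
    T, N = series.shape
    S = T - alpha - beta + 1
    X = np.stack([series[s:s + alpha] for s in range(S)])
    Y = np.stack([series[s + alpha:s + alpha + beta] for s in range(S)])
    return X, Y

# Toy example: 100 timesteps, 4 road links, alpha = beta = 12.
speeds = np.random.rand(100, 4)
X, Y = make_windows(speeds, alpha=12, beta=12)
print(X.shape, Y.shape)  # (77, 12, 4) (77, 12, 4)
```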

Methodology

In this section, we present a generic framework for spatio-temporal meta-graph learning, namely Meta-Graph Convolutional Recurrent Network (MegaCRN), built upon Graph Convolutional Recurrent Unit (GCRU) Encoder-Decoder and plugin Meta-Graph Learner, as illustrated in Figure 2.

Figure 2: Framework of Meta-Graph Convolutional Recurrent Network (MegaCRN)

Preliminaries

Graph Convolutional Recurrent Unit. Motivated by the success of Graph Convolutional Networks (GCNs) as a class in representation learning on graph-structured data (e.g. social and road networks), a recent line of research (Li et al. 2018; Bai et al. 2020; Shang, Chen, and Bi 2021; Ye et al. 2021) has explored the possibility of injecting graph convolution operation into recurrent cell (e.g. LSTM). The derived Graph Convolutional Recurrent Unit (GCRU) can thereby simultaneously capture spatial dependency, represented by an input graph topology, and temporal dependency in a sequential manner. Without loss of generality, we take the widely adopted definitions of graph convolution operation and Gated Recurrent Unit (GRU) to denote GCRU, as the basic unit for spatio-temporal modeling:

H = \sigma(X \star_{\mathcal{G}} \Theta) = \sigma\Big(\sum_{k=0}^{K} \tilde{\mathcal{P}}^{k} X W_{k}\Big)   (2)

u_{t} = sigmoid([X_{t}, H_{t-1}] \star_{\mathcal{G}} \Theta_{u} + b_{u})
r_{t} = sigmoid([X_{t}, H_{t-1}] \star_{\mathcal{G}} \Theta_{r} + b_{r})
C_{t} = tanh([X_{t}, (r_{t} \odot H_{t-1})] \star_{\mathcal{G}} \Theta_{C} + b_{C})
H_{t} = u_{t} \odot H_{t-1} + (1 - u_{t}) \odot C_{t}   (3)

In Equation (2), X ∈ ℝ^{N×C} and H ∈ ℝ^{N×h} denote the input and output of the graph convolution operation (⋆_𝒢), in which Θ, i.e. W_k ∈ ℝ^{K×C×h}, are the kernel parameters approximated with Chebyshev polynomials up to order K (Defferrard, Bresson, and Vandergheynst 2016), and σ is an activation function. In Equation (3), subscripts u, r, and C denote the update gate, reset gate, and candidate state of a GCRU cell, in which Θ_{u,r,C} ∈ ℝ^{K×(C+h)×h} denote the gate parameters. Besides the observation X_t, GCRU requires an auxiliary input 𝒫 ∈ ℝ^{N×N} for the topology of graph 𝒢.
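A minimal NumPy sketch of the K-order graph convolution in Equation (2) follows. It is illustrative only (the released code is in PyTorch): we assume relu as the activation σ, dense matrices, and toy shapes of our own choosing.

```python
import numpy as np

def graph_conv(X, P, Ws):
    """K-order graph convolution, Eq. (2): H = relu(sum_k P^k X W_k).

    X: (N, C) node signals; P: (N, N) normalized graph;
    Ws: list of K+1 kernels, each of shape (C, h).
    """
    N, _ = X.shape
    out = np.zeros((N, Ws[0].shape[1]))
    Pk = np.eye(N)               # P^0 = identity
    for W in Ws:
        out += Pk @ X @ W        # accumulate the k-th order term
        Pk = Pk @ P              # raise P to the next power
    return np.maximum(out, 0.0)  # relu as the activation sigma

rng = np.random.default_rng(0)
N, C, h, K = 5, 1, 8, 2
X = rng.normal(size=(N, C))
P = np.full((N, N), 1.0 / N)     # toy uniform transition graph
Ws = [rng.normal(size=(C, h)) for _ in range(K + 1)]
H = graph_conv(X, P, Ws)
print(H.shape)  # (5, 8)
```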

Graph Structure Learning. Matrix 𝒫 is conventionally defined based on certain metrics (e.g. inverse distance, cosine similarity) and empirical laws (Yu, Yin, and Zhu 2018; Li et al. 2018; Geng et al. 2019). However, the choice of metric can be arbitrary and suboptimal, which has motivated a line of research (Wu et al. 2020; Zhang et al. 2020; Shang, Chen, and Bi 2021; Bai et al. 2020; Ye et al. 2021) to integrate Graph Structure Learning (GSL) into spatio-temporal modeling for simultaneous optimization. Here we adopt the canonical formulation (Wu et al. 2019; Bai et al. 2020; Wang et al. 2022) for spatio-temporal graph learning, namely the adaptive graph (in Figure 1), denoted by:

\tilde{\mathcal{P}} = softmax(relu(E\, E^{\top}))   (4)

where 𝒫̃ is derived by random-walk normalizing the non-negative part of the matrix product of the trainable node embedding E ∈ ℝ^{N×e} with its transpose. The other GSL strategy, the momentary graph (Zhang et al. 2020) (in Figure 1), can be defined in a similar fashion with the input signal X_t or hidden state H_t. Taking the latter as an example:

\tilde{\mathcal{P}}_{t} = softmax(relu((H_{t} W)\, (H_{t} W)^{\top}))   (5)

where the parameter matrix W ∈ ℝ^{h×e} essentially projects H_t into another embedding space. Notably, the momentary graph has other variants, such as replacing the projection with a self-attention operation (Cao et al. 2020), but they still follow Equation (5) as a general form.
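The two GSL strategies share the softmax(relu(·)) construction and differ only in where the embedding comes from. A NumPy sketch (hypothetical shapes; we interpret the normalization as a row-wise softmax, as in Equations (4)-(5)):

```python
import numpy as np

def normalized_graph(E):
    """Row-wise softmax over relu(E E^T), shared by Eqs. (4) and (5)."""
    S = np.maximum(E @ E.T, 0.0)                  # relu
    S = np.exp(S - S.max(axis=1, keepdims=True))  # numerically stable softmax
    return S / S.sum(axis=1, keepdims=True)

rng = np.random.default_rng(1)
N, h, e = 6, 8, 4
E = rng.normal(size=(N, e))                # trainable node embedding
P_adaptive = normalized_graph(E)           # Eq. (4): time-invariant graph

H_t = rng.normal(size=(N, h))              # hidden state at step t
W = rng.normal(size=(h, e))                # projection parameter
P_momentary = normalized_graph(H_t @ W)    # Eq. (5): input-conditioned graph
print(P_adaptive.shape)  # (6, 6)
```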

Spatio-Temporal Meta-Graph Learner

Here we formally describe a new spatio-temporal graph (STG) learning module. The term meta-graph is coined to represent the generation of node embeddings for graph structure learning, which is different from its definition in heterogeneous information networks (HIN) (Zhao et al. 2017; Ding et al. 2021). According to the definitions in Equations (4) and (5), the adaptive graph relies on the parameterized node embedding E alone, while the momentary graph is in fact input-conditioned (projecting either X_t or H_t with parameter W). Apparently, this generation process determines the properties of the derived graphs: the former is time-invariant, while the latter is sensitive to input signals. This motivates us to further enhance the node embeddings for STG generation, as real-world networks are more complex, manifesting spatio-temporal heterogeneity and non-stationarity.

We are inspired by a line of research on memory networks, which aims to memorize typical features of seen samples for further pattern matching. This technique has been widely employed in computer vision tasks such as few-shot learning (Vinyals et al. 2016; Santoro et al. 2016) and anomaly detection (Gong et al. 2019; Park, Noh, and Ham 2020). In our case, we would like to inject these memorizing and distinguishing capabilities into spatio-temporal graph learning. We thereby leverage the idea of memory networks and build a Meta-Node Bank Φ ∈ ℝ^{φ×d}, where φ and d denote the number of memory items and the dimension of each item, respectively. We further define the main functions of this memory bank as follows:

Q_{t}^{(i)} = H_{t}^{(i)} W_{Q} + b_{Q}   (6)

a_{j}^{(i)} = \frac{\exp(Q_{t}^{(i)} \Phi^{\top}[j])}{\sum_{j=1}^{\phi} \exp(Q_{t}^{(i)} \Phi^{\top}[j])}, \quad M_{t}^{(i)} = \sum_{j=1}^{\phi} a_{j}^{(i)} \Phi[j]   (7)

where we use superscript (i) as the row index. For instance, H_t^{(i)} ∈ ℝ^h represents the i-th node vector in H_t ∈ ℝ^{N×h}. Equation (6) denotes a fully connected (FC, parameter W_Q ∈ ℝ^{h×d}) layer that projects the hidden state H_t^{(i)} to a localized query Q_t^{(i)} ∈ ℝ^d. Equation (7) denotes the memory reading operation: Q_t^{(i)} is matched against each memory item Φ[j] to calculate a scalar a_j, which physically represents the similarity between the vector Q_t^{(i)} and the memory item Φ[j]. A meta-node vector M_t^{(i)} ∈ ℝ^d can then be recovered as a combination of memory items. A common practice is to utilize the reconstructed representation M_t ∈ ℝ^{N×d} to augment the encoded hidden representation H_t, denoted by H′_t = [H_t, M_t] ∈ ℝ^{N×(h+d)}, where [·] denotes concatenation (Yao et al. 2019; Lee et al. 2022). We further leverage a Hyper-Network (Ha, Dai, and Le 2016) that essentially conditions the generation of GSL node embeddings on the Meta-Node Bank. This memory-augmented node embedding generation can be formulated as:

E^{\prime} = \textit{NN}_{H}(\Phi), \quad \tilde{\mathcal{P}}^{\prime} = softmax(relu(E^{\prime}\, E^{\prime\top}))   (8)

where NN_H denotes a Hyper-Network. Without loss of generality, we implement it with one FC layer (parameter W_E ∈ ℝ^{d×e}). The meta-graph 𝒫̃′ can then be constructed as an alternative to the adaptive and momentary graphs (defined in Equations (4) and (5)) and fed back to the GCRU encoder-decoder.
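Putting Equations (6)-(8) together, the sketch below is a NumPy illustration, not the authors' implementation: shapes are toy values, we assume one bank row per node (φ = N) purely for simplicity, and the normalization is again read as relu followed by a row-wise softmax.

```python
import numpy as np

def softmax_rows(S):
    e = np.exp(S - S.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def memory_read(H, Phi, W_Q, b_Q):
    """Eqs. (6)-(7): project hidden states to localized queries, then
    attend over the Meta-Node Bank to recover meta-node vectors."""
    Q = H @ W_Q + b_Q            # Eq. (6): localized queries
    a = softmax_rows(Q @ Phi.T)  # Eq. (7): similarity weights a_j
    M = a @ Phi                  # weighted recombination of memory items
    return Q, a, M

def meta_graph(Phi, W_E):
    """Eq. (8): one-FC-layer hyper-network NN_H, then the
    softmax(relu(.)) graph construction of Eq. (4)."""
    E = Phi @ W_E                # E' = NN_H(Phi)
    return softmax_rows(np.maximum(E @ E.T, 0.0))

rng = np.random.default_rng(0)
N, h, phi, d, e = 4, 8, 4, 5, 3          # phi = N assumed for this sketch
H_t = rng.normal(size=(N, h))
Phi = rng.normal(size=(phi, d))
Q, a, M = memory_read(H_t, Phi, rng.normal(size=(h, d)), np.zeros(d))
P_meta = meta_graph(Phi, rng.normal(size=(d, e)))
```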

Meta-Graph Convolutional Recurrent Network

With the Meta-Graph Learner described above, we present the proposed Meta-Graph Convolutional Recurrent Network (MegaCRN) as a generic framework for spatio-temporal modeling. MegaCRN learns node-level prototypes of traffic patterns in the Meta-Node Bank in order to update the auxiliary graph adaptively based on the observed situation. To further enhance its distinguishing power for diverse scenarios on different roads over time, we regulate the memory parameters with two constraints (Gong et al. 2019; Park, Noh, and Ham 2020): a contrastive loss ℒ_1 and a consistency loss ℒ_2, denoted by:

\mathcal{L}_{1} = \sum_{t}^{T} \sum_{i}^{N} \max\{\, ||Q_{t}^{(i)} - \Phi[p]||^{2} - ||Q_{t}^{(i)} - \Phi[n]||^{2} + \lambda,\, 0 \,\}
\mathcal{L}_{2} = \sum_{t}^{T} \sum_{i}^{N} ||Q_{t}^{(i)} - \Phi[p]||^{2}   (9)

where T denotes the total number of sequences (i.e. samples) in the training set, and p, n denote the indices of the top two memory items obtained by ranking a_j^{(i)} in Equation (7) given the localized query Q_t^{(i)}. With these two constraints, we treat Q_t^{(i)} as the anchor, its most similar prototype Φ[p] as the positive sample, and the second most similar prototype Φ[n] as the negative sample (λ denotes the margin between positive and negative pairs). The idea is to keep memory items as compact as possible while remaining as dissimilar from each other as possible. These two constraints guide the memory Φ to distinguish different spatio-temporal patterns at the node level. In practice, we find that adding them to the objective criterion (i.e. MAE) facilitates the convergence of training (with balancing factors κ_1, κ_2):

\mathcal{L}_{task} = \sum_{t}^{T} \sum_{\rho}^{\beta} |\hat{X}_{t+\rho} - X_{t+\rho}| + \kappa_{1} \mathcal{L}_{1} + \kappa_{2} \mathcal{L}_{2}   (10)
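For a single batch of queries, the two memory constraints of Equation (9) can be sketched as follows (NumPy, toy shapes; ranking raw similarity scores is equivalent to ranking a_j^{(i)}, since softmax is monotone):

```python
import numpy as np

def memory_losses(Q, Phi, lam=1.0):
    """Eq. (9) for one batch: contrastive L1 and consistency L2
    using the top-2 most similar memory items per query.

    Q: (N, d) localized queries; Phi: (phi, d) Meta-Node Bank.
    """
    scores = Q @ Phi.T                       # pre-softmax similarity a_j
    order = np.argsort(-scores, axis=1)      # items ranked by similarity
    p, n = order[:, 0], order[:, 1]          # positive / negative indices
    d_pos = ((Q - Phi[p]) ** 2).sum(axis=1)  # ||Q - Phi[p]||^2
    d_neg = ((Q - Phi[n]) ** 2).sum(axis=1)  # ||Q - Phi[n]||^2
    L1 = np.maximum(d_pos - d_neg + lam, 0.0).sum()  # triplet margin
    L2 = d_pos.sum()                                 # pull query to prototype
    return L1, L2

rng = np.random.default_rng(5)
Q = rng.normal(size=(4, 3))
Phi = rng.normal(size=(6, 3))
L1, L2 = memory_losses(Q, Phi)
```

In training these terms are summed over all sequences and added to the MAE objective as in Equation (10).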

Experiment

Table 1: Summary of Datasets

Dataset        | METR-LA     | PEMS-BAY    | EXPY-TKY
Start Time     | 2012/3/1    | 2017/1/1    | 2021/10/1
End Time       | 2012/6/30   | 2017/5/31   | 2021/12/31
Time Interval  | 5 minutes   | 5 minutes   | 10 minutes
#Timesteps     | 34,272      | 52,116      | 13,248
#Spatial Units | 207 sensors | 325 sensors | 1,843 road links
Table 2: Forecasting Performance

METR-LA                         | 15min / horizon 3    | 30min / horizon 6    | 60min / horizon 12
                                | MAE   RMSE   MAPE    | MAE   RMSE   MAPE    | MAE   RMSE   MAPE
HA (Li et al. 2018)             | 4.16  7.80   13.00%  | 4.16  7.80   13.00%  | 4.16  7.80   13.00%
STGCN (Yu, Yin, and Zhu 2018)   | 2.88  5.74   7.62%   | 3.47  7.24   9.57%   | 4.59  9.40   12.70%
DCRNN (Li et al. 2018)          | 2.77  5.38   7.30%   | 3.15  6.45   8.80%   | 3.60  7.59   10.50%
GW-Net (Wu et al. 2019)         | 2.69  5.15   6.90%   | 3.07  6.22   8.37%   | 3.53  7.37   10.01%
STTN (Xu et al. 2020)           | 2.79  5.48   7.19%   | 3.16  6.50   8.53%   | 3.60  7.60   10.16%
GMAN (Zheng et al. 2020)★       | 2.80  5.55   7.41%   | 3.12  6.49   8.73%   | 3.44  7.35   10.07%
MTGNN (Wu et al. 2020)          | 2.69  5.18   6.86%   | 3.05  6.17   8.19%   | 3.49  7.23   9.87%
StemGNN (Cao et al. 2020)†      | 2.56  5.06   6.46%   | 3.01  6.03   8.23%   | 3.43  7.23   9.85%
AGCRN (Bai et al. 2020)         | 2.86  5.55   7.55%   | 3.25  6.57   8.99%   | 3.68  7.56   10.46%
CCRNN (Ye et al. 2021)          | 2.85  5.54   7.50%   | 3.24  6.54   8.90%   | 3.73  7.65   10.59%
GTS (Shang, Chen, and Bi 2021)★ | 2.65  5.20   6.80%   | 3.05  6.22   8.28%   | 3.47  7.29   9.83%
PM-MemNet (Lee et al. 2022)     | 2.65  5.29   7.01%   | 3.03  6.29   8.42%   | 3.46  7.29   9.97%
MegaCRN (Ours)                  | 2.52  4.94   6.44%   | 2.93  6.06   7.96%   | 3.38  7.23   9.72%

PEMS-BAY                        | 15min / horizon 3    | 30min / horizon 6    | 60min / horizon 12
                                | MAE   RMSE   MAPE    | MAE   RMSE   MAPE    | MAE   RMSE   MAPE
HA (Li et al. 2018)             | 2.88  5.59   6.80%   | 2.88  5.59   6.80%   | 2.88  5.59   6.80%
STGCN (Yu, Yin, and Zhu 2018)   | 1.36  2.96   2.90%   | 1.81  4.27   4.17%   | 2.49  5.69   5.79%
DCRNN (Li et al. 2018)          | 1.38  2.95   2.90%   | 1.74  3.97   3.90%   | 2.07  4.74   4.90%
GW-Net (Wu et al. 2019)         | 1.30  2.74   2.73%   | 1.63  3.70   3.67%   | 1.95  4.52   4.63%
STTN (Xu et al. 2020)           | 1.36  2.87   2.89%   | 1.67  3.79   3.78%   | 1.95  4.50   4.58%
GMAN (Zheng et al. 2020)★       | 1.35  2.90   2.87%   | 1.65  3.82   3.74%   | 1.92  4.49   4.52%
MTGNN (Wu et al. 2020)          | 1.32  2.79   2.77%   | 1.65  3.74   3.69%   | 1.94  4.49   4.53%
StemGNN (Cao et al. 2020)†      | 1.23  2.48   2.63%   | N/A from (Cao et al. 2020) | N/A from (Cao et al. 2020)
AGCRN (Bai et al. 2020)         | 1.36  2.88   2.93%   | 1.69  3.87   3.86%   | 1.98  4.59   4.63%
CCRNN (Ye et al. 2021)          | 1.38  2.90   2.90%   | 1.74  3.87   3.90%   | 2.07  4.65   4.87%
GTS (Shang, Chen, and Bi 2021)★ | 1.34  2.84   2.83%   | 1.67  3.83   3.79%   | 1.98  4.56   4.59%
PM-MemNet (Lee et al. 2022)★    | 1.34  2.82   2.81%   | 1.65  3.76   3.71%   | 1.95  4.49   4.54%
MegaCRN (Ours)                  | 1.28  2.72   2.67%   | 1.60  3.68   3.57%   | 1.88  4.42   4.41%

EXPY-TKY                        | 10min / horizon 1    | 30min / horizon 3    | 60min / horizon 6
                                | MAE   RMSE   MAPE    | MAE   RMSE   MAPE    | MAE   RMSE   MAPE
HA (Li et al. 2018)             | 7.63  11.96  31.26%  | 7.63  11.96  31.25%  | 7.63  11.96  31.24%
STGCN (Yu, Yin, and Zhu 2018)   | 6.09  9.60   24.84%  | 6.91  10.99  30.24%  | 8.41  12.70  32.90%
DCRNN (Li et al. 2018)          | 6.04  9.44   25.54%  | 6.85  10.87  31.02%  | 7.45  11.86  34.61%
GW-Net (Wu et al. 2019)         | 5.91  9.30   25.22%  | 6.59  10.54  29.78%  | 6.89  11.07  31.71%
STTN (Xu et al. 2020)           | 5.90  9.27   25.67%  | 6.53  10.40  29.82%  | 6.99  11.23  32.52%
GMAN (Zheng et al. 2020)        | 6.09  9.49   26.52%  | 6.64  10.55  30.19%  | 7.05  11.28  32.91%
MTGNN (Wu et al. 2020)          | 5.86  9.26   24.80%  | 6.49  10.44  29.23%  | 6.81  11.01  31.39%
StemGNN (Cao et al. 2020)†      | 6.08  9.46   25.87%  | 6.85  10.80  31.25%  | 7.46  11.88  35.31%
AGCRN (Bai et al. 2020)         | 5.99  9.38   25.71%  | 6.64  10.63  29.81%  | 6.99  11.29  32.13%
CCRNN (Ye et al. 2021)          | 5.90  9.29   24.53%  | 6.68  10.77  29.93%  | 7.11  11.56  32.56%
GTS (Shang, Chen, and Bi 2021)  | -     -      -       | -     -      -       | -     -      -
PM-MemNet (Lee et al. 2022)     | 5.94  9.25   25.10%  | 6.52  10.42  29.00%  | 6.87  11.14  31.22%
MegaCRN (Ours)                  | 5.81  9.20   24.49%  | 6.44  10.33  28.92%  | 6.83  11.04  31.02%

Datasets and Settings

Datasets. We first evaluate our model on two standard benchmark datasets from (Li et al. 2018): METR-LA and PEMS-BAY, which contain traffic speed data from 207 sensors in Los Angeles and 325 sensors in the Bay Area, respectively. For the two benchmarks, we follow the convention (Li et al. 2018; Wu et al. 2019; Shang, Chen, and Bi 2021; Lee et al. 2022) of splitting the datasets in chronological order, with 70% for training, 10% for validation, and 20% for testing (i.e., 7:1:2). In addition, in this study we publish a new traffic dataset called EXPY-TKY, which contains traffic speed information and the corresponding traffic incident information at 10-minute intervals for 1843 expressway road links in Tokyo over three months (2021/10–2021/12). We use the first two months (Oct. 2021 and Nov. 2021) as the training and validation data, and the last month (Dec. 2021) as the testing data. The specific spatio-temporal information of our datasets is summarized in Table 1.
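The chronological 7:1:2 split and the usual sliding-window sample generation can be sketched as follows (a minimal numpy sketch; the helper names and toy shapes are illustrative, not from the released code):

```python
import numpy as np

def chrono_split(series, ratios=(0.7, 0.1, 0.2)):
    """Split a (T, N) series chronologically into train/val/test."""
    T = len(series)
    n_train, n_val = int(T * ratios[0]), int(T * ratios[1])
    return series[:n_train], series[n_train:n_train + n_val], series[n_train + n_val:]

def make_windows(series, alpha, beta):
    """Slice a (T, N) series into samples of alpha observed steps (X)
    followed by beta prediction steps (Y)."""
    X, Y = [], []
    for t in range(len(series) - alpha - beta + 1):
        X.append(series[t:t + alpha])
        Y.append(series[t + alpha:t + alpha + beta])
    return np.stack(X), np.stack(Y)

# Toy example: 100 time steps, 4 "sensors"
data = np.random.rand(100, 4)
train, val, test = chrono_split(data)
X, Y = make_windows(train, alpha=12, beta=12)
print(X.shape, Y.shape)  # (47, 12, 4) (47, 12, 4)
```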

Settings. For EXPY-TKY, the Encoder and Decoder of our model each consist of 1 RNN layer with 32 hidden states. We reserve 10 prototypes (i.e., meta-nodes) in the memory, each of which is a 32-dimensional learnable vector. For METR-LA and PEMS-BAY, each RNN layer in the Encoder and Decoder has 64 units, and the memory bank has 20 meta-nodes of 64 dimensions. The observation step $\alpha$ and prediction horizon $\beta$ are both set to 12 on METR-LA and PEMS-BAY, while both are set to 6 on EXPY-TKY. These settings yield 1-hour lead-time forecasting, following the convention in previous literature (Yu, Yin, and Zhu 2018; Li et al. 2018; Wu et al. 2019; Bai et al. 2020; Shang, Chen, and Bi 2021). Adam was used as the optimizer, with the learning rate set to 0.01 and the batch size set to 64. Training was stopped early if the validation error converged within 20 epochs, or stopped after 200 epochs. L1 loss is used as the loss function. Root Mean Square Error (RMSE), Mean Absolute Error (MAE), and Mean Absolute Percentage Error (MAPE) are used as metrics. All experiments were performed on four GeForce RTX 3090 GPUs.
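The three metrics can be implemented in a few lines (a hedged sketch; benchmark evaluation code typically also masks missing/zero readings, which we only guard against here):

```python
import numpy as np

def mae(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))

def rmse(y_true, y_pred):
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def mape(y_true, y_pred, eps=1e-8):
    # Guard against division by zero; real benchmarks mask invalid entries.
    mask = np.abs(y_true) > eps
    return np.mean(np.abs((y_true[mask] - y_pred[mask]) / y_true[mask])) * 100

y_true = np.array([60.0, 50.0, 40.0])   # speeds, e.g. km/h
y_pred = np.array([58.0, 52.0, 44.0])
print(mae(y_true, y_pred))    # ≈ 2.667
print(rmse(y_true, y_pred))   # ≈ 2.828
print(mape(y_true, y_pred))   # ≈ 5.778 (%)
```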

Quantitative Evaluation

We compare our model with the following baselines: 1) Historical Average (HA), which averages the values of the same time slot over historical days (Li et al. 2018); 2) STGCN (Yu, Yin, and Zhu 2018), 3) DCRNN (Li et al. 2018), and 4) GW-Net (Wu et al. 2019), the most representative deep models for traffic forecasting, which respectively embed spectral (Yu, Yin, and Zhu 2018) or diffusion graph convolution (Li et al. 2018; Wu et al. 2019) into temporal convolution (i.e., TCN or WaveNet) (Yu, Yin, and Zhu 2018; Wu et al. 2019) or a recurrent unit (e.g., GRU) (Li et al. 2018); 5) STTN (Xu et al. 2020) and 6) GMAN (Zheng et al. 2020), two Transformer-based SOTAs; 7) MTGNN (Wu et al. 2020), an extended version of GW-Net with an enhanced adaptive graph learning part; 8) StemGNN (Cao et al. 2020), which first learns a latent graph via self-attention and performs the spatio-temporal modeling in the spectral domain; 9) AGCRN (Bai et al. 2020), which adaptively learns node-specific parameters for graph convolution; 10) CCRNN (Ye et al. 2021), which learns multiple parameterized matrices for multiple layers of graph convolution; 11) GTS (Shang, Chen, and Bi 2021), which learns each link's (edge's) probability based on each variable's (node's) long historical data; 12) PM-MemNet (Lee et al. 2022), which also utilizes memory networks for traffic pattern matching. 9)–12) are built upon GCRN (Seo et al. 2018; Li et al. 2018).
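The HA baseline above reduces to averaging each time-of-day slot over historical days (a minimal sketch; `slots_per_day` and the toy shapes are illustrative, not from the baseline's actual code):

```python
import numpy as np

def historical_average(history, slots_per_day):
    """For each time-of-day slot, average the readings over all complete
    historical days; the HA forecast for any future step is the average
    of its slot, regardless of lead time."""
    T, N = history.shape
    n_days = T // slots_per_day
    daily = history[:n_days * slots_per_day].reshape(n_days, slots_per_day, N)
    return daily.mean(axis=0)  # shape: (slots_per_day, N)

# Toy example: 7 days x 288 five-minute slots, 3 sensors
history = np.random.rand(7 * 288, 3)
slot_means = historical_average(history, slots_per_day=288)
pred_at_slot_10 = slot_means[10]  # HA forecast for slot 10 of any day
```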

Overall Comparison. Most of the baselines' results on the benchmarks are reported from their original papers. However, due to reproducibility problems (★ in Table 2), we report the results of GMAN, GTS, and PM-MemNet either from our own experiments or from other literature (e.g., GMAN on METR-LA (Shao et al. 2022)). Through Table 2, we can find that our model outperformed the state-of-the-arts in almost all cases (dataset/horizon/metric). Among the SOTAs, GW-Net (Wu et al. 2019) and MTGNN (Wu et al. 2020) marked relatively good performance thanks to the WaveNet backbone. CCRNN (Ye et al. 2021) delivered better performance on our dataset than on the benchmarks, because it requires the 0-1 adjacency matrix of the road network to obtain a good initialization for the learnable graphs, which is not available in the benchmark datasets. GMAN (Zheng et al. 2020) and StemGNN (Cao et al. 2020) gave worse performance on our dataset, because the number of nodes $N$ in ours is around 6–9 times larger than in the benchmarks, and their self-attention struggled at such a scale. GTS (Shang, Chen, and Bi 2021) was not even applicable on our dataset, because it needs to parameterize an $\mathbb{R}^{N^2 \times N}$ matrix ($N^2$ edges and $N$ nodes) for edge generation based on each node's features. StemGNN† is considered to have a data-leakage problem, as it averages all the graphs in a minibatch for both training and testing (sample interaction).
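A back-of-the-envelope calculation, using only the dimensionality stated above, illustrates why GTS's edge generator does not scale to EXPY-TKY:

```python
# GTS parameterizes an R^{N^2 x N} matrix (N^2 edges, N nodes), so its
# edge-generation parameter count grows cubically with the node count N.
for name, n in [("METR-LA", 207), ("PEMS-BAY", 325), ("EXPY-TKY", 1843)]:
    entries = n ** 2 * n
    gib = entries * 4 / 2 ** 30          # float32 storage for that matrix
    print(f"{name}: N={n}, N^2 x N entries = {entries:,} (~{gib:.2f} GiB)")
```

On EXPY-TKY the matrix alone would hold over six billion entries, several hundred times more than on METR-LA, before any gradients or optimizer state are accounted for.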

Table 3: Ablation Test across All Horizons
| Ablation | METR-LA (MAE / RMSE) | PEMS-BAY (MAE / RMSE) | EXPY-TKY (MAE / RMSE) |
|---|---|---|---|
| Adaptive | 3.01 / 6.25 | 1.61 / 3.73 | 6.79 / 10.76 |
| Momentary | 2.96 / 6.16 | 1.62 / 3.75 | 6.68 / 10.59 |
| Memory | 2.97 / 6.21 | 1.60 / 3.70 | 6.55 / 10.48 |
| MegaCRN | 2.89 / 6.02 | 1.54 / 3.59 | 6.44 / 10.35 |

Ablation Study. To evaluate the actual contribution of each component of our model, we create a series of variants as follows: (1) Adaptive GCRN. It only keeps the GCRN encoder-decoder of MegaCRN and lets the encoder and decoder share the same adaptive graph, a graph structure learning mechanism similar to GW-Net (Wu et al. 2019), MTGNN (Wu et al. 2020), and AGCRN (Bai et al. 2020). (2) Memory GCRN. It excludes the Hyper-Network from MegaCRN and just uses a Memory Network (i.e., the same as the Meta-Node Bank) to obtain augmented hidden states $M_t$ (from the encoder) for the decoder, which shares the same adaptive graph $\mathcal{G}$ with the encoder. (3) Momentary GCRN. It excludes the Meta-Node Bank from MegaCRN and directly uses a Hyper-Network (i.e., an FC layer) that takes the encoder's hidden states $H_t$ to generate a momentary graph for the decoder. Through Table 3, we can see that, compared to Momentary GCRN, Memory GCRN brings a higher performance gain over Adaptive GCRN. Because the momentary graph is generated directly from $H_t$, Momentary GCRN has to learn a separate graph for each sample in a minibatch, which is non-trivial. All these demonstrate that the components of MegaCRN form a complete and indivisible whole.
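The interplay of the Meta-Node Bank and the Hyper-Network that these ablations isolate can be sketched as follows (a numpy sketch under simplifying assumptions: the projection matrices `W1`, `W2` and all dimensions are illustrative, and the paper's exact query/projection layers may differ):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def query_meta_nodes(H, memory):
    """Query per-node hidden states H (N, d) against the Meta-Node Bank
    (m, d); returns the retrieved prototype mixture and query weights."""
    weights = softmax(H @ memory.T)    # (N, m) attention over meta-nodes
    value = weights @ memory           # (N, d) reconstructed from prototypes
    return value, weights

def meta_graph(value, W1, W2):
    """Hyper-network step: project the retrieved memory into two node
    embedding matrices and build a row-normalized meta-graph as
    softmax(relu(E1 @ E2.T))."""
    E1, E2 = value @ W1, value @ W2
    return softmax(np.maximum(E1 @ E2.T, 0.0))

rng = np.random.default_rng(0)
N, d, m = 5, 8, 3                      # toy sizes (the paper uses d=32, m=10 on EXPY-TKY)
H = rng.normal(size=(N, d))            # encoder hidden states
memory = rng.normal(size=(m, d))       # learnable Meta-Node Bank
value, weights = query_meta_nodes(H, memory)
A = meta_graph(value, rng.normal(size=(d, d)), rng.normal(size=(d, d)))
```

Since the graph is rebuilt from the memory query at every step, it changes with the traffic state while still being anchored to a small set of shared prototypes.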

Figure 3: Efficiency Evaluation
Figure 4: Spatio-Temporal Disentangling Effect of Meta-Graph Learning
Figure 5: Incident Awareness of MegaCRN

Efficiency Study. We also evaluate the efficiency of our model against the state-of-the-arts. Here we only report the results on EXPY-TKY, because the spatial domain of our data is 5–9 times larger than the benchmarks. A scatter plot is shown in Figure 3, where the x-axis is the total number of parameters and the y-axis is the overall MAE. We can see that our model has the second-fewest parameters (merely 133,597) but the smallest overall MAE. For a large-scale dataset like EXPY-TKY, our model is very memory-efficient. In contrast, some models, especially Transformer-based ones including GMAN (Zheng et al. 2020) and STTN (Xu et al. 2020), are very memory- and time-consuming due to the dot-product operations on big tensors. Although our model tends to need more epochs to converge, each training epoch finishes in very little time. To sum up, our model achieves state-of-the-art precision while remaining comparatively efficient.

Qualitative Evaluation

Spatio-Temporal Disentanglement. We qualitatively evaluate the quality of the node embeddings by visualizing them in a low-dimensional space with t-SNE. Compared with the adaptive GSL illustrated in Figure 4(a), the meta-graph automatically learns to cluster nodes (i.e., road links), as shown in Figure 4(b)(c). Interestingly, as time evolves from t to t+1, this clustering effect persists but the cluster shape changes, which confirms both the spatio-temporal disentangling capability and the time-adaptability of our approach. In addition, we map out the physical locations of the road links in the discovered clusters with different colors (cluster 1 in blue, cluster 2 in red) in Figure 4(d). We observe a strong correlation between the spatial distribution of cluster 2 (in red) and interchanges/toll gates. From the daily averaged time series plot at the bottom of Figure 4(d), we can clearly tell the inter-cluster difference. While road links in cluster 1 (in blue) share a strong rush-hour pattern, the other cluster (in red) has a lower speed on average but higher variation, characterized by the large speed changes near interchanges/toll gates. These observations validate the power of the Meta-Graph Learner to explicitly distinguish spatio-temporal heterogeneity.

Incident-Awareness. We qualitatively study the robustness of MegaCRN to various traffic situations in Figure 5. Here we select an incident that occurred at 21:50 on December 13th, 2021, at road link 1, marked in red in Figure 5(a). In terms of the prediction results (at a 60-minute lead time), compared with two baselines, GW-Net (Wu et al. 2019) and CCRNN (Ye et al. 2021), our model can not only better capture normal fluctuations, but also adapt to more complex situations including rush hour and traffic accidents (shaded in red). Such sudden disturbances inevitably result in delayed or failed detection by other models. From the visualization of the memory query weights in Figure 5(b), we can tell that the query pattern to the Meta-Node Bank differs between normal situations and rush-hour or incident cases. This observation confirms the distinguishing power and the generalizability of our model to diverse traffic scenarios. We further visualize the learned local meta-graph in Figure 5(d), in which a thicker line represents a higher edge weight and a bigger node a larger weighted out-degree. Intuitively, we can find that the meta-graph changes with time. At 21:40, before the accident happened, node 1 (road link 1) held the biggest impact in the local meta-graph, as road link 1 lies right at the center of the large road intersection. Then at 21:50, after the accident happened, the impact of node 1 dropped significantly, and the graph became dominated by road links 7, 8, 9, and 10, which had formed a separate subgraph at 21:40. This case study verifies the superior adaptability of our approach.

Conclusion

In this study, we propose the Meta-Graph Convolutional Recurrent Network (MegaCRN), along with a novel spatio-temporal graph structure learning mechanism, for traffic forecasting. Besides two benchmarks, METR-LA and PEMS-BAY, we further generate a brand-new traffic dataset called EXPY-TKY from large-scale car GPS records and collect the corresponding traffic incident information. Our model outperformed the state-of-the-arts by a large margin on all three datasets. Through a series of visualizations, it also demonstrated the capability to disentangle the time slots and nodes with different patterns, as well as adaptability to incident situations. We will further generalize our model to multivariate time series forecasting tasks in the future.

Acknowledgments

This work was supported by Toyota Motor Corporation and JSPS KAKENHI Grant Number JP20K19859.

References

  • Bai et al. (2020) Bai, L.; Yao, L.; Li, C.; Wang, X.; and Wang, C. 2020. Adaptive graph convolutional recurrent network for traffic forecasting. Advances in Neural Information Processing Systems, 33: 17804–17815.
  • Cao et al. (2020) Cao, D.; Wang, Y.; Duan, J.; Zhang, C.; Zhu, X.; Huang, C.; Tong, Y.; Xu, B.; Bai, J.; Tong, J.; et al. 2020. Spectral temporal graph neural network for multivariate time-series forecasting. Advances in Neural Information Processing Systems, 33: 17766–17778.
  • Chung et al. (2014) Chung, J.; Gulcehre, C.; Cho, K.; and Bengio, Y. 2014. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555.
  • Defferrard, Bresson, and Vandergheynst (2016) Defferrard, M.; Bresson, X.; and Vandergheynst, P. 2016. Convolutional neural networks on graphs with fast localized spectral filtering. In Advances in neural information processing systems, 3844–3852.
  • Deng et al. (2021) Deng, J.; Chen, X.; Jiang, R.; Song, X.; and Tsang, I. W. 2021. ST-Norm: Spatial and Temporal Normalization for Multi-variate Time Series Forecasting. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, 269–278.
  • Diao et al. (2019) Diao, Z.; Wang, X.; Zhang, D.; Liu, Y.; Xie, K.; and He, S. 2019. Dynamic spatial-temporal graph convolutional neural networks for traffic forecasting. In Proceedings of the AAAI conference on artificial intelligence, volume 33, 890–897.
  • Ding et al. (2021) Ding, Y.; Yao, Q.; Zhao, H.; and Zhang, T. 2021. Diffmg: Differentiable meta graph search for heterogeneous graph neural networks. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, 279–288.
  • Geng et al. (2019) Geng, X.; Li, Y.; Wang, L.; Zhang, L.; Yang, Q.; Ye, J.; and Liu, Y. 2019. Spatiotemporal multi-graph convolution network for ride-hailing demand forecasting. In Proceedings of the AAAI conference on artificial intelligence, volume 33, 3656–3663.
  • Gong et al. (2019) Gong, D.; Liu, L.; Le, V.; Saha, B.; Mansour, M. R.; Venkatesh, S.; and Hengel, A. v. d. 2019. Memorizing normality to detect anomaly: Memory-augmented deep autoencoder for unsupervised anomaly detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 1705–1714.
  • Guo et al. (2019) Guo, S.; Lin, Y.; Feng, N.; Song, C.; and Wan, H. 2019. Attention based spatial-temporal graph convolutional networks for traffic flow forecasting. In Proceedings of the AAAI conference on artificial intelligence, volume 33, 922–929.
  • Ha, Dai, and Le (2016) Ha, D.; Dai, A.; and Le, Q. V. 2016. Hypernetworks. arXiv preprint arXiv:1609.09106.
  • Hamilton and Susmel (1994) Hamilton, J. D.; and Susmel, R. 1994. Autoregressive conditional heteroskedasticity and changes in regime. Journal of econometrics, 64(1-2): 307–333.
  • Hochreiter and Schmidhuber (1997) Hochreiter, S.; and Schmidhuber, J. 1997. Long short-term memory. Neural computation, 9(8): 1735–1780.
  • Huang et al. (2014) Huang, W.; Song, G.; Hong, H.; and Xie, K. 2014. Deep architecture for traffic flow prediction: Deep belief networks with multitask learning. Intelligent Transportation Systems, IEEE Transactions on, 15(5): 2191–2201.
  • Jiang et al. (2021) Jiang, R.; Yin, D.; Wang, Z.; Wang, Y.; Deng, J.; Liu, H.; Cai, Z.; Deng, J.; Song, X.; and Shibasaki, R. 2021. DL-Traff: Survey and Benchmark of Deep Learning Models for Urban Traffic Prediction. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management, 4515–4525.
  • Jiang and Luo (2021) Jiang, W.; and Luo, J. 2021. Graph neural network for traffic forecasting: A survey. arXiv preprint arXiv:2101.11174.
  • Kipf et al. (2018) Kipf, T.; Fetaya, E.; Wang, K.-C.; Welling, M.; and Zemel, R. 2018. Neural relational inference for interacting systems. In International Conference on Machine Learning, 2688–2697. PMLR.
  • Kipf and Welling (2017) Kipf, T. N.; and Welling, M. 2017. Semi-Supervised Classification with Graph Convolutional Networks. In International Conference on Learning Representations.
  • Lai et al. (2018) Lai, G.; Chang, W.-C.; Yang, Y.; and Liu, H. 2018. Modeling long-and short-term temporal patterns with deep neural networks. In The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, 95–104.
  • Lee et al. (2022) Lee, H.; Jin, S.; Chu, H.; Lim, H.; and Ko, S. 2022. Learning to Remember Patterns: Pattern Matching Memory Networks for Traffic Forecasting. In International Conference on Learning Representations.
  • Li et al. (2021) Li, F.; Feng, J.; Yan, H.; Jin, G.; Jin, D.; and Li, Y. 2021. Dynamic graph convolutional recurrent network for traffic prediction: Benchmark and solution. arXiv preprint arXiv:2104.14917.
  • Li et al. (2019) Li, S.; Jin, X.; Xuan, Y.; Zhou, X.; Chen, W.; Wang, Y.-X.; and Yan, X. 2019. Enhancing the locality and breaking the memory bottleneck of transformer on time series forecasting. Advances in Neural Information Processing Systems, 32.
  • Li et al. (2018) Li, Y.; Yu, R.; Shahabi, C.; and Liu, Y. 2018. Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting. In International Conference on Learning Representations.
  • Lu et al. (2020) Lu, B.; Gan, X.; Jin, H.; Fu, L.; and Zhang, H. 2020. Spatiotemporal adaptive gated graph convolution network for urban traffic flow forecasting. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management, 1025–1034.
  • Lv et al. (2014) Lv, Y.; Duan, Y.; Kang, W.; Li, Z.; and Wang, F.-Y. 2014. Traffic flow prediction with big data: a deep learning approach. IEEE Transactions on Intelligent Transportation Systems, 16(2): 865–873.
  • Lv et al. (2018) Lv, Z.; Xu, J.; Zheng, K.; Yin, H.; Zhao, P.; and Zhou, X. 2018. Lc-rnn: A deep learning model for traffic speed prediction. In IJCAI, volume 2018, 27th.
  • Ma et al. (2015a) Ma, X.; Tao, Z.; Wang, Y.; Yu, H.; and Wang, Y. 2015a. Long short-term memory neural network for traffic speed prediction using remote microwave sensor data. Transportation Research Part C: Emerging Technologies, 54.
  • Ma et al. (2015b) Ma, X.; Yu, H.; Wang, Y.; and Wang, Y. 2015b. Large-scale transportation network congestion evolution prediction using deep learning theory. PloS one, 10(3): e0119044.
  • Oord et al. (2016) Oord, A. v. d.; Dieleman, S.; Zen, H.; Simonyan, K.; Vinyals, O.; Graves, A.; Kalchbrenner, N.; Senior, A.; and Kavukcuoglu, K. 2016. Wavenet: A generative model for raw audio. arXiv preprint arXiv:1609.03499.
  • Pan, Demiryurek, and Shahabi (2012) Pan, B.; Demiryurek, U.; and Shahabi, C. 2012. Utilizing real-world transportation data for accurate traffic prediction. In 2012 ieee 12th international conference on data mining, 595–604. IEEE.
  • Park, Noh, and Ham (2020) Park, H.; Noh, J.; and Ham, B. 2020. Learning memory-guided normality for anomaly detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 14372–14381.
  • Santoro et al. (2016) Santoro, A.; Bartunov, S.; Botvinick, M.; Wierstra, D.; and Lillicrap, T. 2016. Meta-learning with memory-augmented neural networks. In International conference on machine learning, 1842–1850. PMLR.
  • Seo et al. (2018) Seo, Y.; Defferrard, M.; Vandergheynst, P.; and Bresson, X. 2018. Structured sequence modeling with graph convolutional recurrent networks. In International Conference on Neural Information Processing, 362–373. Springer.
  • Shang, Chen, and Bi (2021) Shang, C.; Chen, J.; and Bi, J. 2021. Discrete Graph Structure Learning for Forecasting Multiple Time Series. In International Conference on Learning Representations.
  • Shao et al. (2022) Shao, Z.; Zhang, Z.; Wang, F.; and Xu, Y. 2022. Pre-training Enhanced Spatial-temporal Graph Neural Network for Multivariate Time Series Forecasting. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 1567–1577.
  • Shih, Sun, and Lee (2019) Shih, S.-Y.; Sun, F.-K.; and Lee, H.-y. 2019. Temporal pattern attention for multivariate time series forecasting. Machine Learning, 108(8-9): 1421–1441.
  • Stock and Watson (2001) Stock, J. H.; and Watson, M. W. 2001. Vector autoregressions. Journal of Economic perspectives, 15(4): 101–115.
  • Vaswani et al. (2017) Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A. N.; Kaiser, Ł.; and Polosukhin, I. 2017. Attention is all you need. Advances in neural information processing systems, 30.
  • Veličković et al. (2017) Veličković, P.; Cucurull, G.; Casanova, A.; Romero, A.; Lio, P.; and Bengio, Y. 2017. Graph attention networks. arXiv preprint arXiv:1710.10903.
  • Vinyals et al. (2016) Vinyals, O.; Blundell, C.; Lillicrap, T.; Wierstra, D.; et al. 2016. Matching networks for one shot learning. Advances in neural information processing systems, 29.
  • Wang et al. (2020) Wang, X.; Ma, Y.; Wang, Y.; Jin, W.; Wang, X.; Tang, J.; Jia, C.; and Yu, J. 2020. Traffic flow prediction via spatial temporal graph neural network. In Proceedings of The Web Conference 2020, 1082–1092.
  • Wang et al. (2022) Wang, Z.; Jiang, R.; Xue, H.; Salim, F. D.; Song, X.; and Shibasaki, R. 2022. Event-aware multimodal mobility nowcasting. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 36, 4228–4236.
  • Wu et al. (2020) Wu, Z.; Pan, S.; Long, G.; Jiang, J.; Chang, X.; and Zhang, C. 2020. Connecting the dots: Multivariate time series forecasting with graph neural networks. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 753–763.
  • Wu et al. (2019) Wu, Z.; Pan, S.; Long, G.; Jiang, J.; and Zhang, C. 2019. Graph WaveNet for Deep Spatial-Temporal Graph Modeling. In IJCAI.
  • Xu et al. (2021) Xu, J.; Wang, J.; Long, M.; et al. 2021. Autoformer: Decomposition transformers with auto-correlation for long-term series forecasting. Advances in Neural Information Processing Systems, 34.
  • Xu et al. (2020) Xu, M.; Dai, W.; Liu, C.; Gao, X.; Lin, W.; Qi, G.-J.; and Xiong, H. 2020. Spatial-temporal transformer networks for traffic flow forecasting. arXiv preprint arXiv:2001.02908.
  • Yao et al. (2019) Yao, H.; Liu, Y.; Wei, Y.; Tang, X.; and Li, Z. 2019. Learning from multiple cities: A meta-learning approach for spatial-temporal prediction. In The World Wide Web Conference, 2181–2191.
  • Ye et al. (2021) Ye, J.; Sun, L.; Du, B.; Fu, Y.; and Xiong, H. 2021. Coupled Layer-wise Graph Convolution for Transportation Demand Prediction. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, 4617–4625.
  • Yu, Yin, and Zhu (2018) Yu, B.; Yin, H.; and Zhu, Z. 2018. Spatio-temporal graph convolutional networks: a deep learning framework for traffic forecasting. In Proceedings of the 27th International Joint Conference on Artificial Intelligence, 3634–3640.
  • Yu and Koltun (2016) Yu, F.; and Koltun, V. 2016. Multi-Scale Context Aggregation by Dilated Convolutions. In ICLR.
  • Zhang et al. (2020) Zhang, Q.; Chang, J.; Meng, G.; Xiang, S.; and Pan, C. 2020. Spatio-temporal graph structure learning for traffic forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, 1177–1185.
  • Zhang et al. (2021) Zhang, X.; Huang, C.; Xu, Y.; Xia, L.; Dai, P.; Bo, L.; Zhang, J.; and Zheng, Y. 2021. Traffic Flow Forecasting with Spatial-Temporal Graph Diffusion Network. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, 15008–15015.
  • Zhao et al. (2017) Zhao, H.; Yao, Q.; Li, J.; Song, Y.; and Lee, D. L. 2017. Meta-graph based recommendation fusion over heterogeneous information networks. In Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining, 635–644.
  • Zhao et al. (2019) Zhao, L.; Song, Y.; Zhang, C.; Liu, Y.; Wang, P.; Lin, T.; Deng, M.; and Li, H. 2019. T-gcn: A temporal graph convolutional network for traffic prediction. IEEE Transactions on Intelligent Transportation Systems, 21(9): 3848–3858.
  • Zheng et al. (2020) Zheng, C.; Fan, X.; Wang, C.; and Qi, J. 2020. Gman: A graph multi-attention network for traffic prediction. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, 1234–1241.
  • Zhou et al. (2021) Zhou, H.; Zhang, S.; Peng, J.; Zhang, S.; Li, J.; Xiong, H.; and Zhang, W. 2021. Informer: Beyond efficient transformer for long sequence time-series forecasting. In Proceedings of AAAI.
  • Zhu et al. (2021) Zhu, Y.; Xu, W.; Zhang, J.; Liu, Q.; Wu, S.; and Wang, L. 2021. Deep graph structure learning for robust representations: A survey. arXiv preprint arXiv:2103.03036.