2024_08_20_f178afe87f4bf2e86668g

Unisolver: PDE-Conditional Transformers Are Universal PDE Solvers
Unisolver：PDE 条件变换器是通用 PDE 求解器

Hang Zhou, Yuezhou Ma, Haixu Wu , Haowen Wang, Mingsheng Long
杭州，岳州马，吴海旭，王浩文，龙明生 School of Software, BNRist, Tsinghua University, China
清华大学软件学院，北京信息科学与技术国家研究中心，中国{zhou-h23, mayz20, wuhx23, wang-hw21}@mails.tsinghua.edu.cn, {mingsheng}@tsinghua.edu.cn

Abstract 摘要

Deep models have recently emerged as a promising tool to solve partial differential equations (PDEs), known as neural PDE solvers. While neural solvers trained from either simulation data or physics-informed loss can solve the PDEs reasonably well, they are mainly restricted to a specific set of PDEs, e.g. a certain equation or a finite set of coefficients. This bottleneck limits the generalizability of neural solvers, which is widely recognized as its major advantage over numerical solvers. In this paper, we present the Universal PDE solver (Unisolver) capable of solving a wide scope of PDEs by leveraging a Transformer pre-trained on diverse data and conditioned on diverse PDEs. Instead of simply scaling up data and parameters, Unisolver stems from the theoretical analysis of the PDE-solving process. Our key finding is that a PDE solution is fundamentally under the control of a series of PDE components, e.g. equation symbols, coefficients, and initial and boundary conditions. Inspired by the mathematical structure of PDEs, we define a complete set of PDE components and correspondingly embed them as domain-wise (e.g. equation symbols) and point-wise (e.g. boundaries) conditions for Transformer PDE solvers. Integrating physical insights with recent Transformer advances, Unisolver achieves consistent state-of-the-art results on three challenging large-scale benchmarks, showing impressive gains and endowing favorable generalizability and scalability.
深度模型最近已成为解决偏微分方程（PDEs）的一种有前景的工具，被称为神经 PDE 求解器。虽然通过模拟数据或物理信息损失训练的神经求解器可以相当好地解决 PDEs，但它们主要局限于特定的 PDE 集合，例如某个特定方程或有限的系数集。这个瓶颈限制了神经求解器的泛化能力，而泛化能力被广泛认为是其相对于数值求解器的主要优势。在本文中，我们提出了能够解决广泛 PDEs 的通用 PDE 求解器（Unisolver），它利用在多样化数据上预训练并对多样化 PDEs 进行条件设置的 Transformer。Unisolver 不仅仅是简单地扩大数据和参数规模，而是源于对 PDE 求解过程的理论分析。我们的关键发现是，PDE 解本质上受一系列 PDE 组件的控制，如方程符号、系数以及初始和边界条件。受 PDEs 数学结构的启发，我们定义了一套完整的 PDE 组件，并相应地将它们嵌入为域级（如方程符号）和点级（如 Transformer PDE 求解器的边界条件。Unisolver 将物理洞见与最新的 Transformer 进展相结合，在三个具有挑战性的大规模基准测试中取得了一致的最先进结果，显示出令人印象深刻的进步，并赋予了良好的泛化能力和可扩展性。

1 Introduction 1 引言

Partial differential equations (PDEs) are essential for numerous scientific and engineering problems 8 2], such as meteorology, electromagnetism and thermodynamics [41]. Since it is usually hard to obtain an analytic solution for PDEs, numerical methods are widely explored [1]. However, these numerical methods often require huge computation costs to generate a precise solution for each PDE. Recently, deep models have empowered many breakthroughs in wide areas [5, 22] and have also been applied to solve PDEs, i.e. neural PDE solvers. Owing to the excellent capability in approximating nonlinear mappings, deep models can learn to fit pre-collected data [19] or physics-informed loss function [33] and generalize in a flash to new samples, offering an efficient way for solving PDEs.
偏微分方程（PDE）对于许多科学和工程问题至关重要，如气象学、电磁学和热力学。由于通常很难获得 PDE 的解析解，因此广泛研究了数值方法。然而，这些数值方法通常需要巨大的计算成本才能为每个 PDE 生成精确的解决方案。近年来，深度模型在广泛领域中促成了许多突破，也被应用于解决 PDE，即神经 PDE 求解器。由于在逼近非线性映射方面具有出色的能力，深度模型可以学习拟合预先收集的数据或基于物理的损失函数，并迅速推广到新样本，为解决 PDE 提供了一种高效的方法。
As shown in Figure 1, previous neural solvers involve two different paradigms: physics-informed neural works (PINNs) [33] and neural operators [19, 44], which train the deep models with formalized PDE loss function and pre-calculated simulation data respectively. However, for the first paradigm, while formalizing the PDE equations as objective functions can ensure a relatively accurate solution, PINNs are hard to generalize to new scenarios and we have to retrain the model for different tasks. As for neural operators, they directly learn from the simulation data, which can generalize better for diverse initial states or conditions than PINNs. Nevertheless, purely based on the training data may be insufficient to guide PDE solving. For example, given a fluid governed by Navier-Stokes equations, the typical task of neural operators is to predict the future based on past observations [19], while different viscosity coefficients and forcing terms will lead to distinct solutions even under the
如图 1 所示，以往的神经求解器涉及两种不同的范式：物理信息神经网络（PINNs）[33]和神经算子[19, 44]，它们分别通过形式化的 PDE 损失函数和预先计算的模拟数据来训练深度模型。然而，对于第一种范式，虽然将 PDE 方程形式化为目标函数可以确保相对准确的解，但 PINNs 难以推广到新场景，我们必须为不同任务重新训练模型。至于神经算子，它们直接从模拟数据中学习，可以比 PINNs 更好地适应多样的初始状态或条件。尽管如此，仅仅依靠训练数据可能不足以指导 PDE 求解。例如，给定由纳维-斯托克斯方程支配的流体，神经算子的典型任务是根据过去的观察预测未来[19]，而不同的粘性系数和强迫项即使在相同的

\footnotetext{ \footnotetext{ 脚注
*Equal Contribution *同等贡献

Complete PDE Components 完整的 PDE 组件
Equation Symbols Boundary Type PDE Coefficients Domain Geometry Force Term Boundary Value
方程符号边界类型偏微分方程系数域几何力项边界值

Figure 1: Neural PDE solvers typically consist of two paradigms: physics informed and data driven. Our proposed Unisolver combines data-driven methods with physical insights from complete PDE components in a conditional modeling framework, thereby boosting generalizability and scalability.
图 1：神经偏微分方程求解器通常包含两种范式：基于物理规律和数据驱动。我们提出的 Unisolver 在条件建模框架中将数据驱动方法与完整偏微分方程组件的物理洞察相结合，从而提高了泛化能力和可扩展性。
same initial state. Thus, due to the omission of PDE information, current neural operators are mainly trained and tested on a specific set of PDEs. Notably, as neural solvers are expected to be efficient surrogates of neural methods, generalization to various PDEs is essential for a practical neural solver.
相同的初始状态。因此，由于省略了偏微分方程信息，当前的神经算子主要在特定的一组偏微分方程上进行训练和测试。值得注意的是，由于神经求解器被期望成为神经方法的高效替代品，对各种偏微分方程的泛化能力对于实用的神经求解器来说至关重要。
To tackle the generalization deficiency, several works are proposed by incorporating the PDE information into deep models or training models with a large-scale dataset. For example, message-passing neural PDE solver [3] concatenates the PDE coefficients with inputs. PDEformer [45] formalizes the PDE equation as a computation graph and employs the graph Transformer [46] to aggregate PDE information. Although these methods have explored the potential of training models with both data and PDE information, they fail to support a complete set of PDE information, thereby limiting the generalizability of models in some aspects. As for the other branches, such as DPOT [10], they just scale up the training sets with diverse PDEs and expect the generalizability emerges from large data and parameters. However, without the complete PDE information, it is theoretically impossible to generate a correct solution because what the models essentially perform is fitting an incomplete data distribution, so it is hard to expect satisfactory generalizability.
为了解决泛化能力不足的问题，一些研究通过将偏微分方程（PDE）信息整合到深度模型中或使用大规模数据集训练模型来提出解决方案。例如，消息传递神经 PDE 求解器[3]将 PDE 系数与输入连接在一起。PDEformer[45]将 PDE 方程形式化为计算图，并使用图形 Transformer[46]来聚合 PDE 信息。尽管这些方法探索了利用数据和 PDE 信息训练模型的潜力，但它们未能支持完整的 PDE 信息集，从而在某些方面限制了模型的泛化能力。至于其他分支，如 DPOT[10]，它们仅仅是通过多样化的 PDE 扩大训练集，期望从大量数据和参数中产生泛化能力。然而，没有完整的 PDE 信息，理论上不可能生成正确的解决方案，因为模型本质上执行的是拟合不完整的数据分布，所以很难期望获得令人满意的泛化能力。

Going beyond prior methods, as shown in Figure 11 this paper presents Unisolver as a Universal PDE solver. Concretely, Unisolver takes the advantages from both data-driven and physics-informed paradigms and empowers Transformer with favorable PDE generalizability by introducing complete physics information as conditions. Instead of simply scaling up data and parameters, we start from the theoretical analysis of PDE solving and conclude with a complete set of PDE components. Further, drawing inspiration from the mathematical structure of PDEs, we propose to classify PDE components into domain-wise and point-wise categories according to their effect on the final solution and aggregate them as two types of deep PDE conditions. Afterward, to capture the special influence of different condition types on the hidden input feature, we separate the Transformer representation into two subspaces and integrate these deep conditions into Transformer in a decoupled way. We conduct extensive experiments on our own generated dataset and two large-scale benchmarks with various PDE conditions, where Unisolver achieves consistent state-of-the-art with sharp relative gains. Overall, our contributions are summarized as follows:
如图 11 所示，本文提出的 Unisolver 作为通用 PDE 求解器超越了先前的方法。具体而言，Unisolver 结合了数据驱动和物理信息两种范式的优势，通过引入完整的物理信息作为条件，赋予 Transformer 有利的 PDE 泛化能力。我们不是简单地增加数据和参数，而是从 PDE 求解的理论分析开始，得出了一套完整的 PDE 组件。此外，受 PDE 数学结构的启发，我们提出根据 PDE 组件对最终解的影响将其分为域级和点级两类，并将它们聚合为两种深度 PDE 条件。之后，为了捕捉不同条件类型对隐藏输入特征的特殊影响，我们将 Transformer 表示分为两个子空间，并以解耦的方式将这些深度条件整合到 Transformer 中。我们在自行生成的数据集和两个具有各种 PDE 条件的大规模基准上进行了广泛实验，Unisolver 取得了一致的最先进结果，相对收益显著。总的来说，我们的贡献可总结如下：

We introduce Unisolver as a conditional Transformer architecture utilizing embedded PDE information completely, marking the first demonstration of the potential of the canonical Transformer as a scalable backbone for solving multiple PDEs with diverse conditions.
我们将 Unisolver 引入作为一种条件 Transformer 架构，它充分利用嵌入的偏微分方程信息，这是首次展示经典 Transformer 作为可扩展骨干网络解决多个具有不同条件的偏微分方程的潜力。
Motivated by the mathematical structure of PDEs, we classify universal PDE conditions into domain-wise and point-wise categories, which further derives a decoupled conditioning mechanism incorporating designs for introducing PDE conditions to deep representations.
受偏微分方程数学结构的启发，我们将通用偏微分方程条件分为域级和点级两类，这进一步衍生出一种解耦的条件机制，其中包含了将偏微分方程条件引入深度表示的设计。
Unisolver achieves consistent state-of-the-art performances across three large-scale benchmarks with impressive relative gains and presents favorable generalizability and scalability.
Unisolver 在三个大规模基准测试中实现了一致的最先进性能，取得了令人印象深刻的相对收益，并展现了良好的通用性和可扩展性。

2.1 Neural PDE Solvers 2.1 神经偏微分方程求解器

Previous neural PDE solvers can be roughly categorized into the following two paradigms. The first paradigm is physics-informed neural networks (PINNs) [33], which optimize the deep model
先前的神经偏微分方程求解器大致可以分为以下两种范式。第一种范式是物理信息神经网络（PINNs）[33]，它优化深度模型
by formalizing the PDE equations as objective functions. During training, the model outputs and gradients will gradually satisfy the targeted PDE, thereby successfully instantiating the solution as a deep model. However, PINNs are usually hard to generalize to unseen PDEs, limiting their applications in broader practice [41]. Another booming direction is neural operators, which learn from extensive data to approximate functional dependence between input and output Banach spaces [25, 15]. Among various neural operators, FNO [19] and its variants [18, 32, 42] are popular and wellestablished. FNO [19] effectively approximates the kernel integral operator in the frequency domain through Fourier transformation. Besides, LSM [43] generalizes the classical spectral method in latent space to avoid the curse of dimensionality. Recently, given the impressive progress achieved by Transformers [40], they have also been applied to solve PDEs. However, due to the quadratic complexity of the attention mechanism, the main challenge in applying Transformers is defining the tokens used to compute attention in PDE-solving tasks. OFormer [16] and GNOT [11] treat each mesh point as a token and utilize the linear Transformer to avert the complexity problem. Factformer [17] axially factorizes the attention block to boost the model efficiency. Recently, Transolver [44] proposes to learn the intrinsic physical states as tokens behind the input meshes. Despite the success of neural operators, they are only tested on the dataset that contains one certain PDE. The effectiveness of these methods under large and diverse datasets has not been fully explored.
通过将偏微分方程方程式化为目标函数来实现。在训练过程中，模型输出和梯度将逐渐满足目标偏微分方程，从而成功地将解实例化为深度模型。然而，物理信息神经网络通常难以推广到未见过的偏微分方程，限制了它们在更广泛实践中的应用[41]。另一个蓬勃发展的方向是神经算子，它从大量数据中学习以近似输入和输出巴拿赫空间之间的函数依赖关系[25, 15]。在各种神经算子中，傅里叶神经算子[19]及其变体[18, 32, 42]是流行且成熟的。傅里叶神经算子[19]通过傅里叶变换在频域中有效地近似核积分算子。此外，潜在谱方法[43]在潜在空间中推广了经典谱方法以避免维数灾难。最近，鉴于 Transformer[40]取得的令人印象深刻的进展，它们也被应用于求解偏微分方程。然而，由于注意力机制的二次复杂度，应用 Transformer 的主要挑战是定义在偏微分方程求解任务中用于计算注意力的标记。 OFormer [16]和 GNOT [11]将每个网格点视为一个标记，并使用线性 Transformer 来避免复杂性问题。Factformer [17]轴向分解注意力块以提高模型效率。最近，Transolver [44]提出学习输入网格背后的内在物理状态作为标记。尽管神经算子取得了成功，但它们仅在包含某一特定偏微分方程的数据集上进行了测试。这些方法在大型和多样化数据集上的有效性尚未得到充分探索。

2.2 Generalizable PDE Solvers
2.2 通用偏微分方程求解器

In addition to model architectures, the generalizability of neural solvers, the major advantage over numerical solvers, has also been explored. The research mainly lies in the following two directions.
除了模型架构，神经求解器的泛化能力（相对于数值求解器的主要优势）也得到了探索。研究主要集中在以下两个方向。

Incorporating PDE information To guide the PDE-solving process, the PDE information has been introduced to deep models. For example, PINO [20] imposes explicit equation constraints at a higher resolution to assist the learning of neural operators. CAPE [36] embeds PDE coefficients to adapt neural solvers to unseen PDE parameters. Prose [21] and PITT [23] tokenize PDEs and embed the mathematical expressions to make the transformer backbone aware of the underlying physics. PDEformer [45] represents the symbolic form of equations as a graph and the numeric components as nodes to optimize the processing of complex interactions between symbolic and numeric information. Still, all of these methods, while incorporating equation information, do not leverage the mathematical structure of PDEs for complete and categorized embedding or integrating the meanings of equation symbols within the context of natural language. In contrast, Unisolver applies powerful Large Language Models (LLMs) [38] to semantically embed the equation symbolic information and categorize the complete equation conditions based on mathematical insights, thereby better at modeling intricate physical correlations.
融入 PDE 信息为引导 PDE 求解过程，PDE 信息已被引入深度模型中。例如，PINO [20]在更高分辨率下施加显式方程约束来辅助神经算子的学习。CAPE [36]嵌入 PDE 系数以使神经求解器适应未见过的 PDE 参数。Prose [21]和 PITT [23]将 PDE 标记化并嵌入数学表达式，使 transformer 骨干网络了解底层物理原理。PDEformer [45]将方程的符号形式表示为图，将数值组件表示为节点，以优化符号和数值信息之间复杂交互的处理。然而，所有这些方法虽然融入了方程信息，但并未利用 PDE 的数学结构进行完整和分类嵌入，也未在自然语言语境中整合方程符号的含义。相比之下，Unisolver 应用强大的大型语言模型（LLMs）[38]来语义化嵌入方程符号信息，并基于数学洞察对完整方程条件进行分类，从而更好地建模复杂的物理相关性。

Large-scale pre-training As a vital cornerstone of deep learning [4, 12], recent research has also started to explore the effectiveness of large-scale training in solving PDEs. In [35], authors examined the scaling capabilities and transfer learning behaviors of FNO on three time-independent equation families. MPP [27] proposes an auto-regressive strategy to pre-train on a broad fluid mechanics-oriented benchmark. DPOT [10] enhances MPP with the denoising method and pre-trains a Fourier transformer on massive PDE data comprised of 12 datasets. PDEformer [45] focuses on a 1D time-dependent PDE family and pre-trains a graph transformer on 3M samples under various equation conditions. However, most of the existing methods fall short in effectively and completely integrating PDE conditions. This will be well addressed by Unisolver in a natural and insightful way.
作为深度学习的重要基石[4, 12]，近期研究也开始探索大规模训练在解决偏微分方程方面的有效性。在[35]中，作者研究了 FNO 在三个时间无关方程族上的扩展能力和迁移学习行为。MPP [27]提出了一种自回归策略，在广泛的流体力学导向基准上进行预训练。DPOT [10]通过去噪方法增强了 MPP，并在由 12 个数据集组成的大量偏微分方程数据上预训练了一个傅里叶变换器。PDEformer [45]专注于一维时间相关偏微分方程族，并在 300 万个样本上预训练了一个图变换器，涵盖各种方程条件。然而，大多数现有方法在有效且完整地整合偏微分方程条件方面仍有不足。Unisolver 将以一种自然且富有洞察力的方式很好地解决这个问题。

3 Unisolver 3 通解器

To tackle the incapability in generalization behind neural PDE solvers, we deeply dive into the PDEsolving process and present Unisolver to model the intricate interactions between initial observations and complete equation components, leading to a novel PDE-conditional Transformer model.
为了解决神经偏微分方程求解器在泛化方面的不足，我们深入研究了偏微分方程求解过程，提出了 Unisolver 来模拟初始观测值与完整方程组件之间的复杂相互作用，从而提出了一种新型的偏微分方程条件下的 Transformer 模型。
Problem setup Let domain

be a bounded continous set and

be a

-point discretization of

, recording the coordinate information of each point. Assume that we have observations of initial conditions

as input and target quantities

as output on mesh

with governed PDE components

(e.g. equation symbols, coefficients, etc) for each observation pair. The PDE-solving task is to approximate the input-PDE-output mapping

. For instance, in fluid prediction, we need to predict the future vorticity field based on the initial field, observation grid, as well as PDE equations, viscosity coefficients, and force terms.
问题设置让域

为一个有界连续集合，

为

的

点离散化，记录每个点的坐标信息。假设我们在网格

上有初始条件

的观测作为输入，目标量

作为输出，每对观测都有相应的 PDE 组件

（如方程符号、系数等）。PDE 求解任务是近似输入-PDE-输出映射

。例如，在流体预测中，我们需要根据初始场、观测网格以及 PDE 方程、粘性系数和力项来预测未来的涡度场。

3.1 Complete PDE Components
3.1 完成 PDE 组件

To enable complete modeling of the PDE, we attempt to incorporate all the underlying components that affect final solutions into neural solvers. Here, we elucidate the complete PDE components by considering the classical vibrating string equation with fixed endpoints as a motivating example:
为了实现偏微分方程的完整建模，我们尝试将影响最终解的所有基础组成部分纳入神经求解器中。在这里，我们通过考虑固定端点的经典振动弦方程作为一个激励性例子来阐明完整的偏微分方程组成部分：

The equation formulation precisely models the physical laws governing string vibrations. In this PDE, the coefficient

is determined by physical properties, such as tension and linear density, while

represents the external force that drive the vibrations of the string. The domain geometry spans the range

, which affects the mesh discretization in numerical solving, especially for irregular geometries. In addition, Equation (1b) sets Dirichlet boundary conditions at the endpoints, maintaining zero displacement throughout the vibration period. Equation (1C) specifies the string's position

and velocity

at time zero. These complete equation components theoretically determine a unique solution. In the following theorem, we dive into the PDE-solving process and explore how the equation components affect the final solution.
方程式的构建精确地模拟了支配弦振动的物理定律。在这个偏微分方程中，系数

由物理性质决定，如张力和线密度，而

代表驱动弦振动的外力。域的几何范围跨越了

，这影响了数值求解中的网格离散化，特别是对于不规则几何形状。此外，方程(1b)在端点设置狄利克雷边界条件，在整个振动周期内保持零位移。方程(1c)指定了弦在时间零点的位置

和速度

。这些完整的方程组成部分理论上确定了一个唯一解。在接下来的定理中，我们深入探讨偏微分方程的求解过程，并探究方程组成部分如何影响最终解。
Theorem 3.1 (The connections between PDE solutions and complete PDE components). The analytical solution of equation (1a) with boundary conditions

and initial conditions

is
定理 3.1（偏微分方程解与完整偏微分方程组成部分之间的联系）。方程（1a）在边界条件

和初始条件

下的解析解是

where

and

are odd, periodic functions with period

defined on the upper half plane, extended from

and

. The boundary conditions will be explicit by extending the equation to the upper half plane and solving it by operator splitting and characteristic lines.
其中

和

是奇函数，周期为

，定义在上半平面，由

和

扩展而来。通过将方程扩展到上半平面并使用算子分裂法和特征线法求解，边界条件将变得明确。

Proof. See Appendix B for the explicit solution and the complete proof.
证明。详见附录 B 中的显式解和完整证明。
From the analytical solution of the simple motivating example, we pinpoint that the PDE is solved under the complex interactions between a series of equation components. Specifically, the coefficient

exerts a consistent influence over the domain, while the impact of the external force

is imposed point-wisely. This distinction inspires us to classify the PDE components into two categories, domain-wise and point-wise, which better capture the intricate interactions.
从这个简单激励例子的分析解中，我们指出 PDE 是在一系列方程组件的复杂相互作用下解决的。具体来说，系数

对整个域施加一致影响，而外力

的影响则是点状施加的。这种区别启发我们将 PDE 组件分为两类：域级和点级，这更好地捕捉了复杂的相互作用。
As presented in Table 1 , the equation formulation is a domain-wise component due to its consistency across all locations. Domain geometry is a point-wise component since it is represented by a binary mask and each point's inclusion is determined individually. Boundary conditions are slightly more complex to handle. Commonly used boundary conditions
如表 1 所示，方程式的表述是一个适用于整个领域的组成部分，因为它在所有位置都保持一致。领域几何形状是一个点状组成部分，因为它是由二进制掩码表示的，每个点的包含与否都是单独确定的。边界条件的处理稍微复杂一些。常用的边界条件
are divided into two main types: periodic boundary conditions and Robin (non-periodic) conditions. We classify the boundary condition type as a domain-wise component and the specific boundary value functions with zero values at non-boundary points as point-wise components
分为两种主要类型：周期性边界条件和罗宾（非周期性）条件。我们将边界条件类型归类为域级组件，将在非边界点处值为零的特定边界值函数归类为点级组件。

3.2 Universal Components Embedding
3.2 通用组件嵌入

As described in Section 3.1. PDE solutions can be obtained by intricate interactions between initial conditions and multiple equation components classified into two categories. In previous works, these components are coarsely and incompletely included as conditions by using them to modulate the input observations. In this paper, we will elaborate on how Unisolver finely and completely embeds all equation components (Table 1) into deep PDE conditions based on our insights from the analysis.
如第 3.1 节所述，PDE 解可以通过初始条件与分为两类的多个方程组件之间的复杂交互来获得。在以往的研究中，这些组件被粗略且不完整地作为条件包含，用于调节输入观测。在本文中，我们将详细阐述 Unisolver 如何基于我们从分析中获得的见解，将所有方程组件（表 1）精细且完整地嵌入到深度 PDE 条件中。

Equation formulation Since the mathematical symbols convey rich mathematical information, we utilize a Large Language Model (LLM) for symbolic embedding. Specifically, we employ the
方程式公式化由于数学符号传达丰富的数学信息，我们利用大型语言模型（LLM）进行符号嵌入。具体来说，我们采用

Figure 2: Overview of Unisolver. We universally embed all PDE components into deep conditions and employ a conditional Transformer to aggregate deep conditions in the decoupled subspace.
图 2：Unisolver 概述。我们将所有偏微分方程组件普遍嵌入深度条件中，并使用条件 Transformer 在解耦子空间中聚合深度条件。
recently released LLaMA 38 B model

to embed the equation formulation, leveraging its symbolic understanding ability from pre-training on 15 TB language tokens. The input to the LLM is the LaTeX code of the equation formulation. For example, the string vibration equation is prompted as
最近发布的 LLaMA 38B 模型

嵌入方程公式，利用其从 15 TB 语言标记预训练中获得的符号理解能力。LLM的输入是方程公式的 LaTeX 代码。例如，弦振动方程的提示如下

Prompt: "\partial_\{tt\} u - a^2\partial_\{xx\} u = f(x,t)"

Then we take the output of the last Transformer block of the LLM and compute the mean across the sequence dimension, resulting in a 4096-dimensional embedding for each equation. Since we embed the complete equation components separately, we represent the other components, e.g. coefficients and external force, as symbols in the prompt rather than using their corresponding values. After LLM embedding, the hidden formulation is encoded by a two-layer MLP to obtain deep conditions.
然后我们取LLM最后一个 Transformer 块的输出，并在序列维度上计算平均值，得到每个方程的 4096 维嵌入。由于我们分别嵌入完整的方程组件，我们将其他组件（如系数和外力）表示为提示中的符号，而不是使用它们对应的值。在LLM嵌入之后，隐藏公式通过两层 MLP 进行编码，以获得深层条件。

Other components Other domain-wise components encompass coefficients, which are real-valued vectors, and boundary types, which are analogous to class labels. Thus we embed them using two MLP layers with in-between SiLU activation function [7]. Moreover, point-wise components include external force, binary geometry mask, and boundary value functions, which are physical fields observed on mesh

. We employ the same patchify method used for the input observations, transforming them into token sequences and linearly mapping each token to a deep representation.
其他组成部分其他特定领域的组成部分包括系数（实值向量）和边界类型（类似于类标签）。因此，我们使用两个 MLP 层嵌入它们，中间使用 SiLU 激活函数[7]。此外，点态组成部分包括外力、二进制几何掩码和边界值函数，这些都是在网格

上观察到的物理场。我们采用与输入观测相同的分块方法，将它们转换为令牌序列，并将每个令牌线性映射到深层表示。

Deep condition consolidation After universal components embedding, deep conditions within the same category are aggregated together to consolidate their influence. This strategy is advantageous because it prevents excessive separation of component conditions that could weaken the model's expressive capabilities, and thus will enhance representation learning for PDE solving via pre-training. Additionally, it allows for easy adaptation to new conditions in downstream tasks by only requiring finetuning a zero-initialized MLP.
深度条件合并在通用组件嵌入之后，同一类别内的深度条件被聚合在一起以巩固它们的影响。这种策略是有利的，因为它防止了组件条件的过度分离，这可能会削弱模型的表达能力，从而通过预训练增强 PDE 求解的表示学习。此外，它只需要微调一个零初始化的 MLP，就能轻松适应下游任务中的新条件。

3.3 PDE-Conditional Transformer
3.3 PDE 条件变换器

We propose a conditional Transformer to adaptively fuse these deep conditions embedded from the equation components, empowered by the projections of input observations in decoupled subspaces.
我们提出了一种条件 Transformer，通过在解耦子空间中对输入观测进行投影，自适应地融合从方程组成部分嵌入的深度条件。

Subspace decoupling We split the hidden inputs along the feature dimension guided by a hyperparameter, the domain-wise condition ratio

, which is set to 0.5 by default. This ratio determines the subspace for domain-wise conditions with dimension

, while the remaining subspace of dimension

is used by point-wise conditions. When combined with multi-head attention [40], subspace decoupling is equivalent to assigning some heads to learn the impact of domain-wise conditions while others focusing on point-wise conditions. This leads to improved representation learning for both categories, and minimized interference across the subspace.
子空间解耦我们根据超参数域条件比率

（默认设置为 0.5）沿特征维度分割隐藏输入。这个比率决定了域条件子空间的维度

，而剩余的维度

的子空间则用于点条件。当与多头注意力[40]结合时，子空间解耦相当于指定一些头学习域条件的影响，而其他头则专注于点条件。这导致两类条件的表示学习得到改善，并最小化了跨子空间的干扰。

Deep condition aggregation We utilize MLPs to individually project domain-wise conditions and point-wise conditions into the corresponding subspace. After projection, domain-wise conditions are
深度条件聚合我们利用 MLP 分别将领域级条件和点级条件投影到相应的子空间。投影后，领域级条件被

\footnotetext{ \footnotetext{ 被译文本：

https://ai.meta.com/blog/meta-llama-3/
repeated along the sequence dimension to match the length of hidden tokens and ensure consistent physical guiding along the entire sequence. The transformed conditions convey both token-wise and feature-wise information, which are then integrated adaptively by aggregation functions.
在序列维度上重复以匹配隐藏标记的长度，并确保整个序列的一致性物理引导。转换后的条件传递了标记级和特征级的信息，然后通过聚合函数进行自适应整合。

As shown in Figure 2, we aggregate conditions either before or after the attention and feedforward modules. Inspired by recent conditional Transformer advances like DiT [29] and other conditional normalization approaches [28, 30], we take the aggregation paradigm to finely capture the intricate correlations between hidden inputs of initial observations and deep equation conditions. Specifically, we scale and shift the hidden inputs based on the conditions. After passing through the Transformer modules, we use the conditions to softly select whether this information should be retained.
如图 2 所示，我们在注意力和前馈模块之前或之后聚合条件。受到最近条件 Transformer 进展的启发，如 DiT [29]和其他条件归一化方法[28, 30]，我们采用聚合范式来精细捕捉初始观测的隐藏输入与深度方程条件之间的复杂相关性。具体来说，我们根据条件对隐藏输入进行缩放和平移。在通过 Transformer 模块后，我们使用条件来软性选择是否应该保留这些信息。

Overall design We embed input of initial observations

using a patchification layer similar to ViT [6] to obtain the input embeddings

, and embed the complete PDE equation components

following Section 3.2 to obtain deep conditions

and

. Suppose there are

layers in the PDE-conditional Transformer architecture, the

-th layer of Unisolver can be formalized as follows:
总体设计我们使用类似于 ViT [6]的分块层嵌入初始观测

的输入，以获得输入嵌入

，并按照 3.2 节嵌入完整的 PDE 方程组件

，以获得深度条件

和

。假设 PDE 条件 Transformer 架构中有

层，Unisolver 的第

层可以形式化如下：

where

scale, shift, select

, and

is the output of the

-th layer. As the PDE components have a crucial impact on the scale of the output, we apply conditional scale and shift operations before a linear projection on

to obtain the final output

, as shown in Figure 2
其中

缩放、偏移、选择

，而

是第

层的输出。由于 PDE 组件对输出的尺度有重要影响，我们在对

进行线性投影以获得最终输出

之前，应用条件缩放和偏移操作，如图 2 所示。
While our model takes similar aggregation mechanisms as DiT, it includes several PDE-inspired differences. First, we do not use a diffusion model because PDE solving demands accuracy more than diversity. Second, our conditions are drawn from equation components with physical insights, unlike the timesteps and text descriptions typically used in generative models [13]. Third, our conditions are not confined to scalar values; instead, we decouple the feature space and leverage point-wise conditions special to PDEs to learn interactions between initial observations and PDE components.
虽然我们的模型采用了与 DiT 类似的聚合机制，但它包含了几个受偏微分方程启发的差异。首先，我们不使用扩散模型，因为偏微分方程求解更需要准确性而非多样性。其次，我们的条件来自具有物理洞察力的方程组成部分，不同于通常用于生成模型的时间步长和文本描述[13]。第三，我们的条件不局限于标量值；相反，我们解耦特征空间，并利用特定于偏微分方程的点态条件来学习初始观测和偏微分方程组成部分之间的相互作用。

4 Experiments 4 个实验

We conduct extensive experiments to evaluate Unisolver on a challenging, self-generated Heterogeneous 2D Navier-Stokes Equations dataset and two large-scale benchmarks proposed by PDEformer [45] and DPOT [10], covering a broad range of PDEs and diverse generalization scenarios.
我们进行了广泛的实验，在一个具有挑战性的自生成异质二维纳维-斯托克斯方程数据集以及 PDEformer [45]和 DPOT [10]提出的两个大规模基准上评估 Unisolver，涵盖了广泛的偏微分方程和多样化的泛化场景。

Benchmarks As summarized in Table 2, our experiments involve three large-scale datasets and numerous equation conditions in both 1D and 2D spaces. Specifically, we generate the HeterNS dataset as an extension for the NS dataset of FNO [19], including multiple coefficients and forces. PDEformer proposes a large-scale dataset with 3M samples of structured 1D PDEs and evaluates the pre-trained model on PDEBench [37]. DPOT collects 12 datasets from FNO [19], PDEBench [37], PDEArena [9] and CFDBench [26] for diversity. More details can be found in Appendix C.
如表 2 所总结，我们的实验涉及三个大规模数据集和 1D 和 2D 空间中的多个方程条件。具体来说，我们生成 HeterNS 数据集作为 FNO [19] NS 数据集的扩展，包括多个系数和力。PDEformer 提出了一个包含 300 万个结构化 1D PDE 样本的大规模数据集，并在 PDEBench [37]上评估预训练模型。DPOT 从 FNO [19]、PDEBench [37]、PDEArena [9]和 CFDBench [26]收集了 12 个数据集以增加多样性。更多详细信息可以在附录 C 中找到。
Table 2: Summary of benchmarks.

indicates the PDE component will change among different samples, while

refers to unchanged ones. Unisolver can handle all kinds of changed components.
表 2：基准测试总结。

表示 PDE 组件在不同样本间会发生变化，而

指不变的组件。Unisolver 能够处理所有类型的变化组件。

Benchmarks 基准	#Dim	#Resolution	# Samples # 样本	#Size	\|Symbols \|符号	Coefficien 系数	Force 力量	Geometry 几何学	Boundary 边界
	2D+	(6	15k 1.5 万	\|4.6 GB
DEfo		(256,	3M	300 GB
DPOT [10]	2D+Time 2D+时间		74.1k 7.41 万	384 GB

Baselines We compare Unisolver with three advanced baselines on the HeterNS dataset to demonstrate its generalizability under varied conditions, including classical ViT [6], FNO [19] and current state-of-the-art neural operator Factformer [17]. Specifically, to ensure a fair comparison, we also include the additional PDE components used in Unisolver to these baselines by concatenating them with inputs. We also compare Unisolver with other generalizable solvers: PDEformer [45] and DPOT [10] on pre-training loss and zero-shot and fine-tuning settings.
基线模型我们将 Unisolver 与三个先进的基线模型在 HeterNS 数据集上进行比较，以展示其在不同条件下的泛化能力，包括经典的 ViT [6]、FNO [19]和当前最先进的神经算子 Factformer [17]。具体来说，为了确保公平比较，我们还将 Unisolver 中使用的额外 PDE 组件添加到这些基线模型中，方法是将它们与输入连接起来。我们还将 Unisolver 与其他可泛化求解器进行比较：PDEformer [45]和 DPOT [10]，比较内容包括预训练损失、零样本和微调设置。

Table 3: Performance comparison on HeterNS with different viscosity coefficients and fixed force frequency coefficient

. Relative L2 is recorded. Promtotion refers to the relative improvement over the second-best method. "ID" means in-distribution case and "OOD" means the out-of-distribution.
表 3：不同黏度系数和固定力频率系数

下 HeterNS 的性能比较。记录相对 L2 误差。改进指相对于第二好方法的相对提升。"ID"表示分布内情况，"OOD"表示分布外情况。

HeterNS	Viscosity 粘度	8e-6 8x10⁻⁶
HeterNS	Params 参数	OOD	ID	OOD	ID	OOD	ID	OOD	ID	OOD	ID	OOD
FNO	4.7 M 4.7 米	0.0702	0.0669	0.0373	0.0225	0.0141	0.0114	0.0088	0.0031		0.0011	0.2057
Factformer 因子转换器	9.4 M 940 万	0.0489	0.0438	0.0489	0.0128	0.0297	0.0064	0.1386	0.0018	0.0631	0.0010	0.3207
ViT ViT（视觉 transformer）	4.8 M 4.8 米	0.0458	0.0432	0.0353	0.0206	0.0119	0.0098	0.0100	0.0031	0.0174	0.0015	0.1878
Unisolver 联合解算器	4.1 M 4.1 米									0.0096
Promotion 促销	-									-

Table 4: Comparison (relative L2) on HeterNS with varied force and fixed viscosity

.
表 4：HeterNS 中不同力和固定粘度

的比较（相对 L2）

HeterNS 异构 NS	Force 强制	0.5	1	1.5	2	2.5	3	3.5
HeterNS 异构 NS	Params 参数	OOD	ID	OOD	ID	OOD	ID	OOD
FNO	4.7M	1.110	0.0640	0.1742	0.0661	0.1449	0.1623	0.2974
Factformer 事实构造者	9.4 M 940 万	0.9998	0.0326	0.1110	0.0438	0.1243	0.0803	0.2257
ViT ViT（Vision Transformer）	4.8 M 4.8 米	0.7900	0.0348	0.1412	0.0432	0.1240	0.1000	0.2080
Unisolver 一体化求解器	4.1M	0.0980	0.0244	0.0770	0.0321	0.0720	0.0720	0.1740
Promotion 促销	-	87.6%				41.9%	10.3%	16.4%

Implementations For fairness, in the HeterNS dataset, all the methods are trained with relative L2 loss for 300 epochs, using the ADAM [14] optimizer with an initial learning rate of 0.0005 and a cosine annealing learning rate scheduler [24]. The batch size is set to 60. In the large-scale PDEformer and DPOT benchmarks, we follow their training strategies to compare unbiasedly. We adopt relative L2 as the evaluation metric. See Appendix D for thorough implementation details.
为了公平起见，在 HeterNS 数据集中，所有方法都使用相对 L2 损失训练 300 个周期，采用 ADAM [14]优化器，初始学习率为 0.0005，并使用余弦退火学习率调度器[24]。批量大小设置为 60。在大规模 PDEformer 和 DPOT 基准测试中，我们遵循其训练策略以进行无偏比较。我们采用相对 L2 作为评估指标。详细的实现细节请参见附录 D。

4.1 Heterogeneous 2D Navier-Stokes Equation (HeterNS)
4.1 非均质二维纳维-斯托克斯方程（HeterNS）

Setups To test the model performance on diverse varied coefficients, we generate HeterNS, which is extended from the widely used 2D NS dataset [19]. The original 2D NS dataset only contains the fixed viscosity coefficient and external force for each task, which cannot be used to evaluate the effectiveness of the model in handling varied equation components. In contrast, HeterNS involves five different viscosity coefficients and three different external force within the family of trigonometric functions, resulting in 15 different equation component pairs and 15k samples. Specifically, HeterNS considers viscosity coefficient

and force frequency coefficient

. In this benchmark, we evaluate the in-distribution fitting ability and out-of-distribution generalization performance on different viscosity

and force

.
设置为了测试模型在各种不同系数下的性能，我们生成了 HeterNS，这是在广泛使用的 2D NS 数据集[19]基础上扩展而来的。原始 2D NS 数据集仅包含每个任务的固定粘性系数和外力，无法用于评估模型处理不同方程组件的有效性。相比之下，HeterNS 涉及五种不同的粘性系数和三种不同的三角函数族外力，产生 15 种不同的方程组件对和 15k 个样本。具体来说，HeterNS 考虑粘性系数

和力频率系数

。在这个基准测试中，我们评估模型在不同粘性

和力

下的分布内拟合能力和分布外泛化性能。

Results As shown in Tables 344. Unisolver achieves the best performance on 10 of 11 tasks, covering both ID and OOD settings. Notably, for external force generalization, Unisolver beats other methods more significantly in OOD conditions (average

) than ID conditions (average 20.4%), demonstrating the effectiveness of our design in capturing generalizable physics relations between external force and model inputs. Even though we explicitly concatenate the condition information with the model inputs, without special designs for PDE conditions, most advanced neural operators perform unremarkably on HeterNS. Specifically, all neural operators fail to solve the case of

in Table 4, highlighting the benefits of our conditional Transformer design in learning generalizable features. We also include experiments in Appendix E.1 where both

and

are OOD. Unisolver still achieves considerable improvement (average

) on this challenging double OOD setting.
结果如表 3-4 所示，Unisolver 在 11 项任务中的 10 项上取得了最佳性能，涵盖了 ID 和 OOD 设置。值得注意的是，对于外部力泛化，Unisolver 在 OOD 条件下（平均

）比 ID 条件下（平均 20.4%）更显著地超越了其他方法，证明了我们的设计在捕捉外部力与模型输入之间可泛化的物理关系方面的有效性。尽管我们明确地将条件信息与模型输入连接起来，但由于没有针对 PDE 条件的特殊设计，大多数先进的神经算子在 HeterNS 上表现平平。具体而言，所有神经算子都无法解决表 4 中

的情况，突显了我们条件 Transformer 设计在学习可泛化特征方面的优势。我们还在附录 E.1 中包含了实验，其中

和

都是 OOD。Unisolver 在这个具有挑战性的双 OOD 设置上仍然取得了显著的改进（平均

）。

4.2 1D Time-dependent PDEs
4.2 一维时变偏微分方程

Setups Proposed by PDEformer [45], this benchmark contains 3 million high-quality 1D timedependent PDE samples with varied equation symbols, coefficients, forces, and boundary conditions for pre-training, with periodic and Robin boundary conditions each comprising half of the dataset. The input for the pre-training task consists of all the PDE components, and the output is the full space-time field. Similar to PDEformer, we utilize an adapted version of Poly-INR [34] to decode
PDEformer[45]提出的这个基准包含 300 万个高质量的 1D 时间相关 PDE 样本，具有不同的方程符号、系数、力和边界条件，用于预训练，其中周期性和罗宾边界条件各占数据集的一半。预训练任务的输入包括所有 PDE 组件，输出是完整的时空场。与 PDEformer 类似，我们使用改编版的 Poly-INR[34]进行解码。

Table 5: Performance (relative L2) comparison of pre-training and finetuning on 1D time-dependent PDEs. PDEformer-L means the large version of PDEformer. FT-100 means finetuning 100 epochs.
表 5：一维时间相关偏微分方程预训练和微调的性能（相对 L2）比较。PDEformer-L 表示 PDEformer 的大型版本。FT-100 表示微调 100 轮。

1D Time-dependent PDEs 1D 时变偏微分方程

Pretrain 预训练

Burgers 汉堡包

平流

Advection

Periodic 周期的

Robin 罗宾

PDEfo

22M

0.0211

0.0238

0.00744

0.0144

0.0393

0.0178

Unisolver 优力解算器

19 M 19 米

0.0107

0.0108

0.00513

0.00995

0.0299

0.0138

Promotion 促销

PDEformer-L (FT-100) PDEformer-L（FT-100）

22M

0.00364

0.0112

0.0331

0.00975

Unisolver (FT-100) 优力解（FT-100）

19 M 19 米

0.00105

0.00474

0.0170

0.00420

Promotion 促销

71.2%

57.7%

48.6%

59.6%

Table 6: Performance comparison across 12 pre-training datasets. Relative L2

is recorded.
表 6：12 个预训练数据集的性能比较。记录相对 L2

。

2D 混合偏微分方程

Mixed PDEs

方程式参数

Equations

Params

FNO-NS-

PDEBench-CNS-(M,

)

PDEArena-NS

CFDBench-NS 几何

CFDBench-NS

Geometry

Mean 平均值

1e-4

1e-3

Force 力量

DPOT-S

30M

5.53

4.42

1.31

1.53

3.37

1.19

1.87

3.79

0.66

9.91

31.6

0.70

5.50

Unisolver 统一求解器

33 M 33 米

4.17

3.36

0.61

1.2

2.89

1.0

1.5

4.39

0.45

6.87

0.5

4.54

Promotion 促销

24.0%

15.0%

13.3%

17.5%

DPOT

30 M 30 米

4.49

3.42

0.68

1.5

1.71

0.22

8.92

0.44

4.63

Unisolver-FT 统一求解器-FT

33M

3.82

2.79

0.31

0.95

1.99

1.01

1.34

1.37

0.20

6.67

26.8

0.47

3.98

Promotion 促销

5.69%

7.6%

14.0%

the PDE solution from the features extracted by Unisolver. After pre-training, the pre-trained model is evaluated using Burgers equation and Advection equation from PDEBench[37], where the PDE coefficients fall within the distribution of the pre-training data. Concretely, we recorded the results for zero-shot on Burgers and Advection, and finetuning on the these two dataset after 100 epochs.
Unisolver 提取的特征得出 PDE 解决方案。预训练后，使用来自 PDEBench[37]的伯格斯方程和平流方程评估预训练模型，其中 PDE 系数在预训练数据的分布范围内。具体而言，我们记录了伯格斯和平流方程的零样本结果，以及在这两个数据集上经过 100 个周期的微调结果。

Results Table 5 presents that the pre-training performance of Unisolver is significantly better than that of PDEformer-L on both periodic and Robin boundary conditions, which indicates that our design of incorporating PDE conditions is more effective than the computational graph utilized by PDEformer in modeling complex distributions. Additionally, UniSolver achieves better performance in zeroshot evaluations on downstream tasks, with an average improvement of

over PDEformer-L. Furthermore, fine-tuning on downstream tasks yields even more pronounced improvements for Unisolver (average 59.3%), highlighting its larger model capacity despite relatively fewer parameters.
表 5 显示，Unisolver 在周期性和 Robin 边界条件下的预训练性能明显优于 PDEformer-L，这表明我们将 PDE 条件纳入设计的方法比 PDEformer 使用的计算图在建模复杂分布方面更有效。此外，UniSolver 在下游任务的零样本评估中取得了更好的表现，平均比 PDEformer-L 提高了

。更重要的是，在下游任务上进行微调后，Unisolver 的改进更加显著（平均 59.3%），突显了其尽管参数相对较少但模型容量更大的优势。

4.3 2D Mixed PDEs 4.3 二维混合偏微分方程

Setups This benchmark is gathered by DPOT [10], which involves 12 datasets from four different sources, covering a wide range of PDEs. For conciseness, we adopt the "source-PDE" to record different tasks in Table 6, e.g. FNO-NS. The second row lists the primary conditions considered for each dataset. For example, 1e-5 represents the viscosity coefficient

. Unlike the wellbalanced data in PDEformer, these datasets exhibit significant imbalance among different conditions. To address this issue, we adopted the balanced data sampling method used in DPOT, but it still presents thorny challenges in tackling intricate and multifarious PDE samples.
设置该基准测试由 DPOT [10]收集，涉及来自四个不同来源的 12 个数据集，涵盖了广泛的偏微分方程。为简洁起见，我们在表 6 中采用"来源-PDE"来记录不同的任务，例如 FNO-NS。第二行列出了每个数据集考虑的主要条件。例如，1e-5 代表粘度系数

。与 PDEformer 中的平衡数据不同，这些数据集在不同条件下表现出显著的不平衡。为解决这个问题，我们采用了 DPOT 中使用的平衡数据采样方法，但在处理复杂多样的偏微分方程样本时仍然面临棘手的挑战。

Results We pre-trained Unisolver with 33M parameters to compare with DPOT-S, the small version of DPOT. As shown in Table 6, our method outperforms DPOT in 11 out of 12 datasets except for a small dataset DR whose relative L2 is less than

, verifying the effectiveness of our design in modeling complex conditions. After 200 epochs of fine-tuning on each dataset, Unisolver achieves a reduction in error of more than

compared with zero-shot results, demonstrating its ability to extract useful knowledge from complex pre-training datasets. These results further highlight the potential of Unisolver for serving as a foundation PDE solver.
结果我们预训练了具有 3300 万参数的 Unisolver，以与 DPOT 的小版本 DPOT-S 进行比较。如表 6 所示，我们的方法在 12 个数据集中的 11 个上都优于 DPOT，除了相对 L2 小于

的小数据集 DR，验证了我们在建模复杂条件方面设计的有效性。在每个数据集上经过 200 轮微调后，Unisolver 与零样本结果相比，误差减少了超过

，展示了其从复杂预训练数据集中提取有用知识的能力。这些结果进一步突显了 Unisolver 作为基础 PDE 求解器的潜力。

4.4 Model Analysis 4.4 模型分析

Ablations We conduct detailed ablations on 1D time-dependent PDEs of 50k samples, with 50% periodic and

Robin boundaries. From Table 7, we have the following observations. Firstly, without equation symbols, model performance on all tasks will drop consistently, demonstrating the necessity of introducing symbolic information. However, the choice of LLM has minimal
消融实验我们在具有 50k 个样本的 1D 时间相关偏微分方程上进行详细的消融实验，其中 50%为周期性边界条件，

为罗宾边界条件。从表 7 中，我们得出以下观察结果。首先，没有方程符号，模型在所有任务上的表现都会持续下降，这证明了引入符号信息的必要性。然而，LLM的选择影响最小

Table 7: Ablations on different LLM choices, removing our designs (W/o), and replacing domain-wise or point-wise conditions from our design to input concat (Concat-D and Concat-P).
表 7：对不同LLM选择的消融实验，移除我们的设计（W/o），以及将我们设计中的领域条件或点条件替换为输入拼接（Concat-D 和 Concat-P）。

L2RE 数据集

L2RE

Dataset

Different LLMs 不同的 LLMs

W/o 无

Condition Modeling | Ours
条件建模 | 我们的

T5 [31]

LLaMa-2 [39]

LLM Embedding LLM 嵌入

Subspaces 子空间

Concat-D 拼接-D

Concat-P 串联 P

Pretrain-periodic 周期性预训练

0.0329

0.0358

0.0331

0.0433

0.0327

| 0.0330

Pretrain-robin 预训练罗宾

0.0223

0.0224

0.0232

0.0243

0.0427

0.0306

| 0.0223

Burgers

汉堡包

0.0664

0.0647

0.0692

0.0675

0.0950

0.0769

0.0659

impact on performance, probably due to the limited ablation data and the prompt design. Secondly, removing subspace decoupling impairs the performance on robin boundaries more significantly for the insufficient modeling of point-wise conditions influenced by domain-wise conditions. Thirdly, replacing the conditional modeling with directly concatenating conditions to the initial inputs severely harms relation learning, further highlighting the advantages of our design.
对性能的影响可能是由于有限的消融数据和提示设计。其次，移除子空间解耦对罗宾边界的性能影响更为显著，这是由于对受域条件影响的点条件建模不足。第三，用直接将条件连接到初始输入的方式替代条件建模会严重损害关系学习，进一步凸显了我们设计的优势。

Case study To provide an intuitive comparison, we provide showcases on the HeterNS benchmark in Figure 3. First, all the presented trajectories are generated from the same initial condition but exhibit distinct final fields, underscoring the importance of PDE components. Further, it is observed that Unisolver significantly outperforms FNO under complex conditions, such as smaller

and larger

, particularly in OOD conditions. More showcases can be found in Appendix A
案例研究为了提供直观的比较，我们在图 3 中展示了 HeterNS 基准测试的案例。首先，所有呈现的轨迹都是从相同的初始条件生成的，但展现出不同的最终场，突显了偏微分方程组件的重要性。此外，观察到 Unisolver 在复杂条件下，如较小的

和较大的

，特别是在分布外（OOD）条件下，明显优于 FNO。更多案例展示可以在附录 A 中找到。

Figure 3: Showcases on error maps of FNO and Unisolver on HeterNS benchmark. All the cases have the same initial condition but different viscosity coefficients and force-frequency coefficients, which are presented in the first row. The left shows ID conditions, while the right depicts OOD conditions.
图 3：FNO 和 Unisolver 在 HeterNS 基准测试上的误差图展示。所有案例具有相同的初始条件，但不同的粘度系数和力频率系数，这些在第一行中呈现。左侧显示 ID 条件，而右侧描述 OOD 条件。
Scalability Scalability is vital for building a universal PDE solver. Figure 4 illustrates our exploration of Unisolver's scalability, where we progressively increase the training data by 60 times and the model parameters by 21 times. Unisolver exactly displays the scaling law, achieving better performance with increased data and parameters, posing the potential for a universal PDE solver.
可扩展性可扩展性对于构建通用偏微分方程求解器至关重要。图 4 展示了我们对 Unisolver 可扩展性的探索，我们逐步将训练数据增加 60 倍，模型参数增加 21 倍。Unisolver 完全展现了扩展定律，随着数据和参数的增加，性能得到提升，显示出成为通用偏微分方程求解器的潜力。

Figure 4: Data scalability (60x) and model scalability (21x) on the PDEformer benchmark. Relative L2 error for pre-training (periodic and Robin) and Burgers equation are plotted on a log-log scale.
图 4：PDEformer 基准测试中的数据可扩展性（60 倍）和模型可扩展性（21 倍）。预训练（周期性和罗宾）和伯格斯方程的相对 L2 误差以对数-对数尺度绘制。

5 Conclusions 5 个结论

To break the generalization bottleneck, this paper presents Unisolver as a PDE-conditional Transformer, which stems from the theoretical analysis of the PDE-solving process. Concretely, Unisolver proposes a complete set of PDE components and embeds these components into domain-wise and point-wise deep conditions separately for deep conditions. By integrating these conditions with Transformers through a decoupled mechanism, Unisolver can handle universal PDE components and achieve consistent state-of-the-art results across three challenging, large-scale benchmarks. Extensive analyses are provided to verify the performance, generalizability and scalability of our model.
为了突破泛化瓶颈，本文提出了 Unisolver 作为一种 PDE 条件下的 Transformer 模型，这源于对 PDE 求解过程的理论分析。具体而言，Unisolver 提出了一套完整的 PDE 组件，并将这些组件分别嵌入到域级和点级的深度条件中。通过将这些条件与 Transformer 以解耦机制集成，Unisolver 能够处理通用 PDE 组件，并在三个具有挑战性的大规模基准测试中 consistently 达到最先进的结果。本文还提供了广泛的分析，以验证我们模型的性能、泛化能力和可扩展性。

References 参考文献

[1] William F Ames. Numerical methods for partial differential equations. Academic press, 2014.
[1] William F Ames. 偏微分方程数值方法. 学术出版社, 2014.
[2] Vladimir Igorevich Arnol'd. Mathematical methods of classical mechanics. Springer Science & Business Media, 2013.
[2] 弗拉基米尔·伊戈列维奇·阿诺尔德. 经典力学的数学方法. 施普林格科学与商业媒体, 2013.
[3] Johannes Brandstetter, Daniel Worrall, and Max Welling. Message passing neural pde solvers. In ICLR, 2022 .
[3] Johannes Brandstetter、Daniel Worrall 和 Max Welling。消息传递神经 PDE 求解器。发表于 ICLR，2022 年。
[4] Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few-shot learners. In NeurIPS, 2020.
[4] Tom Brown、Benjamin Mann、Nick Ryder、Melanie Subbiah、Jared D Kaplan、Prafulla Dhariwal、Arvind Neelakantan、Pranav Shyam、Girish Sastry、Amanda Askell 等。语言模型是少样本学习器。发表于 NeurIPS，2020 年。
[5] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. Bert: Pre-training of deep bidirectional transformers for language understanding. In NAACL, 2019
[5] Jacob Devlin, Ming-Wei Chang, Kenton Lee 和 Kristina Toutanova. Bert: 用于语言理解的深度双向变换器的预训练. 发表于北美计算语言学年会（NAACL），2019 年
[6] Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, et al. An image is worth 16x16 words: Transformers for image recognition at scale. In ICLR, 2020.
[6] Alexey Dosovitskiy、Lucas Beyer、Alexander Kolesnikov、Dirk Weissenborn、Xiaohua Zhai、Thomas Unterthiner、Mostafa Dehghani、Matthias Minderer、Georg Heigold、Sylvain Gelly 等。一张图像价值 16x16 个词：大规模图像识别的 Transformer 模型。发表于 ICLR，2020 年。
[7] Stefan Elfwing, Eiji Uchibe, and Kenji Doya. Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural networks, 2018.
[7] Stefan Elfwing, Eiji Uchibe, 和 Kenji Doya. 用于强化学习中神经网络函数逼近的 Sigmoid 加权线性单元. 神经网络, 2018.
[8] Lawrence C Evans. Partial differential equations. American Mathematical Society, 2022.
[8] Lawrence C Evans. 偏微分方程. 美国数学学会, 2022.
[9] Jayesh K Gupta and Johannes Brandstetter. Towards multi-spatiotemporal-scale generalized pde modeling. TMLR, 2023.
[9] Jayesh K Gupta 和 Johannes Brandstetter。走向多时空尺度广义偏微分方程建模。TMLR，2023。
[10] Zhongkai Hao, Chang Su, Songming Liu, Julius Berner, Chengyang Ying, Hang Su, Anima Anandkumar, Jian Song, and Jun Zhu. Dpot: Auto-regressive denoising operator transformer for large-scale pde pre-training. In ICML, 2024.
[10] 郝中凯、苏畅、刘松明、Julius Berner、应成阳、苏航、Anima Anandkumar、宋健、朱军。DPOT：用于大规模 PDE 预训练的自回归去噪算子变换器。ICML，2024 年。
[11] Zhongkai Hao, Chengyang Ying, Zhengyi Wang, Hang Su, Yinpeng Dong, Songming Liu, Ze Cheng, Jun Zhu, and Jian Song. Gnot: A general neural operator transformer for operator learning. In ICML, 2023.
[11] 郝中凯、应承阳、王正一、苏航、董银鹏、刘松明、程泽、朱军、宋健。Gnot：用于算子学习的通用神经算子变换器。发表于 ICML，2023 年。
[12] Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, and Ross Girshick. Masked autoencoders are scalable vision learners. In

.
[12] Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, 和 Ross Girshick. 掩码自动编码器是可扩展的视觉学习器. 发表于

.
[13] Tero Karras, Miika Aittala, Timo Aila, and Samuli Laine. Elucidating the design space of diffusion-based generative models. NeurIPS, 2022.
[13] Tero Karras、Miika Aittala、Timo Aila 和 Samuli Laine。阐明扩散型生成模型的设计空间。NeurIPS，2022。
[14] Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In ICLR, 2015.
[14] Diederik P. Kingma 和 Jimmy Ba。Adam：一种随机优化方法。发表于 ICLR 会议，2015 年。
[15] Nikola Kovachki, Zongyi Li, Burigede Liu, Kamyar Azizzadenesheli, Kaushik Bhattacharya, Andrew Stuart, and Anima Anandkumar. Neural operator: Learning maps between function spaces with applications to pdes. JMLR, 2023.
[15] Nikola Kovachki, Zongyi Li, Burigede Liu, Kamyar Azizzadenesheli, Kaushik Bhattacharya, Andrew Stuart, 和 Anima Anandkumar. 神经算子：学习函数空间之间的映射及其在偏微分方程中的应用. JMLR, 2023.
[16] Zijie Li, Kazem Meidani, and Amir Barati Farimani. Transformer for partial differential equations' operator learning. TMLR, 2023.
[16] Zijie Li, Kazem Meidani, 和 Amir Barati Farimani. 用于偏微分方程算子学习的 Transformer 模型. TMLR, 2023.
[17] Zijie Li, Dule Shu, and Amir Barati Farimani. Scalable transformer for pde surrogate modeling. In NeurIPS, 2024.
[17] Zijie Li, Dule Shu, 和 Amir Barati Farimani. 用于 PDE 代理建模的可扩展 transformer. 发表于 NeurIPS, 2024.
[18] Zongyi Li, Daniel Zhengyu Huang, Burigede Liu, and Anima Anandkumar. Fourier neural operator with learned deformations for pdes on general geometries. JMLR, 2023.
[18] Zongyi Li, Daniel Zhengyu Huang, Burigede Liu, 和 Anima Anandkumar. 用于一般几何上偏微分方程的学习变形傅里叶神经算子. JMLR, 2023.
[19] Zongyi Li, Nikola Borislavov Kovachki, Kamyar Azizzadenesheli, Burigede liu, Kaushik Bhattacharya, Andrew Stuart, and Anima Anandkumar. Fourier neural operator for parametric partial differential equations. In ICLR, 2021.
[19] Zongyi Li, Nikola Borislavov Kovachki, Kamyar Azizzadenesheli, Burigede liu, Kaushik Bhattacharya, Andrew Stuart, 和 Anima Anandkumar. 用于参数偏微分方程的傅里叶神经算子. 发表于 ICLR, 2021.
[20] Zongyi Li, Hongkai Zheng, Nikola Kovachki, David Jin, Haoxuan Chen, Burigede Liu, Kamyar Azizzadenesheli, and Anima Anandkumar. Physics-informed neural operator for learning partial differential equations. J. Data Sci., 2021.
[20] Zongyi Li, Hongkai Zheng, Nikola Kovachki, David Jin, Haoxuan Chen, Burigede Liu, Kamyar Azizzadenesheli, 和 Anima Anandkumar. 用于学习偏微分方程的物理信息神经算子. 数据科学杂志, 2021.
[21] Yuxuan Liu, Zecheng Zhang, and Hayden Schaeffer. Prose: Predicting operators and symbolic expressions using multimodal transformers. arXiv preprint arXiv:2309.16816, 2023.
[21] Yuxuan Liu, Zecheng Zhang, 和 Hayden Schaeffer. Prose: 使用多模态变换器预测运算符和符号表达式。arXiv 预印本 arXiv:2309.16816, 2023。
[22] Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, and Baining Guo. Swin transformer: Hierarchical vision transformer using shifted windows. In ICCV, 2021.
[22] 刘泽, 林雨桐, 曹月, 胡翰, 魏益轩, 张政, Stephen Lin, 郭百宁. Swin transformer: 使用移位窗口的层次化视觉 transformer. 发表于国际计算机视觉会议（ICCV），2021 年。
[23] Cooper Lorsung, Zijie Li, and Amir Barati Farimani. Physics informed token transformer for solving partial differential equations. Mach. Learn.: Sci. Technol, 2024.
[23] Cooper Lorsung、Zijie Li 和 Amir Barati Farimani。用于求解偏微分方程的物理信息令牌转换器。机器学习：科学技术，2024。
[24] Ilya Loshchilov and Frank Hutter. Sgdr: Stochastic gradient descent with warm restarts. In ICLR, 2016.
[24] Ilya Loshchilov 和 Frank Hutter。SGDR：带有热重启的随机梯度下降。发表于 ICLR，2016 年。
[25] Lu Lu, Pengzhan Jin, Guofei Pang, Zhongqiang Zhang, and George Em Karniadakis. Learning nonlinear operators via deeponet based on the universal approximation theorem of operators. Nat. Mach. Intell, 2021.
[25] Lu Lu、Pengzhan Jin、Guofei Pang、Zhongqiang Zhang 和 George Em Karniadakis。基于算子通用近似定理通过 DeepONet 学习非线性算子。自然机器智能，2021。
[26] Yining Luo, Yingfa Chen, and Zhen Zhang. Cfdbench: A comprehensive benchmark for machine learning methods in fluid dynamics. arXiv preprint arXiv:2310.05963 2023.
[26] 罗一宁、陈颖发和张臻。CFDbench：流体动力学中机器学习方法的综合基准测试。arXiv 预印本 arXiv:2310.05963 2023。
[27] Michael McCabe, Bruno Régaldo-Saint Blancard, Liam Holden Parker, Ruben Ohana, Miles Cranmer, Alberto Bietti, Michael Eickenberg, Siavash Golkar, Geraud Krawezik, Francois Lanusse, et al. Multiple physics pretraining for physical surrogate models. In NeurIPS AI for Science Workshop, 2023.
[27] Michael McCabe、Bruno Régaldo-Saint Blancard、Liam Holden Parker、Ruben Ohana、Miles Cranmer、Alberto Bietti、Michael Eickenberg、Siavash Golkar、Geraud Krawezik、Francois Lanusse 等。物理替代模型的多物理预训练。发表于 2023 年 NeurIPS 科学人工智能研讨会。
[28] Taesung Park, Ming-Yu Liu, Ting-Chun Wang, and Jun-Yan Zhu. Semantic image synthesis with spatiallyadaptive normalization. In

.
[28] Taesung Park、Ming-Yu Liu、Ting-Chun Wang 和 Jun-Yan Zhu。具有空间自适应归一化的语义图像合成。发表于

。
[29] William Peebles and Saining Xie. Scalable diffusion models with transformers. In ICCV, 2023.
[29] William Peebles 和 Saining Xie. 基于 transformer 的可扩展扩散模型. 发表于国际计算机视觉会议(ICCV), 2023.
[30] Ethan Perez, Florian Strub, Harm De Vries, Vincent Dumoulin, and Aaron Courville. Film: Visual reasoning with a general conditioning layer. In

.
[30] Ethan Perez、Florian Strub、Harm De Vries、Vincent Dumoulin 和 Aaron Courville。Film：使用通用调节层进行视觉推理。见

。
[31] Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J Liu. Exploring the limits of transfer learning with a unified text-to-text transformer. 2020.
[31] Colin Raffel、Noam Shazeer、Adam Roberts、Katherine Lee、Sharan Narang、Michael Matena、Yanqi Zhou、Wei Li 和 Peter J Liu。探索统一文本到文本转换器的迁移学习极限。2020 年。
[32] Md Ashiqur Rahman, Zachary E Ross, and Kamyar Azizzadenesheli. U-no: U-shaped neural operators. TMLR, 2023.
[32] Md Ashiqur Rahman, Zachary E Ross, 和 Kamyar Azizzadenesheli. U-no：U 形神经算子。TMLR，2023。
[33] Maziar Raissi, Paris Perdikaris, and George Em Karniadakis. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. J. Comput. Phys., 2019.
[33] Maziar Raissi、Paris Perdikaris 和 George Em Karniadakis。物理信息神经网络：用于求解涉及非线性偏微分方程的正问题和反问题的深度学习框架。计算物理学杂志，2019 年。
[34] Rajhans Singh, Ankita Shukla, and Pavan Turaga. Polynomial implicit neural representations for large diverse datasets. In CVPR, 2023.
[34] Rajhans Singh、Ankita Shukla 和 Pavan Turaga。用于大型多样数据集的多项式隐式神经表示。发表于 CVPR, 2023。
[35] Shashank Subramanian, Peter Harrington, Kurt Keutzer, Wahid Bhimji, Dmitriy Morozov, Michael W Mahoney, and Amir Gholami. Towards foundation models for scientific machine learning: Characterizing scaling and transfer behavior. In NeurIPS, 2023.
[35] Shashank Subramanian、Peter Harrington、Kurt Keutzer、Wahid Bhimji、Dmitriy Morozov、Michael W Mahoney 和 Amir Gholami。面向科学机器学习的基础模型：表征扩展和迁移行为。发表于 NeurIPS，2023 年。
[36] Makoto Takamoto, Francesco Alesiani, and Mathias Niepert. Learning neural pde solvers with parameterguided channel attention. In ICML, 2023
[36] Makoto Takamoto、Francesco Alesiani 和 Mathias Niepert。通过参数引导的通道注意力学习神经 PDE 求解器。发表于 ICML，2023 年
[37] Makoto Takamoto, Timothy Praditia, Raphael Leiteritz, Daniel MacKinlay, Francesco Alesiani, Dirk Pflüger, and Mathias Niepert. Pdebench: An extensive benchmark for scientific machine learning. NeurIPS, 2022 .
[37] 高本诚、Timothy Praditia、Raphael Leiteritz、Daniel MacKinlay、Francesco Alesiani、Dirk Pflüger 和 Mathias Niepert。Pdebench：一个广泛的科学机器学习基准。NeurIPS，2022。
[38] Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, et al. Llama: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971 2023.
[38] Hugo Touvron、Thibaut Lavril、Gautier Izacard、Xavier Martinet、Marie-Anne Lachaux、Timothée Lacroix、Baptiste Rozière、Naman Goyal、Eric Hambro、Faisal Azhar 等人。Llama：开放和高效的基础语言模型。arXiv 预印本 arXiv:2302.13971 2023。
[39] Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, et al. Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288, 2023.
[39] Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale 等人。Llama 2：开放基础和微调聊天模型。arXiv 预印本 arXiv:2307.09288，2023 年。
[40] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Ł ukasz Kaiser, and Illia Polosukhin. Attention is all you need. In NeurIPS, 2017.
[40] Ashish Vaswani、Noam Shazeer、Niki Parmar、Jakob Uszkoreit、Llion Jones、Aidan N Gomez、Łukasz Kaiser 和 Illia Polosukhin。注意力机制就是你所需要的。发表于 NeurIPS，2017 年。
[41] Hanchen Wang, Tianfan Fu, Yuanqi Du, Wenhao Gao, Kexin Huang, Ziming Liu, Payal Chandak, Shengchao Liu, Peter Van Katwyk, Andreea Deac, et al. Scientific discovery in the age of artificial intelligence. Nature, 2023.
[41] 王涵晨, 付天帆, 杜远琦, 高文浩, 黄可欣, 刘子明, Payal Chandak, 刘圣超, Peter Van Katwyk, Andreea Deac, 等. 人工智能时代的科学发现. 自然, 2023.
[42] Gege Wen, Zongyi Li, Kamyar Azizzadenesheli, Anima Anandkumar, and Sally M Benson. U-fno-an enhanced fourier neural operator-based deep-learning model for multiphase flow. Adv. Water Resour., 2022.
[42] 葛戈文, 李宗易, Kamyar Azizzadenesheli, Anima Anandkumar, 和 Sally M Benson. U-fno-一种增强型傅里叶神经算子的多相流深度学习模型. 水资源进展, 2022.
[43] Haixu Wu, Tengge Hu, Huakun Luo, Jianmin Wang, and Mingsheng Long. Solving high-dimensional pdes with latent spectral models. In ICML, 2023.
[43] 吴海旭, 胡腾格, 罗华昆, 王建民, 龙明生. 使用潜在谱模型求解高维偏微分方程. 国际机器学习会议, 2023.
[44] Haixu Wu, Huakun Luo, Haowen Wang, Jianmin Wang, and Mingsheng Long. Transolver: A fast transformer solver for pdes on general geometries. In ICML, 2024.
[44] Haixu Wu, Huakun Luo, Haowen Wang, Jianmin Wang, 和 Mingsheng Long. Transolver：一种用于一般几何上偏微分方程的快速 Transformer 求解器。发表于 ICML，2024 年。
[45] Zhanhong Ye, Xiang Huang, Leheng Chen, Hongsheng Liu, Zidong Wang, and Bin Dong. Pdeformer: Towards a foundation model for one-dimensional partial differential equations. In ICLR AI4Differential Equations In Science Workshop, 2024.
[45] 叶展宏, 黄翔, 陈乐恒, 刘洪生, 王子东, 董彬. Pdeformer: 朝向一维偏微分方程的基础模型. 2024 年 ICLR 人工智能 4 微分方程在科学中的应用研讨会.
[46] Chengxuan Ying, Tianle Cai, Shengjie Luo, Shuxin Zheng, Guolin Ke, Di He, Yanming Shen, and Tie-Yan Liu. Do transformers really perform badly for graph representation? NeurIPS, 2021.
[46] 应成轩, 蔡天乐, 罗盛杰, 郑树鑫, 柯国霖, 何迪, 沈艳明, 刘铁岩. Transformer 真的在图表示方面表现不佳吗？NeurIPS, 2021.

A More Showcases 更多展示

We provide more showcases here as a supplement to the numerical results of the main text.
我们在此提供更多展示案例，作为正文数值结果的补充。

Figure 5: Showcases comparison for in-distribution (ID) data with three baselines on the HeterNS dataset. See Table 3 and 4 for numerical comparison. All data has the same initial condition but different viscosity coefficients and force frequency coefficients, as shown in the first row. Error maps are provided. A darker color indicates a lower error.
图 5：在 HeterNS 数据集上展示了与三个基线的同分布（ID）数据比较。数值比较见表 3 和 4。所有数据具有相同的初始条件，但粘度系数和力频率系数不同，如第一行所示。提供了误差图。颜色越深表示误差越低。

Figure 6: Showcases comparison for out-of-distribution (OOD) data with three baselines on the HeterNS dataset with the same initial conditions but different viscosity coefficients and force frequency coefficients. See Table 3 and 4 for numerical comparison. Error maps are provided. A darker color indicates a lower error.
图 6：在 HeterNS 数据集上，展示了三个基线模型对分布外（OOD）数据的比较，初始条件相同但粘度系数和力频率系数不同。数值比较见表 3 和表 4。提供了误差图，颜色越深表示误差越低。

Figure 7: Showcases comparison with PDEformer on the pre-training and downstream tasks of 1D time-dependent PDEs. See Table 5 for numerical comparison. The pre-training dataset contains both periodic and robin boundary conditions. The numbers in the Burgers columns refer to the diffusion coefficient

and the number in the Advection column indicates the constant advection speed

. Error maps are provided. A darker color indicates a lower error.
图 7：展示了在一维时间相关偏微分方程的预训练和下游任务上与 PDEformer 的比较。数值对比见表 5。预训练数据集包含周期性和罗宾边界条件。伯格斯方程列中的数字表示扩散系数

，对流方程列中的数字表示恒定对流速度

。提供了误差图。颜色越深表示误差越低。

Figure 8: Showcases of Unisolver on 2d mixed PDE benchmark that covers a wide range of PDE components. See Table 6 for numerical comparison with DPOT. Both prediction results and error maps are provided. As shown in the CFDBench-NS columns, Unisolver presents an impressive ability to handle different geometry conditions.
图 8：Unisolver 在 2D 混合偏微分方程基准测试中的展示，涵盖了广泛的偏微分方程组件。与 DPOT 的数值比较见表 6。提供了预测结果和误差图。如 CFDBench-NS 列所示，Unisolver 展现了处理不同几何条件的卓越能力。

B Analytical Solution for the String Vibration Equation
B 弦振动方程的解析解

Here we derive the analytical solution for the PDE 1ab with boundary conditions (1b) and initial conditions (1C.
我们在此推导出偏微分方程 1ab 的解析解，其中包括边界条件(1b)和初始条件(1C)。

We extend

and

as odd, periodic functions of

with period

, defining

and

as follows:
我们将

和

扩展为

的奇周期函数，周期为

，定义

、

和

如下：

Thus, we obtain a wave equation defined on the upper half-plane, with initial conditions

and

.
因此，我们得到了一个定义在上半平面的波动方程，其初始条件为

和

。

Lemma B.1. The solution of the simplified equation (9) is

.
引理 B.1. 简化方程(9)的解为

。

Proof. The above expression for

can be derived using the method of operator splitting and characteristic lines.
证明。上述

的表达式可以使用算子分裂法和特征线法推导得出。

Note that 请注意

Let 让

Then 然后

Using the method of characteristic lines for first-order equations, we obtain
对一阶方程使用特征线法，我们得到

Thus,

satisfies 因此，

满足

For this first-order equation, along the characteristic line passing through the point

, defined by

, we have
对于这个一阶方程，沿着通过点

并由

定义的特征线，我们有

By integrating on both sides, we get
通过对两边积分，我们得到

Therefore, 因此，

It is straightforward to verify it's indeed a solution by substituting it into the equation, given that

.
通过将其代入方程中进行验证很容易确认它确实是一个解，前提是

。

Inspired by the principle of linear superposition for linear problems, the solution

for equation (5) can be decomposed as
受线性问题线性叠加原理的启发，方程(5)的解

可以分解为

where

, and

are the solutions to the following three equations, respectively,
其中

和

分别是以下三个方程的解，

Lemma B.2. The three solutions have the following simple relationships. Denoting that

, we have
引理 B.2. 这三个解有以下简单关系。令

，我们有

Proof. These relations can be verified by substitution.
证明。这些关系可以通过代入验证。
Thus, the explicit expression of

is,
因此，

的显式表达式为，

Noting that due to the construction of

and

, the solution

of equation 5 is an odd function of

with a period of

. Moreover, it satisfies

, ensuring that the boundary conditions of the original string vibrating equation 1ab are met. Therefore, we obtain the solution of equation 1a):
注意到由于

和

的构造，方程 5 的解

是

的奇函数，周期为

。此外，它满足

，确保满足原始弦振动方程 1ab 的边界条件。因此，我们得到方程 1a)的解：

C Benchmarks C 语言基准测试

C. 1 HeterNS C. 1 异构网络切片

Similar to FNO [19], we consider the 2-d Navier-Stokes equation in vorticity formulation for the viscous, incompressible fluid on a unit torus:
类似于 FNO [19]，我们考虑单位环面上粘性不可压缩流体的二维 Navier-Stokes 方程的涡度形式：

We experiment with viscosity

and external force in the form

. Specifically, we pre-train models on a dataset with

and

, and test the models on both in-domain and out-of-domain equation components. Pretraining dataset can be found here

我们对粘度

和外力形式

进行实验。具体而言，我们在包含

和

的数据集上预训练模型，并在域内和域外方程组件上测试这些模型。预训练数据集可在此处找到

C. 2 1D Time-dependent PDEs
C. 2 一维时间相关偏微分方程

As described in PDEformer [45], this benchmark contains 3 million high equality 1D time-dependent PDEs with various equation components for pre-training and then evaluates the pre-trained model using two downstream tasks from PDEBench, e.g. advection equation and Burgers equation.
如 PDEformer [45]中所述，该基准包含 300 万个高质量的一维时间相关偏微分方程，具有各种方程组成部分，用于预训练，然后使用来自 PDEBench 的两个下游任务评估预训练模型，例如平流方程和伯格斯方程。

Pre-training The pre-training dataset is generated following the PDE family:
预训练数据集是按照 PDE 家族生成的：

where

. Each coefficient

is set to zero with probability 0.5 , and uniformly sampled from

otherwise.
其中

。每个系数

以 0.5 的概率被设置为零，否则从

中均匀采样。
Notably, terms with a zero coefficient will be excluded in the equation symbolic expression embedded by a LLM. The initial condition

is randomly generated within the family of trigonometric functions, which is same as downstream tasks.
值得注意的是，系数为零的项将在由LLM嵌入的方程符号表达式中被排除。初始条件

是在三角函数族中随机生成的，这与下游任务相同。

This dataset includes both periodic and non-periodic boundary conditions, with 1.5 million samples each. For the non-periodic cases, the boundary condition type at each endpoint are randomly selected from Dirichlet, Neumann, and Robin types. The Dirichlet condition specifies the solution value at the boundary, the Neumann condition specifies the derivative value at the boundary, and the Robin condition is a linear combination of the two, where Dirichlet and Neumann conditions can be seen as corner cases of the Robin condition.
该数据集包括周期性和非周期性边界条件，各有 150 万个样本。对于非周期性情况，每个端点的边界条件类型是从狄利克雷、诺伊曼和罗宾类型中随机选择的。狄利克雷条件指定边界上的解值，诺伊曼条件指定边界上的导数值，而罗宾条件是两者的线性组合，其中狄利克雷和诺伊曼条件可以被视为罗宾条件的特殊情况。
In conclusion, the domain-wise components of the pre-training dataset include equation symbolic expression, boundary condition types, and coefficients of

and point-wise components include the force fields

and

and boundary value functions. The input consists of the equation's initial condition with a resolution of 256 and above classified components, and the output is solution

with a resolution of

.
总之，预训练数据集的领域组成部分包括方程符号表达式、边界条件类型以及

的系数，而点态组成部分包括力场

和

以及边界值函数。输入包括方程的初始条件（分辨率为 256）和上述分类的组成部分，输出是分辨率为

的解

。

Downstream tasks We employed the following two 1D PDE datasets from PDEBench as downstream evaluation tasks. All tasks adhere to periodic boundary conditions and the same initial condition family. The resolution of these samples is

. For each dataset, we utilize the first 9000 samples for finetuning and the rest 1000 samples for testing. We downsample the spatial resolution of these datasets to 256 and maintain the temporal resolution unchanged. The downstream PDEs consist of the Advection equation and the Burgers equation.
下游任务我们使用了 PDEBench 中的以下两个一维 PDE 数据集作为下游评估任务。所有任务遵循周期性边界条件和相同的初始条件族。这些样本的分辨率为

。对于每个数据集，我们使用前 9000 个样本进行微调，剩余的 1000 个样本用于测试。我们将这些数据集的空间分辨率降采样到 256，并保持时间分辨率不变。下游 PDE 包括平流方程和伯格斯方程。
(1) Advection equation The advection equation models pure advection behavior without nonlinearity, which can be formalized as:
(1) 平流方程平流方程模拟纯平流行为，不包含非线性，可以形式化表示为：

where constant advection speed

and equation symbols are domain-wise components in this dataset. The exact solution of the equation is

. Only the periodic boundary condition is considered. The initial condition uses a super-position of sinusoidal waves as,

, where

are wave numbers and

are selected randomly in

.
其中恒定对流速度

和方程符号在此数据集中是区域性分量。方程的精确解为

。仅考虑周期性边界条件。初始条件使用正弦波的叠加，如

，其中

为波数，

在

中随机选择。

\footnotetext{ \footnotetext{ 脚注{

https://drive.google.com/drive/folders/1te5IyQHTznu_Kw7v3zDHg0i_KCHysPKw
(2) Burgers equation This fundamental equation in fluid mechanics models the non-linear behavior and diffusion process of fluid dynamics as
(2) 伯格斯方程这个流体力学中的基本方程以如下方式模拟流体动力学的非线性行为和扩散过程

where diffusion coefficient

and equation formulation are the domain-wise components. To align with the solution domain with pretraining dataset, we rescale the spatial-temporal coordinates

. The PDEs after the coordinate transformation are given by:
其中扩散系数

和方程式公式是分域的组成部分。为了使求解域与预训练数据集对齐，我们将时空坐标

重新缩放为

。坐标变换后的偏微分方程如下：

Burgers' equation: where .
伯格斯方程：，其中。
Advection equation: where .
平流方程：，其中。

C. 3 2D Mixed PDEs
C. 3 二维混合偏微分方程

This dataset is presented by DPOT [10], which consists of the following 12 datasets from 4 benchmarks. These datasets are aggregated to pre-train a foundational neural PDE solver.
该数据集由 DPOT [10]提供，包含来自 4 个基准测试的以下 12 个数据集。这些数据集被合并用于预训练一个基础神经偏微分方程求解器。

FNO-

This benchmark considers the 2D Navier-Stokes equation for a viscous, incompressible fluid in vorticity form on the unit torus. The task is to estimate the vorticity field of the future ten timesteps on a regular

grid based on the initial ten timesteps observations of the vorticity field. The only varying PDE component in this dataset is the viscosity coefficient, which varies in

. We utilize 1000,9800 , and 1000 instances for each viscosity value to pre-train or finetune our model, and use the rest 200 instances to test its performance.
FNO-

该基准考虑了单位环面上黏性、不可压缩流体的二维纳维-斯托克斯方程的涡量形式。任务是根据涡量场的初始十个时间步长观测，在规则的

网格上估计未来十个时间步长的涡量场。此数据集中唯一变化的偏微分方程组成部分是黏性系数，其变化范围在

内。我们对每个黏性值使用 1000、9800 和 1000 个实例来预训练或微调我们的模型，并使用剩余的 200 个实例来测试其性能。

PDEBench The following three datasets are derived from PDEBench, encompassing the compressible Navier-Stokes equation, the diffusion-reaction equation, and the shallow-water equation. All datasets considered in PDEBench adhere to periodic boundary conditions.
以下三个数据集源自 PDEBench，包括可压缩纳维-斯托克斯方程、扩散-反应方程和浅水方程。PDEBench 中考虑的所有数据集都遵循周期性边界条件。
(1) The compressible Navier-Stokes equation models compressible fluid dynamics, such as shock wave formation and propagation. In this dataset, two domain-wise components are considered, namely Mach number and shear viscosity. For these two components, this dataset provides four combinations, which are

. Each combination provides 9000 instances for training and 200 instances for testing. The instances in this dataset take the spatial resolution of

, and the task is to predict the future 11 timesteps of vorticity, pressure and density given the initial 10 timesteps of observations.
(1) 可压缩纳维-斯托克斯方程可以模拟可压缩流体动力学，如激波的形成和传播。在此数据集中，考虑了两个域级组件，即马赫数和剪切粘度。对于这两个组件，该数据集提供了四种组合，即

。每种组合提供 9000 个训练实例和 200 个测试实例。本数据集中的实例采用

的空间分辨率，任务是根据初始 10 个时间步长的观测预测涡度、压力和密度的未来 11 个时间步长。
(2) The shallow-water equations, derived from the general Navier-Stokes equations, model freesurface flow problems,
(2) 浅水方程由一般的纳维-斯托克斯方程推导而来，用于模拟自由表面流动问题，

where

describes the water depth,

describes a spatially varying bathymetry,

describes the gravitational acceleration, and

can be interpreted as the directional momentum.
其中

表示水深，

表示空间变化的海底地形，

表示重力加速度，

可以解释为方向动量。
The spatial resolution of this benchmark is

, and the task is to predict the future 91 timesteps of water depth given the first 10 timesteps of observations.
这个基准测试的空间分辨率为

，任务是根据前 10 个时间步的观测数据预测未来 91 个时间步的水深。
(3) The 2D Diffusion-Reaction Equation involves two non-linearly coupled variables, namely the activator

and the inhibitor

. It is primarily applicable for modeling biological pattern formation.
(3) 二维扩散-反应方程涉及两个非线性耦合变量，即激活剂

和抑制剂

。它主要适用于模拟生物模式的形成。

where

and

are the diffusion coefficient for the activator and inhibitor, respectively, and

and

are the corresponding reaction functions for the activator and inhibitor, which are defined by the Fitzhugh-Nagumo equation as,
其中

和

分别是激活剂和抑制剂的扩散系数，

和

分别是激活剂和抑制剂的相应反应函数，它们由 Fitzhugh-Nagumo 方程定义如下，

where

. The initial condition is generated as standard normal random noise

for

and

. The dataset is discretized into

, and

. The task is to predict the future 91 timesteps of

and

given the initial 10 timesteps of observations.

处。初始条件由标准正态随机噪声生成

，对于

和

。数据集被离散化为

，以及

。任务是根据给定的前 10 个时间步的观测值预测未来 91 个时间步的

和

。

PDEArena-NS1/2 The velocity function formulation of the incompressible Navier-Stokes equations is very common in real-life applications.
PDEArena-NS1/2 不可压缩纳维-斯托克斯方程的速度函数表述在实际应用中非常常见。

where

is the convection which means the rate of change of

along

is the viscosity, i.e. the diffusion or net movement of

is the internal pressure, and

is the external buoyancy force. The inclusion of the incompressibility constraint

ensures mass conservation in the Navier-Stokes equations. The spatial resolution is

. This benchmark contains 2 datasets, one with fixed external force and another with varying externel force. Given the initial 10 timesteps of observations, the task on the fixed-force dataset is to predict the future 4 timesteps of velocity, while on the varying-force dataset, the number of timesteps to predict is 46 .
其中

是对流，表示

沿

的变化率为粘性，即

的扩散或净移动，

是内部压力，

是外部浮力。包含不可压缩性约束

确保了纳维-斯托克斯方程中的质量守恒。空间分辨率为{{7}}。这个基准包含两个数据集，一个具有固定外力，另一个具有变化的外力。给定初始 10 个时间步长的观测，固定力数据集的任务是预测未来 4 个时间步长的速度，而在变化力数据集上，需要预测的时间步长数为 46。

CFDBench We consider three important and representative fluid problems that can comprehensively evaluate methods' capabilities to generalize to unseen operating conditions. They are (1) the flow in the lid-driven cavity, (2) the flow into the circular tube, (3) the flow around the cylinder.
CFDBench 我们考虑三个重要且具有代表性的流体问题，可以全面评估方法对未见操作条件的泛化能力。它们是：(1)盖驱动腔内流动，(2)圆管内流动，(3)圆柱绕流。

where

is the constant density,

is the dynamic viscosity,

is the velocity field, and

is the pressure. For each problem, flows are generated with different operating parameters, which is the combination of the three kinds of components: (1) the Boundary conditions, (2) the fluid physical properties including the density and the viscosity, and (3) the geometry of the field. The boundary conditions refer to the inlet velocity or the movement velocity according to different cases. Each kind of operating parameter corresponds to one subset. In each subset, the corresponding operating conditions are varied while other parameters remain constant. The task is to predict the future 10 timesteps of velocity given the initial 10 timesteps of observations.
其中

是常数密度，

是动态粘度，

是速度场，

是压力。对于每个问题，流体是通过不同的操作参数生成的，这些参数是以下三种组成部分的组合：(1)边界条件，(2)包括密度和粘度在内的流体物理特性，以及(3)场的几何形状。边界条件根据不同情况指入口速度或移动速度。每种操作参数对应一个子集。在每个子集中，相应的操作条件会发生变化，而其他参数保持不变。任务是根据初始 10 个时间步长的观测预测未来 10 个时间步长的速度。

D Implementation Details D 实现细节

D. 1 Metrics D. 1 指标

Relative L2 for physics fields We can calculate the relative L2 error of model prediction

and ground truth

as follows:
物理场的相对 L2 误差我们可以按以下方式计算模型预测

和真实值

的相对 L2 误差：

where

is the

-distance between the predicted solution

and the ground-truth solution

, and

is the

-norm of the ground-truth solution.
其中

是预测解

和真实解

之间的

距离，

是真实解的

范数。

Relative Promotion Given the error of our model

and the error of the second best model

, we can calculate the relative promotion as follows:
给定我们模型的误差

和第二好模型的误差

，我们可以按以下方式计算相对提升：

D. 2 Implementations for Each Benchmark
D. 每个基准测试 2 个实现

HeterNS As for the HeterNS dataset, the model hyperparameters of the baselines and Unisolver are summarized in Table 8. The count of trainable parameters of each model and the training configurations are already shown in Section 4 For baselines, the viscosity coefficient and the external force are concatenated to the model input along the channel dimension to ensure fair comparison.
关于 HeterNS 数据集，基线模型和 Unisolver 的模型超参数总结在表 8 中。每个模型的可训练参数数量和训练配置已在第 4 节中显示。对于基线模型，粘度系数和外力沿通道维度连接到模型输入中，以确保公平比较。

Table 8: Model hyperparameters of Unisolver and all baselines on the HeterNS benchmark.
表 8：Unisolver 和所有基准模型在 HeterNS 基准测试中的模型超参数。

Hyperparameter 超参数

Value 价值

Description 描述

FNO 模式频道深度

FNO

modes

channels

depths

傅里叶模式的截断数隐藏层中的通道数量神经网络中傅里叶层的数量

The truncation number of Fourier modes

The number of channels in the hidden layers

The number of Fourier Layers in the neural network

事实转换器暗淡头数头部维度深度

Factformer

dim

n_head

dim_head

depths

128

transformer 的隐藏维度注意力头数量每个注意力头的隐藏维度神经网络中 Transformer 块的数量

hidden dimension of the transformer

num of attention heads

hidden dimension of each attention heads

The number of Transformer Blocks in the neural network

视觉变换器注意力维度 MLP 维度补丁大小注意力头数头部维度深度

ViT

Attention dim

MLP dim

patch_size

n_head

dim_head

depths

256

Transformer 注意力层的隐藏维度 transformer FFN 层的隐藏维度 ViT 补丁的高度和宽度注意力头的数量每个注意力头的隐藏维度神经网络中 Transformer 块的数量

The hidden dimension of the transformer attention layer

The hidden dimension of the transformer FFN layer

The height and width of the ViT patches

num of attention heads

The hidden dimension of each attention heads

The number of Transformer Blocks in the neural network

统一求解器注意力维度 MLP 维度补丁大小 n_head 头部维度深度

Unisolver

Attention dim

MLP dim

patch_size

n_head

dim_head

depths

256

Transformer 注意力层的隐藏维度 Transformer FFN 层的隐藏维度 Unisolver 贴片的高度和宽度注意力头数量每个注意力头的隐藏维度神经网络中 Transformer 块的数量

The hidden dimension of the transformer attention layer

The hidden dimension of the transformer FFN layer

The height and width of the Unisolver patches

num of attention heads

The hidden dimension of each attention heads

The number of Transformer Blocks in the neural network

1D Time-dependent PDEs We conduct experiments using the checkpoints and dataset provided by the official repository of PDEformer

. We randomly select one subset with periodic boundaries and another subset with Robin boundaries from the pre-training datasets. Both of the subset contain 10,000 samples and are used to calculate the pre-training relative L2 error.
1D 时间相关偏微分方程我们使用 PDEformer 官方仓库

提供的检查点和数据集进行实验。我们从预训练数据集中随机选择一个具有周期性边界条件的子集和另一个具有罗宾边界条件的子集。这两个子集各包含 10,000 个样本，用于计算预训练相对 L2 误差。
For finetuning on the downstream tasks, we utilize the finetuning script provided in the repository and set the finetuning epochs to 100 for fair comparison. The pre-training and finetuing configurations for our model and the finetuing configurations for PDEformer are shown in Table 10
对于下游任务的微调，我们使用存储库中提供的微调脚本，并将微调轮数设置为 100 以进行公平比较。我们模型的预训练和微调配置以及 PDEformer 的微调配置如表 10 所示。

As shown in the model scalability experiments in Section 4.4 we progressively increase the Unisolver parameter from 3 M to 63 M , thus resulting in 4 different model configurations. We present the configurations of these models in Table 9
如第 4.4 节中的模型可扩展性实验所示，我们逐步将 Unisolver 参数从 3M 增加到 63M，从而产生 4 种不同的模型配置。我们在表 9 中展示了这些模型的配置。
2D Mixed PDEs The training hyperparameter and model configurations are presented in Table 11 As this dataset includes multiple distinct PDEs, each accompanied by its unique equation components, we introduce a binary masking channel, where an "1" indicates that the current component is a valid condition and a " 0 " indicates an invalid component.
2D 混合偏微分方程训练超参数和模型配置见表 11。由于该数据集包含多个不同的偏微分方程，每个方程都有其独特的方程组成部分，我们引入了一个二进制掩码通道，其中"1"表示当前组成部分是有效条件，"0"表示无效组成部分。

Table 9: Model configurations of Unisolver with different sizes.
表 9：不同规模 Unisolver 的模型配置

Parameter Count 参数数量	Attention dim 注意力维度	MLP dim MLP 维度	Layers 图层	Heads 头
3M	256	256	6	4
10M	384	384	8	8
19M	512	512	8	8
63M	768	768	12	12

Table 10: Pre-training and finetuning configurations on the 1D time-dependent PDE benchmark.
表 10：1D 时间相关偏微分方程基准测试的预训练和微调配置。

Parameter 参数	Value 价值	Description 描述
Unisolver Pre-training Unisolver 预训练
batch_size 批量大小	1024	Total batchsize used in one iteration 一次迭代中使用的总批量大小
learning_rate 学习率		The initial learning rate for the optimizer 优化器的初始学习率
epochs 世代	500	The total number of training epochs 总训练轮数
loss_type 损失类型	Relative-12 亲戚-12	Use relative L2-Norm for pretraining 在预训练中使用相对 L2 范数
optimizer 优化器	Adam 亚当	The optimization algorithm 优化算法
lr_scheduler 学习率调度器	Cosine Annealing 余弦退火	The learning rate scheduler 学习率调度器

Unisolver Finetuning Unisolver 微调
batch_size 批量大小	256	Total batchsize used in one iteration 一次迭代中使用的总批量大小
learning_rate 学习率	1 e-5 1×10^-5	The initial learning rate for the optimizer 优化器的初始学习率
epochs 世代	100	The total number of training epochs 总训练轮数
loss_type 损失类型	Relative-12 亲戚-12	Use relative L2-Norm for finetuning 使用相对 L2 范数进行微调
optimizer 优化器	Adam 亚当	The optimization algorithm 优化算法
lr_scheduler 学习率调度器	Cosine Annealing 余弦退火	The learning rate scheduler 学习率调度器

PDEformer Finetuning PDEformer 微调
batch_size

Total batchsize used in one iteration
批次大小

每次迭代中使用的总批次大小
learning_rate

The initial learning rate for the optimizer
学习率

优化器的初始学习率
epochs

The total number of training epochs
周期

总训练周期数
loss_type

Relative-12 Use relative L2-Norm for finetuning
损失类型

Relative-12 使用相对 L2 范数进行微调
optimizer Adam The optimization algorithm
优化器 Adam 优化算法
lr_scheduler Cosine Annealing The learning rate scheduler
学习率调度器余弦退火
warmup_epochs

Epochs to linearly increase the learning rate
warmup_epochs

线性增加学习率的轮次

Table 11: Training configurations on the 2D mixed PDE benchmark.
表 11：二维混合偏微分方程基准测试的训练配置。

Parameter 参数	Value 价值	Description 描述
Unisolver Training Configurations Unisolver 培训配置
batch_size 批量大小	320	Total batchsize used in one iteration 单次迭代中使用的总批量大小
learning_rate 学习率		The initial learning rate for the optimizer 优化器的初始学习率
epochs 时期	1000	The total number of training epochs 训练轮数总数
loss_type 损失类型	Relative-12 亲属-12	Use relative L2-Norm for pretraining 使用相对 L2 范数进行预训练
optimizer 优化器	AdamW	The optimization algorithm 优化算法
lr_scheduler 学习率调度器	OneCycle 单周期	The learning rate scheduler 学习率调度器
warmup_epochs 预热轮数	200	Epochs to linearly increase the learning rate 线性增加学习率的周期
Unisolver Model Configurations Unisolver 模型配置
Attention dim 注意力维度	768	The hidden dimension of the transformer attention layer Transformer 注意力层的隐藏维度
MLP dim MLP 维度	768	The hidden dimension of the transformer FFN layer Transformer FFN 层的隐藏维度
patch_size 块大小	8	The height and width of the ViT patches ViT 图像块的高度和宽度
n_head 注意力头数	8	num of attention heads 注意力头的数量
dim_head 头维度	96	The hidden dimension of each attention heads 每个注意力头的隐藏维度
depths 深度	6	The number of Transformer Blocks in the neural network 神经网络中的 Transformer 块数量

E Additional Experiments E 补充实验

E. 1 OOD Viscosity and OOD External Force
E. 1 OOD 粘度和 OOD 外力

In addition to Table 3 and 4, we further compare the generalizability of Unisolver with other baselines on the both OOD viscosity coefficient and OOD external force of the HeterNS benchmark. Specifically, we generate nine different component pairs

, where

and

are both out of the pre-training distribution. The full results are recorded in Table 12
除了表 3 和 4 之外，我们还在 HeterNS 基准测试的分布外粘度系数和分布外外力上进一步比较了 Unisolver 与其他基线的泛化能力。具体而言，我们生成了九个不同的组件对

，其中

和

都在预训练分布之外。完整结果记录在表 12 中。

Table 12: Performance comparison on OOD viscosity

and OOD force frequency coefficients

. The pairs in the first row are in the form of

. For clarity, the best result is in bold.
表 12：对超分布粘度

和超分布力频系数

的性能比较。第一行中的对值采用

的形式。为清晰起见，最佳结果以粗体显示。

L2RE
FNO	0.1862	0.0640	0.1176	0.2404	0.4226	0.0873	0.1516	0.0655	1.3102
Factformer 事实变形金刚	0.2727	0.0735	0.1262	0.3238	0.2868	0.0589	0.1230	0.0495	0.3147
ViT	0.1961	0.0690	0.1075	0.2057	0.2226	0.0488	0.1305	0.0772	0.2276
Unisolver 通用求解器
Promotion 促销

E. 2 Hyperparamter Sensitivity
E. 2 超参数敏感性

As shown in Table E.2 we test the hyperparameter sensitivity of our model by changing the domainwise condition ratio

and keeping the other hyperparameters fixed. We pre-train models with domain-wise condition ratios of

and 0.75 on the 1D time-dependent PDE dataset, which contains 50 k samples, all adhering to periodic boundary conditions, and report the relative L 2 on the pre-training dataset and downstream Burgers dataset with

. In this case, the domain-wise ratio of 0.5 leads to the best results, with the ratio of 0.25 being the worst, which suggests that an appropriate ratio should be chosen in practice.
如表 E.2 所示，我们通过改变域条件比率

并保持其他超参数不变来测试我们模型的超参数敏感性。我们在包含 50k 个样本的一维时间相关偏微分方程数据集上预训练模型，域条件比率为

和 0.75，所有样本都遵循周期性边界条件，并报告预训练数据集和下游伯格斯数据集的相对 L2 误差，其中

。在这种情况下，0.5 的域条件比率导致最佳结果，0.25 的比率最差，这表明在实践中应选择适当的比率。

Table 13: Hyperparameter sensitivity of domain-wise condition ratio.
表 13：领域条件比例的超参数敏感性

L2RE	Domain-wise condition ratio 领域条件比率
Dataset 数据集	0.25	0.50	0.75
PDEformer-50k	0.0331	0.0298	0.0314
Burgers 汉堡	0.1127	0.1040	0.1070

E. 3 Discussion of Ablation
E. 3 消融实验讨论

Why Concat-P is better than ours in pretrain-periodic dataset? Since the periodic boundary does not have the boundary value functions, the model can concentrate on the dominated domain-wise conditions for concatenating the point-wise conditions with model input directly. It is similar to set the domain-wise condition ratio to 1 , which is beneficial for learning the complex relations of domain-wise conditions.
为什么 Concat-P 在预训练周期数据集上比我们的表现更好？由于周期边界没有边界值函数，模型可以直接将点态条件与模型输入连接，从而集中关注主导的域态条件。这类似于将域态条件比率设置为 1，有利于学习域态条件的复杂关系。

E. 4 Full Scalability E. 4 完全可扩展性

As a supplement to Figure 4 in the main text, we also conduct experiments on different downstream tasks and record the concrete data in Table 14 for clarity.
作为对正文图 4 的补充，我们还对不同的下游任务进行了实验，并将具体数据记录在表 14 中以便清晰说明。

Table 14: Scalability results on pre-training dataset and downstream tasks, as depicted in Figure 4
表 14：图 4 所示的预训练数据集和下游任务的可扩展性结果

L2RE 规模

L2RE

Scale

Data Scalability (Samples)
数据可扩展性（样本）

Model Scalability (Parameters)
模型可扩展性（参数）

50k 5 万

100k 10 万

200k 20 万

3 M 3 米

10M

19 M 19 米

63 M 63 米

Pretraining Dataset / Circ
预训练数据集 / 循环

0.0282

0.0241

0.0198

0.0112

0.0400

0.0267

0.0241

0.0183

Pretraining Dataset / Robin
预训练数据集 / 罗宾

0.0181

0.0162

0.0141

0.0099

0.0283

0.0185

0.0162

0.0128

Burgers

汉堡包

0.0161

0.0116

0.0081

0.0051

0.0143

0.0134

0.0116

0.0091

Burgers

汉堡

0.0649

0.0412

0.0260

0.0144

0.0552

0.0421

0.0412

0.0351

Burgers

汉堡包

0.1399

0.1003

0.0689

0.0299

0.1188

0.0976

0.1003

0.0889

E. 5 Detailed Compute Resources
E. 5 详细计算资源

Our models were trained on servers with A100-40G GPUs. Here we present the compute resources needed to reproduce our experiment results in Table 15
我们的模型在配备 A100-40G GPU 的服务器上进行训练。在表 15 中，我们展示了复现我们实验结果所需的计算资源。

Table 15: Compute resources for each of our benchmarks. For the HeterNS dataset, we report the compute resources needed for all the baselines and our model. For the rest two datasets, we report the compute resources needed for pre-training Unisolver only.
表 15：我们每个基准测试的计算资源。对于 HeterNS 数据集，我们报告了所有基线和我们的模型所需的计算资源。对于其余两个数据集，我们仅报告了预训练 Unisolver 所需的计算资源。

基准测试模型

Benchmarks

Models

HeterNS

| FNO | 函数数值算子

Factformer 事实转换器

ViT ViT（视觉变换器）

Unisolver 尤尼索尔弗

A100 GPU hours A100 GPU 小时数

100

3000

800

F Limitations and Future Work
F 局限性和未来工作

This paper presents Unisolver to solve PDEs under universal PDE components, which achieves impressive performance supported by extensive analyses and visualizations. However, our method is currently limited to grid data due to the patchifying process during the embedding of pointwise components. Actually, this limitation is shared in all the generalizable PDE solvers, such as PDEformer and DPOT. One foundation reason is the lack of suitable and large-scale irregular mesh datasets, which will require extremely high computation costs for generation. Since we mainly focus on the foundation model design in this paper, we would like to leave the mesh dataset as a future work. And the capability to handle irregular meshes of Unisolver can be achieved by replacing the canonical Transformer with the latest geometry-general PDE solver: Transolver [44].
本文提出了 Unisolver 来解决通用 PDE 组件下的偏微分方程，通过广泛的分析和可视化支持，取得了令人印象深刻的性能。然而，由于在嵌入点态组件时的分块处理过程，我们的方法目前仅限于网格数据。实际上，这一限制在所有可推广的 PDE 求解器中都存在，如 PDEformer 和 DPOT。一个根本原因是缺乏合适的大规模不规则网格数据集，生成这样的数据集需要极高的计算成本。由于本文主要关注基础模型设计，我们将网格数据集作为未来的工作。通过用最新的几何通用 PDE 求解器 Transolver[44]替换标准 Transformer，Unisolver 处理不规则网格的能力可以实现。

https://gitee.com/mindspore/mindscience/tree/master/MindFlow/applications/ pdeformer1d/

Unisolver: PDE-Conditional Transformers Are Universal PDE Solvers Unisolver：PDE 条件变换器是通用 PDE 求解器

Abstract 摘要

1 Introduction 1 引言

2 Related Work 2 相关工作

2.1 Neural PDE Solvers 2.1 神经偏微分方程求解器

2.2 Generalizable PDE Solvers2.2 通用偏微分方程求解器

3 Unisolver 3 通解器

3.1 Complete PDE Components3.1 完成 PDE 组件

3.2 Universal Components Embedding3.2 通用组件嵌入

3.3 PDE-Conditional Transformer3.3 PDE 条件变换器

4 Experiments 4 个实验

4.1 Heterogeneous 2D Navier-Stokes Equation (HeterNS)4.1 非均质二维纳维-斯托克斯方程（HeterNS）

4.2 1D Time-dependent PDEs4.2 一维时变偏微分方程

4.3 2D Mixed PDEs 4.3 二维混合偏微分方程

4.4 Model Analysis 4.4 模型分析

5 Conclusions 5 个结论

References 参考文献

A More Showcases 更多展示

B Analytical Solution for the String Vibration EquationB 弦振动方程的解析解

C Benchmarks C 语言基准测试

C. 1 HeterNS C. 1 异构网络切片

C. 2 1D Time-dependent PDEsC. 2 一维时间相关偏微分方程

C. 3 2D Mixed PDEsC. 3 二维混合偏微分方程

D Implementation Details D 实现细节

D. 1 Metrics D. 1 指标

D. 2 Implementations for Each BenchmarkD. 每个基准测试 2 个实现

E Additional Experiments E 补充实验

E. 1 OOD Viscosity and OOD External ForceE. 1 OOD 粘度和 OOD 外力

E. 2 Hyperparamter SensitivityE. 2 超参数敏感性

E. 3 Discussion of AblationE. 3 消融实验讨论

E. 4 Full Scalability E. 4 完全可扩展性

E. 5 Detailed Compute ResourcesE. 5 详细计算资源

F Limitations and Future WorkF 局限性和未来工作

Unisolver: PDE-Conditional Transformers Are Universal PDE Solvers
Unisolver：PDE 条件变换器是通用 PDE 求解器

2.2 Generalizable PDE Solvers
2.2 通用偏微分方程求解器

3.1 Complete PDE Components
3.1 完成 PDE 组件

3.2 Universal Components Embedding
3.2 通用组件嵌入

3.3 PDE-Conditional Transformer
3.3 PDE 条件变换器

4.1 Heterogeneous 2D Navier-Stokes Equation (HeterNS)
4.1 非均质二维纳维-斯托克斯方程（HeterNS）

4.2 1D Time-dependent PDEs
4.2 一维时变偏微分方程

B Analytical Solution for the String Vibration Equation
B 弦振动方程的解析解

C. 2 1D Time-dependent PDEs
C. 2 一维时间相关偏微分方程

C. 3 2D Mixed PDEs
C. 3 二维混合偏微分方程

D. 2 Implementations for Each Benchmark
D. 每个基准测试 2 个实现

E. 1 OOD Viscosity and OOD External Force
E. 1 OOD 粘度和 OOD 外力

E. 2 Hyperparamter Sensitivity
E. 2 超参数敏感性

E. 3 Discussion of Ablation
E. 3 消融实验讨论

E. 5 Detailed Compute Resources
E. 5 详细计算资源

F Limitations and Future Work
F 局限性和未来工作