Hang Zhou, Yuezhou Ma, Haixu Wu, Haowen Wang, Mingsheng Long
School of Software, BNRist, Tsinghua University, China
{zhou-h23, mayz20, wuhx23, wang-hw21}@mails.tsinghua.edu.cn, mingsheng@tsinghua.edu.cn
Abstract
Deep models have recently emerged as a promising tool to solve partial differential equations (PDEs), known as neural PDE solvers. While neural solvers trained from either simulation data or physics-informed loss can solve PDEs reasonably well, they are mainly restricted to a specific set of PDEs, e.g. a certain equation or a finite set of coefficients. This bottleneck limits the generalizability of neural solvers, which is widely recognized as their major advantage over numerical solvers. In this paper, we present the Universal PDE solver (Unisolver), capable of solving a wide scope of PDEs by leveraging a Transformer pre-trained on diverse data and conditioned on diverse PDEs. Instead of simply scaling up data and parameters, Unisolver stems from a theoretical analysis of the PDE-solving process. Our key finding is that a PDE solution is fundamentally governed by a series of PDE components, e.g. equation symbols, coefficients, and initial and boundary conditions. Inspired by the mathematical structure of PDEs, we define a complete set of PDE components and correspondingly embed them as domain-wise (e.g. equation symbols) and point-wise (e.g. boundaries) conditions for Transformer PDE solvers. Integrating physical insights with recent Transformer advances, Unisolver achieves consistent state-of-the-art results on three challenging large-scale benchmarks, showing impressive relative gains and favorable generalizability and scalability.
1 Introduction
Partial differential equations (PDEs) are essential for numerous scientific and engineering problems [8, 2], such as meteorology, electromagnetism, and thermodynamics [41]. Since it is usually hard to obtain an analytic solution for PDEs, numerical methods are widely explored [1]. However, these numerical methods often require huge computation costs to generate a precise solution for each PDE. Recently, deep models have empowered many breakthroughs in wide areas [5, 22] and have also been applied to solve PDEs, i.e. neural PDE solvers. Owing to their excellent capability in approximating nonlinear mappings, deep models can learn to fit pre-collected data [19] or a physics-informed loss function [33] and generalize in a flash to new samples, offering an efficient way of solving PDEs.
As shown in Figure 1, previous neural solvers involve two different paradigms: physics-informed neural networks (PINNs) [33] and neural operators [19, 44], which train deep models with a formalized PDE loss function and pre-calculated simulation data, respectively. However, for the first paradigm, while formalizing the PDE equations as objective functions can ensure a relatively accurate solution, PINNs are hard to generalize to new scenarios and have to be retrained for different tasks. As for neural operators, they learn directly from simulation data and can thus generalize better to diverse initial states or conditions than PINNs. Nevertheless, relying purely on training data may be insufficient to guide PDE solving. For example, given a fluid governed by the Navier-Stokes equations, the typical task of neural operators is to predict the future based on past observations [19], while different viscosity coefficients and forcing terms will lead to distinct solutions even under the
Figure 1: Neural PDE solvers typically consist of two paradigms: physics-informed and data-driven. Our proposed Unisolver combines data-driven methods with physical insights from the complete PDE components (equation symbols, PDE coefficients, force term, domain geometry, boundary type, and boundary value) in a conditional modeling framework, thereby boosting generalizability and scalability.
same initial state. Thus, due to the omission of PDE information, current neural operators are mainly trained and tested on a specific set of PDEs. Notably, as neural solvers are expected to be efficient surrogates of numerical methods, generalization to various PDEs is essential for a practical neural solver.
To tackle this generalization deficiency, several works have been proposed that incorporate PDE information into deep models or train models on large-scale datasets. For example, the message-passing neural PDE solver [3] concatenates the PDE coefficients with inputs. PDEformer [45] formalizes the PDE equation as a computation graph and employs a graph Transformer [46] to aggregate PDE information. Although these methods have explored the potential of training models with both data and PDE information, they fail to support a complete set of PDE information, thereby limiting the generalizability of models in some respects. Other approaches, such as DPOT [10], simply scale up the training sets with diverse PDEs and expect generalizability to emerge from large data and parameters. However, without the complete PDE information, it is theoretically impossible to generate a correct solution, because what these models essentially perform is fitting an incomplete data distribution, so satisfactory generalizability can hardly be expected.
Going beyond prior methods, as shown in Figure 1, this paper presents Unisolver as a Universal PDE solver. Concretely, Unisolver takes advantage of both the data-driven and physics-informed paradigms and endows the Transformer with favorable PDE generalizability by introducing complete physics information as conditions. Instead of simply scaling up data and parameters, we start from a theoretical analysis of PDE solving and conclude with a complete set of PDE components. Further, drawing inspiration from the mathematical structure of PDEs, we propose to classify PDE components into domain-wise and point-wise categories according to their effect on the final solution and aggregate them into two types of deep PDE conditions. Afterward, to capture the distinct influence of each condition type on the hidden input features, we separate the Transformer representation into two subspaces and integrate these deep conditions into the Transformer in a decoupled way. We conduct extensive experiments on our own generated dataset and two large-scale benchmarks with various PDE conditions, where Unisolver achieves consistent state-of-the-art results with sharp relative gains. Overall, our contributions are summarized as follows:
We introduce Unisolver, a conditional Transformer architecture that completely utilizes embedded PDE information, marking the first demonstration of the canonical Transformer's potential as a scalable backbone for solving multiple PDEs under diverse conditions.
Motivated by the mathematical structure of PDEs, we classify universal PDE conditions into domain-wise and point-wise categories, which further yields a decoupled conditioning mechanism with tailored designs for introducing PDE conditions into deep representations.
Unisolver achieves consistent state-of-the-art performance across three large-scale benchmarks with impressive relative gains and presents favorable generalizability and scalability.
2 Related Work
2.1 Neural PDE Solvers
Previous neural PDE solvers can be roughly categorized into the following two paradigms. The first paradigm is physics-informed neural networks (PINNs) [33], which optimize the deep model
by formalizing the PDE equations as objective functions. During training, the model outputs and gradients gradually come to satisfy the targeted PDE, thereby successfully instantiating the solution as a deep model. However, PINNs are usually hard to generalize to unseen PDEs, limiting their applications in broader practice [41]. Another booming direction is neural operators, which learn from extensive data to approximate the functional dependence between input and output Banach spaces [25, 15]. Among various neural operators, FNO [19] and its variants [18, 32, 42] are popular and well-established. FNO [19] effectively approximates the kernel integral operator in the frequency domain through the Fourier transform. Besides, LSM [43] generalizes the classical spectral method in latent space to avoid the curse of dimensionality. Recently, given the impressive progress achieved by Transformers [40], they have also been applied to solve PDEs. However, due to the quadratic complexity of the attention mechanism, the main challenge in applying Transformers is defining the tokens used to compute attention in PDE-solving tasks. OFormer [16] and GNOT [11] treat each mesh point as a token and utilize linear Transformers to avert the complexity problem. Factformer [17] axially factorizes the attention block to boost model efficiency. Recently, Transolver [44] proposes to learn the intrinsic physical states behind the input meshes as tokens. Despite the success of neural operators, they are only tested on datasets that contain one specific PDE. The effectiveness of these methods on large and diverse datasets has not been fully explored.
2.2 Generalizable PDE Solvers
In addition to model architectures, the generalizability of neural solvers, the major advantage over numerical solvers, has also been explored. The research mainly lies in the following two directions.
Incorporating PDE information To guide the PDE-solving process, PDE information has been introduced into deep models. For example, PINO [20] imposes explicit equation constraints at a higher resolution to assist the learning of neural operators. CAPE [36] embeds PDE coefficients to adapt neural solvers to unseen PDE parameters. Prose [21] and PITT [23] tokenize PDEs and embed the mathematical expressions to make the Transformer backbone aware of the underlying physics. PDEformer [45] represents the symbolic form of equations as a graph and the numeric components as nodes to handle the complex interactions between symbolic and numeric information. Still, while incorporating equation information, none of these methods leverages the mathematical structure of PDEs for a complete and categorized embedding, nor do they integrate the meanings of equation symbols within the context of natural language. In contrast, Unisolver applies powerful Large Language Models (LLMs) [38] to semantically embed the symbolic equation information and categorizes the complete equation conditions based on mathematical insights, thereby better modeling intricate physical correlations.
Large-scale pre-training As a vital cornerstone of deep learning [4, 12], recent research has also started to explore the effectiveness of large-scale training in solving PDEs. In [35], the authors examined the scaling capabilities and transfer learning behaviors of FNO on three time-independent equation families. MPP [27] proposes an auto-regressive strategy to pre-train on a broad fluid-mechanics-oriented benchmark. DPOT [10] enhances MPP with a denoising method and pre-trains a Fourier transformer on massive PDE data comprising 12 datasets. PDEformer [45] focuses on a 1D time-dependent PDE family and pre-trains a graph transformer on 3M samples under various equation conditions. However, most existing methods fall short of effectively and completely integrating PDE conditions. This is well addressed by Unisolver in a natural and insightful way.
3 Unisolver
To tackle the generalization deficiency of neural PDE solvers, we dive deeply into the PDE-solving process and present Unisolver to model the intricate interactions between initial observations and complete equation components, leading to a novel PDE-conditional Transformer model.
Problem setup Let the domain $\Omega \subset \mathbb{R}^d$ be a bounded continuous set and $\mathbb{D} = \{\mathbf{x}_i\}_{i=1}^{N} \subset \Omega$ be an $N$-point discretization of $\Omega$, recording the coordinate information of each point. Assume that we have observations of initial conditions $u_0$ as input and target quantities $u_T$ as output on mesh $\mathbb{D}$, with governing PDE components $\mathcal{C}$ (e.g. equation symbols, coefficients, etc.) for each observation pair. The PDE-solving task is to approximate the input-PDE-output mapping $\mathcal{G}: (u_0, \mathbb{D}, \mathcal{C}) \mapsto u_T$. For instance, in fluid prediction, we need to predict the future vorticity field based on the initial field, observation grid, as well as PDE equations, viscosity coefficients, and force terms.
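To make the setup concrete, the sketch below spells out this mapping as a solver interface; all names, field types, and the component encodings are illustrative assumptions, not the paper's actual code.

```python
# A minimal sketch of the input-PDE-output mapping G: (u_0, mesh, C) -> u_T.
from dataclasses import dataclass
import torch

@dataclass
class PDEComponents:
    equation_latex: str           # equation symbols, e.g. a LaTeX string
    coefficients: torch.Tensor    # real-valued vector, e.g. viscosity
    force: torch.Tensor           # external force field sampled on the mesh
    geometry_mask: torch.Tensor   # binary mask describing the domain geometry
    boundary_type: int            # e.g. 0 = periodic, 1 = Robin
    boundary_value: torch.Tensor  # boundary values, zero at non-boundary points

def solve(u0: torch.Tensor, mesh: torch.Tensor, c: PDEComponents) -> torch.Tensor:
    """Approximate u_T = G(u0, mesh, c); a neural solver learns this mapping."""
    raise NotImplementedError
```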
3.1 Complete PDE Components
To enable complete modeling of the PDE, we attempt to incorporate all the underlying components that affect the final solution into neural solvers. Here, we elucidate the complete PDE components by considering the classical vibrating string equation with fixed endpoints as a motivating example:

$$u_{tt} - a^2 u_{xx} = f(x, t), \quad (x, t) \in (0, l) \times (0, +\infty), \tag{1a}$$
$$u(0, t) = 0, \quad u(l, t) = 0, \quad t \in [0, +\infty), \tag{1b}$$
$$u(x, 0) = \varphi(x), \quad u_t(x, 0) = \psi(x), \quad x \in [0, l]. \tag{1c}$$
The equation formulation precisely models the physical laws governing string vibrations. In this PDE, the coefficient $a$ is determined by physical properties, such as tension and linear density, while $f(x, t)$ represents the external force that drives the vibrations of the string. The domain geometry spans the range $[0, l]$, which affects the mesh discretization in numerical solving, especially for irregular geometries. In addition, Equation (1b) sets Dirichlet boundary conditions at the endpoints, maintaining zero displacement throughout the vibration period. Equation (1c) specifies the string's position $\varphi(x)$ and velocity $\psi(x)$ at time zero. These complete equation components theoretically determine a unique solution. In the following theorem, we dive into the PDE-solving process and explore how the equation components affect the final solution.
Theorem 3.1 (The connections between PDE solutions and complete PDE components). The analytical solution of equation (1a) with boundary conditions (1b) and initial conditions (1c) is

$$u(x, t) = \frac{\Phi(x + at) + \Phi(x - at)}{2} + \frac{1}{2a} \int_{x - at}^{x + at} \Psi(\xi)\, \mathrm{d}\xi + \frac{1}{2a} \int_{0}^{t} \int_{x - a(t - \tau)}^{x + a(t - \tau)} F(\xi, \tau)\, \mathrm{d}\xi\, \mathrm{d}\tau,$$

where $\Phi$, $\Psi$, and $F$ are odd, periodic functions with period $2l$ defined on the upper half plane, extended from $\varphi$, $\psi$, and $f$. The boundary conditions become explicit by extending the equation to the upper half plane and solving it via operator splitting and characteristic lines.
Proof. See Appendix B for the explicit solution and the complete proof.
From the analytical solution of this simple motivating example, we pinpoint that the PDE is solved under the complex interactions of a series of equation components. Specifically, the coefficient $a$ exerts a consistent influence over the whole domain, while the impact of the external force $f(x, t)$ is imposed point-wise. This distinction inspires us to classify the PDE components into two categories, domain-wise and point-wise, which better captures the intricate interactions.
As presented in Table 1, the equation formulation is a domain-wise component due to its consistency across all locations. Domain geometry is a point-wise component, since it is represented by a binary mask and each point's inclusion is determined individually. Boundary conditions are slightly more complex to handle. Commonly used boundary conditions are divided into two main types: periodic boundary conditions and Robin (non-periodic) conditions. We classify the boundary condition type as a domain-wise component and the specific boundary value functions, with zero values at non-boundary points, as point-wise components.
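Summarizing this classification, a hypothetical grouping of the components in Table 1 would look like the following; all values are illustrative placeholders.

```python
import torch

# Domain-wise components: one value governs the entire domain.
domain_wise = {
    "equation_symbols": r"\partial_{tt} u - a^2 \partial_{xx} u = f(x,t)",
    "coefficients": torch.tensor([2.0]),    # e.g. a, consistent over all locations
    "boundary_type": "dirichlet",           # a domain-level label (periodic vs. Robin)
}
# Point-wise components: physical fields whose influence is imposed per point.
point_wise = {
    "force": torch.zeros(64, 64),           # external force f, applied point-wise
    "geometry_mask": torch.ones(64, 64),    # per-point inclusion in the domain
    "boundary_value": torch.zeros(64, 64),  # zero at non-boundary points
}
```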
3.2 Universal Components Embedding
As described in Section 3.1, PDE solutions are obtained through intricate interactions between initial conditions and multiple equation components classified into two categories. In previous works, these components were only coarsely and incompletely included as conditions, by using them to modulate the input observations. In this paper, we elaborate on how Unisolver finely and completely embeds all equation components (Table 1) into deep PDE conditions based on the insights from our analysis.
Equation formulation Since the mathematical symbols convey rich mathematical information, we utilize a Large Language Model (LLM) for symbolic embedding. Specifically, we employ the
Figure 2: Overview of Unisolver. We universally embed all PDE components into deep conditions and employ a conditional Transformer to aggregate deep conditions in the decoupled subspace.
recently released LLaMA-3 8B model (https://ai.meta.com/blog/meta-llama-3/) to embed the equation formulation, leveraging the symbolic understanding ability it acquired from pre-training on 15T language tokens. The input to the LLM is the LaTeX code of the equation formulation. For example, the string vibration equation is prompted as
Prompt: "\partial_{tt} u - a^2 \partial_{xx} u = f(x,t)"
Then we take the output of the last Transformer block of the LLM and compute the mean across the sequence dimension, resulting in a 4096-dimensional embedding for each equation. Since we embed the complete equation components separately, we represent the other components, e.g. coefficients and external force, as symbols in the prompt rather than using their corresponding values. After LLM embedding, the hidden formulation is encoded by a two-layer MLP to obtain deep conditions.
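A sketch of this embedding step, assuming a HuggingFace-style interface to the LLM; the checkpoint name, MLP widths, and activation are placeholders rather than the paper's released configuration.

```python
import torch
from torch import nn
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")
llm = AutoModel.from_pretrained("meta-llama/Meta-Llama-3-8B")

prompt = r"\partial_{tt} u - a^2 \partial_{xx} u = f(x,t)"  # LaTeX of the equation
with torch.no_grad():
    tokens = tokenizer(prompt, return_tensors="pt")
    hidden = llm(**tokens).last_hidden_state  # last-block output: (1, seq_len, 4096)
    symbol_emb = hidden.mean(dim=1)           # mean over the sequence: (1, 4096)

# A two-layer MLP encodes the frozen LLM embedding into a deep condition.
to_condition = nn.Sequential(nn.Linear(4096, 512), nn.SiLU(), nn.Linear(512, 512))
c_symbols = to_condition(symbol_emb)          # (1, 512)
```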
Other components Other domain-wise components encompass coefficients, which are real-valued vectors, and boundary types, which are analogous to class labels. We thus embed them using two MLP layers with an in-between SiLU activation function [7]. Moreover, point-wise components include the external force, the binary geometry mask, and the boundary value functions, which are physical fields observed on mesh $\mathbb{D}$. We employ the same patchify method used for the input observations, transforming them into token sequences and linearly mapping each token to a deep representation.
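A minimal sketch of these embedders follows; the hidden width, patch size, coefficient dimensionality, and the one-hot encoding of boundary types are assumptions.

```python
import torch
from torch import nn

hidden_dim, patch = 512, 8

# Domain-wise: coefficients (a real-valued vector) and boundary type (a class
# label, here one-hot encoded), each through two MLP layers with SiLU in between.
coef_embed = nn.Sequential(nn.Linear(2, hidden_dim), nn.SiLU(), nn.Linear(hidden_dim, hidden_dim))
btype_embed = nn.Sequential(nn.Linear(2, hidden_dim), nn.SiLU(), nn.Linear(hidden_dim, hidden_dim))
c_coef = coef_embed(torch.tensor([[1e-5, 0.5]]))  # e.g. (viscosity, force frequency)
c_btype = btype_embed(torch.nn.functional.one_hot(torch.tensor([0]), 2).float())

# Point-wise: force, geometry mask, and boundary values are physical fields,
# patchified like the input observations and linearly mapped token by token.
field_patchify = nn.Conv2d(1, hidden_dim, kernel_size=patch, stride=patch)
force_tokens = field_patchify(torch.zeros(1, 1, 64, 64))  # (1, hidden_dim, 8, 8)
force_tokens = force_tokens.flatten(2).transpose(1, 2)    # (1, 64 tokens, hidden_dim)
```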
Deep condition consolidation After universal components embedding, deep conditions within the same category are aggregated to consolidate their influence. This strategy is advantageous because it prevents an excessive separation of component conditions that could weaken the model's expressive capabilities, and thus enhances representation learning for PDE solving via pre-training. Additionally, it allows easy adaptation to new conditions in downstream tasks, requiring only the fine-tuning of a zero-initialized MLP.
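The zero-initialization trick can be sketched as follows: because the adapter's last layer starts at zero, the consolidated condition, and hence the pre-trained solver, is initially unchanged; layer sizes here are illustrative.

```python
from torch import nn

def zero_init_mlp(in_dim: int, out_dim: int, hidden: int = 512) -> nn.Sequential:
    """Adapter for a new downstream condition; outputs exactly zero at init."""
    mlp = nn.Sequential(nn.Linear(in_dim, hidden), nn.SiLU(), nn.Linear(hidden, out_dim))
    nn.init.zeros_(mlp[-1].weight)
    nn.init.zeros_(mlp[-1].bias)
    return mlp

# The new condition is then added into the aggregated condition of its category,
# e.g. c_domain = c_domain + zero_init_mlp(k, d)(new_component).
```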
3.3 PDE-Conditional Transformer
We propose a conditional Transformer to adaptively fuse these deep conditions embedded from the equation components with projections of the input observations in decoupled subspaces.
Subspace decoupling We split the hidden inputs of dimension $C$ along the feature dimension, guided by a hyperparameter, the domain-wise condition ratio $\alpha$, which is set to 0.5 by default. This ratio determines the subspace for domain-wise conditions with dimension $\lfloor \alpha C \rfloor$, while the remaining subspace of dimension $C - \lfloor \alpha C \rfloor$ is used by point-wise conditions. When combined with multi-head attention [40], subspace decoupling is equivalent to assigning some heads to learn the impact of domain-wise conditions while the others focus on point-wise conditions. This leads to improved representation learning for both categories and minimized interference across the subspaces.
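In code, the decoupling is a plain split of the channel dimension; shapes below are illustrative.

```python
import torch

alpha, C = 0.5, 512                  # domain-wise condition ratio and hidden width
C_domain = int(alpha * C)            # channels modulated by domain-wise conditions

x = torch.randn(2, 64, C)            # hidden inputs: (batch, tokens, channels)
x_domain, x_point = x.split([C_domain, C - C_domain], dim=-1)
# With multi-head attention, this is equivalent to dedicating some heads to
# domain-wise conditions and the remaining heads to point-wise conditions.
```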
Deep condition aggregation We utilize MLPs to individually project domain-wise conditions and point-wise conditions into their corresponding subspaces. After projection, domain-wise conditions are
repeated along the sequence dimension to match the length of the hidden tokens, ensuring consistent physical guidance along the entire sequence. The transformed conditions convey both token-wise and feature-wise information, which is then integrated adaptively by aggregation functions.
As shown in Figure 2, we aggregate conditions either before or after the attention and feed-forward modules. Inspired by recent conditional Transformer advances such as DiT [29] and other conditional normalization approaches [28, 30], we adopt this aggregation paradigm to finely capture the intricate correlations between the hidden inputs of initial observations and the deep equation conditions. Specifically, we scale and shift the hidden inputs based on the conditions. After passing through the Transformer modules, we use the conditions to softly select whether the information should be retained.
Overall design We embed the input of initial observations $u_0$ using a patchification layer similar to ViT [6] to obtain the input embeddings $\mathbf{x}^0$, and embed the complete PDE equation components following Section 3.2 to obtain the deep conditions $\mathbf{c}_{\mathrm{domain}}$ and $\mathbf{c}_{\mathrm{point}}$. Suppose there are $L$ layers in the PDE-conditional Transformer architecture; the $l$-th layer of Unisolver can be formalized as follows:

$$\hat{\mathbf{x}}^{l} = \mathbf{x}^{l-1} + \mathrm{Select}_1 \odot \mathrm{Attention}\big(\mathrm{Scale}_1 \odot \mathrm{LayerNorm}(\mathbf{x}^{l-1}) + \mathrm{Shift}_1\big),$$
$$\mathbf{x}^{l} = \hat{\mathbf{x}}^{l} + \mathrm{Select}_2 \odot \mathrm{FeedForward}\big(\mathrm{Scale}_2 \odot \mathrm{LayerNorm}(\hat{\mathbf{x}}^{l}) + \mathrm{Shift}_2\big),$$
where the modulation vectors $\{\mathrm{Scale}_i, \mathrm{Shift}_i, \mathrm{Select}_i\}_{i=1,2}$ are projected from the deep conditions $\mathbf{c}_{\mathrm{domain}}$ and $\mathbf{c}_{\mathrm{point}}$ in their decoupled subspaces, and $\mathbf{x}^{l}$ is the output of the $l$-th layer. As the PDE components have a crucial impact on the scale of the output, we apply conditional scale and shift operations before a linear projection on $\mathbf{x}^{L}$ to obtain the final output, as shown in Figure 2.
While our model takes similar aggregation mechanisms as DiT, it includes several PDE-inspired differences. First, we do not use a diffusion model because PDE solving demands accuracy more than diversity. Second, our conditions are drawn from equation components with physical insights, unlike the timesteps and text descriptions typically used in generative models [13]. Third, our conditions are not confined to scalar values; instead, we decouple the feature space and leverage point-wise conditions special to PDEs to learn interactions between initial observations and PDE components. 虽然我们的模型采用了与 DiT 类似的聚合机制,但它包含了几个受偏微分方程启发的差异。首先,我们不使用扩散模型,因为偏微分方程求解更需要准确性而非多样性。其次,我们的条件来自具有物理洞察力的方程组成部分,不同于通常用于生成模型的时间步长和文本描述[13]。第三,我们的条件不局限于标量值;相反,我们解耦特征空间,并利用特定于偏微分方程的点态条件来学习初始观测和偏微分方程组成部分之间的相互作用。
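Putting the pieces together, below is a sketch of one PDE-conditional Transformer layer under the formalization above; the MLP projections, normalization placement, and shapes are assumptions rather than the paper's released implementation.

```python
import torch
from torch import nn

class UnisolverBlock(nn.Module):
    """One PDE-conditional layer: scale/shift before and select after the
    attention and feed-forward modules, decoupled across two subspaces."""

    def __init__(self, dim: int, heads: int = 8, alpha: float = 0.5):
        super().__init__()
        self.split = (int(alpha * dim), dim - int(alpha * dim))
        self.norm1, self.norm2 = nn.LayerNorm(dim), nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        # Each condition category yields 6 modulation vectors for its subspace:
        # (scale, shift, select) for attention and for the feed-forward module.
        self.mod_domain = nn.Linear(dim, 6 * self.split[0])
        self.mod_point = nn.Linear(dim, 6 * self.split[1])

    def _modulate(self, c_domain, c_point):
        m_do = self.mod_domain(c_domain).chunk(6, dim=-1)  # each (B, 1, C_do)
        m_po = self.mod_point(c_point).chunk(6, dim=-1)    # each (B, N, C_po)
        n = m_po[0].shape[1]
        # Repeat domain-wise parts along the sequence, then rejoin the subspaces.
        return [torch.cat([d.expand(-1, n, -1), p], dim=-1) for d, p in zip(m_do, m_po)]

    def forward(self, x, c_domain, c_point):
        scale1, shift1, select1, scale2, shift2, select2 = self._modulate(c_domain, c_point)
        h = scale1 * self.norm1(x) + shift1       # condition before attention
        x = x + select1 * self.attn(h, h, h)[0]   # softly select the attention output
        h = scale2 * self.norm2(x) + shift2       # condition before feed-forward
        x = x + select2 * self.ffn(h)
        return x

block = UnisolverBlock(dim=512)
x = torch.randn(2, 64, 512)    # patchified initial observations
c_do = torch.randn(2, 1, 512)  # consolidated domain-wise condition (one per sample)
c_po = torch.randn(2, 64, 512) # consolidated point-wise condition (one per token)
out = block(x, c_do, c_po)     # (2, 64, 512)
```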
4 Experiments
We conduct extensive experiments to evaluate Unisolver on a challenging, self-generated Heterogeneous 2D Navier-Stokes Equations (HeterNS) dataset and two large-scale benchmarks proposed by PDEformer [45] and DPOT [10], covering a broad range of PDEs and diverse generalization scenarios.
Benchmarks As summarized in Table 2, our experiments involve three large-scale datasets and numerous equation conditions in both 1D and 2D spaces. Specifically, we generate the HeterNS dataset as an extension of the NS dataset from FNO [19], covering multiple coefficients and force terms. PDEformer proposes a large-scale dataset with 3M samples of structured 1D PDEs and evaluates the pre-trained model on PDEBench [37]. DPOT collects 12 datasets from FNO [19], PDEBench [37], PDEArena [9] and CFDBench [26] for diversity. More details can be found in Appendix C.
Table 2: Summary of benchmarks. ✓ indicates that the PDE component changes among different samples, while ✗ refers to unchanged ones. Unisolver can handle all kinds of changed components.
Benchmarks     | #Dim    | #Resolution | #Samples | #Size  | Symbols | Coefficient | Force | Geometry | Boundary
HeterNS (ours) | 2D+Time | (64, 64)    | 15k      | 4.6 GB | ✗       | ✓           | ✓     | ✗        | ✗
PDEformer [45] | 1D+Time | (256, 100)  | 3M       | 300 GB | ✓       | ✓           | ✓     | ✗        | ✓
DPOT [10]      | 2D+Time | Multiple    | 74.1k    | 384 GB | ✓       | ✓           | ✓     | ✓        | ✓
Baselines We compare Unisolver with three advanced baselines on the HeterNS dataset to demonstrate its generalizability under varied conditions: the classical ViT [6], FNO [19], and the current state-of-the-art neural operator Factformer [17]. To ensure a fair comparison, we also provide these baselines with the additional PDE components used in Unisolver by concatenating them with the inputs. We further compare Unisolver with other generalizable solvers, PDEformer [45] and DPOT [10], on pre-training loss and in zero-shot and fine-tuning settings.
Table 3: Performance comparison on HeterNS with different viscosity coefficients and a fixed force frequency coefficient. Relative L2 is recorded. Promotion refers to the relative improvement over the second-best method. "ID" means the in-distribution case and "OOD" means out-of-distribution.
[Table 3 body: the per-cell alignment of the ID/OOD viscosity columns is not recoverable from the source (only the viscosity header 8e-6 survived); the surviving entries are listed in source order.]

Model      | Params | Relative L2 values
FNO        | 4.7M   | 0.0702, 0.0669, 0.0373, 0.0225, 0.0141, 0.0114, 0.0088, 0.0031, 0.0011, 0.2057
Factformer | 9.4M   | 0.0489, 0.0438, 0.0489, 0.0128, 0.0297, 0.0064, 0.1386, 0.0018, 0.0631, 0.0010, 0.3207
ViT        | 4.8M   | 0.0458, 0.0432, 0.0353, 0.0206, 0.0119, 0.0098, 0.0100, 0.0031, 0.0174, 0.0015, 0.1878
Unisolver  | 4.1M   | 0.0096 (only recovered value)
Table 4: Comparison (relative L2) on HeterNS with varied force terms and a fixed viscosity coefficient.