Learning Deep MRI Reconstruction Models from Scratch in Low-Data Regimes

Salman Ul Hassan Dar, Şaban Öztürk, Muzaffer Özbey, and Tolga Çukur

This work was supported in part by a TUBA GEBIP 2015 fellowship, by a BAGEP 2017 fellowship, and by a TUBITAK 121E488 grant awarded to T. Çukur. S. UH. Dar, Ş. Öztürk, M. Özbey, and T. Çukur are with the Department of Electrical and Electronics Engineering, and the National Magnetic Resonance Research Center, Bilkent University, Ankara, Turkey (e-mails: {salman,muzaffer,cukur}@ee.bilkent.edu.tr, saban.ozturk@amasya.edu.tr). Ş. Öztürk is also with Amasya University, Amasya, Turkey.

Abstract

Magnetic resonance imaging (MRI) is an essential diagnostic tool that suffers from prolonged scan times. Reconstruction methods can alleviate this limitation by recovering clinically usable images from accelerated acquisitions. In particular, learning-based methods promise performance leaps by employing deep neural networks as data-driven priors. A powerful approach uses scan-specific (SS) priors that leverage information regarding the underlying physical signal model for reconstruction. SS priors are learned on each individual test scan without the need for a training dataset, although they suffer from computationally burdensome inference with nonlinear networks. An alternative approach uses scan-general (SG) priors that instead leverage information regarding the latent features of MRI images for reconstruction. SG priors are frozen at test time for efficiency, although they require learning from a large training dataset. Here, we introduce a novel parallel-stream fusion model (PSFNet) that synergistically fuses SS and SG priors for performant MRI reconstruction in low-data regimes, while maintaining inference times competitive with SG methods. PSFNet implements its SG prior based on a nonlinear network, yet it forms its SS prior based on a linear network to maintain efficiency. A pervasive framework for combining multiple priors in MRI reconstruction is algorithmic unrolling that uses serially alternated projections, causing error propagation under low-data regimes. To alleviate error propagation, PSFNet combines its SS and SG priors via a novel parallel-stream architecture with learnable fusion parameters. Demonstrations are performed on multi-coil brain MRI for varying amounts of training data. PSFNet outperforms SG methods in low-data regimes, and surpasses SS methods with a few tens of training samples. In both supervised and unsupervised setups, PSFNet requires an order of magnitude fewer samples than SG methods, and enables an order of magnitude faster inference than SS methods. Thus, the proposed model improves deep MRI reconstruction with elevated learning and computational efficiency.

Index Terms

image reconstruction, deep learning, scan specific, scan general, low data, supervised, unsupervised.

1 Introduction

The unparalleled soft-tissue contrast and non-invasiveness of MRI render it a preferred modality in many diagnostic applications [1, 2], and downstream imaging tasks such as classification [3] and segmentation [4, 5]. However, the adverse effects of low spin polarization at mainstream field strengths on the signal-to-noise ratio make it slower than alternative modalities such as CT [6]. Since long scan durations inevitably constrain clinical utility, there is an ever-growing interest in accelerated MRI methods to improve scan efficiency. Accelerated MRI involves an ill-posed inverse problem with the aim of mapping undersampled acquisitions in k-space to high-quality images corresponding to fully-sampled acquisitions. Conventional frameworks for solving this problem rely on parallel imaging (PI) capabilities of receive coil arrays [7, 8], in conjunction with hand-constructed MRI priors [9, 10]. A joint objective is iteratively optimized comprising a data-consistency (DC) term based on the physical signal model, and a regularization term that enforces the MRI prior [9]. The physical model constrains reconstructed data to be consistent with acquired data while considering coil sensitivities and undersampling patterns [11]. Meanwhile, the regularization term, often based on a linear transform where data are assumed to be compressible [9], introduces suboptimality when the distribution of MRI data diverges from the hand-constructed prior.

Deep learning (DL) methods have recently been adopted as a promising framework to improve reconstruction performance [12, 13, 14, 15, 16]. Inspired by traditional methods, a powerful approach is based on scan-specific (SS) priors that leverage the physical signal model to learn a reconstruction specific to each test scan, i.e. undersampled k-space data from a given test subject. Similar to autocalibration procedures in PI, a first group of SS methods perform training using a fully-sampled calibration region and then exercise learned dependencies in broader k-space [17, 15, 18, 16]. Following the deep image prior technique, a second group of methods use unconditional CNNs as a native MRI prior [19, 20, 21]. These CNNs map low-dimensional latent variables onto MR images, and latents and network weights are optimized to ensure consistency to acquired data based on the physical signal model. In general, SS priors learned on each subject at test time avoid the need for separate training datasets, and promise improved reliability against atypical anatomy. However, they suffer from long inference times that can be prohibitive particularly when nonlinear networks are adopted [22, 23, 24].

A fundamental alternative is to employ scan-general (SG) priors based on deep nonlinear networks that capture latent features of MR images [12, 13, 14, 25, 26, 27, 28, 29, 30, 31, 32, 33]. Numerous successful architectures have been reported including perceptrons [34], basic convolutional neural networks (CNNs) [35, 36, 37, 38], residual or recurrent CNNs [29, 39, 40, 41], generative adversarial networks (GANs) [42, 43, 44, 45, 46], transformers [47, 48] and diffusion models [49, 50]. Particular attention has been given to physics-guided unrolled methods, which combine the physical signal model as in traditional frameworks with regularization via a deep network serving as an SG prior [27, 13, 51, 52, 53]. Reconstruction is achieved via serially alternated projections through the physical signal model and the SG prior [38, 40, 54, 55, 56]. However, under low-data regimes, the suboptimally trained SG prior introduces errors that are propagated across the unrolled architecture, compromising performance [6, 57, 58]. Furthermore, learning of SG priors requires large training datasets from several tens to hundreds of subjects [28, 59, 60], which can limit practicality.

Here, we propose a novel parallel-stream fusion model (PSFNet) that consolidates SS and SG priors to enable data-efficient training and computation-efficient inference in deep MRI reconstruction (see [61] for a preliminary version of this work presented at ISMRM 2021). PSFNet leverages an SS stream to perform linear reconstruction based on the physical signal model, and an SG stream to perform nonlinear reconstruction based on a deep network. Unlike conventional unrolled methods based on serial projections, here we propose a parallel-stream architecture with learnable fusion of SS and SG priors. Fusion parameters are adapted across cascades and training iterations to emphasize task-critical information. Comprehensive experiments on brain MRI datasets are reported to demonstrate PSFNet under both supervised and unsupervised settings [62, 63, 64, 65, 66]. PSFNet is compared against an unrolled SG method [27], two SS methods [17, 67], and conventional SPIRiT reconstructions [11]. Compared to the unrolled model, PSFNet lowers training data requirements by an order of magnitude. Compared to SS models, PSFNet offers significantly faster inference times. Our main contributions are summarized below:

  • A novel cascaded network architecture is introduced that adaptively fuses SS and SG priors across cascades and training iterations to improve learning-based MRI reconstruction in low-data regimes.

  • The SS prior facilitates learning of the SG prior with limited data, and empowers PSFNet to successfully generalize to out-of-domain samples.

  • The SG prior improves performance by capturing nonlinear residuals, and enhances resilience against suboptimal hyperparameter selection in the SS component.

  • Parallel-stream fusion of SS and SG priors yields robust performance with limited training data in both supervised and unsupervised settings.

2 Theory

2.1 Image Reconstruction in Accelerated MRI

MRI reconstruction is an inverse problem that aims to recover an image from a respective undersampled acquisition:

$$MFx=y \tag{1}$$

where $F$ is the Fourier transform, $M$ is the sampling mask defining acquired k-space locations, $x$ is the multi-coil image to be reconstructed and $y$ are acquired multi-coil k-space data. To improve problem conditioning, additional prior information regarding the expected distribution of MR images is incorporated in the form of a regularization term:

$$\hat{x}=\underset{x}{\arg\min}\;\lambda\|MFx-y\|_{2}^{2}+R(x) \tag{2}$$

where the first term enforces DC between reconstructed and acquired k-space data, $R(x)$ reflects the MRI prior, and $\lambda$ controls the balance between the DC and regularization terms.

The DC term can be implemented by injecting the acquired values of k-space data into the reconstruction [13]. Thus, mapping through a DC block is given as:

$$f_{DC}(x)=F^{-1}\Lambda Fx+\dfrac{\lambda}{1+\lambda}F^{-1}y \tag{3}$$

where $\Lambda$ is a diagonal matrix with diagonal entries set to $\frac{1}{1+\lambda}$ at acquired k-space locations and set to 1 in unacquired locations.
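As an illustration of Eq. 3, the DC mapping can be sketched in NumPy as below. This is a minimal sketch rather than the authors' implementation: the function name `dc_block`, the orthonormal FFT scaling, and the convention that `y` is zero at unacquired locations are assumptions made for the example.

```python
import numpy as np

def dc_block(x, y, mask, lam=np.inf):
    """Sketch of Eq. 3 for images x of shape (..., H, W).

    y    : acquired k-space data, assumed zero at unacquired locations
    mask : boolean (H, W) array, True at acquired k-space locations
    lam  : DC weight; lam = inf restores acquired samples exactly
    """
    k = np.fft.fft2(x, norm="ortho")  # F x over the last two axes
    # lambda / (1 + lambda), with the limit 1 for lambda -> infinity
    w_acq = 1.0 if np.isinf(lam) else lam / (1.0 + lam)
    # Lambda scales acquired entries by 1 / (1 + lambda) = 1 - w_acq
    k_out = np.where(mask, (1.0 - w_acq) * k + w_acq * y, k)
    return np.fft.ifft2(k_out, norm="ortho")
```

With `lam=np.inf` the block simply overwrites the reconstructed k-space with the acquired samples wherever they exist, which corresponds to strict data consistency.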

In traditional methods, the regularization term is based on a hand-constructed transform domain where data are assumed to have a sparse representation [9]. For improved conformation to the distribution of MRI data, recent frameworks instead adopt deep network models to capture either SG priors learned from a large MRI database with hundreds of subjects, or SS priors learned from individual test scans. Learning procedures for the two types of priors are discussed below.

SG priors: In MRI, SG priors are typically adopted to suppress aliasing artifacts in the zero-filled reconstruction (i.e., inverse Fourier transform) of undersampled k-space acquisitions [27]. A deep network model that performs de-aliasing can be learned from a large training dataset of undersampled and corresponding fully-sampled k-space acquisitions, and then employed to implement $R(\cdot)$ in Eq. 2 during inference. The regularization term based on an SG prior is given as:

$$R_{SG}(x)=\underset{x}{\arg\min}\;\|C_{SG}(F^{-1}y;\hat{\theta}_{SG})-x\|^{2}_{2} \tag{4}$$

where $C_{SG}$ is an image-domain deep network with learned parameters $\hat{\theta}_{SG}$. The formulation in Eq. 4 assumes that $C_{SG}$ recovers multi-coil output images provided multi-coil input images. The parameters $\theta_{SG}$ for $C_{SG}$ can be learned based on a pixel-wise loss between reconstructed and ground-truth images. Training is conducted offline via an empirical risk minimization approach based on Monte Carlo sampling [13]:

$$\mathcal{L}_{SG}(\theta_{SG})=\sum_{n=1}^{N}\|C_{SG}(F^{-1}y^{n};\theta_{SG})-\breve{x}^{n}\|_{p} \tag{5}$$

where $N$ is the number of training scans, $n$ is the training scan index, $\|\cdot\|_{p}$ denotes the $\ell_{p}$ norm, $\breve{x}^{n}$ is the ground-truth multi-coil image derived from the fully-sampled acquisition for the $n$th scan, and $y^{n}$ are the respective undersampled k-space data.

A common approach to build $C_{SG}$ is based on unrolled architectures that perform cascaded projections through CNN blocks to regularize the image and DC blocks to ensure conformance to the physical signal model [27]. Given a total of $K$ cascades with tied CNN parameters across cascades, the mapping through the $k$th cascade is [13, 68, 69]:

$$x^{r}_{k}=f_{DC}\left(f_{SG}\left(x^{r}_{k-1};\theta_{SG}\right)\right) \tag{6}$$

where $x_{k}^{r}$ is the image for the $r$th scan (that could be a training or test scan) at the output of the $k$th cascade ($k\in[1,2,\ldots,K]$), and $x_{0}^{r}=F^{-1}y^{r}$ where $y^{r}$ are the acquired undersampled data for the $r$th scan. Meanwhile, $f_{SG}$ is the CNN block embedded in the $k$th cascade with parameters $\theta_{SG}$.

As the parameters of SG priors are trained offline and then frozen during inference, deeper network architectures can be used for enhanced reconstruction performance along with fast inference. However, learning deep networks requires substantial training datasets that may be difficult to collect. Moreover, since SG priors learn aggregate representations of MRI data across training subjects, they may show poor generalization to subject-specific variability in anatomy [19].

SS priors: Unlike SG priors, SS priors are not learned from a dedicated training dataset but instead they are learned directly for individual test scans to improve generalization [15]. The SS prior can also be used to implement $R(\cdot)$ in Eq. 2 with the respective regularization term expressed as:

$$R_{SS}(x)=\underset{x}{\arg\min}\;\|C_{SS}(F^{-1}y;\hat{\theta}_{SS})-x\|^{2}_{2} \tag{7}$$

where $C_{SS}$ is an image-domain network with parameters $\hat{\theta}_{SS}$. In the absence of ground-truth images, the parameters $\theta_{SS}^{q}$ for the $q$th test scan can be learned based on proxy k-space losses between reconstructed and acquired undersampled data [22]. Learning is conducted online to minimize this proxy loss:

$$\mathcal{L}_{SS}(\theta_{SS}^{q})=\|MFC_{SS}(F^{-1}y^{q};\theta_{SS}^{q})-y^{q}\|_{p} \tag{8}$$

where $y^{q}$ are acquired undersampled k-space data for the $q$th scan. An unrolled architecture can be adopted to build $C_{SS}$ by performing cascaded projections through network and DC blocks, resulting in the following mapping for the $k$th cascade:

$$x^{q}_{k}=f_{DC}\left(f_{SS}\left(x^{q}_{k-1};\theta_{SS}^{q}\right)\right) \tag{9}$$

$f_{SS}$ can be operationalized as a linear or nonlinear network [23, 22]. As the parameters of SS priors are learned independently for each test scan, they promise enhanced generalization to subject-specific anatomy. However, since training is performed online during inference, SS priors can introduce substantial computational burden, particularly when deep nonlinear networks are used, which also increase the risk of overfitting [70].

Figure 1: (a) PSFNet comprises a parallel-stream cascade of sub-networks where each sub-network contains (b) a scan-general (SG) block, and (c) a scan-specific (SS) block. The two parallel blocks are each succeeded by (d) a data-consistency (DC) block, and their outputs are aggregated with learnable fusion weights, $\eta_{k}$ and $\gamma_{k}$, where $k$ is the cascade index. At the end of $K$ cascades, coil-combination is performed on multi-coil data using sensitivity maps estimated via ESPIRiT [71]. The SG block is implemented as a deep convolutional neural network (CNN) and the SS block is implemented as a linear projection layer.

2.2 PSFNet

Here, we propose to combine SS and SG priors to maintain a favorable trade-off between generalization performance and computational efficiency under low-data regimes. In the conventional unrolling framework, this requires computation of serially alternated projections through the SS, SG and DC blocks:

$$x^{r}_{k}=f_{DC}\left(f_{SG}\left(f_{SS}\left(x^{r}_{k-1};\theta^{r}_{SS}\right);\theta_{SG}\right)\right) \tag{10}$$

The unrolled architecture with $K$ cascades can be learned offline using the training set. Note that scarcely-trained SG blocks under low-data regimes can perform suboptimally, introducing residual errors in their output. In turn, these errors will accumulate across serial projections to degrade the overall performance.

To address this limitation, here we introduce a novel architecture, PSFNet, that performs parallel-stream fusion of SS and SG priors as opposed to the serial combination in conventional unrolled methods. PSFNet utilizes a nonlinear SG prior for high performance, and a linear SS prior to enhance generalization without excessive computational burden. The two priors undergo parallel-stream fusion with learnable fusion parameters $\eta$ and $\gamma$, as displayed in Figure 1. These parameters adaptively control the relative weighting of information extracted by the SG versus SS streams during the course of training in order to alleviate error accumulation. As such, the mapping through the $k$th cascade in PSFNet is:

$$x^{r}_{k}=\eta_{k}f_{DC}(f_{SS}(x^{r}_{k-1};\theta_{SS}^{r}))+\gamma_{k}f_{DC}(f_{SG}(x^{r}_{k-1};\theta_{SG})) \tag{11}$$

In Eq. 11, the learnable fusion parameters for the SS and SG blocks at the $k$th cascade are $\eta_{k}$ and $\gamma_{k}$, respectively. To enforce fidelity to acquired data, DC projections are performed on the outputs of SG and SS blocks. In PSFNet, the SG prior is learned collectively from the set of training scans and then frozen during inference on test scans. In contrast, the SS prior is learned individually for each scan, during both training and inference.
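The parallel-stream cascade of Eq. 11 can be sketched as follows. This is an illustrative sketch under stated assumptions, not the released PSFNet code: `f_ss` and `f_sg` are hypothetical callables standing in for the SS and SG blocks, strict data consistency ($\lambda\to\infty$) is assumed in the DC step, and the fusion weights are passed in as fixed lists rather than learned.

```python
import numpy as np

def dc_block(x, y, mask):
    """Strict data consistency: restore acquired k-space samples."""
    k = np.fft.fft2(x, norm="ortho")
    return np.fft.ifft2(np.where(mask, y, k), norm="ortho")

def psfnet_forward(y, mask, f_ss, f_sg, eta, gamma):
    """Apply len(eta) parallel-stream cascades as in Eq. 11.

    y     : undersampled k-space data (zero at unacquired locations)
    mask  : boolean sampling mask over the last two axes
    f_ss  : callable for the SS block (placeholder, assumed given)
    f_sg  : callable for the SG block (placeholder, assumed given)
    eta   : per-cascade fusion weights for the SS stream
    gamma : per-cascade fusion weights for the SG stream
    """
    x = np.fft.ifft2(y, norm="ortho")  # x_0 = F^{-1} y (zero-filled image)
    for eta_k, gamma_k in zip(eta, gamma):
        x_ss = dc_block(f_ss(x), y, mask)  # SS stream followed by DC
        x_sg = dc_block(f_sg(x), y, mask)  # SG stream followed by DC
        x = eta_k * x_ss + gamma_k * x_sg  # weighted fusion of both streams
    return x
```

Both streams receive the same input at each cascade, so an error made by one stream enters the next cascade only in proportion to its fusion weight instead of being passed through the other stream in full.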

Training: PSFNet involves a training phase to learn model parameters for the SG prior as well as its fusion with the SS prior. For each individual scan in the training set, PSFNet learns a dedicated SS prior for the given scan. Since learning of a nonlinear SS prior has substantial computational burden, we adopt a linear SS prior in PSFNet. In particular, the SS block performs dealiasing via convolution with a linear kernel [71]:

$$f_{SS}(x^{n}_{k-1};\theta^{n}_{SS})=F^{-1}\{\theta^{n}_{SS}\circledast Fx^{n}_{k-1}\} \tag{12}$$

where $\theta^{n}_{SS}\in\mathbb{C}^{z\times z\times w\times w}$ with $n$ denoting the training scan index, $z$ denoting the number of coil elements, and $w$ denoting the kernel size in k-space. The SS blocks contain unlearned Fourier and inverse Fourier transformation layers as their input and output layers, respectively, and convolution is computed over the spatial frequency dimensions in k-space. Meanwhile, the SG prior is implemented as a deep CNN operating in image domain:

$$f_{SG}(x^{n}_{k-1};\theta_{SG})=CNN(x^{n}_{k-1}) \tag{13}$$
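For concreteness, the linear SS block of Eq. 12 can be sketched as a multi-coil convolution in k-space. The sketch below is an assumption-laden illustration: the function name `ss_block`, the circular boundary handling via `np.roll`, and the kernel layout `theta[target_coil, source_coil, ky, kx]` are choices made for the example and are not specified in the text.

```python
import numpy as np

def ss_block(x, theta):
    """Sketch of Eq. 12: F^{-1}{ theta (*) F x } for x of shape (z, H, W).

    theta has shape (z, z, w, w); theta[j, i] is the k-space kernel that
    maps source coil i to target coil j. Boundaries are treated circularly.
    """
    w = theta.shape[-1]
    c = w // 2
    k = np.fft.fft2(x, norm="ortho")  # unlearned Fourier input layer
    out = np.zeros_like(k)
    for a in range(w):
        for b in range(w):
            # k-space shifted by the tap offset (circular convolution)
            shifted = np.roll(k, shift=(a - c, b - c), axis=(1, 2))
            # mix source coils i into target coils j with this tap
            out += np.einsum("ji,ihw->jhw", theta[:, :, a, b], shifted)
    return np.fft.ifft2(out, norm="ortho")  # unlearned inverse Fourier layer
```

Because the block is linear and has no activation, applying it costs only a handful of FFTs and small matrix products per scan.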

Across the scans in the training set, the training loss for PSFNet can then be expressed in constrained form as:

$$\begin{aligned}\mathcal{L}_{PSFNet}(\theta_{SG},\boldsymbol{\gamma},\boldsymbol{\eta})=&\sum_{n=1}^{N}\|\eta_{K}f_{DC}(f_{SS}(x^{n}_{K-1};\hat{\theta}_{SS}^{n}))+\gamma_{K}f_{DC}(f_{SG}(x^{n}_{K-1};\theta_{SG}))-\breve{x}^{n}\|_{p}\\ &\text{s.t. }\hat{\theta}_{SS}^{n}=\underset{\theta_{SS}^{n}}{\arg\min}\;\|F^{-1}W^{n}y^{n}-f_{SS}(F^{-1}W^{n}y^{n};\theta^{n}_{SS})\|_{2}^{2}\end{aligned} \tag{14}$$

The constraint in Eq. 14 corresponds to the scan-specific learning of the SS prior $\hat{\theta}^{n}_{SS}$, which is then adopted to calculate the loss. Assuming that the linear relationships among neighboring spatial frequencies are similarly distributed across k-space [71], $\hat{\theta}^{n}_{SS}$ is learned by solving a self-regression problem on the subset of fully-sampled data in central k-space, where $W^{n}$ is a mask operator that selects data within this calibration region.
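One way to picture the self-regression in the constraint of Eq. 14 is a ridge-regularized least-squares fit over the calibration region, in the spirit of SPIRiT-style kernel calibration. The sketch below is a simplified assumption and not the authors' procedure: the function names, the exclusion of each target's own center tap, and the unscaled regularization weight `kappa` are illustrative choices.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def neighbourhood_matrix(calib, w):
    """Rows: positions in the calibration region; columns: (coil, ky, kx) taps."""
    z = calib.shape[0]
    patches = sliding_window_view(calib, (w, w), axis=(1, 2))  # (z, P1, P2, w, w)
    p1, p2 = patches.shape[1:3]
    return patches.transpose(1, 2, 0, 3, 4).reshape(p1 * p2, z * w * w)

def calibrate_kernel(calib, w=5, kappa=1e-2):
    """Fit theta of shape (z, z, w, w) from fully-sampled calibration data.

    Each target coil's center sample is regressed on its w x w multi-coil
    neighbourhood, excluding the target sample itself. Taps are stored in
    correlation order; flip the last two axes before a true convolution.
    """
    z = calib.shape[0]
    A = neighbourhood_matrix(calib, w)
    c = w // 2
    theta = np.zeros((z, z * w * w), dtype=complex)
    for j in range(z):
        centre = j * w * w + c * w + c          # column holding the target sample
        b = A[:, centre]
        keep = np.arange(z * w * w) != centre   # drop the trivial identity tap
        Aj = A[:, keep]
        gram = Aj.conj().T @ Aj + kappa * np.eye(Aj.shape[1])
        theta[j, keep] = np.linalg.solve(gram, Aj.conj().T @ b)
    return theta.reshape(z, z, w, w)
```

Since the fit only involves a small linear system per coil, it is far cheaper than training a nonlinear network on each scan, which is the motivation given above for adopting a linear SS prior.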

Note that, unlike deep reconstruction models purely based on SG priors, the SG prior in PSFNet is not directly trained to remove artifacts in zero-filled reconstructions of undersampled data. Instead, the SG prior is trained to concurrently suppress artifacts in reconstructed images along with the SS prior; and the relative importance attributed to the two priors is determined by the fusion parameters at each cascade. As such, the SS prior can be given higher weight during initial training iterations where the SG prior is scarcely trained, whereas its weight can be relatively reduced during later iterations once the SG prior has been sufficiently trained. This adaptive fusion approach thereby lowers reliance on the availability of large training sets.

Inference: During inference on the $q$th test scan, the respective SS prior is learned online as:

$$\hat{\theta}_{SS}^{q}=\underset{\theta_{SS}^{q}}{\arg\min}\;\|F^{-1}W^{q}y^{q}-f_{SS}^{q}(F^{-1}W^{q}y^{q};\theta^{q}_{SS})\|_{2}^{2} \tag{15}$$

Afterwards, the learned $\hat{\theta}_{SS}^{q}$ is used along with the previously trained $\hat{\theta}_{SG}$ to perform repeated projections through $K$ cascades as described in Eq. 11. The multi-coil image recovered by PSFNet at the output of the $K$th cascade is:

$$\hat{x}^{q}=\eta_{K}f_{DC}(f_{SS}(x^{q}_{K-1};\hat{\theta}_{SS}^{q}))+\gamma_{K}f_{DC}(f_{SG}(x_{K-1}^{q};\hat{\theta}_{SG})) \tag{16}$$

where $\hat{x}^{q}$ denotes the recovered image. The final reconstruction can be obtained by performing combination across coils:

$$\hat{x}^{q}_{combined}=A^{*}\hat{x}^{q} \tag{17}$$

where $A$ are coil sensitivities, and $A^{*}$ denotes the conjugate of $A$.
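The coil combination in Eq. 17 amounts to a pixel-wise sum over coils of each coil image weighted by the conjugate of its sensitivity. A minimal NumPy sketch is given below; the function name `coil_combine` and the `(coils, H, W)` array layout are assumptions for illustration.

```python
import numpy as np

def coil_combine(x_multi, sens):
    """Sketch of Eq. 17: combine coil images of shape (z, H, W).

    sens holds the coil sensitivity maps with the same shape. If the maps
    are normalized so that sum_c |sens_c|^2 = 1 at every pixel, a noiseless
    multi-coil image sens * m is mapped back to m.
    """
    return np.sum(np.conj(sens) * x_multi, axis=0)
```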

3 Methods

3.1 Implementation Details

In each cascade, PSFNet contained two parallel streams with SG and SS blocks. The SG blocks comprised an input layer followed by a stack of 4 convolutional layers with 64 channels and 3x3 kernel size each, and an output layer with ReLU activation functions. They processed complex images with separate channels for real and imaginary components. The SS blocks comprised a Fourier layer, 5 projection layers with identity activation functions, and an inverse Fourier layer. They processed complex images directly without splitting real and imaginary components. The linear convolution kernel used in the projection layers was learned from the calibration region by solving a Tikhonov regularized self-regression problem [11]. The DC blocks comprised 3 layers respectively to implement forward Fourier transformation, restoration of acquired k-space data and inverse Fourier transformation. PSFNet was implemented with 5 cascades ($K=5$). The weights of SG, SS, and DC blocks were tied across cascades to limit model complexity [27]. The only exceptions were the fusion coefficients that determine the relative weighting of the SG and SS blocks at each stage ($\gamma_{1},\ldots,\gamma_{k},\ldots,\gamma_{5}$ and $\eta_{1},\ldots,\eta_{k},\ldots,\eta_{5}$). These fusion parameters were kept distinct across cascades. Coil-combination on the recovered multi-coil images was performed using sensitivity maps estimated via ESPIRiT [71].

3.2 MRI Dataset

Experimental demonstrations were performed using brain MRI scans from the NYU fastMRI database [72]. Here, contrast-enhanced T1-weighted (cT1-weighted) and T2-weighted acquisitions were considered. The fastMRI dataset contains volumetric MRI data with varying image and coil dimensionality across subjects. Note that a central aim of this work was to systematically examine the learning capabilities of models for varying number of training samples. To minimize potential biases due to across-subject variability in MRI protocols, here we selected subjects with matching imaging matrix size and number of coils. To do this, we only selected subjects with at least 10 cross-sections and only the central 10 cross-sections were retained in each subject. We further selected subjects with an in-plane matrix size of 256x320 for cT1 acquisitions, and of 288x384 for T2 acquisitions. Background regions in MRI data with higher dimensions were cropped. Lastly, we restricted our sample selection to subjects with at least 5 coil elements, and geometric coil compression [73] was applied to unify the number of coils to 5 in all subjects.

Fully-sampled acquisitions were retrospectively undersampled to achieve acceleration rates of R=4x and 8x. Random undersampling patterns were designed via either a bi-variate normal density function peaking at the center of k-space, or a uniform density function across k-space. The standard deviation of the normal density function was adjusted to maintain the expected value of R across k-space. The fully-sampled calibration region spanned a 40x40 window in central k-space.
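A retrospective variable-density mask of the kind described above can be sketched as follows. The sketch is an assumption rather than the authors' sampling code: it uses a Gaussian density with unit peak, tunes the standard deviation by bisection so that the mean sampling probability (including the fully-sampled calibration window) equals 1/R, and draws independent Bernoulli samples. The function names and the normalized frequency grid are illustrative.

```python
import numpy as np

def sampling_density(shape, sigma, calib=40):
    """Gaussian sampling probability over centered k-space with a full central window."""
    ny, nx = shape
    ky = (np.arange(ny) - ny // 2) / ny  # normalized frequencies in [-0.5, 0.5)
    kx = (np.arange(nx) - nx // 2) / nx
    pdf = np.exp(-(ky[:, None] ** 2 + kx[None, :] ** 2) / (2.0 * sigma ** 2))
    cy, cx, h = ny // 2, nx // 2, calib // 2
    pdf[cy - h:cy + h, cx - h:cx + h] = 1.0  # fully-sampled calibration region
    return pdf

def make_mask(shape, R, calib=40, seed=0):
    """Return (mask, pdf) where the mean of pdf is tuned to 1/R by bisection on sigma."""
    lo, hi = 1e-3, 10.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if sampling_density(shape, mid, calib).mean() < 1.0 / R:
            lo = mid  # density too narrow: widen it
        else:
            hi = mid  # density too wide: narrow it
    pdf = sampling_density(shape, 0.5 * (lo + hi), calib)
    rng = np.random.default_rng(seed)
    # The mask is centered (DC in the middle); apply np.fft.ifftshift
    # before combining it with un-shifted np.fft output.
    return rng.random(shape) < pdf, pdf
```

For example, `make_mask((256, 320), R=4)` would give a mask for the cT1 matrix size with roughly a quarter of the k-space locations retained.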

3.3 Competing Methods

PSFNet was compared against several state-of-the-art approaches including SG methods, SS methods, and traditional PI reconstructions. For methods containing SG priors, both supervised and unsupervised variants were implemented.

PSFNet: A supervised variant of PSFNet was trained using paired sets of undersampled and fully-sampled acquisitions.

PSFNetUS: An unsupervised variant of PSFNet was implemented using self-supervision based on only undersampled training data. Acquired data were split into two non-overlapping sets where 40% of samples were reserved for evaluating the training loss and 60% of samples were reserved to enforce DC [64].

MoDL: A supervised SG method based on an unrolled architecture with tied weights across cascades was used [27]. MoDL serially interleaves SG and DC blocks. The number of cascades and the structure of SG and DC blocks were identical to those in PSFNet.

MoDLUS: An unsupervised variant of MoDL was implemented using self-supervision. A 40%-60% split was performed on acquired data to evaluate the training loss and enforce data consistency, respectively [64].

sRAKI-RNN: An SS method was implemented based on the MoDL architecture [67]. Learning was performed to minimize DC loss on the fully-sampled calibration region. Calibration data were randomly split with 75% of samples used to define the training loss and 25% of samples reserved to enforce DC. Multiple input-output pairs were produced for a single test sample by utilizing this split.

SPIRiT: A traditional PI reconstruction was performed using the SPIRiT method [11]. Reconstruction parameters including the regularization weight for kernel estimation ($\kappa$), kernel size ($w$), and the number of iterations ($N_{iter}$) were independently optimized for each reconstruction task via cross-validation.

SPARK: An SS method was used to correct residual errors from an initial SPIRiT reconstruction [17]. Learning was performed to minimize DC loss on the calibration region. The learned SS prior was then used to correct residual errors in the remainder of k-space.
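The 40%-60% partition of acquired samples used by the self-supervised variants (PSFNetUS and MoDLUS) can be sketched as below. This is a hedged illustration: a uniformly random selection of loss samples is assumed here, the function name `split_acquired` is hypothetical, and the selection strategy in [64] may differ.

```python
import numpy as np

def split_acquired(mask, loss_fraction=0.4, seed=0):
    """Split a boolean sampling mask into disjoint DC and loss masks.

    loss_fraction of the acquired locations is held out to evaluate the
    training loss; the remainder is used to enforce data consistency.
    """
    rng = np.random.default_rng(seed)
    acquired = np.flatnonzero(mask)  # flat indices of acquired samples
    n_loss = int(round(loss_fraction * acquired.size))
    loss_idx = rng.choice(acquired, size=n_loss, replace=False)
    loss_mask = np.zeros(mask.shape, dtype=bool)
    loss_mask.flat[loss_idx] = True
    dc_mask = mask & ~loss_mask
    return dc_mask, loss_mask
```

During training, the network would see only the samples in `dc_mask`, while the loss is computed on the held-out samples in `loss_mask`, so no fully-sampled reference is needed.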

3.4 Optimization Procedures

For all methods, hyperparameter selection was performed via cross-validation on a three-way split of data across subjects. There was no overlap among training, validation and test sets in terms of subjects. Data from 10 subjects were reserved for validation, and data from a separate set of 40 subjects were reserved for testing. The number of subjects in the training set was varied from 1 to 50. Hyperparameters that maximized peak signal-to-noise ratio (PSNR) on the validation set were selected for each method.

Training was performed via the Adam optimizer with learning rate $\zeta=10^{-4}$, $\beta_{1}=0.90$ and $\beta_{2}=0.99$ [74]. All deep learning methods were trained to minimize a hybrid $\ell_{1}$-$\ell_{2}$-norm loss between recovered and target data (e.g., between reconstructed and ground truth images for PSFNet, between recovered and acquired k-space samples for PSFNetUS) [64]. For PSFNet and MoDL, the selected number of epochs was 200, and the batch size was set to 2 for the limited number of training samples ($N_{samples}<10$), and to 5 otherwise. In DC blocks, $\lambda=\infty$ was used to enforce strict data consistency. For PSFNet and SPIRiT, the kernel width ($w$) and regularization parameter ($\kappa$) values were set as ($\kappa$, $w$) = ($10^{-2}$, 9) at R=4 and ($10^{-2}$, 9) at R=8 for cT1-weighted reconstructions, and as (100, 17) at R=4 and ($10^{-2}$, 17) at R=8 for T2-weighted reconstructions. For SPIRiT, the number of iterations $N_{iter}$ was set as 13 at R=4 and 27 at R=8 for cT1-weighted reconstructions, and as 20 at R=4 and 38 at R=8 for T2-weighted reconstructions. For sRAKI-RNN, the selected number of epochs was 500 and batch size was set to 32. All other optimization procedures were identical to MoDL. For SPARK, network architecture and training procedures were adopted from [17], except for the number of epochs ($N_{epoch}$) and learning rate ($\zeta$), which were optimized on the validation set as ($N_{epoch}$, $\zeta$) = (100, $10^{-2}$) for cT1-weighted reconstructions, and ($N_{epoch}$, $\zeta$) = (250, $10^{-3}$) for T2-weighted reconstructions.

All competing methods were executed on an NVIDIA RTX 3090 GPU, and models were coded in TensorFlow except for SPARK, which was implemented in PyTorch. SPARK was implemented using the toolbox at https://github.com/YaminArefeen/spark_mrm_2021. The code to implement PSFNet will be available publicly at https://github.com/icon-lab/PSFNet upon publication.

3.5 Performance Metrics

Performance assessments for reconstruction methods were carried out by visual observations and quantitative metrics. PSNR and structural similarity index (SSIM) were used for quantitative evaluation. For each method, metrics were computed on coil-combined images from the reconstruction and from the fully-sampled ground truth acquisition. Statistical differences between competing methods were examined via non-parametric Wilcoxon signed-rank tests.
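
For concreteness, PSNR between a reference image and a reconstruction can be sketched as below. This is an illustrative definition only: taking the peak value as the maximum of the reference image is an assumption, since the normalization used in the study is not specified in this section, and the function name `psnr` is hypothetical.

```python
import math

def psnr(reference, reconstruction):
    """PSNR in dB between two magnitude images given as flat sequences.

    The peak is taken as the maximum of the reference image (an assumption).
    """
    n = len(reference)
    mse = sum((a - b) ** 2 for a, b in zip(reference, reconstruction)) / n
    if mse == 0:
        return math.inf  # identical images
    peak = max(reference)
    return 10.0 * math.log10(peak ** 2 / mse)
```

In practice the inputs would be the coil-combined magnitude images flattened to one dimension, and the per-subject PSNR values of two methods would then be paired for the signed-rank test.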

Figure 2: Average PSNR across test subjects for (a) cT1- and (b) T2-weighted image reconstructions at R=4x. Model training was performed for varying number of training samples ($N_{samples}$, lower x-axis) and thereby training subjects ($N_{subjects}$, upper x-axis). Results are shown for SPIRiT, SPARK, sRAKI-RNN, MoDL and PSFNet.
Figure 3: cT1-weighted image reconstructions at R=4x via SPIRiT, SPARK, sRAKI-RNN, MoDL, and PSFNet along with the zero-filled reconstruction (ZF) and the reference image obtained from the fully-sampled acquisition. Error maps for each method are shown in the bottom row. MoDL and PSFNet were trained on 10 cross-sections from a single subject. SPIRiT, SPARK and sRAKI-RNN directly performed inference on test data without a priori model training. PSFNet shows superior performance to competing methods in terms of residual reconstruction errors.
Figure 4: T2-weighted image reconstructions at R=4x via SPIRiT, SPARK, sRAKI-RNN, MoDL, and PSFNet along with the zero-filled reconstruction (ZF) and the reference image obtained from the fully-sampled acquisition. Error maps for each method are shown in the bottom row. MoDL and PSFNet were trained on 10 cross-sections from a single subject. SPIRiT, SPARK and sRAKI-RNN directly performed inference on test data without a priori model training. PSFNet shows superior performance to competing methods in terms of residual reconstruction errors.

3.6 Experiments

Several different experiments were conducted to systematically examine the performance of competing methods. Assessments aimed to investigate reconstruction performance under low training data regimes, generalization performance in case of mismatch between training and testing domains, contribution of the parallel-stream design to reconstruction performance, sensitivity to hyperparameter selection, performance in unsupervised learning, and computational complexity.

Performance in low-data regimes: Deep SG methods for MRI reconstruction typically suffer from suboptimal performance when the size of the training dataset is constrained. To systematically examine reconstruction performance, we trained supervised variants of PSFNet and MoDL while the number of training samples ($N_{samples}$) was varied in the range [2-500] cross-sections. To attain a given number of samples, sequential selection was performed across subjects and across cross-sections within each subject. Thus, the number of unique subjects included in the training set roughly corresponded to $N_{samples}/10$ (since there were 10 cross-sections per subject). SS reconstructions were also performed with sRAKI-RNN, SPIRiT and SPARK. In the absence of fully-sampled ground truth data to guide the learning of the prior, unsupervised training of deep reconstruction models may prove relatively more difficult compared to supervised training. In turn, this may elevate requirements on training datasets for unsupervised models. To examine data efficiency for unsupervised training, we compared the reconstruction performance of PSFNetUS and MoDLUS as $N_{samples}$ was varied in the range of [2-500] cross-sections. Comparisons were also provided against sRAKI-RNN, SPIRiT and SPARK.
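
The sequential selection described above can be sketched as follows. The helper names are hypothetical; the only facts taken from the text are that samples are drawn in order across subjects and across cross-sections within each subject, with 10 cross-sections per subject.

```python
import math

def select_training_samples(n_samples, slices_per_subject=10):
    """Return (subject_index, slice_index) pairs picked sequentially.

    All cross-sections of subject 0 are used before moving to subject 1,
    and so on, until n_samples pairs have been collected.
    """
    return [divmod(i, slices_per_subject) for i in range(n_samples)]

def n_training_subjects(n_samples, slices_per_subject=10):
    """Number of unique subjects touched by the sequential selection."""
    return math.ceil(n_samples / slices_per_subject)
```

For example, `n_training_subjects(30)` gives 3 subjects and `n_training_subjects(2)` gives a single subject, which is why the subject count only roughly equals $N_{samples}/10$ at small sample sizes.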

Generalization performance: Deep reconstruction models can suffer from suboptimal generalization when the MRI data distribution shows substantial variation between the training and testing domains. To examine generalizability, PSFNet models were trained on data from a source domain and tested on data from a different target domain. The domain-transferred models were then compared to models trained and tested directly in the target domain. Three different factors were altered to induce domain variation: tissue contrast, undersampling pattern, and acceleration rate. First, the capability to generalize to different tissue contrasts was evaluated. Models were trained on data from a source contrast and tested on data from a different target contrast. Domain-transferred models were compared to target-domain models trained on data from the target contrast. Next, the capability to generalize to different undersampling patterns was assessed. Models were trained on data undersampled with variable-density patterns and tested on data undersampled with uniform-density patterns. Domain-transferred models were compared to target-domain models trained on uniformly undersampled data. Lastly, the capability to generalize to different acceleration rates was examined. Models were trained on acquisitions accelerated at R=4x and tested on acquisitions accelerated at R=8x. Domain-transferred models were compared to target-domain models trained at R=8x.

Sensitivity to hyperparameters: SS priors are learned from individual test scans as opposed to SG priors trained on larger training datasets. Thus, SS priors might show elevated sensitivity to hyperparameter selection. We assessed the reliability of reconstruction performance against suboptimal hyperparameter selection for SS priors. For this purpose, analyses were conducted on SPIRiT, SPARK and PSFNet that embody SS methods to perform linear reconstructions in k-space. The set of hyperparameters examined included the regularization parameter for kernel estimation ($\kappa$) and the kernel size ($w$). Separate models were trained using $\kappa$ in the range [$10^{-3}$, $10^{0}$] and $w$ in the range [5, 17].
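
A sweep of this kind could be organized as in the sketch below. The grid spacing is an assumption (decade steps for $\kappa$, odd kernel widths for $w$), since only the end points of the ranges are given, and all function names are hypothetical. The spread function computes the difference between the maximum and minimum PSNR over a sweep, which is the sensitivity summary reported in the Results.

```python
def kappa_grid(min_exp=-3, max_exp=0):
    """Log-spaced regularization values 10^min_exp ... 10^max_exp (decade steps assumed)."""
    return [10.0 ** e for e in range(min_exp, max_exp + 1)]

def width_grid(w_min=5, w_max=17, step=2):
    """Kernel widths from w_min to w_max (odd widths assumed)."""
    return list(range(w_min, w_max + 1, step))

def psnr_spread(psnr_by_setting):
    """Max-minus-min PSNR (dB) over a dict mapping hyperparameter value -> PSNR."""
    values = list(psnr_by_setting.values())
    return max(values) - min(values)
```

A smaller spread means the method is less sensitive to the swept hyperparameter; each entry of `psnr_by_setting` would come from a separately trained or fitted model evaluated on the test set.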

Computational complexity: Finally, we assessed the computational complexity of competing methods. For each method, training and inference times were measured for a single subject with 10 cross-sections. Each cross-section had an imaging matrix size of 256x320 and contained data from 5 coils. For all methods including SS priors, hyperparameters optimized for cT1-weighted reconstructions at R=4 were used.

Ablation analysis: To assess the contribution of the parallel-stream design in PSFNet, a conventional unrolled variant of PSFNet was formed, named PSFNetSerial. PSFNetSerial combined the SG and SS priors via serial projections as described in Eq. 10. Modeling procedures and the design of SG and SS blocks were kept identical between PSFNet and PSFNetSerial for fair comparison. Performance was assessed as $N_{samples}$ was varied in the range of [2-500] cross-sections.
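
The structural difference between the two variants can be illustrated with a toy sketch. This is not the PSFNet implementation: the blocks are stand-in functions on lists, the weighted-sum form of the fusion and the order of the serial composition are assumptions, and the names `fuse_parallel` and `fuse_serial` are hypothetical. The weights `gamma` and `eta` play the role of the learnable SG and SS fusion parameters.

```python
def fuse_parallel(x, sg_block, ss_block, gamma, eta):
    """One cascade of parallel-stream fusion.

    Both blocks receive the same input, and their outputs are combined
    with fusion weights gamma (SG stream) and eta (SS stream).
    """
    sg_out = sg_block(x)
    ss_out = ss_block(x)
    return [gamma * a + eta * b for a, b in zip(sg_out, ss_out)]

def fuse_serial(x, sg_block, ss_block):
    """One cascade of conventional unrolling.

    The SS block consumes the output of the SG block, so any error made
    by the first block is passed on to the second.
    """
    return ss_block(sg_block(x))

# Toy stand-ins for the two priors (illustrative only).
toy_sg = lambda x: [2 * v for v in x]
toy_ss = lambda x: [v + 1 for v in x]
```

In the parallel form, a poorly trained SG block can be down-weighted through `gamma` without affecting the input seen by the SS block, which is the intuition behind the reduced error propagation claimed for low-data regimes.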

4 Results

4.1 Performance in Low-Data Regimes

Common SG methods for MRI reconstruction are based on deep networks that require copious amounts of training data, so performance can substantially decline on limited training sets [28, 59]. In contrast, PSFNet leverages an SG prior to concurrently reconstruct an image along with an SS prior. Therefore, we reasoned that its performance should scale favorably under low-data regimes compared to SG methods. We also reasoned that PSFNet should yield elevated performance compared to SS methods due to residual corrections from its SG prior. To test these predictions, we trained supervised variants of PSFNet and MoDL along with SPIRiT, sRAKI-RNN, and SPARK while the number of training samples ($N_{samples}$) was systematically varied. Figure 2 displays PSNR performance for cT1-weighted and T2-weighted image reconstruction as a function of $N_{samples}$. PSFNet outperforms the scan-general MoDL method for all values of $N_{samples}$ ($p<0.05$). As expected, performance benefits with PSFNet become more prominent towards lower values of $N_{samples}$. PSFNet also outperforms traditional SPIRiT and scan-specific sRAKI-RNN and SPARK methods broadly across the examined range of $N_{samples}$ ($p<0.05$). Note that while MoDL requires $N_{samples}=30$ (3 subjects) to offer on par performance to SS methods, PSFNet yields superior performance with as few as $N_{samples}=2$. Representative reconstructions for cT1- and T2-weighted images are depicted in Figures 3 and 4, where $N_{samples}=10$ from a single subject were used for training. PSFNet yields lower reconstruction errors compared to all other methods in this low-data regime, where competing methods either show elevated noise or blurring.

Figure 5: Weighting of the SG ($\gamma$) and SS ($\eta$) blocks in the final cascade of PSFNet. Weights were averaged across models trained for cT1- and T2-weighted reconstructions at R=4x. Model training was performed for varying number of training samples ($N_{samples}$, lower x-axis) and thereby training subjects ($N_{subjects}$, upper x-axis). Both blocks are equally weighted with very limited training data. As $N_{samples}$ increases, the weighting of the SG prior becomes more dominant over the weighting of the SS prior.
Figure 6: Average PSNR across test subjects for (a) cT1- and (b) T2-weighted image reconstructions at R=4x. Model training was performed for varying number of training samples ($N_{samples}$, lower x-axis) and thereby training subjects ($N_{subjects}$, upper x-axis). Results are shown for SPIRiT, SPARK, sRAKI-RNN, MoDLUS and PSFNetUS.

Naturally, the performance of PSFNet increases as more training samples are available. Since the SS prior is independently learned for individual samples, it should not elicit systematic performance variations depending on $N_{samples}$. Thus, the performance gains can be attributed to improved learning of the SG prior. In turn, we predicted that PSFNet would put more emphasis on its SG stream as its reliability increases. To examine this issue, we inspected the weightings of the SG ($\gamma$) and SS ($\eta$) streams as the training set size was varied. Figure 5 displays weightings at the last cascade as a function of $N_{samples}$. For lower values of $N_{samples}$ where the quality of the SG prior is relatively limited, the SG and SS priors are almost equally weighted. In contrast, as the learning of the SG prior improves with higher $N_{samples}$, the emphasis on the SG prior increases while the SS prior is less heavily weighted.

We then questioned whether the performance benefits of PSFNet are also apparent during unsupervised training of deep network models. For this purpose, unsupervised variants PSFNetUS and MoDLUS were trained via self-supervision [64]. PSFNetUS was compared against MoDLUS, SPIRiT, sRAKI-RNN, and SPARK while the number of training samples ($N_{samples}$) was systematically varied. Figure 6 displays PSNR performance for cT1-weighted and T2-weighted image reconstruction as a function of $N_{samples}$. Similar to the supervised setting, PSFNetUS outperforms MoDLUS for all values of $N_{samples}$ ($p<0.05$), and the performance benefits are more noticeable at lower $N_{samples}$. In this case, however, MoDLUS is unable to reach the performance of the best performing SS method (SPARK) even at $N_{samples}=500$. In contrast, PSFNetUS starts outperforming SPARK with approximately $N_{samples}=50$ (5 subjects). The enhanced reconstruction quality with PSFNetUS is corroborated in representative reconstructions for cT1- and T2-weighted images depicted in Figures 7 and 8, where $N_{samples}=100$ were used for training. Taken together, these results indicate that the data-efficient nature of PSFNet facilitates the training of both supervised and unsupervised MRI reconstruction models.
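
One common way to train on undersampled data alone is to hold out part of the acquired k-space samples: the network sees one subset as input and the loss is evaluated on the other. The sketch below illustrates such a split on a 1D sampling mask. Whether PSFNetUS uses exactly this split, and the 40% hold-out fraction, are assumptions made for illustration; the function name is hypothetical.

```python
import random

def split_acquired_samples(mask, loss_fraction=0.4, seed=0):
    """Split acquired k-space locations into disjoint input and loss masks.

    mask: sequence of 0/1 flags marking acquired locations.
    loss_fraction: share of acquired locations held out for the loss
                   (illustrative value, not taken from the paper).
    """
    acquired = [i for i, m in enumerate(mask) if m]
    rng = random.Random(seed)
    rng.shuffle(acquired)
    n_loss = round(loss_fraction * len(acquired))
    loss_set = set(acquired[:n_loss])
    input_mask = [1 if (m and i not in loss_set) else 0
                  for i, m in enumerate(mask)]
    loss_mask = [1 if i in loss_set else 0 for i in range(len(mask))]
    return input_mask, loss_mask
```

Because the loss is computed only at held-out locations that were truly acquired, no fully-sampled reference is needed, at the cost of giving the network fewer input samples per scan.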

Figure 7: cT1-weighted image reconstructions at R=4x via SPIRiT, SPARK, sRAKI-RNN, MoDLUS, and PSFNetUS along with the zero-filled reconstruction (ZF) and the reference image obtained from the fully-sampled acquisition. Error maps for each method are shown in the bottom row. MoDLUS and PSFNetUS were trained on 100 cross-sections (from 10 subjects). SPIRiT, SPARK and sRAKI-RNN directly performed inference on test data without a priori model training. PSFNetUS shows superior performance to competing methods in terms of residual reconstruction errors.
Figure 8: T2-weighted image reconstructions at R=4x via SPIRiT, SPARK, sRAKI-RNN, MoDLUS, and PSFNetUS along with the zero-filled reconstruction (ZF) and the reference image obtained from the fully-sampled acquisition. Error maps for each method are shown in the bottom row. MoDLUS and PSFNetUS were trained on 100 cross-sections (from 10 subjects). SPIRiT, SPARK and sRAKI-RNN directly performed inference on test data without a priori model training. PSFNetUS shows superior performance to competing methods in terms of residual reconstruction errors.

4.2 Generalization Performance

An important advantage of SS priors is that they allow model adaptation to individual test samples, thereby promising enhanced performance in out-of-domain reconstructions [22]. Yet, SG priors with fixed parameters might show relatively limited generalizability during inference [23, 75]. To assess generalization performance, we introduced domain variations by altering three experimental factors: tissue contrast, undersampling pattern, and acceleration rate. For methods comprising SG components, we built both target-domain models that were trained in the target domain, and domain-transferred models that were trained in a non-target domain. We then compared the reconstruction performances of the two models in the target domain.

First, we examined generalization performance when the tissue contrast varied between training and testing domains (e.g., trained on cT1, tested on T2). Table 1 lists performance metrics for competing methods with $N_{samples}=500$. While performance losses are incurred for domain-transferred PSFNet-DT and MoDL-DT models that contain SG components, these losses are modest. On average, MoDL-DT shows a loss of 0.3 dB PSNR and 0.1% SSIM ($p<0.05$), and PSFNet-DT shows a loss of 0.2 dB PSNR and 0.1% SSIM ($p<0.05$). Note that PSFNet-DT still outperforms the closest competing SS method by 2.2 dB PSNR and 1.8% SSIM ($p<0.05$).

Second, we examined generalization performance when models were trained with variable-density and tested on uniform-density undersampling patterns. Table 2 lists performance metrics for competing methods. On average across tissue contrasts, MoDL-DT suffers a notable performance loss of 3.6 dB PSNR and 2.5% SSIM ($p<0.05$). In contrast, PSFNet-DT shows a relatively limited loss of 0.4 dB PSNR and 0.2% SSIM ($p<0.05$). Note that PSFNet-DT again outperforms the closest competing SS method by 3.4 dB PSNR and 3.7% SSIM ($p<0.05$).

Third, we examined generalization performance when models were trained at R=4x and tested on R=8x. Table 3 lists performance metrics for competing methods. On average across tissue contrasts, MoDL-DT suffers a notable performance loss of 1.0 dB PSNR and performs slightly better in SSIM by 0.2% ($p<0.05$), whereas PSFNet-DT shows a lower loss of 0.6 dB PSNR ($p<0.05$) and performs similarly in SSIM ($p>0.05$). PSFNet-DT outperforms the closest competing SS method by 1.2 dB PSNR and 1.9% SSIM ($p<0.05$). Taken together, these results clearly suggest that the SS prior in PSFNet contributes to its improved generalization performance over the scan-general MoDL method, while the SG prior in PSFNet enables it to outperform competing SS methods.

Table 1: Generalization across tissue contrasts. PSNR (dB) and SSIM (%) values (mean ± standard error) across test subjects. Results are shown for scan-specific models (SPIRiT, SPARK, sRAKI-RNN), target-domain models (MoDL, PSFNet) and domain-transferred models (MoDL-DT, PSFNet-DT) at R=4x. The tissue contrast in the target domain is listed in the left-most column (cT1 or T2); domain-transferred models were trained for the non-target tissue contrast.
| Contrast | Metric | SPIRiT | SPARK | sRAKI-RNN | MoDL | MoDL-DT | PSFNet | PSFNet-DT |
|---|---|---|---|---|---|---|---|---|
| cT1 | PSNR | 37.6 ± 1.5 | 37.6 ± 1.5 | 36.8 ± 1.3 | 38.5 ± 1.5 | 38.2 ± 1.5 | 39.9 ± 1.7 | 39.4 ± 1.6 |
| T2 | PSNR | 35.8 ± 1.0 | 36.5 ± 1.0 | 35.2 ± 1.1 | 37.9 ± 1.0 | 37.5 ± 1.1 | 39.0 ± 1.0 | 39.0 ± 0.9 |
| cT1 | SSIM | 93.1 ± 1.5 | 93.3 ± 1.4 | 93.8 ± 1.0 | 95.1 ± 1.0 | 94.8 ± 1.1 | 95.8 ± 1.0 | 95.6 ± 1.0 |
| T2 | SSIM | 90.8 ± 1.2 | 93.1 ± 1.0 | 94.9 ± 0.6 | 96.2 ± 0.5 | 96.2 ± 0.5 | 96.7 ± 0.4 | 96.8 ± 0.4 |
Table 2: Generalization across undersampling patterns. PSNR (dB) and SSIM (%) values (mean ± standard error) across test subjects. Results are shown for scan-specific models (SPIRiT, SPARK, sRAKI-RNN), target-domain models (MoDL, PSFNet) and domain-transferred models (MoDL-DT, PSFNet-DT) at R=4x. Domain-transferred models were trained with variable-density undersampling, and tested on uniform-density undersampling. Target-domain models were trained and tested with uniform-density undersampling.
| Contrast | Metric | SPIRiT | SPARK | sRAKI-RNN | MoDL | MoDL-DT | PSFNet | PSFNet-DT |
|---|---|---|---|---|---|---|---|---|
| cT1 | PSNR | 37.1 ± 1.8 | 37.1 ± 1.7 | 33.6 ± 1.4 | 37.0 ± 1.7 | 33.6 ± 1.8 | 40.2 ± 1.6 | 39.9 ± 1.6 |
| T2 | PSNR | 35.1 ± 1.3 | 35.6 ± 1.3 | 31.6 ± 1.5 | 37.0 ± 1.1 | 33.2 ± 1.2 | 40.2 ± 1.1 | 39.7 ± 1.2 |
| cT1 | SSIM | 92.9 ± 1.5 | 93.0 ± 1.5 | 91.2 ± 1.5 | 93.4 ± 1.3 | 91.2 ± 2.0 | 95.9 ± 1.2 | 95.6 ± 1.2 |
| T2 | SSIM | 90.6 ± 1.5 | 92.1 ± 1.5 | 91.5 ± 1.2 | 95.6 ± 0.7 | 92.7 ± 1.1 | 97.1 ± 0.6 | 96.9 ± 0.6 |
Table 3: Generalization across acceleration rates. PSNR (dB) and SSIM (%) values (mean ± standard error) across test subjects. Results are shown for scan-specific models (SPIRiT, SPARK, sRAKI-RNN), target-domain models (MoDL, PSFNet) and domain-transferred models (MoDL-DT, PSFNet-DT). Domain-transferred models were trained at R=4x and tested at R=8x. Target-domain models were trained and tested at R=8x.
| Contrast | Metric | SPIRiT | SPARK | sRAKI-RNN | MoDL | MoDL-DT | PSFNet | PSFNet-DT |
|---|---|---|---|---|---|---|---|---|
| cT1 | PSNR | 34.7 ± 1.5 | 34.8 ± 1.5 | 34.3 ± 1.5 | 35.3 ± 1.4 | 34.5 ± 1.7 | 36.5 ± 1.5 | 36.2 ± 1.5 |
| T2 | PSNR | 33.6 ± 1.0 | 33.7 ± 1.0 | 32.6 ± 0.9 | 34.6 ± 1.0 | 33.4 ± 1.2 | 35.6 ± 1.1 | 34.6 ± 1.2 |
| cT1 | SSIM | 89.8 ± 1.9 | 90.8 ± 1.6 | 91.4 ± 1.4 | 92.1 ± 1.5 | 92.2 ± 1.4 | 93.3 ± 1.4 | 93.3 ± 1.4 |
| T2 | SSIM | 89.0 ± 1.3 | 90.1 ± 1.1 | 92.7 ± 0.9 | 93.5 ± 0.8 | 93.7 ± 0.8 | 94.6 ± 0.7 | 94.5 ± 0.7 |
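
The average domain-transfer losses quoted in Section 4.2 can be approximately re-derived from the rounded PSNR entries of Tables 1-3, as in the short check below. The variable and function names are hypothetical. Because the tabulated values are rounded to one decimal, small differences from the reported averages are expected.

```python
def mean(values):
    return sum(values) / len(values)

# PSNR (dB) from Tables 1-3 as (target-domain, domain-transferred) pairs,
# ordered (cT1, T2) for each type of domain shift.
PSNR_DB = {
    "contrast": {"MoDL": [(38.5, 38.2), (37.9, 37.5)],
                 "PSFNet": [(39.9, 39.4), (39.0, 39.0)]},
    "pattern": {"MoDL": [(37.0, 33.6), (37.0, 33.2)],
                "PSFNet": [(40.2, 39.9), (40.2, 39.7)]},
    "rate": {"MoDL": [(35.3, 34.5), (34.6, 33.4)],
             "PSFNet": [(36.5, 36.2), (35.6, 34.6)]},
}

def average_loss(pairs):
    """Mean PSNR drop (dB) from target-domain to domain-transferred model."""
    return mean([target - transferred for target, transferred in pairs])

losses = {
    shift: {method: round(average_loss(pairs), 2)
            for method, pairs in methods.items()}
    for shift, methods in PSNR_DB.items()
}
```

For the undersampling-pattern shift, the rounded entries give about 3.6 dB for MoDL-DT and 0.4 dB for PSFNet-DT, in line with the text. For the contrast shift they give about 0.35 dB and 0.25 dB, close to the reported 0.3 dB and 0.2 dB, which were presumably computed from unrounded values.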

4.3 Sensitivity to Hyperparameters

Parameters of deep networks that implement SS priors are to be learned from a single test sample, so the resultant models can show elevated sensitivity to the selection of hyperparameters compared to SG priors learned from a collection of training samples. Thus, we investigated the sensitivity of PSFNet to key hyperparameters of its SS prior. SPIRiT, SPARK and PSFNet methods all embody a linear k-space reconstruction, so the relevant hyperparameters are the regularization weight and width for the convolution kernel. Performance was evaluated for models trained in the low-data regime (i.e., $N_{samples}=10$, 1 subject) for varying hyperparameter values.

Figure 9 displays PSNR measurements for SPIRiT, SPARK and PSFNet across $\kappa$ in the range [$10^{-3}$, $10^{0}$]. While the performance of SPIRiT and SPARK is notably influenced by $\kappa$, PSFNet is minimally affected by sub-optimal selection. On average across contrasts, the difference between the maximum and minimum PSNR values is 8.4 dB for SPIRiT, 4.5 dB for SPARK, and a lower 0.7 dB for PSFNet. Note that PSFNet outperforms competing methods across the entire range of $\kappa$ ($p<0.05$). Figure 10 shows PSNR measurements for competing methods across $w$ in the range [5, 17]. In this case, all methods show relatively limited sensitivity to the selection of $w$. On average across contrasts, the difference between the maximum and minimum PSNR values is 1.5 dB for SPIRiT, 0.5 dB for SPARK, and 0.2 dB for PSFNet. Again, PSFNet outperforms competing methods across the entire range of $w$ ($p<0.05$). Overall, our results indicate that PSFNet yields improved reliability against sub-optimal hyperparameter selection compared to competing SS methods.

Figure 9: PSNR measurements were performed on recovered cT1- and T2-weighted images at R=4x. Bar plots in blue color show average PSNR across $\kappa \in [10^{-3}, 10^{1}]$ (i.e., the regularization parameter for kernel estimation). Error bars denote the 90% interval across $\kappa$. Bar plots in red color show PSNR for methods that do not depend on the value of $\kappa$.
Figure 10: PSNR measurements were performed on recovered cT1- and T2-weighted images at R=4x. Bar plots in blue color show the average PSNR across $w \in [5, 17]$ (i.e., the kernel size). Error bars denote the 90% interval across $w$. Bar plots in red color show PSNR for methods that do not depend on the value of $w$.

4.4 Computational Complexity

Next, we assessed the computational complexity of competing methods. Table 4 lists the training times of methods with SG priors, MoDL and PSFNet. Note that the remaining SS-based methods do not involve a pre-training step. As it involves learning of an SS prior on each training sample, PSFNet yields elevated training time compared to MoDL. In return, however, it offers enhanced generalization performance and data-efficient learning. Table 4 also lists the inference times of SPIRiT, SPARK, sRAKI-RNN, MoDL and PSFNet. MoDL and PSFNet, which employ SG priors with fixed weights during inference, offer fast run times. In contrast, SPARK and sRAKI-RNN, which involve SS priors learned on individual test samples, have a high computational burden. Although PSFNet also embodies an SS prior, it uses a relatively lightweight linear prior as opposed to the nonlinear priors in competing SS methods. Therefore, PSFNet benefits from data-efficient learning while maintaining computationally-efficient inference.

Table 4: Computational complexity of competing methods. Training and inference times for data from a single subject, with 10 cross-sections, imaging matrix size 256x320 and 5 coils. Run times are listed for SPIRiT, SPARK, sRAKI-RNN, MoDL, and PSFNet.
| | SPIRiT | SPARK | sRAKI-RNN | MoDL | PSFNet |
|---|---|---|---|---|---|
| Training (s) | - | - | - | 132 | 337 |
| Inference (s) | 0.85 | 23.35 | 285.00 | 0.25 | 1.13 |
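
To put the inference times of Table 4 on a common footing, the short calculation below converts them to per-cross-section times and to ratios relative to PSFNet. Dividing by 10 assumes that run time scales evenly across the 10 cross-sections, which is not stated in the text; the variable names are hypothetical.

```python
# Inference times (s) for one subject with 10 cross-sections, from Table 4.
inference_s = {
    "SPIRiT": 0.85,
    "SPARK": 23.35,
    "sRAKI-RNN": 285.00,
    "MoDL": 0.25,
    "PSFNet": 1.13,
}
N_SLICES = 10

# Per-cross-section time, assuming an even split across cross-sections.
per_slice_s = {name: t / N_SLICES for name, t in inference_s.items()}

# Run time of each method relative to PSFNet (values above 1 are slower).
relative_to_psfnet = {
    name: t / inference_s["PSFNet"] for name, t in inference_s.items()
}
```

Under these numbers, SPARK is roughly 20 times slower and sRAKI-RNN roughly 250 times slower than PSFNet at inference, while MoDL and SPIRiT remain somewhat faster than PSFNet.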

4.5 Ablation Analysis

To demonstrate the value of the parallel-stream fusion strategy in PSFNet over conventional unrolling, PSFNet was compared against a variant model PSFNetSerial that combined SS and SG priors through serially alternated projections. Separate models were trained with the number of training samples in the range $N_{samples}$=[2-500]. Performance in cT1- and T2-weighted image reconstruction is displayed in Figure 11. PSFNet significantly improves reconstruction performance over PSFNetSerial across the entire range of $N_{samples}$ considered ($p<0.05$), and the benefits grow stronger for smaller training sets. On average across contrasts for $N_{samples}<10$, PSFNet outperforms PSFNetSerial by 1.8 dB PSNR and 0.6% SSIM ($p<0.05$). These results indicate that the parallel-stream fusion of SG and SS priors in PSFNet is superior to the serial projections in conventional unrolling.

Figure 11: Average (a) PSNR and (b) SSIM values for cT1- and T2-weighted image reconstructions at R=4x. Model training was performed for varying number of training samples ($N_{samples}$, lower x-axis) and thereby training subjects ($N_{subjects}$, upper x-axis). Results are shown for PSFNet and PSFNetSerial.

5 Discussion and Conclusion

In this study, we introduced PSFNet for data-efficient training of deep reconstruction models in accelerated MRI. PSFNet synergistically fuses SS and SG priors in a parallel-stream architecture. The linear SS prior improves learning efficiency while maintaining a relatively low computational footprint, whereas the nonlinear SG prior enables improved reconstruction performance. For both supervised and unsupervised training setups, the resulting model substantially reduces dependence on the availability of large MRI datasets. Furthermore, it achieves competitive inference times to SG methods, and reliably generalizes across tissue contrasts, sampling patterns and acceleration rates.

Several prominent approaches have been introduced in the literature to address the training requirements of deep models based on SG priors. One approach is to pre-train models on readily available datasets from a separate source domain and then to fine-tune on several tens of samples from the target domain [28, 59] or else perform SS fine-tuning [76]. This transfer learning approach relaxes the domain requirements for training datasets. However, the domain-transferred models might be suboptimal when training and testing data distributions are divergent. In such cases, additional training for domain-alignment might be necessary to mitigate performance losses. In contrast, PSFNet contains an SS prior that allows it to better generalize to out-of-domain data without further training. Another approach is to build unsupervised models to alleviate dependency on training datasets with paired undersampled and fully-sampled acquisitions. Model training can be performed either directly on undersampled acquisitions via self-supervision [64] or on unpaired sets of undersampled and fully-sampled acquisitions via cycle-consistent learning [77]. This approach can prove beneficial when fully-sampled acquisitions are costly to collect. Nonetheless, the resulting models still require relatively large datasets from tens of subjects during training [64]. Note that our experiments on self-supervised variants of PSFNet and MoDL suggest that unsupervised models can be more demanding for data than their supervised counterparts. Therefore, the data-efficiency benefits of PSFNet might be particularly useful for unsupervised deep MRI reconstruction.

A fundamentally different framework to lower requirements on training datasets while offering improved generalizability is based on SS priors. In this case, learning can be performed directly on test data and models can be adapted to each scan [15, 17]. A group of studies have proposed SS methods based on relatively compact nonlinear models to facilitate learning during inference [15, 18, 78, 17]. However, because learning is performed in central k-space, these methods implicitly assume that local relationships among spatial frequency samples are largely invariant across k-space. While the SS prior in PSFNet also rests on a similar assumption, the SG component helps correct residual errors that can be introduced due to this assumption. Another group of studies have alternatively adopted the deep image prior (DIP) approach to build SS methods [22, 23, 19, 20]. In DIP, unconditional deep network models that map latent variables onto images are used as native priors for MR images. The priors are learned by ensuring the consistency of reconstructed and acquired data across the entire k-space. Despite improved generalization, these relatively more complex models require increased inference times. In comparison, PSFNet provides faster inference since the weights for its SG prior are fixed, and its SS prior involves a compact linear operator that is easier to learn.

A few independent studies on MRI have proposed approaches related to PSFNet by combining nonlinear and linear reconstructions [78, 17, 6]. Residual RAKI and SPARK methods initially perform a linear reconstruction, and then use an SS method to correct residual errors by minimizing a DC loss in the calibration region [78, 17]. As local relationships among data samples might vary across k-space, the learned SS priors might be suboptimal. Moreover, these methods perform online learning of nonlinear SS priors, which introduces a relatively high computational burden. In contrast, PSFNet incorporates an SG prior to help improve reliability against sub-optimalities in the SS prior, and uses a linear SS prior for efficiency. Another related method is GrappaNet, which improves reconstruction performance by cascading GRAPPA and network-based nonlinear reconstruction steps [6]. While [6] intends to improve image quality, the main aim of our study is to improve practicality by lowering training data requirements of deep models, and improving domain generalizability without elevating inference times. Note that GrappaNet follows the conventional unrolling approach by performing serially alternated projections through linear and nonlinear reconstructions, which can lead to error propagation under low-data regimes [79]. In contrast, PSFNet maintains linear and nonlinear reconstructions as two parallel streams in its architecture, and learns to optimally fuse the information from the two streams.

The proposed method can be improved along several lines of technical development. First, to improve the capture of high-frequency information by the SG prior, an adversarial loss term along with a discriminator subnetwork can be included in PSFNet [80]. It remains to be demonstrated whether the data-efficiency benefits of PSFNet are apparent for adversarial training setups. Second, nonlinear activation functions can be included in the SS stream to improve the expressiveness of the SS prior [78]. While learning of nonlinear priors can elevate inference complexity, generalization performance might be further improved. Third, the expressiveness of both SS and SG priors might be enhanced by incorporating attention mechanisms as proposed in recent transformer models [81]. Fourth, multimodal image fusion approaches can improve performance when a repository with multimodal data is available [82, 83]. Lastly, the benefits of transfer learning and PSFNet can be aggregated by pre-training the SG prior on natural images to further lower requirements on training data.

6 Acknowledgments

This work was supported in part by a TUBA GEBIP 2015 fellowship, by a BAGEP 2017 fellowship, and by a TUBITAK 121E488 grant awarded to T. Çukur.

References

  • [1] S. Bauer, R. Wiest, L.-P. Nolte, and M. Reyes, “A survey of MRI-based medical image analysis for brain tumor studies,” Physics in Medicine & Biology, vol. 58, no. 13, p. R97, 2013.
  • [2] A. Shoeibi, M. Khodatars, M. Jafari, N. Ghassemi, P. Moridian, R. Alizadehsani, S. H. Ling, A. Khosravi, H. Alinejad-Rokny, H. Lam, M. Fuller-Tyszkiewicz, U. R. Acharya, D. Anderson, Y. Zhang, and J. M. Gorriz, “Diagnosis of brain diseases in fusion of neuroimaging modalities using deep learning: A review,” Information Fusion, vol. 93, pp. 85–117, 2023.
  • [3] M. Hu, X. Qian, S. Liu, A. J. Koh, K. Sim, X. Jiang, C. Guan, and J. H. Zhou, “Structural and diffusion MRI based schizophrenia classification using 2D pretrained and 3D naive convolutional neural networks,” Schizophrenia Research, vol. 243, pp. 330–341, 2022.
  • [4] K. R. M. Fernando and C. P. Tsokos, “Deep and statistical learning in biomedical imaging: State of the art in 3D MRI brain tumor segmentation,” Information Fusion, vol. 92, pp. 450–465, 2023.
  • [5] Z. Zhu, X. He, G. Qi, Y. Li, B. Cong, and Y. Liu, “Brain tumor segmentation based on the fusion of deep semantics and edge information in multimodal MRI,” Information Fusion, vol. 91, pp. 376–387, 2023.
  • [6] A. Sriram, J. Zbontar, T. Murrell, C. L. Zitnick, A. Defazio, and D. K. Sodickson, “GrappaNet: Combining parallel imaging with deep learning for multi-coil MRI reconstruction,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2020, pp. 14303–14310.
  • [7] K. P. Pruessmann, M. Weiger, M. B. Scheidegger, and P. Boesiger, “SENSE: Sensitivity encoding for fast MRI,” Magnetic Resonance in Medicine, vol. 42, no. 5, pp. 952–962, 1999.
  • [8] M. A. Griswold, P. M. Jakob, R. M. Heidemann, M. Nittka, V. Jellus, J. Wang, B. Kiefer, and A. Haase, “Generalized autocalibrating partially parallel acquisitions (GRAPPA),” Magnetic Resonance in Medicine, vol. 47, no. 6, pp. 1202–1210, 2002.
  • [9] M. Lustig, D. Donoho, and J. M. Pauly, “Sparse MRI: The application of compressed sensing for rapid MR imaging,” Magnetic Resonance in Medicine, vol. 58, no. 6, pp. 1182–1195, 2007.
  • [10] A. Majumdar, “Improving synthesis and analysis prior blind compressed sensing with low-rank constraints for dynamic MRI reconstruction,” Magnetic Resonance Imaging, vol. 33, no. 1, pp. 174–179, 2015.
  • [11] M. Lustig and J. M. Pauly, “SPIRiT: Iterative self-consistent parallel imaging reconstruction from arbitrary k-space,” Magnetic Resonance in Medicine, vol. 64, no. 2, pp. 457–471, 2010.
  • [12] Y. Yang, J. Sun, H. Li, and Z. Xu, “ADMM-CSNet: A deep learning approach for image compressive sensing,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 42, no. 3, pp. 521–538, 2020.
  • [13] J. Schlemper, J. Caballero, J. V. Hajnal, A. Price, and D. Rueckert, “A deep cascade of convolutional neural networks for MR image reconstruction,” in International Conference on Information Processing in Medical Imaging, 2017, pp. 647–658.
  • [14] K. Hammernik, T. Klatzer, E. Kobler, M. P. Recht, D. K. Sodickson, T. Pock, and F. Knoll, “Learning a variational network for reconstruction of accelerated MRI data,” Magnetic Resonance in Medicine, vol. 79, no. 6, pp. 3055–3071, 2017.
  • [15] M. Akçakaya, S. Moeller, S. Weingärtner, and K. Uğurbil, “Scan-specific robust artificial-neural-