Main

Fluorescence microscopy image restoration (FMIR), which aims to recover images with high signal-to-noise ratios (SNRs) from low-SNR images, has received significant attention from the research community, as it helps reveal important nanoscale imaging information for the accurate observation and scientific analysis of biological structures and processes1,2,3. Currently, benefiting from the rapid development of deep learning, the literature is experiencing a large influx of contributions in this direction. Many deep learning-based FMIR works4,5,6,7,8,9,10,11,12,13,14,15 (Supplementary Note 1) have pushed the physical limits of fluorescence microscopy through computation and have achieved significant improvements over classic deconvolution algorithms1,16.

Although significant progress has been achieved, these deep learning-based FMIR methods are still affected by several weaknesses that limit further studies of biological processes. First, the prevailing models address specific FMIR problems, such as denoising, super-resolution (SR) and isotropic reconstruction, by training a specific deep model (for example, U-Net-inspired models4,7,8,9,14 and RCAN-inspired models5,6,10,13) with limited parameters (no more than a few million) on a specific dataset from scratch (Supplementary Table 1b). In addition, these models generalize poorly: significant performance degradation can be observed when facing large domain gaps between different datasets and different FMIR problems. Achieving promising results across different imaging modalities, biological samples and image restoration tasks therefore requires training multiple specific models. Last, the data dependence problem common in the deep learning field also affects most FMIR models, whose performance depends heavily on the quality and quantity of the training data owing to the data-driven nature of deep learning-based methods. Consequently, the practical difficulty of experimentally acquiring low-quality and high-quality training image pairs complicates the application of deep learning-based FMIR methods4. Therefore, the main purpose of this paper is to overcome the above weaknesses and explore the performance upper limit of deep models while inspiring and fostering subsequent research.

As the latest generation of artificial intelligence models, foundation models17,18, which can be applied to a diverse set of downstream tasks by training them on massive and diverse datasets, have profoundly advanced the development of deep learning and have exhibited remarkable domain transfer capabilities, particularly in the fields of natural language processing19,20, computer vision21,22 and multimodal learning23,24,25. Recently, the concept of foundation models has been utilized in diverse life science applications, and foundation models have also demonstrated their impressive capabilities for clinical cases26,27,28,29,30, biotechnology31,32 and so on (Supplementary Note 1).

As stated by Moor et al.30, foundation models will offer amazing abilities through dataset size and model size increases, in addition to model architecture advances, in agreement with observations32,33 that large-scale pretraining with larger and more diverse data consistently improved the model’s predictive potential. We observed a similar phenomenon in the fluorescence microscopy-based image restoration field, which has never been studied before (Supplementary Note 2). Pretraining also enables better generalization ability and efficient training on new datasets with limited training data, by transferring knowledge in a pretrained model to a specific task or data modality. This claim can be supported by Zamir et al.34, who explored the idea of transfer learning between many visual learning tasks and showed that the amount of training data required for solving multiple tasks together can be greatly reduced compared to the amount of training data required for independent training. In addition, assembling multiple image restoration processes in a foundation model is a more practical and convenient strategy, as it is difficult to directly determine which type of image restoration operation is needed for the realistic fluorescence microscopy images at hand.

However, task-specific or modality-specific deep models are still the main deep learning-based approaches for fluorescence microscopy-based image restoration. Although individual models can now achieve state-of-the-art (SOTA) performance, foundation models have the merit of versatility. Instead of training a new model from scratch for each task, the above approaches have demonstrated that a foundation model can democratize the fundamental knowledge, learned during the pretraining phase, in general datasets and can transfer this knowledge to a multitude of tasks through fine-tuning32. The enormous progress made by pretrained large-scale models brings new momentum to the development of fluorescence microscopy-based image restoration approaches.

Here, we first presented a UniFMIR solution to handle diverse image degradations and imaging modalities simultaneously. We took inspiration from existing foundation models, where large pretrained models can be flexibly transferred to solve diverse tasks and achieve significant performance improvements via efficient fine-tuning. Specifically, we constructed the UniFMIR model, which adopts a multihead and multitail network structure (Fig. 1a and Extended Data Figs. 1 and 2). UniFMIR consists of a multihead module, a feature enhancement module and a multitail module, where the multihead and multitail modules adopt different branches to extract task-specific shallow features and to yield accurate results for different image restoration problems, respectively. The feature enhancement module uses an advanced Swin transformer structure35 to enhance the feature representations and to reconstruct general and effective features for high-quality FMIR. Different FMIR operations use different head and tail branches but share the same feature enhancement module. We collected a large training dataset (~30 GB) from 14 public datasets, covering a wide range of imaging modalities, biological samples and image restoration tasks. UniFMIR was pretrained on this large-scale dataset and fine-tuned on different subdatasets covering various degradation conditions, imaging modalities and biological samples. To enable better comparison, we also created baselines by training models with the same architecture as UniFMIR from scratch on specific datasets for different tasks. We showcased that efficient fine-tuning can feasibly transfer the prior knowledge learned during pretraining to handle different problems (Supplementary Figs. 15–17). We demonstrated the effectiveness of the UniFMIR model on a set of high-impact applications and compared its performance with that of SOTA methods for solving specific problems.

Fig. 1: Applying the proposed UniFMIR approach to reconstruct SR SIM images from diffraction-limited WF images.
figure 1

a, Architecture of UniFMIR, which comprises multihead, multitail and Swin transformer-based feature enhancement modules. b, Shown are the LR inputs; the SR results obtained by the SOTA methods (XTC15, DFCAN5 and ENLCN36), baseline (same network structure as UniFMIR trained from scratch) and our fine-tuned UniFMIR approach; and the GT SIM images. The NRMSE (lower is better) values are shown on the residual images under the SR results. c, PSNR/SSIM (higher is better)/NRMSE comparisons for ×2 upscaling on different datasets (MTs, CCPs, F-actin and ERs), n = 100. Scale bar, 0.75 μm for CCPs; 3 μm for other specimens.

Results

SR

The lack of high-resolution (HR) microscopy images has impeded the further exploration of the life science phenomena of related structures or cellular tissues. To overcome the theoretical spatial resolution limitation in live-cell imaging, SR, aiming to enhance the resolution of scientific microscopy images, has been widely studied in the field of fluorescence microscopy imaging. Deep learning-based SR models have greatly promoted the development of conventional SR microscopy approaches by reconstructing HR microscopy images from their low-resolution (LR) versions.

We first determined the potential of our UniFMIR approach to deal with the SR problem (×2 upscaling) involving images with increasing structural complexity levels from the BioSR dataset5, obtained via a multimodal structured illumination microscopy (SIM) system, including clathrin-coated pits (CCPs), endoplasmic reticula (ERs), microtubules (MTs) and F-actin filaments. Our UniFMIR successfully inferred SR SIM images from wide-field (WF) images at a diffraction-limited scale with a high fluorescence level and revealed clear structural details. Compared with two deep learning-based fluorescence microscopy SR models (XTC15 and DFCAN5) and a single-image super-resolution model (ENLCN36) for macroscale photographs, UniFMIR could correctly reconstruct most MTs without losing or merging them, even if the MTs were densely distributed and were close to each other. The baselines for different datasets were obtained by training a model of the same network structure as UniFMIR from scratch on the specific training dataset. For diverse subcellular structures, UniFMIR also restored hollow, ring-shaped CCPs and crisscrossing F-actin with high fidelity (Fig. 1b).

We also quantified the attained SR accuracy with the peak SNR (PSNR), structural similarity index measure (SSIM), normalized root mean square error (NRMSE), a resolution estimate based on decorrelation analysis37, Fourier ring correlation (FRC)38, SQUIRREL analysis39 and segmentation metrics (Fig. 1c, Supplementary Figs. 2–4 and 10). Higher PSNR/SSIM values and lower NRMSE values denote better SR when assessing SR SIM images in terms of their fluorescence intensities and structures, indicating images that are closer to the ground-truth (GT) SIM images.

Isotropic reconstruction

Volumetric fluorescence microscopy methods, such as three-dimensional (3D) SIM, are generally limited by an anisotropic spatial resolution, where the axial resolution of 300 nm is inferior to the lateral resolution. Such anisotropy, which is caused by the inherent optical point spread function of the microscope or a low axial sampling rate, compromises the imaging quality of the volumes of interest. Therefore, isotropic reconstruction, which restores an isotropic image resolution, is another common problem in fluorescence microscopy.

We applied our UniFMIR approach on anisotropic raw data (with up to tenfold lower axial resolutions) from volumetric mouse liver imaging4 to predict isotropic axial slices and compared it with two deep learning-based isotropic reconstruction models, CARE4 and the 3D U-Net model proposed by Li et al.40. The proposed UniFMIR method could offer near-isotropic imaging by enhancing the axial resolution, facilitating the subsequent quantification of the shapes and volumes of biological samples (Fig. 2a and Supplementary Fig. 5). Our UniFMIR method yielded isotropic reconstruction results with more accurate pixel distributions, and the pixel intensities along the columns of our axial outputs were closer to those of the GTs (Fig. 2b). The same conclusion could also be drawn from the average PSNR/SSIM/NRMSE results obtained on data from the liver dataset (Fig. 2d).

Fig. 2: Applying UniFMIR to the isotropic reconstruction of 3D volumes.
figure 2

a, XZ and XY slices of anisotropic raw LR data (with a subsampling rate of 10); the GTs; and the reconstruction results of CARE4, Li et al.40, baseline (same network structure as UniFMIR trained from scratch) and our fine-tuned UniFMIR model. Magnified images of the regions of interest (red boxes) are displayed to the right of the corresponding images. The NRMSE is shown on each residual image. b, The line plots show the pixel intensities along the dashed lines for the images in a. c, FRC curves38 for resolution estimation. d, Statistical comparison on the liver dataset in terms of PSNR/SSIM/NRMSE/FWHM across n = 7 slices. Scale bar, 50 μm.

3D image denoising

High-SNR fluorescence microscopy imaging always requires high laser power or long exposure times, which are accompanied by bleaching, phototoxicity and other side effects that are detrimental to the sample. Deep learning-based denoising methods4,6,7 can computationally restore acquired low-SNR fluorescence microscopy images by leveraging the available knowledge about the data at hand.

We further benchmarked the performance of our UniFMIR approach in a live-cell image denoising task conducted on the Planaria and Tribolium datasets4, each of which contains well-registered high-/low-SNR 3D images, captured by a spinning-disk confocal microscope and a multiphoton laser-scanning microscope, for training and testing. Compared with two U-Net-based denoising models, CARE4 and GVTNets7, our UniFMIR model considerably suppressed the noise of the low-SNR fluorescence microscopy images under different laser powers/exposure times (C1–C3) and clearly depicted the planarian Schmidtea mediterranea and Tribolium castaneum volumes with labeled nuclei, helping to observe embryonic development (Fig. 3a and Supplementary Figs. 6 and 7). UniFMIR achieved higher statistical accuracy (PSNR/SSIM/NRMSE values) between the denoised images and the well-registered high-SNR images (GTs; Fig. 3b).

Fig. 3: Applying UniFMIR to content-aware 3D image denoising.
figure 3

a, Visual results of a 3D image denoising task conducted on flatworm (S. mediterranea). Comparison among CARE4, GVTNets7, baseline (same network structure as UniFMIR trained from scratch) and our fine-tuned UniFMIR model. The line plots show the pixel intensities along the dashed lines for the images. b, Box plots of the PSNR/SSIM/NRMSE results obtained on the Planaria (n = 20) and Tribolium (n = 6) datasets4 under three imaging conditions (C1–C3). Scale bar, 50 μm.

Surface projection

To better analyze and study the cell behavior in the developing epithelia of the Drosophila melanogaster fruit fly, surface projection is used to project a 3D volume onto a two-dimensional (2D) surface image. The current deep learning models (CARE4 and GVTNets7) formulate this image restoration problem as two subproblems, 3D-to-2D surface projection and 2D image denoising, and use two task-specific networks, following the same encoder–decoder framework as that of U-Net, to solve them.

We further examined UniFMIR in a more complex composite fluorescence microscopy-based image restoration task and adopted the public Flywing dataset4, which contains 3D–2D image pairs for training and testing. Similarly to the Planaria and Tribolium datasets, the Flywing dataset also covers different laser power conditions (C1–C3). To project each 3D volume onto a 2D plane, we adopted a U-Net-shaped head for UniFMIR, which achieved promising results with end-to-end network calculations on the 3D D. melanogaster Flywing imaging task and yielded higher reconstruction accuracy in terms of the PSNR/SSIM/NRMSE metrics; meanwhile, the pixel intensities were closer to those of the GT (Fig. 4 and Supplementary Fig. 8).

Fig. 4: Applying UniFMIR to joint surface projection.
figure 4

a, First row, columns from left to right: high-SNR 2D image (GT), projection results of maximum projection and CARE4. Second row, columns from left to right: reconstruction results of GVTNets7, baseline (same network structure as UniFMIR trained from scratch) and our fine-tuned UniFMIR model. The residual images are shown on the right, and the magnified regions (in the red and yellow dashed boxes) are shown below the projection results under the C2 condition. b, The line plot shows the pixel intensities along the dashed lines in the images in a. c, Box plots of the PSNR/SSIM/NRMSE results (n = 26) obtained on the Flywing dataset4 under four imaging conditions (C0–C3). Scale bar, 50 μm.

Volumetric reconstruction

The volumetric reconstruction of light-field microscopy images, permitting the acquisition of artifact-free 3D image sequences with uniform spatial resolutions from 2D information, is significant for instantaneously imaging fast biological processes. As a demonstration, we verified the volumetric reconstruction ability of UniFMIR on the data provided by VCD-Net8. Each view of a reconstructed 3D volume can identify the motion trajectory of the imaging object (Fig. 5 and Supplementary Fig. 9), which is beneficial for revealing the underlying mechanisms of many complicated live-cell dynamics involving various subcellular structures. Since no GT was available for calculating more quantitative metrics, such as the PSNR/SSIM/NRMSE, to quantify the accuracy of the volumetric reconstruction results, we adopted decorrelation analysis37 to measure the nanometer resolution of each reconstructed image sequence (Fig. 5 and Supplementary Fig. 9).

Fig. 5: Applying UniFMIR to volumetric reconstruction.
figure 5

a, The image sequences of the artifact-free 3D volume reconstructed from a 2D input. b, More visual comparisons between the volumetric reconstruction results of our UniFMIR model and VCD-Net8. The resolution of the images (depth = 1, 4, 42, 43) in the reconstructed 3D volume was evaluated with decorrelation analysis. c, Box plots of the resolutions (nm) of n = 61 image sequences. Scale bar, 10 μm.

Generalization ability analysis

To demonstrate the generalization ability of our pretrained UniFMIR approach, we validated its image restoration performance on unseen data from DeepBacs41 for bacterial image analysis purposes, including two denoising datasets (E. coli_H-NS-mScarlet-I, E. coli_MreB) and two SR datasets (Escherichia coli, Staphylococcus aureus). We fine-tuned UniFMIR on the new bacterial microscopy images and then conducted denoising and SR experiments. In addition to the denoising and SR models developed on DeepBacs, we also compared our results with those of a baseline model trained from scratch, which had the same structure as UniFMIR, and the pretrained UniFMIR model without fine-tuning. Our UniFMIR method restored clear prokaryote structures to enhance the low-phototoxicity live-cell microscopy data and predict accurate mappings of biological target shapes, obtaining higher PSNR/SSIM values (Fig. 6). Compared with the model trained from scratch, our UniFMIR approach achieved better performance on new datasets (Supplementary Fig. 13).

Fig. 6: Generalization ability analysis conducted on unseen datasets41.
figure 6

a, SR and denoising results obtained on the S.aureus_MreB and E.coli_MreB datasets, respectively. Visual comparison among the outputs of the SOTA model (DeepBacs41), pretrained UniFMIR model without fine-tuning, baseline (same network structure as UniFMIR trained from scratch) and our fine-tuned UniFMIR model. The NRMSE is shown on each residual image. b, Box plots show the PSNR/SSIM results obtained on the test sets of the two datasets. c, k-fold validation (k = 5) on the S.aureus_MreB dataset (top) and the E. coli_MreB dataset (bottom). The PSNR/SSIM results obtained on the n = 5 test images of different subsets. Scale bar, 10 μm.

We also analyzed whether UniFMIR could be generalized to other SR modalities in addition to the SIM images used in the pretraining stage. First, we adopted single-molecule localization microscopy data from the Shareloc platform42 and applied our model to direct stochastic optical reconstruction microscopy (dSTORM) images of MTs stained with Alexa Fluor 647 in U2OS cells incubated with nocodazole. Since the input WF images and GT were not well matched, we fine-tuned our UniFMIR model with the contextual bilateral (CoBi) loss43 instead of the L1 or L2 loss, which require pixel-wise alignment between the input and GT. We compared our results with those of a U-Net-based SR single-molecule microscopy model (DeepSTORM44), a baseline model and a pretrained UniFMIR model without fine-tuning. Except for the pretrained UniFMIR model, all competing models were trained with the CoBi loss on the same training data as those used by our method. Our UniFMIR model could restore accurate structures that were similar to the GTs, whereas the other models failed to learn mappings between the unaligned images (Extended Data Fig. 3).

Discussion

The focus of this paper was to present a UniFMIR solution for maximizing the potential of deep learning-based methods and circumventing the limitations of existing FMIR deep models. Here, we outlined how recent advances in foundation model research can enable the development of FMIR. Inspired by the success of pretrained large-scale models in artificial intelligence, we developed a unified foundation model for FMIR, facilitating high-quality image restoration across different FMIR tasks and imaging modalities by extending the strong transfer capabilities of large-scale pretrained models to FMIR. We also collected 14 public FMIR datasets with 196,418 training pairs covering various biological samples, microscopes and degradation conditions. The UniFMIR model pretrained on the collected data could be easily applied to different tasks and new image distributions through efficient fine-tuning, transferring the knowledge of a foundation model to a specific task.

The experimental results obtained in different fluorescence microscopy-based image restoration tasks suggested the excellent performance of UniFMIR in restoring high-fidelity microscopy images. The HR results obtained for the SR and isotropic resolution reconstruction problems could clearly resolve diffraction-limited image details to improve the resolutions of images by uncovering subtle biological structures (Figs. 1 and 2 and Supplementary Fig. 5). The denoising and projection results restored clean signals from the noisy inputs, achieving accurate reconstruction quality (Figs. 3 and 4 and Supplementary Figs. 6–8). The volume reconstruction results showed transient biological image dynamics with minimal artifacts (Fig. 5 and Supplementary Fig. 9). The generalization ability of the pretrained UniFMIR was also shown (Fig. 6 and Extended Data Fig. 3).

Key capabilities

We outline three key capabilities that distinguish UniFMIR from conventional FMIR models. (1) Higher restoration quality. UniFMIR achieved the highest restoration precision and was often superior to the task-specific FMIR models. (2) Better generalization ability. UniFMIR displayed an impressive generalization ability and enabled efficient training on new FMIR datasets by transferring knowledge from the available datasets to the new data. (3) Unifying FMIR tasks. UniFMIR can handle multiple FMIR problems with one model and unifies the restoration processes for different data modalities from different fluorescence microscopes. Our work also showed that publicly shared fluorescence images can be considered a tremendous resource that can be harnessed to develop foundation models for enhancing fluorescence microscopy images.

We demonstrated the ability of UniFMIR to achieve high precision with promising generalization performance. The key intuition behind this idea is that the foundation model, pretrained on more diverse data distributions, could learn more generalized representations29 of different high-quality image modalities and biological structures. Existing works32,33 have found that large-scale pretraining allows the training of deeper models with greater predictive potential and even promotes robustness by gaining a fundamental understanding of the knowledge in training data. As stated by Guo et al.45, the foundation model may be effective in terms of acquiring informative global patterns that can improve the robustness of task-specific models.

Different FMIR operations map the distributions of low-quality images to the distributions of high-quality images. During pretraining, UniFMIR gained a fundamental understanding of the high-quality images and a regularized optimization direction for different tasks, unlike the task-specific models that each learned a single path from low quality to high quality. Thus, the pretraining knowledge benefits fine-tuning toward diverse tasks, facilitating faster and better convergence and enabling the model to perform better than a model trained from scratch (Supplementary Fig. 15).

Limitations and future work

Despite the impressive results of UniFMIR, it is noteworthy that this paper is an exploratory work that demonstrates the feasibility of foundation models for FMIR. We hope that this work, accompanied by a brief review of the current deep learning-based FMIR approaches (Supplementary Table 1), will provide new insights for more researchers. Looking to the future, much room remains for further improvement and evaluation to advance the frontiers of FMIR. We discuss some limitations and future research directions as follows. First, a large model (>100 MB or even 1 TB) often requires a considerable number of training images (millions or even billions), a time-consuming pretraining process (weeks or even months) and costly computational resources (graphics processing units, GPUs). To reduce the deployment cost and make UniFMIR more energy efficient, we optimized the UniFMIR models by applying two model compression methods (pruning and quantization). The UniFMIR model still requires a long calculation time, especially for 3D image-related tasks; therefore, faster inference and higher efficiency require further exploration. Second, we utilized existing public datasets for pretraining and found that the current data have not yet saturated the model's performance (Supplementary Fig. 14). Theodoris et al.32 outlined that as the amount of publicly available data continues to expand, pretraining on larger-scale data may further enhance the model's performance, especially for tasks with increasingly limited task-specific data. Therefore, efforts toward a more diverse and larger dataset are expected. In future work, we will continuously train the foundation model with new data to make the FMIR foundation model stronger and share it with the community in a timely and open manner.

Methods

Data preparation process

To cover as many imaging modalities and fluorescence microscopy-based image restoration tasks as possible, we collected datasets from the literature (Supplementary Table 2) and grouped numerous datasets for different fluorescence microscopy-based image restoration tasks and imaging modalities. As these datasets vary significantly in terms of formats, domains and numerical ranges, we processed the images for convenient training and cross-dataset validation.

First, we wrote the input and GT images of existing datasets with different storage formats, including ‘TIF’, ‘npz’, ‘png’ and ‘nii.gz’, into an ‘.npz’ file. In addition, we normalized the images to unify the numerical distributions of different datasets by following the data processing method in CARE4. Because the spatial sizes of the features in a deep neural network are fixed during training, we further cropped the training images into multiple patches with the same spatial size to facilitate simultaneous training on images from different datasets.
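As an illustration of this preprocessing, the sketch below applies CARE-style percentile normalization and fixed-size patch cropping before packing matched pairs into a single '.npz' file. The function names, percentile values and patch size are illustrative assumptions rather than the exact released pipeline.

```python
import numpy as np

def percentile_normalize(img, p_low=2.0, p_high=99.8, eps=1e-6):
    """Normalize intensities to roughly [0, 1] using low/high percentiles (CARE-style)."""
    lo, hi = np.percentile(img, p_low), np.percentile(img, p_high)
    return (img.astype(np.float32) - lo) / (hi - lo + eps)

def crop_patches(img, gt, patch=64, stride=64):
    """Crop matched input/GT images into fixed-size patches so that images from
    different datasets can be trained simultaneously."""
    xs, ys = [], []
    h, w = img.shape[-2:]
    for i in range(0, h - patch + 1, stride):
        for j in range(0, w - patch + 1, stride):
            xs.append(img[..., i:i + patch, j:j + patch])
            ys.append(gt[..., i:i + patch, j:j + patch])
    return np.stack(xs), np.stack(ys)

# Example packing of one dataset ('raw.tif'/'gt.tif' are placeholder file names):
# x = percentile_normalize(tifffile.imread('raw.tif'))
# y = percentile_normalize(tifffile.imread('gt.tif'))
# patches_x, patches_y = crop_patches(x, y, patch=64)
# np.savez('train_data.npz', X=patches_x, Y=patches_y)
```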

Network architectures

We designed a multihead and multitail network architecture for the UniFMIR model, which included three components: multiple feature extraction modules, a Swin transformer-based feature enhancement module and multiple image reconstruction modules (Fig. 1a and Extended Data Fig. 1). More specifically, the multihead and multitail branches for different FMIR tasks adopted different feature extraction and image reconstruction modules to extract task-specific shallow features and to reconstruct images, respectively (Supplementary Note 3).
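A minimal sketch of this multihead/multitail layout is shown below, assuming single-channel 2D inputs and using a plain convolutional stand-in for the shared feature enhancement module. The real heads and tails differ per task (for example, a U-Net-shaped head for projection), so this is a structural illustration rather than the released architecture.

```python
import torch
import torch.nn as nn

class UniFMIRSkeleton(nn.Module):
    """Task-specific heads/tails around a shared feature enhancement backbone."""

    def __init__(self, tasks=('sr', 'denoise', 'isotropic', 'projection', 'volume'), dim=64):
        super().__init__()
        # One shallow feature extractor (head) and one reconstructor (tail) per task.
        self.heads = nn.ModuleDict({t: nn.Conv2d(1, dim, 3, padding=1) for t in tasks})
        self.tails = nn.ModuleDict({t: nn.Conv2d(dim, 1, 3, padding=1) for t in tasks})
        # Shared feature enhancement module (stand-in for the Swin transformer blocks).
        self.shared = nn.Sequential(
            nn.Conv2d(dim, dim, 3, padding=1), nn.GELU(),
            nn.Conv2d(dim, dim, 3, padding=1),
        )

    def forward(self, x, task):
        feat = self.heads[task](x)        # task-specific shallow features
        feat = feat + self.shared(feat)   # shared enhancement with a residual connection
        return self.tails[task](feat)     # task-specific reconstruction

# y = UniFMIRSkeleton()(torch.randn(1, 1, 64, 64), task='sr')
```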

Different FMIR calculations shared the same feature enhancement module. Inspired by SwinIR46, a SOTA model for natural image restoration, the Swin transformer35-based feature enhancement module adopted several vision transformer-based blocks to enhance the feature representations and to restore the final features for high-quality image reconstruction. As shown in Extended Data Fig. 2, the feature enhancement module consisted of convolutional layers and a series of Swin transformer blocks, each of which included several Swin transformer layers, a convolutional layer and a residual connection. The Swin transformer layer was composed of layer normalization operations, a multihead self-attention mechanism and a multilayer perceptron. In the multihead self-attention mechanism, the input features $f_{\mathrm{in}}$ were first divided into multiple small patches with a moving window operation, and then the self-attention in each patch was calculated with the function in equation (1).

$$\begin{array}{rcl}Q&=&\mathrm{Conv}_{Q}(\,f_{\mathrm{in}}),\quad K=\mathrm{Conv}_{K}(\,f_{\mathrm{in}}),\quad V=\mathrm{Conv}_{V}(\,f_{\mathrm{in}}),\\ f_{\mathrm{out}}&=&\mathrm{Softmax}\left(\frac{Q{K}^{T}}{\sqrt{{d}_{k}}}\right)V,\end{array}$$
(1)

where Q, K and V represent the query, key and value, respectively, which were separately obtained by three convolutional layers, and $d_k$ is the dimensionality of K. Softmax() normalized the similarity between Q and K, and the output feature $f_{\mathrm{out}}$ was obtained by multiplying by V. The multilayer perceptron was composed of two fully connected layers and Gaussian error linear unit activation.
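The sketch below renders equation (1) as single-head, window-based self-attention with convolutional Q/K/V projections. The shifted windows, relative position bias and multihead splitting of the actual Swin transformer layers are omitted for brevity, so it should be read as a simplified illustration rather than the released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WindowSelfAttention(nn.Module):
    """Self-attention computed inside each non-overlapping window of the feature map."""

    def __init__(self, dim=64, window=8):
        super().__init__()
        self.window = window
        self.conv_q = nn.Conv2d(dim, dim, 1)
        self.conv_k = nn.Conv2d(dim, dim, 1)
        self.conv_v = nn.Conv2d(dim, dim, 1)

    def forward(self, f_in):  # f_in: (B, C, H, W); H and W divisible by the window size
        b, c, h, w = f_in.shape
        q, k, v = self.conv_q(f_in), self.conv_k(f_in), self.conv_v(f_in)

        def to_windows(t):  # (B, C, H, W) -> (B * num_windows, window*window, C)
            t = t.reshape(b, c, h // self.window, self.window, w // self.window, self.window)
            return t.permute(0, 2, 4, 3, 5, 1).reshape(-1, self.window * self.window, c)

        q, k, v = map(to_windows, (q, k, v))
        attn = F.softmax(q @ k.transpose(-2, -1) / c ** 0.5, dim=-1)  # Softmax(QK^T / sqrt(d_k))
        f_out = attn @ v                                              # (B * num_windows, window^2, C)
        f_out = f_out.reshape(b, h // self.window, w // self.window, self.window, self.window, c)
        return f_out.permute(0, 5, 1, 3, 2, 4).reshape(b, c, h, w)

# f_out = WindowSelfAttention()(torch.randn(1, 64, 64, 64))
```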

Training losses

We used a combination of the $\mathcal{L}_1$ and $\mathcal{L}_2$ losses during the pretraining stage to exploit the robustness of the $\mathcal{L}_1$ loss and the stability of the $\mathcal{L}_2$ loss. During fine-tuning, only the $\mathcal{L}_1$ loss was adopted to pursue a higher quantitative metric (PSNR). Suppose that $(x_i, y_i)_{i=1:N}$ denotes $N$ pairs of input and GT training data and that $f_\theta$ denotes the UniFMIR model with parameters $\theta$ (equation (2)).

$$\begin{array}{rcl}\mathcal{L}&=&0.5\times {\mathcal{L}}_{1}+0.5\times {\mathcal{L}}_{2},\\ {\mathcal{L}}_{1}&=&\sum\limits_{i=1}^{N}\left|\,{y}_{i}-{f}_{\theta }({x}_{i})\right|,\\ {\mathcal{L}}_{2}&=&\sum\limits_{i=1}^{N}{\left(\,{y}_{i}-{f}_{\theta }({x}_{i})\right)}^{2}.\end{array}$$
(2)
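A minimal PyTorch rendering of equation (2) is shown below; whether the sums are implemented as sums or means per batch is an implementation detail not specified here.

```python
import torch
import torch.nn.functional as F

def pretraining_loss(pred, target):
    """Equally weighted L1 + L2 objective used for pretraining (equation (2));
    fine-tuning keeps only the L1 term."""
    l1 = F.l1_loss(pred, target, reduction='sum')
    l2 = F.mse_loss(pred, target, reduction='sum')
    return 0.5 * l1 + 0.5 * l2

# loss = pretraining_loss(model(x), y)   # 'model', 'x' and 'y' are placeholders
```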

To apply our UniFMIR model to unmatched data, in which the input WF images and the GT single-molecule localization microscopy data42 were not well aligned in a pixel-wise manner, we fine-tuned our model with the CoBi loss43, which improved its robustness to mild misalignment in the input–output image pairs (equation (3)).

$$\begin{array}{rcl}\mathrm{CoBi}(\,y,\tilde{y})&=&\frac{1}{N}\sum\limits_{i=1}^{N}\min\limits_{j=1,\ldots ,M}\left({D}_{{p}_{i},{q}_{j}}+{w}_{s}{D}_{{p}_{i},{q}_{j}}^{\prime}\right),\\ {D}_{{p}_{i},{q}_{j}}&=&\mathrm{Distance}(\,{p}_{i},{q}_{j}),\\ {D}_{{p}_{i},{q}_{j}}^{\prime}&=&\left\Vert ({h}_{i},{w}_{i})-({h}_{j},{w}_{j})\right\Vert_{2},\end{array}$$
(3)

where $(\tilde{y}, y)$ denotes a pair of restored and GT images, and $p_{i=1,\ldots,N}$ and $q_{j=1,\ldots,M}$ are the features of $\tilde{y}$ and $y$, respectively, extracted by the pretrained Visual Geometry Group 19 (VGG-19)47 network. 'Distance()' denotes a distance function that calculates the cosine similarity between features $p_i$ and $q_j$; $(h_i, w_i)$ and $(h_j, w_j)$ are the spatial coordinates of features $p_i$ and $q_j$; and $w_s = 0.1$ denotes a weight that can be adjusted according to the degree of misalignment.
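The sketch below outlines the CoBi computation of equation (3) with VGG-19 features, cosine feature distances and a spatially weighted term. The chosen VGG layer, the single-layer formulation and the replication of single-channel microscopy images to three channels are simplifying assumptions; full CoBi implementations43 include additional normalization terms.

```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg19

class CoBiLoss(torch.nn.Module):
    """Simplified CoBi sketch (equation (3)): for each restored-image feature p_i,
    take the minimum over GT features q_j of the cosine distance plus a spatial term,
    then average over i."""

    def __init__(self, w_s=0.1, layer=8):
        super().__init__()
        # Shallow VGG-19 feature extractor (frozen); the layer choice is illustrative.
        self.vgg = vgg19(weights='IMAGENET1K_V1').features[:layer].eval()
        for p in self.vgg.parameters():
            p.requires_grad_(False)
        self.w_s = w_s

    def forward(self, restored, gt):
        # Single-channel microscopy images must be replicated to 3 channels beforehand.
        fp, fq = self.vgg(restored), self.vgg(gt)                # (B, C, h, w) feature maps
        b, c, h, w = fp.shape
        p = F.normalize(fp.flatten(2).transpose(1, 2), dim=-1)   # p_i: (B, N, C)
        q = F.normalize(fq.flatten(2).transpose(1, 2), dim=-1)   # q_j: (B, M, C)
        d_feat = 1.0 - p @ q.transpose(1, 2)                     # D: cosine distances (B, N, M)
        ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing='ij')
        coords = torch.stack([ys, xs], -1).reshape(-1, 2).float().to(fp.device)
        d_spatial = torch.cdist(coords, coords)                  # D': spatial distances (N, M)
        cost = d_feat + self.w_s * d_spatial.unsqueeze(0)
        return cost.min(dim=2).values.mean()                     # mean_i min_j (D + w_s * D')
```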

Training details

The UniFMIR model was based on a PyTorch implementation and optimized by adaptive moment estimation (Adam)48 with β1 = 0.9 and β2 = 0.999 for 500 epochs. The initial learning rate started at 5 × 10−5 and was halved after 200 epochs. All experiments were conducted on a machine with an Nvidia GeForce RTX 3090 GPU (with 24 GB of RAM).

In the pretraining stage, we set the batch size to 1 and the patch size to 64 × 64. We fed all training data to the model and optimized different head and tail branches for different tasks with the corresponding data. The middle feature enhancement branch was optimized using all training data. During the fine-tuning stage, we set the batch size/patch size to 4/128, 32/64, 32/64, 4/64 and 1/16 for the SR, isotropic reconstruction, denoising, projection and volume reconstruction tasks, respectively, to produce a better learning effect (Extended Data Table 1).
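Under the assumption of a data loader that yields (input, GT, task) triples and a model routed by task name, a loop matching the reported optimizer settings could look as follows; the loader format, per-task routing and the plain L1 objective shown here are illustrative.

```python
import torch

def pretrain(model, train_loader, epochs=500):
    """Adam (beta1=0.9, beta2=0.999), initial learning rate 5e-5, halved after 200 epochs."""
    optimizer = torch.optim.Adam(model.parameters(), lr=5e-5, betas=(0.9, 0.999))
    scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[200], gamma=0.5)
    loss_fn = torch.nn.L1Loss()  # swap in the combined L1 + L2 loss for pretraining
    for _ in range(epochs):
        for x, y, task in train_loader:  # loader yields (input, GT, task name) triples
            optimizer.zero_grad()
            loss_fn(model(x, task), y).backward()
            optimizer.step()
        scheduler.step()
    return model
```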

Evaluation metrics

To evaluate the quantitative accuracy of the fluorescence microscopy-based image restoration results, we adopted common image quality assessment metrics as follows (Supplementary Note 4).

The PSNR, NRMSE and SSIM49 were used to measure the pixel-level and structure-level similarities between a restored image \(\tilde{y}\) and a GT image y.
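For reference, one common way to compute these three metrics is with scikit-image, as sketched below; this assumes both images are normalized to a common intensity range and is not necessarily the evaluation code used here.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity, normalized_root_mse

def evaluate(restored, gt):
    """Pixel- and structure-level similarity between a restored image and its GT."""
    data_range = gt.max() - gt.min()
    return {
        'PSNR': peak_signal_noise_ratio(gt, restored, data_range=data_range),
        'SSIM': structural_similarity(gt, restored, data_range=data_range),
        'NRMSE': normalized_root_mse(gt, restored),
    }

# scores = evaluate(np.clip(prediction, 0, 1), ground_truth)   # placeholder arrays
```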

The resolution evaluation with decorrelation analysis37, a comprehensive measurement of the resolution and SNR, was performed to estimate the highest frequency from the local maxima of the decorrelation functions rather than the theoretical resolution stated by Abbe50. NanoJ-SQUIRREL39, an ImageJ-based analytical approach, was used to quantitatively assess SR quality by comparing diffraction-limited reference images with SR equivalents of the same acquisition volume.

FRC

FRC-based resolution measures38,51 can estimate image resolution without a reference image. We adopted the public GitHub code and the microscope image processing library (MIPLIB), a Python-based software library, for the FRC-based resolution analysis of fluorescence microscopy images.

Giga floating-point operations (GFLOPs), a common computational complexity measure, refers to the number of FLOPs, including the addition, subtraction, multiplication and division of floating-point numbers, that a model requires to process its input data. GFLOPs reflect the processing power needed to execute the model and vary depending on the number of convolutional layers, the types of operations performed within each layer and the size of the input. The GFLOPs of a deep model comprise the operations contained in all of its convolutional layers. A convolutional kernel of size K × K performs $2K^2 - 1$ FLOPs at each output position, and the total number of FLOPs for applying the convolution to an input of size H × W can be calculated according to equation (4):

$$\mathrm{FLOPs}=\left(2{K}^{2}-1\right)\times \left(\frac{H-K+P}{S}+1\right)\times \left(\frac{W-K+P}{S}+1\right),$$
(4)

where P and S denote the padding and stride parameters of the convolution, respectively.

BOPs (bit operations)52, defined as the number of bits multiplied by the number of FLOPs, were used to quantify the computational complexity of a deep model on a GPU that supports 32-bit, 16-bit or lower-precision arithmetic. Since GFLOPs cannot properly measure the computational complexity of low-precision and high-precision networks composed of integer or float operations, we also calculated the BOPs with the following function (equation (5)):

$$\mathrm{BOPs}=\mathrm{FLOPs}\times {b}_{w}\times {b}_{a},$$
(5)

where $b_w$ and $b_a$ denote the weight and activation bit-widths, which are set to 32 and 16 for the 32-bit and 16-bit models, respectively.
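A small helper following equations (4) and (5) is sketched below; note that equation (4) counts the operations for a single input–output channel pair, so the channel factors added here for multi-channel layers are an assumption, and the example layer is hypothetical.

```python
def conv_layer_flops(h, w, k, p=0, s=1):
    """Equation (4): FLOPs of a K x K kernel swept over an H x W input (one channel pair)."""
    positions = ((h - k + p) // s + 1) * ((w - k + p) // s + 1)
    return (2 * k * k - 1) * positions

def total_conv_flops(h, w, k, c_in, c_out, p=0, s=1):
    """Channel factors are an assumption added here for multi-channel layers."""
    return c_in * c_out * conv_layer_flops(h, w, k, p, s)

def bops(flops, bw=32, ba=32):
    """Equation (5): bit operations = FLOPs x weight bit-width x activation bit-width."""
    return flops * bw * ba

# Hypothetical example: a 3x3 convolution on a 64 x 64 input, 1 -> 64 channels.
flops = total_conv_flops(64, 64, k=3, c_in=1, c_out=64, p=2, s=1)
print(flops / 1e9, 'GFLOPs;', bops(flops, bw=16, ba=16), 'BOPs for a 16-bit model')
```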

Image restoration and segmentation

To explore whether performing image restoration with UniFMIR could improve the downstream image analysis and segmentation tasks in live-cell imaging, we applied a common segmentation pipeline (trainable Weka segmentation53) to the raw and restored images in the CCP, ER, Tribolium and Flywing datasets. The resulting UniFMIR model improved the segmentation effect by performing denoising and increasing the image resolution (Supplementary Note 5 and Supplementary Figs. 10–12).

Optimization of memory and complexity

To make UniFMIR more energy efficient, we produced optimized models by adopting two model compression methods (pruning and quantization). First, we used structural pruning to cut redundant branches out of the original model. Specifically, we removed the redundant head and tail branches and kept only the head and tail branches needed for fine-tuning a task-specific model, reducing the number of redundant parameters and resulting in 'prune-UniFMIR'. Inspired by Jacob et al.54, we also conducted model quantization, converting floating-point weights (float32) and activation values to low-precision numbers, to reduce the storage requirements and calculation time of the model; for this task, we adopted float16 quantization (Extended Data Tables 2–6).
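The two compression steps can be sketched as follows, assuming the multihead/multitail skeleton shown earlier (with heads and tails stored in ModuleDicts); the actual pruning and float16 quantization of the released model may differ in detail.

```python
import copy
import torch

def prune_to_task(model, task):
    """Structural pruning sketch: drop the head/tail branches of all other tasks and
    keep only the branch needed for the task-specific model."""
    pruned = copy.deepcopy(model)
    pruned.heads = torch.nn.ModuleDict({task: pruned.heads[task]})
    pruned.tails = torch.nn.ModuleDict({task: pruned.tails[task]})
    return pruned

def quantize_fp16(model):
    """Float16 quantization sketch: cast float32 weights (and activations at inference)
    to float16 to cut memory and computation."""
    return model.half().eval()

# compact = quantize_fp16(prune_to_task(UniFMIRSkeleton(), 'sr'))
# with torch.no_grad():
#     y = compact(torch.randn(1, 1, 64, 64).half(), task='sr')
```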

Pixel size correction

As the features learned by deep models depend on the scale and resolution of the training data, the performance of the trained models is influenced by the pixel-wise scale of the input images. To make the outputs of UniFMIR consistent for input images with varying pixel sizes, we equipped the UniFMIR software platform with a pixel size correction option that enables automatic pixel size calibration. Specifically, the input image is resized into different scales and fed into the model, and the multiscale outputs of the model are then fused to obtain the final result.
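A minimal sketch of such multiscale fusion is given below; the scale set and plain averaging as the fusion rule are assumptions for illustration, not the exact calibration procedure of the software platform.

```python
import torch
import torch.nn.functional as F

def pixel_size_corrected_inference(model, x, task, scales=(0.5, 1.0, 2.0)):
    """Rescale the input, run the model at each scale, map the outputs back to the
    native output size and average them."""
    with torch.no_grad():
        ref = model(x, task)                 # native-scale output defines the target size
        outputs = [ref]
        for s in scales:
            if s == 1.0:
                continue
            xs = F.interpolate(x, scale_factor=s, mode='bilinear', align_corners=False)
            ys = model(xs, task)
            outputs.append(F.interpolate(ys, size=ref.shape[-2:], mode='bilinear',
                                         align_corners=False))
    return torch.stack(outputs).mean(dim=0)

# fused = pixel_size_corrected_inference(UniFMIRSkeleton(), torch.randn(1, 1, 64, 64), 'denoise')
```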

Competing methods

All competing models, accompanied by their quantitative results, are listed in Extended Data Tables 3–6. To conduct a fair comparison, we downloaded the codes and the saved model checkpoints of all competing approaches from their GitHub repositories (ENLCN, CARE, DFCAN, GVTNets, VCD-Net and XTC). We retrained the ENLCN and XTC methods on the BioSR dataset5. Because DFCAN, CARE and VCD-Net do not provide well-trained models, we also retrained these models using their codes on the datasets used in our experiments for different tasks. The results of GVTNets were obtained by directly using their public models and codes. All experiments were conducted on the same machine with an Nvidia GeForce RTX 3090 GPU (24 GB of RAM).

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.