Abstract
Fluorescence microscopy-based image restoration has received widespread attention in the life sciences and has led to significant progress, benefiting from deep learning technology. However, most current task-specific methods have limited generalizability to different fluorescence microscopy-based image restoration problems. Here, we seek to improve generalizability and explore the potential of applying a pretrained foundation model to fluorescence microscopy-based image restoration. We provide a universal fluorescence microscopy-based image restoration (UniFMIR) model to address different restoration problems, and show that UniFMIR offers higher image restoration precision, better generalization and increased versatility. Evaluations on five tasks and 14 datasets covering a wide range of microscopy imaging modalities and biological samples demonstrate that the pretrained UniFMIR can effectively transfer knowledge to a specific situation via fine-tuning, uncover clear nanoscale biomolecular structures and facilitate high-quality imaging. This work has the potential to inspire and trigger new research directions for fluorescence microscopy-based image restoration.
Main
Fluorescence microscopy image restoration (FMIR), which aims to provide images with high signal-to-noise ratios (SNRs) from low-SNR images, has received significant attention from the research community, as it helps reveal important nanoscale imaging information for the accurate observation and scientific analysis of biological structures and processes1,2,3. Currently, benefiting from the rapid development of deep learning, the literature is experiencing a large influx of contributions in this direction. Many deep learning-based FMIR works4,5,6,7,8,9,10,11,12,13,14,15 (Supplementary Note 1) have pushed the physical limits of fluorescence microscopy through computation and have achieved significant improvements over classic deconvolution algorithms1,16.
Although significant progress has been achieved, these deep learning-based FMIR methods are still affected by several weaknesses that limit their broader application in biological research. First, the prevailing models address specific FMIR problems, such as denoising, super-resolution (SR) and isotropic reconstruction, by training a specific deep model (for example, U-Net-inspired models4,7,8,9,14 and RCAN-inspired models5,6,10,13) with limited parameters (no more than a few million) on a specific dataset from scratch (Supplementary Table 1b). In addition, these models generalize poorly: significant performance degradation can be observed when facing large domain gaps between different datasets and different FMIR problems. Achieving promising results across different imaging modalities, biological samples and image restoration tasks therefore requires training multiple specific models. Finally, the common data dependence problem in the deep learning field also affects most FMIR models, whose performance highly depends on the quality and quantity of the training data owing to the data-driven nature of deep learning-based methods. Consequently, the practical difficulty of experimentally acquiring low-quality and high-quality training image pairs complicates the real-world application of deep learning-based FMIR methods4. Therefore, the main purpose of this paper is to overcome the above weaknesses and explore the performance upper limit of deep models while inspiring and fostering subsequent research.
As the latest generation of artificial intelligence models, foundation models17,18, which can be applied to a diverse set of downstream tasks by training them on massive and diverse datasets, have profoundly advanced the development of deep learning and have exhibited remarkable domain transfer capabilities, particularly in the fields of natural language processing19,20, computer vision21,22 and multimodal learning23,24,25. Recently, the concept of foundation models has been utilized in diverse life science applications, where foundation models have demonstrated impressive capabilities in clinical settings26,27,28,29,30, biotechnology31,32 and beyond (Supplementary Note 1).
As stated by Moor et al.30, foundation models gain remarkable abilities from increases in dataset and model size, in addition to advances in model architecture, in agreement with observations32,33 that large-scale pretraining with larger and more diverse data consistently improves a model's predictive potential. We observed a similar phenomenon in the FMIR field, which has never been studied before (Supplementary Note 2). Pretraining also enables better generalization and efficient training on new datasets with limited training data by transferring knowledge in a pretrained model to a specific task or data modality. This claim is supported by Zamir et al.34, who explored transfer learning between many visual learning tasks and showed that the amount of training data required for solving multiple tasks together can be greatly reduced compared with that required for independent training. In addition, assembling multiple image restoration processes in a foundation model is a more practical and convenient strategy, as it is difficult to directly determine which type of image restoration operation is needed for the realistic fluorescence microscopy images at hand.
However, task-specific or modality-specific deep models are still the main deep learning-based approaches for FMIR. Although individual models can now achieve state-of-the-art (SOTA) performance, foundation models have the merit of versatility. Instead of training a new model from scratch for each task, the above approaches have demonstrated that a foundation model can consolidate the fundamental knowledge learned from general datasets during the pretraining phase and transfer this knowledge to a multitude of tasks through fine-tuning32. The enormous progress made by pretrained large-scale models brings new momentum to the development of FMIR approaches.
Here, we first presented a UniFMIR solution to handle diverse image degradations and imaging modalities simultaneously. We took inspiration from existing foundation models, where large pretrained models can be flexibly transferred to solve diverse tasks and achieve significant performance improvements via efficient fine-tuning. Specifically, we constructed the UniFMIR model, which adopts a multihead and multitail network structure (Fig. 1a and Extended Data Figs. 1 and 2). UniFMIR consists of a multihead module, a feature enhancement module and a multitail module, where the multihead and multitail modules adopt different branches to extract task-specific shallow features and yield accurate results for different image restoration problems, respectively. The feature enhancement module uses an advanced Swin transformer structure35 to enhance the feature representations and to reconstruct general and effective features for high-quality FMIR. Different FMIR operations use different head and tail branches but share the same feature enhancement module. We collected a large training dataset (~30 GB) from 14 public datasets, covering a wide range of imaging modalities, biological samples and image restoration tasks. UniFMIR was pretrained on this large-scale dataset and fine-tuned on different subdatasets covering various degradation conditions, imaging modalities and biological samples. To enable better comparison, we also created baselines by training models with the same architecture as UniFMIR from scratch on specific datasets for different tasks. We showed that efficient fine-tuning can transfer the prior knowledge learned during pretraining to handle different problems (Supplementary Figs. 15–17).
We demonstrated the effectiveness of the UniFMIR model on a set of high-impact applications and compared its performance with that of SOTA methods for solving specific problems.
Results
SR
The lack of high-resolution (HR) microscopy images has impeded the further exploration of the life science phenomena of related structures or cellular tissues. To overcome the theoretical spatial resolution limitation in live-cell imaging, SR, aiming to enhance the resolution of scientific microscopy images, has been widely studied in the field of fluorescence microscopy imaging. Deep learning-based SR models have greatly promoted the development of conventional SR microscopy approaches by reconstructing HR microscopy images from their low-resolution (LR) versions.
We first determined the potential of our UniFMIR approach to deal with the SR problem (×2 upscaling) involving images with increasing structural complexity levels from the BioSR dataset5, obtained via a multimodal structured illumination microscopy (SIM) system, including clathrin-coated pits (CCPs), endoplasmic reticula (ERs), microtubules (MTs) and F-actin filaments. Our UniFMIR successfully inferred SR SIM images from wide-field (WF) images at a diffraction-limited scale with a high fluorescence level and revealed clear structural details. Compared with two deep learning-based fluorescence microscopy SR models (XTC15 and DFCAN5) and a single-image super-resolution model (ENLCN36) for macroscale photographs, UniFMIR could correctly reconstruct most MTs without losing or merging them, even when the MTs were densely distributed and close to each other. The baselines for different datasets were obtained by training a model with the same network structure as UniFMIR from scratch on the specific training dataset. For diverse subcellular structures, UniFMIR also restored hollow, ring-shaped CCPs and crisscrossing F-actin with high fidelity (Fig. 1b).
We also quantified the attained SR accuracy with the peak SNR (PSNR), structural similarity index measure (SSIM), normalized root mean square error (NRMSE), the resolution estimate of a decorrelation analysis37, Fourier ring correlation (FRC)38, SQUIRREL analysis39 and segmentation metrics (Fig. 1c and Supplementary Figs. 2–4 and 10). Higher PSNR/SSIM values and lower NRMSE values denote better SR performance, indicating that the restored images are closer to the ground-truth (GT) SIM images in terms of fluorescence intensity and structure.
Isotropic reconstruction
Volumetric fluorescence microscopy methods, such as three-dimensional (3D) SIM, are generally limited by anisotropic spatial resolution: the axial resolution of 300 nm is inferior to the lateral resolution. Such anisotropy, which is caused by the inherent optical point spread function of the microscope or a low axial sampling rate, compromises the imaging quality of the volumes of interest. Isotropic reconstruction, which computationally restores an isotropic image resolution, is therefore another common problem in fluorescence microscopy.
We applied our UniFMIR approach on anisotropic raw data (with up to tenfold lower axial resolutions) from volumetric mouse liver imaging4 to predict isotropic axial slices and compared it with two deep learning-based isotropic reconstruction models, CARE4 and the 3D U-Net model proposed by Li et al.40. The proposed UniFMIR method could offer near-isotropic imaging by enhancing the axial resolution, facilitating the subsequent quantification of the shapes and volumes of biological samples (Fig. 2a and Supplementary Fig. 5). Our UniFMIR method yielded isotropic reconstruction results with more accurate pixel distributions, and the pixel intensities along the columns of our axial outputs were closer to those of the GTs (Fig. 2b). The same conclusion could also be drawn from the average PSNR/SSIM/NRMSE results obtained on data from the liver dataset (Fig. 2c).
3D image denoising
High-SNR fluorescence microscopy imaging always requires high laser power or long exposure times, which are accompanied by bleaching, phototoxicity and other side effects that are detrimental to the sample. Deep learning-based denoising methods4,6,7 can computationally restore acquired low-SNR fluorescence microscopy images by leveraging the available knowledge about the data at hand.
We further benchmarked the performance of our UniFMIR approach on a live-cell image denoising task conducted on the Planaria and Tribolium datasets4, each of which contains well-registered high-/low-SNR 3D images, captured by a spinning-disk confocal microscope and a multiphoton laser-scanning microscope, for training and testing. Compared with two U-Net-based denoising models, CARE4 and GVTNets7, our UniFMIR model considerably suppressed the noise of the low-SNR fluorescence microscopy images under different laser powers/exposure times (C1–C3) and clearly depicted the planarian Schmidtea mediterranea and Tribolium castaneum volumes with labeled nuclei, helping to observe embryonic development (Fig. 3a and Supplementary Figs. 6 and 7). UniFMIR achieved higher statistical accuracy (PSNR/SSIM/NRMSE values) between the denoised images and the well-registered high-SNR images (GTs; Fig. 3b).
Surface projection
To better analyze cell behavior in the developing epithelia of the fruit fly Drosophila melanogaster, surface projection maps a 3D volume onto a two-dimensional (2D) surface image. Current deep learning models (CARE4 and GVTNets7) formulate this image restoration problem as two subproblems, 3D-to-2D surface projection and 2D image denoising, and use two task-specific networks, following the same encoder–decoder framework as U-Net, to solve them.
We further examined UniFMIR in a more complex composite fluorescence microscopy-based image restoration task and adopted the public Flywing dataset4, which contains 3D–2D image pairs for training and testing. Similarly to the Planaria and Tribolium datasets, the Flywing dataset also covers different laser power conditions (C1–C3). To project each 3D volume onto a 2D plane, we adopted a U-Net-shaped head for UniFMIR, which achieved promising results with end-to-end network calculations on the 3D D. melanogaster Flywing imaging task and yielded higher reconstruction accuracy in terms of the PSNR/SSIM/NRMSE metrics; meanwhile, the pixel intensities were closer to those of the GT (Fig. 4 and Supplementary Fig. 8).
Volumetric reconstruction
The volumetric reconstruction of light-field microscopy images, permitting the acquisition of artifact-free 3D image sequences with uniform spatial resolutions from 2D information, is significant for instantaneously imaging fast biological processes. As a demonstration, we verified the volumetric reconstruction ability of UniFMIR on the data provided by VCD-Net8. Each view of a reconstructed 3D volume can identify the motion trajectory of the imaging object (Fig. 5 and Supplementary Fig. 9), which is beneficial for revealing the underlying mechanisms of many complicated live-cell dynamics involving various subcellular structures. Since no GT was available for calculating more quantitative metrics, such as the PSNR/SSIM/NRMSE, to quantify the accuracy of the volumetric reconstruction results, we adopted decorrelation analysis37 to measure the nanometer resolution of each reconstructed image sequence (Fig. 5 and Supplementary Fig. 9).
Generalization ability analysis
To demonstrate the generalization ability of our pretrained UniFMIR approach, we validated its image restoration performance on unseen data from DeepBacs41 for bacterial image analysis purposes, including two denoising datasets (E. coli_H-NS-mScarlet-I, E. coli_MreB) and two SR datasets (Escherichia coli, Staphylococcus aureus). We fine-tuned UniFMIR on the new bacterial microscopy images and then conducted denoising and SR experiments. In addition to the denoising and SR models developed on DeepBacs, we also compared our results with those of a baseline model trained from scratch, which had the same structure as UniFMIR, and the pretrained UniFMIR model without fine-tuning. Our UniFMIR method restored clear prokaryote structures to enhance the low-phototoxicity live-cell microscopy data and predict accurate mappings of biological target shapes, obtaining higher PSNR/SSIM values (Fig. 6). Compared with the model trained from scratch, our UniFMIR approach achieved better performance on new datasets (Supplementary Fig. 13).
We also analyzed whether UniFMIR could be generalized to other SR modalities in addition to the SIM images used in the pretraining stage. First, we adopted single-molecule localization microscopy data from the Shareloc platform42 and applied our model to direct stochastic optical reconstruction microscopy (dSTORM) images of MTs stained with Alexa Fluor 647 in U2OS cells incubated with nocodazole. Since the input WF images and GT were not well matched, we fine-tuned our UniFMIR model with the contextual bilateral (CoBi) loss43 instead of the L1 or L2 loss, which requires pixel-wise alignment between the input and GT. We compared a U-Net-based SR single-molecule microscopy model (DeepSTORM44), a baseline model and a pretrained UniFMIR model without fine-tuning. In addition to the pretrained UniFMIR model, all competing models were trained with the CoBi loss on the same training data as those used by our method. Our UniFMIR model could restore accurate structures that were similar to the GTs, and the other models failed to learn mappings between the unaligned images (Extended Data Fig. 3).
Discussion
The focus of this paper was to present a UniFMIR solution for maximizing the potential of deep learning-based methods and circumventing the limitations exhibited by the existing fluorescence microscopy-based image restoration deep models. Here, we outlined how recent advances in foundation model research enable the development of FMIR. Inspired by the success of pretrained large-scale models in artificial intelligence, we developed a unified foundation model for fluorescence microscopy-based image restoration, facilitating high-quality image restoration in different fluorescence microscopy-based image restoration tasks with various imaging modalities by extending the strong transfer capabilities of large-scale pretrained models to fluorescence microscopy-based image restoration. We also collected 14 public fluorescence microscopy-based image restoration datasets with 196,418 training pairs covering various biological samples, microscopes and degradation conditions. The UniFMIR model pretrained on the collected data could be easily applied to different tasks and new image distributions through efficient fine-tuning, transferring the knowledge of a foundation model to a specific one.
The experimental results obtained in different fluorescence microscopy-based image restoration tasks suggested the excellent performance of UniFMIR in restoring high-fidelity microscopy images. The HR results obtained for the SR and isotropic resolution reconstruction problems could clearly resolve diffraction-limited image details to improve the resolutions of images by uncovering subtle biological structures (Figs. 1 and 2 and Supplementary Fig. 5). The denoising and projection results restored clean signals from the noisy inputs, achieving accurate reconstruction quality (Figs. 3 and 4 and Supplementary Figs. 6–8). The volume reconstruction results showed transient biological image dynamics with minimal artifacts (Fig. 5 and Supplementary Fig. 9). The generalization ability of the pretrained UniFMIR was also shown (Fig. 6 and Extended Data Fig. 3).
Key capabilities
We outline three key capabilities that distinguish UniFMIR from conventional FMIR models. (1) Higher restoration quality. UniFMIR achieved the highest restoration precision and was often superior to the task-specific FMIR models. (2) Better generalization ability. UniFMIR displayed an impressive generalization ability and enabled efficient training processes on new FMIR datasets by transferring knowledge from the available datasets to the new data. (3) Unifying FMIR tasks. UniFMIR possessed applicability to handle multiple FMIR problems with one model and unified the restoration processes concerning different data modalities for different fluorescence microscopes. Our work also identified that publicly shared fluorescence images can be considered a tremendous resource that can be harnessed to develop foundation models for enhancing fluorescence microscopy images.
We demonstrated the ability of UniFMIR to achieve high precision with promising generalization performance. The key intuition behind this idea is that the foundation model, pretrained on more diverse data distributions, could learn more generalized representations29 of different high-quality image modalities and biological structures. Existing works32,33 have found that large-scale pretraining allows the training of deeper models with greater predictive potential and even promotes robustness by gaining a fundamental understanding of the knowledge in training data. As stated by Guo et al.45, the foundation model may be effective in terms of acquiring informative global patterns that can improve the robustness of task-specific models.
Different FMIR operations push the distribution spaces of low-quality images toward the distribution spaces of high-quality images. During pretraining, UniFMIR gained a fundamental understanding of high-quality images and a regularized optimization direction for different tasks, unlike task-specific models that learn a single path from low quality to high quality. Thus, the pretraining knowledge benefits fine-tuning toward diverse tasks, facilitating faster and better convergence and enabling the model to perform better than a model trained from scratch (Supplementary Fig. 15).
Limitations and future work
Despite the impressive results of UniFMIR, it is noteworthy that this paper is an exploratory work that demonstrates the feasibility of foundation models for FMIR. We hope that this work, accompanied by a brief review of current deep learning-based FMIR approaches (Supplementary Table 1), will provide new insights for more researchers. Looking to the future, much room remains for further improvement and evaluation to advance the frontiers of FMIR. We discuss some limitations and future research directions as follows. First, a large model (>100 MB or even 1 TB) often requires a considerable number of training images (millions or even billions), a time-consuming pretraining process (weeks or even months) and costly computational resources (graphics processing units, GPUs). To reduce the deployment cost and make UniFMIR more energy efficient, we optimized UniFMIR models by applying two model compression methods (pruning and quantization). Nevertheless, the UniFMIR model still requires long computation times, especially for 3D image-related tasks; faster inference and higher efficiency therefore require further exploration. Second, we utilized existing public datasets for pretraining and found that the current data have not yet saturated the model's performance (Supplementary Fig. 14). Theodoris et al.32 outlined that, as the amount of publicly available data continues to expand, pretraining on larger-scale data may further enhance the model's performance, especially for tasks with increasingly limited task-specific data. Therefore, efforts toward a more diverse and larger dataset are expected. In future work, we will continuously train the foundation model with new data to make the FMIR foundation model stronger and share it with the community freely and in a timely manner.
Methods
Data preparation process
To cover as many imaging modalities and fluorescence microscopy-based image restoration tasks as possible, we collected datasets from the literature (Supplementary Table 2) and grouped numerous datasets for different fluorescence microscopy-based image restoration tasks and imaging modalities. As these datasets vary significantly in terms of formats, domains and numerical ranges, we processed the images for convenient training and cross-dataset validation.
First, we wrote the input and GT images of existing datasets with different storage formats, including ‘TIF’, ‘npz’, ‘png’ and ‘nii.gz’, into an ‘.npz’ file. In addition, we normalized the images to unify the numerical distributions of different datasets by following the data processing method in CARE4. Because the spatial sizes of the features in a deep neural network are fixed during training, we further cropped the training images into multiple patches with the same spatial size to facilitate simultaneous training on images from different datasets.
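The normalization and cropping steps described above can be sketched as follows. This is a minimal illustration: the function names, the percentile values (following the CARE convention) and the patch/stride sizes are illustrative assumptions, not the pipeline's exact parameters.

```python
import numpy as np

def percentile_normalize(img, p_low=2.0, p_high=99.8, eps=1e-6):
    """CARE-style percentile normalization to a roughly [0, 1] range,
    unifying the numerical distributions of different datasets."""
    lo, hi = np.percentile(img, p_low), np.percentile(img, p_high)
    return (img - lo) / (hi - lo + eps)

def crop_patches(img, patch=64, stride=64):
    """Tile a 2D image into fixed-size patches so that images from
    different datasets can be trained on simultaneously."""
    h, w = img.shape
    patches = [img[i:i + patch, j:j + patch]
               for i in range(0, h - patch + 1, stride)
               for j in range(0, w - patch + 1, stride)]
    return np.stack(patches)

img = np.random.rand(128, 128).astype(np.float32)
patches = crop_patches(percentile_normalize(img))
# 128/64 = 2 tiles per axis -> 4 patches of 64x64
```

In practice the normalized input/GT pairs would then be written together into a single '.npz' archive per dataset.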
Network architectures
We designed a multihead and multitail network architecture for the UniFMIR model, which comprised three components: multiple feature extraction modules, a Swin transformer-based feature enhancement module and multiple image reconstruction modules (Fig. 1a and Extended Data Fig. 1). More specifically, the multihead and multitail branches for different FMIR tasks adopted different feature extraction and image reconstruction modules to extract task-specific shallow features and reconstruct images, respectively (Supplementary Note 3).
Different fluorescence microscopy-based image restoration calculations shared the same feature enhancement module. Inspired by SwinIR46, a SOTA model for natural image restoration, the Swin transformer35-based feature enhancement module adopted several vision transformer-based blocks to enhance the feature representations and to restore the final features for high-quality image reconstruction. As shown in Extended Data Fig. 2, the feature enhancement module consisted of convolutional layers and a series of Swin transformer blocks, each of which included several Swin transformer layers, a convolutional layer and a residual connection. The Swin transformer layer was composed of layer normalization operations, a multihead self-attention mechanism and a multilayer perceptron. In the multihead self-attention mechanism, the input features fin were first divided into multiple small patches with a moving window operation, and then the self-attention in each patch was calculated with the function in equation (1):

$$f_{\mathrm{out}}=\mathrm{Attention}(Q,K,V)=\mathrm{SoftMax}\!\left(\frac{QK^{T}}{\sqrt{d_{k}}}\right)V,\qquad(1)$$
where Q, K and V represent the query, key and value, respectively, which were separately obtained by three convolutional layers. dk is the dimensionality of K. SoftMax(⋅) normalized the similarity between Q and K, and the output feature fout was obtained by multiplying V. The multilayer perceptron was composed of two fully connected layers and Gaussian-error linear unit activation.
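The windowed self-attention computation described above can be sketched in NumPy. This is a single-head sketch under stated assumptions: the window size, feature dimensions and shared projection weights are illustrative, not the model's actual configuration.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def window_attention(f_in, w_q, w_k, w_v, window=4):
    """Scaled dot-product attention computed independently inside
    non-overlapping windows of tokens, as in equation (1)."""
    n, d = f_in.shape                 # n tokens, d channels
    d_k = w_k.shape[1]                # key dimensionality
    out = np.empty((n, w_v.shape[1]))
    for s in range(0, n, window):     # partition tokens into windows
        x = f_in[s:s + window]
        q, k, v = x @ w_q, x @ w_k, x @ w_v
        attn = softmax(q @ k.T / np.sqrt(d_k))  # similarity of Q and K
        out[s:s + window] = attn @ v            # weighted sum of V
    return out

rng = np.random.default_rng(0)
f_in = rng.standard_normal((16, 8))
w = rng.standard_normal((8, 8))
f_out = window_attention(f_in, w, w, w)
```

In the actual model Q, K and V come from separate convolutional layers and the windows shift between layers; both details are omitted here for brevity.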
Training losses
We used a combination of the \(\mathcal{L}_1\) and \(\mathcal{L}_2\) losses during the pretraining stage to exploit the robustness of the \(\mathcal{L}_1\) loss and the stability of the \(\mathcal{L}_2\) loss. During fine-tuning, only the \(\mathcal{L}_1\) loss was adopted to pursue a higher quantitative metric (PSNR). Suppose that \((x_i, y_i)_{i=1:N}\) denotes N pairs of input and GT training data and that \(f_\theta\) denotes the UniFMIR model with parameters \(\theta\) (equation (2)):

$$\mathcal{L}=\frac{1}{N}\sum_{i=1}^{N}\left(\left\|f_{\theta}(x_{i})-y_{i}\right\|_{1}+\left\|f_{\theta}(x_{i})-y_{i}\right\|_{2}^{2}\right).\qquad(2)$$
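The combined L1 + L2 pretraining objective can be sketched as follows (a minimal NumPy illustration, assuming an unweighted sum of the two terms; the paper does not specify a weighting here):

```python
import numpy as np

def l1(pred, gt):
    """Mean absolute error (robust to outliers)."""
    return np.mean(np.abs(pred - gt))

def l2(pred, gt):
    """Mean squared error (stable gradients near the optimum)."""
    return np.mean((pred - gt) ** 2)

def pretrain_loss(pred, gt):
    """Combined L1 + L2 objective used during pretraining;
    fine-tuning would keep only the L1 term."""
    return l1(pred, gt) + l2(pred, gt)

pred = np.zeros((4, 4))
gt = np.ones((4, 4))
# |0-1| averages to 1 and (0-1)^2 averages to 1, so the sum is 2
loss = pretrain_loss(pred, gt)
```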
To apply our UniFMIR model to unmatched data, in which the input WF images and the GT single-molecule localization microscopy data42 were not well aligned in a pixel-wise manner, we fine-tuned our model with the CoBi loss43, which improved its robustness to mild misalignment in the input–output image pairs (equation (3)):

$$\mathrm{CoBi}(\tilde{y},y)=\frac{1}{N}\sum_{i=1}^{N}\min_{j=1,\ldots,M}\left(\mathrm{Distance}(p_{i},q_{j})+w_{s}\left\|(h_{i},w_{i})-(h_{j},w_{j})\right\|\right),\qquad(3)$$
where \((\tilde{y}, y)\) denotes a pair of restored and GT images, and \(p_{i=1,\ldots,N}\) and \(q_{j=1,\ldots,M}\) are the features of \(\tilde{y}\) and \(y\), respectively, extracted by the pretrained Visual Geometry Group 19 (VGG-19)47 network. 'Distance(⋅)' denotes a distance function calculating the cosine similarity between features \(p_i\) and \(q_j\); \((h_i, w_i)\) and \((h_j, w_j)\) are the spatial coordinates of features \(p_i\) and \(q_j\); and \(w_s = 0.1\) denotes a weight that is flexible to the degree of misalignment.
Training details
The UniFMIR model was implemented in PyTorch and optimized by adaptive moment estimation (Adam)48 with β1 = 0.9 and β2 = 0.999 for 500 epochs. The initial learning rate started at 5 × 10−5 and was halved after 200 epochs. All experiments were conducted on a machine with an Nvidia GeForce RTX 3090 GPU (with 24 GB of RAM).
In the pretraining stage, we set the batch size to 1 and the patch size to 64 × 64. We fed all training data to the model and optimized different head and tail branches for different tasks with the corresponding data. The middle feature enhancement branch was optimized using all training data. During the fine-tuning stage, we set the batch size/patch size to 4/128, 32/64, 32/64, 4/64 and 1/16 for the SR, isotropic reconstruction, denoising, projection and volume reconstruction tasks, respectively, to produce a better learning effect (Extended Data Table 1).
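The step learning-rate schedule above can be expressed compactly. This is a sketch under an assumption: the text states the rate was halved after 200 epochs, and here it is generalized to halving every 200 epochs.

```python
def learning_rate(epoch, base_lr=5e-5, halve_every=200):
    """Step schedule: start at 5e-5 and halve the rate every
    `halve_every` epochs (assumption generalizing 'halved after
    200 epochs')."""
    return base_lr * (0.5 ** (epoch // halve_every))

lr_start, lr_late = learning_rate(0), learning_rate(300)
# epochs 0-199 train at 5e-5; epochs 200-399 train at 2.5e-5
```

In PyTorch this corresponds to pairing `torch.optim.Adam` with a step scheduler such as `torch.optim.lr_scheduler.StepLR(optimizer, step_size=200, gamma=0.5)`.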
Evaluation metrics
To evaluate the quantitative accuracy of the fluorescence microscopy-based image restoration results, we adopted common image quality assessment metrics as follows (Supplementary Note 4).
The PSNR, NRMSE and SSIM49 are proposed to measure the pixel-level and structure-level similarities between a restored image \(\tilde{y}\) and a GT image y.
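The two pixel-level metrics can be computed directly (a minimal NumPy sketch; SSIM is omitted here because it is normally taken from a library implementation such as scikit-image's):

```python
import numpy as np

def psnr(restored, gt, data_range=1.0):
    """Peak signal-to-noise ratio in decibels."""
    mse = np.mean((restored - gt) ** 2)
    return 10 * np.log10(data_range ** 2 / mse)

def nrmse(restored, gt):
    """Root mean square error normalized by the GT signal energy."""
    return np.sqrt(np.mean((restored - gt) ** 2)) / np.sqrt(np.mean(gt ** 2))

gt = np.ones((8, 8))
restored = gt * 0.9          # uniform 10% error
# mse = 0.01 -> psnr = 10*log10(1/0.01) = 20 dB, nrmse = 0.1
```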
The resolution evaluation with decorrelation analysis37, a comprehensive measurement of the resolution and SNR, was performed to estimate the highest frequency from the local maxima of the decorrelation functions rather than the theoretical resolution stated by Abbe50. NanoJ-SQUIRREL39, an ImageJ-based analytical approach, was used to quantitatively assess SR quality by comparing diffraction-limited reference images and SR equivalents of the same acquisition volume.
FRC
FRC-based resolution measures38,51 can estimate image resolution without a reference image. We adopted the public GitHub code and the microscope image processing library (MIPLIB), a Python-based software library, for FRC-based image resolution analysis of fluorescence microscopy images.
Giga floating-point operations (GFLOPs), a common computational complexity measure, refers to the number of FLOPs, including the addition, subtraction, multiplication and division of floating-point numbers, that the model requires to process the input data. GFLOPs reflect the processing power needed to execute the model and vary depending on the number of convolutional layers, the types of operations performed within each layer and the size of the input. The GFLOPs of a deep model are accumulated over the operations contained in all convolutional layers. The number of calculations (FLOPs) performed per output position by a convolutional layer with a kernel size of K × K is 2K2 − 1, and the total number of FLOPs for conducting a convolutional calculation on an input with a size of H × W can be calculated according to equation (4):

$$\mathrm{FLOPs}=\left(2K^{2}-1\right)\left(\frac{H-K+2P}{S}+1\right)\left(\frac{W-K+2P}{S}+1\right),\qquad(4)$$
where P and S denote the pooling and stride parameters of a convolution, respectively.
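Equation (4) can be evaluated with a short helper. The channel arguments `c_in` and `c_out` below are our own extension for multichannel layers and are not part of the per-kernel count in the text:

```python
def conv2d_flops(h, w, k, p=0, s=1, c_in=1, c_out=1):
    """FLOPs of one 2D convolution following equation (4): each K x K kernel
    application costs 2*K**2 - 1 operations (K**2 multiplications plus
    K**2 - 1 additions), repeated at every output position."""
    out_h = (h - k + 2 * p) // s + 1
    out_w = (w - k + 2 * p) // s + 1
    return (2 * k * k - 1) * out_h * out_w * c_in * c_out
```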
BOPs (bit operations)52, that is, the number of bits times the number of FLOPs, quantify the computational complexity of a deep model on a GPU that supports 32-bit, 16-bit or lower-precision arithmetic. Since GFLOPs cannot properly compare the computational complexity of low-precision and high-precision networks composed of integer or floating-point operations, we also calculated the BOPs according to equation (5):

\[\mathrm{BOPs}=b_w\times b_a\times\mathrm{FLOPs},\qquad(5)\]

where \(b_w\) and \(b_a\) denote the weight and activation bit-widths and are set to 32 and 16 for the 32-bit and 16-bit models, respectively.
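Equation (5) reduces to a one-line helper; the snippet below only illustrates the bit-width scaling and is not the released benchmarking code:

```python
def bops(flops, b_w, b_a):
    """Bit operations (equation 5): FLOPs weighted by the weight and
    activation bit-widths, so a 16-bit model costs fewer BOPs than a
    32-bit model with identical FLOPs."""
    return flops * b_w * b_a
```

For a fixed FLOP count, a fully 16-bit model (b_w = b_a = 16) therefore needs one quarter of the BOPs of the corresponding 32-bit model.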
Image restoration and segmentation
To explore whether image restoration with UniFMIR could improve downstream image analysis and segmentation in live-cell imaging, we applied a common segmentation pipeline (trainable Weka segmentation53) to the raw and restored images of the CCP, ER, Tribolium and Flywing datasets. The UniFMIR model improved the segmentation results by denoising the images and increasing their resolution (Supplementary Note 5 and Supplementary Figs. 10–12).
Optimization of memory and complexity
To make UniFMIR more energy efficient, we produced optimized models by adopting two model compression methods: pruning and quantization. First, we applied structure pruning to remove unused branches from the original model. Specifically, we kept only the head and tail branches needed for fine-tuning a task-specific model and removed the redundant branches of the other tasks, reducing the number of parameters; the resulting model is ‘prune-UniFMIR’. Inspired by Jacob et al.54, we also conducted model quantization, converting floating-point weights (float32) and activation values to low-precision numbers to reduce the storage requirements and calculation time of the model; for this, we adopted float16 quantization (Extended Data Tables 2–6).
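The two compression steps can be sketched on a plain dictionary of weight arrays. The key prefixes and function names below are hypothetical stand-ins, not those of the released checkpoints:

```python
import numpy as np

def prune_heads_tails(state_dict, task):
    """Structure-pruning sketch: keep the shared feature-enhancement weights
    plus the head/tail branch of one task; drop the branches of all other
    tasks (the key layout here is assumed for illustration)."""
    keep = ("feature_enhance.", f"head.{task}.", f"tail.{task}.")
    return {k: v for k, v in state_dict.items() if k.startswith(keep)}

def quantize_fp16(state_dict):
    """float16 quantization sketch: converting float32 tensors to float16
    halves their storage footprint."""
    return {k: (v.astype(np.float16) if v.dtype == np.float32 else v)
            for k, v in state_dict.items()}
```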
Pixel size correction
As the features learned by deep models depend on the scale and resolution of the training data, the performance of a well-trained model is influenced by the pixel size of its input images. To make the outputs of UniFMIR consistent for inputs with varying pixel sizes, we equipped the UniFMIR software platform with an automatic pixel size correction option. Specifically, the input image is resized to several scales and passed through the model, and the multiscale outputs of the model are fused to obtain the final result.
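The calibration idea can be sketched as follows, using a nearest-neighbour resize as a stand-in for the platform's actual resampling and an arbitrary callable as the model; the function names and fusion by simple averaging are our assumptions:

```python
import numpy as np

def resize_nn(img, out_h, out_w):
    """Nearest-neighbour resize of a 2D array (illustrative resampler)."""
    h, w = img.shape
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return img[rows][:, cols]

def multiscale_restore(img, model, scales=(0.5, 1.0, 2.0)):
    """Run the model on rescaled copies of the input and average the
    outputs after resizing them back to the original shape."""
    h, w = img.shape
    outs = [resize_nn(model(resize_nn(img, max(1, int(h * s)),
                                      max(1, int(w * s)))), h, w)
            for s in scales]
    return np.mean(outs, axis=0)
```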
Competing methods
All competing models, accompanied by their quantitative results, are listed in Extended Data Tables 3–6. To conduct a fair comparison, we downloaded the code and saved model checkpoints of all competing approaches from their GitHub repositories (ENLCN, CARE, DFCAN, GVTNets, VCD-Net and XTC). We retrained the ENLCN and XTC methods on the BioSR dataset5. Because DFCAN, CARE and VCD-Net do not provide well-trained models, we also retrained these models using their code on the datasets used in our experiments for the different tasks. The results of GVTNets were obtained directly with their public models and code. All experiments were conducted on the same machine with an Nvidia GeForce RTX 3090 GPU (24 GB of RAM).
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
All training and testing data involved in the experiments come from existing literature and can be downloaded from the corresponding links provided in Supplementary Table 2 or via Zenodo at https://doi.org/10.5281/zenodo.8401470 (ref. 55).
Code availability
The PyTorch code of our UniFMIR, together with the trained models and some example images for inference, is publicly available at https://github.com/cxm12/UNiFMIR (https://doi.org/10.5281/zenodo.10117581)56. We also provide a live demo of UniFMIR at http://unifmir.fdudml.cn/. Users can access the Colab notebook at https://colab.research.google.com/github/cxm12/UNiFMIR/blob/main/UniFMIR.ipynb or follow the steps in our GitHub documentation to run the demo locally. This newly built interactive software platform allows users to freely and easily use the pretrained foundation model, and it also makes it easy for us to continuously train the foundation model with new data and share it with the community. Finally, we shared all models on BioImage.IO at https://bioimage.io/#/. Data are available via Zenodo at https://doi.org/10.5281/zenodo.10577218, https://doi.org/10.5281/zenodo.10579778, https://doi.org/10.5281/zenodo.10579822, https://doi.org/10.5281/zenodo.10595428, https://doi.org/10.5281/zenodo.10595460, https://doi.org/10.5281/zenodo.8420081 and https://doi.org/10.5281/zenodo.8420100 (refs. 57,58,59,60,61,62,63). We used the PyCharm software for code development.
References
Preibisch, S. et al. Efficient bayesian-based multiview deconvolution. Nat. Methods 11, 645–648 (2014).
Gustafsson, N. et al. Fast live-cell conventional fluorophore nanoscopy with ImageJ through super-resolution radial fluctuations. Nat. Commun. 7, 12471 (2016).
Arigovindan, M. et al. High-resolution restoration of 3D structures from widefield images with extreme low signal-to-noise-ratio. Proc. Natl Acad. Sci. USA 110, 17344–17349 (2013).
Weigert, M. et al. Content-aware image restoration: pushing the limits of fluorescence microscopy. Nat. Methods 15, 1090–1097 (2018).
Qiao, C. et al. Evaluation and development of deep neural networks for image super-resolution in optical microscopy. Nat. Methods 18, 194–202 (2021).
Chen, J. et al. Three-dimensional residual channel attention networks denoise and sharpen fluorescence microscopy image volumes. Nat. Methods 18, 678–687 (2021).
Wang, Z., Xie, Y. & Ji, S. Global voxel transformer networks for augmented microscopy. Nat. Mach. Intell. 3, 161–171 (2021).
Wang, Z. et al. Real-time volumetric reconstruction of biological dynamics with light-field microscopy and deep learning. Nat. Methods 18, 551–556 (2021).
Li, X. et al. Reinforcing neuron extraction and spike inference in calcium imaging using deep self-supervised denoising. Nat. Methods 18, 1395–1400 (2021).
Qiao, C. et al. Rationalized deep neural network for sustained super-resolution live imaging of rapid subcellular processes. Nat. Biotechnol. 41, 367–377 (2023).
Belthangady, C. & Royer, L. A. Applications, promises, and pitfalls of deep learning for fluorescence image reconstruction. Nat. Methods 16, 1215–1225 (2019).
Wu, Y. & Shroff, H. Faster, sharper, and deeper: structured illumination microscopy for biological imaging. Nat. Methods 15, 1011–1019 (2018).
Wu, Y. et al. Multiview confocal super-resolution microscopy. Nature 600, 279–284 (2021).
Chen, R. et al. Single-frame deep-learning super-resolution microscopy for intracellular dynamics imaging. Nat. Commun. 14, 2854 (2023).
Xu, Y. K. T. et al. Cross-modality supervised image restoration enables nanoscale tracking of synaptic plasticity in living mice. Nat. Methods 20, 935–944 (2023).
Arigovindan, M. et al. High-resolution restoration of 3D structures from widefield images with extreme low signal-to-noise-ratio. Proc. Natl Acad. Sci. USA 110, 17344–17349 (2013).
Bommasani, R. et al. On the opportunities and risks of foundation models. Preprint at https://arxiv.org/abs/2108.07258 (2021).
Fei, N. et al. Towards artificial general intelligence via a multimodal foundation model. Nat. Commun. 13, 3094 (2022).
Zhang, Y. et al. DialoGPT: large-scale generative pre-training for conversational response generation. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations. 270–278 (2020).
Yang, Z. et al. XLNet: generalized autoregressive pretraining for language understanding. In Conference on Neural Information Processing Systems (NeurIPS) (2019).
Dai, Z. et al. CoAtNet: marrying convolution and attention for all data sizes. In Conference on Neural Information Processing Systems (NeurIPS) (2021).
Kirillov, A. et al. Segment anything. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 4015–4026 (2023).
Achiam, J. et al. GPT-4 technical report. Preprint at https://arxiv.org/abs/2303.08774 (2023).
Bao, F. et al. One transformer fits all distributions in multi-modal diffusion at scale. In International Conference on Machine Learning (ICML) (2023).
Bi, K. et al. Accurate medium-range global weather forecasting with 3D neural networks. Nature 619, 533–538 (2023).
Singhal, K. et al. Large language models encode clinical knowledge. Nature 620, 172–180 (2023).
Jiang, L. Y. et al. Health system-scale language models are all-purpose prediction engines. Nature 619, 357–362 (2023).
Huang, Z. et al. A visual-language foundation model for pathology image analysis using medical twitter. Nat. Methods 29, 2307–2316 (2023).
Zhou, Y. et al. A foundation model for generalizable disease detection from retinal images. Nature 622, 156–163 (2023).
Moor, M. et al. Foundation models for generalist medical artificial intelligence. Nature 616, 259–265 (2023).
Madani, A. et al. Large language models generate functional protein sequences across diverse families. Nat. Biotechnol. 41, 1099–1106 (2023).
Theodoris, C. V. et al. Transfer learning enables predictions in network biology. Nature 618, 616–624 (2023).
Henighan, T. et al. Scaling laws for autoregressive generative modeling. Preprint at https://arxiv.org/abs/2010.14701 (2020).
Zamir, A. et al. Taskonomy: disentangling task transfer learning. In Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI), 3712–3722 (2019).
Liu, Z. et al. Swin transformer: hierarchical vision transformer using shifted windows. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2021).
Xia, B. et al. Efficient non-local contrastive attention for image super-resolution. In Association for the Advancement of Artificial Intelligence (AAAI) (2022).
Descloux, A., Grubmayer, K. S. & Radenovic, A. Parameter-free image resolution estimation based on decorrelation analysis. Nat. Methods 16, 918–924 (2019).
Nieuwenhuizen, R. et al. Measuring image resolution in optical nanoscopy. Nat. Methods 10, 557–562 (2013).
Culley, S. et al. Quantitative mapping and minimization of super-resolution optical imaging artifacts. Nat. Methods 15, 263–266 (2018).
Li, X. et al. Three-dimensional structured illumination microscopy with enhanced axial resolution. Nat. Biotechnol. 41, 1307–1319 (2023).
Spahn, C. et al. DeepBacs for multi-task bacterial image analysis using open-source deep learning approaches. Commun. Biol. 5, 688 (2022).
Ouyang, W. et al. ShareLoc—an open platform for sharing localization microscopy data. Nat. Methods 19, 1331–1333 (2022).
Zhang, X. C. et al. Zoom to learn, learn to zoom. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2019).
Nehme, E. et al. Deep-storm: super-resolution single-molecule microscopy by deep learning. Optica 5, 458–464 (2018).
Guo, L. L. et al. EHR foundation models improve robustness in the presence of temporal distribution shift. Sci. Rep. 13, 3767 (2023).
Liang, J. et al. SwinIR: image restoration using Swin transformer. In IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), 1833–1844 (2021).
Simonyan, K. & Zisserman, A. Very deep convolutional networks for large-scale image recognition. In International Conference on Learning Representations (ICLR) (2015).
Kingma, D. & Ba, J. Adam: a method for stochastic optimization. Preprint at https://arxiv.org/abs/1412.6980 (2014).
Wang, Z. et al. Image quality assessment: from error visibility to structural similarity. IEEE Trans. Image Process. 13, 600–612 (2004).
Abbe, E. Beiträge zur theorie des mikroskops und der mikroskopischen wahrnehmung. Archiv. f. Mikrosk. Anatomie 9, 413–418 (1873).
Koho, S. et al. Fourier ring correlation simplifies image restoration in fluorescence microscopy. Nat. Commun. 10, 3103 (2019).
Baskin, C. et al. UNIQ: uniform noise injection for non-uniform quantization of neural networks. ACM Trans. Comput. Syst. 37, 1–15 (2021).
Arganda-Carreras, I. et al. Trainable Weka Segmentation: a machine learning tool for microscopy pixel classification. Bioinformatics 33, 2424–2426 (2017).
Jacob, B. et al. Quantization and training of neural networks for efficient integer-arithmetic-only inference. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2704–2713 (2018).
Ma, C., Tan, W., He, R. & Yan, B. UniFMIR: pre-training a foundation model for universal fluorescence microscopy image restoration (2023.10.03). Zenodo https://doi.org/10.5281/zenodo.8401470 (2023).
Ma, C., Tan, W., He, R., & Yan, B. UniFMIR: pre-training a foundation model for universal fluorescence microscopy image restoration (version 2023.11.13). Zenodo https://doi.org/10.5281/zenodo.10117581 (2023).
Ma, C., Tan, W., He, R. & Yan, B. UniFMIRProjectionOnFlyWing. Zenodo https://doi.org/10.5281/zenodo.10577218 (2024).
Ma, C., Tan, W., He, R. & Yan, B. UniFMIRDenoiseOnPlanaria. Zenodo https://doi.org/10.5281/zenodo.10579778 (2024).
Ma, C., Tan, W., He, R. & Yan, B. UniFMIRDenoiseOnTribolium. Zenodo https://doi.org/10.5281/zenodo.10579822 (2024).
Ma, C., Tan, W., He, R. & Yan, B. UniFMIRVolumetricReconstructionOnVCD. Zenodo https://doi.org/10.5281/zenodo.10595428 (2024).
Ma, C., Tan, W., He, R. & Yan, B. UniFMIRIsotropicReconstructionOnLiver. Zenodo https://doi.org/10.5281/zenodo.10595460 (2024).
Ma, C., Tan, W., He, R. & Yan, B. UniFMIRSuperResolutionOnMicrotubules. Zenodo https://doi.org/10.5281/zenodo.8420081 (2023).
Ma, C., Tan, W., He, R. & Yan, B. UniFMIRSuperResolutionOnFactin. Zenodo https://doi.org/10.5281/zenodo.8420100 (2023).
Acknowledgements
We gratefully acknowledge support for this work provided by the National Natural Science Foundation of China (NSFC) (grant nos. U2001209 to B.Y. and 62372117 to W.T.) and the Natural Science Foundation of Shanghai (grant no. 21ZR1406600 to W.T.).
Author information
Authors and Affiliations
Contributions
B.Y. and W.T. supervised the research. C.M. and W.T. conceived of the technique. C.M. implemented the algorithm. C.M. and W.T. designed the validation experiments. C.M. trained the network and performed the validation experiments. R.H. implemented the interactive software platform and organized the codes and models. All authors had access to the study and wrote the paper.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Methods thanks Ricardo Henriques and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editor: Rita Strack, in collaboration with the Nature Methods team.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Overall architecture of the UniFMIR.
The proposed UniFMIR approach is composed of three submodules: a multihead module, a Swin transformer-based feature enhancement module, and a multitail module. The numbers of parameters (M) and calculations (GFLOPs) required for the head, feature enhancement and tail modules for different tasks are marked below the structures of the respective modules. The input sizes and output sizes of training batches for different tasks are also marked below the images.
Extended Data Fig. 2 Network architecture of the Swin transformer-based feature enhancement module46.
The feature enhancement module consists of convolutional layers and a series of Swin transformer blocks (STB), each of which includes several Swin transformer layers (STL), a convolutional layer and a residual connection. The STL is composed of layer normalization operations, a multihead self-attention (MSA) mechanism and a multilayer perceptron (MLP). In the MSA mechanism, the input features are first divided into multiple small patches with a moving window operation, and then the self-attention in each patch is calculated to output features fout. The MLP is composed of two fully connected layers (FCs) and Gaussian-error linear unit (GELU) activation.
Extended Data Fig. 3 Generalization ability analysis of super-resolution on unseen modality of single-molecule localization microscopy data from the Shareloc platform52.
a, SR results obtained by the SOTA model (DeepSTORM54), the pretrained UniFMIR model without fine-tuning, Baseline (same network structure as UniFMIR trained from scratch), and our fine-tuned UniFMIR model. The GT dSTORM images of microtubules stained with Alexa 647 in U2OS cells incubated with nocodazole and the input synthesized LR images are also shown. The PSNR/NRMSE results of the SR outputs obtained on n = 16 synthetic inputs are shown on the right. b, SR results obtained on the real-world wide-field images. The NRMSE values are depicted on the residual images under different SR results and the raw input images. The PSNR/NRMSE results on n = 9 real-world inputs are shown on the right. Box-plot elements are defined as follows: center line (median); box limits (upper and lower quartiles); whiskers (1.5x interquartile range). The line plots show the pixel intensities along the dashed lines in the corresponding images. Scale bar: 6.5 μm.
Supplementary information
Supplementary Information
Supplementary Notes 1–5, Figs. 1–17 and Tables 1 and 2.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Ma, C., Tan, W., He, R. et al. Pretraining a foundation model for generalizable fluorescence microscopy-based image restoration. Nat Methods 21, 1558–1567 (2024). https://doi.org/10.1038/s41592-024-02244-3
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1038/s41592-024-02244-3
This article is cited by

- Multimodal large language models for bioimage analysis. Nature Methods (2024)

- Embedding AI in biology. Nature Methods (2024)

- Convolutional neural network transformer (CNNT) for fluorescence microscopy image denoising with improved generalization and fast adaptation. Scientific Reports (2024)

- One-Dimensional Rock and Soil Characteristic Parameters Prediction Method Based on SRR. Arabian Journal for Science and Engineering (2024)