
High-quality Image Dehazing with Diffusion Model

Hu Yu, Jie Huang, Kaiwen Zheng and Feng Zhao, Member, IEEE

Abstract

Image dehazing is particularly challenging in dense-haze scenarios, where little of the original information remains in the hazy image. Though previous methods have made marvelous progress, they still suffer from loss of content and color information in dense-haze scenarios. The recently emerged Denoising Diffusion Probabilistic Model (DDPM) exhibits strong generation ability, showing potential for solving this problem. However, DDPM fails to consider the physics of the dehazing task, limiting its information completion capacity. In this work, we propose DehazeDDPM: a DDPM-based and physics-aware image dehazing framework that applies to complex hazy scenarios. Specifically, DehazeDDPM works in two stages. The former stage physically models the dehazing task with the Atmospheric Scattering Model (ASM), pulling the distribution closer to that of the clear data and endowing DehazeDDPM with fog-aware ability. The latter stage exploits the strong generation ability of DDPM to compensate for the haze-induced huge information loss, working in conjunction with the physical modelling. Extensive experiments demonstrate that our method attains state-of-the-art performance on both synthetic and real-world hazy datasets. Our code is available at https://github.com/yuhuUSTC/DehazeDDPM

Index Terms: Image dehazing, Diffusion model, Atmospheric Scattering Model

I. INTRODUCTION

HAZE is a common atmospheric phenomenon. Images captured in hazy environments usually suffer from information loss in both content and color. The goal of image dehazing is to restore a clean scene from a hazy image. This task has been a longstanding and challenging problem with various applications, such as surveillance systems and autonomous driving, drawing the attention of researchers. As widely recognized, the haze formation process can be represented by the well-known ASM [1], [2], which is formulated as
$$I(x) = J(x)\,t(x) + A\big(1 - t(x)\big), \tag{1}$$
where $I(x)$ and $J(x)$ denote the hazy image and the clean image respectively, $A$ is the global atmospheric light, and $t(x)$ is the transmission map (we use trmap to denote the transmission map hereafter to avoid confusion). According to the ASM, dense haze corresponds to small transmission map values and less original information.
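To make the degradation model concrete, below is a minimal PyTorch sketch of Equation (1); the function name `apply_asm` and the tensor shapes are our own illustrative assumptions, not part of the released code.

```python
import torch

def apply_asm(J: torch.Tensor, t: torch.Tensor, A: torch.Tensor) -> torch.Tensor:
    """Compose a hazy image via the ASM: I(x) = J(x) t(x) + A (1 - t(x)).

    J: clear image, shape (B, 3, H, W), values in [0, 1]
    t: transmission map, shape (B, 1, H, W), values in (0, 1]
    A: global atmospheric light, shape (B, 3, 1, 1)
    """
    # Small t(x) (dense haze) attenuates the scene radiance J(x) t(x)
    # and lets the airlight term A (1 - t(x)) dominate.
    return J * t + A * (1.0 - t)
```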
Along this line, conventional approaches rely on the ASM [1] and adopt priors as external information to estimate its parameters [4]-[7]. However, these hand-crafted image priors are drawn from specific observations with limited robustness, and may not reliably model the intrinsic characteristics of hazy images.
Inspired by the success of deep learning, numerous approaches [3], [8]-[13] have been developed to directly learn

H. Yu, J. Huang, K. Zheng and F. Zhao are with the University of Science and Technology of China, Hefei 230027, China (e-mail: fzhao956@ustc.edu.cn).

Fig. 1. Visual examples of dehazing results sampled from real-world hazy images. The second to fourth columns show the results of Dehamer [3], our first stage, and our DehazeDDPM, respectively. Our method demonstrates unprecedented perceptual quality on the challenging real-world datasets.

the hazy-to-clear mapping. Though these methods have made marvelous progress, they still suffer from loss of content and color information in dense-haze scenarios, as shown in Fig. 1. The reason is that quite limited original information remains in the hazy image in this challenging case, restricting the information completion ability of such a mapping. A detailed statistical illustration of the haze-induced information loss is shown in Fig. 2. For example, the t-SNE clustering [14] as well as the Wasserstein distance shows that the distributions of clear and hazy images deviate from each other. Entropy is a measure of the amount of information contained in an image; the average entropy of hazy images is much smaller than that of clear images, which reflects the haze-induced information loss.
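As an illustration of the entropy statistic referenced above, the following sketch computes the histogram entropy of an image; this is a generic implementation we provide for clarity, not code from the paper.

```python
import torch

def image_entropy(img: torch.Tensor, bins: int = 256) -> float:
    """Shannon entropy (in bits) of an image's intensity histogram.

    img: tensor with values in [0, 1]. Dense-haze images yield markedly
    lower entropy than their clear counterparts, quantifying the
    haze-induced information loss.
    """
    hist = torch.histc(img.flatten(), bins=bins, min=0.0, max=1.0)
    p = hist / hist.sum()
    p = p[p > 0]  # drop empty bins to avoid log(0)
    return float(-(p * torch.log2(p)).sum())
```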
Recently, DDPM [15], [16] has drawn intensive attention due to its strong generation ability. DDPM can produce high-quality images both unconditionally [17]-[19] and conditionally [20]-[22], which poses a new perspective for the image dehazing task. However, DDPM fails to consider the physics of the dehazing task, limiting its information completion capacity. For example, the huge distribution difference between dense-haze and clear images leaves DDPM struggling with weak and deviating distribution guidance. Besides, DDPM is not aware of the restoration difficulty of different image regions, which is important for modeling the complex distributions in real-world hazy scenarios.
In this work, we propose DehazeDDPM: a DDPM-based and physics-aware image dehazing framework that is applicable to complex hazy scenarios. A sketch of the main ideas is shown in Fig. 3. Our DehazeDDPM views image dehazing as a conditional generative modeling task. Instead of learning a mapping, DehazeDDPM memorizes the data distribution of clear images by introducing conditional DDPM into image dehazing, where the conditional DDPM approximates the data distribution under appropriate conditions. Therefore, in the challenging dense-haze case, our method largely surpasses

Fig. 2. Statistical illustration of the haze-induced information loss, including t-SNE clustering [14], distribution distance, histogram, gradient, entropy, and standard deviation. Dense haze causes massive information loss in content and color.

previous mapping-based methods. Besides, the frequency prior of the generation process is leveraged to optimize and constrain the frequency information of hard regions.
Specifically, DehazeDDPM works in two stages. The former stage estimates the transmission map trmap, the haze-free image $J$, and the atmospheric light $A$ governed by the underlying Atmospheric Scattering Model (ASM) physics. The estimated haze-free image $J$ has a distribution closer to the corresponding clear data than the original hazy image. The transmission map trmap is exploited as the confidence guidance for the second stage, which endows DehazeDDPM with fog-aware ability. The latter stage exploits the strong generation ability of DDPM to compensate for the haze-induced huge information loss, working in conjunction with the physical modelling. The latter stage can recover the details that the first stage failed to retrieve, as well as correct artifacts introduced by that stage. Besides, although the diffusion model can generate high-quality images, it spends most of the reverse denoising process generating high-frequency content. Thus, we impose frequency prior constraints on the training of the diffusion model, as sketched below. Our method demonstrates unprecedented perceptual quality in the image dehazing task, as shown in Fig. 1. Extensive experiments demonstrate that our method attains SOTA performance on several image dehazing benchmarks.
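One plausible form of such a frequency prior constraint is an amplitude-spectrum loss in the Fourier domain; the sketch below is our hedged illustration and may differ from the exact formulation used in DehazeDDPM.

```python
import torch

def frequency_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Penalize the gap between the amplitude spectra of the prediction
    and the ground truth, constraining high-frequency content."""
    pred_amp = torch.fft.fft2(pred, norm="ortho").abs()
    target_amp = torch.fft.fft2(target, norm="ortho").abs()
    return (pred_amp - target_amp).abs().mean()
```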
Overall, our contributions can be summarized as follows:
  • We are the first to introduce conditional DDPM to tackle the challenging dense-haze image dehazing task by working in conjunction with physical modelling.
  • Specifically, physical modelling pulls the distribution of hazy data closer to that of clear data and endows DehazeDDPM with fog-aware ability.
  • Extensive experiments demonstrate that our method outperforms SOTA approaches on several image dehazing benchmarks with much better FID and LPIPS scores on complex real-world datasets.
II. RELATED WORK

Image Dehazing. In recent years, we have witnessed significant advances in single image dehazing. Existing methods can be roughly categorized into two classes: physics-based methods and deep learning-based methods.
Physics-based methods depend on the atmospheric scattering model [1] and handcrafted priors, such as the dark channel

Fig. 3. Thumbnail of the main idea. Most previous image dehazing methods learn the mapping from hazy to clear images. Our method memorizes the data distribution of clear images by introducing conditional DDPM into image dehazing.

prior [5], color line prior [6], color attenuation prior [23], sparse gradient prior [24], maximum reflectance prior [25], and non-local prior [4]. For example, DCP [5] discovers the dark channel prior for modeling the properties of haze-free images, which assumes that the locally lowest intensity among the RGB channels should be close to zero in haze-free natural images. However, handcrafted priors mainly come from empirical observations, which cannot accurately characterize the haze formation process.
Different from the physics-based methods, deep learning-based methods employ convolutional neural networks to learn an image prior [8], [26]-[30] or directly learn the hazy-to-clear translation [3], [9], [11]-[13], [29], [31]-[40]. For example, AOD-Net [8] produces the recovered images by reformulating the Atmospheric Scattering Model. DeHamer [3] introduces the transformer into image dehazing to combine the global modeling capability of the Transformer and the local representation capability of CNNs. FSDGN [13] reveals the relationship between haze degradation and frequency characteristics, and jointly explores the information in the frequency and spatial domains for image dehazing. RIDCP [40] presents a paradigm for real image dehazing from the perspectives of synthesizing more realistic hazy data and introducing more robust priors into the network. Zheng et al. [39] propose a curricular contrastive regularization, which leverages the dehazed results of other existing dehazing methods to provide better lower-bound constraints.
The above techniques have shown outstanding performance on image dehazing. They resort to learning the mapping from hazy images to their haze-free counterparts. For example, CNN- and transformer-based methods learn the relationships between different pixels or regions by local or global modeling, respectively. However, if the image is captured in a complex real foggy scene, the transformation mapping is hard to learn, and the network cannot recover the original clear image from the very limited information in hazy images. Besides, these approaches share the limitation that they produce a deterministic output,

which is at odds with the ill-posed nature of image dehazing. What's more, training objectives that minimize pixel-level distortion are known to be poorly correlated with human perception and often lead to blurry and unrealistic reconstructions [21], especially in complex real-world hazy scenarios.
Instead of learning a mapping, our method memorizes the information of clear images by introducing a conditional diffusion model into image dehazing, where the conditional DDPM approximates the data distribution under appropriate conditions. Therefore, in the challenging dense-haze case, our method largely surpasses previous mapping-based methods.

Deep generative models. Deep generative models have seen success in learning complex empirical distributions of images and exhibit convincing image generation results. Generative adversarial networks (GANs), autoregressive models, normalizing flows, and variational autoencoders (VAEs) have synthesized striking image samples [41]-[44] and have been applied to conditional tasks such as image dehazing [33], [45]-[49]. However, these approaches often suffer from various limitations. For example, GANs capture less diversity than state-of-the-art likelihood-based models [18], [50] and require carefully designed regularization and optimization tricks to avoid optimization instability and mode collapse [51], [52].
In contrast, diffusion models, as a class of likelihood-based generative models, possess desirable properties such as distribution coverage, a stationary training objective, and easy scalability [16]-[19]. Along this line, conditional DDPMs [20]-[22], [53]-[55] have been developed for image enhancement in low-level vision, such as image super-resolution [20], image inpainting [54], and image deblurring [21]. Although DDPM-based methods have been developed for some low-level vision tasks, there is no precedent for their usage in image dehazing. Besides, DDPM also fails to consider the physics of the dehazing task, limiting its information completion capacity for hazy images. Thus, in this paper, we first introduce conditional DDPM to tackle the challenging dense-haze image dehazing task by working in conjunction with physical modelling.

III. PRELIMINARIES: DDPM

DDPM is a latent variable model specified by a $T$-step Markov chain, which approximates a data distribution $q(x)$ with a model $p_\theta(x)$. It contains two processes: the forward diffusion process and the reverse denoise process.
The forward diffusion process. The forward diffusion process starts from a clean data sample $x_0$ and repeatedly injects Gaussian noise according to the transition kernel $q(x_t \mid x_{t-1})$ as follows:
$$q(x_t \mid x_{t-1}) = \mathcal{N}\big(x_t;\ \sqrt{\alpha_t}\, x_{t-1},\ (1-\alpha_t) I\big), \tag{2}$$
where $\alpha_t$ can be learned by reparameterization [42] or held constant as hyper-parameters, controlling the variance of the noise added at each step. From the Gaussian diffusion process, we can derive closed-form expressions for the marginal distribution $q(x_t \mid x_0)$ and the reverse diffusion step $q(x_{t-1} \mid x_t, x_0)$ as follows:
$$q(x_t \mid x_0) = \mathcal{N}\big(x_t;\ \sqrt{\bar{\alpha}_t}\, x_0,\ (1-\bar{\alpha}_t) I\big), \tag{3}$$
$$q(x_{t-1} \mid x_t, x_0) = \mathcal{N}\big(x_{t-1};\ \tilde{\mu}_t(x_t, x_0),\ \tilde{\beta}_t I\big), \tag{4}$$
where $\tilde{\mu}_t(x_t, x_0) := \frac{\sqrt{\bar{\alpha}_{t-1}}\,(1-\alpha_t)}{1-\bar{\alpha}_t}\, x_0 + \frac{\sqrt{\alpha_t}\,(1-\bar{\alpha}_{t-1})}{1-\bar{\alpha}_t}\, x_t$, $\tilde{\beta}_t := \frac{1-\bar{\alpha}_{t-1}}{1-\bar{\alpha}_t}\,(1-\alpha_t)$, and $\bar{\alpha}_t := \prod_{s=1}^{t} \alpha_s$.
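Equation (3) allows $x_t$ to be sampled in closed form without simulating the chain step by step. A minimal sketch, assuming a precomputed `alpha_bar` buffer of cumulative products:

```python
import torch

def q_sample(x0: torch.Tensor, t: torch.Tensor, alpha_bar: torch.Tensor):
    """Draw x_t ~ q(x_t | x_0) in one shot (Equation (3)):
    x_t = sqrt(alpha_bar_t) x_0 + sqrt(1 - alpha_bar_t) eps.

    t: integer timesteps, shape (B,); alpha_bar: shape (T,).
    """
    eps = torch.randn_like(x0)
    ab = alpha_bar[t].view(-1, 1, 1, 1)  # broadcast over (B, C, H, W)
    x_t = ab.sqrt() * x0 + (1.0 - ab).sqrt() * eps
    return x_t, eps
```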
Note that the forward diffusion formulation defined above has no learnable parameters, and the reverse diffusion step cannot be applied directly because $x_0$ is not accessible in the inference stage. Therefore, we further introduce the learnable reverse denoise process for estimating $x_0$ from $x_T$.
The reverse denoise process. The DDPM is trained to reverse the process in Equation (2) by learning the denoise network $f_\theta$ in the reverse process. Specifically, the denoise network estimates $f_\theta(x_t, t)$ to replace $x_0$ in Equation (4). Note that $f_\theta(x_t, t)$ can predict either the Gaussian noise $\varepsilon$ or $x_0$; they deterministically correspond to each other via Equation (3).
$$\begin{aligned} p_\theta(x_{t-1} \mid x_t) &= q\big(x_{t-1} \mid x_t, f_\theta(x_t, t)\big) \\ &= \mathcal{N}\big(x_{t-1};\ \mu_\theta(x_t, t),\ \Sigma_\theta(x_t, t)\big), \\ \mu_\theta(x_t, t) &= \tilde{\mu}_t(x_t, x_0), \quad \Sigma_\theta(x_t, t) = \tilde{\beta}_t I. \end{aligned} \tag{5}$$
Similarly, the mean and variance in the reverse Gaussian distribution of Equation (5) can be determined by replacing $x_0$ in $\tilde{\mu}_t(x_t, x_0)$ and $\tilde{\beta}_t$ with the learned $\hat{x}_0$.
Training objective and sampling process. As mentioned above, $f_\theta(x_t, t)$ is trained to approach the Gaussian noise $\varepsilon$. Thus the final training objective is:
$$L = \mathbb{E}_{t, x_0, \varepsilon}\big\| \varepsilon - f_\theta(x_t, t) \big\|_1. \tag{6}$$
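A sketch of one training step for this objective, reusing the `q_sample` helper above; the `model(x_t, t)` signature is an assumption for illustration:

```python
import torch

def ddpm_loss(model, x0: torch.Tensor, alpha_bar: torch.Tensor, T: int) -> torch.Tensor:
    """L1 noise-prediction objective of Equation (6):
    L = E_{t, x0, eps} || eps - f_theta(x_t, t) ||_1."""
    t = torch.randint(0, T, (x0.shape[0],), device=x0.device)
    x_t, eps = q_sample(x0, t, alpha_bar)
    return (eps - model(x_t, t)).abs().mean()
```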
The sampling process in the inference stage is done by running the reverse process. Starting from pure Gaussian noise $x_T$, we iteratively apply the reverse denoise transition $p_\theta(x_{t-1} \mid x_t)$ for $T$ steps, and finally obtain the clear output $x_0$.
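The reverse process can be sketched as the ancestral sampling loop below, which predicts the noise, inverts Equation (3) to estimate $\hat{x}_0$, and plugs it into the posterior mean of Equation (4); buffer names and the model signature are our assumptions.

```python
import torch

@torch.no_grad()
def ddpm_sample(model, shape, alphas, alpha_bar, beta_tilde):
    """Run p_theta(x_{t-1} | x_t) for t = T, ..., 1 from pure noise x_T."""
    x = torch.randn(shape)
    T = len(alphas)
    for t in reversed(range(T)):
        a, ab = alphas[t], alpha_bar[t]
        ab_prev = alpha_bar[t - 1] if t > 0 else torch.tensor(1.0)
        eps = model(x, torch.full((shape[0],), t))
        x0_hat = (x - (1.0 - ab).sqrt() * eps) / ab.sqrt()      # invert Eq. (3)
        mean = (ab_prev.sqrt() * (1.0 - a) * x0_hat
                + a.sqrt() * (1.0 - ab_prev) * x) / (1.0 - ab)  # mu_tilde of Eq. (4)
        if t > 0:
            x = mean + beta_tilde[t].sqrt() * torch.randn_like(x)
        else:
            x = mean
    return x
```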

IV. METHODOLOGY

The overall structure of our method is presented in Fig. 4. We combine the ASM and DDPM for image dehazing. DehazeDDPM works in two stages. For a hazy image $I$, the former stage first outputs the transmission map trmap, the pseudo haze-free image $J$, and the atmospheric light $A$, following the formulation of the ASM. Then, in the latter stage, the learned trmap and $J$ are integrated into the DDPM to pull the distribution of hazy data closer to that of clear data and endow DehazeDDPM with fog-aware ability.

A. Physical modelling.

The outline of our physics-based network is shown in the left part of Fig. 4. It decomposes the input image $I$ into the trmap, $J$, and $A$, governed by the underlying physics. To tease out these components more reliably, we reconstruct the hazy image $I^r$ from the estimated $J$, trmap, and $A$ via the ASM formulation:
$$I^r(x) = J(x)\,\mathrm{trmap}(x) + A\big(1 - \mathrm{trmap}(x)\big). \tag{7}$$
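The re-composed $I^r$ lets the first stage be supervised by a reconstruction consistency term against the observed hazy input; the L1 form below is a sketch of one reasonable choice, not necessarily the paper's exact loss.

```python
import torch

def asm_reconstruction_loss(I: torch.Tensor, J: torch.Tensor,
                            trmap: torch.Tensor, A: torch.Tensor) -> torch.Tensor:
    """Re-compose the hazy image via Equation (7) and compare it with
    the observed input I to constrain the estimated J, trmap, and A."""
    I_r = J * trmap + A * (1.0 - trmap)
    return (I_r - I).abs().mean()
```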