Even though hyperplastic polyps and tubular adenomas, which are benign, are the most common polyps found in patients, they can turn malignant if they remain undetected and untreated over a long period [2,3].
To date, the standard method for detecting and diagnosing colorectal polyps has been colonoscopy. Colonoscopy is a multistep process in which patients are first given a specific solution to consume as part of the bowel preparation phase. The bowel is prepared and cleaned to ensure that there are no obstructions, such as feces or other matter, that may affect the procedure. Next, the examination phase takes place, in which a colonoscope equipped with a light and a camera is inserted into the patient's rectum [7].
From a machine learning perspective, colorectal polyp detection can be defined as the process of training a machine learning model to learn a set of features representing a polyp from a given image or live video stream. In some cases, those models are further fine-tuned, and a classification component is added to distinguish different classes of polyps.
Although there are multiple ways to build a colorectal polyp detection model, common preliminaries are usually followed to complete an end-to-end detection system. At first, a dataset consisting of either static photos or video frames is required to train the model. A typical dataset would be a combination of polyp images and either segmentation masks, bounding box coordinates, or both. Next, the samples are pre-processed and augmented using various techniques such as flips, random crops, and rotations. Augmentation is often needed because the current public datasets are small, and the samples do not cover all the possible scenarios, such as variations in illumination, field of view, and polyp size.
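To make the augmentation step concrete, the following is a minimal sketch of such a pipeline using torchvision; the choice of library, the operations, and the parameter values are illustrative assumptions rather than settings taken from any surveyed method. For segmentation tasks, the same geometric transforms would also have to be applied to the corresponding masks.

```python
import torchvision.transforms as T

# An illustrative augmentation pipeline for colonoscopy frames; parameter
# values are assumptions and would be tuned to the dataset at hand.
train_transforms = T.Compose([
    T.Resize((256, 256)),                         # normalize sample resolution
    T.RandomHorizontalFlip(p=0.5),                # flips
    T.RandomVerticalFlip(p=0.5),
    T.RandomRotation(degrees=90),                 # rotations
    T.RandomResizedCrop(224, scale=(0.8, 1.0)),   # random crops
    T.ColorJitter(brightness=0.3, contrast=0.3),  # simulate illumination changes
    T.ToTensor(),
])
```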
Colorectal polyp detection models are built with one of three objectives: (1) polyp segmentation, (2) polyp detection, and (3) polyp classification. Some models are trained to carry out multiple objectives simultaneously, such as detection or segmentation followed by classification. When a model is trained to detect polyps, training images with their corresponding labels are fed to a network which learns how to localize a polyp in a given image. The label format used for detection is usually a combination of four coordinates representing the bounding box, a variable representing whether a polyp is present, and sometimes the class of the polyp, provided that such information exists in the dataset. In contrast, segmentation models are trained to draw an image segmentation mask around a detected polyp. Segmentation models are trained on the colonoscopy images, and the corresponding image masks are used as the input labels. Finally, classification models are trained to classify polyps based on the group a polyp belongs to without identifying its location. In Figure 1, we illustrate the three different approaches.
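The label formats described above can be made concrete with the following hypothetical examples; the field names and values are ours for illustration and are not taken from any particular dataset.

```python
# (1) Detection: four bounding-box coordinates, a presence flag and, when the
#     dataset provides it, the polyp class.
detection_label = {
    "bbox": [112, 85, 310, 278],   # x_min, y_min, x_max, y_max in pixels
    "polyp_present": True,
    "polyp_class": "adenoma",      # optional, dataset-dependent
}

# (2) Segmentation: a binary mask with the same spatial size as the frame,
#     where 1 marks polyp pixels (here a 288 x 384 frame is assumed).
# segmentation_label = np.zeros((288, 384), dtype=np.uint8)

# (3) Classification: a single image-level class with no localization.
classification_label = "hyperplastic"
```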
Regardless of the desired approach, the pipeline for building an automated polyp detection system using machine learning is essentially identical, with sample normalization, augmentation, model fine-tuning, and evaluation as the standard steps. This process is summarized in Figure 2.
Figure 1. A comparison of the three common approaches to building colorectal polyp detectors. Colonoscopy images are taken from a public dataset known as Kvasir-SEG.
Figure 2. The standard steps required to build automatic polyp detection models. Samples in this illustration are taken from Kvasir-SEG.
This article surveys the most recent trends and techniques for building automatic polyp detection models. The article covers various aspects, such as the existing public datasets available for training and testing, commonly reported challenges in the literature and the preferred deep learning architectures to build polyp detection models. Furthermore, we discuss and analyze the findings of the surveyed literature, and based on the analysis, we provide recommendations for future research.
The remainder of this paper is structured as follows: Section 2 presents our survey strategy and the criteria that make an article suitable for inclusion in our survey, Section 3 summarizes our contribution, Section 4 presents the challenges, Section 5 presents the datasets, Section 6 describes the standard architectures, Section 7 presents the performance metrics, Section 8 is a review of recently proposed work, Section 9 presents the discussion and analysis, and finally Section 10 concludes the paper.
2. Survey Strategy
This survey focuses on the most recent methods and trends for detecting polyps using machine learning techniques. A total of 20 recently published articles were reviewed in this paper, and they were selected based on several criteria. First, we ensure that a reviewed article is relevant to the task of polyp detection by assessing its introduction, problem statement, proposed method and reported results. Second, we look into the proposed method's novelty and uniqueness and how it attempts to bridge some of the existing gaps. The third criterion we consider is the publication date, as we mainly focus on presenting the most current methods. Fourth, we look into the impact of the reviewed article based on its number of citations and mentions. In addition to the reviewed articles, our presentation of the challenges and common architectures is also based on an examination of the existing literature. During the literature analysis, we were interested in discovering the limitations that authors have reported in their work, which are presented as existing challenges in this survey. Moreover, the polyp detection architectures reviewed in this survey were selected based on how frequently a specific design is mentioned in the reviewed work.
The articles were collected from journals indexed in various databases, namely Google Scholar, ScienceDirect, the National Institutes of Health (NIH) database, Nature, and SpringerLink.
Some of the primary keywords used to search for relevant articles were "colorectal polyp detection", "colorectal polyp segmentation", and "colorectal polyp classification". As for the datasets reviewed in this survey, we analyzed the reviewed articles and compiled a list of standard datasets based on how many times each dataset is mentioned.
3. Contributions
Our contribution to the body of literature can be summarized in the following points:
We present a review of several recently proposed colorectal polyp detection methods.
We review and analyze the various architectures available to build colorectal polyp detectors.
Based on the current work, we present several common challenges researchers face when building colorectal polyp detection algorithms.
We present an analysis of the existing benchmark colorectal polyp datasets.
We analyze the findings in the reviewed literature and investigate the current gaps and trends.
4. Common Challenges
Building a colorectal polyp detection model that works on media fed from colonoscopes comes with several obstacles and challenges. Those challenges can be classified into intrinsic issues, i.e., those limited to the detection model, and extrinsic issues, i.e., those caused by external factors. Some examples of intrinsic issues include the demand for high computational power to detect as many polyps as possible and overfitting caused by data disparity. In contrast, extrinsic issues include poor bowel preparation and reflections caused by the light emitted from the colonoscope, which can confuse the model. In this section, we discuss some of the common challenges based on the current literature.
4.1. Data Disparity
The lack of sufficiently diverse samples is a significant issue in most deep-learning applications. As polyps come in different shapes and sizes, the lack of diverse samples covering most of the possible variations is a significant problem, as reported in [8,9]. This issue can be considered external, as it is linked to the quality of the training samples. Although it can be addressed through various augmentation techniques or the introduction of artificially generated samples, misidentification of flat, depressed polyps still occurs due to the lack of samples.
4.2. Poor Bowel Preparation
Proper bowel preparation is an essential preliminary step that, if done incorrectly, can increase misidentifications, as the model might identify non-polyp objects such as feces or normal blood vessels as polyps. Studies like [10,11] have reported that their methods misidentified polyp-like objects as polyps, accounting for most of the detected false positives. Poor bowel preparation is an external factor that depends on the physician's experience and the preparation process rather than on fine-tuning the model or introducing new samples.
4.3. High Demand for Computational Resources
When working with live colonoscopy video streams, the efficiency of a model is a priority. However, optimizing the detection speed of a model usually means a drop in performance and an increase in inaccurate predictions. In [8,12], although promising performances were reported, the authors admitted that their proposed methods demand sophisticated, high-end computational resources. The design of a model and the way it works is an internal problem that can be solved by bridging the gap between performance and speed.
4.4. Colonoscopy Light Reflection
As colonoscopes are equipped with a light source to ease navigation during the procedure, the reflections it causes can affect the model's performance during the detection phase. The reflection of white light can either hide polyps or create a polyp-like anomaly in the colon. The researchers in [9,10] reported poor performance due to white light reflection. This issue is considered external, and it can be solved through methods like the one introduced in [13].
4.5. Colonoscopy Viewpoints
The viewpoint of a colonoscopy refers to the field of vision of the camera attached to the colonoscope. In some cases, polyps may exist on the edges of a video frame, making them hard to detect even with the naked eye. In studies like [13,14], the researchers reported that one of the main challenges they faced was misidentifying actual polyps as non-polyp regions because the lesions were near the boundary of the video frame. Since this issue is related to the field of vision of the equipment itself, it can be classified as an external factor.
4.6. Pre-Training on Samples from an Irrelevant Domain
Transfer learning is considered the go-to solution in most deep-learning applications. The preference for using pre-trained models derives from the fact that such models are initially trained to handle much more complicated tasks and are then fine-tuned to solve a much simpler problem. In most of the existing polyp detection methods, transfer learning has been the preferred way of solving the problem; however, the original domain in which a model is pre-trained is largely unrelated to colorectal polyp detection. This discrepancy between colonoscopy images and the pre-training domain has resulted in poor performance, as reported in [15,16].
5. Benchmark Datasets
Any computer vision problem requires a high-quality, diverse training dataset. The types of data available for colorectal polyp detectors include both live videos and static photos extracted from live video streams. This section analyzes the standard datasets used to build polyp detection models. Table 2 presents a summary of all the datasets, and Figure 3 illustrates the total number of samples in each dataset.
Table 2. A summary of the existing benchmark colonoscopy datasets.
Figure 3. Total number of samples in every dataset.
5.1. Kvasir-SEG
The Kvasir-SEG dataset was introduced in 2020 by [17]. It consists of 1000 polyp images with their corresponding masks. The dataset samples were taken in unfiltered, real-life settings where the polyps were captured from different angles, and the samples are of different resolutions, ranging between $332 \times 487$ and $1920 \times 1072$ pixels.
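As an illustration of how such image/mask pairs are typically consumed during training, below is a minimal PyTorch dataset sketch. It assumes the archive was unpacked into matching `images/` and `masks/` folders, and it leaves the joint image/mask transform to the caller.

```python
import os
from PIL import Image
from torch.utils.data import Dataset

class PolypSegDataset(Dataset):
    """Minimal image/mask pair loader for a Kvasir-SEG-style layout."""

    def __init__(self, root, transform=None):
        self.image_dir = os.path.join(root, "images")
        self.mask_dir = os.path.join(root, "masks")
        self.names = sorted(os.listdir(self.image_dir))
        self.transform = transform

    def __len__(self):
        return len(self.names)

    def __getitem__(self, idx):
        name = self.names[idx]
        image = Image.open(os.path.join(self.image_dir, name)).convert("RGB")
        mask = Image.open(os.path.join(self.mask_dir, name)).convert("L")
        if self.transform:
            image, mask = self.transform(image, mask)  # joint transform
        return image, mask
```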
5.2. ETIS-Larib
ETIS-Larib was introduced in 2014 by [18], comprising 196 samples. Unlike Kvasir-SEG, the resolution of the samples in the ETIS-Larib dataset is fixed at $1225 \times 996$ pixels. The samples are all captured in unfiltered settings, and several images are blurry. The dataset is mainly used for testing due to the limited number of samples; however, some studies, such as [19,20], have used the ETIS-Larib dataset for training.
5.3. CVC-ClinicDB
CVC-ClinicDB was introduced by [21] in 2015, and it is one of the most frequently used datasets. The dataset consists of 612 polyp static frames extracted from 31 colonoscopy sequences. The resolution of all the images is fixed at $384 \times 288$ pixels. The dataset has been widely used to test several segmentation methods, including the well-known U-Net [22].
5.4. CVC-ColonDB
CVC-ColonDB is another frequently used dataset, introduced by [23] in 2012. There are 300 static samples in this dataset, and each image has a size of $574 \times 500$ pixels. The dataset was constructed from 15 short colonoscopy videos. The samples in this dataset are unique in terms of the polyps' sizes, variations and types.
5.5. CVC-PolypHD
Introduced by [24] in 2017, CVC-PolypHD contains 56 high-resolution static samples, each sized at $1920 \times 1080$ pixels. Each sample is accompanied by a ground-truth annotation mask that locates the position of the polyp. This dataset is not used for training as frequently as CVC-ColonDB, CVC-ClinicDB and Kvasir-SEG.
5.6. EndoTect
EndoTect was introduced as part of a challenge during the 25th International Conference on Pattern Recognition [25]. The dataset is relatively large, containing 110,079 static images and 373 videos of various resolutions. Out of the 110,079 images, 10,662 are labeled with the polyp classes, 99,417 images are not labeled, and 1000 samples are provided with their respective masks. The videos provided in this dataset amount to about 11.26 h and 1.1 million video frames.
5.7. BKAI-IGH NeoPolyp-Small
This dataset was proposed by [26] in 2022, and it is medium-sized, with about 1200 static images. The dataset is already partitioned, with 1000 samples for training and the remaining 200 samples for testing. The samples in this dataset are divided by class into neoplastic and non-neoplastic, making it a suitable dataset for classification problems.
5.8. NeoPolyp
NeoPolyp was introduced by [27] in 2020, and it is an extension of BKAI-IGH NeoPolyp-Small. The dataset consists of 7500 labeled polyp images with fine-grained annotations. The images are of various resolutions and cover a range of polyp sizes and shapes.
5.9. PolypGen
PolypGen [28-30] is a dataset used for segmentation and classification. It was introduced in 2021 as part of the 3rd International Workshop and Challenge on Endoscopic Computer Vision. The dataset consists of 8037 samples, with 3762 positive samples (with polyps) and 4275 negative samples (without polyps). The samples are diverse in terms of resolution and quality.
5.10. EndoScene
EndoScene was introduced by [24] in 2017, and it is the result of merging two datasets, CVC-ColonDB and CVC-ClinicDB. The dataset consists of 912 samples, 300 from CVC-ColonDB and 612 from CVC-ClinicDB. The samples share the same characteristics as CVC-ColonDB and CVC-ClinicDB.
5.11. S.U.N. Colonoscopy Video Database
The Showa University and Nagoya University database (SUN) was introduced by [31] in 2020. The dataset consists of 49,136 samples of 100 different polyps. In addition, the dataset contains 109,554 frames of non-polyp areas from live-stream videos. The 100 polyps consist of the following: 82 low-grade adenomas, seven hyperplastic polyps, four sessile serrated lesions, four high-grade adenomas, two traditional serrated adenomas and one invasive carcinoma. The polyps are of different shapes and sizes, and the resolutions of the samples are diverse.
5.12. EndoTest
EndoTest was introduced by [32] in 2022, comprising two subsets. The first subset consists of 48 short videos with 22,856 manually annotated frames with polyp presence. Of the 22,856 samples, approximately 12,160 are positive, while the remaining 10,696 are negative. The second subset contains ten full-length colonoscopy videos with 230,898 annotated frames, of which about 15% are positive samples, while the remaining are negative samples. The dataset is diverse in resolution and image quality.
5.13. ASU-Mayo Clinic
ASU-Mayo Clinic is a video dataset introduced in 2017, and it is commonly used to build and test real-time polyp detection systems [33]. The dataset is divided into 38 video sequences, further segregated into 20 videos for training and 18 for testing. The training videos come with ground-truth masks. The resolution of the videos is not stated in the authors' work.
5.14. Colonoscopic Dataset
The colonoscopic dataset is another video dataset, introduced in 2016 [34]. The dataset comprises 76 short colonoscopy videos, each sized $768 \times 576$ pixels, and it is mainly used in polyp classification problems. The polyps in this dataset are classified into three classes: (1) hyperplastic, (2) adenoma and (3) serrated.
5.15. CVC-ClinicVideoDB
CVC-ClinicVideoDB was first introduced in 2017 by [35], and it is a video dataset that consists of 18 video sequences with a frame size of $768 \times 576$ pixels. Each video sample is accompanied by a polyp location mask which identifies the location of a polyp in a video. The videos cover a variety of properties, including different sizes, shapes, and variations of polyps.
6. Common Architectures
Building an automated polyp detection model depends primarily on the main objective of the model. In some instances, multiple architectures are combined to perform a particular task. For example, a model could be trained to detect and classify polyps; therefore, combining segmentation and classification into one model could be the ideal solution. This section presents the common architectures and designs available for carrying out polyp segmentation, classification and detection.
6.1. Segmentation Methods
Segmentation algorithms are deep-learning models commonly used to match the pixels of an image with a given class [36]. In colorectal polyp detection, segmentation algorithms are commonly used to locate polyps using a pre-defined mask during the training process.
6.1.1. U-Net
U-Net was introduced in 2015, and since then, it has been one of the preferred segmentation networks in the medical imaging domain [22]. The network has an encoder, a decoder and skip connections, and it consists of 23 convolutional layers, with each layer activated using the rectified linear unit (ReLU) function. U-Net has been used in relevant colorectal polyp detection studies [37,38].
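The following is a heavily reduced PyTorch sketch of the U-Net idea, shrunk to two levels instead of the full 23-layer network: a downsampling path, an upsampling path, and a skip connection that concatenates encoder features into the decoder.

```python
import torch
import torch.nn as nn

def double_conv(in_ch, out_ch):
    # Two 3x3 convolutions, each followed by ReLU, as in the original design.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
    )

class TinyUNet(nn.Module):
    """A two-level U-Net-style sketch, not the full published architecture."""

    def __init__(self, n_classes=1):
        super().__init__()
        self.enc1 = double_conv(3, 64)
        self.enc2 = double_conv(64, 128)
        self.pool = nn.MaxPool2d(2)
        self.up = nn.ConvTranspose2d(128, 64, 2, stride=2)
        self.dec1 = double_conv(128, 64)          # 128 = 64 (skip) + 64 (upsampled)
        self.head = nn.Conv2d(64, n_classes, 1)   # per-pixel polyp logits

    def forward(self, x):
        e1 = self.enc1(x)                  # encoder features (skip source)
        e2 = self.enc2(self.pool(e1))      # bottleneck
        d1 = self.up(e2)                   # upsample back to input resolution
        d1 = self.dec1(torch.cat([d1, e1], dim=1))  # skip connection
        return self.head(d1)               # segmentation mask logits
```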
6.1.2. SegNet
SegNet is a semantic segmentation network that follows the encoder-decoder design, introduced in 2015 [39]. Unlike U-Net, SegNet does not have a series of skip connections; it follows the encoder-decoder design with a pixel-wise classification layer. The encoder consists of 13 convolutional layers identical to the VGG16 design [40], and it downsamples the input image. The decoder also consists of 13 convolutional layers; however, the pooling layers are replaced by upsampling layers to upsample the input image. Studies such as [41] have utilized SegNet to segment colonoscopy images.
6.1.3. Fully Convolutional Network (FCN)
The fully convolutional network (FCN) is a segmentation network that comprises convolutional layers only [42]. According to the authors, the FCN is flexible, since it does not require a specific input size, and its training is much faster because there are no dense layers in the network. The network has a downsampling path, an upsampling path and several skip connections to recover information lost during the downsampling process. In [43,44], the authors used FCNs to detect colorectal polyps during colonoscopy.
6.1.4. Pyramid Scene Parsing Network (PSPNet)
Pyramid Scene Parsing Network (PSPNet) was introduced in 2017 by [45]. The core concept behind the network's design is a pyramid parsing module that produces more accurate global context information using different-region context aggregation. PSPNet depends on a pre-trained convolutional neural network and dilated convolution to obtain a feature map. PSPNet is not used in colorectal polyp detection as commonly as SegNet and U-Net; however, a recent study by [46] used the network and presented a comparison of its performance with FCN, SegNet and U-Net.
6.2. Object Detection Methods
Object detection algorithms such as YOLO and R-CNN networks are commonly used to detect specific objects of interest from medical images, including colorectal polyps. A typical object detection algorithm would draw a bounding box around a detected object in a given scene [47].
6.2.1. Region-Based CNN (R-CNN)
Region-based CNN (R-CNN) is an object detection algorithm introduced in 2014 by [48]. R-CNN uses high-capacity convolutional neural networks to localize objects of interest and independently extracts features from each region of interest for further processing. Several studies, such as [49-51], have shown that R-CNN-based polyp detectors can effectively locate polyps of different shapes and sizes.
6.2.2. Faster R-CNN
A faster version of R-CNN, known as Faster R-CNN, was introduced by [52] in 2015. This algorithm is similar to the original R-CNN; however, it uses a region proposal network (RPN) that shares all the convolutional features with the detection network, making it an appropriate solution for real-time detection. Faster R-CNN was used to build several real-time polyp detection models, such as [53-55].
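As a sketch of this transfer-learning recipe (and not a reproduction of any single surveyed method), a COCO-pretrained Faster R-CNN from torchvision can be adapted to binary polyp detection by swapping its box-predictor head:

```python
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

# Load a detector pre-trained on COCO and replace the box predictor with a
# two-class head (background vs. polyp).
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes=2)

# Training step (sketch): images is a list of CHW float tensors; each target
# holds "boxes" (N x 4, x_min/y_min/x_max/y_max) and "labels" (N,).
# loss_dict = model(images, targets)
# loss = sum(loss_dict.values())
```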
6.2.3. Single-Shot Detector (SSD)
The single-shot detector (SSD) was introduced in 2016 by [56], and it depends on a single deep neural network to detect an object. The SSD model discretizes the output space of bounding boxes into several boxes with different aspect ratios, which the network then scales per feature map location. The location of an object of interest is predicted using several feature maps with different resolutions. SSD detectors were used in recently proposed colorectal polyp detection methods such as [57,58].
6.2.4. You Only Look Once (YOLO)
The first version of You Only Look Once (YOLO) was introduced in 2016 by [59], and it is one of the most popular object detection algorithms. The original algorithm, YOLOv1, treats detection as a regression problem in which a single neural network predicts bounding boxes and class probabilities from an entire image in one pass. The YOLO architecture has been the preferred go-to solution for real-time detection tasks, as it can process 45 frames per second. Over the years, the YOLO algorithm has been refined, and improved versions such as YOLOv2, YOLOv3, YOLOv4 and YOLOv5 have been introduced. The current work [60-63] shows that the YOLO algorithm has been used more frequently than other detection algorithms.
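For illustration, a YOLOv5 checkpoint can be loaded through torch.hub and run on a single frame as follows; the generic COCO-pretrained weights shown here merely stand in for polyp-specific fine-tuned weights, and the frame filename is hypothetical.

```python
import torch

# Load YOLOv5 via torch.hub; a polyp detector would load custom fine-tuned
# weights instead of the generic 'yolov5s' checkpoint used here.
model = torch.hub.load("ultralytics/yolov5", "yolov5s")
results = model("frame.jpg")   # hypothetical colonoscopy frame
boxes = results.xyxy[0]        # (N, 6): x1, y1, x2, y2, confidence, class
print(boxes)
```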
6.3. Classification Methods
A pre-trained network is a model trained on a much larger dataset to solve a complex problem and then fine-tuned to work on a more specific task. This process of reusing existing knowledge to solve another problem is known as transfer learning. Several polyp classification methods have utilized pre-trained models, as they are much faster to fine-tune and more accurate than classification models built from scratch [64-66].
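A minimal sketch of this fine-tuning recipe, assuming an ImageNet-pretrained ResNet50 backbone is frozen and a new binary polyp/non-polyp head is trained on top:

```python
import torch.nn as nn
import torchvision.models as models

# Reuse ImageNet features and train only a new classification head.
backbone = models.resnet50(weights="DEFAULT")
for param in backbone.parameters():
    param.requires_grad = False                       # freeze pre-trained layers
backbone.fc = nn.Linear(backbone.fc.in_features, 2)  # polyp vs. non-polyp
# Only backbone.fc.parameters() would then be passed to the optimizer.
```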
6.3.1. VGG16
The VGG16 network was introduced by [40], and it is a standard network used in many computer vision tasks, including colorectal polyp classification. The network consists of 16 convolutional layers and takes an input size of $224 \times 224$ pixels. The network was trained on a large dataset known as ImageNet, a collection of 14 million images belonging to 22,000 classes [67]. The methods proposed in [68,69] used the VGG16 model to extract polyp features from colonoscopy images.
6.3.2. VGG19
VGG19 is identical to the VGG16 network except for the number of layers and model size. VGG19 has 19 convolutional layers and accepts an input size of $224 \times 224$ pixels, similar to VGG16, and it was also trained on the ImageNet dataset. The authors in [70,71] used VGG19 as the backbone feature extraction network to automatically classify polyp images.
6.3.3. ResNet50
ResNet50 is another pre-trained network used widely in various computer vision tasks, requiring an input size of $227 \times 227$ pixels [72]. ResNet50 is 50 layers deep, with 48 convolutional layers, one max-pooling layer, and a single average-pooling layer. ResNet50 implements skip connections to allow information to flow directly without passing through an activation function. ResNet50 was trained on the ImageNet dataset and can classify 1000 categories. ResNet50 was the backbone network used to detect and segment colorectal polyps in studies such as [73,74].
6.3.4. Xception
Xception consists of 71 layers based on depthwise-separable convolutions [75]. The network takes an input size of $299 \times 299$ pixels and was trained on ImageNet. Xception is not used as frequently as the VGG networks or ResNet; however, [76] analyzed the classification performance of the Xception model with the Swish activation function on colorectal polyps.
6.3.5. AlexNet
AlexNet is designed as a typical convolutional neural network with pooling and dense layers [77]. AlexNet is only eight layers deep, takes an input size of $227 \times 227$ pixels, and was trained on the ImageNet dataset to classify 1000 different objects. Although AlexNet is not the most common go-to network for classifying polyps, several recent studies have reported that AlexNet was the best-performing classification model [78-80].
6.3.6. GoogLeNet
GoogLeNet is a convolutional neural network with 22 layers [81]. The network takes images of size $224 \times 224$ pixels by default. Similarly to the other networks covered in this section, it was trained on the ImageNet dataset to classify images into 1000 categories. GoogLeNet was used to identify and localize polyps in [82-84], but it consistently underperformed when used as a standalone network. Therefore, GoogLeNet is mainly used as part of ensemble-based architectures.
7. Performance Evaluation
Evaluating the performance of any deep learning application is a critical step, as it provides a picture of how well a model generalizes. For automated colorectal polyp detectors, various metrics can be used to evaluate a model's performance. The available metrics are used in specific scenarios, depending on whether the task involves segmentation, localization or classification.
7.1. Classification and Localization Metrics
The most common combination of metrics used to evaluate automatic polyp detectors and classifiers is the precision, recall and F1 scores. Precision is used to calculate the number of correctly predicted outputs over the total number of correct and incorrect positive predictions, and it can be represented as follows:

$$\text{Precision} = \frac{TP}{TP + FP} \tag{1}$$
TP is the number of true positives, and FP represents the number of false positives. The recall metric is another measurement used to assess the model's ability to correctly detect positive samples. The recall formula is defined as follows:

$$\text{Recall} = \frac{TP}{TP + FN} \tag{2}$$
In the equation above, TP remains the number of true positives, and FN is the number of false negatives. When the precision and recall formulae are combined, the F1 measurement can be obtained using the following formula:

$$F1 = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{3}$$
The main objective when using the abovementioned formulae is to achieve a value that is either 1 or close to 1, which indicates that the model performs well. The three formulae are usually used to evaluate how well a model can classify or detect polyps in a given image.
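For concreteness, Equations (1)-(3) can be computed directly from raw detection counts; the counts in the usage comment below are hypothetical.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute Equations (1)-(3) from raw detection counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# e.g., 90 true positives, 10 false positives, 8 missed polyps:
# precision_recall_f1(90, 10, 8) -> (0.90, ~0.918, ~0.909)
```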
Another standard set of metrics among researchers is the sensitivity and specificity measures. Sensitivity measures how well a model can detect positive instances. In contrast, specificity measures how well a model can identify true negatives. The two measurements can be represented as follows:

$$\text{Sensitivity} = \frac{TP}{TP + FN} \tag{4}$$

$$\text{Specificity} = \frac{TN}{TN + FP} \tag{5}$$
Accuracy is another metric used to evaluate polyp detection methods; however, it is unreliable when the dataset is unbalanced, so researchers prefer the F1 score to assess performance. Accuracy is defined as follows:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{6}$$
TP is the number of true positives, TN is the number of true negatives, FP is the number of false positives, and FN is the number of false negatives. A preferable metric for assessing polyp detection models built using YOLO, R-CNN and Faster R-CNN architectures is the mean average precision (mAP). Mean average precision is a better metric for evaluating object detection algorithms, as it uses both recall and precision to assess a model. The mAP metric can be represented using the following formula:

$$\text{mAP} = \frac{1}{n} \sum_{k=1}^{n} \text{AP}_k \tag{7}$$
$\text{AP}_k$ represents the average precision of class $k$, while $n$ is the number of classes.
7.2. Segmentation Metrics
The standard evaluation metric for polyp segmentation tasks is the Intersection-over-Union (IoU), also known as the Jaccard index. The IoU formula is defined as follows:

$$\text{IoU} = \frac{|A \cap B|}{|A \cup B|} \tag{8}$$
In Equation (8), $|A \cap B|$ is the area of overlap and $|A \cup B|$ is the area of union. An IoU that is greater than or equal to 0.5 is considered acceptable.
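A short NumPy sketch of Equation (8) for binary segmentation masks; the convention used for two empty masks is our own assumption.

```python
import numpy as np

def mask_iou(pred, target):
    """IoU / Jaccard index between two binary masks (Equation (8))."""
    pred, target = pred.astype(bool), target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return intersection / union if union > 0 else 1.0  # both empty: treat as match

# A predicted mask is usually judged acceptable when mask_iou(...) >= 0.5.
```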
8. Survey of Existing Work
This section presents a survey on recently proposed colorectal polyp detection methods. The survey covers the details of each method, datasets, reported performance, and any reported limitations.
In [8], the authors introduced a real-time polyp detection algorithm based on modified YOLOv3 and YOLOv4 algorithms. The authors modified the YOLOv3 network by replacing the original DarkNet53 backbone with a CSPNet [85]. As for the YOLOv4 network, a modified CSPNet denoted CSPDarkNet53 was introduced as its new backbone. In addition, another modification was made to both YOLO networks, in which SiLU replaced all the ReLU activation functions. To overcome overfitting, the authors utilized various augmentation methods such as rotation, flipping, scaling, and translation. The main objective of this study was to improve the performance of both YOLO networks by modifying their core structure. The proposed method was trained on the SUN and PICCOLO Widefield datasets and scored a precision of 90.61, a recall of 91.04 and an F1 score of 90.82 on the ETIS-Larib dataset. Some of the main limitations reported in this study include an increase in misclassifications when the model encountered fecal matter and bubbles.
In another study, a saliency detection network was introduced to detect polyps from static polyp images [13]. In their study, the authors used Neutrosophic theory to decrease the effect of white light reflections caused by the colonoscopy light. The authors introduced an image-suppressing technique to rebuild a colonoscopy image without the white light reflection using a single-value Neutrosophic set (SVNS). The specular regions are then recovered using a dynamic window that searches non-specular pixels near each specular pixel. An $8 \times 8$ window is used and rotated counter-clockwise until the whole image is covered and all specular regions are recovered. The RGB pixels' average value is used to paint the specular pixels in the recovered image. For the detection and segmentation part, the authors introduced a saliency network known as NeutSS-PLS, inspired by the design of U-Net and DSS. The introduced network has two-level short connections on both sides of the VGG. The training was conducted on the EndoScene and Kvasir-SEG datasets, and precision and F1 scores of 92.30 and 92.40 were reported, respectively. The proposed method struggled to identify polyps near the boundary of colonoscopy images.
In [10], an automatic polyp detection and segmentation system was introduced. The proposed method, called the shuffle-efficient channel attention network (sECA-NET), segments colonoscopy images to detect polyps. First, a CNN is applied to extract a feature map from an input image, and a region proposal network (RPN), which predicts bounding boxes around polyps in the feature map, is developed. Region-of-interest align (RoIAlign) is applied to extract features from the feature map based on the bounding box of each detected object. Two parallel branches then process the extracted features for every region of interest. In the first branch, the features are computed by fully connected layers, followed by softmax activation, and then a bounding box regression is performed. The second branch concerns mask segmentation, which predicts the category of each pixel in the region of interest. The proposed method was trained on the CVC-ClinicDB, ETIS-Larib and Kvasir-SEG datasets. A private cross-validation dataset was used to evaluate the proposed idea. The authors reported a precision score of 94.9%, a recall score of 96.9% and an F1 score of 95.9%.
A dual-path CNN architecture was proposed in a different study by [9]. The proposed model takes an input colonoscopy image and produces a label corresponding to one of two classes: polyp and non-polyp. The first step of this method is image enhancement, in which the images are transformed into the HSV color space. The V value in the HSV space is extracted using a multiscale Gaussian function; then, gamma correction is executed to correct the image's brightness. After conversion, image fusion is conducted, in which the HSV image is converted back to RGB. Next, a custom dual-path CNN that is eight layers deep extracts features from the processed image; the features are then fed to a sigmoid layer and mapped onto the polyp and non-polyp classes. The training of this network was done on CVC-ClinicDB, while testing was conducted on the CVC-ColonDB and ETIS-Larib datasets. The model performed best on CVC-ColonDB, with a precision of 100%, a recall of 99.20% and an F1 score of 99.60%. The main limitation observed with this method is that the model assumes the location of the polyp manually, which is impractical when working with real-life samples.
Inspired by U-Net, the authors of [14] introduced Y-Net to detect polyps in colonoscopy images. The proposed Y-Net consists of two encoders and a single decoder, and it can be trained on a limited number of samples. Both encoders follow the VGG19 design, while the decoder is a custom-built CNN with five deconvolutional blocks and one final convolution block. The first encoder is initialized with the ImageNet weights, the second encoder is initialized using the Xavier normal initializer, and both use the SELU activation function instead of ReLU. Y-Net was trained and tested on the ASU-Mayo dataset without a cross-validation dataset. The authors reported a precision of 87.4, a recall of 84.4% and an F1 score of 85.9%. The authors reported that this method did not work well with reflections, polyp-shaped objects, and flat lesions.
The authors in [58] proposed a deep learning-based method to detect and classify polyps from colonoscopy images. The proposed method utilizes the single-shot detector (SSD) algorithm to locate a polyp in a given image. First, the images are manually annotated by specialists, who draw a bounding box around each polyp; this bounding box is used to train the proposed model. After annotation, the images are pre-processed using dynamic histogram equalization to enhance the quality of the input image. The authors used a dataset provided by the University of Leeds, and an accuracy of 92% was reported. This method was not evaluated against any publicly available datasets.
A method that utilizes ensemble learning was proposed in [83], in which three models were used together to classify whether a polyp exists in a given image. The authors stacked three pre-trained classifiers: (1) ResNet101, (2) GoogLeNet, and (3) Xception. The proposed model first classifies whether a polyp exists in an image; the model then classifies the polyp again to check whether the detected lesion is malignant or benign. The three models extract features independently, and then a weighted majority vote representing whether a polyp exists and whether it is malignant or benign is produced. The training and testing were performed using a private collection of colonoscopy images and the Kvasir-SEG dataset. The authors reported precision and recall scores of 98.6 and 98.01, respectively, for the polyp detection task. In addition, a precision of 98.66 and a recall of 96.73 were reported for the malignant/benign classification task.
In a different approach, researchers in [11] used a no-code deep-learning platform to predict colorectal polyps from colonoscopy images. The authors used a platform known as Neuro-T, as it had the most user-friendly GUI and produced the best performance scores. The authors fed the system white light colonoscopy images that were manually labeled with ground-truth labels according to a pathological evaluation. The authors acquired a different colonoscopy dataset to evaluate the proposed method and reported a precision of 78.5, a recall of 78.8 and an F1 score of 78.6. One of the main limitations reported in this paper is that the system consistently misclassified normal blood vessels as polyps.
In [86], the authors combine the Swin Transformer [87] and EfficientNet [88] to segment and detect polyps from colonoscopy images. The proposed method combines both architectures to capture all the critical global information through the Swin Transformer and all the local features using EfficientNet. The proposed method has a multi-dilation convolutional block to refine the local and global features extracted by EfficientNet and the Swin Transformer separately. In addition, a multi-feature aggregation block is added to aggregate both the global and local features. Once the features are refined and aggregated, an attentive block receives the features, and a polyp mask is built. The proposed method was trained on the Kvasir-SEG and CVC-ClinicDB datasets and tested on the CVC-ColonDB, ETIS-Larib and EndoScene datasets. During the evaluation stage, the authors reported a mean Dice coefficient of 0.906, an IoU of 0.842, a mean weighted F-measure of 0.88, and a mean absolute error of 0.001.
In [89], the authors introduced a deep learning-based method to detect colorectal polyps from colonoscopy images. The proposed method can identify high-risk regions and classify polyps in a given region using a traditional machine-learning algorithm. For polyp detection, the authors use the Faster R-CNN network combined with a ResNet101 to extract the features of the detected polyp. A gradient-boosted decision tree classifier takes the output of Faster R-CNN and predicts whether a given region is high or low risk. The authors used a dataset provided by Singapore General Hospital; no cross-validation on a different dataset was mentioned in this work. The authors reported a sensitivity of 97.4%, a specificity of 60.3%, an AUC of 91.7% and an F1 score of 97.4%.
In a study by [90], the authors introduced a deep learning method to differentiate premalignant from benign polyps in 3D CT colonography images. The authors introduced two handcrafted convolutional neural networks called SEG and NoSEG. The SEG and NoSEG networks consist of 50 3D convolutional layers, and both are stacked, forming an ensemble-based model. SEG and NoSEG are trained differently. On the one hand, the SEG network is trained on colonography images with a mask to detect the location of polyps.
On the other hand, the NoSEG network was trained on 3D CT colonography images without masks. Both networks were trained separately to predict the class of a polyp (i.e., premalignant or benign). The outputs of SEG and NoSEG are concatenated, and a final class output is produced. The training dataset consisted of several privately collected images of adults undergoing colonography, cross-validated on a public dataset from the Cancer Imaging Archive (TCIA). The authors used the area under the ROC curve (AUC) and reported a score of 83.0 for the SEG network and 75.0 for the NoSEG network.
In [91], the authors introduced a polyp characterization deep learning algorithm and embedded it into a GI Genius V2 endoscopy module [92]. The classification module of the proposed system consists of two pre-trained ResNet18 networks. The first network is responsible for classifying each frame as adenoma or non-adenoma. The second network produces a polyp descriptor for every detected polyp in a frame. The authors used a privately collected dataset of unfiltered colonoscopy videos for training. The same dataset was used to test the system, and an accuracy of 84.8% was reported. In addition, the authors reported a sensitivity score of 80.7%.