REVIEW

Deep learning for temporomandibular joint arthropathies: A systematic review and meta-analysis

Rata Rokhshad
Topic Group Dental Diagnostics and Digital Dentistry, ITU/WHO Focus Group AI on Health, Berlin, Germany

Hossein Mohammad-Rahimi
Topic Group Dental Diagnostics and Digital Dentistry, ITU/WHO Focus Group AI on Health, Berlin, Germany
Division of Artificial Intelligence Research, University of Maryland School of Dentistry, Baltimore, Maryland, USA

Fatemeh Sohrabniya
Topic Group Dental Diagnostics and Digital Dentistry, ITU/WHO Focus Group AI on Health, Berlin, Germany

Bahare Jafari
Topic Group Dental Diagnostics and Digital Dentistry, ITU/WHO Focus Group AI on Health, Berlin, Germany

Parnian Shobeiri
Department of Radiology, Memorial Sloan Kettering Cancer Center, New York, New York, United States

Ioannis A. Tsolakis
Department of Orthodontics, School of Dentistry, Aristotle University of Thessaloniki, Thessaloniki, Greece
Department of Orthodontics, School of Dental Medicine, Case Western Reserve University, Cleveland, Ohio, USA

Seyed AmirHossein Ourang
Dentofacial Deformities Research Center, Research Institute of Dental Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran

Ahmed S. Sultan
Division of Artificial Intelligence Research, University of Maryland School of Dentistry, Baltimore, Maryland, USA
Department of Oncology and Diagnostic Sciences, University of Maryland School of Dentistry, Baltimore, Maryland, USA
University of Maryland Marlene and Stewart Greenebaum Comprehensive Cancer Center, Baltimore, Maryland, USA

Shehryar Nasir Khawaja (Corresponding Author)
Orofacial Pain Medicine, Shaukat Khanum Memorial Cancer Hospitals and Research Centres, Lahore and Peshawar, Pakistan
School of Dental Medicine, Tufts University, Boston, Massachusetts, USA

Roxanne Bavarian (Corresponding Author)
Department of Oral and Maxillofacial Surgery, Massachusetts General Hospital, Boston, Massachusetts, USA
Department of Oral and Maxillofacial Surgery, Harvard School of Dental Medicine, Boston, Massachusetts, USA

Juan Martin Palomo
Department of Orthodontics, School of Dental Medicine, Case Western Reserve University, Cleveland, Ohio, USA

Correspondence

Shehryar Nasir Khawaja, School of Dental Medicine, Tufts University, Boston, MA, USA.
Email: shehryarnasir@skm.org.pk

Roxanne Bavarian, Department of Oral and Maxillofacial Surgery, Harvard School of Dental Medicine, Boston, MA, USA.
Email: roxanne_bavarian@hsdm.harvard.edu
First published: 17 May 2024
https://doi.org/10.1111/joor.13701
Citations: 2

Rata Rokhshad and Hossein Mohammad-Rahimi contributed equally to this project.

Abstract

Background and Objective

The accurate diagnosis of temporomandibular disorders continues to be a challenge, despite the existence of internationally agreed-upon diagnostic criteria. The purpose of this study is to review applications of deep learning models in the diagnosis of temporomandibular joint arthropathies.

Materials and Methods

An electronic search was conducted on PubMed, Scopus, Embase, Google Scholar, IEEE, arXiv, and medRxiv up to June 2023. Studies were included if they reported the efficacy (outcome) of deep learning models (intervention) for the prediction, object detection, or classification of human joint-based or arthrogenous TMDs (population) in comparison to a reference standard (comparison). To evaluate the risk of bias, the included studies were critically analysed using the Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) tool. Diagnostic odds ratios (DORs) were calculated. Forest and funnel plots were created using STATA 17 and MetaDiSc.

Results

Full-text review was performed on 46 of the 1056 identified studies; 21 studies met the eligibility criteria and were included in the systematic review. Four studies were graded as having a low risk of bias in all QUADAS-2 domains. Across the included studies, accuracy ranged from 74% to 100%, sensitivity from 54% to 100%, specificity from 85% to 100%, Dice coefficient from 85% to 98%, and AUC from 77% to 99%. Data from the seven studies that qualified for meta-analysis were then pooled based on sensitivity, specificity, and dataset size. The pooled sensitivity was 95% (85%–99%), specificity 92% (86%–96%), and AUC 97% (96%–98%). The pooled DOR was 232 (74–729). According to Deeks' funnel plot and statistical evaluation (p = .49), there was no evidence of publication bias.

Conclusion

Deep learning models can detect TMJ arthropathies with high sensitivity and specificity. Clinicians, especially those not specialised in orofacial pain, may benefit from this methodology for assessing TMD, as it offers a rigorous, evidence-based framework, objective measurements, and advanced analysis techniques, ultimately enhancing diagnostic accuracy.

1 INTRODUCTION

Temporomandibular disorders (TMDs) are a group of musculoskeletal conditions characterised by clinical manifestations affecting the masticatory muscles, temporomandibular joints (TMJ), and associated tissues. TMDs affect up to a third of adults, predominantly those aged between 20 and 40.1-4 TMJ arthropathies are the joint-based, or arthrogenous, TMDs. These include articular disc disorders as well as arthritic conditions affecting the TMJ. They can be asymptomatic, or they may be painful, limit the range of motion, or cause intermittent locking.5, 6 Diagnosis is primarily based on a thorough history of present illness and physical examination. The examination includes palpation of the joint and auscultation of joint sounds, assessment of the mandibular range of motion and path of opening, and detection of any pain on palpation of the joint.7, 8 Diagnostic imaging or auditory devices such as stethoscopes may be used if intra-articular abnormalities are suspected. However, the diagnostic performance of auditory devices and devices utilising vibrations is notably inadequate.9, 10 In some cases, minimally invasive surgery with arthroscopy and synovial biopsy is indicated to obtain an accurate diagnosis.3, 11, 12

With advances in artificial intelligence, machine learning and deep learning are increasingly utilised in medicine.13, 14 Machine learning refers to a set of methods that recognise patterns in data and use them to predict future data or to support decisions under uncertainty. Deep learning is a subtype of machine learning that is particularly well suited to large data sets. It uses complex, multi-layered algorithms modelled after the human brain to solve problems.15, 16 These algorithms mimic the structure of the human cognitive system in their formation of artificial neural networks (ANNs). Deep learning approaches can detect different types of cancers in their early stages and propose the best formulation for each drug in its target tissue.17-20 ANNs can perform tasks such as distinguishing between normal and abnormal structures in medical images.21 Clinical databases may include data points containing clinical history, radiology reports, and physical examination findings. AI-assisted natural language processing (NLP) systems can interpret this information, converting free text into structured data.22 Furthermore, convolutional neural networks (CNNs) are deep learning networks primarily used in image analysis. They apply weighted filters to each image element, simplifying recognition and response by the computer.18
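To make the idea of a weighted filter concrete, the following minimal sketch slides a 3 × 3 kernel over a small greyscale image in plain Python. The image values, the kernel, and the function name `convolve2d` are illustrative assumptions and are not taken from any of the reviewed models; a real CNN learns its kernel weights during training and relies on optimised libraries.

```python
def convolve2d(image, kernel):
    """Valid (no-padding) 2D cross-correlation, the operation used in CNN layers."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Weighted sum of the image patch currently under the kernel.
            total = 0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output


# Hypothetical 4 x 4 image with a vertical edge between columns 1 and 2.
image = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]
# Hand-picked vertical-edge kernel (a trained CNN would learn such weights).
kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]
feature_map = convolve2d(image, kernel)
print(feature_map)  # [[27, 27], [27, 27]]
```

The large positive responses mark where the kernel overlaps the edge; stacking many such learned filters is what lets a CNN respond to structures in an image.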

Deep learning methods and their application in dentistry have recently become a research subject. Examples include dental caries detection, estimating periodontal bone loss, planning treatments for orthodontics or maxillofacial surgeries, designing clinical decision support systems, numbering teeth, and detecting lesions.23-26 Pathology diagnostics, such as the evaluation of the TMJ and early detection of TMJ arthropathies through deep learning, is a critical and sensitive task that has been the focus of numerous studies.27-30 The present systematic review aims to report on the primary studies investigating deep learning applications in TMJ arthropathies.

2 METHODS

This systematic review explores the relationship between TMJ arthropathies and deep learning. This study follows the Preferred Reporting Items for a Systematic Review and Meta-analysis of Diagnostic Test Accuracy (PRISMA-DTA) guideline.31 The study protocol was registered at PROSPERO [CRD42022354183]. This systematic review answers the following question: What is the efficacy (outcome) of deep learning models (intervention) in the assessment of TMJ arthropathies (population) in comparison with conventional diagnostic protocols (comparison)?

2.1 Eligibility criteria

Diagnostic studies reporting the following criteria were included:
  • Population: Studies in which deep learning models were implemented on TMD medical diagnostic data, including panoramic radiographs, cone beam computed tomography (CBCT), magnetic resonance imaging (MRI), TMJ acoustic sounds, and structured data. Structured data is defined as data that can be easily analysed and modelled by deep learning algorithms. Typically, this data is presented in a tabular format with well-defined columns and rows.
  • Intervention: Deep learning approaches
  • Comparison: Studies comparing the performance of deep learning models with a reference standard
  • Outcome: Reported accuracy, sensitivity, specificity, area under the curve (AUC), F1-score, recall, receiver operating characteristic (ROC) curve, Dice similarity coefficient (DSC), and co-occurrence
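Several of the outcome measures listed above derive from the same 2 × 2 confusion matrix, and the Dice similarity coefficient compares two segmentation masks. The sketch below shows the standard definitions; the function names and the example counts are hypothetical and do not come from any included study.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard accuracy measures from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # also called recall
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "f1": f1,
    }


def dice_coefficient(mask_a, mask_b):
    """Dice similarity coefficient between two flat binary (0/1) masks."""
    intersection = sum(a * b for a, b in zip(mask_a, mask_b))
    total = sum(mask_a) + sum(mask_b)
    # Two empty masks are treated as a perfect match by convention.
    return 2 * intersection / total if total else 1.0


# Hypothetical test set: 100 diseased and 100 healthy joints.
metrics = diagnostic_metrics(tp=90, fp=8, tn=92, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")

# Hypothetical predicted vs. reference segmentation masks.
print(f"DSC: {dice_coefficient([1, 1, 0, 0], [1, 0, 0, 0]):.3f}")
```

Note that recall and sensitivity are the same quantity, so a study reporting both is reporting one measure under two names.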

Studies were excluded if they did not report sufficient detail about the data used for training and testing (e.g., dataset size, modalities) or did not explain the deep learning models in sufficient detail. Review articles were also excluded.

2.2 Information sources and search

The following electronic databases were searched up to June 2023: Medline (via PubMed), Scopus, Google Scholar, Embase, arXiv, medRxiv, and IEEE. No restrictions were placed on publication date or language. Keywords were adapted to each database (Table 1). Additionally, the bibliographies of the included papers were manually cross-referenced.

TABLE 1. Specific search query for different databases (till June 2023).

| Database | Search query | Results |
| --- | --- | --- |
| PubMed | (‘Temporomandibular Joint’[MeSH Terms] OR ‘Temporomandibular Joint Disorders’[MeSH Terms] OR ‘Mandibular Condyle’[MeSH Terms] OR ‘Joint dislocations/surgery’[MeSH Terms]) AND (‘Deep Learning’[MeSH Terms] OR (Image Processing, Computer-assisted/methods[MeSH Terms]) OR ‘Artificial Intelligence’[MeSH Terms] OR ‘Machine learning’[MeSH Terms] OR ‘Algorithms’[MeSH Terms] OR ‘Neural Networks, computer’[MeSH Terms]) | 325 |
| Google Scholar | Allintitle:(‘temporomandibular Joint’ OR ‘temporomandibular Joint Disorders’ OR ‘mandibular Condyle’ OR ‘mandibular joint physiology’) AND (‘deep learning’ OR ‘image processing’ OR ‘artificial Intelligence’ OR ‘machine learning’ OR ‘neural network’) | 11 |
| Embase | (‘Deep learning’:ti,ab,kw OR ‘machine learning’:ti,ab,kw OR ‘artificial intelligence’:ti,ab,kw OR ‘neural network’:ti,ab,kw) AND (‘temporomandibular joint’:ti,ab,kw OR ‘temporomandibular joint disorders’:ti,ab,kw OR ‘mandibular condyle’:ti,ab,kw OR ‘mandibular joint physiology’:ti,ab,kw OR ‘tmj’:ti,ab,kw) | 38 |
| Scopus | TITLE-ABS-KEY ((‘deep learning’ OR ‘artificial intelligence’ OR ‘machine learning’ OR ‘neural network’ OR ‘image processing’) AND (‘Temporomandibular Joint’ OR ‘Temporomandibular Joint Disorders’ OR ‘Mandibular Condyle’ OR ‘Mandibular joint physiology’ OR ‘TMJ’)) | 740 |
| ArXiv | (‘deep learning’ OR ‘machine learning’ OR ‘artificial intelligence’ OR ‘neural network’) AND (‘Temporomandibular Joint’ OR ‘Mandibular Condyle’ OR ‘temporomandibular disorder’ OR ‘TMJ’) | 2 |
| medRxiv | (‘Deep learning’ OR ‘machine learning’ OR ‘artificial intelligence’ OR ‘neural network’) AND (‘Temporomandibular Joint’ OR ‘Mandibular Condyle’ OR ‘temporomandibular disorder’ OR ‘TMJ’) | 10 |
| IEEE | (‘deep learning’ OR ‘machine learning’ OR ‘artificial intelligence’ OR ‘neural network’ OR ‘image processing’) AND (‘Temporomandibular Joint’ OR ‘Mandibular Condyle’ OR ‘Mandibular joint physiology’ OR ‘temporomandibular disorder’ OR ‘TMJ’) | 28 |

2.3 Study selection

Citations were managed using EndNote 20 (Clarivate, Philadelphia, USA). After removal of duplicate studies, two independent reviewers (B.J. and F.S.) screened the titles and abstracts, and a third reviewer (R.R.) was consulted to resolve any disagreements. The same two reviewers then independently evaluated the full texts of eligible studies against the inclusion and exclusion criteria. Disagreements were again resolved by consensus with the third reviewer (R.R.).

2.4 Data collection and extraction

Two reviewers (F.S. and B.J.) collected data from the included studies. A third reviewer (R.R.) checked the extracted data and resolved discrepancies and disagreements. Extracted data items included bibliographic details (authors and publication year), modality of data (panoramic, MRI, structured data, CBCT, and TMJ acoustic signals), size of the dataset (including the train, validation, and test splits where reported), labeling procedures (how the reference test was established), deep learning tasks (classification, object detection, segmentation), preprocessing (removal of unwanted distortions and enhancement of qualities that are crucial for the intended application), augmentation (artificially expanding the training set by generating modified copies of existing data), performance, and outcomes.
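As an illustration of what augmentation means in practice, the sketch below generates flipped and rotated copies of a tiny image stored as nested lists. The helper names and the 2 × 2 example are hypothetical; the included studies used their own pipelines, typically on full-size radiographs or MRI slices.

```python
def flip_horizontal(image):
    """Mirror each row left to right."""
    return [row[::-1] for row in image]


def rotate_90_clockwise(image):
    """Rotate the image by 90 degrees clockwise."""
    return [list(column) for column in zip(*image[::-1])]


def augment(image):
    """Return the original image plus two modified copies."""
    return [image, flip_horizontal(image), rotate_90_clockwise(image)]


# Hypothetical 2 x 2 greyscale image.
image = [
    [1, 2],
    [3, 4],
]
for variant in augment(image):
    print(variant)
```

Each variant keeps the original label, so one annotated image yields several training examples. Augmented copies should be generated only from the training split so that near-duplicates do not leak into the test set.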

2.5 Risk of bias and applicability

The risk of bias was assessed independently by two reviewers (F.S. and B.J.) using the Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) tool.32 The QUADAS-2 checklist covers four risk-of-bias domains: patient selection, index test, reference standard, and flow and timing. It also evaluates the applicability of a study with respect to patient selection, index test, and reference standard. Disputes or discrepancies between the two reviewers were resolved by a third investigator (R.R.).

Each domain was graded as high, low, or unclear risk of bias based on the full text. For ‘patient selection,’ a lack of information about the dataset and unclear data-splitting strategies were considered indicators of a high risk of bias. For the ‘index test,’ indicators included poor reporting on the reproducibility of tests, insufficient information about model construction, and no robustness analysis. For the ‘reference standard,’ a lack of information about the definition of the reference standard and the use of only one examiner were indicators of a high risk of bias. For ‘flow and timing,’ varying reference standards across studies and an inappropriate interval between the index test and reference standard were considered high risks of bias.31

2.6 Data synthesis

The main study findings were synthesised in a narrative format. As reporting varied widely, quantitative synthesis was restricted to studies that reported sensitivity and specificity (n = 7). The ROC curves were used to calculate the AUC, sensitivity, and specificity. Summary Receiver Operating Characteristic (SROC) curves and AUCs were used to illustrate the diagnostic test performance. The plots were created with STATA 17 (StataCorp LP, College Station, Texas, USA) and MetaDiSc, and all analyses were conducted using STATA 17.0. To account for expected heterogeneity, a random-effects model (DerSimonian-Laird method) was applied, with I² > 50% or p < .05 taken as an indication of substantial heterogeneity. Deeks' funnel plot was used to assess publication bias.
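The random-effects pooling described above can be sketched as follows. This is a simplified, univariate illustration of the DerSimonian-Laird estimator applied to logit-transformed sensitivities, not the authors' STATA or MetaDiSc workflow; diagnostic accuracy meta-analyses often fit sensitivity and specificity jointly. The per-study counts are hypothetical.

```python
import math


def logit(p):
    return math.log(p / (1 - p))


def inv_logit(x):
    return 1 / (1 + math.exp(-x))


def dersimonian_laird(effects, variances):
    """Pool study effects with the DerSimonian-Laird random-effects estimator.

    Returns (pooled effect, standard error, tau^2, I^2 in percent).
    """
    k = len(effects)
    w = [1 / v for v in variances]  # fixed-effect (inverse-variance) weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    # Cochran's Q: weighted squared deviations from the fixed-effect mean.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0  # between-study variance
    w_star = [1 / (v + tau2) for v in variances]     # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return pooled, se, tau2, i2


# Hypothetical per-study counts: (true positives, diseased cases).
studies = [(45, 50), (90, 100), (28, 40), (190, 200)]
effects = [logit(tp / n) for tp, n in studies]
# Approximate variance of a logit-transformed proportion.
variances = [1 / tp + 1 / (n - tp) for tp, n in studies]

pooled, se, tau2, i2 = dersimonian_laird(effects, variances)
low = inv_logit(pooled - 1.96 * se)
high = inv_logit(pooled + 1.96 * se)
print(f"pooled sensitivity {inv_logit(pooled):.3f} (95% CI {low:.3f}-{high:.3f})")
print(f"tau^2 = {tau2:.3f}, I^2 = {i2:.1f}%")
```

Pooling on the logit scale keeps the back-transformed estimate and its confidence limits between 0 and 1, which matters when study sensitivities are close to 100%.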

In the absence of reports on the number of true positives, true negatives, false positives, and false negatives, diagnostic odds ratios (DORs) were calculated as pooled outcomes. We calculated DORs as follows:
$$
\mathrm{DOR} = \frac{\text{Sensitivity} \times \text{Specificity}}{(1 - \text{Sensitivity}) \times (1 - \text{Specificity})}
$$
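As a worked example of this formula, the short sketch below computes a DOR directly from a sensitivity and specificity pair. The function name is hypothetical, and the inputs are the pooled point estimates reported in the abstract (95% and 92%), used purely for illustration.

```python
def diagnostic_odds_ratio(sensitivity, specificity):
    """DOR from sensitivity and specificity given as proportions in (0, 1)."""
    if not (0 < sensitivity < 1 and 0 < specificity < 1):
        # At exactly 0 or 1 the ratio is zero or undefined (division by zero).
        raise ValueError("sensitivity and specificity must lie strictly between 0 and 1")
    return (sensitivity * specificity) / ((1 - sensitivity) * (1 - specificity))


dor = diagnostic_odds_ratio(0.95, 0.92)
print(round(dor, 1))  # approximately 218.5
```

A DOR computed from two pooled point estimates need not coincide with a DOR pooled across individual studies, so this value is not expected to reproduce the reported pooled DOR. Studies reporting 100% sensitivity or specificity have an undefined DOR under this formula and would need a continuity correction or similar adjustment.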

Furthermore, the Grading of Recommendations Assessment, Development and Evaluation (GRADE) framework was employed to appraise the level of confidence and dependability in the data compiled from an extensive array of studies covering diverse activities (https://www.gradeworkinggroup.org).33

3 RESULTS

3.1 Study selection

The full texts of 46 of the 1156 identified studies were assessed. After full-text assessment, 21 studies were included (Figure 1); they are summarised in Table 2.

FIGURE 1. PRISMA flow chart for the systematic review.

3.2 Study characteristics

The included studies drew on six categories of data modality: CBCT (n = 4), structured data (n = 7), MRI (n = 6), MRI combined with structured data (n = 1), panoramic radiographs (n = 2), and TMJ acoustic sounds (n = 1). Eleven studies used expert opinion to set the reference test for their datasets, whereas ten did not describe how it was established. Among those eleven, the reference test was defined by one human expert (n = 4), two experts (n = 4), or three or more experts (n = 3). Image classification was the most common deep learning task (n = 9), followed by regression (n = 4) and object detection (n = 1); seven studies used more than one task, three of which included segmentation (Table 2).

TABLE 2. Summary of the included studies, categorised by data modality and publication year.

| Author, year (ref. number) | Objective/aim | Modality | Dataset (train/validation/test) | Inclusion and exclusion criteria (if any) | Labeling procedure | Machine learning task | Preprocessing | Augmentation | Model (algorithm architecture) | Reported diagnostic performance (outcome) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Kreiner et al. 2022 (ref. 38) | Diagnosis of orofacial pain and TMD | Clinical records | 11 | NA | NA | Regression | NA | NA | MLP | Accuracy 90% |
| Kajor et al. 2022 (ref. 37) | TMD classification | TMJ acoustic signals | 129 | Patients diagnosed with DC/TMD were included. | One dentist annotated the data. | Classification, Segmentation | NA | NA | U-Net | Sensitivity 98%; DSC 85% |
| Taskıran et al. 2021 (ref. 36) | TMD diagnosis | Structured data | 241 | The data between 0 and 25 kHz (TMJ sounds) were reduced to 0–5 kHz. | Class I (TMD patient) and class II (healthy patient) labels based on patients' records | Classification | Images converted into PNG-format 521 × 593 RGB-based colour images | NA | CNN | Classification success 94% |
| Kim et al. 2021 (ref. 28) | TMJ disc perforation prediction | Structured data | 443 | Patients with tumours, congenital deformity, inadequate quality of MRI, or a prior TMJ surgery (total joint replacement, arthroplasty) were excluded. | Two oral and maxillofacial surgeons and one oral and maxillofacial radiologist analysed MRIs. | Regression | NA | NA | ANN | Sensitivity 96%; Specificity 85% |
| Laputkovai et al. 2020 (ref. 41) | Salivary proteomic patterns in TMD | Structured data | 35 | NA | NA | Classification | NA | NA | SNN | Recognition capability 97% |
| Sharma et al. 2019 (ref. 11) | Prediction of TMJ syndrome | Structured data | 2300 | Patients who had symptoms of TMJ and other risks (vertigo, tinnitus, headaches, and migraines) were included. | NA | Regression | NA | NA | ANN | Correlation coefficient of output and target data 99% |
| Radke et al. 2017 (ref. 44) | Detection of TMD | Structured data | 68 (34/−/34) | Acute closed lock cases were included. | Three opening and closing points were recorded. | Classification, Regression | NA | NA | ANN | Specificity 100%; Sensitivity 92%; Accuracy 87% |
| Bas et al. 2012 (ref. 43) | Temporomandibular internal derangement classification | Structured data | 219 (161/−/58) | NA | Evaluated by an experienced OMFS | Classification | NA | NA | ANN | Sensitivity 100%; Specificity 95% |
| Ozsari et al. 2023 | TMD diagnosis using MRI | MRI | 2576 (64/16/20) | Patients with TMJ arthropathies were included. Patients with prior surgery to the mandible or those with craniofacial syndromes were excluded. | NA | Segmentation, Classification | NA | Augmentation by contrast, flip, and rotation | CNN: Xception, ResNet-101, MobileNetV2, InceptionV3, DenseNet-121, ConvNeXt | Accuracy rate 97%; Precision 97%; Recall 97%; F1-score 97%; NPV 96%; Specificity 94%; AUC 97%; Kappa score 94% |
| Yoon K. et al. 2023 (ref. 46) | MRI-based clinical decision support engine | MRI | 2360 (8/−/2) | Exclusion criteria were: previous operation, benign tumour, suspected adhesion, lateral or medial disc displacement, severe MRI artefacts, and inconsistent annotations from different clinicians. | Two independent specialists | Object detection, Classification | Enhancing the contrast of ROI | Ten times augmentation by rotation, shift, zoom | NN | mAP 81% |
| Lin et al. 2022 (ref. 30) | Detection of displacement of TMJ disc on MRI | MRI | 9009 (80/10/10) | Patients diagnosed with disc displacement on either or both joints, with no history of systemic disease or cranio-maxillofacial surgery, and with an MRI were included. MRIs with motion and other artefacts were excluded. | Two physicians with 3–7 years of experience on MRI classified the images into ADD and non-ADD groups. A third expert with 30 years of experience resolved any disagreements. | Classification | NA | NA | CNN | ROC 98%; Sensitivity 95%; Specificity 92%; Specificity 98% |
| Kao et al. 2022 (ref. 39) | TMD classification | MRI | 100 | Patients >20 years of age who had an articular disc disorder of the TMJ and healthy controls were included. | One oral and maxillofacial surgery specialist annotated the data. | Classification, Segmentation | Image equalisation, resizing | Zoom, rotating, and flipping | U-Net, InceptionResNetV2, InceptionV3, DenseNet169, and VGG16 | Accuracy (U-Net) 95% |
| Lee et al. 2022 (ref. 6) | Detection of anterior disc displacement | MRI | 2520 (1640/411/468) | NA | A TMD specialist with more than 7 years of experience in TMD diagnosed based on the criteria for TMD Axis I. | Object detection | NA | Flipping, contrast, and brightness change | VGG-16 | AUC 88%; Accuracy 83% |
| Iwasaki et al. 2015 (ref. 42) | TMD progress determination | MRI, Structured data | 295 | NA | The cases were interpreted by a maxillofacial radiologist. | Regression | NA | NA | BBN | Accuracy 100% |
| Jung et al. 2021 (ref. 12) | OA classification in TMJ | Panoramic | 858 (6/2/2) | Patients with detected bone lesions or diseases other than OA, those with a time interval of 2 months or more between panoramic and CBCT, or illegible or blurry images were excluded. | TMD-OA was diagnosed by experts checking the CBCTs. | Classification | Images cropped, resized to 224 × 224 pixels in PNG format | NA | EfficientNet-B7, ResNet-152 | Sensitivity (ResNet-152) 95%; Specificity (ResNet-152) 91%; Accuracy (ResNet-152) 88%; AUC (ResNet-152) 95% |
| Kim et al. 2020 (ref. 27) | OA classification and condyle detection | Panoramic | 1000 (80/−/20) | NA | Images were labelled with the assistance of two oral physicians or orofacial pain specialists. | Object detection, Classification | NA | Duplication and rotation | R-CNN | Accuracy 85%; Sensitivity 54%; Specificity 94%; AUC 82% |
| Shoukri et al. 2019 (ref. 35) | Diagnosing TMJ OA | CBCT | 530 (259 from Dumast et al. 2018) | NA | Two experts classified condylar morphology. | Classification, Segmentation | NA | Simulated data by adding Perlin noise | ANN | Accuracy 91% |
| Tubau et al. 2019 (ref. 34) | OA TMD classification | CBCT | 293 (80/−/20) | NA | Classified by two expert clinicians | Classification | NA | NA | ANN | Accuracy 91% |
| Dumast et al. 2018 (ref. 17) | TMJ OA classification | CBCT | 293 (259/−/34) | NA | Experienced clinicians classified patients into 6 groups: control, close to normal, degeneration 1, degeneration 2, degeneration 3, degeneration 4–5. | Classification | NA | NA | ANN | Agreement between the clinicians' consensus and SVA classification 91% |
| Prieto et al. 2018 (ref. 19) | TMJ OA classification | CBCT | 263 (80/−/20) | NS | Experienced clinicians classified datasets into 6 categories: control, close to normal, degeneration 1, degeneration 2, degeneration 3, degeneration 4–5. | Classification | NA | NA | ANN | Accuracy 93% |
| Zhang et al. 2021 (ref. 40) | TMJ OA diagnosis | Structured data + CBCT | 92 (80/−/20) | NA | Patient diagnosis was confirmed by a TMJ specialist. | Classification | NA | NA | FNN | AUC 77%; Accuracy 74% |

  • Abbreviations: ANN, Artificial Neural Network; AUC, Area under the ROC Curve; BBN, Bayesian Belief Network; CNN, Convolutional Neural Network; CBCT, Cone Beam Computed Tomography; DC/TMD, Diagnostic Criteria for Temporomandibular Disorders; DSC, Mean Dice Similarity Coefficient; FNN, Feedforward Neural Network; ID, Internal derangements; IoU, Intersection over Union; MLP, Multilayer Perceptron; MRI, Magnetic Resonance Imaging; MSE, Mean Squared Error; NCC, Normalised Cross Correlation; NMI, Normalised Mutual Information; NN, Neural Network; OA, Osteoarthritis; OMFS, Oral and Maxillofacial Surgeon; ROC, Receiver Operating Characteristic Curve; SNN, Spiking Neural Network; SVA, Shape Variation Analyser; TMJ, Temporomandibular Joint; TMJOA, Temporomandibular Joint Osteoarthritis; TMD, Temporomandibular Disorders.

3.3 Risk of bias and applicability

The included studies were assessed for risk of bias using the QUADAS-2 tool (Figure 2). The index test was graded as low risk of bias in 17 studies and as high risk in two. Patient selection was graded as low risk of bias in 19 studies. For the reference standard, nine studies were at low risk of bias and eight at high risk. Four of the included studies showed a low risk of bias in all domains.

FIGURE 2. (A) Risk of bias assessment of each domain. Included studies were assessed with the QUADAS-2 tool. Bias was high in the reference standard domain because a gold standard reference was not used. The least biased domain was flow and timing. (B) Applicability assessment of included studies.

3.4 Findings of the studies

According to the reports, the accuracy of deep learning across all included studies ranged from 73.9% to 100%. Sensitivity ranged from 54% to 100% and specificity from 85% to 100%. The Dice coefficient ranged from 85% to 98% and AUC from 77% to 99%. Studies using structured data reported sensitivities of 92% to 100% [8, 11, 12, 34-43]. Bas et al. reported 100% sensitivity for an ANN model that classified 219 temporomandibular internal derangements based on structured data [43], whereas Radke et al. reported 92% for TMD detection (n = 68) with an ANN model based on structured data [44].

Studies based on CBCT reported 74% to 93% accuracy. Zhang et al. classified 92 TMJ OA samples from CBCT and structured data with an FNN model and reported 74% accuracy [40]. Studies using panoramic radiographs reported sensitivities ranging from 54% to 95%. Jung et al. trained an EfficientNet-B7 model to classify 856 OA samples [12], which resulted in 95% sensitivity.
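The metrics reported in this section (accuracy, sensitivity, specificity and the Dice coefficient) all follow from simple counts. The sketch below is a minimal illustration of those standard definitions, not code from any included study; the class and function names and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Counts from comparing model output with the reference standard."""
    tp: int  # diseased joints correctly flagged
    fp: int  # healthy joints wrongly flagged
    fn: int  # diseased joints missed
    tn: int  # healthy joints correctly cleared


def sensitivity(c: ConfusionCounts) -> float:
    # Proportion of diseased cases that the model detects.
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    # Proportion of healthy cases that the model clears.
    return c.tn / (c.tn + c.fp)


def accuracy(c: ConfusionCounts) -> float:
    # Proportion of all cases classified correctly.
    return (c.tp + c.tn) / (c.tp + c.fp + c.fn + c.tn)


def dice(pred: set, truth: set) -> float:
    # Dice coefficient for segmentation: 2|A ∩ B| / (|A| + |B|),
    # with each mask given as a set of pixel indices.
    if not pred and not truth:
        return 1.0
    return 2 * len(pred & truth) / (len(pred) + len(truth))


# Hypothetical test set: 100 diseased and 100 healthy joints.
counts = ConfusionCounts(tp=95, fp=8, fn=5, tn=92)
sens = sensitivity(counts)
spec = specificity(counts)
acc = accuracy(counts)

# Hypothetical segmentation masks that share three of four pixels.
overlap = dice({1, 2, 3, 4}, {2, 3, 4, 5})
```

Sensitivity and specificity depend only on the diseased and healthy subgroups respectively, whereas accuracy also depends on the proportion of diseased cases in the test set. This is one reason accuracy values from studies with different case mixes are hard to compare directly.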

3.5 Synthesis of results

Data from seven included studies were pooled based on sensitivity, specificity, and dataset size. Figure 3A displays a forest plot of sensitivity and specificity with their 95% confidence intervals. Pooled sensitivity was 95% (85%–99%), pooled specificity was 92% (86%–96%), and AUC was 97% (96%–98%). The diagnostic odds ratio (DOR) was 232 (74–729) (Figure 3B). According to Deeks' funnel plot and statistical evaluation (p = .49), there was no evidence of publication bias (Figure 3C). Overall, the I² value in this group was 99%, suggesting significant heterogeneity. A threshold effect (0.03) may have contributed to the heterogeneity. The GRADE assessment (Table 3) indicated a generally moderate level of certainty in the evidence.
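To make the pooled statistics above more concrete, the following sketch shows the standard definitions of the diagnostic odds ratio and of the I² statistic derived from Cochran's Q. It is an illustration only: the review does not state how its software computed these values, the function names are hypothetical, and the per-study inputs used for I² are invented. Note that a DOR computed directly from the pooled point estimates (about 218.5) is close to, but not the same as, the reported 232, because a pooled DOR is estimated by the meta-analytic model rather than derived from pooled sensitivity and specificity.

```python
import math


def diagnostic_odds_ratio(sens: float, spec: float) -> float:
    # DOR = odds of a positive test in diseased cases divided by
    # odds of a positive test in healthy cases.
    return (sens / (1.0 - sens)) * (spec / (1.0 - spec))


def cochran_q(effects, variances) -> float:
    # Inverse-variance weighted sum of squared deviations from the
    # fixed-effect pooled estimate.
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    return sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))


def i_squared(q: float, n_studies: int) -> float:
    # I² = max(0, (Q - df) / Q) * 100, with df = number of studies - 1.
    df = n_studies - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


# DOR from the pooled point estimates quoted in the text (approximately 218.5).
dor = diagnostic_odds_ratio(0.95, 0.92)

# Hypothetical per-study log DORs and variances, NOT taken from the review.
log_dors = [math.log(d) for d in (20.0, 150.0, 900.0)]
variances = [0.05, 0.04, 0.06]
q = cochran_q(log_dors, variances)
i2 = i_squared(q, len(log_dors))
```

With widely scattered study estimates and small within-study variances, as in the invented inputs, Q greatly exceeds its degrees of freedom and I² approaches 100%, which is the pattern the review reports.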

FIGURE 3. (A) Forest plot. Each square represents a study, and the lines indicate the mean and 95% confidence interval. Pooled values are shown as diamonds. (B) Summary receiver operating characteristic (ROC) curve. Each circle represents a different study, while the solid square represents the summary point for sensitivity and specificity. (C) Deeks' funnel plot used to assess publication bias. With a p value of .49, there is no evidence of publication bias, consistent with a symmetrical funnel shape.
TABLE 3. The results obtained through the Grading of Recommendations Assessment, Development, and Evaluation (GRADE) were categorised into four separate clusters, with each cluster containing the studies that were included.

| Outcome | No of studies (No of patients) | Study design | Risk of bias | Indirectness | Inconsistency | Imprecision | Publication bias | Certainty of evidence |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Detection of human TMJ arthropathies | 8 studies (14 102 samples) | Diagnostic accuracy study | Not serious | Not serious | Seriousᵃ | Not serious | None | ⨁⨁⨁◯ Moderate |

Risk of bias, indirectness, inconsistency, imprecision and publication bias are the factors that may decrease certainty of evidence.
  • ᵃ Inconsistency arose from the use of different deep learning tasks.

4 DISCUSSION

TMDs are increasingly prevalent and can impair quality of life for many patients. Some cases require immediate treatment, while in others symptoms may subside without intervention [9, 11, 12]. Diagnosing TMDs demands a comprehensive evaluation of a patient's symptoms, signs, and behavioural and psychosocial characteristics. Hence, TMDs may often go undetected by clinicians or be mistaken for other causes of orofacial pain [9, 12, 40]. Treatment options range from pharmacotherapy with anti-inflammatory medications, occlusal orthotic appliances, and physical therapy to arthroscopic surgery and even total joint replacement of the TMJ. Thus, a proper diagnosis of TMJ disorders is important to tailor treatment to the severity of disease and its cause, particularly in cases of inflammatory arthritis such as rheumatoid arthritis, psoriatic arthritis or ankylosing spondylitis [35].

Currently, the Diagnostic Criteria for Temporomandibular Disorders (DC/TMD), based on extensive international studies and data analyses, are the most widely accepted diagnostic criteria [36]. The DC/TMD and the International Classification of Orofacial Pain (ICOP) provide a framework for TMJ disorder diagnosis based on history, examinations, and imaging indications. However, subjective biases and variability in diagnostic studies and treatment options exist. Deep learning can aid in decision-making by reducing biases and enhancing diagnostic accuracy [45].

Deep learning models are evaluated by various measures, primarily by comparing their predictions with expert-labelled test data. In some studies, these models have been reported to be more precise in diagnosing dental diseases than general dental clinicians [11]. Considering the frequent misdiagnoses and insufficient knowledge of TMDs in clinics, deploying deep learning models in everyday practice may prove to be a helpful adjunct for clinicians. These models can enhance the interpretation of images by nonexpert human readers, facilitate accurate anatomical model creation, and potentially automate certain clinical evaluations and minor corrections. Nevertheless, clinicians are ultimately responsible for diagnosis and treatment procedures.

However, the included studies had some limitations. Without an extensive and adequate dataset, a deep learning model cannot be effectively trained. Therefore, there is a pressing need for a comprehensive, well-labelled, open-source database with a confirmed TMD diagnostic method to improve accuracy. Dataset generalisability (dataset diversity) is a significant challenge, and TMD studies should account for different genders, nationalities, and ages in their datasets. Data from various centres will also boost the power of deep learning models. The majority of the studies included in this review focused on adults over 18, and not all studies specified who annotated the data. Having an experienced annotator is crucial to avoid biases in the trained model. Moreover, due to a lack of consistently available scans, models have not been externally validated on large cohorts. Only four studies mentioned external validation of deep learning models. Finally, not all studies examined inter- and intra-observer variability.

The present review has certain limitations, including differences in the quality of devices and techniques used across studies, which may affect diagnostic accuracy. Data modalities and scope were also limited, as only deep learning analyses were included. The overall performance of the included studies was evaluated using various metrics, including accuracy, sensitivity, and specificity. However, a meta-analysis could not be conducted for all included studies because of their heterogeneity and poor data reporting quality. The absence of comparable parameters and variables prevents an accurate assessment of effectiveness. Thus, it is recommended that future clinical studies use different modalities within one study and compare the results to identify the best-performing modality and move towards a gold standard. In addition, larger datasets from several different centres would increase the reliability of results.

5 CONCLUSION

The results of this systematic review suggest that deep learning has high sensitivity and specificity for detecting TMJ arthropathies, even though studies used limited datasets. However, the meta-analysis of diagnostic accuracy revealed a significant level of heterogeneity among the included studies. This systematic review demonstrated not only that deep learning models have high diagnostic accuracy in detecting the presence of TMD, but also that deep learning can assist clinicians in detecting, classifying, and segmenting the TMJ. Clinicians, particularly those not specialised in orofacial pain or TMJ surgery, may benefit from this methodology for assessing TMD, as it allows for the automation of clinical evaluation and potentially increases diagnostic accuracy.

ACKNOWLEDGEMENTS

Not applicable.

    FUNDING INFORMATION

    This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

    CONFLICT OF INTEREST STATEMENT

    This paper's results and/or discussion are not influenced by any competing interests of the authors.

    ETHICS STATEMENT

    Not applicable.

    CONSENT

    Not applicable.
