REVIEW

Deep learning for temporomandibular joint arthropathies: A systematic review and meta-analysis

Rata Rokhshad
Topic Group Dental Diagnostics and Digital Dentistry, ITU/WHO Focus Group AI on Health, Berlin, Germany

Hossein Mohammad-Rahimi
Topic Group Dental Diagnostics and Digital Dentistry, ITU/WHO Focus Group AI on Health, Berlin, Germany
Division of Artificial Intelligence Research, University of Maryland School of Dentistry, Baltimore, Maryland, USA

Fatemeh Sohrabniya
Topic Group Dental Diagnostics and Digital Dentistry, ITU/WHO Focus Group AI on Health, Berlin, Germany

Bahare Jafari
Topic Group Dental Diagnostics and Digital Dentistry, ITU/WHO Focus Group AI on Health, Berlin, Germany

Parnian Shobeiri
Department of Radiology, Memorial Sloan Kettering Cancer Center, New York, New York, United States

Ioannis A. Tsolakis
Department of Orthodontics, School of Dentistry, Aristotle University of Thessaloniki, Thessaloniki, Greece
Department of Orthodontics, School of Dental Medicine, Case Western Reserve University, Cleveland, Ohio, USA

Seyed AmirHossein Ourang
Dentofacial Deformities Research Center, Research Institute of Dental Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran

Ahmed S. Sultan
Division of Artificial Intelligence Research, University of Maryland School of Dentistry, Baltimore, Maryland, USA
Department of Oncology and Diagnostic Sciences, University of Maryland School of Dentistry, Baltimore, Maryland, USA
University of Maryland Marlene and Stewart Greenebaum Comprehensive Cancer Center, Baltimore, Maryland, USA

Shehryar Nasir Khawaja
Corresponding Author
Orofacial Pain Medicine, Shaukat Khanum Memorial Cancer Hospitals and Research Centres, Lahore and Peshawar, Pakistan
School of Dental Medicine, Tufts University, Boston, Massachusetts, USA

Correspondence
Shehryar Nasir Khawaja, School of Dental Medicine, Tufts University, Boston, MA, USA.
Email: shehryarnasir@skm.org.pk
Roxanne Bavarian, Department of Oral and Maxillofacial Surgery, Harvard School of Dental Medicine, Boston, MA, USA.
Email: roxanne_bavarian@hsdm.harvard.edu

Roxanne Bavarian
Corresponding Author
Department of Oral and Maxillofacial Surgery, Massachusetts General Hospital, Boston, Massachusetts, USA
Department of Oral and Maxillofacial Surgery, Harvard School of Dental Medicine, Boston, Massachusetts, USA

Juan Martin Palomo
Department of Orthodontics, School of Dental Medicine, Case Western Reserve University, Cleveland, Ohio, USA

First published: 17 May 2024
Citations: 2


Rata Rokhshad and Hossein Mohammad-Rahimi contributed equally to this project.

Abstract

Background and Objective

The accurate diagnosis of temporomandibular disorders continues to be a challenge, despite the existence of internationally agreed-upon diagnostic criteria. The purpose of this study is to review applications of deep learning models in the diagnosis of temporomandibular joint arthropathies.

Materials and Methods

An electronic search was conducted on PubMed, Scopus, Embase, Google Scholar, IEEE, arXiv, and medRxiv up to June 2023. Studies that reported the efficacy (outcome) of prediction, object detection or classification of TMJ arthropathies by deep learning models (intervention) in human joint-based or arthrogenous TMDs (population) in comparison to a reference standard (comparison) were included. To evaluate the risk of bias, included studies were critically analysed using the Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) tool. Diagnostic odds ratios (DOR) were calculated. Forest plots and funnel plots were created using STATA 17 and MetaDiSc.

Results

Full-text review was performed on 46 of the 1056 identified studies; 21 studies met the eligibility criteria and were included in the systematic review. Four studies were graded as having a low risk of bias for all domains of QUADAS-2. The accuracy of the included studies ranged from 74% to 100%. Sensitivity ranged from 54% to 100%, specificity from 85% to 100%, Dice coefficient from 85% to 98%, and AUC from 77% to 99%. The datasets were then pooled based on the sensitivity, specificity, and dataset size of the seven studies that qualified for meta-analysis. The pooled sensitivity was 95% (85%–99%), specificity 92% (86%–96%), and AUC 97% (96%–98%). The pooled DOR was 232 (74–729). According to Deeks' funnel plot and statistical evaluation (p = .49), there was no evidence of publication bias.

Conclusion

Deep learning models can detect TMJ arthropathies with high sensitivity and specificity. Clinicians, especially those not specialised in orofacial pain, may benefit from this methodology for assessing TMD, as it provides a rigorous and evidence-based framework, objective measurements, and advanced analysis techniques, ultimately enhancing diagnostic accuracy.

1 INTRODUCTION

Temporomandibular disorders (TMDs) refer to a group of musculoskeletal conditions characterised by clinical manifestations affecting the masticatory muscles, temporomandibular joints (TMJ), and associated tissues. TMDs affect up to a third of adults, predominantly those aged between 20 and 40 [1-4]. TMJ arthropathies are joint-based, or arthrogenous, TMDs. These include articular disc disorders as well as arthritic conditions affecting the TMJ. They can be asymptomatic, or they may be painful or cause a limited range of motion or intermittent locking [5, 6]. Diagnosis is primarily based on a thorough history of present illness and physical examination. The examination includes palpation and auscultation for joint sounds, assessment of mandibular range of motion and path of opening, and detection of any pain on palpation of the joint [7, 8]. Diagnostic imaging or auditory devices such as stethoscopes may be used if intra-articular abnormalities are suspected. However, the diagnostic performance of auditory devices or devices utilising vibrations is notably inadequate [9, 10]. In some cases, minimally invasive surgery with arthroscopy and synovial biopsy is indicated to obtain an accurate diagnosis [3, 11, 12].

With advances in artificial intelligence, machine learning and deep learning are increasingly utilised in medicine [13, 14]. Machine learning refers to a set of methods that recognise data patterns and use them to predict future data or enable decisions under uncertain conditions. Deep learning is a subtype of machine learning that is particularly well suited to large data sets. It uses complex, multi-layered algorithms modelled after the human brain to solve problems [15, 16]. These algorithms mimic the structure of the human cognitive system in their formation of artificial neural networks (ANNs). Deep learning approaches can detect different types of cancers in their early stages and propose the best formulation for each drug in its target tissue [17-20]. ANNs can perform tasks such as distinguishing between normal and abnormal structures in medical images [21]. Clinical databases may include data points containing clinical history, radiology reports, and physical examination findings. AI-assisted natural language processing (NLP) systems can interpret this information, converting free text into structured data [22]. Furthermore, convolutional neural networks (CNNs) are deep learning networks primarily used in image analysis. They apply weighted filters to each image element, simplifying recognition and response by the computer [18].
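To make the idea of a weighted filter concrete, the following is a minimal sketch in plain Python, not taken from any of the reviewed studies. The function name `convolve2d`, the toy image, and the edge-detecting kernel are illustrative assumptions; in a real CNN the kernel weights are learned during training rather than set by hand.

```python
def convolve2d(image, kernel):
    """Slide a small weighted filter over a 2D image (no padding, stride 1).

    Each output value is the weighted sum of the image patch under the
    kernel, which is the core operation of a convolutional layer.
    """
    kernel_h, kernel_w = len(kernel), len(kernel[0])
    out_h = len(image) - kernel_h + 1
    out_w = len(image[0]) - kernel_w + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0.0
            for di in range(kernel_h):
                for dj in range(kernel_w):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output


# Hypothetical 3x4 image: dark on the left, bright on the right.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hand-picked kernel that responds where intensity increases left to right.
edge_kernel = [[-1, 1]]

response = convolve2d(image, edge_kernel)
```

Here `response` is non-zero only at the column where the image changes from dark to bright, which is how a filter highlights a structure of interest.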

Deep learning methods and their application in dentistry have recently been a research subject. Examples include dental caries detection, estimating periodontal bone loss, planning treatments for orthodontics or maxillofacial surgeries, designing clinical decision support systems, numbering teeth, and detecting lesions [23-26]. Pathology diagnostics, such as the evaluation of the TMJ and early detection of TMJ arthropathies through deep learning, is a critical and sensitive task that has been the focus of numerous studies [27-30]. The present systematic review aims to report on the primary studies investigating deep learning applications on TMJ arthropathies.

2 METHODS

This systematic review explores the relationship between TMJ arthropathies and deep learning. This study follows the Preferred Reporting Items for a Systematic Review and Meta-analysis of Diagnostic Test Accuracy (PRISMA-DTA) guideline [31]. The study protocol was registered at PROSPERO [CRD42022354183]. This systematic review answers the following question: What is the efficacy (outcome) of deep learning models (intervention) in the assessment of TMJ arthropathies (population) in comparison with conventional diagnostic protocols (comparison)?

2.1 Eligibility criteria

Diagnostic studies meeting the following criteria were included:
  • Population: Studies in which deep learning models were implemented on TMD medical diagnostic data, including panoramic radiographs, cone beam computed tomography (CBCT), magnetic resonance imaging (MRI), TMJ acoustic sounds, and structured data. Structured data is defined as data that can be easily analysed and modelled by deep learning algorithms. Typically, this data is presented in a tabular format with well-defined columns and rows.
  • Intervention: Deep learning approaches
  • Comparison: Studies comparing the performance of deep learning models with a reference standard
  • Outcome: Reported accuracy, sensitivity, specificity, the area under the curve (AUC), F1-score, recall, receiver operating characteristic curve (ROC), Dice similarity coefficient (DSC) and co-occurrence
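Several of the outcome measures listed above can be derived from a 2×2 confusion table. The sketch below is illustrative only and assumes a binary classification task with non-zero denominators; the function names and the example counts are hypothetical and do not come from any included study. Note that recall is another name for sensitivity, and that the DSC for segmentation is computed on masks rather than on case counts.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Compute common diagnostic metrics from confusion-table counts."""
    sensitivity = tp / (tp + fn)  # also called recall
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1_score = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "f1_score": f1_score,
    }


def dice_coefficient(mask_a, mask_b):
    """Dice similarity coefficient between two flat binary masks (0/1)."""
    intersection = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    total = sum(mask_a) + sum(mask_b)
    if total == 0:
        return 1.0  # both masks empty: treated here as perfect agreement
    return 2 * intersection / total


# Hypothetical counts: 90 true positives, 20 false positives,
# 80 true negatives, 10 false negatives.
metrics = diagnostic_metrics(tp=90, fp=20, tn=80, fn=10)

# Hypothetical predicted and reference segmentation masks.
dsc = dice_coefficient([1, 1, 0, 0], [1, 0, 0, 0])
```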

The following were excluded: studies that did not report sufficient details about the data used for training and testing (e.g., dataset size, modalities); studies that did not explain the deep learning models in sufficient detail; and review articles.

2.2 Information sources and search

The following electronic databases were searched up to June 2023: Medline (via PubMed), Scopus, Google Scholar, Embase, arXiv, medRxiv, and IEEE. No restrictions were applied on publication date or language. Adapted keywords were used to search each database (Table 1). Additionally, the bibliographies of the included papers were manually cross-referenced.

TABLE 1. Specific search query for different databases (till June 2023).

| Database | Search query | Results |
|---|---|---|
| PubMed | (‘Temporomandibular Joint’[MeSH Terms] OR ‘Temporomandibular Joint Disorders’[MeSH Terms] OR ‘Mandibular Condyle’[MeSH Terms] OR ‘Joint dislocations/surgery’[MeSH Terms]) AND (‘Deep Learning’[MeSH Terms] OR (Image Processing, Computer-assisted/methods[MeSH Terms]) OR ‘Artificial Intelligence’[MeSH Terms] OR ‘Machine learning’[MeSH Terms] OR ‘Algorithms’[MeSH Terms] OR ‘Neural Networks, computer’[MeSH Terms]) | 325 |
| Google Scholar | Allintitle:(‘temporomandibular Joint’ OR ‘temporomandibular Joint Disorders’ OR ‘mandibular Condyle’ OR ‘mandibular joint physiology’) AND (‘deep learning’ OR ‘image processing’ OR ‘artificial Intelligence’ OR ‘machine learning’ OR ‘neural network’) | 11 |
| Embase | (‘Deep learning’:ti, ab, kw OR ‘machine learning’:ti, ab, kw OR ‘artificial intelligence’:ti, ab, kw OR ‘neural network’:ti, ab, kw) AND (‘temporomandibular joint’:ti, ab, kw OR ‘temporomandibular joint disorders’:ti, ab, kw OR ‘mandibular condyle’:ti, ab, kw OR ‘mandibular joint physiology’:ti, ab, kw OR ‘tmj’:ti, ab, kw) | 38 |
| Scopus | TITLE-ABS-KEY ((‘deep learning’ OR ‘artificial intelligence’ OR ‘machine learning’ OR ‘neural network’ OR ‘image processing’) AND (‘Temporomandibular Joint’ OR ‘Temporomandibular Joint Disorders’ OR ‘Mandibular Condyle’ OR ‘Mandibular joint physiology’ OR ‘TMJ’)) | 740 |
| arXiv | (‘deep learning’ OR ‘machine learning’ OR ‘artificial intelligence’ OR ‘neural network’) AND (‘Temporomandibular Joint’ OR ‘Mandibular Condyle’ OR ‘temporomandibular disorder’ OR ‘TMJ’) | 2 |
| medRxiv | (‘Deep learning’ OR ‘machine learning’ OR ‘artificial intelligence’ OR ‘neural network’) AND (‘Temporomandibular Joint’ OR ‘Mandibular Condyle’ OR ‘temporomandibular disorder’ OR ‘TMJ’) | 10 |
| IEEE | (‘deep learning’ OR ‘machine learning’ OR ‘artificial intelligence’ OR ‘neural network’ OR ‘image processing’) AND (‘Temporomandibular Joint’ OR ‘Mandibular Condyle’ OR ‘Mandibular joint physiology’ OR ‘temporomandibular disorder’ OR ‘TMJ’) | 28 |

2.3 Study selection

Citations were managed using EndNote 20 (Clarivate, Philadelphia, USA). Following removal of duplicate studies, two independent reviewers (B.J. and F.S.) screened the titles and abstracts. A third reviewer (R.R.) was consulted to resolve any disagreements. Based on the inclusion and exclusion criteria, two independent investigators (B.J. and F.S.) evaluated the full texts of eligible studies. In cases of disagreement between the two reviewers, a third investigator (R.R.) was consulted to reach consensus.

2.4 Data collection and extraction

Two reviewers (F.S. and B.J.) collected data from the included studies. A third reviewer (R.R.) checked the collected data and resolved discrepancies and disagreements. Extracted data items included information about the authors and the publication year (bibliographic details), modality of data (panoramic, MRI, structured data, CBCT, and TMJ acoustic signals), size of the dataset (where a study reported the training, validation, and test datasets, these were incorporated into the analysis), labeling procedures (how the reference test was established), deep learning tasks (classification, object detection, segmentation), preprocessing (removal of unwanted distortions and enhancement of specific qualities that are crucial for the intended application), augmentation (a technique that artificially expands the training set by generating modified copies of existing data), performance, and outcomes.
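As a rough illustration of augmentation as defined above, the sketch below generates modified copies of one image by flipping and rotating it. It is a toy example in plain Python with hypothetical function names and a hypothetical 2×2 image; the included studies used their own pipelines (for example rotation, shift, zoom, contrast, or brightness changes), which are not reproduced here.

```python
def flip_horizontal(image):
    """Mirror the image left to right."""
    return [row[::-1] for row in image]


def rotate_90_clockwise(image):
    """Rotate the image by 90 degrees clockwise."""
    return [list(column) for column in zip(*image[::-1])]


def augment(image):
    """Return the original image plus two modified copies."""
    return [image, flip_horizontal(image), rotate_90_clockwise(image)]


# Hypothetical 2x2 image; real inputs would be radiographs or MRI slices.
image = [
    [1, 2],
    [3, 4],
]
augmented_set = augment(image)
```

One source image now yields three training samples that share the same label, which is the sense in which augmentation enlarges the training set.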

2.5 Risk of bias and applicability

The risk of bias was assessed independently by two reviewers (F.S. and B.J.) using the Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) tool [32]. The QUADAS-2 checklist covers four risk-of-bias domains: patient selection, index test, reference standard, and flow and timing. It also evaluates the applicability of a study with respect to patient selection, index test, and reference standard. Disputes or discrepancies between the two reviewers were resolved by a third investigator (R.R.).

Each domain was graded as high, low, or unclear risk of bias based on the full text. For 'patient selection', a lack of information about the dataset and unclear data-splitting strategies were considered indicators of a high risk of bias. For the 'index test', indicators included poor reporting on the reproducibility of tests, insufficient information about model construction, and no robustness analysis. For the 'reference standard', a lack of information about the definition of the reference standard and the use of only one examiner were indicators of a high risk of bias. For 'flow and timing', varying reference standards across studies and an inappropriate interval between the index test and reference standard were considered indicators of a high risk of bias [31].

2.6 Data synthesis

The main study findings were synthesised in a narrative format. As reporting varied widely, quantitative synthesis was restricted to studies that reported sensitivity and specificity (n = 7). The ROC curves were used to calculate the AUC, sensitivity, and specificity. Summary Receiver Operating Characteristic (SROC) curves and AUCs were used to illustrate the diagnostic test performance. The plots were created with STATA 17 (StataCorp LP, College Station, Texas, USA) and MetaDiSc, and all analyses were conducted using STATA 17.0. To account for expected heterogeneity, the DerSimonian-Laird method (random-effects model) was applied, with I² > 50% or p < .05 taken as an indication of heterogeneity. Deeks' funnel plot was used to assess publication bias.
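For readers unfamiliar with the DerSimonian-Laird method, the following is a minimal sketch of the standard estimator in plain Python. It is not the authors' STATA analysis. It assumes each study contributes an effect estimate (for instance a log diagnostic odds ratio) and its within-study variance; the function name and the input values are hypothetical.

```python
import math


def dersimonian_laird(effects, variances):
    """Random-effects pooling with the DerSimonian-Laird tau^2 estimator."""
    k = len(effects)
    # Fixed-effect (inverse-variance) weights and pooled estimate.
    weights = [1.0 / v for v in variances]
    sum_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, effects)) / sum_w
    # Cochran's Q measures the observed between-study dispersion.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    df = k - 1
    c = sum_w - sum(w * w for w in weights) / sum_w
    # Between-study variance, truncated at zero.
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    # Random-effects weights add tau^2 to each within-study variance.
    re_weights = [1.0 / (v + tau2) for v in variances]
    sum_re = sum(re_weights)
    pooled = sum(w * y for w, y in zip(re_weights, effects)) / sum_re
    standard_error = math.sqrt(1.0 / sum_re)
    # I^2: percentage of total variation attributed to heterogeneity.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return {"pooled": pooled, "se": standard_error, "tau2": tau2, "q": q, "i2": i2}


# Hypothetical effects and variances for two studies.
result = dersimonian_laird(effects=[0.0, 2.0], variances=[1.0, 1.0])
```

With these toy inputs the estimator returns a pooled effect of 1.0, a between-study variance of 1.0, and an I² of 50%, which sits at the threshold mentioned in the text.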

In the absence of reports on the number of true positives, true negatives, false positives, and false negatives, diagnostic odds ratios (DORs) were calculated as pooled outcomes. We calculated DORs as follows:

$$\mathrm{DOR} = \frac{\text{Sensitivity} \times \text{Specificity}}{(1 - \text{Sensitivity}) \times (1 - \text{Specificity})}$$
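The DOR formula can be applied directly to a single study's reported sensitivity and specificity. The sketch below is illustrative; the function name is hypothetical, and the handling of perfect sensitivity or specificity (where the denominator is zero) is an assumption of this example rather than something described in the review. The example values are the 96% sensitivity and 85% specificity reported for Kim et al. 2021 in Table 2.

```python
import math


def diagnostic_odds_ratio(sensitivity, specificity):
    """DOR = (sens * spec) / ((1 - sens) * (1 - spec)), inputs as proportions."""
    if not (0.0 <= sensitivity <= 1.0 and 0.0 <= specificity <= 1.0):
        raise ValueError("sensitivity and specificity must lie in [0, 1]")
    numerator = sensitivity * specificity
    denominator = (1.0 - sensitivity) * (1.0 - specificity)
    if denominator == 0.0:
        # Assumption of this sketch: a perfect sensitivity or specificity
        # gives an unbounded DOR; 0/0 is reported as not-a-number.
        return math.inf if numerator > 0.0 else math.nan
    return numerator / denominator


# Sensitivity 96% and specificity 85%, as reported for one study in Table 2.
dor = diagnostic_odds_ratio(0.96, 0.85)
```

For these inputs the DOR is approximately 136. A pooled DOR from a meta-analysis is estimated across studies and will generally differ from the value obtained by plugging pooled sensitivity and specificity into the formula.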

Furthermore, the Grading of Recommendations Assessment, Development and Evaluation (GRADE) framework was employed to appraise the level of confidence and dependability in the data compiled from an extensive array of studies covering diverse activities (https://www.gradeworkinggroup.org) [33].

3 RESULTS

3.1 Study selection

The full texts of 46 of the 1156 identified studies were assessed. After full-text assessment, 21 studies were included (Figure 1); they are summarised in Table 2.

FIGURE 1. PRISMA flow chart for the systematic review.

3.2 Study characteristics

The included studies used a range of data modalities, which fell into six categories: CBCT (n = 4), structured data (n = 7), MRI (n = 6), MRI and structured data (n = 1), panoramic radiographs (n = 2), and TMJ acoustic sounds (n = 1). Eleven studies used expert opinion to set the reference test for their datasets; the remaining ten did not mention the method used. In those eleven studies, one human expert (n = 4), two (n = 4), or three or more (n = 3) participated in defining the reference test. Image classification was the most frequently selected deep learning task (n = 9), followed by regression (n = 4) and object detection (n = 1); seven studies used more than one task, three of which used segmentation (Table 2).

TABLE 2. The summary of included studies, which are categorised by the modality of data and publication year.

| Author, year [ref. number] | Objective/aim | Modality | Data set (train/validation/test) | Inclusion and exclusion criteria (if any) | Labeling procedure | Machine learning task | Preprocessing | Augmentation | Model (algorithm architecture) | Reported diagnostic performance |
|---|---|---|---|---|---|---|---|---|---|---|
| Kreiner et al. 2022 [38] | Diagnosis of orofacial pain and TMD | Clinical records | 11 | NA | NA | Regression | NA | NA | MLP | Accuracy 90% |
| Kajor et al. 2022 [37] | TMD classification | TMJ acoustic signals | 129 | Patients diagnosed with DC/TMD were included. | One dentist annotated the data. | Classification, Segmentation | NA | NA | U-Net | Sensitivity 98%; DSC 85% |
| Taskıran et al. 2021 [36] | TMD diagnosis | Structured data | 241 | The data between 0 and 25 kHz (TMJ sounds) is reduced to 0–5 kHz | Class I (TMD patient) and class II (healthy patient) labels based on patients' records | Classification | Images are converted into PNG-format 521 × 593 RGB-based colour images. | NA | CNN | Classification success 94% |
| Kim et al. 2021 [28] | TMJ disc perforation prediction | Structured data | 443 | Patients with tumours, congenital deformity, inadequate quality of MRI, or a prior TMJ surgery (total joint replacement, arthroplasty) were excluded. | Two oral and maxillofacial surgeons and one oral and maxillofacial radiologist analysed MRIs | Regression | NA | NA | ANN | Sensitivity 96%; Specificity 85% |
| Laputkovai et al. 2020 [41] | Salivary proteomic patterns in TMD | Structured data | 35 | NA | NA | Classification | NA | NA | SNN | Recognition capability 97% |
| Sharma et al. 2019 [11] | Prediction of TMJ syndrome | Structured data | 2300 | Patients who had symptoms of TMJ and other risks (vertigo, tinnitus, headaches, and migraines) were included. | NA | Regression | NA | NA | ANN | Correlation coefficient of output and target data 99% |
| Radke et al. 2017 [44] | Detection of TMD | Structured data | 68 (34/−/34) | Patients with acute closed lock were included. | Three opening and closing points were recorded. | Classification, Regression | NA | NA | ANN | Specificity 100%; Sensitivity 92%; Accuracy 87% |
| Bas et al. 2012 [43] | Temporomandibular internal derangement classification | Structured data | 219 (161/−/58) | NA | Evaluated by an experienced OMFS | Classification | NA | NA | ANN | Sensitivity 100%; Specificity 95% |
| Ozsari et al. 2023 | TMD diagnosis using MRI | MRI | 2576 (64/16/20) | Patients with TMJ arthropathies were included. Patients with prior surgery to the mandible or those with craniofacial syndromes were excluded. | NA | Segmentation, Classification | NA | Augmentation by contrast, flip and rotation | CNN: Xception, ResNet-101, MobileNetV2, InceptionV3, DenseNet-121, ConvNeXt | Accuracy rate 97%; Precision 97%; Recall 97%; F1-score 97%; NPV 96%; Specificity 94%; AUC 97%; Kappa score 94% |
| Yoon K. et al. 2023 [46] | MRI-based clinical decision support engine | MRI | 2360 (8/−/2) | Exclusion criteria were: previous operation, benign tumour, suspected adhesion, lateral or medial disc displacement, severe MRI artefacts, and inconsistent annotations from different clinicians. | Two independent specialists | Object detection, Classification | Enhancing the contrast of ROI | Ten times augmentation by rotation, shift, zoom | NN | mAP 81% |
| Lin et al. 2022 [30] | Detection of displacement of TMJ disc on MRI | MRI | 9009 (80/10/10) | Patients diagnosed with disc displacement in either or both joints, with no history of systemic disease or cranio-maxillofacial surgery, and with an MRI were included. MRIs with motion and other artefacts were excluded. | Two physicians with 3–7 years of experience on MRI classified the images into ADD and non-ADD groups. A third expert with 30 years of experience resolved any disagreements. | Classification | NA | NA | CNN | ROC 98%; Sensitivity 95%; Specificity 92%; Specificity 98% |
| Kao et al. 2022 [39] | TMD classification | MRI | 100 | Patients >20 years of age who had an articular disc disorder of the TMJ and healthy controls were included | One oral and maxillofacial surgery specialist annotated the data. | Classification, Segmentation | Image equalisation, resizing | Zoom, rotating, and flipping | U-Net, InceptionResNetV2, InceptionV3, DenseNet169, and VGG16 | Accuracy (U-Net) 95% |
| Lee et al. 2022 [6] | Detection of anterior disc displacement | MRI | 2520 (1640/411/468) | NA | A TMD specialist with more than 7 years of experience in TMD made the diagnosis based on the criteria for TMD Axis I | Object detection | NA | Flipping, contrast, and brightness change | VGG-16 | AUC 88%; Accuracy 83% |
| Iwasaki et al. 2015 [42] | TMD progress determination | MRI, Structured data | 295 | NA | The cases were interpreted by a maxillofacial radiologist | Regression | NA | NA | BBN | Accuracy 100% |
| Jung et al. 2021 [12] | OA classification in TMJ | Panoramic | 858 (6/2/2) | Patients with detected bone lesions or diseases other than OA, those with a time interval of 2 months or more between panoramic and CBCT, or illegible or blurry images were excluded. | TMD-OA was diagnosed by experts checking the CBCTs. | Classification | Images cropped, resized to 224*224 pixels in PNG format | NA | EfficientNet-B7, ResNet-152 | Sensitivity (ResNet-152) 95%; Specificity (ResNet-152) 91%; Accuracy (ResNet-152) 88%; AUC (ResNet-152) 95% |
| Kim et al. 2020 [27] | OA classification and condyle detection | Panoramic | 1000 (80/−/20) | NA | Images were labelled with the assistance of two oral physicians or orofacial pain specialists. | Object detection, Classification | NA | Duplication and rotation | R-CNN | Accuracy 85%; Sensitivity 54%; Specificity 94%; AUC 82% |
| Shoukri et al. 2019 [35] | Diagnosing TMJ OA | CBCT | 530 (259 from de Dumast et al. 2018) | NA | Two experts classified condylar morphology. | Classification, Segmentation | NA | Simulated data by adding Perlin noise | ANN | Accuracy 91% |
| Tubau et al. 2019 [34] | OA TMD classification | CBCT | 293 (80/−/20) | NA | Classified by two expert clinicians | Classification | NA | NA | ANN | Accuracy 91% |
| Dumast et al. 2018 [17] | TMJ OA classification | CBCT | 293 (259/−/34) | NA | Experienced clinicians classified patients into 6 groups: control, close to normal, degeneration 1, degeneration 2, degeneration 3, degeneration 4–5 | Classification | NA | NA | ANN | Agreement between the clinicians' consensus and SVA classification 91% |
| Prieto et al. 2018 [19] | TMJ OA classification | CBCT | 263 (80/−/20) | NS | Experienced clinicians classified datasets into 6 categories: control, close to normal, degeneration 1, degeneration 2, degeneration 3, degeneration 4–5 | Classification | NA | NA | ANN | Accuracy 93% |
| Zhang et al. 2021 [40] | TMJ OA diagnosis | Structured data + CBCT | 92 (80/−/20) | NA | Patient diagnosis was confirmed by a TMJ specialist | Classification | NA | NA | FNN | AUC 77%; Accuracy 74% |

  • Abbreviations: ANN, Artificial Neural Network; AUC, Area under the ROC Curve; BBN, Bayesian Belief Network; CNN, Convolutional Neural Network; CBCT, Cone Beam Computed Tomography; DC/TMD, Diagnostic Criteria for Temporomandibular Disorders; DSC, Mean Dice Similarity Coefficient; FNN, Feedforward Neural Network; ID, Internal derangements; IoU, Intersection over Union; MLP, Multilayer Perceptron; MRI, Magnetic Resonance Imaging; MSE, Mean Squared Error; NCC, Normalised Cross Correlation; NMI, Normalised Mutual Information; NN, Neural Network; OA, Osteoarthritis; OMFS, Oral and Maxillofacial Surgeon; ROC, Receiver Operating Characteristic Curve; SNN, Spiking Neural Network; SVA, Shape Variation Analyser; TMJ, Temporomandibular Joint; TMJOA, Temporomandibular Joint Osteoarthritis; TMD, Temporomandibular Disorders.

3.3 Risk of bias and applicability

The risk of bias of the included studies was assessed with the QUADAS-2 tool (Figure 2). The index test was rated as having a low risk of bias in 17 studies and a high risk of bias in two studies. Patient selection was rated as having a low risk of bias in 19 studies. For the reference standard, nine studies were at low risk of bias and eight were classified as high risk. Among the included studies, four showed a low risk of bias for all domains.

FIGURE 2. (A) Risk of bias assessment of each domain. Included studies were assessed with the QUADAS-2 tool. Bias was high in the reference standard domain because a gold standard reference was not used. The least biased domain was flow and timing. (B) Applicability assessment of included studies.

3.4 Findings of the studies

The included studies reported deep learning accuracy ranging from 73.9% to 100%. Sensitivity ranged from 54% to 100% and specificity from 85% to 100%. The Dice coefficient ranged from 85% to 98% and AUC from 77% to 99%. Studies using structured data reported sensitivity between 92% and 100%.8, 11, 12, 34-43 Bas et al. reported 100% sensitivity for an ANN model that classified 219 temporomandibular internal derangements based on structured data,43 whereas Radke et al. reported 92% sensitivity for TMD detection (n = 68) with an ANN model trained on structured data.44

Studies based on CBCT reported 74%–93% accuracy. Zhang et al. applied an FNN model to CBCT and structured data (n = 92) to classify TMJ OA and reported 74% accuracy.40 Studies using panoramic radiographs reported sensitivity ranging from 54% to 95%. Jung et al. trained an EfficientNet-B7 model to classify OA on a dataset of 856 samples,12 which resulted in 95% sensitivity.
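The metrics reported above follow standard definitions, which can be sketched in a few lines of Python. This is an illustrative sketch only: the function names and the example counts and masks are hypothetical, and none of the numbers come from the included studies.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Accuracy, sensitivity and specificity from 2x2 confusion-matrix counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,    # share of all cases classified correctly
        "sensitivity": tp / (tp + fn),    # share of diseased cases detected
        "specificity": tn / (tn + fp),    # share of healthy cases correctly cleared
    }


def dice_and_iou(pred, truth):
    """Dice coefficient and IoU for two equal-length binary masks (0/1 values).

    Assumes at least one positive pixel in pred or truth; otherwise the
    denominators are zero.
    """
    intersection = sum(p * t for p, t in zip(pred, truth))
    pred_area, truth_area = sum(pred), sum(truth)
    union = pred_area + truth_area - intersection
    dice = 2 * intersection / (pred_area + truth_area)
    iou = intersection / union
    return dice, iou


# Hypothetical counts and masks, chosen only to illustrate the calculations.
print(diagnostic_metrics(tp=90, fp=5, fn=10, tn=95))
print(dice_and_iou([1, 1, 0, 0], [1, 0, 1, 0]))
```

Accuracy, sensitivity and specificity apply to the classification and detection tasks, whereas Dice and IoU apply to segmentation masks, which is one reason the studies' headline figures are not directly comparable.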

3.5 Synthesis of results

Data were pooled based on the sensitivity, specificity, and dataset size of the seven included studies. Figure 3A shows a forest plot of sensitivity and specificity with 95% confidence intervals. Pooled sensitivity was 95% (85%–99%), pooled specificity was 92% (86%–96%), and AUC was 97% (96%–98%). The diagnostic odds ratio (DOR) was 232 (74–729) (Figure 3B). Deeks' funnel plot and the accompanying statistical test (p = .49) showed no evidence of publication bias (Figure 3C). Overall, the I² value in this group was 99%, indicating substantial heterogeneity. A threshold effect (0.03) may have contributed to this heterogeneity. Assessed with the GRADE system (Table 3), the certainty of the evidence was generally moderate.
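For readers unfamiliar with the summary statistics above, the following Python sketch shows the textbook definitions of the diagnostic odds ratio and of I². It is not the model used in this review: the function names are hypothetical, and the Cochran's Q value is an invented number chosen only to show how an I² near 99% can arise.

```python
def diagnostic_odds_ratio(sensitivity, specificity):
    """DOR = odds of a positive test in diseased / odds of a positive test in healthy."""
    odds_positive_if_diseased = sensitivity / (1 - sensitivity)
    odds_positive_if_healthy = (1 - specificity) / specificity
    return odds_positive_if_diseased / odds_positive_if_healthy


def i_squared(cochran_q, num_studies):
    """I² (%) = share of total variation attributed to between-study heterogeneity."""
    degrees_of_freedom = num_studies - 1
    if cochran_q <= 0:
        return 0.0
    return max(0.0, (cochran_q - degrees_of_freedom) / cochran_q) * 100


# Point estimate from the rounded pooled sensitivity and specificity (about 218.5).
print(diagnostic_odds_ratio(0.95, 0.92))

# Hypothetical Q for seven studies; yields an I² of about 99%.
print(i_squared(cochran_q=600, num_studies=7))
```

The value obtained from the rounded pooled estimates differs from the reported DOR of 232, which is expected: the reported figure comes from the review's own pooled model rather than from this direct calculation.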

FIGURE 3. (A) Forest plot. Each square represents a study's estimate, and the lines indicate its 95% confidence interval. Pooled values are shown as diamonds. (B) Summary receiver operating characteristic (ROC) curve. Each circle represents a different study, while the solid square represents the summary point of sensitivity and specificity. (C) Deeks' funnel plot used to assess publication bias. The p value of .49 indicates no evidence of publication bias, consistent with a symmetrical funnel shape.
TABLE 3. The results obtained through the Grading of Recommendations Assessment, Development, and Evaluation (GRADE) were categorised into four separate clusters, with each cluster containing the studies that were included.
Factors that may decrease certainty of evidence: Risk of bias, Indirectness, Inconsistency, Imprecision, Publication bias.
Outcome | No of studies (No of patients) | Study design | Risk of bias | Indirectness | Inconsistency | Imprecision | Publication bias | Certainty of evidence
Detection of human TMJ arthropathies | 8 studies (14 102 samples) | Diagnostic accuracy study | Not serious | Not serious | Serious a | Not serious | None | ⨁⨁⨁◯ Moderate
  • a Inconsistency arose from the utilisation of different deep learning tasks.

4 DISCUSSION

TMDs are increasingly prevalent and can impair the quality of life of many patients. Some cases require immediate treatment, whereas in others symptoms subside without intervention.9, 11, 12 Diagnosing TMDs demands a comprehensive evaluation of a patient's symptoms, signs, and behavioural and psychosocial characteristics. Hence, TMDs often go undetected by clinicians or are mistaken for other causes of orofacial pain.9, 12, 40 Treatment options range from pharmacotherapy with anti-inflammatory medications, occlusal orthotic appliances, and physical therapy to arthroscopic surgery and even total joint replacement of the TMJ. Thus, a proper diagnosis of TMJ disease is important to tailor treatment to the severity of disease and its cause, particularly in cases of inflammatory arthritis such as rheumatoid arthritis, psoriatic arthritis or ankylosing spondylitis.35

Currently, the Diagnostic Criteria for Temporomandibular Disorders (DC/TMD), based on extensive international studies and data analyses, are the most widely accepted diagnostic criteria.36 The DC/TMD and the International Classification of Orofacial Pain (ICOP) provide a framework for TMJ disorder diagnosis based on history, examination, and imaging indications. However, subjective bias and variability in diagnostic studies and treatment options persist. Deep learning can aid decision-making by reducing bias and enhancing diagnostic accuracy.45

Deep learning models are evaluated by various measures, primarily by comparing their predictive accuracy to expert-labelled test data. In some studies, these models have been reported to be more precise in diagnosing dental diseases than general dental clinicians.11 Considering the frequent misdiagnoses and insufficient knowledge of TMDs in clinics, deploying deep learning models in everyday practice may prove to be a helpful adjunct that can assist clinicians. These models can enhance the interpretation of images by nonexpert human readers, facilitate accurate anatomical model creation, and potentially automate certain clinical evaluations and minor corrections. Nevertheless, clinicians are ultimately responsible for diagnosis and treatment procedures.

However, the included studies had some limitations. Without a sufficiently large and adequate dataset, a deep learning model cannot be trained effectively. Therefore, there is a pressing need for a comprehensive, well-labelled, open-source database with a confirmed TMD diagnostic method to improve accuracy. Dataset generalisability (dataset diversity) is a significant challenge, and TMD studies should account for different genders, nationalities, and ages in their datasets. Data from multiple centres would also strengthen deep learning models. The majority of the studies included in this review focused on adults over 18, and not all studies specified who annotated the data. An experienced annotator is crucial to avoid introducing bias into the trained model. Moreover, because scans are not consistently available, models have not been externally validated on large cohorts. Only four studies mentioned external validation of deep learning models. Finally, not all studies examined inter- and intra-observer variability.

The present review has certain limitations, including the varying quality of the devices and techniques used in different studies, which may affect diagnostic accuracy. Data modalities and scope were also limited, as only deep learning analyses were included. The overall performance of the included studies was evaluated using various metrics, including accuracy, sensitivity, and specificity. However, a meta-analysis was not conducted for all included studies because of their heterogeneity and poor data reporting quality. The absence of comparable parameters and variables prevents an accurate assessment of effectiveness. Thus, it is recommended that future clinical studies use different modalities within one study and compare the results to identify the best-performing one and work towards a gold standard. In addition, larger datasets from several different centres would increase the reliability of results.

5 CONCLUSION

The results of this systematic review suggest that deep learning has high sensitivity and specificity for detecting TMJ arthropathies, even though the studies used limited datasets. However, the meta-analysis of diagnostic accuracy revealed a significant level of heterogeneity among the included studies. This systematic review demonstrated that deep learning models not only have high diagnostic accuracy in detecting the presence of TMD, but can also assist clinicians in detecting, classifying, and segmenting the TMJ. Clinicians, particularly those not specialised in orofacial pain or TMJ surgery, may benefit from this methodology for assessing TMD, as it allows for the automation of clinical evaluation and potentially increases diagnostic accuracy.

ACKNOWLEDGEMENTS

Not applicable.

    FUNDING INFORMATION

    This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

    CONFLICT OF INTEREST STATEMENT

    This paper's results and/or discussion are not influenced by any competing interests of the authors.

    ETHICS STATEMENT

    Not applicable.

    CONSENT

    Not applicable.