
Submission Date: 2023-03-05

Accepted Date: 2023-09-12 Accepted Manuscript online: 2023-09-12



A machine learning-based choledocholithiasis prediction tool improves ERCP decision making – a proof of concept study

Steven N Steinway, Bohao Tang, Jeremy Telezing, Aditya Ashok, Ayesha Kamal, Chung Yao Yu, Nitin Jagtap, James Buxbaum, B. Joseph Elmunzer, Sachin B Wani, Mouen A Khashab, Brian S Caffo, Venkata S Akshintala.

Affiliations below.


Please cite this article as: Steinway SN, Tang B, Telezing J et al. A machine learning-based choledocholithiasis prediction tool improves ERCP decision making – a proof of concept study. Endoscopy 2023. doi: 10.1055/a-2174-0534

Conflict of Interest: Venkata Akshintala: Co-founder and Chief Medical Officer, Origin Endoscopy Inc.

Mouen Khashab: Advisory board member and consultant for Boston Scientific, Olympus, Medtronic.

All the other authors have no disclosures.



Background: Prior studies have demonstrated that existing guidelines to predict choledocholithiasis have limited accuracy, leading to overutilization of ERCP. Improved stratification may allow for appropriate patient selection for ERCP and the use of lower-risk modalities (i.e. EUS and MRCP).

Methods: A machine learning model was developed using patient information from two published cohort studies originally used to evaluate performance of published guidelines in predicting choledocholithiasis. Prediction models were developed using the gradient boosting (GBM) machine learning method. GBM performance was evaluated using 10-fold cross-validation and area under the receiver operating curve (AUC). Important predictors of choledocholithiasis were identified based on relative importance in GBM.

Results: 1,378 patients (mean age 43.3 years; 55.5% females) were included in the GBM model and 59.4% had choledocholithiasis. Eight variables were identified as predictors of choledocholithiasis. The GBM model was evaluated with 10-fold cross-validation and had an accuracy of 71.5±2.5% (AUC 0.79±0.06) and performed better than the 2019 ASGE guidelines (accuracy 62.4±2.6%, AUC 0.63±0.03) and the ESGE guidelines (accuracy 62.8±2.6%, AUC 0.67±0.02). The GBM model correctly categorized 22% of patients directed to unnecessary ERCP by the ASGE guidelines and appropriately recommended 48% of ERCPs incorrectly rejected by the ESGE guidelines as the next step in the management.

Conclusions: A machine learning-based tool was created that provides a real-time, personalized, objective probability of choledocholithiasis and ERCP recommendations. This more accurately directs ERCP use than the existing ASGE and ESGE guidelines and has the potential to reduce morbidity and healthcare costs associated with ERCP or missed choledocholithiasis.

Corresponding Author:

Dr. Venkata S Akshintala, Johns Hopkins Medical Institutions, Division of Gastroenterology and Hepatology, 600 N. Wolfe St, 21287 Baltimore, United States, vakshin1@jhmi.edu


Steven N Steinway, Johns Hopkins Medical Institutions, Division of Gastroenterology and Hepatology, Baltimore, United States

Bohao Tang, Johns Hopkins Bloomberg School of Public Health Center for Teaching and Learning, Department of Biostatistics, Baltimore, United States

Jeremy Telezing, Johns Hopkins Bloomberg School of Public Health Center for Teaching and Learning, Department of Biostatistics, Baltimore, United States

Venkata S Akshintala, Johns Hopkins Medical Institutions, Division of Gastroenterology and Hepatology, Baltimore, United States

Title: A machine learning-based choledocholithiasis prediction tool improves ERCP decision making – a proof of concept study

Short Title: Choledocholithiasis decision-making tool


Steven N. Steinway, M.D., Ph.D.*1; Bohao Tang, Ph.D.*2; Jeremy Telezing 2; Aditya Ashok, M.D. 1; Ayesha Kamal, Ph.D. 1; Chung Yao Yu, M.D. 3; Nitin Jagtap, M.D., D.M. 4; James L. Buxbaum, M.D. 3; B. Joseph Elmunzer, M.D., M.Sc. 5; Sachin B. Wani, M.D. 6; Mouen A. Khashab, M.D. 1; Brian S. Caffo, Ph.D. 2 #; Venkata S. Akshintala, M.D. #

* Authors contributed equally

# Authors contributed equally

1. Division of Gastroenterology, Johns Hopkins Medical Institutions, Baltimore, MD, United States.

2. Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD

3. Division of Gastroenterology, Keck School of Medicine, University of Southern California, Los Angeles, CA

4. Department of Gastroenterology, Asian Institute of Gastroenterology, India.

5. Division of Gastroenterology and Hepatology, Medical University of South Carolina, Charleston, SC

6. Division of Gastroenterology, University of Colorado Anschutz Medical Campus, Aurora, CO

Address correspondence to:

Venkata Akshintala MD

Assistant Professor of Medicine

Johns Hopkins Hospital, Division of Gastroenterology and Hepatology

600 N. Wolfe St, Blalock 411, Baltimore, MD 21287

Email: vakshin1@jhmi.edu

Phone: 410-624 6955



Steven Steinway - interpretation and analysis of data; drafting of the manuscript; critical revision of the manuscript for important intellectual content

Bohao Tang - interpretation and analysis of data; critical revision of the manuscript for important intellectual content

Jeremy Telezing - interpretation and analysis of data; critical revision of the manuscript for important intellectual content

Aditya Ashok - drafting of the manuscript; critical revision of the manuscript for important intellectual content

Ayesha Kamal - acquisition of data; drafting of the manuscript; critical revision of the manuscript for important intellectual content

Chung Yao Yu - acquisition of data; critical revision of the manuscript for important intellectual content

Nitin Jagtap - critical revision of the manuscript for important intellectual content

James L. Buxbaum - critical revision of the manuscript for important intellectual content

B. Joseph Elmunzer - critical revision of the manuscript for important intellectual content

Sachin B. Wani - critical revision of the manuscript for important intellectual content

Brian S. Caffo - study concept and design; interpretation of data; critical revision of the manuscript for important intellectual content

Mouen A. Khashab - study concept and design; interpretation of data; drafting of the manuscript; critical revision of the manuscript for important intellectual content; study supervision

Venkata S. Akshintala - study concept and design; interpretation and analysis of data; drafting of the manuscript; critical revision of the manuscript for important intellectual content

Background: Prior studies have demonstrated that existing guidelines to predict choledocholithiasis have limited accuracy, leading to overutilization of ERCP. Improved stratification may allow for appropriate patient selection for ERCP and the use of lower-risk modalities (i.e. EUS and MRCP).

Methods: A machine learning model was developed using patient information from two published cohort studies originally used to evaluate performance of published guidelines in predicting choledocholithiasis. Prediction models were developed using the gradient boosting (GBM) machine learning method. GBM performance was evaluated using 10-fold cross-validation and area under the receiver operating curve (AUC). Important predictors of choledocholithiasis were identified based on relative importance in GBM.

Results: 1,378 patients (mean age 43.3 years; 55.5% females) were included in the GBM model and 59.4% had choledocholithiasis. Eight variables were identified as predictors of choledocholithiasis. The GBM model was evaluated with 10-fold cross-validation and had an accuracy of 71.5±2.5% (AUC 0.79±0.06) and performed better than the 2019 ASGE guidelines (accuracy 62.4±2.6%, AUC 0.63±0.03) and the ESGE guidelines (accuracy 62.8±2.6%, AUC 0.67±0.02). The GBM model correctly categorized 22% of patients directed to unnecessary ERCP by the ASGE guidelines and appropriately recommended 48% of ERCPs incorrectly rejected by the ESGE guidelines as the next step in the management.

Conclusions: A machine learning-based tool was created that provides a real-time, personalized, objective probability of choledocholithiasis and ERCP recommendations. This more accurately directs ERCP use than the existing ASGE and ESGE guidelines and has the potential to reduce morbidity and healthcare costs associated with ERCP or missed choledocholithiasis.

Keywords: choledocholithiasis; ERCP; calculator; clinical prediction tool; machine learning

Abbreviations: (in the order that they appear)

1. Endoscopic retrograde cholangio-pancreatography (ERCP)

2. Endoscopic ultrasound (EUS)

3. American Society for Gastrointestinal Endoscopy (ASGE)

4. European Society for Gastrointestinal Endoscopy (ESGE)

5. Gradient boosting model (GBM)

6. Area under the curve (AUC)

7. Magnetic resonance cholangiopancreatography (MRCP)

8. Liver function tests (LFTs)

9. Common bile duct (CBD)

10. Positive predictive value (PPV)

11. Confidence interval (CI)

12. Ultrasound (US)


Gallbladder disease affects an estimated 20.5 million persons in the United States, with gallstone disease itself costing about $6.2 billion annually [1]. More specifically, choledocholithiasis, the presence of gallstones in the common bile duct (CBD), affects 10-20% of patients with symptomatic cholelithiasis, 18-33% of patients with acute biliary pancreatitis, and 7-14% of patients who underwent cholecystectomy. ERCP with bile duct stone clearance is needed to avoid complications including cholangitis and gallstone pancreatitis. Despite being the favored and least morbid procedure to treat choledocholithiasis, ERCP is still associated with high cost, radiation exposure to the patient and staff, as well as major adverse events in 6-15% of patients [2]. The major adverse events include ERCP-related pancreatitis, infections, perforation, and bleeding [3]. Therefore, it is critical to accurately identify the probability of choledocholithiasis to appropriately select patients for ERCP.

The 2010 American Society for Gastrointestinal Endoscopy (ASGE) guidelines categorized patients as high risk (probability >50%), intermediate risk (10-50%), and low risk (<10%) for choledocholithiasis based on clinical features including bilirubin level and bile duct diameter [4]. Nevertheless, the 2010 ASGE guidelines directed patients unnecessarily to ERCP in 30-50% of cases. To improve the specificity, in 2019 the ASGE revised their criteria to define more stringently which patients should proceed directly to ERCP [5]. The European Society for Gastrointestinal Endoscopy (ESGE) also issued guidelines in 2019, which stratify patients as low-risk (normal LFTs and ultrasound), intermediate-risk (abnormal LFTs and/or CBD dilation), and high-risk (clinical cholangitis or bile duct stone [BDS] on ultrasound [US]).
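As an illustration only, the 2010 ASGE probability bands quoted above (high >50%, intermediate 10-50%, low <10%) can be written as a simple mapping from an estimated probability to a risk category. The function name and the handling of the exact boundary values (10% and 50% are treated as intermediate) are assumptions made for this sketch; the guideline itself assigns categories from clinical predictors rather than from a numeric probability.

```python
def asge_2010_risk_category(probability: float) -> str:
    """Map an estimated probability of choledocholithiasis to a 2010 ASGE band.

    Hypothetical helper: boundary handling (10% and 50% count as
    intermediate) is an assumption, not part of the guideline text.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if probability > 0.50:
        return "high"
    if probability >= 0.10:
        return "intermediate"
    return "low"


if __name__ == "__main__":
    for p in (0.05, 0.30, 0.80):
        print(f"probability {p:.2f} -> {asge_2010_risk_category(p)} risk")
```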


Improved stratification allows for more effective triage to lower-risk modalities such as endoscopic ultrasound (EUS) and magnetic resonance cholangiopancreatography (MRCP), and thus for more appropriate patient selection for ERCP, ultimately minimizing adverse events and improving patient outcomes. Yet conventional models of statistical inference have struggled in this regard. Machine learning, on the other hand, shows promise in the application to clinical data sets to reformulate patient classes and establish more accurate risk models [6]. To our knowledge, there has not been a study that compares machine learning to the ASGE or ESGE guidelines. The goal of this study was to develop a machine learning-based risk estimation tool that can provide a real-time, personalized, objective probability of the presence of a CBD stone.


The primary goal was to develop a computer-based prediction and decision-making tool for the prediction of choledocholithiasis based on risk factors identified from the two cohort studies.

Study Population

We integrated patient-level data from two large cohorts of patients admitted with suspected choledocholithiasis: one from the Medical University of South Carolina between January 1, 2009 and December 31, 2014, and one from the Los Angeles County Hospital between January 2010 and November 2016. For both cohorts, biliary sludge was considered equivalent to choledocholithiasis since both have comparable clinical sequelae [7, 8]. These study cohorts were selected since they have the essential patient-level data variables considered important for predicting choledocholithiasis. A total of 1,378 patients were included in this study.

Study Variables

Data pertaining to patient demographics, laboratory parameters at the time of presentation and 24-48 hours after, imaging, and procedural findings were obtained. If available, a second set of laboratory tests was incorporated into the machine learning algorithm. For comparison of the effectiveness of the machine learning algorithm, the same patients were classified as high risk for the presence of choledocholithiasis requiring ERCP using the 2010 ASGE guidelines, the revised 2019 ASGE guidelines, and the ESGE guidelines, based on their first or second set of bloodwork. Choledocholithiasis presence was confirmed by ERCP, intra-operative cholangiography, or clinical follow-up. We excluded patients who had acute cholangitis, as ERCP or biliary drainage was required in such patients.

Model Construction and Validation

Model derivation and validation were performed on patient-level data obtained from participants of the two previously published cohort studies described above [7, 8]. Prediction models were developed using a gradient-boosting machine learning algorithm (GBM).

Gradient boosting is a machine learning algorithm that uses a series of decision trees to make predictions. Each decision tree is trained to correct the errors made by the previous trees. This helps to improve the accuracy of the model by making it more robust to noise in the data. GBMs are often used for classification tasks. They are particularly well-suited for problems where the relationship between the features and the target variable is complex. In this case, we are using various laboratory, imaging, and patient information to iteratively fit new models to provide an increasingly accurate estimate of the response variable, the presence or absence of choledocholithiasis. Specifically, we used gradient tree boosting as implemented in the R package gbm [9, 10].
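The idea that each tree corrects the errors of the previous trees can be sketched in a few lines. The example below is a minimal illustration, not the study's implementation (which used R): it boosts one-split trees ("stumps") on the gradient of the log-loss, using only the Python standard library. The feature layout, the toy data, and all function names are hypothetical and chosen purely for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X, residuals):
    """Find the one-feature threshold split minimizing squared error on residuals."""
    best = None
    for j in range(len(X[0])):
        for threshold in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= threshold]
            right = [r for row, r in zip(X, residuals) if row[j] > threshold]
            if not left or not right:
                continue
            left_mean = sum(left) / len(left)
            right_mean = sum(right) / len(right)
            sse = (sum((r - left_mean) ** 2 for r in left)
                   + sum((r - right_mean) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, j, threshold, left_mean, right_mean)
    if best is None:
        raise ValueError("no valid split found")
    return best[1:]  # (feature index, threshold, left value, right value)


def stump_predict(stump, row):
    j, threshold, left_value, right_value = stump
    return left_value if row[j] <= threshold else right_value


def fit_gbm(X, y, n_trees=100, learning_rate=0.1):
    """Boost stumps on the negative gradient of the log-loss (y - p)."""
    prevalence = sum(y) / len(y)
    base = math.log(prevalence / (1.0 - prevalence))  # initial log-odds
    scores = [base] * len(y)
    stumps = []
    for _ in range(n_trees):
        # Residuals are the errors left by the trees fitted so far.
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        scores = [s + learning_rate * stump_predict(stump, row)
                  for s, row in zip(scores, X)]
    return base, learning_rate, stumps


def predict_proba(model, row):
    base, learning_rate, stumps = model
    return sigmoid(base + learning_rate * sum(stump_predict(s, row) for s in stumps))


# Synthetic toy data (NOT study data): [stone seen on ultrasound (0/1), total bilirubin].
X = [[1, 5.0], [1, 2.1], [0, 4.5], [0, 0.8], [0, 1.0], [1, 3.9], [0, 0.6], [0, 3.8]]
y = [1, 1, 1, 0, 0, 1, 0, 1]

model = fit_gbm(X, y)
print("P(stone | US stone, bilirubin 5.0):", round(predict_proba(model, [1, 5.0]), 3))
print("P(stone | no US stone, bilirubin 0.6):", round(predict_proba(model, [0, 0.6]), 3))
```

Production libraries add deeper trees, subsampling, and more refined leaf updates; the sketch keeps only the residual-fitting loop that the paragraph describes.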

Two models were explored: one model with initial liver enzyme testing (AST, ALT, alkaline phosphatase, total bilirubin), and a second model with both initial liver enzyme levels and a follow-up set of labs, in order to determine whether a second set of labs would improve the diagnostic yield of the model.

The Gini impurity index was used to determine the importance of predictors in modeling classification. The Gini impurity index measures the probability that a randomly chosen observation would be wrongly classified if it were labeled at random according to the distribution of classes in its group. The decrease in the Gini impurity index after the inclusion of each of the predictors was used to calculate an importance score indicating the improvement in classification when that predictor is included in the model.
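For readers unfamiliar with the Gini impurity index, the sketch below shows the standard definition (one minus the sum of squared class proportions) and the decrease in impurity produced by a single split. The manuscript does not spell out the formula, so this is the textbook form rather than the authors' exact computation; the function names and the toy labels are hypothetical.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(labels, goes_left):
    """Impurity of the parent group minus the size-weighted impurity of the two children."""
    left = [y for y, flag in zip(labels, goes_left) if flag]
    right = [y for y, flag in zip(labels, goes_left) if not flag]
    n = len(labels)
    weighted_children = (len(left) / n) * gini_impurity(left) \
        + (len(right) / n) * gini_impurity(right)
    return gini_impurity(labels) - weighted_children


# Synthetic toy example (NOT study data): 1 = choledocholithiasis present.
labels = [1, 1, 1, 0, 0, 1, 0, 1]
stone_on_ultrasound = [1, 1, 0, 0, 0, 1, 0, 0]
no_stone = [flag == 0 for flag in stone_on_ultrasound]

print("parent impurity:", gini_impurity(labels))
print("decrease from splitting on ultrasound stone:", gini_decrease(labels, no_stone))
```

A predictor whose splits produce larger decreases separates the two outcome groups more cleanly and therefore receives a higher importance score.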

Performance of GBM in estimating choledocholithiasis presence was then evaluated using 10-fold cross-validation. 10-fold cross-validation is a methodology in machine learning in which the dataset is repeatedly split into a training (model fitting) set and a test set to determine model performance. The model fitting procedure was performed a total of ten times, with each fit being performed on a training set of 90% of the total dataset selected at random and the remaining 10% used for validation. Model performance was also determined using the mean area under the receiver operating curve (AUC).
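The 90%/10% splitting scheme can be sketched as follows. This is a generic illustration using only the Python standard library, assuming the usual form of 10-fold cross-validation in which the ten validation sets partition the data; the function name and the random seed are hypothetical, and the manuscript does not state how its folds were drawn.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of k folds.

    The shuffled indices are dealt into k folds; each fold serves once as
    the held-out validation set while the other k - 1 folds form the training set.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train_idx, test_idx


# With 1,378 patients, each validation fold holds 137 or 138 patients (about 10%).
splits = list(k_fold_indices(1378, k=10))
for fold_number, (train_idx, test_idx) in enumerate(splits, start=1):
    print(f"fold {fold_number}: train={len(train_idx)}, validation={len(test_idx)}")
```

A model would be fitted on each training set and scored on the matching validation set; the ten scores are then summarized as a mean with its spread, which is how figures of the form 71.5±2.5% arise.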

Statistical significance for the AUC was calculated in R with the roc() function, which is based on the DeLong method [11]. Influential predictors of choledocholithiasis were further identified based on the relative variable importance metric, which is a measure based on the number of times a variable is selected for splitting, weighted by the squared improvement to the model as a result of each split, and averaged over all trees (Figure 1) [9] [12]. Our fitted GBM model was implemented in a computer-based application risk-calculator and decision-making tool (Figure 2).
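The AUC used as the performance metric above has a direct probabilistic reading: it is the chance that a randomly chosen patient with choledocholithiasis receives a higher predicted score than a randomly chosen patient without it. The sketch below computes it that way by comparing every positive-negative pair (ties count as half). It is an illustration with hypothetical names and toy scores, not the R routine used in the study.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties contribute half a correct ranking
    return wins / (len(positives) * len(negatives))


# Synthetic toy example (NOT study data).
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
auc = roc_auc(y_true, scores)
print("AUC:", round(auc, 3))
```

An AUC of 0.5 corresponds to ranking no better than chance, and 1.0 to perfect separation of patients with and without stones.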


Baseline Characteristics

A total of 1,378 patients were included in the model. Patients with clinical cholangitis were removed from the study (143, 9.4% of patients). The average age of participants was 43.3 years, 844 (55.5%) were female, and 247 (17.9%) had acute pancreatitis. 800 patients (58.1%) had a dilated common bile duct (CBD >6mm) and 461 (33.5%) patients had a bile duct stone on ultrasound. Total bilirubin was elevated in 712 patients (51.7%) based on their initial laboratories. AST was elevated in 940 (68.2%), ALT in 903 (65.5%), and alkaline phosphatase in 883 (64.1%) patients on follow-up. The baseline characteristics of our patient population are summarized in Table 1.

Study Outcomes

The Gini impurity index was calculated after each predictor was added to the model to obtain an importance score for all variables included in the study. In the model with one set of laboratory values, eight variables were identified as important independent predictors of choledocholithiasis (Figure 1). In the model with two sets of laboratory values, twelve variables were identified as important independent predictors of choledocholithiasis (Supplemental Figure 1A). The finding of a bile duct stone on ultrasound was the single strongest predictor of choledocholithiasis in both models. Interestingly, follow-up total bilirubin and alkaline phosphatase levels were the next strongest predictors in the two-lab model (Supplemental Figure 1A), whereas bilirubin and alkaline phosphatase levels were the next strongest predictors of BDS in the one-lab model (Figure 1). Other important predictors include the presence of acute pancreatitis, age, and CBD greater than 6mm.

Performance of the GBM machine learning model was only slightly improved with the addition of a second set of lab tests, based on the receiver operating characteristic. On 10-fold cross-validation, the GBM model with two sets of labs had an AUC of 0.792 (Supplemental Figure 1B), whereas the GBM model with one set of labs had an AUC of 0.786 (Figure 3). We additionally tested a version of the machine learning model in which we incorporated the difference (i.e. the delta) in lab values, to determine whether the change in AST, ALT, alkaline phosphatase, and total bilirubin during the hospitalization improved model prediction of BDS. Interestingly, inclusion of the delta in labs produced worse model performance, with an AUC of 0.768 (Supplemental Figure 1B). Because of the only slight improvement in model performance and the increased complexity for clinical implementation that a second set of lab values requires, we chose to use the GBM model that required a single set of labs for further evaluation, and we thus call this the “GBM” machine learning model in the rest of the manuscript.

The GBM machine learning model performed better than the original (2010) and revised (2019) ASGE guidelines and the ESGE guidelines. On 10-fold cross-validation, the GBM model (AUC 0.786±0.06) performed better than the ASGE 2010 guidelines (AUC 0.626±0.03), the updated 2019 ASGE guidelines (AUC 0.623±0.03), and the ESGE guidelines (AUC 0.666±0.02). There was a statistically significant difference in AUC for the GBM model compared to all other models tested (p<0.01) (Figure 3). Specifically, the sensitivity (70.3±3.2%) and specificity (72.3±3.9%) of the GBM were better than those of the 2010 ASGE guidelines (sensitivity 57.6±3.25%, specificity 67.6±4.15%) and the 2019 updated ASGE guidelines (sensitivity 61.9±3.4% and specificity 62.8±4.1%). The ESGE guidelines notably had the highest specificity (86.2±3.5%) with the lowest sensitivity (46.9±3.0%). The positive predictive value (PPV) for the GBM was higher (78.1±3.0%) than for the 2010 ASGE guidelines (70.0±3.3%) or the 2019 updated ASGE guidelines (70.7±3.4%). The negative predictive value (NPV) was also better with the GBM (63.4±3.9%) than with the 2010 ASGE guidelines (54.9±4.2%) or the 2019 updated ASGE guidelines (53.1±3.9%). The ESGE guidelines had the highest PPV (83.3±3.5%) though the lowest NPV (52.6±3.3%) (Figure 4). The GBM model correctly categorized 22% of patients directed to unnecessary ERCP by the 2019 ASGE guidelines and appropriately recommended 48% of ERCPs incorrectly rejected by the ESGE guidelines as the next step in the management.

An intuitive computer-based risk-calculator and decision-making tool was created to aid in clinician use of the GBM model. This tool allows the user to enter each of the eight important variables included in the model. After these values are entered, the probability of choledocholithiasis presence and a recommendation regarding ERCP are reported (Figure 2).
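The predictive values reported above follow from sensitivity, specificity, and prevalence, which gives a quick consistency check. The sketch below applies the standard formulas using the 59.4% prevalence from the abstract. For the ESGE guidelines it takes specificity as 86.2% and sensitivity as 46.9%, the assignment used in the Discussion. Using a single pooled prevalence is an assumption: the manuscript reports fold-averaged values, so small differences are expected. The function name is hypothetical.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return PPV, NPV, and accuracy implied by sensitivity, specificity, and prevalence."""
    true_pos = prevalence * sensitivity
    false_neg = prevalence * (1.0 - sensitivity)
    true_neg = (1.0 - prevalence) * specificity
    false_pos = (1.0 - prevalence) * (1.0 - specificity)
    return {
        "ppv": true_pos / (true_pos + false_pos),
        "npv": true_neg / (true_neg + false_neg),
        "accuracy": true_pos + true_neg,
    }


PREVALENCE = 0.594  # proportion with choledocholithiasis, from the abstract

esge = predictive_values(sensitivity=0.469, specificity=0.862, prevalence=PREVALENCE)
gbm = predictive_values(sensitivity=0.703, specificity=0.723, prevalence=PREVALENCE)

for name, values in (("ESGE", esge), ("GBM", gbm)):
    print(name, {key: round(value, 3) for key, value in values.items()})
```

Under these assumptions the ESGE figures come out at roughly 83% PPV, 53% NPV, and 63% accuracy, in line with the reported 83.3%, 52.6%, and 62.8%. That agreement holds only when 86.2% is read as the ESGE specificity and 46.9% as its sensitivity.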


We developed a machine learning model, trained using data from 1,378 patients from two previously published retrospective studies, to create a tool that predicts the probability of choledocholithiasis presence and provides a recommendation regarding ERCP. We tested three versions of our model: one with lab tests from initial presentation (Figure 3), a second model that incorporated initial lab tests and follow-up labs to see whether a second set of labs improved prediction, and a third model that incorporated a “delta” in labs (the difference between the first and second set of labs) (Supplemental Figure 1). The two-lab test model (Supplemental Figure 1) only slightly improved model performance while requiring several extra inputs, so we felt the single lab test model was equivalent (Figure 3). Additionally, incorporation of the “delta” lab values actually worsened model performance (Supplemental Figure 1B). These findings are consistent with other studies, which did not find improved prediction of the presence of stones on ERCP with dynamic lab testing [7].

Our machine learning model has the highest diagnostic performance based on AUC compared to the 2010 and 2019 ASGE guidelines as well as the ESGE guidelines (Figure 3). Our machine learning model also has the highest sensitivity, accuracy, and NPV compared to the 2010 and 2019 ASGE guidelines as well as the ESGE guidelines. The test characteristics it did not surpass were the specificity of the ESGE guidelines (72.3% for the GBM model vs 86.2% for the ESGE guidelines) and the corresponding PPV (78.1% vs 83.3%). The ESGE guidelines carry the strictest indication to proceed with ERCP, requiring clinical cholangitis or BDS on imaging in order to proceed directly to ERCP, and the high specificity comes at the cost of the lowest sensitivity of all guidelines, which per our analysis is 46.9% (Figure 4).

The optimal approach to diagnosis and treatment of choledocholithiasis remains unclear, but the advent of non-invasive evaluation of the biliary system has led ERCP to be used more judiciously. Given the development of MRCP and EUS, ERCP is now largely reserved for therapeutic rather than diagnostic approaches [13]. The appropriate identification of patients’ risk for choledocholithiasis, and thus their exposure to ERCP, is a pressing concern for gastroenterology. Clinical risk stratification tools aim to stratify which patients should go directly to ERCP. The ASGE’s 2010 proposed classification of patients into high-risk, intermediate-risk, and low-risk categories was an important first step to appropriately allocate the use of ERCP. Yet the 2010 classification had well-studied limitations in accuracy. One study estimated a 62% sensitivity and 47% specificity [14], and another study estimated 55% sensitivity and 69% specificity for identifying choledocholithiasis [8].

Considering several studies showing improved specificity with bile duct dilation and bilirubin levels [15, 16], the ASGE narrowed their high-risk criteria in 2019. A total bilirubin >4 mg/dL now only satisfies high-risk criteria if it is accompanied by a dilated CBD on US/cross-sectional imaging [5]. To reduce unnecessary diagnostic ERCPs, the ASGE recommends that only patients satisfying high-risk criteria proceed to ERCP. A retrospective evaluation of 1042 patients using these new guidelines demonstrated that the specificity and positive predictive value (PPV) of the ASGE high-likelihood criteria were 96.9% (95% confidence interval [CI] 95.4-98.0) and 89.6% (95% CI 85.2-92.8), respectively, for choledocholithiasis, validating the clinical utility of the new ASGE criteria for predicting choledocholithiasis [17].

Spontaneous passage of the bile duct stone was shown to occur in over 50% of the patients presenting with obstructive jaundice, especially with small stone sizes [18]. The liver function laboratory tests are therefore expected to improve with such spontaneous stone passage, and a dynamic assessment of these laboratory tests may help identify patients with such spontaneous stone passage, thereby avoiding unnecessary ERCP procedures [19]. Interestingly, two cohort studies did not identify a statistical significance for such dynamic assessment of liver function laboratory tests, likely due to the limitations of the statistical analysis, but the use of machine learning-based methods in our current study identified this to be among the most important predictors for the presence of choledocholithiasis.

The introduction of artificial intelligence represents an additional opportunity for improvement in risk stratification. To our knowledge, there have been only two other applications of artificial intelligence to the prediction of choledocholithiasis [20, 21]. Jovanovic and colleagues constructed an artificial neural network model to see if it could improve the accuracy of selecting patients for ERCP. They applied this model prospectively to 291 patients who underwent ERCP after being referred with firm suspicion for choledocholithiasis. 80% of patients had choledocholithiasis on ERCP. This model had a positive predictive value of 92.3% and a negative predictive value of 69.6% with respect to finding a stone on ERCP. This study has several limitations: the authors included only those with a firm suspicion of choledocholithiasis based on clinical and/or biochemical data to develop markers; their population was enriched, with a very high proportion (80%) of patients having choledocholithiasis; and a substantial number of patients with stones were misclassified, as suggested by the low negative predictive value.

Golub et al. reported a single-center experience and only included patients who had a cholecystectomy in addition to either intraoperative cholangiogram or preoperative ERCP, which substantially limited generalizability [21]. Importantly, neither study compared their work against the ASGE or ESGE guidelines (though admittedly one pre-dated the guidelines), nor translated their findings into a practical clinical tool.

Though machine learning is advancing the field of gastroenterology, a recent review has demonstrated there have been few inroads into the actual clinical setting [22]. We addressed that gap. We sought to provide an AI-based risk estimation of choledocholithiasis that was more accurate than the existing ASGE framework and provide proof-of-principle that it could be translated into a clinically relevant tool. We took advantage of a well-known machine learning algorithm, GBM, which is an aptly named supervised decision tree learning algorithm [6, 10]. Our model correctly avoided 22% of ERCPs recommended by the 2019 ASGE guidelines and appropriately recommended 48% of ERCPs incorrectly rejected by the ESGE guidelines, providing the most accurate BDS decision tool to date.

This study has several strengths. It relies on diverse data sets drawn from different medical centers, thus enhancing our study’s applicability. These novel machine learning-based statistical methods improve the accuracy of choledocholithiasis prediction compared to previously explored and established techniques. Additionally, our analysis translates to a user-friendly interface that can be applied in a real-life setting and in real time. Finally, the data input into our risk estimation tool is based on routinely available parameters without requiring additional diagnostic testing.

A few limitations, however, deserve consideration. First, our study population had shortcomings related to the eligibility criteria of the cohort studies that we relied on, which were retrospective in nature. However, the multi-center nature of these cohorts provides more pragmatic data reflective of the real-world setting. Additionally, because of limitations with the data, we were unable to include EUS and MRCP, which are essential pre-ERCP screening tools, in the decision making. We aim, however, to expand this proof-of-concept study by including additional data, for example by using an age-adjusted upper limit for common bile duct dilation rather than a single cut-off of 6 mm [23]. Additional model validation in independent cohorts will be important to improve the validity of our approach. Lastly, the cohorts used had a relatively high prevalence of choledocholithiasis. Future studies that test the prediction model in lower-prevalence populations will provide important validation.
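The prevalence limitation can be made concrete with Bayes' rule: for fixed sensitivity and specificity, positive and negative predictive values shift as prevalence changes. The sketch below is illustrative only. The 59.4% prevalence is the cohort value reported in this study, whereas the sensitivity, specificity, and the 20% comparison prevalence are assumed values chosen for arithmetic convenience, not study results.

```python
def ppv_npv(sensitivity, specificity, prevalence):
    """Predictive values from sensitivity, specificity and prevalence (Bayes' rule)."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Assumed test characteristics (hypothetical, for illustration only).
SENS, SPEC = 0.80, 0.80

ppv_cohort, npv_cohort = ppv_npv(SENS, SPEC, prevalence=0.594)  # cohort prevalence
ppv_low, npv_low = ppv_npv(SENS, SPEC, prevalence=0.20)         # hypothetical setting

print(f"prevalence 59.4%: PPV={ppv_cohort:.3f}, NPV={npv_cohort:.3f}")
print(f"prevalence 20.0%: PPV={ppv_low:.3f}, NPV={npv_low:.3f}")
```

Under these assumptions the PPV falls and the NPV rises in the lower-prevalence setting, which is why performance estimates from a high-prevalence cohort may not transfer directly.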

In conclusion, we developed a machine learning-based choledocholithiasis risk estimation tool that can provide a real-time, personalized, objective probability of the presence of choledocholithiasis in a practical way and that appears to be superior to existing guidelines. The study demonstrates that the GBM machine learning model may help to screen patients to identify those at higher risk of having CBD stones, who may be subjected to direct ERCP or may be screened using EUS/MRCP followed by subsequent ERCP as required based on other parameters. At this time, this is a proof-of-principle study, and this risk estimation tool will need to be further clinically validated, ideally in a prospective trial, before it can be adopted in a widespread manner.

Conflicts statement

Venkata Akshintala: Co-founder and Chief Medical Officer, Origin Endoscopy Inc.

Mouen Khashab: Advisory board member and consultant for Boston Scientific, Olympus, Medtronic.

All the other authors have no disclosures.

Figure 1: Importance score for each of the predictors included in the machine learning model.

Importance scores were calculated based on the decrease in the Gini impurity index associated with the inclusion of each predictor (US – ultrasound; ALP – alkaline phosphatase; T. bili – total bilirubin; CBD – common bile duct).
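As a rough illustration of a Gini-based importance score, the sketch below computes the impurity decrease produced by a single split on a binary predictor. It is a simplified assumption about the calculation: a full importance score would aggregate such decreases over every split in every tree, and the data and predictor names here are made up.

```python
def gini(labels):
    """Gini impurity of a set of binary labels: 1 - p^2 - (1 - p)^2 = 2p(1 - p)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def gini_decrease(feature, labels):
    """Impurity before a split on a binary feature minus the weighted impurity after it."""
    left = [y for x, y in zip(feature, labels) if x == 0]
    right = [y for x, y in zip(feature, labels) if x == 1]
    n = len(labels)
    after = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(labels) - after


# Toy data (made up): 1 = stone present.
stone = [1, 1, 1, 0, 0, 0, 1, 0]
predictor_a = [1, 1, 1, 0, 0, 0, 1, 0]  # hypothetical predictor that separates the classes
predictor_b = [1, 0, 1, 0, 1, 0, 1, 0]  # hypothetical weaker predictor

decrease_a = gini_decrease(predictor_a, stone)
decrease_b = gini_decrease(predictor_b, stone)
```

A predictor that separates stone-positive from stone-negative patients more cleanly yields a larger decrease and therefore a higher importance.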

Figure 2: Screenshot of the computer-based risk calculator and decision-making tool. The predictors of bile duct stone (choledocholithiasis) presence are listed on top, and the probability of choledocholithiasis presence and the requirement for ERCP are listed below. (CBD = common bile duct; US = ultrasound; BDS = bile duct stone; Tbili = total bilirubin; ALP = alkaline phosphatase).

Figure 3: Area under the receiver operating curve (AUC) with 10-fold cross-validation of the machine learning gradient boosting model (GBM) utilizing a single set of biochemical lab tests, compared to the 2010 and 2019 ASGE as well as the ESGE guideline-based risk classifications, in predicting the presence of bile duct stone.
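For readers less familiar with the metric, the sketch below shows one common way to compute an AUC (as the fraction of positive-negative pairs ranked correctly, with ties counted as half) and one simple way to form cross-validation folds. It is an assumption-laden illustration rather than the study's code: the fold scheme, the function names, and the toy scores are hypothetical. It also shows why an ordinal guideline category can be scored with the same metric as a continuous model probability.

```python
def auc(scores, labels):
    """Probability that a random stone-positive case outranks a random
    stone-negative case; tied scores count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def kfold_indices(n_samples, k=10):
    """Yield (train, test) index lists for k interleaved folds."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test


# Toy data (made up): continuous model probabilities versus an ordinal
# guideline category (0 = low, 1 = intermediate, 2 = high risk).
labels = [1, 0, 1, 0]
model_scores = [0.9, 0.2, 0.7, 0.4]
guideline_levels = [2, 2, 1, 0]

model_auc = auc(model_scores, labels)
guideline_auc = auc(guideline_levels, labels)
splits = list(kfold_indices(25, k=10))
```

In a cross-validated analysis, the model would be fitted on each `train` list and scored on the matching `test` list, so that every patient is predicted by a model that never saw that patient.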

Figure 4: Flowsheet with ERCP decision making in the setting of suspected choledocholithiasis as recommended by the original 2010 ASGE guidelines, the updated 2019 ASGE guidelines, the ESGE guidelines, and the machine learning gradient boosting model (GBM), respectively. “ERCP Yes” indicates that the model recommends proceeding to ERCP without any additional testing (i.e., MRCP/EUS). “ERCP No” indicates that no ERCP or further work-up is recommended. The corresponding performance parameters are listed below. (Stone = choledocholithiasis; PPV = positive predictive value; NPV = negative predictive value).
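The performance parameters named in this legend can all be derived from a two-by-two table of recommendation versus confirmed stone. The sketch below is a minimal illustration using made-up data for ten hypothetical patients; it assumes every cell of the table is non-empty and does not reproduce any value from the study.

```python
def diagnostic_metrics(recommend_ercp, has_stone):
    """Sensitivity, specificity, PPV and NPV from paired binary lists."""
    pairs = list(zip(recommend_ercp, has_stone))
    tp = sum(1 for r, s in pairs if r and s)          # ERCP recommended, stone present
    fp = sum(1 for r, s in pairs if r and not s)      # ERCP recommended, no stone
    fn = sum(1 for r, s in pairs if not r and s)      # ERCP not recommended, stone present
    tn = sum(1 for r, s in pairs if not r and not s)  # ERCP not recommended, no stone
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Toy data (made up): 1 = yes, 0 = no, for ten hypothetical patients.
recommend_ercp = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
has_stone = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

metrics = diagnostic_metrics(recommend_ercp, has_stone)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

An ERCP that is "correctly avoided" corresponds to a true negative, while a missed stone corresponds to a false negative, so the NPV summarizes how safe a "no ERCP" recommendation is.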

Supplemental Figure: Importance score and area under the receiver operating curve analysis for a two-lab test machine learning model. A) Importance score for each of the predictors included in the two-lab test machine learning model. Importance scores were calculated based on the decrease in the Gini impurity index associated with the inclusion of each predictor. B) Area under the receiver operating curve (AUC) with 10-fold cross-validation of the two-lab test machine learning gradient boosting model (GBM) compared to the 2010 and 2019 ASGE as well as the ESGE guidelines.


n (%)

Acute Pancreatitis

Choledocholithiasis on definitive testing (ERCP, intraoperative cholangiogram, CBD exploration)

Patients undergoing abdominal US

CBD > 6 mm on US

Bile duct stone on US

Follow up (second set) laboratories:

AST > 35 U/L

ALT > 45 U/L

ALP > 110 U/L

Total Bilirubin >=1.2 mg/dL

Age yr. (mean ± S.D.)

Female gender

Admission (first set) laboratories:

AST > 35 U/L

ALT > 45 U/L

ALP > 110 U/L

Total Bilirubin >=1.2 mg/dL

940 (68.2) 903 (65.5) 840 (61.0)

712 (51.7)

981 (71.2) 972 (70.5) 883 (64.1)

761 (55.2)

247 (17.9)

819 (59.4)

800 (58.1)

461 (33.5)

43.3 ± 16.2

844 (55.5)

Table 1: Baseline characteristics and key outcomes of the study population. (n – number of patients; S.D. – standard deviation; US – ultrasonography; CBD – common bile duct; T Bili – total bilirubin)
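The laboratory rows of Table 1 are reported as counts above fixed cut-offs. The sketch below shows how raw values could be flagged against those cut-offs. The cut-off values are taken from the table's row labels; the function name, dictionary keys, and example patient are hypothetical, and the prediction model itself may well use continuous values rather than these flags.

```python
# Cut-offs as listed in the row labels of Table 1.
CUTOFFS = {
    "ast_u_per_l": 35,      # AST > 35 U/L
    "alt_u_per_l": 45,      # ALT > 45 U/L
    "alp_u_per_l": 110,     # ALP > 110 U/L
    "tbili_mg_per_dl": 1.2  # Total bilirubin >= 1.2 mg/dL
}


def flag_labs(ast, alt, alp, tbili):
    """Return True/False flags for each lab using the Table 1 cut-offs."""
    return {
        "ast_elevated": ast > CUTOFFS["ast_u_per_l"],
        "alt_elevated": alt > CUTOFFS["alt_u_per_l"],
        "alp_elevated": alp > CUTOFFS["alp_u_per_l"],
        # Bilirubin uses "greater than or equal to", unlike the enzymes.
        "tbili_elevated": tbili >= CUTOFFS["tbili_mg_per_dl"],
    }


# Hypothetical patient with values chosen to sit at or near each cut-off.
flags = flag_labs(ast=36, alt=45, alp=111, tbili=1.2)
```

Note the boundary behavior: an ALT of exactly 45 U/L is not flagged, whereas a total bilirubin of exactly 1.2 mg/dL is.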


[Screenshot: Choledocholithiasis risk prediction model. Output shown: Risk of choledocholithiasis: 82.7%. Decision: Recommend ERCP.]


[Figure: ROC plots for different models. Panels: A), B).]