Main

Gastric cancer (GC) is the fifth most commonly diagnosed cancer and the fourth leading cause of cancer-related deaths worldwide [1], with nearly 75% of new cases and related deaths occurring in Asian countries, particularly China, Japan and Korea [2,3]. Effective screening is crucial to reduce mortality, as studies show that the 5-year survival rate for early-stage, resectable GC (EGC) is 95–99%, compared with less than 30% for advanced stages (AGC) [4,5]. Endoscopy is the standard diagnostic test for GC because it allows direct visualization of the gastric mucosa and biopsy sampling for histologic evaluation. Japan and Korea have implemented national GC screening programs via endoscopy since 1983 and 1999, respectively, leading to higher early diagnosis rates and reduced mortality, with survival rates significantly higher in Japan (60.3%) and Korea (68.9%) compared with China (35.9%) [6,7].

However, in most countries with high GC prevalence rates, large-scale GC screening in the general population via endoscopy is not feasible or cost-effective, leading to a high mortality rate. This is due mainly to the high cost, suboptimal detection rate (the percentage of detected GC cases out of the total number of people screened) and the invasive nature of endoscopy, resulting in low compliance among the population [8]. Screening high-risk people is preferable, as they are more likely to undergo endoscopic screening following positive test results. Currently, serological tests, including Helicobacter pylori serology, serum pepsinogen levels and gastrin-17 testing, are the methods used most commonly for identifying people at high risk for GC. However, the detection rate of GC by gastroscopy after serological screening and risk stratification was only 1.25%, showing limited improvement compared with the detection rate of gastroscopy screening in the general population (1.20%) [9]. Therefore, there is an urgent need to develop new noninvasive, low-cost, efficient and reliable screening techniques to identify people at high risk for GC. Such techniques could prioritize endoscopic examinations for high-risk populations, improving cost-effectiveness of large-scale screening programs and ultimately reducing GC-related mortality [10].

Noncontrast computed tomography (CT) is a low-cost and widely used imaging protocol in physical examination centers and hospitals (especially in low-resource regions) [11,12]. Low-dose noncontrast chest CT has become an established diagnostic paradigm for pulmonary nodule surveillance in clinical practice [13,14]. Noncontrast abdominal CT is usually indicated for rapid diagnosis of acute conditions, evaluation of traumatic injuries and assessment of patients who have contraindications to contrast agents, and it serves as a critical tool for the initial clinical evaluation of abdominal pathologies. Recent advances in artificial intelligence (AI) have shown promising results in cancer screening via noncontrast CT, notably in the detection of pancreatic cancer [15]. Currently, contrast-enhanced CT is used to evaluate GC [16], including invasion depth, lymph node metastasis and distant metastasis. However, identifying the primary GC lesion, particularly assessing the invasion depth, is often influenced by stomach filling and gastrointestinal peristalsis. These limitations have hindered the exploration of the role of CT scans in GC evaluation. Routine CT imaging, although a valuable tool for opportunistic screening [11], faces substantial challenges in detecting GC. The value of routine CT imaging in GC detection, especially noncontrast CT, has the potential to be explored further by AI.

In this study, we proposed GRAPE (GC risk assessment procedure with AI), using noncontrast CT and deep learning to identify patients with GC. The study was conducted in three distinct phases. First, we developed GRAPE using a cohort from two centers across China, including 3,470 GC cases and 3,250 non-GC (NGC) cases. Its effectiveness and utility were validated using internal validation cohorts (1,298 cases) and an independent external validation cohort from 16 centers (18,160 cases). Then, we compared the interpretations of GRAPE with those of radiologists and explored its potential in assisting radiologists during interpretation. Finally, we validated the performance of GRAPE in opportunistic screening using two real-world study cohorts comprising 78,593 consecutive participants with noncontrast CT from one comprehensive cancer center and two independent regional hospitals. We aimed to validate the ability of GRAPE to detect GC incidentally during opportunistic screening: in the first real-world cohort, on routine patients from various settings (physical examination, emergency, inpatient and outpatient departments), and in the second real-world cohort, on patients treated or being followed up for other cancers (Fig. 1).

Fig. 1: Overview of the development, evaluation and clinical translation of GRAPE.

a, Model development. GRAPE takes noncontrast CT as input and outputs the probability and the segmentation mask of possible primary gastric lesions. GRAPE was trained with noncontrast CT from gastroscopy-confirmed GC patients. The performance of GRAPE was evaluated via GRAPE scores, ROC curves and so on. b, Overview of the training cohort, internal validation cohort and external validation cohort. c, Overview of reader studies. d, Real-world study. The performance of GRAPE in realistic hospital opportunistic screening was validated using two real-world study cohorts, including a regional hospital cohort and a cancer center cohort.

Results

The GRAPE model

The GRAPE model is a deep-learning framework designed to analyze three-dimensional (3D) noncontrast CT scans for GC detection and segmentation. GRAPE was trained on the internal training cohort, including 3,470 GC cases and 3,250 NGC cases (Extended Data Table 1). It generates two outputs: a pixel-level segmentation mask of the stomach and tumors, and a classification score distinguishing GC patients from NGC. The model follows a two-stage approach (Extended Data Fig. 1a). In the first stage, a segmentation network is used to locate the stomach within the full CT scan, generating a segmentation mask that is then used to crop and isolate the stomach region. This cropped region is fed into the second stage, where a joint classification and segmentation network with dual branches is employed. The segmentation branch detects tumors within the identified stomach region, whereas the classification branch integrates multilevel features to classify the patient as either GC-positive or NGC. Detailed description, the architecture and the interpretability analysis of GRAPE are illustrated in Methods and Extended Data Fig. 1.
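The hand-off between the two stages, in which the stage-one stomach mask is used to crop the region passed to stage two, can be sketched as follows. This is only an illustrative sketch and not the authors' implementation: the function name `crop_to_mask`, the `margin` parameter and the toy array sizes are assumptions introduced here, and the real pipeline is described in Methods and Extended Data Fig. 1.

```python
import numpy as np

def crop_to_mask(volume, mask, margin=8):
    """Crop a 3D volume to the bounding box of a binary mask.

    `margin` (in voxels) is a hypothetical padding added around the box
    and clipped to the volume borders; it is not a value from the paper.
    """
    if volume.shape != mask.shape:
        raise ValueError("volume and mask must have the same shape")
    coords = np.argwhere(mask > 0)          # (n_voxels, 3) indices inside the mask
    if coords.size == 0:
        raise ValueError("mask is empty: no stomach voxels found")
    lo = np.maximum(coords.min(axis=0) - margin, 0)
    hi = np.minimum(coords.max(axis=0) + 1 + margin, volume.shape)
    slices = tuple(slice(int(a), int(b)) for a, b in zip(lo, hi))
    return volume[slices], slices

# Toy stand-ins for a CT volume and a stage-one stomach mask.
ct = np.zeros((40, 64, 64), dtype=np.float32)
stomach = np.zeros(ct.shape, dtype=bool)
stomach[10:20, 20:40, 24:44] = True

roi, roi_slices = crop_to_mask(ct, stomach, margin=4)
print(roi.shape)  # (18, 28, 28)
```

Returning the slices alongside the crop makes it possible to paste a stage-two tumor mask back into the coordinates of the full scan.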

Internal and external evaluation

GRAPE achieved an area under the receiver operating characteristic (ROC) curve (AUC) of 0.970 (95% confidence interval (CI) 0.962–0.978), sensitivity of 0.851 (95% CI 0.825–0.877) and specificity of 0.968 (95% CI 0.953–0.980) in the internal validation cohort (Fig. 2a,d), whereas external validation confirmed stable performance with an AUC of 0.927 (95% CI 0.922–0.931), sensitivity 0.817 (95% CI 0.807–0.827), specificity 0.905 (95% CI 0.900–0.910) (Fig. 2c,d and Extended Data Table 2). No statistically significant differences emerged between Zhejiang Cancer Center (ZJCC) and Ningbo Second Hospital (NBSH) cohorts (Fig. 2b and Extended Data Table 2). Multicenter analysis (n ≥ 100 per center) showed a consistent AUC range of 0.902–0.995 (Extended Data Figs. 2 and 3a and Extended Data Table 2).
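For readers less familiar with these metrics, the sketch below shows how AUC, sensitivity and specificity can be computed from per-patient scores and reference labels. It is a generic illustration under stated assumptions: the scores and labels are toy values, the 0.5 cutoff is assumed for the example, and the function names are hypothetical. It does not reproduce the authors' evaluation code or their confidence-interval procedure.

```python
import numpy as np

def auc(scores, labels):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as one half)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    diff = pos[:, None] - neg[None, :]      # all positive-negative pairs
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (pos.size * neg.size)

def sensitivity_specificity(scores, labels, cutoff=0.5):
    """Sensitivity and specificity when scores >= cutoff are called positive."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pred = scores >= cutoff
    sensitivity = np.mean(pred[labels])     # TP / (TP + FN)
    specificity = np.mean(~pred[~labels])   # TN / (TN + FP)
    return float(sensitivity), float(specificity)

# Toy data: 1 = GC, 0 = NGC (illustrative values only).
toy_scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
toy_labels = [1, 1, 1, 0, 0, 0]
toy_auc = auc(toy_scores, toy_labels)
toy_sens, toy_spec = sensitivity_specificity(toy_scores, toy_labels)
print(toy_auc, toy_sens, toy_spec)
```

The pairwise formulation is quadratic in cohort size, so for cohorts as large as the external validation set a rank-based routine from a statistics library would be the more practical choice.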

Fig. 2: GRAPE performance in internal and external validation cohorts.

a, ROC curve of GRAPE in an internal validation cohort. b, ROC curve of GRAPE in ZJCC and NBSH in internal validation cohort. c, ROC curve of GRAPE in external validation cohort. d, Sensitivity, specificity and AUC of GRAPE in internal and external validation cohorts. e,f, Distribution of GRAPE scores of GC and NGC in the internal (e) and external (f) validation cohorts. g,h, GRAPE scores in different T stages in internal (g) and external (h) validation cohorts. i,j, GRAPE scores in different locations in internal (i) and external (j) validation cohorts. k, Proportion of GC detected by GRAPE in different T stages in internal and external validation cohorts. l, Proportion of GC detected by GRAPE in different locations of the stomach in internal and external validation cohorts.

GRAPE score distribution analysis revealed clear differentiation between GC and NGC groups (Fig. 2e,f and Extended Data Table 1), with scores above 0.5 predominating in the GC group. Scores rose significantly with T stage progression (Fig. 2g,h), whereas tumor location showed no correlation with score (Fig. 2i,j). Detection rates increased with T stage advancement in both cohorts. Location-specific analysis showed optimal detection for whole-stomach lesions, with similar performance across different locations (Fig. 2k,l). Sensitivity remained consistent across locations within identical T stages (Extended Data Fig. 3b–e). Notably, well-filled stomachs showed 10.72% higher EGC detection versus poor filling (Extended Data Fig. 3f). Additionally, the detection rate of GC was associated with TNM stage but showed no correlation with gender or age (Extended Data Fig. 3g–i).

Reader studies

In a multicenter reader study (n = 13 radiologists) interpreting 297 noncontrast CT scans, GRAPE surpassed all readers in diagnostic accuracy (AUC, GRAPE 0.92 versus readers’ range 0.76–0.85), demonstrating significant performance advantages in both sensitivity and specificity. When reassessing cases with AI assistance after a ≥1-month washout, radiologists achieved mean improvements of 6.6% sensitivity and 13.3% specificity. Notably, while both senior and junior radiologists showed significant accuracy gains (P < 0.05), their augmented performance remained below GRAPE’s standalone results (Fig. 3a–c).

Fig. 3: Reader study.

a, Comparison between GRAPE and 13 radiologists with different levels of expertise on GC. b, Performance in GC discrimination of the same set of radiologists with the assistance of GRAPE on noncontrast CT. c, Balanced accuracy improvement in radiologists with different levels of expertise for GC discrimination. d, Detection rate of EGC and AGC by radiologists alone, radiologists with the assistance of GRAPE and GRAPE alone. e, Detection rate of GC in different locations by radiologists alone, radiologists with the assistance of GRAPE and GRAPE alone. f, Examples of T1 and T2 GC discrimination by GRAPE, which were missed by readers.

We then analyzed the proportion of GC detected by GRAPE, by radiologists alone and by radiologists with GRAPE assistance in EGC and AGC. GRAPE detected a higher proportion of EGC than radiologists alone or radiologists with GRAPE assistance (Fig. 3d). We also found that GRAPE performed significantly better in the middle-third location than radiologists alone or radiologists with GRAPE assistance (Fig. 3e), which may result from the difficulty of identifying GC at this location under varying degrees of stomach filling. Figure 3f shows representative CT scans of T1/T2 stage GC that radiologists failed to identify.

Real-world study of hospital opportunistic screening

We further validated the performance of GRAPE in opportunistic screening using two real-world study cohorts comprising 78,593 consecutive participants with noncontrast CT between 2018 and 2024 from two independent regional hospitals and one comprehensive cancer center.

First, we validated the opportunistic screening capacity of GRAPE in a real-world cohort comprising 41,178 routine patients from diverse clinical settings (physical examination, emergency, inpatient and outpatient departments) across regional hospitals. GRAPE identified 11.28% (4,645) as high risk, with distinct proportions across cohorts: 6.2% (1,248 of 20,097) in Fenghua Peopleʼs Hospital (FHPH) versus 16.1% (3,397 of 21,081) in Pinyang Peopleʼs Hospital (PYPH) (Fig. 4). Through comprehensive medical record review and follow-up, we stratified GRAPE-identified high-risk patients into four clinical trajectories: confirmed GC via gastroscopy, NGC diagnoses, asymptomatic patients with 1-year follow-up and those lacking both gastroscopy and follow-up documentation. Among the high-risk population, gastroscopic and follow-up verification confirmed 289 GC cases in FHPH and 556 in PYPH (Fig. 4a,b and Extended Data Table 4). The detection rate was 24.49% (289 of 1,180) and 17.65% (556 of 3,151) in FHPH and PYPH, respectively. Furthermore, we retrospectively analyzed the patients with confirmed GC in the ‘high-risk’ population and found that 40.48% (117 of 289) and 38.31% (213 of 556) of patients did not have abdominal symptoms in FHPH and PYPH, respectively. The multidisciplinary team (MDT) searched and reviewed the available electronic health records of low-risk patients in the FHPH and PYPH cohorts, including all existing follow-up information, and found 37 and 49 GC patients, respectively, confirmed by pathology. Thus, the estimated sensitivity and specificity are 0.887 (95% CI 0.850–0.920) and 0.951 (95% CI 0.949–0.954) for the FHPH cohort, and 0.919 (95% CI 0.899–0.939) and 0.861 (95% CI 0.857–0.866) for the PYPH cohort, respectively.
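The estimated sensitivity and specificity above can be reproduced from the reported counts. The sketch below is a worked check, not the authors' analysis code. It assumes that every high-risk patient without a confirmed GC is counted as a false positive and that every low-risk patient without a pathology-confirmed GC is counted as a true negative; this bookkeeping is an assumption made here because it matches the published point estimates. The function and variable names are illustrative.

```python
def screening_metrics(n_total, n_high_risk, gc_in_high_risk, gc_in_low_risk):
    """Sensitivity and specificity from cohort-level screening counts.

    Assumption (not stated explicitly in the text): unconfirmed high-risk
    patients are false positives; low-risk patients without a confirmed
    GC are true negatives.
    """
    tp = gc_in_high_risk
    fn = gc_in_low_risk
    fp = n_high_risk - tp
    tn = (n_total - n_high_risk) - fn
    return tp / (tp + fn), tn / (tn + fp)

# Counts taken from the paragraph above.
fhph_sens, fhph_spec = screening_metrics(20_097, 1_248, 289, 37)
pyph_sens, pyph_spec = screening_metrics(21_081, 3_397, 556, 49)

# Detection rate among high-risk patients with verification available.
fhph_detection = 289 / 1_180
pyph_detection = 556 / 3_151

print(f"FHPH: sens={fhph_sens:.3f} spec={fhph_spec:.3f} detection={fhph_detection:.2%}")
print(f"PYPH: sens={pyph_sens:.3f} spec={pyph_spec:.3f} detection={pyph_detection:.2%}")
```

Because low-risk patients were verified only through existing records rather than systematic gastroscopy, the sensitivity figures are estimates and may be optimistic if some GC cases in the low-risk group remain undocumented.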

Fig. 4: Performance of GRAPE in realistic hospital opportunistic screening in regional hospitals validated using a real-world study cohort.

a, Overview of GRAPE’s performance in FHPH cohort and a case study. b, Overview of GRAPE’s performance in PYPH cohort and a case study.

Next, we aimed to validate the ability of GRAPE to detect GC at a comprehensive cancer center, where most patients were diagnosed with or suspected of having tumors. We consecutively collected abdominal noncontrast CT scans acquired between January 2022 and June 2024 at ZJCC. GRAPE identified 8.1% (3,045) as high risk. Comprehensive medical record review confirmed 311 GC cases (Fig. 5a and Extended Data Table 4), including 34.41% (107 of 311) of patients without obvious abdominal symptoms. The overall detection rate was 12.1%. Notably, one patient was predicted to be high risk by GRAPE on an abdominal noncontrast CT scan in June 2024. MDT review showed that the patient underwent gastroscopy and was diagnosed with GC in August 2024. Surgical pathology confirmed that the GC was pT2N0M0 (AJCC Stage IB) (Fig. 5b).

Fig. 5: Performance of GRAPE in realistic hospital opportunistic screening in cancer center validated using a real-world study cohort.

a, Overview of GRAPE’s performance in the ZJCC cohort. b, Illustration of a GC patient diagnosed after detection using GRAPE. This patient was being followed up for lung cancer treatment in June 2024 and was flagged by GRAPE. MDT review showed that the patient underwent gastroscopy and was diagnosed with moderately differentiated to well-differentiated GC in August 2024. c, Data collection process of prediagnosis CT scans. Among 26 patients under follow-up for other cancers, 11 had prediagnostic CT scans taken in the 6 months before GC diagnosis. GRAPE suggested GC in 63.64% (7 of 11) of patients in the 6 months before their diagnosis. d, Because of pulmonary nodules, a patient underwent 2 abdominal CT examinations, at 14 and 6 months before diagnosis of GC, with no notable abnormalities of the stomach reported in 2023. In April 2024, after more than 4 months of abdominal discomfort, this patient was diagnosed via gastroscopy with poorly differentiated GC at stage T4aN1M0. We evaluated the noncontrast CT scans taken before and at the time of diagnosis, and GRAPE indicated GC on the noncontrast CT taken 6 months before diagnosis. Based on the GRAPE prediction and retrospective review of the image, the MDT suspected that the tumor was at stage T2 at that time, meaning that GRAPE detected it 6 months in advance.

We validated the ability of GRAPE in early diagnosis. Among 311 GC cases in the second real-world cohort (ZJCC), 26 patients were under follow-up, being reviewed or under treatment for other cancers, whereas the other 285 patients were visiting the hospital for the first time when diagnosed with GC. Among the 26 patients, 11 had prediagnostic abdominal noncontrast CT scans taken during the 6 months before GC diagnosis. GRAPE was tested on these 11 prediagnostic scans and predicted GC in 63.64% (7 of 11) of patients (Fig. 5c). In one case, a patient underwent two abdominal CT examinations because of pulmonary nodules, at 14 and 6 months before diagnosis of GC, with no notable abnormalities of the stomach reported in 2023. In April 2024, after more than 4 months of abdominal discomfort, this patient was diagnosed via gastroscopy with poorly differentiated GC at stage T4aN1M0. We evaluated the noncontrast CT scans taken before and at the time of diagnosis, and GRAPE indicated GC on the noncontrast CT taken 6 months before diagnosis. Based on the GRAPE prediction and retrospective review of the image, the MDT suspected that the tumor was at stage T2 at that time, meaning that GRAPE detected it 6 months in advance (Fig. 5d).

Discussion

CT provides a comprehensive and objective evaluation of anatomical structures, irrespective of clinical indications, establishing its significance as an effective cancer detection tool [17]. The emergence of AI methods has enabled the automatic segmentation and processing of underutilized data through deep learning, facilitating fast, objective algorithms for accurate lesion identification [18,19]. In this study, we present GRAPE: an AI model based on routine noncontrast CT scans for GC detection. The effectiveness of this model was further validated in an independent, external multicenter study. GRAPE demonstrated superior detection performance for GC compared with previous models based on clinical information and serological diagnostics (AUC 0.757–0.79) [20,21,22], and its performance was comparable to that of liquid biopsy [23,24], including circular RNAs [25], microRNAs [26,27], cell-free DNA [28] and metabolites [23] in blood samples, which reported AUC values between 0.83 and 0.94. The subgroup analysis demonstrated GRAPE sensitivity of approximately 50% for EGC, and more than 90% for GC at T3 and T4 stages. Although GRAPE performed best in detecting advanced GC, it may also allow GC to be detected at an earlier stage, which could improve the overall survival of GC patients.

To validate the value of GRAPE in realistic hospital opportunistic screening, we conducted two real-world studies comprising 78,593 consecutive participants. GRAPE achieved high detection rates, significantly surpassing the 0.9% detection rate of all GC from questionnaire-based methods [29]. GRAPE also showed good performance on prediagnosis scans performed in the 6 months before GC diagnosis. Despite its advantage, GRAPE is not intended to replace endoscopic evaluation. We believe it provides a valuable alternative for symptomatic patients hesitant to undergo initial endoscopic screening.

Currently, no other new GC screening methods have been implemented in large-scale screening cohorts, although similar studies have been conducted for other malignancies. The National Lung Screening Trial in the United States reported a lung cancer detection rate of 2.4–5.2% using low-dose helical CT [30]. The detection rate of ovarian and fallopian tube cancers was 5.5% using transvaginal ultrasound [31]. In liquid biopsy-based screening, the detection rate of nasopharyngeal carcinoma was 11.0% using Epstein–Barr virus DNA levels in plasma samples [32], and the detection rate of colorectal cancer was 3.2% via cell-free DNA detection in plasma in the theoretical ECLIPSE study population [33]. As a highly accurate high-risk identification tool, GRAPE has the potential to enable large-scale screening programs by boosting the detection rate for the secondary endoscopy examinations, thus reducing GC mortality.

The GRAPE model adopts a simple yet effective architecture for the detection of GC on noncontrast CT scans by integrating both classification and segmentation into a single deep-learning framework. Traditional methods often fall short by focusing solely on segmentation, which limits the ability to provide patient-level probability assessments, or by relying exclusively on classification approaches applied directly to organ regions of interest (ROIs), thereby reducing the interpretability of predictions. In contrast, our joint model leverages the advantages of both strategies, allowing us to benefit from detailed local textural pattern recognition of tumors while maintaining a comprehensive understanding of the stomach’s overall shape and structure. By adapting the nnUNet backbone, a proven architecture known for its robust and high-performance visual feature extraction capabilities, our model achieved enhanced capacity to handle the complex visual characteristics present in medical images specific to GC. GRAPE’s overall design not only ensures efficient feature learning but also offers improved generalization across diverse datasets. Moreover, in terms of interpretability, GRAPE combines segmentation and classification in one pipeline, providing clinicians with detailed images in which tumor segmentation masks are explicitly delineated while simultaneously offering decisive classification outputs. We also visually analyzed interpretability through Grad-CAM and found that the coarse heatmap corresponded well with the tumor region. Although noncontrast CT is not a routine modality for GC diagnosis, the performance of radiologists can be improved significantly with the assistance of GRAPE. This improvement could be attributed to GRAPE’s tumor segmentation output, which is easy for radiologists to interpret.

Despite GRAPE’s promising results, further research is required to address key aspects of its applicability. First, a large prospective screening cohort is necessary to validate GRAPE’s effectiveness in GC screening more robustly. We are implementing large-scale prospective validation of GRAPE’s performance through a nationwide GC screening program. Because screening scenarios differ (for example, high-risk versus opportunistic screening), the optimal cutoff value will need to be selected to balance sensitivity and specificity during clinical implementation. Second, enhancing the sensitivity of the model for early-stage GC remains a priority. We will expand the training dataset of GRAPE with more EGC cases, thereby improving model sensitivity. We will also use more information from endoscopic and pathology reports, for example, descriptions of the shape, texture, size and degree of invasion of the tumor, as additional supervision to train the GRAPE model and further improve sensitivity. As mentioned before, sensitivity for early GC detection was higher in patients with better gastric filling. As a result, we would recommend distension of the stomach before noncontrast CT imaging in our prospective trial. Third, although the prevalence of non-GC tumors such as gastrointestinal stromal tumors and gut-associated lymphoid tissue tumors is relatively low, GRAPE could be expanded to detect and diagnose these rare tumors, making it a more comprehensive screening tool. Finally, to address generalization challenges, a future direction is the incorporation of training data from more centers. From the technical aspect, test-time adaptation [34] is a recent machine-learning approach for better model generalizability and has the potential to boost external validation performance for GRAPE.
Overall, GRAPE represents a new approach for mass GC screening, demonstrating high sensitivity, specificity and detection rates, thus enhancing both the cost-effectiveness and compliance of GC screening efforts.
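The scenario-dependent cutoff selection mentioned in this discussion can be illustrated with a small sketch. The paper does not state how the cutoff will be chosen; the example below assumes Youden's J statistic (sensitivity + specificity - 1) with an optional minimum-sensitivity constraint, and the function name `best_cutoff`, the constraint and the toy scores are all hypothetical.

```python
import numpy as np

def best_cutoff(scores, labels, min_sensitivity=None):
    """Return (cutoff, J, sensitivity, specificity) maximizing Youden's J.

    Scores >= cutoff are called positive. If `min_sensitivity` is given,
    cutoffs that fall below it are skipped. Returns None if none qualify.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    best = None
    for cutoff in np.unique(scores):        # candidate cutoffs, ascending
        pred = scores >= cutoff
        sens = float(np.mean(pred[labels]))
        spec = float(np.mean(~pred[~labels]))
        if min_sensitivity is not None and sens < min_sensitivity:
            continue
        j = sens + spec - 1.0
        if best is None or j > best[1]:
            best = (float(cutoff), j, sens, spec)
    return best

# Toy scores: 1 = GC, 0 = NGC (illustrative values only).
toy_scores = [0.9, 0.8, 0.7, 0.5, 0.6, 0.55, 0.2, 0.1]
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]

balanced = best_cutoff(toy_scores, toy_labels)
high_sens = best_cutoff(toy_scores, toy_labels, min_sensitivity=1.0)
print(balanced)
print(high_sens)
```

A sensitivity-constrained cutoff would suit settings where missing a cancer is costlier than an extra endoscopy, whereas an unconstrained balance may suit opportunistic screening with limited endoscopy capacity.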

Methods

Trial design and population

This study used a phased design and included three independent cohorts (Fig. 1). The study was approved by the centralized institutional review board (IRB) covering 20 participating centers (IRB-2024-279, Clinicaltrials.gov registration NCT06614179). The training cohort was enrolled between September 2006 and June 2024 across 2 centers in China, comprising 3,470 participants with GC and 3,250 participants with NGC, aged 18–99 years. The internal validation cohort was enrolled between December 2006 and April 2024 across 2 centers in China, comprising 650 participants with GC and 648 participants with NGC, aged 20–94 years. Moreover, between January 2011 and August 2024, we assembled an independent external validation cohort of 18,160 participants, aged 18–80 years, from 16 centers, all of whom underwent gastroscopic examination. The inclusion criteria for the GC group required participants to have a diagnosis of GC confirmed by gastroscopic pathological biopsy and to have undergone a noncontrast CT examination at the time of diagnosis, before receiving any treatment. Participants were excluded if they underwent the noncontrast CT examination after receiving treatment. For the NGC group, inclusion criteria required participants to be confirmed as not having GC by gastroscopy and to have undergone a noncontrast CT examination within 6 months of the gastroscopy. Alternatively, NGC participants were included if they underwent a noncontrast CT examination and were confirmed to be free of GC based on a 1-year follow-up. We conducted a comprehensive review of the medical records for all enrolled patients, including age, sex, T stage, location and so on. Full details of the internal and external cohorts, including patients’ clinical information and CT acquisition parameters, are shown in Extended Data Table 1.

Finally, we validated the performance of GRAPE in realistic hospital opportunistic screening using 2 real-world study cohorts: a cohort of 41,178 consecutive participants who underwent noncontrast CT between 2018 and 2024 at 2 regional hospitals, and a cohort of 37,415 participants whose most recent noncontrast CT was performed between 2022 and 2024 at a cancer center. In FHPH, 45.19% (9,083 of 20,097) of patients came from the outpatient and emergency departments and the physical examination center, while 54.80% (11,014 of 20,097) came from the inpatient department. In PYPH, 41.56% (8,761 of 21,081) of patients came from the outpatient and emergency departments and the physical examination center, whereas 58.44% (12,320 of 21,081) came from the inpatient department. In ZJCC, we additionally excluded 4,573 patients diagnosed with GC before routine CT examination. In this cohort, 30.66% (11,472 of 37,415) of patients came from the outpatient and emergency departments and the physical examination center, while 69.34% (25,943 of 37,415) came from the inpatient department.

Image acquisition and processing

CT images in the training cohort, internal validation cohort and external validation cohort were collected before gastroscopy. Stomach segmentation was obtained via a semi-supervised self-training approach that combined a publicly available dataset35 with manually annotated masks and our internal training set. Tumor segmentation was performed by two experienced radiologists using ITK-SNAP software (v.3.8, available at http://www.itksnap.org)36. Because GC can be distinguished readily from normal gastric tissue in portal venous-phase CT images, the radiologists outlined tumor boundaries on the two-dimensional venous-phase sections that contained tumor. Where the two radiologists' delineations of the ROI differed markedly, the discrepancy was resolved by discussion until consensus was reached. To obtain tumor annotations on noncontrast images, we used the DEEDS37 registration algorithm to align venous-phase images with noncontrast images. The resulting deformation fields were then applied to the venous-phase annotations to generate the transformed tumor annotations for the noncontrast phase. Finally, the transferred tumor annotations on the noncontrast phase were reviewed, re-delineated where necessary and confirmed via the same procedure used for the venous phase.

AI model: GRAPE

The GRAPE model was developed to analyze 3D noncontrast CT scans for the detection and segmentation of GC. The model yields two outputs: a pixel-level segmentation mask outlining the stomach and potential tumors, and a classification score that separates patients with GC from normal controls. The model follows a two-stage strategy. The first stage localizes the stomach region within the entire CT scan. This stage is implemented using a segmentation network, nnUNet38, in its 3D full-resolution configuration. The output stomach segmentation mask is used to generate a 3D bounding box of the stomach region, which is cropped and serves as input for the second stage. Cropping makes the second stage simpler and more efficient by restricting tumor segmentation to the region of interest. The second stage uses a joint classification and segmentation network with dual branches. The segmentation branch segments tumors within the identified stomach region. The classification branch, which integrates multilevel feature maps from the segmentation branch, classifies each patient as either having GC or being a normal control. Although noncontrast CT has inherent resolution limitations (~0.5–1 mm), GRAPE detects early GCs not through direct lesion visualization but by integrating subtle 3D morphometric patterns (for example, local wall thickening, mucosal heterogeneity) and contextual radiological features (for example, perigastric fat stranding, lymph node microcalcifications).
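The hand-off between the two stages (stomach mask, then 3D bounding box, then cropped input) can be illustrated with a minimal NumPy sketch. This is not the authors' code: the function name `crop_to_mask_bbox`, the optional `margin` parameter and the toy arrays are assumptions made for illustration only.

```python
import numpy as np

def crop_to_mask_bbox(volume, mask, margin=0):
    """Crop `volume` to the 3D bounding box of the nonzero voxels in `mask`.

    `margin` (hypothetical parameter) pads the box by a few voxels per side,
    clipped to the volume bounds.
    """
    if volume.shape != mask.shape:
        raise ValueError("volume and mask must have the same shape")
    coords = np.argwhere(mask > 0)
    if coords.size == 0:
        raise ValueError("mask is empty; no stomach voxels found")
    lo = np.maximum(coords.min(axis=0) - margin, 0)
    hi = np.minimum(coords.max(axis=0) + 1 + margin, mask.shape)
    slices = tuple(slice(int(a), int(b)) for a, b in zip(lo, hi))
    return volume[slices], slices

# Toy example: a synthetic "CT" volume and a box-shaped "stomach" mask.
toy_volume = np.random.default_rng(0).normal(size=(20, 64, 64))
toy_mask = np.zeros(toy_volume.shape, dtype=np.uint8)
toy_mask[5:12, 20:40, 10:30] = 1

cropped, bbox = crop_to_mask_bbox(toy_volume, toy_mask, margin=2)
print(cropped.shape)
```

Returning the slice tuple alongside the crop makes it possible to paste a second-stage tumor mask back into the coordinate frame of the full scan.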

To address variations in CT imaging parameters across several centers, we adopted a robust preprocessing pipeline as follows. First, all scans were resampled to the median voxel spacing of the training dataset, specifically 0.717 mm × 0.717 mm × 5.0 mm, to harmonize differences in slice thickness and spatial resolution. Next, all intensity values were normalized using the mean and s.d. calculated from the training dataset to minimize interscanner intensity discrepancies. During training, fixed-size 3D patches with a size of 256 × 256 × 28 were extracted randomly from the resampled volumes, and random Gaussian noise was injected before feeding into the deep network to increase model robustness. At test time, predictions were generated using a sliding window strategy across the entire resampled volume, ensuring consistent performance regardless of original scan dimensions.
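The preprocessing steps described above (resampling to a fixed voxel spacing, z-score normalization with training-set statistics, random patch extraction with Gaussian noise) could look roughly like the sketch below. It is an illustration under stated assumptions rather than the published pipeline: the (z, y, x) axis order, linear interpolation, zero-padding of volumes smaller than the patch, the noise standard deviation and the toy mean and s.d. values are all hypothetical choices.

```python
import numpy as np
from scipy import ndimage

TARGET_SPACING = (5.0, 0.717, 0.717)  # mm, from the text; (z, y, x) order is an assumption
PATCH_SIZE = (28, 256, 256)           # voxels, from the text; (z, y, x) order is an assumption

def resample(volume, spacing, target=TARGET_SPACING, order=1):
    """Resample `volume` from `spacing` to `target` spacing (linear interpolation assumed)."""
    zoom = [s / t for s, t in zip(spacing, target)]
    return ndimage.zoom(volume, zoom, order=order)

def normalize(volume, mean, std):
    """Z-score normalization using statistics computed on the training set."""
    return (volume - mean) / std

def random_patch(volume, rng, patch_size=PATCH_SIZE, noise_std=0.1):
    """Extract a random fixed-size patch and add Gaussian noise.

    Volumes smaller than the patch are zero-padded first (an assumption);
    `noise_std` is a hypothetical value.
    """
    pad = [(0, max(p - s, 0)) for s, p in zip(volume.shape, patch_size)]
    volume = np.pad(volume, pad, mode="constant")
    start = [int(rng.integers(0, s - p + 1)) for s, p in zip(volume.shape, patch_size)]
    window = tuple(slice(a, a + p) for a, p in zip(start, patch_size))
    patch = volume[window]
    return patch + rng.normal(0.0, noise_std, size=patch.shape)

# Toy example: a synthetic scan with 2.5 mm slices and 1.0 mm in-plane spacing.
rng = np.random.default_rng(0)
raw = rng.normal(40.0, 20.0, size=(40, 128, 128))
resampled = resample(raw, spacing=(2.5, 1.0, 1.0))
normalized = normalize(resampled, mean=40.0, std=20.0)  # toy statistics
patch = random_patch(normalized, rng)
print(resampled.shape, patch.shape)
```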

Training used fivefold cross-validation: the training set was divided into five folds, and five models were each trained on four folds and validated on the remaining fold, following the standard nnUNet cross-validation protocol. In the testing phase, the five models were ensembled: classification probabilities were averaged across the models to give the classification score, and pixel-level voting determined the segmentation output. The class with the higher probability of the two categories was chosen as the final classification for each test case. In addition, variation in stomach filling makes GC harder to recognize. GRAPE addresses this variability through training on a wide range of gastric volumes (from fasting, <100 cm³, to over-distended, >800 cm³).
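The ensembling step can be sketched in a few lines of NumPy: averaging the per-model GC probabilities gives the classification score, and a per-voxel majority vote gives the segmentation. The function names and toy inputs are assumptions for illustration, and the tie-breaking rule (lowest label wins) is a simplification that the text does not specify.

```python
import numpy as np

def ensemble_classification(fold_probs):
    """Average GC probabilities over models.

    fold_probs: array of shape (n_models, n_cases).
    Returns the mean score and the class with the higher probability
    (1 = GC when the mean score exceeds 0.5).
    """
    score = np.mean(fold_probs, axis=0)
    label = (score > 0.5).astype(int)
    return score, label

def ensemble_segmentation(fold_masks):
    """Per-voxel majority vote over integer label maps of shape (n_models, ...).

    Ties go to the lowest label index (an assumption).
    """
    fold_masks = np.asarray(fold_masks)
    n_classes = int(fold_masks.max()) + 1
    votes = np.stack([(fold_masks == c).sum(axis=0) for c in range(n_classes)])
    return votes.argmax(axis=0)

# Toy example: five models, two cases.
fold_probs = np.array([[0.90, 0.2],
                       [0.80, 0.4],
                       [0.70, 0.1],
                       [0.60, 0.3],
                       [0.95, 0.6]])
ens_score, ens_label = ensemble_classification(fold_probs)

# Toy example: five models voting on four voxels (0 = background, 1 = stomach, 2 = tumor).
fold_masks = np.array([[0, 1, 2, 2]] * 3 + [[0, 0, 2, 1]] * 2)
ens_mask = ensemble_segmentation(fold_masks)
print(ens_score, ens_label, ens_mask)
```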

An advantage of the GRAPE model is its inherent interpretability. Because it produces both segmentation and classification outputs, the segmentation output provides a pixel-level visual map that aids in understanding and confirming the classification results. To further enhance interpretability, we visualized the heatmap of the convolutional feature map in the classification branch using Grad-CAM (Gradient-weighted Class Activation Mapping)39, a widely applicable technique for identifying the regions that contribute most to the classification.
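The core Grad-CAM computation is short enough to show directly. In the sketch below, the feature-map activations and the gradients of the class score with respect to them are passed in as plain arrays; in practice they would come from a deep learning framework's autograd hooks. The function name, array shapes and the final rescaling to [0, 1] are illustrative assumptions, not details taken from the GRAPE implementation.

```python
import numpy as np

def grad_cam(activations, gradients):
    """Grad-CAM heatmap for one 3D feature map.

    activations, gradients: arrays of shape (channels, D, H, W), where
    `gradients` holds d(class score)/d(activations).
    """
    # Channel weights: global average of the gradients over spatial positions.
    weights = gradients.mean(axis=(1, 2, 3))
    # Weighted sum of the channels, followed by ReLU.
    cam = np.tensordot(weights, activations, axes=1)
    cam = np.maximum(cam, 0.0)
    # Rescale to [0, 1] for display (a common convention, assumed here).
    if cam.max() > 0:
        cam = cam / cam.max()
    return cam

# Toy example with random activations and gradients.
cam_rng = np.random.default_rng(0)
toy_activations = cam_rng.normal(size=(4, 2, 3, 3))
toy_gradients = cam_rng.normal(size=(4, 2, 3, 3))
heatmap = grad_cam(toy_activations, toy_gradients)
print(heatmap.shape)
```

The heatmap has the spatial size of the feature map, so it would need to be upsampled to the CT grid before being overlaid on the image.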

Evaluation metrics

The primary task is binary classification: determining whether a patient has GC. Patients with a GRAPE score greater than 0.5 are assigned to the ‘high risk’ category for calculating AUC, sensitivity, specificity and accuracy. Furthermore, we assess the proportion of GC cases detected, stratified by T stage, TNM stage, lesion location and stomach filling during CT examination.
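As a worked illustration of these metrics, the sketch below thresholds a continuous score at 0.5 and derives sensitivity, specificity, positive predictive value, accuracy and balanced accuracy from the confusion counts, together with a rank-based AUC. The function name and the toy labels and scores are hypothetical; the published analysis used scikit-learn rather than this hand-rolled code.

```python
import numpy as np

def binary_metrics(y_true, scores, threshold=0.5):
    """Metrics for a binary classifier, calling scores above `threshold` positive."""
    y_true = np.asarray(y_true).astype(bool)
    scores = np.asarray(scores, dtype=float)
    pred = scores > threshold
    tp = np.sum(pred & y_true)
    fn = np.sum(~pred & y_true)
    tn = np.sum(~pred & ~y_true)
    fp = np.sum(pred & ~y_true)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # AUC as the probability that a random positive outscores a random
    # negative, counting ties as one half.
    pos, neg = scores[y_true], scores[~y_true]
    greater = (pos[:, None] > neg[None, :]).mean()
    ties = (pos[:, None] == neg[None, :]).mean()
    return {
        "sensitivity": float(sensitivity),
        "specificity": float(specificity),
        "ppv": float(tp / (tp + fp)),
        "accuracy": float((tp + tn) / y_true.size),
        "balanced_accuracy": float((sensitivity + specificity) / 2),
        "auc": float(greater + 0.5 * ties),
    }

# Toy example: 4 GC cases (1) and 4 controls (0).
toy_labels = [1, 1, 1, 0, 0, 0, 0, 1]
toy_scores = [0.9, 0.6, 0.4, 0.2, 0.7, 0.1, 0.3, 0.8]
metrics = binary_metrics(toy_labels, toy_scores)
print(metrics)
```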

Reader studies

The aim of the reader study was to assess the difference in performance between GRAPE and radiologists in detecting GC on noncontrast CT. A total of 13 radiologists were enrolled in this study, comprising 5 senior radiologists and 8 junior radiologists. Details of the radiologists are shown in Extended Data Table 3. The study comprised two sessions. In the first session, GRAPE’s performance was compared with that of radiologists with varying levels of expertise in GC imaging. The second session evaluated GRAPE’s potential to assist radiologists: the radiologists were given GRAPE’s prediction in addition to the noncontrast image. A washout period of at least 1 month was maintained between the 2 sessions for each radiologist.

Statistical analysis

Classification of GC versus NGC was evaluated using the AUC, sensitivity, specificity, positive predictive value and balanced accuracy. Confidence intervals were calculated from 1,000 bootstrap replications of the corresponding data. Differences in sensitivity, specificity and balanced accuracy were tested using permutation tests with 10,000 permutations, yielding two-sided P values. The threshold for statistical significance was P < 0.05. Data analysis was conducted in Python using the numpy (v.1.26.2), scipy (v.1.11.4) and scikit-learn (v.1.3.2) packages.
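The two resampling procedures can be sketched as follows. The text does not state which bootstrap interval or which permutation scheme was used, so the percentile interval and the paired label-swapping test shown here are assumptions, as are the function names and the simulated data.

```python
import numpy as np

def sensitivity(y_true, y_pred):
    """Fraction of true positives among all positive cases."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    return float(np.sum(y_true & y_pred) / np.sum(y_true))

def bootstrap_ci(y_true, y_pred, metric, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval (interval type is an assumption)."""
    rng = np.random.default_rng(seed)
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample cases with replacement
        if not y_true[idx].any():
            continue  # sensitivity is undefined without positive cases
        stats.append(metric(y_true[idx], y_pred[idx]))
    lo, hi = np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return float(lo), float(hi)

def paired_permutation_test(y_true, pred_a, pred_b, metric, n_perm=10000, seed=0):
    """Two-sided paired permutation test: randomly swap the two readers'
    predictions case by case (pairing scheme is an assumption)."""
    rng = np.random.default_rng(seed)
    pred_a, pred_b = np.asarray(pred_a), np.asarray(pred_b)
    observed = abs(metric(y_true, pred_a) - metric(y_true, pred_b))
    count = 0
    for _ in range(n_perm):
        swap = rng.random(len(pred_a)) < 0.5
        a = np.where(swap, pred_b, pred_a)
        b = np.where(swap, pred_a, pred_b)
        if abs(metric(y_true, a) - metric(y_true, b)) >= observed - 1e-12:
            count += 1
    return (count + 1) / (n_perm + 1)

# Simulated example: reader A flips 10% of labels, reader B flips 30%.
sim_rng = np.random.default_rng(1)
sim_truth = sim_rng.random(200) < 0.5
sim_pred_a = sim_truth ^ (sim_rng.random(200) < 0.10)
sim_pred_b = sim_truth ^ (sim_rng.random(200) < 0.30)

point = sensitivity(sim_truth, sim_pred_a)
ci_lo, ci_hi = bootstrap_ci(sim_truth, sim_pred_a, sensitivity)
p_value = paired_permutation_test(sim_truth, sim_pred_a, sim_pred_b, sensitivity)
print(point, (ci_lo, ci_hi), p_value)
```

Adding one to both the numerator and denominator of the P value keeps it strictly positive, which is a common convention for Monte Carlo permutation tests.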

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.