Abstract

Background: This report and a companion report describe a validation of the ability of serum proteomic profiling via SELDI-TOF mass spectrometry to detect prostatic cancer. Details of this 3-stage process have been described. This report describes the development of the algorithm and results of the blinded test for stage 1.

Methods: We derived the decision algorithm used in this study from the analysis of serum samples from patients with prostate cancer (n = 181) and benign prostatic hyperplasia (BPH) (n = 143) and normal controls (n = 220). We also derived a validation test set from a separate, geographically diverse set of serum samples from 42 prostate cancer patients and 42 controls without prostate cancer. Aliquots were subjected to randomization and blinded analysis, and data from each laboratory site were subjected to the decision algorithm and decoded.

Results: Using the data collected from the validation test set, the decision algorithm was unsuccessful in separating cancer from controls with any predictive utility. Analysis of the experimental data revealed potential sources of bias.

Conclusion: The ability of the decision algorithm to successfully differentiate between prostate cancer, BPH, and control samples using data derived from serum protein profiling was compromised by bias.

Multiple laboratories have reported that the spectral patterns of mass spectrometric examinations of specimens of serum can be used to identify patients with several types of tumors (1)(2)(3)(4)(5)(6)(7). Based on work at Eastern Virginia Medical School (EVMS) and the Fred Hutchinson Cancer Research Center, a diagnostic profile of 9 spectral peaks was reported that could be used to identify patients with prostate cancer (PCa) based on evaluation of serum using SELDI-TOF mass spectrometry. These studies suggested that an accuracy >90% in classification of patients with cancer from controls could be expected (8). Such results would be very important because the current screening methods for PCa using serum concentrations of prostate-specific antigen (PSA) do not detect the majority of prostate cancers, including high-grade tumors (9).

Extensive controversy has accompanied the use of mass spectrometric techniques for identification of early cancers (10)(11)(12)(13)(14)(15)(16); there is concern for false discovery and unwarranted generalizability (17) arising from the analysis of data using methods such as SELDI-TOF mass spectrometry and other multiplex assays for the detection of disease (18)(19). In response to such controversy and because of the desire to evaluate the potential clinical utility of such methodologies, the National Cancer Institute Early Detection Research Network (EDRN) decided that a rigorous validation study should be undertaken. The validation study (20) was divided into 3 stages and was specifically targeted at evaluation of the previously published EDRN study for the detection of PCa (1). In stage 1, a group of 6 institutions reported that SELDI-TOF mass spectrometry instruments at separate sites could be accurately standardized over a 3-month period and could be used to accurately classify previously studied PCa patient and control sera using known spectral features (21). Also in stage 1, the intention was to test if the same algorithm could discriminate between cancer and noncancer samples derived from independent and geographically distinct nonoverlapping populations. The development of the original algorithm and the independent testing are the subject of this report. We present this study and a companion report as a model for biomarker validation studies.

Materials and Methods

sample selection

We identified 194 PCa patients, 216 patients with benign prostatic hyperplasia (BPH), and 1326 healthy control patients from samples collected at EVMS under a protocol approved by the EVMS Institutional Review Board and sent to the EDRN Data Management and Coordinating Center. Some patient samples were excluded for the following reasons: PCa—race not African-American or white (n = 2), age >80 years (n = 4), 1st available sample collected after biopsy (n = 3), and insufficient volume (n = 4); BPH—race not African-American or white (n = 42), age >80 years (n = 28), and insufficient volume (n = 3); control—race not African-American or white (n = 48) and age >80 years (n = 64). All nonexcluded PCa (n = 181) and BPH (n = 143) sera were selected for inclusion in the study. In addition, sera from men with no history of PCa or BPH were identified as normal controls and were selected to match the approximate age and race distribution of men with PCa and BPH. For PCa and BPH disease conditions, the age frequency distribution over 5-year intervals was constructed by race. For each age/race group, controls were selected at random, with frequency matching the larger of the case-specific age/race frequencies. The number of normal controls selected was 220.
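
The frequency matching described above can be sketched as follows. This is a minimal illustration, not the study's selection code: the record layout (dicts with `age` and `race` keys), the function names, and the fixed random seed are all assumptions. For each race and 5-year age band, it draws at random as many controls as the larger of the PCa and BPH counts in that stratum, capped by the number of controls available.

```python
import random
from collections import Counter, defaultdict

def stratum(person, width=5):
    """Return the (race, 5-year age band) stratum for one record (hypothetical layout)."""
    band = (person["age"] // width) * width
    return (person["race"], band)

def frequency_match(pca, bph, controls, seed=0):
    """Pick controls at random so each stratum holds max(#PCa, #BPH) controls."""
    rng = random.Random(seed)
    pca_counts = Counter(stratum(p) for p in pca)
    bph_counts = Counter(stratum(p) for p in bph)
    pool = defaultdict(list)
    for c in controls:
        pool[stratum(c)].append(c)
    selected = []
    for key in sorted(set(pca_counts) | set(bph_counts)):
        # Match the larger of the two case-specific frequencies in this stratum.
        target = max(pca_counts[key], bph_counts[key])
        available = pool.get(key, [])
        selected.extend(rng.sample(available, min(target, len(available))))
    return selected
```

Taking the larger of the two case frequencies keeps the control group usable as a comparison for both the PCa and the BPH groups.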

For the validation test set, 4 institutions provided 84 samples, 42 PCa and 42 normal controls, to be evaluated with the classifiers developed from the training data. Two institutions contributed 14 samples each and 2 contributed 28 samples each. Each institution provided an equal number of PCa and normal control samples. All samples were collected following strict standard operating procedure (20). Balance between PCa and normal control sample collection within each contributing biorepository produced consistent sample collection methods for case and control samples despite imbalance in the number of samples from different centers.

We divided all samples into 18 aliquots of 30 μL and distributed the aliquots evenly to 6 laboratories having SELDI-TOF mass spectrometry systems that had been optimized/standardized as reported (21). The 6 participating sites were the University of Texas Health Center at San Antonio (CTRC), University of Pittsburgh Cancer Institute (UPCI), Johns Hopkins University Medical Center (JHU), Center for Prostate Disease Research (CPDR), University of Alabama Birmingham (UAB), and EVMS. Each laboratory received 252 aliquots. In addition, 36 aliquots from a pooled reference serum sample were sent to each laboratory for quality control monitoring. Each laboratory spotted the 252 specimen and 36 pooled serum samples on 36 IMAC-3 ProteinChips® (Ciphergen Biosystems), with either 3 case and 4 control or 4 case and 3 control aliquots per chip along with 1 reference sample. The position of each aliquot was randomized separately for each laboratory, so PCa and normal control aliquots occurred in every well with equal probability. Sample replicates (aliquots) were treated as individuals in randomization. Each laboratory ran the 36 chips in 3 bioprocessors. Each laboratory used separate calibrations to convert time-of-flight to mass/charge (m/z) values.
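
One way to generate a chip layout that satisfies the constraints stated above is sketched here. The text does not describe the actual randomization procedure, so the function name, the `"REF"` label, and the choice to give exactly half of the chips 3 cases are assumptions. With 126 case and 126 control aliquots per laboratory (84 samples times 3 aliquots), 18 chips with 3 cases and 18 chips with 4 cases use every aliquot exactly once.

```python
import random

def chip_layout(case_aliquots, control_aliquots, n_chips=36, seed=0):
    """Assign aliquots to 8-spot chips: 3 or 4 cases, the rest controls, plus 1 reference."""
    rng = random.Random(seed)
    cases = list(case_aliquots)
    controls = list(control_aliquots)
    rng.shuffle(cases)
    rng.shuffle(controls)
    # Half of the chips carry 3 cases and 4 controls, the other half 4 and 3.
    n_cases_per_chip = [3] * (n_chips // 2) + [4] * (n_chips - n_chips // 2)
    rng.shuffle(n_cases_per_chip)
    chips = []
    for n_case in n_cases_per_chip:
        spots = [cases.pop() for _ in range(n_case)]
        spots += [controls.pop() for _ in range(7 - n_case)]
        spots.append("REF")   # pooled reference serum (label is hypothetical)
        rng.shuffle(spots)    # randomize well positions within the chip
        chips.append(spots)
    return chips
```

Calling the function with a different `seed` for each laboratory would give each site its own independent layout.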

mass spectrometry analysis

All serum processing steps were performed robotically with a Biomek 2000 Workstation liquid handling robot (Beckman Instruments) exactly as described by Semmes et al. (21). Mass accuracy was calibrated externally using the All-in-1 peptide molecular weight standard (Ciphergen Biosystems).

data processing

Spectral degradation was apparent for some samples. Before further analysis, all spectra were examined and assessed regarding spectral quality. A logistic regression was fitted with visual assessment of spectrum degradation as the response and 3 spectrum-specific predictor variables: a) standard deviation of spectral values, b) autocorrelation of spectrum values measured by the Durbin-Watson statistic, and c) maximum spectral intensity. The model for spectral degradation yielded 95% specificity at 95% sensitivity. Because the probability model for spectral degradation is based on measurable criteria, we used the modeled probability to exclude spectra from classifier development and testing. We classified all spectra with a modeled probability of degradation greater than P = 0.1 as degraded; this removed 51 spectra (4.53%) from the training data sets and 12 (2.47%) from the test set. In most cases, only a single replicate was eliminated from any sample. However, all replicates for 2 control group samples were excluded, leaving n = 152 control samples for classifier development.
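
The three quality predictors and the P > 0.1 exclusion rule can be sketched as below. This is an illustration under assumptions: the Durbin-Watson statistic is computed here on the mean-centred spectrum values, and the intercept and coefficients of the fitted logistic model are not reported in the text, so they are left as caller-supplied parameters.

```python
import math
from statistics import fmean, pstdev

def durbin_watson(values):
    """Durbin-Watson statistic of a mean-centred series (values near 2 mean no autocorrelation)."""
    mean = fmean(values)
    resid = [v - mean for v in values]
    num = sum((b - a) ** 2 for a, b in zip(resid, resid[1:]))
    den = sum(r * r for r in resid)
    return num / den

def quality_features(spectrum):
    """The three spectrum-specific predictors named in the text."""
    return (pstdev(spectrum), durbin_watson(spectrum), max(spectrum))

def degradation_probability(features, intercept, coefs):
    """Logistic model: P(degraded) = 1 / (1 + exp(-(b0 + b . x)))."""
    z = intercept + sum(b * x for b, x in zip(coefs, features))
    return 1.0 / (1.0 + math.exp(-z))

def is_degraded(features, intercept, coefs, cutoff=0.1):
    """Apply the P > 0.1 exclusion rule; the coefficients are placeholders, not study values."""
    return degradation_probability(features, intercept, coefs) > cutoff
```

A smooth spectrum has strongly autocorrelated neighbouring values and therefore a Durbin-Watson statistic well below 2, whereas a noise-dominated spectrum moves toward 2 or above.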

A test data set of 30% of samples in each group was chosen at random, and we used the remaining 70% to construct a disease status classifier. We identified signal peak locations using methods described in Yasui et al. (22). We subjected validation spectral data to baseline subtraction and total ion content normalization and then sent them to the Data Management and Coordinating Center for subsequent analysis. Peak intensities over the interval m/z × (1 ± 0.002) were constructed for all 2570 m/z values returned by the Yasui algorithm. We used a larger window for these secondary test data to account for slight differences between the SELDI-TOF mass spectrometry instruments in the participating laboratories.
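
A minimal sketch of total ion content normalization and of reading an intensity from a relative m/z window follows. The text does not say how the intensity within the window is summarized, so taking the maximum is an assumption, as are the function names; baseline subtraction is omitted.

```python
def tic_normalize(intensities):
    """Scale a spectrum so its total ion content (sum of intensities) equals 1."""
    total = sum(intensities)
    return [v / total for v in intensities]

def window_intensity(mz, intensities, peak_mz, rel_tol=0.002):
    """Largest intensity with m/z inside peak_mz * (1 +/- rel_tol); None if the window is empty."""
    lo, hi = peak_mz * (1 - rel_tol), peak_mz * (1 + rel_tol)
    inside = [y for x, y in zip(mz, intensities) if lo <= x <= hi]
    return max(inside) if inside else None
```

Because the window is relative, it is about ±15.6 m/z units wide at m/z 7775.93 and about ±6.5 units at m/z 3246.57.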

In addition, we used wavelet decomposition of the mass spectrometry signal to identify and measure peak intensities. Wavelet decomposition, which measures local rate of change in intensity rather than local intensity, has been shown to be effective at peak identification on shoulders of dominant peaks (23).
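
To illustrate why wavelet detail coefficients respond to local change rather than local level, here is one level of a Haar transform. The text does not state which wavelet family was used, so Haar is chosen purely as the simplest example.

```python
import math

def haar_step(signal):
    """One level of the Haar transform: returns (approximation, detail) coefficients."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    pairs = list(zip(signal[0::2], signal[1::2]))
    approx = [(a + b) / s for a, b in pairs]   # local level
    detail = [(a - b) / s for a, b in pairs]   # local change
    return approx, detail
```

A flat stretch gives a zero detail coefficient whatever its height, so a small peak riding on the shoulder of a dominant one can still stand out in the detail coefficients.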

data analysis

Using boosted logistic regression, we constructed 2 classifiers separating PCa from normal controls with the training data from EVMS. The first classifier used median peak intensity across replicates within samples at each peak alignment value as candidate predictor variables. To construct the second classifier, peak intensities were ranked within spectra and ranks binned into 100 levels. We computed median bin values across replicates within samples for each aligned m/z value. These median bin values were used as candidate predictor variables for the second classifier. The second method would be more robust if between-spectrum variability were large even after normalization and baseline subtraction.
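
The predictor construction for the second classifier can be sketched as follows. The exact binning formula and the handling of tied intensities are not given in the text, so the equal-width rank bins and the arbitrary tie order used here are assumptions.

```python
from statistics import median

def rank_bins(intensities, n_bins=100):
    """Rank the peak intensities of one spectrum and map ranks to bins 1..n_bins."""
    order = sorted(range(len(intensities)), key=lambda i: intensities[i])
    bins = [0] * len(intensities)
    for rank, idx in enumerate(order):   # rank 0 = lowest intensity
        bins[idx] = rank * n_bins // len(intensities) + 1
    return bins

def median_across_replicates(replicates):
    """Per-peak median over the replicate vectors of one sample."""
    return [median(values) for values in zip(*replicates)]
```

Because only the ordering within a spectrum is used, a spectrum whose intensities are all scaled by a constant yields the same bins, which is the robustness property the paragraph refers to.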

Boosting models were fitted using a 10-fold cross-validation stopping rule. We divided groups into 10 sets each, balanced with respect to the number of observations in each set. For each 10% set, the remaining 90% of the data were used to construct a boosting classifier with K terms (or K iterations), K = 1, 2, 3, …. For each of the 10 divisions of the data, we computed the misclassification error rate for the 10% of samples that were not used to construct the classifier and averaged the misclassification error rate over the 10 sets at every boosting iteration. The average misclassification error rate across the 10 cross-validation sets was used to determine the number of iterations (M) necessary to achieve a best model. The boosted logistic regression model was then computed employing all data in the training set, stopping after M iterations.
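
The stopping rule can be sketched independently of any particular boosting implementation. In this illustration the function names are assumptions, and the per-fold error curves are taken as given (they would come from fitting the boosted model on the other 9 folds and scoring the held-out fold after each iteration). "Best model" is read here as the first iteration count that minimizes the averaged error.

```python
import random

def stratified_folds(labels, n_folds=10, seed=0):
    """Assign each observation to a fold, balancing every class across the folds."""
    rng = random.Random(seed)
    folds = [None] * len(labels)
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[i] = pos % n_folds
    return folds

def choose_iterations(fold_errors):
    """fold_errors[f][k-1] is the held-out error of fold f after k boosting iterations.

    Returns (M, mean error at M), where M is the first minimizer of the mean error.
    """
    n_iter = len(fold_errors[0])
    mean_err = [sum(f[k] for f in fold_errors) / len(fold_errors) for k in range(n_iter)]
    best = min(range(n_iter), key=lambda k: mean_err[k])
    return best + 1, mean_err[best]
```

The final model is then refitted on the full training set and stopped after the returned M iterations.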

For evaluation, we scored the 30% test set data using both classifiers, with positive values of the boosting linear predictor indicating cases and negative values indicating controls. For the 30% test set, we computed sensitivity and specificity of the classifier. The validating test data were scored with the same classifiers. We also examined the sensitivity and specificity for data generated from each laboratory.
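
The scoring rule translates directly into sensitivity and specificity. In this sketch a linear predictor of exactly zero is counted as a control call, which is an assumption because the text only covers positive and negative values.

```python
def sensitivity_specificity(scores, is_case):
    """Positive linear predictor = predicted case; zero or negative = predicted control."""
    tp = sum(1 for s, c in zip(scores, is_case) if c and s > 0)
    fn = sum(1 for s, c in zip(scores, is_case) if c and s <= 0)
    tn = sum(1 for s, c in zip(scores, is_case) if not c and s <= 0)
    fp = sum(1 for s, c in zip(scores, is_case) if not c and s > 0)
    return tp / (tp + fn), tn / (tn + fp)
```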

Results

construction of classifier for discriminating prostate cancer

After peak selection, we subjected the data to 2 classifier development approaches: the first used median peak intensities and the second used median binned peak ranks. The boosting cross-validation error rates decreased through 3 iterations, with a final cross-validation error rate of 25% for the classifier constructed from median peak intensities. The experimental m/z values for the 3 peaks included in the classifier were 7775.93, 3651.38, and 3246.57, listed in order of entry into the classifier. In cross-validation, we observed sensitivity 71% and specificity 79%. The classifier constructed from median binned peak ranks required only 2 iterations to achieve a minimum cross-validation error rate of 23%. The m/z values for the 2 peaks included in the classifier were 5943.44 and 3449.77. Cross-validation of this set yielded sensitivity 63% and specificity 89%. It was expected that the cross-validation error rates were somewhat optimistic. When the 2 classifiers were used to predict status in the 30% test data set, the misclassification error rates were 27% and 28%, respectively. For the classifier constructed using median intensities, we observed sensitivity 59% and specificity 85%. The classifier constructed from median binned peak ranks had sensitivity 57% and specificity 82%.

analysis of data for sources of bias

Postexperimental analysis can reveal hidden bias by evaluating the overall detail of collected data. To identify potential bias, we constructed spectral intensity heat maps with spectra arranged with respect to sample characteristics such as case status and specimen collection date within case status. In this analysis, we observed differences in mass spectrometry profiles between prostate cancer cases collected before and after 1996. Heat maps of the mass spectrometry profiles around primary peaks 7775.93 and 5943.44, which were dominant features for classification, suggested that the PCa cases collected before 1996 have considerably different spectral profiles from those collected in 1996 or later (Fig. 1A and B). For the 1st peaks that enter into each classifier, PCa cases collected after 1996 appear to have spectral profiles more similar to normal control samples, which were all collected after 1995. We also observed that overall higher intensities were associated with normal control and recent PCa cases compared with older PCa cases. The fact that older cases contained lower intensities and were all cancers was strong evidence of sample bias. Interestingly, this was not the case for the secondary peaks (Fig. 1C and D). Peaks that entered the classifiers in the 2nd or 3rd boosting iteration exhibited little difference between PCa samples collected before and after 1996. When we compiled all potential confounding aspects of the sample collection (see Table 1), we uncovered some disparities in time of storage (reflected as date of collection) and the number of freeze-thaws.

Figure 1. Serum spectra profiles in the vicinity of decision peaks.

A. Primary peak for the classifier based on median intensities. B. Primary peak for the classifier based on median binned rank intensities. C. Secondary peak for the classifier based on median intensities. D. Secondary peak for the classifier based on median binned rank intensities. The arrows indicate the peak of interest. PCa and normal control serum specimens with early and late collection periods for the PCa specimens are indicated. Specimens are ordered by collection date within each group.

Figure 2. ROC curves for the study A median intensity boosting classifier based on peak intensities obtained using the Yasui method for predicting prostate cancer status in 42 PCa and 42 normal control serum specimens collected from 4 biorepositories and processed by SELDI-TOF-MS instruments at 6 EDRN laboratories: EVMS (A), UAB (B), CTRC (C), CPDR (D), UPCI (E), and JHU (F).

Table 1. Known characteristics of the serum specimens used for training the classifier.

| Characteristic | Normal | PCa | BPH |
|---|---|---|---|
| n | 220 | 181 | 143 |
| Mean age, years (SD) | 62.5 (6.6) | 60.7 (6.7) | 65.8 (5.8) |
| Year blood collected, % | | | |
| 1980–1989 | 0.0 | 8.3 | 0.0 |
| 1990–1995 | 1.4 | 52.5 | 90.2 |
| 1996–2001 | 98.6 | 39.2 | 9.8 |
| Race, % | | | |
| White | 82.7 | 79.6 | 93.0 |
| African-American | 17.3 | 20.4 | 7.0 |
| Neoadjuvant treatment, % | | | |
| 0 | 100.0 | 85.6 | 99.3 |
| 1 | 0.0 | 11.6 | 0.0 |
| Data missing | 0.0 | 2.8 | 0.7 |
| Source, % | | | |
| DTU Clinic | 0.0 | 88.4 | 0.0 |
| Sentara Hospital | 100.0 | 8.8 | 94.4 |
| Data missing | 0.0 | 2.8 | 5.6 |
| Estimated freeze-thaw, % | | | |
| 1 or 2 | 63.2 | 25.4 | 57.3 |
| >2 | 35.4 | 68.7 | 22.2 |
| Data missing | 1.4 | 5.9 | 20.4 |

Columns are diagnosis (Dx) groups.

evaluation of classification robustness

Similarity in secondary peak intensity values between pre- and post-1996 PCa samples suggests there might be some ability to discriminate between PCa and normal control samples in the independent 84-sample test set collected from the 4 biorepositories. Therefore, we performed all subsequent analysis both with and without the pre-1996 data. We will refer to the initial data set as study A and after removal of the pre-1996 spectra as study B. We performed the same classifier construction approaches on study A and study B; because of space limitations, the results of study B are included as supplemental data. Fig. 2 displays ROC curves showing the utility of the classifier constructed from median intensities in predicting cancer status in study A. For Pittsburgh, the best point along the ROC curve produces 58.3% correct classification. Both EVMS and CTRC achieve 67.9% correct prediction at the best point along the ROC curve. Across the 6 laboratories, the average maximum correct prediction probability is 62.8%. The median intensity classifier from study A has significant ability to predict cancer status only for the 2 laboratories EVMS and CTRC. ROC curves for the median binned rank classifier approach in study A (see Supplemental Data Fig. 2) demonstrate similar classifier function, with a mean across the 6 laboratories of 64.6%.
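
The "best point along the ROC curve" can be read as the score threshold that maximizes the fraction of specimens classified correctly; that reading, and the function name, are assumptions. Under it, the reported 58.3% and 67.9% are consistent with 49 and 57 of 84 specimens classified correctly.

```python
def best_accuracy(scores, is_case):
    """Maximum fraction correctly classified over all score thresholds."""
    n = len(scores)
    # Candidate thresholds: every observed score, plus one that calls everything a control.
    thresholds = sorted(set(scores)) + [float("inf")]
    best = 0.0
    for t in thresholds:
        correct = sum(1 for s, c in zip(scores, is_case) if (s >= t) == bool(c))
        best = max(best, correct / n)
    return best
```

With equal numbers of cases and controls, calling every specimen a control already gives 50% correct, so values near 60% indicate only a modest gain over an uninformative rule.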

ROC curves constructed for the 4 classifiers obtained when we restrict sample collection to the post-1996 time period indicate no improvement in predictive utility for the models tested, except for 1 model employing median binned rank intensity values for peak locations and peak intensities measured through wavelet detail functions (see Supplemental Data Fig. 3).

multilaboratory testing of the classifier; agreement between laboratories

We next examined the across-laboratory agreement for each classifier as applied to the test set. Again, we analyzed the data with and without the pre-1996 data (see Supplemental Data Tables 1 and 2). Laboratory agreement among the 6 sites in predicting case status is shown in Table 2. Agreement exceeds 80% in all but 1 instance (agreement between laboratories at JHU and CPDR was 78.6%). For the median intensity classifier, the association of cancer status prediction across laboratories was significant at P <0.05 (Fisher exact χ²). There was significant association at P <0.05 for the prediction of cancer status across all but 2 laboratory pairs (UAB with CTRC and UAB with JHU) when using the median binned peak ranks classifier. The high agreement between laboratories is confounded by the poor predictive ability of the models, which places constraints on the number of samples for which the prediction can differ. Both classifiers predicted the majority of samples as controls (see Table 3).
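
The confounding noted above can be made concrete. The expected agreement values in parentheses in Table 2 appear consistent with chance agreement computed from each laboratory's marginal call counts in Table 3; this is an inference from the numbers, not a formula given in the text. For example, EVMS and UAB called 3 and 5 of 84 specimens cancer with the intensity classifier, which gives 90.9% expected agreement.

```python
def percent_agreement(calls_a, calls_b):
    """Percent of specimens given the same call by two laboratories."""
    same = sum(1 for a, b in zip(calls_a, calls_b) if a == b)
    return 100.0 * same / len(calls_a)

def expected_agreement(n_pos_a, n_pos_b, n_total):
    """Chance agreement (percent) implied by each laboratory's marginal call rates."""
    pa, pb = n_pos_a / n_total, n_pos_b / n_total
    return 100.0 * (pa * pb + (1 - pa) * (1 - pb))

print(round(expected_agreement(3, 5, 84), 1))   # 90.9 (EVMS vs UAB, intensities)
print(round(expected_agreement(3, 19, 84), 1))  # 75.4 (EVMS vs CPDR, intensities)
```

When nearly every specimen is called a control, two laboratories agree on most specimens by chance alone, so high raw agreement says little about reproducibility of a useful signal.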

Table 2. Percent agreement between sites in classification of 84 phase 1C samples.

| Site | EVMS | UAB | CTRC | CPDR | UPCI | JHU |
|---|---|---|---|---|---|---|
| EVMS | | 97.6¹ (90.9) | 100.0¹ (93.1) | 81.0² (75.4) | 97.6¹ (90.9) | 81.0² (75.4) |
| UAB | 85.7¹ (78.0) | | 97.6¹ (90.9) | 83.3¹ (74.1) | 95.2¹ (88.8) | 81.0¹ (74.1) |
| CTRC | 82.1¹ (71.4) | 84.5 (81.2) | | 81.0² (75.4) | 97.6¹ (90.9) | 81.0² (75.4) |
| CPDR | 86.9¹ (71.4) | 89.3¹ (81.2) | 85.7¹ (73.8) | | 83.3¹ (74.1) | 78.6¹ (65.0) |
| UPCI | 88.1¹ (67.7) | 81.0² (75.9) | 82.1¹ (69.7) | 86.9¹ (69.7) | | 81.0¹ (74.1) |
| JHU | 86.9¹ (74.3) | 89.3 (85.5) | 88.1¹ (77.1) | 90.5¹ (77.1) | 86.9¹ (72.4) | |

Values above the diagonal are agreement based on the boosted logistic regression classifier fitted to peak intensities. Values below the diagonal are agreement based on the boosted logistic regression classifier employing peak rank values. Expected probability of agreement in parentheses.

¹ P <0.01; ² P <0.05.

Table 3. Marginal probability expressed as a percent (number of serum specimens) classified as prostate cancer of 84 samples split equally between case and control.

| Model | EVMS | UAB | CTRC | CPDR | UPCI | JHU |
|---|---|---|---|---|---|---|
| Real boosting with intensities | 3.6 (3) | 6.0 (5) | 3.6 (3) | 22.6 (19) | 6.0 (5) | 22.6 (19) |
| Real boosting with ranked peaks | 19.0 (16) | 4.8 (4) | 15.5 (13) | 21.4 (18) | 15.5 (13) | 10.7 (9) |

The classifiers were constructed from training data sets.

interstudy analysis for the presence of m/z peaks that display consistent discriminatory value

The classifiers constructed for the training data may select candidate predictors that are not truly predictive (type I error) and may fail to select markers that are predictive (type II error). Without a priori knowledge regarding which classifiers to investigate among more than a thousand candidates, the probability of committing either type I or type II error is high. Accordingly, we investigated the predictive utility in the validation study data for a set of candidate markers with the best marginal predictive utility in the training data.

We plotted training data prediction error rates against validation study error rates, allowing sample-specific cut points for classifying diseased and nondiseased groups. Markers that have inconsistent effect-direction (e.g., mean in the PCa group is higher than mean in the normal controls in the training data but lower in the data of the confirmation study) are plotted in red (Fig. 3). Markers that have consistent effect-direction are plotted in black. We examined 96 candidate markers with the best predictive utility in the training data. To account for differential sample selection and laboratory effects, sample-specific cut points were allowed for separating PCa from normal controls. In addition to plotting consistent and inconsistent markers with different color symbols, symbol size is constructed proportional to marker mean intensity in the training data (Fig. 3).
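
The effect-direction check that decides a marker's color can be sketched as a comparison of group means in the two data sets. The function names are hypothetical, and treating a zero difference as inconsistent is an assumption.

```python
from statistics import fmean

def effect_direction(case_values, control_values):
    """+1 if the case mean exceeds the control mean, -1 if it is lower, 0 if equal."""
    diff = fmean(case_values) - fmean(control_values)
    return (diff > 0) - (diff < 0)

def is_consistent(train_cases, train_controls, valid_cases, valid_controls):
    """True when the marker moves in the same direction in both data sets."""
    d_train = effect_direction(train_cases, train_controls)
    d_valid = effect_direction(valid_cases, valid_controls)
    return d_train != 0 and d_train == d_valid
```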

Figure 3. Classification error rate in the validation data for 96 peak locations with the lowest classification error rate in the Prostate 2002 training data.

Mean peak intensity in the training data is indicated by the size of each bubble, with higher intensity peaks having larger bubbles. Black bubbles indicate peak features where the effect direction is consistent between training and validation data (cases have higher intensity than controls in both data sets or cases have lower intensity than controls in both data sets). Red bubbles indicate peaks where the effect direction is inconsistent.

Among the 96 markers with the best predictive utility, approximately 15 per laboratory had an inconsistent effect direction (EVMS 14, UAB 15, CPDR 14, CTRC 18, UPITT 15, JHU 15). If we assume that results for the 6 laboratories in the validation sample are independent of one another, the expected number of markers with 5 or 6 consistent observations across the validation laboratories is 10.5. We observed 76 such markers, with 5 (n = 11) or 6 (n = 65) consistent observations, significantly more than expected (χ2 = 458.77, df = 1, P <0.0001). Assuming that a consistent direction in 5 of 6 laboratories is indicative of a consistent effect, observed consistency is still much greater than chance (χ2 = 32.67, df = 1, P <0.0001). Among the 54 markers with the best performance in the training data, 53 were consistent across 5 or 6 laboratories. The number of markers that were inconsistent across training and evaluation data sets averaged between 1 and 2 per laboratory (EVMS 0, UAB 3, CPDR 1, CTRC 2, UPITT 1, JHU 1).
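The reported expected value of 10.5 and χ2 of 458.77 can be reproduced with a short calculation. The sketch below assumes, beyond what the text states, that under the null hypothesis each laboratory agrees with the training-data direction with probability 0.5, and that the test is a two-cell goodness-of-fit comparison (markers with 5 or 6 consistent laboratories vs all others). The function names are hypothetical.

```python
from math import comb


def prob_at_least(k_min, n_labs, p=0.5):
    """Binomial probability of at least k_min consistent laboratories."""
    return sum(
        comb(n_labs, k) * p**k * (1 - p) ** (n_labs - k)
        for k in range(k_min, n_labs + 1)
    )


def chi_square_two_cell(observed, expected, total):
    """Goodness-of-fit statistic for a 2-cell table (df = 1)."""
    obs = [observed, total - observed]
    exp = [expected, total - expected]
    return sum((o - e) ** 2 / e for o, e in zip(obs, exp))


n_markers = 96
# P(5 or 6 of 6) = (6 + 1) / 64, so the expected count is 96 * 7 / 64.
expected = n_markers * prob_at_least(5, 6)
chi2 = chi_square_two_cell(76, expected, n_markers)
print(f"expected = {expected:.1f}, chi2 = {chi2:.2f}")
# prints: expected = 10.5, chi2 = 458.77
```

That these assumptions recover both reported figures suggests, but does not confirm, that this is the calculation behind them. The derivation of the second statistic (χ2 = 32.67) is not stated in the text and is not attempted here.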

Discussion

The algorithm described in this study is identical to the one used in our earlier analysis of analytical reproducibility (21). In that study, we demonstrated that the algorithm could correctly differentiate between PCa and control samples when those samples were derived from the same patient cohort used to develop the algorithm. Over the intervening time period, we maintained instrument output optimization by weekly calibration to 3 serum reference peaks as described (21). Thus, the low probability of correctly predicting prostate cancer among the 84 test set samples indicates that the initial samples collected for training the classifier differed markedly from the 84 samples collected for evaluating consistency of case assignment. Further investigation of the samples collected for developing the classifier revealed that the collection period for PCa samples was considerably different from the collection period for normal control samples. Of the 127 PCa samples used to construct the classifier, 78 (61.4%) were collected before 1996. In contrast, only 1 of the normal control samples (0.7%) was collected before 1996. This storage bias was obvious in retrospect. However, when designing discovery studies aimed at distinguishing early, less aggressive cancer from late or aggressive cancer, this storage bias may be difficult to avoid. Specifically, in the case of prostate cancer, the incidence of advanced disease (Gleason >8) has declined so dramatically that it is difficult to accrue significant numbers of patients with high-grade disease at a single clinical site. Other disease models can be expected to present different but equally cryptic challenges as clinical treatment options change.
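A collection-date imbalance of this kind can be checked before a classifier is trained. The sketch below is illustrative only: `share_before` is a hypothetical helper, and the individual collection years are placeholders, because the text reports only the counts (78 of 127 PCa samples collected before 1996).

```python
def share_before(collection_years, cutoff_year=1996):
    """Return the fraction of samples collected before cutoff_year."""
    if not collection_years:
        raise ValueError("no samples supplied")
    early = sum(1 for year in collection_years if year < cutoff_year)
    return early / len(collection_years)


# Counts from the text: 78 of the 127 PCa training samples predate 1996.
# The years themselves are placeholders; only before/after 1996 matters.
pca_years = [1995] * 78 + [1997] * (127 - 78)
pca_share = share_before(pca_years)
print(f"PCa samples collected before 1996: {100 * pca_share:.1f}%")
# prints: PCa samples collected before 1996: 61.4%
```

Running the same helper on the control group's collection years and comparing the two shares would expose the kind of confounding between case status and storage time described above.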

Our analysis uncovered possible sources of storage time variability that arose from different collection protocols. Specifically, many samples collected before 1996 were derived from in-house PSA testing conducted at EVMS. In the intervening years, the standardization of laboratory tests for PSA reduced the number of serum samples derived from in-house testing. This change reduced the number of freeze-thaw cycles for the majority of post-1996 samples from 1 to 0. Also of critical impact to this study was the decline in the number of patients with advanced vs early PCa, a trend being experienced wherever PSA testing is aggressively employed. Thus, researchers can expect that increased demand for sample numbers will be especially affected by decreasing sample availability and variations in the storage process. As we learn more about the ideal conditions for both collection and storage of samples, even more changes may be introduced, and these changes might introduce new bias. These are critical issues, often overlooked in the biomarker discovery process, that are likely to be the single greatest reason most biomarker discoveries fail to be validated.

Differences associated with sample age may result from serum degradation associated with storage time or freeze-thaw cycles. As has been described before, these and other preanalytical variables can greatly affect the peptide/protein content of samples (24)(25)(26)(27)(28)(29)(30). At peaks used in the 2 classifiers developed for this study, PCa samples collected after 1996 appeared to be more similar to normal controls. Also to be considered is the possibility that clinical protocols, such as a decision to collect serum after voiding or fasting, may become standardized without notification to the research group. A particularly insidious issue is the struggle with “improving” or “standardizing” collection protocols based on knowledge gained regarding sample stability. Although a close relationship between the clinic and the research laboratory helps reduce some concerns, even careful practice cannot prevent variability. Thus, it is recommended that poststudy data analysis such as was performed here be an integral component of biomarker discovery and validation. In fact, the collective experiences of many laboratories leading human biomarker discovery efforts have led to calls for both experimental standards (31) and uniformity in sample preparation (32).

Because our global analysis of the data suggested potential discriminating elements in the post-1996 data, we decided to construct a new decision algorithm. Additionally, the study population identified for stage 2 of the overall validation process was derived from multiple laboratory sites under stricter sample storage criteria, including a requirement for a post-2002 collection date. Accordingly, such a study was initiated, and its results are reported in the companion article.

Grant/funding Support: This work was supported by the Early Detection Research Network, National Cancer Institute Grant CA084986 (to O.J.S.) and Grants CA86402 (to I.M.T.), CA84968 (to W.L.B.), CA86368 (to Z.F.), CA86359 (to W.E.G.), CA85067 (to O.J.S.), and CA86323 (to A.P.).

Financial Disclosures: None declared.

1

Nonstandard abbreviations: EVMS, Eastern Virginia Medical School; PCa, prostate cancer; PSA, prostate-specific antigen; EDRN, National Cancer Institute Early Detection Research Network; BPH, benign prostatic hyperplasia; CTRC, University of Texas Health Center at San Antonio; UPCI, University of Pittsburgh Cancer Institute; JHU, Johns Hopkins University Medical Center; CPDR, Center for Prostate Disease Research; UAB, University of Alabama Birmingham.

References

1

Adam BL, Qu Y, Davis JW, Ward MD, Clements MA, Cazares LH, et al. Serum protein fingerprinting coupled with a pattern-matching algorithm distinguishes prostate cancer from benign prostate hyperplasia and healthy men. Cancer Res 2002;62:3609-3614.

2

Li J, Zhang Z, Rosenzweig J, Wang YY, Chan DW. Proteomics and bioinformatics approaches for identification of serum biomarkers to detect breast cancer. Clin Chem 2002;48:1296-1304.

3

Petricoin EF, Ardekani AM, Hitt BA, Levine PJ, Fusaro VA, Steinberg SM, et al. Use of proteomic patterns in serum to identify ovarian cancer. Lancet 2002;359:572-577.

4

Petricoin EF 3rd, Ornstein DK, Paweletz CP, Ardekani A, Hackett PS, Hitt BA, et al. Serum proteomic patterns for detection of prostate cancer. J Natl Cancer Inst 2002;94:1576-1578.

5

Rosty C, Christa L, Kuzdzal S, Baldwin WM, Zahurak ML, Carnot F, et al. Identification of hepatocarcinoma-intestine-pancreas/pancreatitis-associated protein I as a biomarker for pancreatic ductal adenocarcinoma by protein biochip technology. Cancer Res 2002;62:1868-1875.

6

Vlahou A, Laronga C, Wilson L, Gregory B, Fournier K, McGaughey D, et al. A novel approach toward development of a rapid blood test for breast cancer. Clin Breast Cancer 2003;4:203-209.

7

Vlahou A, Schellhammer PF, Mendrinos S, Patel K, Kondylis FI, Gong L, et al. Development of a novel proteomic approach for the detection of transitional cell carcinoma of the bladder in urine. Am J Pathol 2001;158:1491-1502.

8

Qu Y, Adam BL, Yasui Y, Ward MD, Cazares LH, Schellhammer PF, et al. Boosted decision tree analysis of surface-enhanced laser desorption/ionization mass spectral serum profiles discriminates prostate cancer from noncancer patients. Clin Chem 2002;48:1835-1843.

9

Thompson IM, Pauler DK, Goodman PJ, Tangen CM, Lucia MS, Parnes HL, et al. Prevalence of prostate cancer among men with a prostate-specific antigen level ≤4.0 ng per milliliter. N Engl J Med 2004;350:2239-2246.

10

Baggerly KA, Morris JS, Coombes KR. Reproducibility of SELDI-TOF protein patterns in serum: comparing datasets from different experiments. Bioinformatics 2004;20:777-785.

11

Diamandis EP. Analysis of serum proteomic patterns for early cancer diagnosis: drawing attention to potential problems. J Natl Cancer Inst 2004;96:353-356.

12

Diamandis EP. Peptidomics for cancer diagnosis: present and future. J Proteome Res 2006;5:2079-2082.

13

Diamandis EP. Validation of breast cancer biomarkers identified by mass spectrometry. Clin Chem 2006;52:771-772; author reply 2.

14

Grizzle W, Semmes O, Bigbee W, Zhu L, Malik G, Oelschlager D, Manne B. The need for the review and understanding of SELDI/MALDI mass spectroscopy data prior to analysis. Cancer Informatics 2005;1:86-97.

15

Hortin GL, Jortani SA, Ritchie JC Jr, Valdes R Jr, Chan DW. Proteomics: a new diagnostic frontier. Clin Chem 2006;52:1218-1222.

16

Sorace JM, Zhan M. A data review and re-assessment of ovarian cancer serum proteomic profiling. BMC Bioinformatics 2003;4:24.

17

Ransohoff DF. Bias as a threat to the validity of cancer molecular-marker research. Nat Rev Cancer 2005;5:142-149.

18

Grizzle W, Semmes O, Bigbee W, Malik G, Miller E, Manne B, et al. Use of mass spectrographic methods to identify disease processes. Patrinos G, Ansorg W, eds. Molecular Diagnosis 2005;Vol. 17:211-222.

19

Sharp V, Utz PJ. Technology insight: can autoantibody profiling improve clinical practice? Nat Clin Pract Rheumatol 2007;3:96-103.

20

Grizzle WE, Adam BL, Bigbee WL, Conrads TP, Carroll C, Feng Z, et al. Serum protein expression profiling for cancer detection: validation of a SELDI-based approach for prostate cancer. Dis Markers 2003;19:185-195.

21

Semmes OJ, Feng Z, Adam BL, Banez LL, Bigbee WL, Campos D, et al. Evaluation of serum protein profiling by surface-enhanced laser desorption/ionization time-of-flight mass spectrometry for the detection of prostate cancer: I. Assessment of platform reproducibility. Clin Chem 2005;51:102-112.

22

Yasui Y, McLerran D, Adam BL, Winget M, Thornquist M, Feng Z. An automated peak identification/calibration procedure for high-dimensional protein measures from mass spectrometers. J Biomed Biotechnol 2003;2003:242-248.

23

Randolph TW, Yasui Y. Multiscale processing of mass spectrometry data. Biometrics 2006;62:589-597.

24

Banks RE, Stanley AJ, Cairns DA, Barrett JH, Clarke P, Thompson D, Selby PJ. Influences of blood sample processing on low-molecular-weight proteome identified by surface-enhanced laser desorption/ionization mass spectrometry. Clin Chem 2005;51:1637-1649.

25

Drake SK, Bowen RA, Remaley AT, Hortin GL. Potential interferences from blood collection tubes in mass spectrometric analyses of serum polypeptides. Clin Chem 2004;50:2398-2401.

26

Hsieh SY, Chen RK, Pan YH, Lee HL. Systematical evaluation of the effects of sample collection procedures on low-molecular-weight serum/plasma proteome profiling. Proteomics 2006;6:3189-3198.

27

Karsan A, Eigl BJ, Flibotte S, Gelmon K, Switzer P, Hassell P, et al. Analytical and preanalytical biases in serum proteomic pattern analysis for breast cancer diagnosis. Clin Chem 2005;51:1525-1528.

28

Timms JF, Arslan-Low E, Gentry-Maharaj A, Luo Z, T’Jampens D, Podust VN, et al. Preanalytic influence of sample handling on SELDI-TOF serum protein profiles. Clin Chem 2007;53:645-656.

29

Traum AZ, Wells MP, Aivado M, Libermann TA, Ramoni MF, Schachter AD. SELDI-TOF MS of quadruplicate urine and serum samples to evaluate changes related to storage conditions. Proteomics 2006;6:1676-1680.

30

West-Nielsen M, Hogdall EV, Marchiori E, Hogdall CK, Schou C, Heegaard NH. Sample handling for mass spectrometric proteomic investigations of human sera. Anal Chem 2005;77:5114-5123.

31

Mischak H, Apweiler R, Banks RE, Conaway M, Coon J, Dominiczak A, et al. Clinical proteomics: a need to define the field and to begin to set adequate standards. Proteomics Clin Appl 2007;1:148-156.

32

Rai AJ, Gelfand CA, Haywood BC, Warunek DJ, Yi J, Schuchard MD, et al. HUPO Plasma Proteome Project specimen collection and handling: towards the standardization of parameters for plasma proteome samples. Proteomics 2005;5:3262-3277.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model)
