Author manuscript; available in PMC: 2012 May 10.
Published in final edited form as: Stat Med. 2011 Jan 13;30(10):1105–1117. doi: 10.1002/sim.4154

On the C-statistics for Evaluating Overall Adequacy of Risk Prediction Procedures with Censored Survival Data

Hajime Uno a,b,*, Tianxi Cai b, Michael J Pencina c, Ralph B D'Agostino d, L J Wei b
PMCID: PMC3079915  NIHMSID: NIHMS255748  PMID: 21484848

Abstract

For modern evidence-based medicine, a well thought-out risk scoring system for predicting the occurrence of a clinical event plays an important role in selecting prevention and treatment strategies. Such an index system is often established based on the subject’s “baseline” genetic or clinical markers via a working parametric or semi-parametric model. To evaluate the adequacy of such a system, C-statistics are routinely used in the medical literature to quantify the capacity of the estimated risk score to discriminate among subjects with different event times. The C-statistic provides a global assessment of a fitted survival model for the continuous event time rather than focusing on the prediction of t-year survival for a fixed time. When the event time is possibly censored, however, the population parameters corresponding to the commonly used C-statistics may depend on the study-specific censoring distribution. In this article, we present a simple C-statistic without this shortcoming. The new procedure consistently estimates a conventional concordance measure which is free of censoring. We provide a large sample approximation to the distribution of this estimator for making inferences about the concordance measure. Results from numerical studies suggest that the new procedure performs well in finite samples.

Keywords: AUC, Cox’s proportional hazards model, Framingham risk score, ROC

1. INTRODUCTION

For modern clinical medicine, risk prediction procedures are valuable tools for disease prevention and management. Pioneered by the Framingham study, risk score systems have been established for assessing individual risks of developing cardiovascular diseases, cancer or many other conditions within a certain time period [1, 2, 3, 4]. A key component in the assessment of risk algorithm performance is its ability to distinguish subjects who will develop an event (“cases”) from those who will not (“controls”). This concept, known as discrimination, has been well studied and quantified for binary outcomes using measures such as the estimated area under the Receiver Operating Characteristic (ROC) curve (AUC), which is also referred to as a “C-statistic” [5]. Such a statistic is an estimated conditional probability that, for any pair of “case” and “control,” the predicted risk of an event is higher for the “case” [6].
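The pairwise interpretation of the binary-outcome C-statistic can be made concrete with a small sketch. The function name `binary_c_statistic` and the convention of counting tied scores as one half are assumptions made for illustration; they are not taken from the paper.

```python
from itertools import product


def binary_c_statistic(scores, labels):
    """Fraction of (case, control) pairs in which the case has the higher score.

    Tied scores contribute 1/2 (an assumed convention, not from the paper).
    labels: 1 for a case (event occurred), 0 for a control.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    total = 0.0
    for case_score, control_score in product(cases, controls):
        if case_score > control_score:
            total += 1.0
        elif case_score == control_score:
            total += 0.5
    return total / (len(cases) * len(controls))


# Toy data: two cases (scores 0.9, 0.3) and three controls (0.8, 0.6, 0.2).
print(binary_c_statistic([0.9, 0.8, 0.3, 0.6, 0.2], [1, 0, 1, 0, 0]))
```

In the toy data, four of the six case-control pairs are ordered correctly, so the statistic is 4/6.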

If the primary response variable is the time to a certain event, the aforementioned procedure for binary outcomes can be used to quantify the ability of the risk score system to differentiate cases from controls at a time point t. If one is not interested in a particular time point, a standard concordance measure may be used to evaluate the overall performance of the risk scoring system. Specifically, let T be the event time, Z be a p × 1 covariate vector, and g(Z) be an estimated risk score for subjects with covariate vector Z. There is a large class of measures to quantify how well the risk score g(Z) predicts the distribution of T or a function thereof. Good reviews are given by Korn & Simon [7] and, more recently, by Hielscher et al. [8]. Such prediction measures can be classified into two broad classes, one based on explicit loss functions between the risk score and the survival time and the other based on rank correlations between these two quantities. The C-statistic proposed by Harrell et al. [9, 10, 11] is essentially a rank-correlation measure, motivated by Kendall’s tau for censored survival data [12]. A critical issue for rank correlation methods is how to order survival times in the presence of censoring. Brown et al. [12] used all observations and assigned probability scores to pairs in which ordering is not obvious due to censoring, based on the pooled Kaplan-Meier estimate for T. However, the score based on the pooled Kaplan-Meier estimate may not be appropriate when the covariates are associated with T. Alternative forms of the C-statistic considered in [9, 10, 11, 13] use only so-called “useable” pairs and calculate the proportion of concordant pairs among them. However, such C-statistics estimate population parameters that may depend on the current study-specific censoring distribution. In this article, we propose a modified C-statistic which is consistent for a population concordance measure that is free of censoring.
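The "useable pairs" idea can be sketched as follows. This is a minimal illustration, not the exact estimator of the cited references: a pair is treated as useable when the subject with the shorter observed time had an event, tied observed times are ignored, and tied scores count as one half. The function name `usable_pairs_c` and those tie conventions are assumptions.

```python
def usable_pairs_c(times, events, scores):
    """Proportion of concordant pairs among useable pairs.

    times:  observed follow-up times
    events: 1 if the event was observed, 0 if the time is censored
    scores: risk scores (higher score = higher predicted risk)
    """
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            # If subject i is censored, we cannot tell whether i failed before j.
            continue
        for j in range(n):
            if times[i] < times[j]:
                usable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5  # assumed tie convention
    if usable == 0:
        raise ValueError("no useable pairs")
    return concordant / usable


times = [2, 4, 5, 7]
events = [1, 0, 1, 1]
scores = [3.0, 1.0, 0.2, 0.5]
print(usable_pairs_c(times, events, scores))
```

Which pairs are useable depends on which subjects happen to be censored. This is the mechanism by which the limiting value of such a statistic can depend on the censoring distribution.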

More specifically, for two independent copies $\{(T_1, g(Z_1))', (T_2, g(Z_2))'\}$ of $(T, g(Z))'$, a commonly used concordance measure is

$$C = \mathrm{pr}(g(Z_1) > g(Z_2) \mid T_2 > T_1) \tag{1.1}$$

[]. When $T$ is subject to right censoring, as discussed in Heagerty & Zheng [14], one would typically consider a modified $C_\tau$ with a fixed, prespecified follow-up period $(0, \tau)$, where

$$C_\tau = \mathrm{pr}(g(Z_1) > g(Z_2) \mid T_2 > T_1,\ T_1 < \tau). \tag{1.2}$$
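To see what (1.2) measures, it may help to compute its empirical analogue when every event time is fully observed, that is, with no censoring at all. The sketch below is an illustration under that assumption. The function name `truncated_concordance` is hypothetical, and tied event times or tied scores are simply not counted as concordant.

```python
def truncated_concordance(event_times, scores, tau):
    """Empirical analogue of (1.2) for fully observed event times.

    Among ordered pairs (i, j) with T_i < T_j and T_i < tau, return the
    fraction in which the earlier-failing subject i has the higher score.
    """
    concordant = 0
    comparable = 0
    n = len(event_times)
    for i in range(n):
        if event_times[i] >= tau:
            continue  # the earlier time must fall inside the follow-up window
        for j in range(n):
            if event_times[i] < event_times[j]:
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1
    if comparable == 0:
        raise ValueError("no comparable pairs before tau")
    return concordant / comparable


event_times = [1, 2, 3, 4]
scores = [4, 1, 3, 2]
print(truncated_concordance(event_times, scores, tau=2.5))
print(truncated_concordance(event_times, scores, tau=1.5))
```

With `tau=2.5`, the subjects failing at times 1 and 2 each serve as the earlier member of a pair, giving 3 concordant pairs out of 5. With `tau=1.5`, only the first subject qualifies and all 3 of its pairs are concordant.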

Estimation of (1.1) or (1.2) when the event time may be censored, however, is not straightforward [11, 13, 15]. The estimator for $C$ or $C_\tau$ proposed by Heagerty & Zheng [14] is derived under a proportional hazards model. If this semi-parametric working model is not correctly specified, the resulting estimator may be biased. A popular nonparametric C-statistic for estimating $C$ or $C_\tau$ was proposed by Harrell et al. [9, 10, 11] and extensively studied by Pencina & D’Agostino [13]. Note that this generalization is a weighted area under an “incident/dynamic” ROC curve [14, 16] with weights depending on the study-specific censoring distribution.

When the study individuals have different follow-up times, the C-statistic studied by Harrell et al. [9, 10, 11] converges to an association measure that involves the study censoring distribution. In this article, under the general random censorship assumption, we provide a simple non-parametric estimator for the concordance measure in (1.2), which is free of censoring. Furthermore, we study the large sample properties of the new estimation procedure. Our proposal is illustrated with two real examples. The performance of the new proposal under various practical settings is also examined via a simulation study. Note that Gönen & Heller [17] proposed a method for censored survival data to estimate $\mathrm{pr}(T_2 > T_1 \mid g(Z_1) > g(Z_2))$, which is also a concordance measure and has a similar form to (1.1), but we do not focus on this type of measure in this paper.

2. INFERENCE PROCEDURES FOR DEGREE OF ASSOCIATION BETWEEN EVENT TIMES AND ESTIMATED RISK SCORES

In this section, we consider the non-trivial case in which at least one component of the covariate vector $Z$ is continuous. For the survival time $T$, let $D$ be the corresponding censoring variable. Assume that $D$ is independent of $T$ and $Z$. Let $\{(T_i, Z_i, D_i), i = 1, \ldots, n\}$ be $n$ independent copies of $(T, Z, D)$. For the $i$th subject, we only observe $(X_i, Z_i, \Delta_i)$, where $X_i = \min(T_i, D_i)$, and $\Delta_i$ equals 1 if $X_i = T_i$ and 0 otherwise.
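The observation scheme just described, in which only the smaller of the event time and the censoring time is seen together with an indicator of which one it was, can be written out directly. The function name `observe` and the toy numbers are illustrative assumptions.

```python
def observe(event_times, censor_times):
    """Return the observed pairs (X_i, Delta_i) from latent (T_i, D_i).

    X_i = min(T_i, D_i); Delta_i = 1 if X_i equals T_i, and 0 otherwise.
    """
    observed = []
    for t, d in zip(event_times, censor_times):
        x = min(t, d)
        delta = 1 if x == t else 0
        observed.append((x, delta))
    return observed


# Latent event times T and censoring times D for three subjects.
T = [5.0, 3.0, 8.0]
D = [6.0, 2.0, 8.0]
print(observe(T, D))
```

The second subject is censored at time 2.0 before the event at 3.0 could be seen. The third subject has $T = D$, which the definition above counts as an observed event.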

Suppose that we fit the data with a working parametric or semi-parametric regression model, for example, a standard Cox proportional hazards model [18]:

$$\Lambda_Z(t) = \Lambda_0(t)\exp(\beta' Z), \tag{2.1}$$

where $\Lambda_Z(\cdot)$ is the cumulative hazard function for subjects with covariate vector $Z$, $\Lambda_0(\cdot)$ is the unknown baseline cumulative hazard function and $\beta$ is the unknown p × 1 parameter vector. Let the maximum partial likelihood estimator for $\beta$ be denoted by $\hat\beta$. Note that even when the model (2.1) is not correctly specified, under a rather mild non-separability condition that there does not exist a vector $\zeta$ such that $\mathrm{pr}(T_1 > T_2 \mid \zeta' Z_1 < \zeta' Z_2) = 1$, $\hat\beta$ converges to a constant vector, say, $\beta_0$, as $n \to \infty$. This stability property is important for deriving the new inference procedure.
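The maximum partial likelihood estimator $\hat\beta$ maximizes the Cox partial likelihood. The objective can be sketched as below, under the assumption of no tied event times, or equivalently with a Breslow-style treatment in which each event uses the full risk set at its time. The function name `neg_log_partial_likelihood` is hypothetical, and in practice one would use an established survival analysis package for the fit itself.

```python
import math


def neg_log_partial_likelihood(beta, times, events, covariates):
    """Negative log Cox partial likelihood.

    beta:       list of p coefficients
    times:      observed times X_i
    events:     event indicators Delta_i (1 = event, 0 = censored)
    covariates: list of n covariate rows, each of length p
    """
    # Linear predictor beta'Z_i for every subject.
    lin = [sum(b * z for b, z in zip(beta, row)) for row in covariates]
    total = 0.0
    for i, (x_i, d_i) in enumerate(zip(times, events)):
        if not d_i:
            continue  # censored subjects contribute only through risk sets
        # Risk set: subjects still under observation at time x_i.
        risk_sum = sum(math.exp(lin[j]) for j, x_j in enumerate(times) if x_j >= x_i)
        total -= lin[i] - math.log(risk_sum)
    return total


times = [1, 2, 3]
events = [1, 1, 1]
covariates = [[0.5], [1.0], [-1.0]]
print(neg_log_partial_likelihood([0.0], times, events, covariates))
```

At $\beta = 0$ every subject has the same relative risk, so each event contributes the log of its risk-set size. In the toy data the result is $\log 3 + \log 2 + \log 1 = \log 6$.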

For a pair of future patients with covariate vectors $\{Z_k^0, k = 1, 2\}$ and potential survival times $\{T_k^0, k = 1, 2\}$, their corresponding risk scores are $\{\hat\beta' Z_k^0, k = 1, 2\}$. To evaluate this risk score system, one may use the concordance measure discussed in Section 1:

$$C_n = \mathrm{pr}(\hat\beta' Z_1^0 > \hat\beta' Z_2^0 \mid T_1^0 < T_2^0),$$

where the probability is evaluated with respect to the data and the pairs $(T_1^0, Z_1^0)$ and $(T_2^0, Z_2^0)$. Note that $C_n$ depends on the sample size. Let the limit of $C_n$ be denoted by

$$C = \mathrm{pr}(\beta_0' Z_1^0 > \beta_0' Z_2^0 \mid T_1^0 < T_2^0). \tag{2.2}$$

Now, since the support of the censoring variable D is usually shorter than that of the failure time T, the tail part of the estimated survival function of T is rather unstable. Therefore, we consider a truncated version of C in (2.2), that is,

𝐶𝜏=pr(𝛽0𝑍01>𝛽0𝑍