Introduction

Broad-scale implementation of proteomic information in science and medicine has lagged behind genomics, in large part because of the intricacies of protein molecules themselves and the lack of equivalent amplification mechanisms for low-abundance proteins. This has necessitated complex workflows that limit scalability. In spite of extensive efforts to interrogate the plasma proteome, relatively few new candidate biomarkers have been accepted as clinically useful1,2,3,4. Although the exact size of the plasma proteome is unknown, estimates range from >10,000 proteins to potentially all proteins5, with a concentration range exceeding 10 orders of magnitude, from albumin at 35–50 mg/mL to low-abundance proteins in the pg/mL range6,7. Combined with the lack of convenient molecular tools for protein analytical work (such as copy or amplification mechanisms), these features make comprehensive studies of the plasma proteome exceptionally challenging.

An extensive body of literature explores comprehensive, deep, and unbiased proteomic analysis of plasma and other biological samples by liquid chromatography-tandem mass spectrometry (LC-MS/MS)3,5,8. However, these studies often involve complex sample preparation workflows using immunodepletion of abundant proteins and chromatographic fractionation of samples upstream of LC-MS/MS analysis. More efficient techniques such as targeted analyte-specific (e.g., immunoassays) and untargeted LC-MS/MS proteomics strategies (without complex fractionation methods) have increased processing throughput, but lag behind the breadth and depth of proteomic coverage achieved with more work-intensive pipelines. Commercial targeted analyte-specific techniques can interrogate low- and high-abundance proteins and are amenable to multiplexing in the range of tens of proteins (e.g., Luminex and Meso Scale Diagnostics). Targeted MS has seen a dramatic expansion in utilization, either with simple fractionation methods (e.g., depletion of abundant proteins) or with anti-protein or anti-peptide immuno-enrichment workflows9,10. Nevertheless, even with these advances the number of targets remains limited to several hundred proteins11,12, and targeted approaches require prior knowledge of the analytes to be measured.

Untargeted proteomics strategies with less work-intensive workflows enable enhanced throughput, but are generally limited to quantification of hundreds of predominantly higher-abundance proteins by LC-MS/MS5,9. Even with recent advances in parallel single-molecule protein sequencing13, the broad dynamic range of proteins in biological samples is still an obstacle to robust identification and quantification against a background of thousands of unique proteins, and even more protein variants14,15. While it is now possible to identify over 4500 proteins in plasma using advanced LC-MS/MS and data analytics2,5,16, these approaches generally rely on complex workflows including depletion, protein fractionation, peptide fractionation, and isobaric labelling coupled to LC-MS/MS, which is time-consuming (days to weeks), enforcing a trade-off between depth of protein coverage and sample throughput. These limitations not only hinder the discovery of new protein-based disease biomarkers, but constitute bottlenecks to faster adoption of proteogenomics and protein annotation of genomic variants17.

Increasing performance of proteomics pipelines in terms of throughput and depth can be achieved by at least two strategies: (1) employing advanced acquisition modes, like BoxCar18, scanning SWATH19 or state-of-the-art LC-MS setups such as ion mobility-enabled PASEF20 and sophisticated data processing pipelines that leverage additional information across and within samples21,22,23,24; and (2) improving the sample preparation, either by making low-abundance proteins and peptides more visible (increasing depth, such as by fractionation and enrichment) or multiplexing samples to measure more samples in a shorter time (increasing throughput, such as by isobaric labeling). These two strategies are often combined to increase performance. Despite advances in sample preparation automation25,26,27, and even when combined with such automation, approaches that increase proteome coverage by sample preparation (strategy 2) usually make the workflow more complex and less scalable.

Nanoparticles (NPs) that come into contact with a biological fluid such as plasma form a layer of proteins that coat the NPs at the nano-bio interface, which is referred to as a protein corona28,29,30. The effects of the protein corona on the biological fate of NPs in vitro and in vivo have recently been well explored28,29,30,31,32,33,34,35,36, and early studies focused on decreasing the binding of proteins and other macromolecules to the NP surface, commonly referred to as biofouling, in an attempt to enhance utility for in vivo application37,38,39. Seminal systematic studies of the biophysics of protein corona formation then demonstrated the specificity of nano-bio interactions31,34,35,40,41. More recently we36 and others41,42,43,44,45,46 demonstrated that the composition and quantity of corona proteins depend largely on the physicochemical properties of the NP. Because altering these engineered properties reproducibly produces variation in the corona in terms of identity and/or quantity of proteins, it is now possible to systematically study the biomolecular information embedded within the protein corona of each unique NP.

Here, we describe a scalable and efficient protein identification and quantification platform that leverages the unique nano-bio interaction properties of multiple magnetic nanoparticles (NPs) with a protein corona strategy for highly parallel protein separation prior to MS. Our technology exploits magnetic NP-protein interactions and is therefore amenable to downstream sample processing such as multiplexing (e.g., isobaric labeling with tandem mass tag (TMT)) and any advanced MS acquisition strategy. Each NP interrogates hundreds of proteins across a broad dynamic range in an unbiased manner (i.e., not limited to a set of predetermined analytes, as in targeted or antibody-based strategies). We integrate multiple magnetic NPs in an automated Proteograph platform. Unlike other strategies that use single functionalized particles as a scaffold47,48,49,50, all NPs in the Proteograph platform are designed and engineered to synergistically, efficiently, and reproducibly sample complex proteomes based on the native physicochemical properties of proteins and unique nano-bio interactions. We characterize the assay linearity and precision possible with three NPs with distinct physicochemical properties, demonstrating response linearity, signal reproducibility, and robustness. We also confirm the deeper sampling of the plasma proteome dynamic range by NP corona formation, enabling the capture and measurement of proteins spanning a wide dynamic range in a single LC-MS/MS run. Based on these results, we screen 43 NPs with distinct physicochemical properties to select a 10-particle panel optimized for plasma protein coverage. By comparison to published values5, we demonstrate that a panel of 10 NPs differentially samples the plasma proteome across more than seven orders of magnitude, detecting 53 FDA-cleared protein biomarkers in a single pooled plasma. We test the utility for deep and rapid plasma proteome profiling in a pilot study distinguishing early non-small-cell lung cancer (NSCLC) subjects from age- and gender-matched healthy controls. We identify multi-protein classifiers including proteins known and unknown to play a role in NSCLC, supporting the NPs’ ability to identify new marker sets as the starting point for the eventual development of improved disease detection tests. The properties of our protein separation technology using multi-NP protein coronas present a scalable proteome sampling technology for deep unbiased proteomics to substitute for or complement existing sample preparation pipelines and integrate with any LC-MS/MS workflow.


Engineering and characterizing NPs

Various inorganic and organic NPs have been explored in fundamental studies of protein corona29,34,36,40,46,51,52,53. However, they may not be suitable for high-throughput translational proteomic analysis due to the necessity of repeated centrifugation or membrane filtration to separate the corona from free plasma proteins, and to wash away loosely attached proteins. In response, we developed superparamagnetic iron oxide NPs, or SPIONs (Figs. 1, 2a–c), for protein corona formation in an automatable assay, as the superparamagnetic core of the particle facilitates rapid magnetic separation from plasma (<30 s) after corona formation (Supplementary Fig. 1), drastically reducing the time needed for extraction of NP protein corona for LC-MS/MS. Moreover, SPIONs can be robustly modified with different surface chemistries, which may facilitate the generation of distinct corona patterns for broader interrogation of the proteome (Supplementary Fig. 2).

Fig. 1: Schematic of workflow of the Proteograph.

a Formation of NP protein corona. Different NP physicochemical properties (indicated by three different colors) led to the formation of different protein corona compositions on the NP surface. b Proteograph platform workflow based on multi-NP protein corona approach and mass spectrometry for plasma proteome analysis. The Proteograph workflow includes four steps: (1) NP-plasma incubation and protein corona formation; (2) NP protein corona purification by a magnet; (3) digestion of corona proteins; and (4) LC-MS/MS analysis. In this context, each plasma-NP well is a sample, for a total of 96 samples per plate.

Fig. 2: Characterization of the three SPIONs.

A SP-003, B SP-007, and C SP-011, by a, f, k SEM, b, g, l DLS, c, h, m TEM, d, i, n HRTEM, and e, j, o XPS, respectively. DLS shows three replicates of each NP. Panels d, i, and n show the HRTEM pictures recorded at the surface of individual SP-003, SP-007, and SP-011 NPs, respectively, and the yellow arrow points to the region of d amorphous SiO2 coating and i, n amorphous SiO2/polymer coatings on the NP surface. Source data are provided as a Source Data file.

Three SPIONs (SP-003, SP-007, and SP-011) with different surface functionalization were initially synthesized (Supplementary Table 1, Supplementary Fig. 3, Fig. 2) according to previously published methods54,55,56,57. SP-003 was coated with a thin layer of silica by a modified Stöber process using tetraethyl orthosilicate (TEOS). For the SPIONs coated with poly(dimethylaminopropyl methacrylamide) (PDMAPMA) (SP-007) and poly(ethylene glycol) (PEG) (SP-011), we first modified the iron oxide particle core with vinyl groups by a modified Stöber process using TEOS and 3-(trimethoxysilyl)propyl methacrylate. Next, the SPIONs were surface modified by free-radical polymerization with N-[3-(dimethylamino)propyl] methacrylamide (SP-007) or poly(ethylene glycol) methyl ether methacrylate (SP-011).

The three SPIONs were characterized in terms of size, morphology, and surface properties using techniques including scanning electron microscopy (SEM), dynamic light scattering (DLS), transmission electron microscopy (TEM), high-resolution TEM (HRTEM), and X-ray photoelectron spectroscopy (XPS) (Fig. 2). Our DLS measurements show that SP-003, SP-007, and SP-011 have average sizes/polydispersity indexes of, respectively, ~233 nm/0.05, ~283 nm/0.09, and ~238 nm/0.20. This is consistent with SEM showing that all three SPIONs are 200–300 nm with spherical and semi-spherical morphologies. The surface charges of SP-003, SP-007, and SP-011 were evaluated by zeta potential (ζ) analysis, which gave ζ values of, respectively, −36.9, +25.8, and −0.4 mV at pH 7.4 (Supplementary Table 1). This indicates negative, positive, and neutral surfaces, respectively, consistent with the coatings used (Fig. 2). Coating thickness was evaluated using HRTEM. For SP-003, an amorphous shell formed around the iron oxide core with a thickness >10 nm (Fig. 2d). For SP-007 and SP-011, a relatively thin (<10 nm) amorphous shell was formed (yellow arrows in Fig. 2i, n). In addition, XPS was performed for surface analysis, which, like the HRTEM images, confirms the successful coating of the NPs with their respective functional groups.

The analytical results described above confirm that these three SPIONs constitute a diverse test set of NPs, which we further evaluated for protein detection coverage, precision, and linearity of response.

Initial panel of three magnetic NPs for proteomic analysis

To evaluate the utility of our platform in proteomic analysis, we investigated the capacity of the three initial NPs to interrogate the complex proteome of blood plasma (Fig. 3, Supplementary Data 1). Each NP (100 µL) was first incubated with plasma for 1 h at 37 °C, allowing the proteins that associate with the NPs to reach equilibrium and form a stable protein corona, followed by magnet-based purification of NPs from unbound proteins (6 min per cycle × 3). The bound proteins were then digested, purified, and eluted. Notably, this highly parallel preparation workflow required only ~4–6 h in total for a batch of 96 corona preparations. The peptides from the NP-bound corona were analyzed in a 60-min LC-MS/MS run in data-dependent acquisition (DDA) mode. Data were analyzed using MaxQuant for peptide identification and protein group assembly and MaxLFQ for quantification58.

Fig. 3: Proteomics characterization of the three initial SPIONs.

a Protein groups from the NP corona of the three initial SPIONs, SP-003, SP-007, and SP-011, as determined by DDA LC-MS/MS and MaxQuant (MaxLFQ, 1% protein and peptide FDR). All: proteins detected across all NPs. White line indicates the number of proteins detected with two or more peptides with at least one NP. For the respective NPs, median count and standard deviation across three assay replicates are shown as bar plots. Upper dashes depict number of proteins detected in any sample; lower dashes depict number of proteins detected in all three replicates. White circles show number of protein IDs for each assay replicate. b CV% for precision evaluation (MaxLFQ, filtering for three out of three valid values) of the NP protein corona-based Proteograph workflow. Inner boxplots report the 25% (lower hinge), 50%, and 75% quantiles (upper hinge). Whiskers indicate observations equal to or outside hinge ± 1.5 * interquartile range (IQR). Outliers (beyond 1.5 * IQR) are not plotted. Violin plots capture all data points. c Correlation of the maximum intensities of NP corona proteins vs. plasma proteins to the published concentration of the same proteins (median of assay triplicates). The black lines are linear regression models, and the grey shaded regions represent the 95% confidence interval. d Linearity of response for measurement of CRP protein on the SP-007 NP in a spike-recovery experiment. Error bars denote standard deviations around the mean. All data were acquired in n = 3 independent assay replicates. Source data are provided as a Source Data file.

The three NPs facilitated the quantification of >700 protein groups across nine samples (triplicate measurements of three NPs) and more than 500 protein groups with each nanoparticle type alone (Fig. 3a, Supplementary Table 2). For precision, we determined median CVs of 19.6%, 30.3%, and 17.0% (on average 22%) for SP-003, SP-007, and SP-011, respectively, for proteins detected in three out of three replicate coronas (Fig. 3b). The NP panel has sufficient precision to detect relatively small differences in fairly small studies. For example, in a study with just 25 samples and assuming 2000 measured analytes, we would have 85% power to detect differences of 50% in protein concentrations between groups with a Bonferroni-corrected alpha = 0.05/2000.
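The kind of power estimate quoted above can be approximated with a short calculation. The sketch below is only an illustration: it assumes log-normally distributed measurements, two equally sized groups, a normal approximation to the test statistic, and assay CV as the only source of variance. None of these choices is stated in the text, so the function `approx_power` and its inputs are hypothetical and are not expected to reproduce the reported 85% figure.

```python
from math import log, sqrt
from statistics import NormalDist


def approx_power(cv, fold_change, n_per_group, alpha):
    """Approximate power of a two-sided, two-group comparison on the log scale.

    Assumptions (not from the source): log-normal measurements, equal group
    sizes, and assay CV as the only source of variance.
    """
    sd_log = sqrt(log(1.0 + cv ** 2))          # log-scale SD implied by the CV
    effect = abs(log(fold_change))             # difference between log means
    se = sd_log * sqrt(2.0 / n_per_group)      # standard error of that difference
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    # Probability of exceeding the critical value in the direction of the effect.
    return NormalDist().cdf(effect / se - z_crit)


alpha = 0.05 / 2000                            # Bonferroni correction from the text
for n in (12, 25, 50):                         # hypothetical group sizes
    print(n, round(approx_power(0.22, 1.5, n, alpha), 3))
```

In a real study, biological variability between subjects would add to the assay CV and lower the achievable power, so a calculation of this kind gives an optimistic bound at best.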

To explore the ability of NPs to interrogate plasma proteins present over a wide range of concentrations, we compared measured protein feature intensities from the protein coronas of the three NPs described above to published values59 (Fig. 3c). In parallel, we also directly measured peptides from a digested plasma sample without enrichment using SPIONs. The decreasing slopes for the fitted models for particle intensities indicate a reduction in the dynamic range of protein signal intensities as a function of abundance. This is consistent with previous observations60,61 that NPs can effectively reduce the measured dynamic range for abundances in the corona compared to the range in plasma by effectively normalizing protein abundance by binding affinity. Our multi-NP protein corona strategy thus facilitates the identification of a broad spectrum of plasma proteins, particularly those with low abundance, which pose challenges to rapid detection by conventional proteomic techniques.

To determine the linearity of our platform as a measurement tool and to support its utility in detecting true differences between groups of samples in biomarker discovery and validation studies, we first performed a spike-recovery study across four particles and three proteins comprising four polypeptides using Angiogenin, C-reactive protein (CRP), and Calprotectin (S100a8/9) (concentrations determined by ELISA: 3.3, 49, 8.9, and 8.9 ng/ml, respectively) and observed R² between 0.90 and 1 (Supplementary Table 3, Supplementary Data 2). As an example, we present the results for the SP-007 NP and CRP in Fig. 3d. First, we used ELISA to determine the endogenous plasma level of CRP. Next, we spiked purified CRP (see Methods) to achieve testable multiples of the endogenous level. Post-spiking CRP levels were determined to be 4.11, 7.10, 11.5, 22.0, and 215.0 µg/mL corresponding to 1× (control), 2×, 5×, 10×, and 100× the endogenous level, respectively. We then plotted the quantities for the four indicated CRP peptides on the SP-007 NP versus the CRP concentrations as appropriate for comparing methods reporting different value types (Fig. 3d). Note that the MS1 feature intensity was undetectable for two of the CRP peptides in the unspiked plasma. The fitted lines are linear models using the given feature’s spike intensities.

Fitting a regression model to all four of the CRP tryptic peptides yielded a slope of 0.90 (95% CI 0.81–0.98) for the response of corona MS signal intensity versus ELISA plasma level, approaching perfect analytical performance. In contrast, a similar regression model fitted to 1308 other (nonspiked) MS features identified in at least four of the five plasma samples, for which signals from associated MS features should not vary across samples, had a slope of −0.086 (95% CI −0.1 to −0.068). These results indicate that the NPs’ linearity of response will likely prove useful in quantifying potential markers in comparative studies. Moreover, the response of the spiked-protein peptide features also suggests that with appropriate calibration, the NP protein corona method could be used to determine absolute, rather than relative, analyte levels.
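A slope close to 1 is the signature of a proportional response when both axes are log-transformed. The following sketch fits an ordinary least-squares line to log10 intensity versus log10 concentration. The log-log form is an assumption made for illustration (the text does not state the transformation used), the concentrations are the post-spiking CRP levels quoted above, and the intensities are synthetic values constructed to be exactly proportional to concentration.

```python
from math import log10


def ols_slope_intercept(x, y):
    """Ordinary least-squares slope and intercept for paired values."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Post-spiking CRP levels from the text (µg/mL).
crp_ug_per_ml = [4.11, 7.10, 11.5, 22.0, 215.0]
# Synthetic MS intensities, exactly proportional to concentration (hypothetical).
intensity = [1.0e6 * c for c in crp_ug_per_ml]

slope, intercept = ols_slope_intercept(
    [log10(c) for c in crp_ug_per_ml],
    [log10(i) for i in intensity],
)
# For perfectly proportional data the slope is 1 up to floating-point rounding.
print(slope, intercept)
```

With real peptide intensities the slope would deviate from 1, and a flat line (slope near 0) would be expected for features of proteins that were not spiked.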

Linearity of response was explored in greater depth with the addition of two other spiked proteins, Angiogenin and Calprotectin (S100a8/9), comprising three additional polypeptides and three additional NPs. The intensity data for these additional proteins and NPs were modeled against the measured ELISA values by linear regression, and a summary of the fits for the models is shown in Supplementary Table 3. The mean slope across all proteins and NPs is 1.06, indicating a linear response across the two orders of magnitude used in the spiked sample preparation (i.e., from 1× to 100× endogenous levels). The adjusted R² for the intensities is also high (mean 0.95). These results confirm the linearity of response and indicate the ability of the NP platform to measure relative changes in peptide/protein levels across a broad range of concentrations with high precision.

To address the effect of background interference, we investigated the impact of varying lipid levels and extent of hemolysis: two common variables in plasma matrix composition. The lipid content of plasma changes not only with fasting state but also with age and state of health62. It is therefore important for every blood assay to be either insensitive to background matrix changes or to be able to control and correct for those introduced. We compared the number of identified proteins, the protein overlaps among conditions, and the intensity distributions measured from a pooled plasma sample spiked with low and high amounts of lipids, and subsequently treated with several NPs (Supplementary Figs. 4 and 5). Our data show that even high amounts of lipids do not affect the number or makeup of protein IDs or the intensity distributions compared to control samples with no lipid spikes. One tested NP (SP-356-001) shows a small reduction in protein IDs with high concentrations of spiked lipids when the sample is not centrifuged before measurement. This in fact highlights one of the advantages of using NPs: different surface properties could allow for the detection of biases comparing the coronas of particles for the same sample. We also observed good correlation in intensities across conditions, indicating the robustness of protein quantities.

Similarly, we investigated the effect of hemolysis using a human-derived red blood cell hemolysate spiked into a pooled plasma sample at low and high concentrations, as well as a control with no spike. As expected, cell debris introduced by hemolysis changes the protein count and content, as would be the case in any proteomics pipeline. However, proteins that overlap those detected in normal plasma are unaffected by the massively changing background introduced by hemolysis, as demonstrated by the correlation analysis (Supplementary Figs. 6 and 7).

Optimized panel of 10 magnetic NPs

To further expand NP corona protein selection in a practicable format amenable to automation, we screened the coronas formed on 43 distinct SPIONs (Supplementary Data 3) in a similar fashion to the original three SPIONs. The goal was to select an optimized panel of 10 NPs that maximize the detection of proteins from a pooled plasma sample. The 43 candidate SPIONs were evaluated under six conditions (Methods), and the optimal conditions were used in a secondary analysis to select the best combination. The 43-SPION screen was conducted using pooled plasma from both healthy subjects and lung cancer patients (i.e., different from the pool used for the original three particles), to demonstrate platform validation across biological samples. In this analysis, a simpler criterion for protein detection was used for panel selection and optimization, i.e., a protein had to be represented by at least one peptide-spectral-match (PSM; 1% FDR) in each of three full-assay replicates to be counted as identified. The panel with the largest number of individual unique Uniprot identifiers was selected. This approach avoids any differential protein grouping effects possible across different combinations of evaluated NPs, since protein groups are based on the empirical data contained within any given analysis and might be confounded by the many diverse NP corona subsets.
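The selection criterion described above (count a protein only if it is seen in every replicate, then pick the panel with the most unique identifiers) can be sketched as follows. The text does not say how candidate panels were searched, so the greedy search used here is a substituted, illustrative technique that is not guaranteed to find the best panel; the function names and the toy identifiers are hypothetical.

```python
def identified_in_all_replicates(replicate_id_sets):
    """Identifiers observed in every replicate (e.g., >=1 PSM in each of three)."""
    return set.intersection(*replicate_id_sets)


def greedy_panel(ids_by_np, panel_size):
    """Greedily pick NPs that add the most not-yet-covered identifiers.

    ids_by_np maps an NP name to the set of identifiers it detects.
    Ties are broken alphabetically so the result is deterministic.
    """
    chosen, covered = [], set()
    remaining = dict(ids_by_np)
    while remaining and len(chosen) < panel_size:
        best = max(sorted(remaining), key=lambda name: len(remaining[name] - covered))
        covered |= remaining.pop(best)
        chosen.append(best)
    return chosen, covered


# Toy example with hypothetical NP names and identifiers.
toy = {
    "NP-A": identified_in_all_replicates([{"P1", "P2", "P3"}, {"P1", "P2", "P3", "P9"}, {"P1", "P2", "P3"}]),
    "NP-B": {"P3", "P4"},
    "NP-C": {"P1", "P2"},
}
panel, covered_ids = greedy_panel(toy, panel_size=2)
print(panel, sorted(covered_ids))
```

With 43 candidates and a panel of 10, an exhaustive search over all combinations is very large, which is one reason a heuristic such as this might be considered; counting individual identifiers rather than protein groups keeps the objective independent of how groups are assembled in each analysis.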

The two-tiered screening approach described above yielded an optimized panel of 10 NPs with which we interrogated a common pooled plasma sample in three full-assay replicates (Fig. 4, Supplementary Fig. 8, Supplementary Data 4). We determined the median CVs for protein group quantification using MaxQuant (see Methods). The median CVs ranged from 16.4 to 30.8% (Fig. 4b, Supplementary Tables 4 and 5), which is in the range of the precision reported in previous studies4.
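The median CV reported for each NP can be computed from replicate intensities as in the sketch below. It is a minimal illustration, assuming the CV is the sample standard deviation divided by the mean of untransformed intensities and that only proteins with three out of three valid values are kept; the function names and the example values are hypothetical.

```python
from statistics import mean, median, stdev


def cv_percent(values):
    """Coefficient of variation in percent (sample standard deviation / mean)."""
    return 100.0 * stdev(values) / mean(values)


def median_cv(intensities, n_replicates=3):
    """Median CV% over proteins quantified in every replicate.

    intensities maps a protein identifier to its replicate intensities,
    with None marking a missing value.
    """
    cvs = [
        cv_percent(values)
        for values in intensities.values()
        if len(values) == n_replicates and all(v is not None for v in values)
    ]
    return median(cvs)


# Hypothetical replicate intensities for one NP.
example = {
    "P1": [100.0, 110.0, 90.0],    # CV = 10%
    "P2": [50.0, None, 55.0],      # dropped: not quantified in all replicates
    "P3": [200.0, 200.0, 200.0],   # CV = 0%
}
print(median_cv(example))
```

Requiring complete data before computing the CV avoids rewarding proteins that are seen only once, at the cost of excluding low-abundance proteins that are detected intermittently.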

Fig. 4: Optimized panel of 10 SPIONs in comparison to neat plasma.

a Protein groups from the NP corona of 10 SPIONs, quantified by DDA LC-MS/MS (1% protein and peptide FDR). All: number of quantified protein groups across all NPs (excluding neat plasma). White line indicates the number of proteins detected with two or more peptides with at least one NP. For the respective NPs, median count and standard deviation across three assay replicates are shown as bar plots. Upper dashes depict number of proteins detected in any sample; lower dashes depict number of proteins detected in all three replicates. White circles show number of protein IDs for each assay replicate. b CV% distribution (precision) of the NP protein corona-based workflow for neat plasma and 10 SPIONs (filtering for three out of three valid values across assay replicates). Inner boxplots report the 25% (lower hinge), 50%, and 75% quantiles (upper hinge). Whiskers indicate observations equal to or outside hinge ± 1.5 * interquartile range (IQR). Outliers (beyond 1.5 * IQR) are not plotted. Violin plots capture all data points. c Matching 10 SPIONs to a plasma protein database of MS intensities. Ranked intensities for the database proteins5 are shown in the top panel. The most intense protein is in the upper left corner of the panel; the least intense is in the lower right corner. Intensities for proteins from neat plasma are shown in the bottom panel (plasma). Intensities for 10 SPIONs are shown in the remaining panels. Red dots indicate FDA-approved protein biomarkers1. d Volcano plot depicting annotation enrichment analysis (Fisher’s exact test) for functional pathways (GOCC, GOBP, KEGG, Uniprot Keywords, Pfam) of proteins detected in the optimized panel of 10 NPs in comparison to the database. Enriched = Log2 Odds > 0; depleted = Log2 Odds < 0. Blue circles indicate pathways with a Benjamini–Hochberg (B.H.) false discovery rate (FDR) < 1%. Green labels mark selected annotations enriched for NPs; selected depleted annotations are shown in black. Keratin and Meiosis are depleted annotations with a B.H. FDR > 5%. e 1D annotation enrichment analysis comparing the protein intensity distribution (median intensity across assay triplicates, requiring three out of three quantifications) of each NP against the average of all. 1D scores are plotted as heat maps for annotations (minimal size 11) that are significantly enriched or depleted (2% B.H. FDR) for at least 1 NP. All data were acquired in n = 3 independent assay replicates. Source data are provided as a Source Data file.

Next we compared the precision of protein quantification to a published proteomics dataset. Given the large diversity in acquisition modes, quantification strategies, and protein inference pipelines, direct comparison of assay reproducibility is non-trivial. Geyer et al.4 describe a rapid LC-MS/MS proteomics approach with an abridged sample preparation protocol yielding an average of 284 protein groups per assay and 321 protein groups across all replicates. We found 88 identical protein groups between the 321 of Geyer et al. and our 1184 protein groups. Because protein groups can comprise multiple related proteins and assemble those proteins differently depending on the detected peptides, two mass spectrometry experiments can report partially overlapping protein groups. To allow as fair a comparison as possible on the protein level, we compared the 88 protein groups that were composed of exactly the same Uniprot entries so there would be no ambiguity.
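Restricting the comparison to protein groups with identical membership amounts to comparing groups as unordered sets of accessions. The sketch below illustrates this, assuming each group is written as a semicolon-separated string of accessions; the function name and the placeholder accessions are hypothetical.

```python
def exact_group_matches(groups_a, groups_b, sep=";"):
    """Protein groups present in both lists with exactly the same members.

    Each group is a string of accessions joined by `sep`; member order
    within a group is ignored, but partial overlaps do not count.
    """
    set_a = {frozenset(group.split(sep)) for group in groups_a}
    set_b = {frozenset(group.split(sep)) for group in groups_b}
    return set_a & set_b


# Placeholder accessions for illustration only.
study_1 = ["ACC1", "ACC2;ACC3", "ACC4"]
study_2 = ["ACC3;ACC2", "ACC1;ACC5", "ACC4"]
shared = exact_group_matches(study_1, study_2)
print(len(shared))
```

In this toy case the group containing ACC1 is not matched, because the second study grouped it with an additional entry; this is the ambiguity the exact-match criterion is meant to exclude.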

For these 88 common protein groups, we analyzed the data of Geyer et al.4 and found a median CV of 12.1%, compared to a median CV across our NPs of 7.2%. For each protein, we selected the NP that reports the best CV, as that is the one that would be selected for an assay. For a comparison from another perspective, Geyer et al. also report the number of protein groups with CVs < 20%, as this is a common cutoff for in vitro diagnostic assays. Our 10-NP panel detects 761 protein groups (with CV < 20%), which is 3.7 times greater than the number reported by Geyer et al.4.

Next we investigated how the proteins detected with the 10-NP panel map to the abundance range of the plasma proteome (Fig. 4c). To this end, we mapped the proteins quantified with the 10-NP panel to the normalized intensities reported by Keshishian et al.5. In that study, more than 5000 protein groups were detected across 16 individual plasma samples in a complex workflow involving analysis of ~30 MS fractions per sample, taking a few weeks to complete5. Using the MS-derived plasma protein group intensities from that study, the coverage of each NP was compared to this reference and to neat plasma (no depletion or enrichment). Proteins from neat plasma matching the database were skewed towards higher intensity (a proxy for abundance) in the full plasma protein database, whereas the protein constituents of the protein coronas from all 10 NPs extended nearly throughout the database’s entire dynamic range (Fig. 4c). Only 39 proteins in the database had intensities lower than the lowest protein group matched from an NP.
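
The final count in this comparison is a simple lookup against the reference intensities. A minimal sketch, assuming the reference database is available as a mapping from protein group to normalized intensity; the names `db_intensity` and `detected` and the toy values are hypothetical.

```python
def count_below_lowest_match(db_intensity, detected):
    """Count database entries with intensity below the lowest-intensity
    database entry that was also detected on at least one NP."""
    matched = [v for p, v in db_intensity.items() if p in detected]
    if not matched:
        raise ValueError("no detected protein matches the database")
    floor = min(matched)
    return sum(1 for v in db_intensity.values() if v < floor)


# Toy reference intensities (arbitrary units) and a toy set of NP detections.
db_intensity = {"A": 9.1, "B": 7.4, "C": 5.0, "D": 3.2, "E": 2.8, "F": 1.5}
detected = {"A", "C", "E"}

print(count_below_lowest_match(db_intensity, detected))
```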

One key application of rapid, deep proteome analysis is the identification and quantification of protein biomarkers. While there are more than 100 FDA-cleared protein biomarkers1, the rate of the appearance of novel protein biomarkers per year is very low (less than 2 per year)63. In line with the observation made by Geyer et al.3, most biomarkers are in the high abundance range. Of the 90 mapped biomarkers, we identified between 33 and 43 within each of the NPs and in neat plasma (Fig. 4c, Supplementary Table 6, Supplementary Fig. 9).

While it is certainly important to compare the individual protein IDs, it is also of interest to determine which functional classes present in the reference plasma proteome are covered. To this end, we mapped functional annotations (GOCC, GOBP, KEGG, Uniprot Keywords, Pfam) to Uniprot IDs and compared the enrichment and depletion of annotations in the panel of 10 NPs. Proteins covered by the 10-NP panel showed significant enrichment for a variety of functional annotations including “secretion”, “innate immunity”, and “vesicles”. Underrepresented annotations include membrane- and DNA-associated annotations (Fig. 4d).
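
An enrichment test of this kind can be sketched with a two-sided Fisher exact test per annotation, followed by Benjamini-Hochberg adjustment. This is an illustration only: the 2x2 table layout (panel membership versus annotation membership), the function names, and the toy counts are assumptions, and the source does not state which software was used.

```python
from math import comb, log2


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy table for one annotation (hypothetical counts):
# rows = in NP panel / not in panel, columns = annotated / not annotated.
a, b, c, d = 8, 2, 1, 5
p_value = fisher_exact_two_sided(a, b, c, d)
log2_odds = log2((a * d) / (b * c))  # > 0 enriched, < 0 depleted; needs non-zero cells
adjusted = benjamini_hochberg([p_value, 0.20, 0.001])
print(p_value, log2_odds, adjusted)
```

In practice the adjustment would be run over the p-values of all tested annotations at once, and tables with zero cells need a separate convention for the odds ratio.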

To further explore the capacity of individual NPs to interrogate different functional classes of proteins (i.e., extracellular region, membrane, or cytosol), we looked at NP-specific enriched annotations. For this analysis we employed a 1D annotation enrichment64 to compare protein coronas from individual NPs to the average profile of the entire 10-NP panel. Clustering based on the 1D enrichment score (Fig. 4e) shows distinct and differential patterns of enrichment and depletion across the 10-NP panel. For example, GO Cellular Component annotations characterize protein location. In that category, NPs cluster into major branches (Cluster 1 with SP-373, SP-365, SP-347, and SP-406 versus Cluster 2/3 with SP-064, SP-007, SP-047, SP-339, SP-390, and SP-333). In contrast to Clusters 2 and 3, Cluster 1 shows depletion of proteins associated with the extracellular region and enrichment for intracellular proteins. Uniprot Keywords show that some NPs specifically deplete immunoglobulins (IgG) while showing enrichment for proteins annotated as secreted and involved in inflammation (e.g., SP-390, SP-339). Moreover, Uniprot Keywords and GO Biological Process (GOBP) indicate that a subset of NPs, including SP-390 and SP-047, allow enrichment for lipid transport proteins, while other NPs like SP-007 could deplete proteins belonging to this functional class. In summary, annotation enrichments show that NP coronas can be categorized not only on the level of individual proteins but also based on functional groups of proteins. In principle, an experiment could take advantage of different subsets of particles focusing on specific protein group IDs or enriched annotations, whichever is more relevant to the question at hand. Moreover, the capacity to interrogate different functional classes of proteins (i.e., extracellular region, membrane, or cytosol) illustrates the capability of NP coronas to sample a wide dynamic range in complex proteomes.
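
A rank-based 1D enrichment score can be sketched as below. This follows one common formulation, the scaled difference between the mean rank of annotated and non-annotated proteins, which is our reading of the cited method rather than a confirmed reproduction of the analysis. The input values (for example, per-protein intensity differences between one NP and the panel average) and the function names are assumptions.

```python
def average_ranks(values):
    """1-based ascending ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def one_d_score(values, in_category):
    """Score in [-1, 1]: 2 * (mean rank inside - mean rank outside) / n.

    Positive values mean the annotated proteins tend to have higher values.
    """
    ranks = average_ranks(values)
    inside = [r for r, flag in zip(ranks, in_category) if flag]
    outside = [r for r, flag in zip(ranks, in_category) if not flag]
    if not inside or not outside:
        raise ValueError("category must contain some but not all proteins")
    return 2 * (sum(inside) / len(inside) - sum(outside) / len(outside)) / len(values)


# Toy example: the two annotated proteins have the highest values.
print(one_d_score([1.0, 2.0, 3.0, 4.0], [False, False, True, True]))
```

A significance test such as a Wilcoxon-Mann-Whitney test with multiple-testing correction would normally accompany the score; it is omitted here to keep the sketch short.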

Large-scale application: non-small-cell lung cancer study

To illustrate the performance of the Proteograph in a large human cohort, we performed deep and rapid plasma proteome profiling of non-small-cell lung cancer (NSCLC) subjects and age- and gender-matched healthy and pulmonary comorbidity control subjects (Fig. 5; Supplementary Data 58, Supplementary Table 7). To further reduce total experiment time, we used a short gradient (20 min gradient, 33 min sample-to-sample time) and a panel of five NPs selected from the original 10 and optimized for maximum protein group coverage. The total time required to complete these analyses was ~2 weeks. We evaluated precision using QC samples throughout the study, which showed that the Proteograph enables low CVs and a reproducible number of protein identifications even when processing more than 1500 assays measured across three mass spectrometers (five NPs and depleted plasma for each of the 141 subject samples).

Fig. 5: Classification of early NSCLC vs healthy using five NPs.

a Protein group counts by NP and depleted plasma (filtered for 1% peptide and protein FDR). The green bars show the mean number of proteins in the cohort of 141 subjects found with each of the five NPs. The yellow bar shows the mean number of proteins in the cohort of 141 subjects for depleted plasma; the black bar shows the number of proteins across the five-NP panel and all 141 subjects. The white line indicates the proteins that were detected with two or more peptides on one or more NPs. The blue bar shows the number of proteins across the five-NP panel that were detected in at least 25% of all 141 subjects. Error bars depict the standard deviation of identifications. White circles show the number of protein IDs for each biological sample. b Heatmap showing the median normalized intensities (natural logarithm) of protein groups (rows) detected with five NPs (columns) or depleted plasma across 141 subjects (early NSCLC and healthy). Protein groups were filtered for 1% peptide and protein FDR and detection in at least 10% of the samples. Missing values were set to 0 (dark blue). Hierarchical clustering was performed in R using the ward.D2 method. c Receiver operating characteristic (ROC) curves quantifying the classification performance of healthy vs. early-stage NSCLC patients. Each colored curve represents one of the 10 repeats of the 10-fold cross validation, where the performance was assessed on the hold-out test splits. The average ROC area under the curve (AUC) across the 10 repeats is 0.91. d Top 20 most important features to classify healthy vs. early NSCLC, with the color gradient showing the associated Open Targets Score for lung carcinoma targets. Source data are provided as a Source Data file.

To investigate the possibility of early NSCLC detection, we performed classification modeling on the sample set consisting of 80 healthy and 61 early-stage NSCLC subjects. On average, we identified 1664 proteins in these 141 subjects across five NPs (Fig. 5a). The NPs formed distinct clusters based on their patterns of protein abundance (Fig. 5b). This unsupervised clustering analysis also showed a few subject-specific differences but no clear pathology-driven separation. We were particularly interested in how useful the additional proteins detected with NPs (beyond those detected in depleted plasma) are in stratifying healthy and NSCLC subjects, and therefore removed the proteins detected in depleted plasma before building the classification models. The healthy vs. early NSCLC classification achieved an average AUC of 0.91 (Fig. 5c) using a Random Forest model and 10 repeats of 10-fold cross validation. Random class permutation of the subjects achieved an average AUC of only 0.51, indicating that the classifier results are not explained by overfitting. Examination of the top 20 classifier features (each a combination of particle and protein group), ranked by feature importance, highlights proteins both known and unknown to play a role in NSCLC as judged by Open Targets65 (OT) annotation (Fig. 5d). Among the most important features, we identified tubulin, which is the target of chemotherapeutic drugs including paclitaxel and its derivatives66.
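
The evaluation scheme described above (repeated stratified cross-validation, AUC on hold-out splits, and a label-permutation control) can be sketched without committing to a particular model. The Random Forest itself is omitted here; any model that produces a score per held-out subject could be plugged in. The function names and the seed are hypothetical.

```python
import random


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative
    (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def stratified_folds(labels, k, rng):
    """Return k lists of test indices with class proportions kept roughly equal."""
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for position, i in enumerate(idx):
            folds[position % k].append(i)
    return folds


# Cohort composition from the text: 80 healthy (0) and 61 early NSCLC (1).
labels = [0] * 80 + [1] * 61
rng = random.Random(0)  # fixed seed so the split is reproducible

# One of the 10 repeats; a full run would redraw the folds 10 times.
folds = stratified_folds(labels, k=10, rng=rng)

# Permutation control: shuffle class labels, then refit and rescore the model.
permuted = labels[:]
rng.shuffle(permuted)

print([len(f) for f in folds])
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

An AUC near 0.5 on permuted labels, obtained with the same pipeline, is what supports reading the unpermuted AUC as signal rather than an artifact of the procedure.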

In a recent study, Geyer et al. noted that the quality of clinical samples is often compromised by contamination with platelets and erythrocytes67. We checked which proteins of the most important classification features overlap with the deep platelet proteome published by Geyer et al.67. Only one of the top five features was detected in the platelet proteome (three other features with lower importance were also found in the remaining top 10). Notably, independent of the platelet index (see Methods) the Proteograph yields a considerably higher number of quantified proteins compared to depleted plasma (Supplementary Fig. 10, Supplementary Table 8).

Discussion

Since early studies of biological protein association with the surface of NPs30, enormous strides have been made in understanding the protein corona, yielding numerous insights in nanomedicine and drug delivery31,32,33. It has increasingly been recognized that the protein corona determines the physiological responses to NPs (e.g., pharmacokinetics, biodistribution, cellular uptake, and therapeutic efficacy) and that NP-protein interactions are highly dependent on the NP’s physicochemical properties, exposure time, and protein source and concentration. More recently, ex vivo and in vitro interrogation of the protein corona has been proposed for disease diagnosis and prediction68,69,70, and LC-MS/MS proteomics analysis of the protein corona formed on PEGylated liposomal doxorubicin (Caelyx™) after in vivo circulation has been shown to reveal low-abundance plasma proteins46.

Notwithstanding the above, little has been done to apply multiple NPs to the challenges of large proteomic biomarker studies that require broad protein coverage, deep dynamic range interrogation, and high sample throughput. The rationale for the current study is that small alterations to NP physicochemical properties can elicit dramatic but reproducible changes in protein corona36,41,42,43,44,45. We thus hypothesized that, compared to any single NP, multiple NPs with distinct engineered physicochemical properties offer expanded but partially overlapping proteomic sampling and more-comprehensive proteomics data.

We developed a highly parallel and automated protein separation technology platform (which we refer to as Proteograph) that incorporates a panel of NPs, selected from a screen of 46 engineered SPIONs with distinct physicochemical properties, into an ex vivo assay for protein corona formation and LC-MS/MS analysis, to achieve unbiased protein collection/detection. Using pooled plasma as a model complex biological sample, we validated our hypothesis that a larger NP panel identifies more proteins, particularly low-abundance proteins. In a panel of 10 NPs, we found not only distinct proteins but also distinct protein pathways associated with the respective protein coronas. This suggests that addition of further distinct NPs should enable even broader and deeper proteome profiling. Thus, the platform can be tailored to profile the proteome at different levels by varying the number and type of NPs, analogous to different levels of coverage in gene sequencing. With the same NP panel, we detected 53 FDA-approved protein biomarkers. In agreement with previous observations3, most of these biomarkers were detected in the high-abundance range. Given the large number of low-abundance proteins NPs can detect, we predict that future studies will identify a number of novel biomarkers using a combination of NPs.

The multi-NP protein corona assay has also demonstrated several advantages in plasma proteome analysis. Unlike conventional deep proteomic techniques requiring depletion and fractionation workflows, our strategy is fast, scalable, and leverages physicochemical differences on the protein level without specifically targeting proteins. Notably, the multi-NP assay can be robustly automated and expanded by simply adding new NP variants, further increasing precision and breadth while speeding analysis in a 96-well plate format. Reproducibility and spike-recovery experiments also highlight the ability of our multi-NP protein corona platform to measure differences between samples, while reducing the concentration range of proteins in the enriched samples and facilitating detection of even low-abundance proteins, a key advantage of NP protein corona proteomic analysis. Since compressing the dynamic range affects measured abundance differences between different proteins within one sample, future studies could evaluate isotopically labeled protein spike-ins to calibrate measured quantities and derive absolute abundance information such as concentrations or copy numbers.

In our NP-based classification feasibility study focusing on differentiation between samples from early-stage NSCLC patients and healthy controls, we demonstrated the utility of the platform to rapidly evaluate a large number of samples in a short period of time and identified novel combinations of known and unknown proteins as potential novel starting points for downstream NSCLC test development. In this study, more than 2000 proteins were quantified across 141 subjects in 2 weeks, a throughput enabled by the simplicity and robustness of the NP platform.

The performance of the healthy vs. early NSCLC (stages 1, 2, and 3) classifier was high (AUC 0.91), and we were able to identify proteins both known and unknown to play roles in NSCLC, supporting the value of proteins as an analyte class in developing better tests for early disease detection. Interestingly, among the most important features in the classification of healthy vs. early NSCLC, we identified tubulin, which, as a component of the cytoskeleton, is a usually intracellular protein detected in platelets67 but is also a target for the chemotherapeutic paclitaxel and a biomarker for neuronal tissue damage in cerebrospinal fluid71. Tissue damage and diseases like cancer could be associated with higher abundance of intracellular proteins that are otherwise correlated with contamination. New strategies to distinguish contamination markers from biological/disease signatures are needed, in particular when interrogating complex physiological changes with highly sensitive mass spectrometers. While this initial study provides a proof-of-concept for employing multiple NPs to identify protein biomarkers in a clinical cohort, these potential disease signatures have to be validated in follow-up studies.

The scalability and efficiency of our platform can fuel large proteomics studies, deepening our understanding of disease and biological mechanisms. It would be particularly interesting to integrate NPs into new mass spectrometry acquisition strategies such as BoxCar18, Scanning SWATH19, or ion mobility-enabled PASEF20. Another interesting possibility would be to use isobaric labeling (e.g., TMT) of peptides derived from our NP workflow to reduce MS run time by a factor of 10 or more. Despite the time advantage, isobaric labeling might be less suitable for some large-scale proteomic studies since it increases the cost of reagents and requires expensive MS3-capable instruments for the most accurate results72. A significant concurrent increase in the throughput of proteomic assays and analysis, enabling larger studies, could help add proteomic data to large multiomic data sets to generate novel classifications and to put genomic disease information that is still not well understood, such as single nucleotide polymorphism variants, changes in DNA methylation patterns, and splice variants, into functional context. Moreover, protein-level information such as interactions or structural information is preserved on the NP surface and can further elucidate functional context.

In addition, our NP technology could be extended and tailored to cerebrospinal fluid, cell lysates, and even tissue homogenates for rapid, accurate, and precise profiling of proteomes, facilitating discovery of new disease biomarkers. Furthermore, the multi-NP workflow addresses the dynamic range challenge at the intact protein level; it is agnostic regarding the downstream protein identification and quantification strategy and can be integrated into low-cost ELISA or emerging protein sequencing workflows. Ultimately, the broad utility of the functionalized multi-NP workflow could be expanded into fields beyond proteomics, as NP surfaces can bind with any type of molecule. Possibilities include enrichment of nucleic acids for genomics, detection and measurement of impurities in water sampling, and enhanced chemical sensing in environmental monitoring applications.

Methods

Materials

Iron (III) chloride hexahydrate ACS, sodium acetate (anhydrous ACS), ethylene glycol, ammonium hydroxide 28–30%, ammonium persulfate (APS) (≥98%, Pro-Pure, Proteomics Grade), ethanol (reagent alcohol ACS), and methanol (≥99.8% ACS) were purchased from VWR. N,N′-Methylenebisacrylamide (99%) was purchased from EMD Millipore. Trisodium citrate dihydrate (ACS reagent, ≥99.0%), tetraethyl orthosilicate (TEOS) (reagent grade, 98%), 3-(trimethoxysilyl)propyl methacrylate (MPS) (98%), and poly(ethylene glycol) methyl ether methacrylate (OEGMA, average Mn 500, contains 100 ppm MEHQ as inhibitor, 200 ppm BHT as inhibitor) were purchased from Sigma–Aldrich. 4,4′-Azobis(4-cyanovaleric acid) (ACVA, 98%, cont. ca 18% water) and divinylbenzene (DVB, 80%, mixture of isomers) were purchased from Alfa Aesar and purified by passing through a short silica column to remove the inhibitor. N-(3-Dimethylaminopropyl)methacrylamide (DMAPMA) was purchased from TCI and also purified by passing through a short silica column to remove the inhibitor. The ELISA kit to measure human C-reactive protein (CRP) was purchased from R&D Systems (Minneapolis, MN). Human CRP protein purified from human serum was from Sigma–Aldrich.

Synthesis of NP SP-003, SP-007, and SP-011

The iron oxide core was synthesized following the published method via solvothermal reaction (Supplementary Fig. 3A)54,55. Typically, 26.4 g of iron (III) chloride hexahydrate was dissolved in 220 mL of ethylene glycol at 160 °C for ~10 min under mixing. Then 8.5 g of trisodium citrate dihydrate and 29.6 g sodium acetate anhydrous were added and fully dissolved by mixing for an additional 15 min at 160 °C. The solution was then sealed in a Teflon-lined stainless-steel autoclave (300 mL capacity) and heated to 200 °C for 12 h. After cooling to room temperature (RT), the black paramagnetic product was isolated by a magnet and washed with DI water 3–5 times. The final product was freeze-dried to a black powder for further use.
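
As a quick check on the quantities above, the reagent masses can be converted to molar amounts. The molar masses below are standard textbook values and are not stated in the source; the iron concentration is approximate because it ignores any volume change on dissolution.

```python
# Standard molar masses in g/mol (textbook values, not given in the source).
MOLAR_MASS = {
    "FeCl3.6H2O": 270.30,      # iron(III) chloride hexahydrate
    "Na3C6H5O7.2H2O": 294.10,  # trisodium citrate dihydrate
    "CH3COONa": 82.03,         # sodium acetate, anhydrous
}

# Masses in grams and solvent volume in litres, as given in the synthesis.
MASS_G = {"FeCl3.6H2O": 26.4, "Na3C6H5O7.2H2O": 8.5, "CH3COONa": 29.6}
ETHYLENE_GLYCOL_L = 0.220

moles = {name: MASS_G[name] / MOLAR_MASS[name] for name in MASS_G}
iron_molarity = moles["FeCl3.6H2O"] / ETHYLENE_GLYCOL_L
citrate_per_iron = moles["Na3C6H5O7.2H2O"] / moles["FeCl3.6H2O"]
acetate_per_iron = moles["CH3COONa"] / moles["FeCl3.6H2O"]

for name, n in moles.items():
    print(f"{name}: {n * 1000:.1f} mmol")
print(f"Fe(III) ~ {iron_molarity:.2f} M; citrate:Fe ~ {citrate_per_iron:.2f}; "
      f"acetate:Fe ~ {acetate_per_iron:.2f}")
```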

The silica-coated iron oxide NPs (SP-003) were prepared through a modified Stöber process as reported before (Supplementary Fig. 3B)56,57. Typically, 1 g of the SPIONs were homogeneously dispersed in a mixture of ethanol (400 mL), DI water (10 mL), and concentrated ammonia aqueous solution (10 mL, 28–30 wt%), followed by the addition of TEOS (2 mL). After stirring at 70 °C for 6 h, amorphous silica-coated SPIONs (denoted Fe3O4@SiO2) were washed three times with methanol, three times with water, and the final product was freeze-dried to a powder.

To prepare SP-007 (PDMAPMA-modified SPION) and SP-011 (PEG-modified SPION), vinyl group–functionalized SPIONs (denoted Fe3O4@MPS) were first prepared through a modified Stöber process as previously reported (Supplementary Fig. 3C)41. Briefly, 1 g of the SPIONs was homogeneously dispersed with the aid of vortexing (or sonication) in a mixture of ethanol (400 mL), DI water (10 mL), and concentrated ammonia aqueous solution (10 mL, 28–30 wt%), followed by the addition of TEOS (2 mL). After stirring at 70 °C for 6 h, 2 mL of 3-(trimethoxysilyl)propyl methacrylate was added to the reaction mixture, which was then stirred at 70 °C overnight. The vinyl-functionalized SPIONs obtained were washed three times with methanol and three times with water, and the final product was freeze-dried to a powder. Next, for synthesis of poly(dimethylaminopropyl methacrylamide) (PDMAPMA)-coated SPIONs (denoted Fe3O4@PDMAPMA, SP-007 in Supplementary Fig. 3D), 100 mg of Fe3O4@MPS was homogeneously dispersed in 125 mL of DI water. After bubbling with N2 for 30 min, 2 g of N-[3-(dimethylamino)propyl] methacrylamide (DMAPMA) and 0.2 g of divinylbenzene (DVB) were added to the Fe3O4@MPS suspension under N2 protection. After the resulting mixture was heated to 75 °C, 40 mg of ammonium persulfate (APS) in 5 mL DI water was added and the mixture was stirred at 75 °C overnight. After cooling, Fe3O4@PDMAPMA was isolated with a magnet and washed 3–5 times with water. The final product was freeze-dried to a dark brown powder. For synthesis of poly(ethylene glycol) (PEG)-coated SPIONs (denoted Fe3O4@POEGMA, SP-011 in Supplementary Fig. 3E), 100 mg of Fe3O4@MPS was homogeneously dispersed in 125 mL of DI water. After bubbling with N2 for 30 min, 2 g of poly(ethylene glycol) methyl ether methacrylate (OEGMA, average Mn 500) and 50 mg of N,N′-methylenebisacrylamide (MBA) were added to the Fe3O4@MPS suspension under N2 protection. After the resulting mixture was heated to 75 °C, 50 mg of 4,4′-azobis(4-cyanovaleric acid) (ACVA) in 5 mL ethanol was added and the mixture was stirred at 75 °C overnight. After cooling, Fe3O4@POEGMA was isolated with a magnet and washed 3–5 times with water. The final product was freeze-dried to a dark brown powder.

Characterization of NP physicochemical properties

Dynamic light scattering (DLS) and zeta potential were measured on a Zetasizer Nano ZS (Malvern Instruments, Worcestershire, UK). NPs were suspended at 10 mg/mL in water with 10 min of bath sonication prior to testing. Samples were then diluted to ~0.02 wt% for both DLS and zeta potential measurements in the respective buffers. DLS was performed in water at 25 °C in disposable polystyrene semi-micro cuvettes (VWR, Radnor, PA, USA) with a 1 min temperature equilibration time, and the average was taken from three runs of 1 min, with a 633 nm laser in 173° backscatter mode. DLS results were analyzed using the cumulants method. Zeta potential was measured in 5% pH 7.4 PBS (Gibco, PN 10010-023, USA) in disposable folded capillary cells (Malvern Instruments, PN DTS1070) at 25 °C with a 1 min equilibration time. Three measurements were performed with automatic measurement duration with a minimum of 10 runs, a maximum of 100 runs, and a 1 min hold between measurements. The Smoluchowski model was used to determine the zeta potential from the electrophoretic mobility.
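
For orientation, these two measurements rest on standard textbook relations, sketched here. The source names only the cumulants method and the Smoluchowski model, so the symbols below are introduced for illustration. The cumulants fit yields a mean translational diffusion coefficient D, which the Stokes-Einstein relation converts to a hydrodynamic diameter d_H; the Smoluchowski limit relates the electrophoretic mobility μ_e to the zeta potential ζ:

```latex
d_H = \frac{k_B T}{3 \pi \eta D},
\qquad
\zeta = \frac{\eta \, \mu_e}{\varepsilon_r \varepsilon_0}
```

Here k_B is the Boltzmann constant, T the absolute temperature (298.15 K at 25 °C), η the dispersant viscosity, and ε_r ε_0 the dispersant permittivity. The Smoluchowski form assumes an electrical double layer that is thin relative to the particle radius.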

Scanning electron microscopy (SEM) was performed using a FEI Helios 600 Dual-Beam FIB-SEM. Aqueous dispersions of NPs were prepared at a concentration of 10 mg/mL from weighed NP powders re-dispersed in DI water by 10 min of sonication. The samples were then diluted 4× with methanol (Fisher) to make a water/methanol dispersion that was used directly for electron microscopy. SEM substrates were prepared by drop-casting 6 µL of NP sample onto a Si wafer from Ted Pella, and the droplet was completely dried in a vacuum desiccator for about 24 h prior to measurements.

A Titan 80–300 transmission electron microscope (TEM) with an accelerating voltage of 300 kV was used for both low- and high-resolution TEM measurements. The TEM grids were prepared by drop-casting 2 µL of the NP dispersion in a water-methanol mixture (25–75 v/v%) with a final concentration of 0.25 mg/mL and dried in a vacuum desiccator for about 24 h prior to TEM analysis. All measurements were performed on the lacey holey TEM grids from Ted Pella.

X-ray photoelectron spectroscopy (XPS) was performed using a PHI VersaProbe and a Thermo Scientific ESCALAB 250e III. XPS analysis was performed on the NP fine powders, which were kept sealed and stored under desiccation prior to measurement. Materials were mounted on carbon tape to achieve a uniform surface for analysis. A monochromatic Al K-alpha X-ray source (50 W and 15 kV) was used over a 200 µm² scan area with a pass energy of 140 eV, and all binding energies were referenced to the C–C peak at 284.8 eV. Both survey scans and high-resolution scans were performed to assess the elements of interest in detail. The atomic concentration of each element was determined from the integrated intensity of elemental photoemission features, corrected by relative atomic sensitivity factors, by averaging the results from two different locations on the sample. In some cases, four or more locations were averaged to assess uniformity.

Protein corona preparation and proteomic analysis

Plasma and serum samples (BioIVT, Hicksville, NY) were diluted 1:5 in a dilution buffer composed of TE buffer (10 mM Tris, 1 mM disodium EDTA, 150 mM KCl) with 0.05% CHAPS. NP powder was reconstituted by sonicating for 10 min in DI water followed by vortexing for 2–3 s. To form the protein corona, 100 µL of NP suspension (SP-003, 5 mg/mL; SP-007, 2.5 mg/mL; SP-011, 10 mg/mL) was mixed with 100 µL of diluted biological sample in microtiter plates. The plates were sealed and incubated at 37 °C for 1 h with shaking at 300 rpm. After incubation, the plate was placed on top of a magnetic collection device for 5 min to draw down the NPs. Unbound proteins in the supernatant were pipetted out. The protein corona was further washed three times with 200 µL of dilution buffer, with magnetic separation.

For the 10-NP screen, the five additional assay conditions evaluated were identical to those described above, with one of the following exceptions. First, a low concentration of NPs was evaluated that was 50% of the original concentration (ranging from 2.5–15 mg/mL for each NP, depending on expected peptide yield). For the second and third assay variations, both low and high NP concentrations were run using undiluted, neat plasma rather than plasma diluted in buffer. For the fourth and fifth assay variations, both low and high NP concentrations were run using a pH 5 citrate buffer for both dilution and rinse.

To digest the proteins bound to the NPs, a trypsin digestion kit (iST 96×, PreOmics, Germany) was used according to the protocols provided. Briefly, 50 µL of Lyse buffer was added to each well and heated at 95 °C for 10 min with agitation. After cooling the plates to room temperature, trypsin digestion buffer was added, and the plate was incubated at 37 °C for 3 h with shaking. The digestion process was stopped with a stop buffer. The supernatant was separated from the NPs by a magnetic collector and further cleaned up by a peptide cleanup cartridge included in the kit. The peptides were eluted twice with 75 µL of elution buffer and the eluates were combined. Peptide concentration was measured by a quantitative colorimetric peptide assay kit from Thermo Fisher Scientific (Waltham, MA).

NSCLC sample processing

As part of an ongoing, IRB-approved observational sample collection protocol, 24 sites were used to collect subject samples grouped into NSCLC (all stages, with stages 1, 2, and 3 referred to herein as early, and stage 4 defined as late), or healthy and pulmonary comorbid control arms. Subjects with pathology-confirmed NSCLC were enrolled post-diagnosis (typically achieved via a CT-guided fine-needle aspiration biopsy) but pretreatment. The protocol for obtaining blood samples from patients (Supplementary Note 1) was approved by the collection sites’ respective IRBs (Supplementary Data 7), and all subjects gave written informed consent. Subjects were not necessarily fasted at the time of collection. Subjects for the pulmonary comorbidity control and healthy control groups were enrolled based on patient call-backs from participating study sites. In this context, healthy means the subjects did not have a current diagnosis of any form of cancer or any of the targeted pulmonary comorbidities including COPD, emphysema, etc. Sample types collected included EDTA plasma tubes, serum tubes, PAXgene RNA tubes, and Streck Blood Cell Collection tubes. For the purposes of this study, EDTA plasma was prepared as follows: After collection into the EDTA plasma tube per vendor instructions, the samples were centrifuged within 1 h of collection and the plasma fraction was aspirated and frozen within 1 h of centrifugation prior to initial storage at −70 °C and subsequent shipment on dry ice. Study plasma samples were thawed at 4 °C, realiquoted, and refrozen once prior to NP processing. A randomly selected subcohort of 141 age- and gender-matched subjects from the healthy and early-stage NSCLC groups was selected for analysis from the collected samples, with no significant differences in age or gender between the groups based on Wilcoxon or Fisher tests, respectively. For NP analysis, the 141 plasma samples were randomized across sets of 96-well plates, one set for each NP.

In addition to NP-plasma interrogation, a depleted plasma sample was prepared using the MARS-14 column (Agilent) per the manufacturer’s instructions. The NP-isolated peptides, as well as the peptides from equivalently digested depleted plasma, were evaluated by data-independent-acquisition mass spectrometry (DIA-MS) on Sciex Triple TOF 6600+ instruments coupled to an EKSPERT nano-LC 425 LC system running a 33 min sample-to-sample gradient. MS data acquisition took 2 weeks for all 141 samples.

Data-dependent acquisition (DDA)

LC-MS/MS: Next, the peptide eluates were lyophilized and reconstituted in 0.1% TFA. A 2 µg aliquot from each sample was analyzed by nano-LC-MS/MS with either a Waters NanoAcquity HPLC system or a Thermo Scientific UltiMate 3000 RSLCnano system interfaced to an Orbitrap Fusion Lumos Tribrid Mass Spectrometer from Thermo Scientific. Peptides were loaded on a trapping column and eluted over a 75 µm analytical column at either 350 nL/min (NanoAcquity HPLC) or 250 nL/min (UltiMate 3000 RSLCnano system) using a gradient of 2–35% acetonitrile over 44 min, for a total time between injections of 64 min (UltiMate 3000 RSLCnano system) or 66 min (NanoAcquity HPLC). The mass spectrometer was operated in data-dependent mode, with MS and MS/MS performed in the Orbitrap at 60,000 FWHM resolution and 15,000 FWHM resolution, respectively.

DDA data processing (all data excluding the NSCLC study): The MS data at the protein group level were acquired as follows. MS raw files were processed with MaxQuant/Andromeda (v. 1.6.7)21,22, searching MS/MS spectra against the UniProtKB human FASTA database (UP000005640, 74,349 forward entries; version from August 2019) employing standard settings. Enzyme digestion specificity was set to trypsin, allowing cleavage N-terminal to proline and up to 2 missed cleavages. Minimum peptide length was set to seven amino acids and maximum peptide mass to 4600 Da. Methionine oxidation and protein N-terminus acetylation were configured as variable modifications, and carbamidomethylation of cysteines was set as a fixed modification. MaxQuant improves precursor ion mass accuracy by time-dependent recalibration algorithms and defines individual mass tolerances for each peptide. As initial maximum precursor mass tolerances, we allowed 20 ppm during the first search and 4.5 ppm in the main search. The MS/MS mass tolerance was set to 20 ppm. For analysis, we applied a false discovery rate (FDR) cutoff of 1% at both the peptide and protein level (protein groups are reported with their corresponding q-value). “Match between runs” was disabled. Identifications were quantified based on protein intensities (only proteins with q-value < 1%) requiring at least one razor peptide (Supplementary Data 3, 4). MaxLFQ58 normalized protein intensities (requiring at least one peptide ratio count) are reported in the raw output and were used only for the CV precision analysis. Proteins that could not be discriminated based on unique peptides were assembled in protein groups. Furthermore, proteins were filtered against a list of common contaminants included in MaxQuant. Proteins identified only by site modification were strictly excluded from analysis.
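The post-search filters described above (q-value below 1%, at least one razor peptide, no contaminants, no site-only identifications) can be sketched as a simple row filter. This is a minimal illustration rather than the authors' pipeline: the rows are invented, the column names are assumed to follow MaxQuant's proteinGroups.txt conventions, and the removal of reverse (decoy) hits is an added assumption that the text does not mention.

```python
from typing import Dict, List

# Invented example rows; column names are assumed to follow MaxQuant's
# proteinGroups.txt conventions and should be checked against the real file.
ROWS: List[Dict[str, str]] = [
    {"Majority protein IDs": "P02768", "Q-value": "0", "Razor + unique peptides": "45",
     "Potential contaminant": "", "Reverse": "", "Only identified by site": ""},
    {"Majority protein IDs": "P00761", "Q-value": "0", "Razor + unique peptides": "12",
     "Potential contaminant": "+", "Reverse": "", "Only identified by site": ""},
    {"Majority protein IDs": "REV__P12345", "Q-value": "0.004", "Razor + unique peptides": "2",
     "Potential contaminant": "", "Reverse": "+", "Only identified by site": ""},
    {"Majority protein IDs": "P01024", "Q-value": "0.001", "Razor + unique peptides": "3",
     "Potential contaminant": "", "Reverse": "", "Only identified by site": "+"},
    {"Majority protein IDs": "P02741", "Q-value": "0.02", "Razor + unique peptides": "1",
     "Potential contaminant": "", "Reverse": "", "Only identified by site": ""},
    {"Majority protein IDs": "P02787", "Q-value": "0.001", "Razor + unique peptides": "1",
     "Potential contaminant": "", "Reverse": "", "Only identified by site": ""},
]

# Flag columns that exclude a protein group when marked "+".
# ASSUMPTION: "Reverse" (decoy) filtering is not stated in the text.
FLAG_COLUMNS = ("Potential contaminant", "Reverse", "Only identified by site")


def passes_filters(row: Dict[str, str], max_q: float = 0.01, min_razor: int = 1) -> bool:
    """Return True if a protein group survives all filters described in the text."""
    if any(row[col] == "+" for col in FLAG_COLUMNS):
        return False
    return float(row["Q-value"]) < max_q and int(row["Razor + unique peptides"]) >= min_razor


kept = [row["Majority protein IDs"] for row in ROWS if passes_filters(row)]
print(kept)
```

In this toy input only the two unflagged groups with q-value below 0.01 remain.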

Annotation-diversity analysis

To determine which annotations are predominantly enriched in the 10-NP panel (Fig. 4), we performed an annotation enrichment analysis using a Fisher’s exact test comparing proteins identified throughout the 10 NPs (requiring three out of three identifications across replicates) in a pooled plasma sample. Uniprot IDs (MaxQuant: Majority protein IDs) were matched to a list of 5304 published plasma proteins5 if any of the Uniprot IDs in the MaxQuant output matched the reported Uniprot ID. Next, annotations from five different spaces, GO Cellular Compartment (GOCC), GO Biological Process (GOBP), Uniprot Keywords, Protein families (Pfam), and Kyoto Encyclopedia of Genes and Genomes (KEGG), were matched to the protein groups based on Uniprot identifiers. Using Fisher’s exact test, we determined enriched annotations comparing the population of proteins identified by the 10 NPs within the reference database against the proteins that did not map into the 10-NP panel. Enrichment scores (log2 odds ratios) were calculated and plotted against the p-values (Fig. 4d). Annotations significantly enriched with a Benjamini–Hochberg FDR < 1% are indicated in blue. If the log2 odds ratio was infinite, the maximum or minimum finite log2 odds ratio was used for drawing.
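The enrichment statistics named above can be illustrated with a small, dependency-free sketch: a two-sided Fisher's exact test on a 2×2 table, the log2 odds ratio, Benjamini–Hochberg adjustment, and the replacement of infinite scores by the most extreme finite ones for plotting. The table layout (annotated vs. not annotated, in the 10-NP panel vs. not) and all function names are assumptions made for illustration; the authors' own implementation is not given in the text.

```python
from math import comb, inf, isfinite, log2


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x counts in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def log2_odds(a: int, b: int, c: int, d: int) -> float:
    """log2 odds ratio of the table; infinite when one off-diagonal product is zero."""
    num, den = a * d, b * c
    if den == 0:
        return inf if num > 0 else float("nan")
    if num == 0:
        return -inf
    return log2(num / den)


def clip_infinite(scores: list) -> list:
    """Replace +/-inf by the largest/smallest finite score, as done for drawing."""
    finite = [s for s in scores if isfinite(s)]
    return [max(finite) if s == inf else min(finite) if s == -inf else s for s in scores]


def benjamini_hochberg(pvals: list) -> list:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted, running_min = [0.0] * m, 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy table: a = annotated and in panel, b = annotated and not in panel,
#            c = not annotated and in panel, d = not annotated and not in panel.
print(fisher_exact_two_sided(3, 1, 1, 3), log2_odds(3, 1, 1, 3))
```

An annotation would then be flagged as enriched when its adjusted p-value falls below 0.01, matching the 1% FDR threshold stated above.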

We used continuous enrichment analysis (e.g., 1D annotation enrichment) to compare individual NPs at the annotation level. Because it uses a quantitative comparison, this approach is a more powerful evaluation tool than methods requiring a binary input (e.g., presence/absence, threshold counting, etc.)64. We used this method to interrogate annotations enriched in the protein coronas by computing the 1D enrichment scores for each NP in the panel. In summary, log10-transformed MaxQuant intensities for each protein group in each sample were normalized by median subtraction. Protein groups that were not quantified in three out of three replicates used in the analysis on at least one NP were removed. A difference score was calculated for each protein group between the median on one NP and the average for that group across all of the other NPs. Annotations from five different spaces, GO Cellular Compartment (GOCC), GO Biological Process (GOBP), Uniprot Keywords, Protein families (Pfam), and Kyoto Encyclopedia of Genes and Genomes (KEGG), were matched to the protein groups based on the Uniprot identifiers reported in the MaxQuant output for each group as Majority Protein IDs. To match the identifier format in the annotation reference, the isoform extensions were removed. The annotation references were retrieved from Uniprot on November 25, 2019 using the Perseus/MaxQuant framework73. The 1D annotation enrichment was calculated using R scripts adapted from the reported literature64. The results were filtered requiring (1) an annotation group size (i.e., number of protein groups with that annotation) greater than 10, and (2) a Benjamini–Hochberg-adjusted p-value (FDR) less than 2% for enrichment or depletion for at least one NP. The 1D enrichment score was visualized as a heatmap after hierarchical clustering as shown in Fig. 4e: A) Gene Ontology Cellular Component (GOCC), B) Gene Ontology Biological Process (GOBP), C) Uniprot Keywords, D) Protein families (Pfam), E) Kyoto Encyclopedia of Genes and Genomes (KEGG). Hierarchical clustering is based on “complete linkage”.
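The preprocessing that feeds the 1D enrichment (log10 transform, per-sample median subtraction, the three-of-three replicate filter, and the per-NP difference score) can be sketched as follows. This is an illustration under stated assumptions, not the authors' R code: the "average across all of the other NPs" is read here as the mean of the per-NP medians, NPs on which a group was not quantified are skipped, and all names are hypothetical.

```python
from math import log10
from statistics import mean, median


def normalize(intensities: dict) -> dict:
    """log10-transform one sample's intensities and subtract the sample median."""
    logged = {pg: log10(v) for pg, v in intensities.items() if v > 0}
    center = median(logged.values())
    return {pg: x - center for pg, x in logged.items()}


def fully_quantified(samples_by_np: dict, n_required: int = 3) -> set:
    """Protein groups quantified in all replicates on at least one NP."""
    keep = set()
    for replicates in samples_by_np.values():
        counts = {}
        for rep in replicates:
            for pg in rep:
                counts[pg] = counts.get(pg, 0) + 1
        keep |= {pg for pg, n in counts.items() if n >= n_required}
    return keep


def np_medians(samples_by_np: dict) -> dict:
    """Median normalized intensity per protein group on each NP."""
    out = {}
    for np_name, replicates in samples_by_np.items():
        groups = set().union(*replicates)
        out[np_name] = {pg: median(r[pg] for r in replicates if pg in r) for pg in groups}
    return out


def difference_scores(samples_by_np: dict, target_np: str) -> dict:
    """Median on the target NP minus the mean of the medians on the other NPs."""
    medians = np_medians(samples_by_np)
    scores = {}
    for pg in fully_quantified(samples_by_np):
        if pg not in medians[target_np]:
            continue
        # ASSUMPTION: NPs where the group is missing are left out of the average.
        others = [m[pg] for name, m in medians.items() if name != target_np and pg in m]
        if others:
            scores[pg] = medians[target_np][pg] - mean(others)
    return scores


# Toy data: two NPs, three replicates each, values already normalized.
toy = {
    "NP-A": [{"P1": 1.0}, {"P1": 1.2}, {"P1": 1.1}],
    "NP-B": [{"P1": 0.1}, {"P1": 0.3}, {"P1": 0.2}],
}
print(difference_scores(toy, "NP-A"))
```

The resulting difference scores are the quantitative input on which a rank-based 1D enrichment test would then operate for each annotation.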

Data-independent acquisition (DIA), NSCLC study

LC-MS/MS: For DIA analyses using SWATH, peptides were reconstituted in a solution of 0.1% FA and 3% ACN spiked with 5 fmol/µL PepCalMix from SCIEX (Framingham, MA). A constant mass of 5 µg of peptides per MS injection volume of 10 µL was targeted, but in some instances with lesser yield the maximum amount available was injected. Each sample was analyzed by an Eksigent nano-LC system coupled with a SCIEX Triple TOF 6600+ mass spectrometer equipped with an OptiFlow source using a trap-and-elute method. First, the peptides were loaded on a trap column and then separated on an Eksigent ChromXP analytical column (150 µm × 15 cm, C18, 3 µm, 120 Å) at a flow rate of 5 µL/min using a gradient of 3–32% solvent B (0.1% FA, 100% ACN) over 20 min, resulting in a 33 min total run time. The mass spectrometer was operated in SWATH mode using 100 variable windows across the 400–1250 m/z range.

Library generation for NSCLC study: To build a peptide-spectral library, four plasma pools were created from the patients in the lung cancer study. Each pool was analyzed by the Proteograph using the panel of 10 NPs. In addition, the four plasma pools were depleted using a MARS-14 column (Agilent, Santa Clara, CA) and the Agilent 1260 Infinity II HPLC system. The samples were analyzed in data-dependent mode on the UltiMate 3000 RSLCnano system coupled with the Orbitrap Fusion Lumos using a gradient of 5–35% over 109 min, for a total run time of 125 min. The rest of the parameters were set as mentioned above.

To further expand the spectral library, a dataset from a separate experiment using a pooled plasma from 157 healthy subjects and lung cancer patients varying in age, gender, and disease stage was used in combination with the NSCLC-DDA data. In short, the pooled plasma was analyzed by the Proteograph assay using the panel of 10 NPs. Furthermore, the pooled plasma was depleted using the MARS-14 column and fractionated into nine concatenated fractions using a high-pH fractionation method (XBridge BEH C18 column, Waters). All samples were prepared in three replicates and analyzed in data-dependent mode using the same parameters as the NSCLC-DDA analysis.

Plasma depletion: All depleted plasma samples were prepared using an Agilent 1260 Infinity II Bioinert HPLC system consisting of autosampler, pumps, column compartment, UV detector, and fraction collector. Plasma depletion was conducted by first diluting 25 μL of plasma to a final volume of 100 μL using Agilent Buffer A plasma depletion mobile-phase. Each diluted sample was filtered through an Agilent 0.22 μm cellulose acetate spin filter to remove any particulates and transferred to a 96-well plate. The plate was then placed in an autosampler and held at 4 °C for the entirety of the assay. Eighty microliters of the diluted plasma was then injected onto an Agilent 4.6 × 50 mm Human 14 Multiple Affinity Removal System (MARS-14) depletion column housed in the column compartment at a constant temperature of 20 °C. Mobile-phase conditions used during protein depletion consisted of 100% Buffer A mobile-phase flowing at a rate of 0.125 mL/min. Proteins eluting from the column were detected using the Agilent UV absorbance detector operated at 280 nm with a bandwidth of 4 nm. The early-eluting peak for each injection, representing the depleted plasma proteins, was collected using a refrigerated fraction collector with peak-intensity-based triggering (i.e., 200 mAU threshold with a maximum peak width of 3 min). After peak collection, the fractions were held at 4 °C for the duration of the analysis. The sample volume was then reduced to approximately 20 μL using an Amicon Centrifugal Concentrator (Amicon Ultra-0.5 mL, 3k MWCO) with a centrifuge operating at 4 °C and 14,000 × g. Five microliters of each depleted sample was then reduced, alkylated, digested, desalted, and analyzed according to the sample preparation and MS analysis protocols described. During each sample depletion cycle, the MARS-14 column was regenerated with the Agilent Buffer B mobile-phase for ~4.5 min at a flow rate of 1 mL/min and equilibrated back to the original protein capture condition by flowing Buffer A at 1 mL/min for ~9 min.

Peptide fractionation: A total of 100 μL of reconstituted peptides was loaded onto a Waters XBridge column (2.1 × 250 mm, BEH C18, 3.5 μm, 300 Å) using the Agilent 1260 Infinity II HPLC system. The peptides were separated at a flow rate of 350 μL/min using a gradient of 3–30% in 30 min, with a total run time of 47 min, and the fractions were collected every 1.5 min. The fractions were then dried using a SpeedVac. Finally, the dried peptides were reconstituted in a solution of 0.1% FA and 3% ACN and concatenated into 9 fractions.
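The text does not say how the collected fractions were combined into nine. A common concatenation scheme pools every ninth fraction so that each pool spans the whole gradient; the sketch below illustrates that scheme only as an assumption. It also assumes that collection covered the full 47 min run, which is not stated, and the function name is hypothetical.

```python
RUN_TIME_MIN = 47.0            # total run time from the text
COLLECTION_INTERVAL_MIN = 1.5  # one fraction every 1.5 min
N_POOLS = 9                    # final number of concatenated fractions

# ASSUMPTION: fractions are collected over the entire run; only complete
# 1.5 min fractions are counted.
n_fractions = int(RUN_TIME_MIN // COLLECTION_INTERVAL_MIN)


def concatenate(n_collected: int, n_pools: int = N_POOLS) -> list:
    """Assign 1-based fraction numbers to pools in round-robin order."""
    pools = [[] for _ in range(n_pools)]
    for i in range(n_collected):
        pools[i % n_pools].append(i + 1)
    return pools


pools = concatenate(n_fractions)
for number, members in enumerate(pools, start=1):
    print(f"Pool {number}: fractions {members}")
```

Pooling early, middle, and late eluting fractions together keeps the peptide content of each pool spread across the subsequent low-pH gradient, which is the usual motivation for this kind of scheme.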

Data analysis for library generation: To generate a spectral library, all the DDA data were first searched against the human Uniprot database using the Pulsar search engine in Spectronaut (Biognosys, Switzerland). The library was then generated using Spectronaut with a 1% FDR cutoff at the peptide and protein level.

DIA raw data processing: The SWATH data were processed on Spectronaut. The default settings (version 13.8.190930.43655) were used for the analysis with the Q-value cutoff at precursor and protein level set to 0.01 (Supplementary Data 5).

For classification analysis (NSCLC study), primary MS data were prepared as follows. Statistical analysis was performed using the R platform as described above, including the core ‘tidyverse’ packages, the ‘caret’ classification framework, and the ‘ranger’ random forest model package. Missing values for a given protein group within a subject were median imputed. No other normalization was applied to the data prior to classification. In order to construct between-group classifier models, log-transformed protein group data were evaluated in ten rounds of 10-fold cross-validation. All protein group features were used for classification and the relative importance of those features in the cross-validations was reported. In order to detect possible overfitting, ten iterations of the cross-validation procedure were performed after randomization of the subjects’ class assignments. Initial classification results highlighted a significant signal from both the depleted plasma and NP panel data from proteins typically associated with stress and acute-phase response, likely a result of the sample acquisition strategy (e.g., post biopsy, diagnosis-aware). To eliminate this possibly confounding signal, all protein group data in the NP-derived dataset that were derived from any protein also observed in depleted plasma were removed from subsequent analysis.
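The resampling design described above (median imputation, ten rounds of 10-fold cross-validation, and a label-permutation control) can be sketched without any modeling library. The authors used R with caret and ranger; the Python below is an illustrative stand-in that covers only the data handling, leaves out the random forest itself, does not stratify folds by class, and reads "median imputed" as the per-protein-group median across subjects, which is an assumption. All function names are hypothetical.

```python
import random
from statistics import median


def median_impute(matrix: list) -> list:
    """Fill None entries with the median of the observed values in that column.

    Rows are subjects and columns are protein groups. A column with no
    observed values would raise statistics.StatisticsError.
    """
    columns = list(zip(*matrix))
    fills = [median(v for v in col if v is not None) for col in columns]
    return [[fills[j] if v is None else v for j, v in enumerate(row)] for row in matrix]


def repeated_kfold(n_samples: int, k: int = 10, repeats: int = 10, seed: int = 0):
    """Yield (repeat, fold, train_indices, test_indices) for repeated k-fold CV."""
    rng = random.Random(seed)
    for rep in range(repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)                      # new random partition each round
        folds = [indices[i::k] for i in range(k)]
        for f, test_idx in enumerate(folds):
            train_idx = [i for j, fold in enumerate(folds) if j != f for i in fold]
            yield rep, f, train_idx, test_idx


def permuted_labels(labels: list, seed: int = 0) -> list:
    """Shuffle class labels across subjects for the overfitting control."""
    rng = random.Random(seed)
    shuffled = list(labels)
    rng.shuffle(shuffled)
    return shuffled


splits = list(repeated_kfold(n_samples=141))
print(len(splits), "train/test splits")
```

A classifier would be fit on each training split and scored on the matching test split, once with the true labels and once with permuted labels; performance near chance in the permuted runs suggests that the real signal is not an artifact of overfitting.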

Platelet index (PI)

Protein groups identified in a sample by each particle were matched to the platelet signature protein list from Geyer et al.67, and the sample platelet index (PI) was calculated as the median of the ln intensity of the signature proteins divided by the median of the ln intensity of the non-signature proteins. In order to summarize an overall PI for the sample from all particles and depleted plasma, the PIs for each particle were scaled and centered (default scale() R function) and the average was taken across the six values (five NPs and depleted plasma, DP).
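The PI calculation can be written out in a few lines. The sketch below follows the definition given above; the scaling step mirrors the default behavior of R's scale() (subtract the mean, divide by the sample standard deviation) and is assumed to be applied across samples within each particle. Function names and the toy numbers are hypothetical.

```python
from math import exp, log
from statistics import mean, median, stdev


def platelet_index(intensities: dict, signature: set) -> float:
    """Median ln intensity of signature proteins over that of non-signature proteins."""
    sig = [log(v) for pg, v in intensities.items() if pg in signature and v > 0]
    non = [log(v) for pg, v in intensities.items() if pg not in signature and v > 0]
    return median(sig) / median(non)


def scale(values: list) -> list:
    """Center and scale like R's default scale(): (x - mean) / sample SD."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def overall_pi(pi_by_particle: dict) -> list:
    """Average the scaled PIs across particles, one value per sample.

    Each list holds the PIs of the same samples, in the same order,
    for one particle (or for depleted plasma).
    """
    scaled = [scale(v) for v in pi_by_particle.values()]
    return [mean(col) for col in zip(*scaled)]


# Toy example with invented values: three samples, two "particles".
print(overall_pi({"NP-1": [1.0, 2.0, 3.0], "DP": [2.0, 4.0, 6.0]}))
```

In the study the average would run over six entries per sample (five NPs and DP) rather than the two shown here.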

Spike recovery

The baseline concentration of CRP in a pooled healthy plasma sample was measured with the ELISA kit described above (Materials) according to the manufacturer-suggested protocols. A stock solution and appropriate dilutions of CRP were prepared and spiked into the identical pooled plasma samples to reach final concentrations of 2×, 5×, 10×, and 100× the baseline endogenous concentration. The volume of additions to the pooled plasma was 10% of the total sample volume. A spike control was made by adding the same volume of buffer to the pooled plasma sample. Concentrations of spiked samples were measured again by ELISA to confirm the CRP levels at each spiking level. The samples were used to evaluate Proteograph NP corona measurement linearity as described in the Results above.
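Because the spike occupies 10% of the final volume, the endogenous CRP is diluted to 90% and the spike solution must be correspondingly more concentrated. A minimal sketch of that mass balance follows, assuming the fold targets refer to the undiluted baseline concentration (the text does not say whether the diluted buffer control was the reference); the function name is hypothetical.

```python
def spike_solution_fold(target_fold: float, spike_fraction: float = 0.10) -> float:
    """Concentration of the spike solution, as a multiple of baseline C0.

    Mass balance for the final mixture:
        target_fold * C0 = (1 - spike_fraction) * C0 + spike_fraction * C_spike
    """
    return (target_fold - (1.0 - spike_fraction)) / spike_fraction


for fold in (2, 5, 10, 100):
    print(f"{fold}x final -> spike solution at about {spike_solution_fold(fold):.0f}x baseline")
```

With a 10% spike volume the required spike solutions come out at roughly 11×, 41×, 91×, and 991× baseline, which is one reason the final levels were confirmed by ELISA rather than taken from the nominal values.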

Background robustness test

Interfering substances were obtained from Sun Diagnostics: triglyceride-rich lipoproteins derived from human (lipids) and red blood cell hemolysate derived from human (hemolysate). A pooled plasma was spiked with each substance at different concentrations. Lipids: high (1000 mg/dL), low (100 mg/dL), and control (buffer only). Hemolysate: high (1000 mg/dL), low (100 mg/dL), and control (buffer only).

Statistics and reproducibility

Statistical analysis and visualization were performed using R (v3.5.2) with appropriate packages74. Experiments were conducted in assay replicates (n = 3) unless noted differently. NSCLC data were acquired for biological replicates (see above). Mass spectrometry raw data and functional protein annotation references are available through PRIDE75 and Perseus76, respectively.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.