Xi Dawn Chen†, Zeyu Chen†, George Wythes, Yifan Zhang, Benno C. Orr, Gary Sun, Yu-Kai Chao, Andrea Navarro Torres, Ka Thao, Mounica Vallurupalli, Jing Sun, Mehdi Borji, Emre Tkacik, Haiqi Chen, Bradley E. Bernstein*, Fei Chen*
Rationale: We sought to develop a tool for targeted mutagenesis of endogenous mammalian genomes. From nature, we observed that helicases are highly processive enzymes that can traverse large genomic regions. Some helicases, including those involved in DNA damage repair, can load at single-stranded DNA regions in the genome and begin unwinding DNA. We reasoned that, when fused to a deaminase, such a helicase could be used for long-range targeted mutagenesis. The fusion construct and its hypermutation interval can then be programmably targeted to specific genomic regions using a Cas9 nickase guided by a single-guide RNA (sgRNA). The directional, long-range DNA unwinding by the recruited helicase then generates random mutations across the region.
HACE: Long-range mutagenesis of endogenous DNA
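Developing a system to perform long-range, targeted, continuous mutagenesis of endogenous genomes. The helicase-assisted continuous editing (HACE) system allows long-range mutagenesis of endogenous DNA and enables continuous evolution across multiple cell generations. We applied HACE to identify functional variants in coding and noncoding genomes. CDS, coding sequence; D, Asp; E, Glu; G, Gly; K, Lys.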
Lastly, we found that the fusion of UGI significantly elevated the editing levels for AID (Fig. 2G; unpaired t test, P < 0.05 for all ±UGI groups), consistent with reports from previous cytidine base editor studies (14). These results demonstrate that HACE editing rates can be tuned by varying the helicase and Cas9 variants and by incorporating UGI, making HACE suitable for diverse applications. Further engineering may leverage these insights to optimize the editing rate and range for specific applications.
HACE is minimally perturbative in mammalian cells
Although we found the HACE constructs to be well tolerated in transfection experiments, we sought to quantify the effects of different HACE constructs on cell viability. To do so, we quantified cell viability using a luciferase-based adenosine triphosphate assay (CellTiter-Glo) across various helicase constructs, both with and without deaminase, along with a locus-targeting sgRNA and nCas9 (fig. S5A). We found that HEs constructed with BLM and PcrA helicases did not result in a significant decrease in cell viability (unpaired t test, P > 0.05 for each group). However, AID-NS3h-UGI did decrease cell viability (unpaired t test, P < 0.05), possibly because of the toxicity of NS3h helicases, which also act on RNA (21, 22). Consistently, we observed a similar decrease in cell viability for NS3h-UGI (unpaired t test, P < 0.001). AID-PcrA M6-UGI also significantly decreased cell viability (unpaired t test, P < 0.001), whereas PcrA M6 alone did not affect cell viability (unpaired t test, P = 0.118). These results suggest that helicases used for HACE are well tolerated with respect to cell viability.
Lastly, we explored whether HACE generated elevated substitution rates in nontargeted parts of the genome. To do so, we subjected cells expressing different HE variants or AID to whole-exome sequencing at high coverage (~1000× coverage). To increase our detection power, we divided the genome into 100-kb bins, calculated the editing rate of each bin, and then compared the editing rate between HE variants and control cells (fig. S5B). In the 16,621 genomic bins, we observed that overexpression of AID alone generated the most significant off-target bins (38 bins, Fisher's exact test). We detected 13 bins with elevated editing rates for AID-NS3h-UGI, 5 for AID-PcrA M6-UGI, 3 for AID-PcrA-UGI, and 3 for AID-BLM-UGI. We found nine bins that were elevated across multiple HE constructs, indicating common off-target sites across helicases (all sites outlined in table S5). These data suggest that there is minimal elevation of global substitution rates in the exome due to HACE and that off-target editing is likely driven by AID overexpression rather than by HE itself.
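The binned comparison described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names, the per-bin (edited, total) count layout, the one-sided test direction, and the Bonferroni correction are all hypothetical choices, and the toy counts are invented. The exact test is written out with integer binomials to stay self-contained; for real counts at ~1000× coverage, a statistics library would be the practical choice.

```python
from math import comb

BIN_SIZE = 100_000  # 100-kb bins, as in the text


def bin_index(position, bin_size=BIN_SIZE):
    """Map a genomic coordinate to its bin number."""
    return position // bin_size


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test, P(X >= a), for the table [[a, b], [c, d]].

    Rows: treated, control. Columns: edited bases, unedited bases.
    """
    treated_total = a + b
    edited_total = a + c
    n = a + b + c + d
    upper = min(treated_total, edited_total)
    numerator = sum(
        comb(edited_total, k) * comb(n - edited_total, treated_total - k)
        for k in range(a, upper + 1)
    )
    return numerator / comb(n, treated_total)


def elevated_bins(treated, control, alpha=0.05):
    """Return bins whose editing rate is elevated in treated vs. control.

    Both inputs map bin -> (edited_bases, total_bases).
    Bonferroni correction is an assumption made for this sketch.
    """
    n_tests = len(treated)
    hits = []
    for b in sorted(treated):
        t_edit, t_total = treated[b]
        c_edit, c_total = control[b]
        p = fisher_exact_greater(t_edit, t_total - t_edit, c_edit, c_total - c_edit)
        if p * n_tests < alpha:
            hits.append(b)
    return hits


# Invented toy counts: bin 0 is clearly elevated, bin 1 is not.
toy_treated = {0: (30, 1000), 1: (2, 1000)}
toy_control = {0: (2, 1000), 1: (2, 1000)}
print(elevated_bins(toy_treated, toy_control))
```

Counting significant bins per construct, as the text does, then amounts to calling `elevated_bins` once per construct against the same control.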
We also performed whole-exome sequencing of cells expressing HE with alternative base editors and compared against control cells (fig. S5C). We found that all base editors (rAPOBEC1, TadA-8e, and TadDE) generated, at most, three bins with elevated editing rates for either C>T or A>G editing modes (table S5), again suggesting minimal elevation of global substitution rates.
HACE enables the identification of MEK1 inhibitor-resistance mutations
We sought to apply HACE to perform functional mutagenesis in both coding and noncoding genome contexts. We initially screened for mutations within MEK1 (also known as MAP2K1) that promote resistance to small-molecule drug inhibition. MEK inhibitors target the MAPK-ERK (extracellular signal-regulated kinase) pathway, which is aberrantly upregulated in one-third of all cancers (23). We used HACE to diversify exons of the MEK1 gene in A375 cells, a melanoma line sensitive to MEK inhibition. After 3 days of mutagenesis using AID*Δ-PcrA M6-UGI, we selected cells for resistance to two MEK1 inhibitors: selumetinib and trametinib (Fig. 3A). We targeted exons 2, 3, and 6, which contain previously identified mutation hotspots (24). Because the mutagenesis range of HACE is long, we only needed to design one sgRNA per exon. We placed each
To validate top mutation candidates, we designed sgRNAs to introduce mutations individually into A375 cells using base editing. After 20 days of selection in the presence of drug, we evaluated allele frequencies of introduced mutations before and after selection by amplicon sequencing (Fig. 3D). We observed substantial postselection enrichment of G128D (sg383) and E203K (sg607-1 and sg607-2) with both inhibitors (sgRNAs in table S11). We could not introduce the 605G>A (G202E) mutation by base editing owing to the artificial linkage in base editing between G202E and E203K. We therefore further validated candidates that conferred resistance to trametinib using a luciferase serum response element (SRE) reporter of MAPK-ERK signaling activity and overexpression of candidate MEK1 mutants (Fig. 3E and table S7). All three mutations individually increased trametinib resistance [median inhibitory concentration (IC50) = 68.0, 46.1, and 46.1 nM for G128D, G202E, and E203K, respectively, versus 5.28 nM for wild type]. Structural analysis revealed that G128D is in the ligand-binding pocket. This mutation may function by inducing conformational changes of the binding pocket through steric interactions (Fig. 3F). On the other hand, E203K has been shown to cause constitutive MEK1 activation and downstream ERK phosphorylation, conferring gain of function (25, 26). It is possible that the proximal G202E induces resistance through a mechanism similar to that of E203K. Overall, this demonstrates that HACE can identify mutations conferring drug resistance while reducing confounding effects of artificial genetic linkage. Although previous studies have screened for MEK1 inhibitor resistance (3, 24), our approach is a mutagenesis-based resistance screen performed in the endogenous genome context.
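To put the reported IC50 values on a common scale, the resistance of each mutant can be expressed as a fold shift relative to wild type. The short sketch below uses only the four IC50 values quoted in the text; the dictionary layout and the function name `fold_shift` are illustrative assumptions.

```python
# IC50 values for trametinib in nM, as quoted in the text.
ic50_nm = {"WT": 5.28, "G128D": 68.0, "G202E": 46.1, "E203K": 46.1}


def fold_shift(ic50, reference="WT"):
    """Return each mutant's IC50 divided by the reference IC50."""
    ref = ic50[reference]
    return {name: value / ref for name, value in ic50.items() if name != reference}


shifts = fold_shift(ic50_nm)
for name, fold in shifts.items():
    print(f"{name}: {fold:.1f}-fold")
```

On these numbers, G128D corresponds to roughly a 13-fold shift and G202E and E203K to roughly a 9-fold shift each.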
HACE identifies SF3B1 variants that result in alternative 3′ branch point usage
Next, we applied HACE to explore the function of individual variants in SF3B1 for splicing regulation. RNA splicing factor mutations occur in various cancers and are especially prevalent in hematopoietic malignancies (27-29). The most frequently mutated splicing factor, SF3B1, is a member of the U2 small nuclear ribonucleoprotein complex that binds to the branch point nucleotide in the precatalytic spliceosome (30, 31). Pan-cancer analysis of SF3B1 mutations has identified hotspot mutations clustered within C-terminal HEAT repeat domains 4 to 8, which display an alternative 3′ss usage signature (32, 33) (Fig. 4A). This missplicing occurs through the recognition of a different branch point sequence during 3′ss selection and results in global splicing changes associated with tumorigenesis. However, few clinical mutations have been functionally validated for their effect on splicing, and, moreover, these clinical variants only represent a small subset of the functional space of SF3B1 variants. Distinguishing the mutations in SF3B1 that drive missplicing could improve understanding of splicing biology and potentially guide new diagnostic and therapeutic strategies.
To screen for mutations in SF3B1 that functionally lead to missplicing, we first sought to construct a minigene reporter that could distinguish between wild-type SF3B1 (SF3B1^WT) and mutated SF3B1 splicing patterns. These patterns display alternative 3′ss usage characteristic of hematopoietic malignancies. First, we compared RNA sequencing (RNA-seq) data from isogenic K562 cells containing either SF3B1^WT or mutant SF3B1 (SF3B1^K700E, a mutation known to induce the alternative 3′ss phenotype). We shortlisted splicing events that were differential between WT and mutant cells, and constructed minigene reporters from two of the top sequences to test their ability to functionally distinguish between SF3B1^WT- and SF3B1^K700E-induced missplicing. These reporters concatenated a constant upstream exon and a green fluorescent protein (GFP) reporter, followed by the last 150 bp of the endogenous intron and its downstream exon (Fig. 4B and table S8; methods). They were designed so that correct splicing would result in a frameshift in the open reading frame of GFP, suppressing fluorescence, while missplicing would permit GFP expression. Each minigene-GFP sequence was cloned into a vector with constitutive mCherry expression, enabling quantitative assessment of mutation-dependent protein production through flow cytometry (methods).
To validate these splicing reporters, we transfected each construct into isogenic SF3B1^WT or SF3B1^K700E K562 cells and measured mutant-dependent protein expression by flow cytometry. Both reporters demonstrated mutant-dependent specificity, showing elevated GFP expression in SF3B1^K700E cells compared with SF3B1^WT cells (fig. S5A). We confirmed that alternative 3′ss usage drives the reporter expression using targeted RNA-seq (fig. S7B). We selected the reporter with the highest mutant-dependent specificity, which was derived from dihydrolipoamide S-succinyltransferase (DLST) exon 6, for the SF3B1 variant screen (Fig. 4C).
We used HACE to diversify exons 13 to 17 of the SF3B1 gene in HEK293FT cells for 3 days. This region corresponds to HEAT repeat domains 4 to 8, which are enriched for known missplicing mutations. As in the MEK1 screen, we targeted these exons with seven sgRNAs complementary to flanking intronic sequences. To widen the genetic search space, we used HACE editors that incur both C:G>T:A and A:T>G:C mutations (AID*Δ-PcrA M6-UGI and TadA-8e-PcrA M6-UGI). We then transfected the minigene reporter into the edited pool of cells, sorted GFP^- and GFP^+ cells on the basis of the GFP:mCherry ratio, and performed high-throughput sequencing (Fig. 4D). Enriched mutations were identified by comparing the mutational frequencies between GFP^- and GFP^+ cells (Fig. 4E and table S9). We observed a high degree of replicate correlation for enriched variants between two independent biological replicates (Pearson's ρ = 0.795). Furthermore, nine of the most highly enriched variants in our assay (>10-fold enrichment) corresponded to clinically observed mutations (34) (fig. S7C).
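The enrichment comparison and replicate check described above can be sketched as follows. This is an illustration only: the variant names and frequencies are invented, the pseudocount is an assumption, and computing the Pearson correlation on log2 enrichments is a choice made for this sketch, since the text does not state which quantity was correlated.

```python
from math import log2, sqrt


def fold_enrichment(freq_pos, freq_neg, pseudo=1e-6):
    """Variant frequency in GFP+ cells over GFP- cells, with a small pseudocount."""
    return (freq_pos + pseudo) / (freq_neg + pseudo)


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical (GFP+, GFP-) variant frequencies for two biological replicates.
rep1 = {"var_A": (0.020, 0.001), "var_B": (0.004, 0.004), "var_C": (0.010, 0.002)}
rep2 = {"var_A": (0.018, 0.001), "var_B": (0.005, 0.004), "var_C": (0.012, 0.003)}

variants = sorted(rep1)
log_fc1 = [log2(fold_enrichment(*rep1[v])) for v in variants]
log_fc2 = [log2(fold_enrichment(*rep2[v])) for v in variants]

# Variants passing the >10-fold enrichment threshold in replicate 1.
enriched = [v for v in variants if fold_enrichment(*rep1[v]) > 10]
print(enriched, pearson(log_fc1, log_fc2))
```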
To validate our hits, we used conventional base editing to introduce the top candidates into HEK293FT cells coexpressing the minigene reporter. We then quantified the fold change in the GFP:mCherry ratio compared with unedited cells. Three mutations (I617V, Y623C, and K666E) led to a significant increase in reporter fold change (unpaired t test, P < 0.001 for all base editing groups compared with control; Fig. 4F and fig. S7D). The reporter fold change values were similar to those observed for K700E, a well-validated mutation affecting SF3B1 alternative 3′ss usage (35). We also performed targeted amplicon sequencing for each cell population to validate editing at each target base (fig. S7E). We note that two of the mutations, Y623C and K666E, have been observed in clinical datasets and are highly enriched in hematopoietic tumor samples (27, 36). K666E has been previously validated for its effect on alternative 3′ss usage (35, 37). An additional top-ranked mutation, I617V/M, was not previously observed in clinical datasets. We further validated these top candidate mutations by prime editing (Fig. 4G, fig. S7F, and table S12). Despite the low editing efficiency, we observed significantly increased splicing reporter fold changes for the mutations, relative to WT cells (unpaired t test, P < 0.01 for all prime editing groups compared with control). Overall, the editing rate for candidate mutations that affect SF3B1 alternative 3′ss usage across validation experiments correlates well with the minigene reporter fold change (fig. S7G).
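The reporter readout used in these validation experiments reduces to a ratio of ratios followed by a two-sample test. The sketch below shows one way to compute it. The replicate values are invented, and the pooled-variance (Student) form of the unpaired t test is an assumption, because the text does not say whether equal variances were assumed. Only the t statistic and degrees of freedom are computed here; converting them to a P value needs a t-distribution routine from a statistics library.

```python
from math import sqrt
from statistics import mean, variance


def fold_change(edited_ratios, unedited_ratios):
    """Mean GFP:mCherry ratio of edited cells relative to unedited cells."""
    return mean(edited_ratios) / mean(unedited_ratios)


def student_t(x, y):
    """Unpaired two-sample t statistic with pooled variance, plus degrees of freedom."""
    nx, ny = len(x), len(y)
    dof = nx + ny - 2
    pooled = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / dof
    t_stat = (mean(x) - mean(y)) / sqrt(pooled * (1 / nx + 1 / ny))
    return t_stat, dof


# Hypothetical GFP:mCherry ratios from three replicate wells per condition.
edited = [2.0, 2.2, 1.8]
unedited = [1.0, 1.1, 0.9]

print(fold_change(edited, unedited), student_t(edited, unedited))
```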
Protein structure modeling revealed that the top-ranked mutations are all located at the edge of the HEAT repeat helices of the SF3B1 protein structure, which matches the pattern of hotspot mutations in pan-cancer analyses (Fig. 4H). These mutations are located near the pre-mRNA-binding region and have been shown to disrupt the tertiary structure of the SF3B1 protein (38). Taken together, HACE has helped us to identify clinically relevant mutations that result in SF3B1-dependent missplicing.
Mutagenesis of noncoding regulatory elements uncovers functional bases and variants
We next explored the potential of HACE to parse gene regulatory elements. Resolving the individual bases and variants that underlie the activity of enhancers remains challenging, as these elements must ideally be examined in their native chromatin contexts. Although CRISPR base editors have been used to characterize enhancers (39-41), such approaches require separate sgRNAs for each narrow target window and may be limited by the occurrence of protospacer adjacent motif (PAM) sites (~30% of locations are targetable with NGG PAM sites). Furthermore, artificial linkage between variants due to correlated mutations in an editing window limits the ability of base editors to resolve the functions of individual bases (39).
We targeted HACE to an enhancer region that regulates CD69, a membrane-bound lectin receptor gene that contributes to immune cell tissue residency (42-45). We designed three sgRNAs targeting the core region of the CD69 enhancer (table S1) and infected K562 cells with these nCas9-sgRNAs and HE (AID*Δ-PcrA M6-UGI) constructs. After 6 days, the cells were stimulated with phorbol 12-myristate 13-acetate (PMA) and ionomycin to induce CD69 expression and then sorted on the basis of CD69 surface expression (methods). We assessed mutations by amplifying and sequencing the targeted region in CD69^low and CD69^high subsets (Fig. 5A and fig. S8A). The relative effect of each base edit was calculated according to its enrichment or depletion in CD69^high relative to CD69^low libraries (methods). Multiple individual bases reduced CD69 activation, with most of them located in motifs of immune-related transcription factors (Fig. 5B and table S10). The base enrichment pattern was highly consistent across biological replicates (Pearson's ρ = 0.845), confirming the robustness of our screen (Fig. 5C).
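The PAM constraint on conventional base editing mentioned in this section can be made concrete by enumerating NGG-adjacent protospacers on both strands of a sequence. The sketch below is a generic illustration: the 20-nt protospacer length is the usual SpCas9 convention, and the function names and test sequence are hypothetical. It is not intended to reproduce the ~30% figure quoted in the text.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def ngg_protospacers(seq, length=20):
    """List (strand, protospacer, PAM) for every NGG PAM with a full protospacer.

    Sequences for the "-" strand are reported 5'->3' on that strand.
    """
    seq = seq.upper()
    hits = []
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        # i is the position of the N in NGG; the protospacer lies just upstream.
        for i in range(length, len(s) - 2):
            if s[i + 1:i + 3] == "GG":
                hits.append((strand, s[i - length:i], s[i:i + 3]))
    return hits


example = "A" * 20 + "TGG"  # hypothetical sequence with one forward-strand site
print(ngg_protospacers(example))
```

A region with few hits from such a scan offers few base-editing windows, which is the limitation that a single long-range HACE guide is meant to avoid.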
[Figure panels: (D) sgRNA (4995); (G) pegRNA template for prime editor]
GGACCACA: WT
GGATTATA: 4995/6/8 C>T, loss of RUNX, gain of GATA
GGATCACA: 4995 C>T, loss of RUNX
GGACTACA: 4996 C>T, loss of RUNX
GGACCATA: 4998 C>T, loss of RUNX
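The pegRNA templates listed above can be checked mechanically by applying the named C>T substitutions to the WT sequence. In the sketch below, the coordinate of the first displayed base (4992) is an inference from the labels, chosen so that position 4995 falls on the fourth base; it is an assumption, not a value stated in the figure, and the function name is hypothetical.

```python
WT = "GGACCACA"
FIRST_POS = 4992  # assumed coordinate of the first base shown (inferred, not from the source)


def c_to_t(seq, positions, first_pos=FIRST_POS):
    """Apply C>T substitutions at the given coordinates and return the new sequence."""
    bases = list(seq)
    for pos in positions:
        i = pos - first_pos
        if bases[i] != "C":
            raise ValueError(f"position {pos} is {bases[i]}, not C")
        bases[i] = "T"
    return "".join(bases)


templates = {
    "4995 C>T": c_to_t(WT, [4995]),
    "4996 C>T": c_to_t(WT, [4996]),
    "4998 C>T": c_to_t(WT, [4998]),
    "4995/6/8 C>T": c_to_t(WT, [4995, 4996, 4998]),
}
for label, seq in templates.items():
    print(label, seq)
```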
The existing HACE system is also subject to additional limitations that should be addressed through further engineering. One goal is to increase the mutation rate and range by optimizing the modular components of the HACE system. For example, diverse helicases could be explored for improved efficiency, processivity, and fidelity, as well as faster kinetics. These efforts could leverage insights from the rational engineering of helicases for other applications (13, 51). Although HACE has a tunable editing rate and range, the substitution efficiency is also dependent on the targeted genomic locus. This might be a result of differences in Cas9 binding, helicase loading efficiency, target sequence, or chromatin context. Whereas we predominantly used the nCas9 D10A variant for its consistently high levels of editing, further engineering with Cas9 variants to increase the editing rate can lead to superior HACE constructs that would minimize indel generation. Further analysis of HACE across genomic contexts may uncover new insights on controlling editing rate and range to achieve tunable mutagenesis. This would potentially allow for helicases with adjustable mutation ranges, which could restrict mutations to relevant genomic windows (e.g., enhancers). Additionally, HACE is currently limited to transition mutation modes owing to the use of cytidine and adenosine deaminases but might be expanded by incorporating emerging base-editing enzymes (52-54). Finally, long-range sequencing technologies could accelerate investigations of sequence coevolution in HACE-mutagenized regions and thus provide new insights regarding concurrent or epistatic mutations.
Overall, HACE is a powerful tool for continuous, long-range, programmable diversification of endogenous mammalian genomes. We envision that HACE will substantially expand the functional genomics toolbox and unlock new molecular-scale insights toward building sequence-function maps of both coding and noncoding genomes.
Materials and methods
Plasmids and oligonucleotides
The guide sequences used for HACE mutagenesis (table S1) were cloned by Gibson assembly or Golden Gate assembly. The oligos used in this study for sequencing (table S2) were purchased from Integrated DNA Technologies (IDT) or Azenta/GENEWIZ. The Cas9 nickase plasmids were derived from plasmids pSpCas9(BB)-2A-GFP (Addgene 48138) and pCMV-PEmax-P2A-GFP (Addgene 180020). The dCas9 plasmid was generated by site-directed mutagenesis. Plasmids expressing sgRNAs and prime editing guide RNAs (pegRNAs) were cloned by Gibson assembly or Golden Gate assembly. HACE editor plasmids were cloned by Gibson assembly of polymerase chain reaction (PCR) products. Individual helicases and deaminases were either subcloned from existing plasmids or synthesized by Integrated DNA Technologies after mammalian codon optimization. pEGFPBLM was a gift from C. K.-L. Chan (Addgene plasmid #110299; http://n2t.net/addgene:110299; RRID:Addgene_110299). pCMV-Tag1-NS3 was a gift from X. Wang (Addgene plasmid #17645; http://n2t.net/addgene:17645; RRID:Addgene_17645). pET22B_SA_PcrA was a gift from M. Webb (Addgene plasmid #102999; http://n2t.net/addgene:102999; RRID:Addgene_102999). pSpCas9n(BB)-2A-GFP (PX461) was a gift from F. Zhang (Addgene plasmid #48140; http://n2t.net/addgene:48140; RRID:Addgene_48140). SpCas9 TadDE was a gift from D. Liu (Addgene plasmid #193837; http://n2t.net/addgene:193837; RRID:Addgene_193837). The helicases tested are summarized in table S3, and sequences of individual helicases tested are listed in table S4. All new plasmids generated during this study will be deposited on Addgene.
Mammalian cell culture
HEK293FT cells (Thermo Fisher, R70007) and A375 cells (ATCC, CRL-1619) were cultured in Dulbecco's Modified Eagle Medium with GlutaMAX (Thermo Fisher Scientific 10564011) supplemented with 10% (v/v) fetal bovine serum (FBS, Sigma-Aldrich F4135) and 1× penicillin-streptomycin (Thermo Fisher Scientific 15140122). Adherent cells were maintained at confluency below 80 to 90% at 37°C and 5% CO₂.
K562 cells (ATCC, CCL-243) were cultured in RPMI 1640 medium with GlutaMAX (Thermo Fisher, 61870036) supplemented with 10% (v/v) FBS and 1× penicillin-streptomycin. Suspended cells were maintained at a density below 1.5 × 10^6 cells/ml at 37°C and 5% CO₂. For stimulation experiments, 50 ng/ml PMA (Sigma-Aldrich, P8139) and 500 ng/ml ionomycin calcium salt from Streptomyces conglobatus (ionomycin, Sigma-Aldrich, I0634) were added to the cell culture media to stimulate the cells for 2 to 3 hours.
Transfection of HACE plasmid and genomic DNA preparation for HACE editing assays
The day before transfection, 10,000 HEK293FT cells were seeded per well on 96-well plates (Corning). Then, 16 to 24 hours after seeding, cells were transfected at ~70% confluency with 0.3 µl of TransIT-LT1 (Mirus Bio) according to the manufacturer's specifications. Each well was transfected with 40 ng of HACE editor plasmid and 40 ng of Cas9 variant plasmid, and 16 ng of sgRNA plasmid was delivered to each well unless otherwise specified. For control conditions, HACE editor plasmid and/or Cas9 variant plasmid were substituted with the same amount of pUC19 plasmid. Cells were cultured for 3 days after transfection. DNA was collected from transfected cells by removal of medium, resuspension in 50 µl of QuickExtract (Lucigen), and incubation at 65°C for 15 min, 68°C for 15 min, and 98°C for 10 min. After thermocycling, lysate was used directly in downstream PCR reactions as per the manufacturer's protocol.
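When many wells are transfected in parallel, the per-well amounts above are usually scaled into a master mix. The sketch below does that arithmetic using the per-well quantities from the text; the 10% pipetting overage, the dictionary keys, and the function name are assumptions added for illustration and are not part of the published protocol.

```python
# Per-well amounts taken from the text.
PER_WELL = {
    "HACE editor plasmid (ng)": 40,
    "Cas9 variant plasmid (ng)": 40,
    "sgRNA plasmid (ng)": 16,
    "TransIT-LT1 (ul)": 0.3,
}


def master_mix(n_wells, overage=0.10):
    """Scale per-well amounts to n_wells plus a fractional overage (assumed 10%)."""
    scale = n_wells * (1 + overage)
    return {name: round(amount * scale, 2) for name, amount in PER_WELL.items()}


# Total plasmid DNA per well, from the three plasmid entries.
dna_per_well_ng = sum(v for k, v in PER_WELL.items() if k.endswith("(ng)"))

print(dna_per_well_ng, master_mix(10))
```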
High-throughput DNA sequencing of genomic DNA samples
For short amplicon (~300 bp) sequencing for HEK293FT and A375 cells, the target region was amplified from genomic DNA samples using Phusion U Hot Start PCR master mix (Thermo Fisher Scientific, F562) in a 20-µl reaction. The following program was used: 98°C for 30 s; 28 cycles of 98°C for 10 s, 65°C for 30 s, 72°C for 30 s; 72°C for 2 min, then 4°C thereafter. Barcodes and adapters for Illumina sequencing were added in a subsequent PCR amplification using Q5 High-Fidelity Hot-Start Polymerase Master Mix (2×, New England Biolabs). Amplicons were pooled and prepared for sequencing on a NextSeq (Illumina) with paired-end reads (read 1, 160 bp; index 1, 8 bp; index 2, 8 bp; read 2, 160 bp). Reads were demultiplexed and analyzed with appropriate pipelines.
For long amplicon ( ∼2000bp\sim 2000 \mathrm{bp} ) sequencing, the targeted region was amplified using Phusion U Hot Start PCR master mix in a 20-mul20-\mu \mathrm{l} reaction. The following program was used: 98^(@)C98^{\circ} \mathrm{C} for 30 s ; 28 cycles of 98^(@)C98^{\circ} \mathrm{C} for 10s,65^(@)C10 \mathrm{~s}, 65^{\circ} \mathrm{C} for 30 s , 72^(@)C72^{\circ} \mathrm{C} for 2min;72^(@)C2 \mathrm{~min} ; 72^{\circ} \mathrm{C} for 5 min , then 4^(@)C4^{\circ} \mathrm{C} thereafter. PCR products were purified using Magnetic Ampure XP beads (Beckman Coulter) using a 1:1 bead solution:DNA solution ratio to select the PCR fragments. Purified PCR products were eluted in 20 mul20 \mu \mathrm{l} of water. The concentration of each sample was measured by Qubit (Thermo Fisher Scientific). The sequencing library was prepared following the Nextera XT Kit protocol (Illumina) using 1 ng of purified amplicon DNA per sample as starting material and half of the recommended amount of each kit reagent. Sequencing was performed on a NextSeq (Illumina) with paired-end reads (read 1, 100 bp ; index 1,8bp1,8 \mathrm{bp}; index 2,8bp2,8 \mathrm{bp}; read 2,100bp2,100 \mathrm{bp} ). 对于长扩增片段( ∼2000bp\sim 2000 \mathrm{bp} )测序,在 20-mul20-\mu \mathrm{l} 反应中使用 Phusion U Hot Start PCR 母液扩增目标区域。使用的程序如下: 98^(@)C98^{\circ} \mathrm{C} 30 秒;28 个循环: 98^(@)C98^{\circ} \mathrm{C} 为 10s,65^(@)C10 \mathrm{~s}, 65^{\circ} \mathrm{C} 30 秒, 72^(@)C72^{\circ} \mathrm{C} 为 2min;72^(@)C2 \mathrm{~min} ; 72^{\circ} \mathrm{C} 5 分钟,然后是 4^(@)C4^{\circ} \mathrm{C} 。使用 Magnetic Ampure XP beads(Beckman Coulter)纯化 PCR 产物,珠液与 DNA 溶液的比例为 1:1,以筛选 PCR 片段。纯化的 PCR 产物用 20 mul20 \mu \mathrm{l} 水洗脱。用 Qubit(赛默飞世尔科技公司)测量每个样品的浓度。测序文库是按照 Nextera XT 试剂盒协议(Illumina)制备的,每个样本使用 1 ng 纯化的扩增子 DNA 作为起始材料,每个试剂盒的试剂用量为建议用量的一半。测序在NextSeq(Illumina)上进行,采用成对末端读数(读数1,100 bp;索引 1,8bp1,8 \mathrm{bp} ;索引 2,8bp2,8 \mathrm{bp} ;读数 2,100bp2,100 \mathrm{bp} )。
Quantification of editing rate
Raw fastq reads obtained from sequencing were quality trimmed using BBduk (55) (BBMap v38.93) with the options “qtrim=rl trimq=28 maq=25”. Next, all bases with quality scores of <28 were masked to N using seqtk v1.3 (56). The filtered reads were aligned to the reference sequence using Bowtie2 (57) (version 2.3.4.3). The pileup at each base was calculated using a custom Python script.
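The masking rule applied after trimming can be illustrated in a few lines of Python. This is a minimal sketch of the rule only, not of seqtk itself; the function name `mask_low_quality` and the Phred+33 quality encoding are assumptions made for illustration.

```python
def mask_low_quality(seq: str, qual: str, min_q: int = 28, offset: int = 33) -> str:
    """Replace every base whose Phred score is below min_q with 'N'.

    seq and qual are the sequence and quality lines of one FASTQ record.
    offset=33 assumes the common Phred+33 encoding (an assumption here).
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    return "".join(
        base if ord(q) - offset >= min_q else "N"
        for base, q in zip(seq, qual)
    )


# 'I' encodes Q40, '#' encodes Q2, '=' encodes Q28 (kept, because 28 is not < 28).
masked = mask_low_quality("ACGT", "II#=")
```

Masking rather than discarding low-quality bases keeps read length and alignment coordinates intact while removing those bases from the pileup counts.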
To calculate the substitution rate, we filtered for base positions with a sequencing coverage of at least 10,000. Bases that had a >5% substitution rate in the control condition were masked because this indicated either a genuine variant or an artifact from sequence alignment. The average G>A editing rate was calculated by extracting all positions where “G” was the reference base, then taking the average of the per-base G>A editing rate. The editing rates for other base transition and transversion modes were calculated similarly. To calculate the number of mutations per contiguous read, the aligned reads were converted to a tab-delimited format using Sam2Tsv (https://lindenb.github.io/jvarkit/Sam2Tsv.html). Subsequently, the number of substitutions per read was tabulated and normalized by the total number of reads per sample.
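The filtering and averaging steps for the substitution rate can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not the authors' script: the pileup representation (one dictionary of base counts per position), the function names, and the choice to apply the coverage filter to the edited sample only are all assumptions.

```python
from typing import Dict, List, Optional

Pileup = Dict[str, int]  # base counts at one position, e.g. {"A": 100, "G": 9900}

MIN_COVERAGE = 10_000    # coverage threshold stated in the text
MAX_CONTROL_RATE = 0.05  # positions above this rate in the control are masked


def substitution_rate(counts: Pileup, ref: str) -> float:
    """Fraction of reads at one position that do not match the reference base."""
    depth = sum(counts.values())
    return (depth - counts.get(ref, 0)) / depth if depth else 0.0


def mean_edit_rate(
    reference: str,
    sample: List[Pileup],
    control: List[Pileup],
    ref_base: str = "G",
    alt_base: str = "A",
) -> Optional[float]:
    """Average per-base ref_base>alt_base rate over all retained positions."""
    rates = []
    for ref, s, c in zip(reference, sample, control):
        if ref != ref_base:
            continue  # only positions whose reference base is ref_base
        depth = sum(s.values())
        if depth < MIN_COVERAGE:
            continue  # insufficient coverage
        if substitution_rate(c, ref) > MAX_CONTROL_RATE:
            continue  # likely a variant or an alignment artifact
        rates.append(s.get(alt_base, 0) / depth)
    return sum(rates) / len(rates) if rates else None
```

Other substitution modes follow by changing `ref_base` and `alt_base`, for example `ref_base="C", alt_base="T"`.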
To calculate the local editing rate, the alignment coordinates were shifted so that the nick site was at base position 0. For every base position, the local G>A editing rate was calculated by extracting all the “G” bases within a 100-bp window (50 bp upstream and 50 bp downstream) and then taking the average of all per-base G>A editing rates. Code for quantifying the edit rate is available at https://github.com/chen-dawn/hace. Indel rates were quantified using CRISPResso2 (58).
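The sliding-window average can be sketched as below. This is a hedged illustration rather than the code in the repository cited above: the function name `local_edit_rate`, the dictionary input keyed by nick-relative position, and the inclusive treatment of both window edges are assumptions.

```python
from typing import Dict, Iterable, Optional


def local_edit_rate(
    g_rates: Dict[int, float],
    centers: Iterable[int],
    half_window: int = 50,
) -> Dict[int, Optional[float]]:
    """Mean G>A rate of all reference-G positions near each center.

    g_rates maps the nick-relative coordinate of every retained reference G
    to its per-base G>A rate. Window edges are inclusive (an assumption).
    Centers with no G in their window map to None.
    """
    out: Dict[int, Optional[float]] = {}
    for center in centers:
        window = [
            rate
            for pos, rate in g_rates.items()
            if center - half_window <= pos <= center + half_window
        ]
        out[center] = sum(window) / len(window) if window else None
    return out


# Toy input: four reference G positions relative to the nick at position 0.
example = local_edit_rate({-60: 0.10, -10: 0.02, 20: 0.04, 80: 0.00}, [-100, 0, 40, 200])
```

Averaging over neighboring G positions smooths the per-base noise, so the resulting profile shows how editing decays with distance from the nick.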
Long-term monitoring of mutation accumulation
A tetracycline-inducible promoter-controlled HE variant (AID-PcrA M6-UGI) was stably integrated into the HEK293FT cell genome. The sgRNA-nCas9 plasmid was transfected into the cells, and doxycycline (Dox, an analog of tetracycline) was added to the culture medium (10 ng/ml) to activate HE expression. Expression of the HE [blue fluorescent protein (BFP)] and sgRNA-nCas9 (GFP) was visually monitored daily. A second transfection of the sgRNA-nCas9 plasmid was performed on day 5. Genomic DNA (gDNA) samples from day 0 (before diversification) and days 2, 4, 6, 8, and 10 were collected, and the target locus was sequenced and analyzed as described above.
Cell viability measurements
Cell viability was measured using the CellTiterGlo Luminescent Cell Viability Assay (Promega) following the manufacturer’s protocols. Briefly, HEK293FT cells were seeded at a density of 10,000 cells per 100 µl per well in a 96-well plate in biological triplicates. The next day, cells were transfected with the respective HACE plasmids according to the above protocol. Cell viability was measured 72 hours after transfection.
Luminescence readings were performed using a SpectraMax M5 (Molecular Devices) plate reader.
Whole-exome sequencing and off-target analysis
The day before transfection, 50,000 HEK293FT cells were seeded in a 24-well plate. The next day, individual HE constructs were transfected together with an sgRNA targeting the MAP2K1 locus. Genomic DNA was extracted from cells 3 days after transfection using the Zymo Quick-DNA Miniprep Kit (Cat D3024). Amplicon sequencing was performed at the MAP2K1 locus to confirm HACE-dependent editing at the target locus in each condition. The whole-genome DNA sequencing library was prepared using the NEBNext Ultra II FS DNA Library Prep Kit for Illumina (Cat E7805S). Exome sequences were enriched using the xGen Exome Hybridization Panel (IDT 10005152) following the manufacturer’s protocols. Exome libraries were sequenced on a NovaSeq X (Illumina) with paired-end reads (read 1, 150 bp; index 1, 8 bp; index 2, 8 bp; read 2, 150 bp) at a minimum of 100 million reads per sample.
The sequencing output was demultiplexed using bcl2fastq, and the paired-end reads were aligned to the reference genome hg38 using HISAT2 v2.2.1 (59). Aligned reads from each replicate were subsampled using reformat.sh (BBMap v38.93), and 100 million aggregated reads per replicate for each condition were used for further analysis. The HEK293FT-specific single-nucleotide polymorphisms (SNPs) were determined following the GATK4 variant calling workflow for germline short variant discovery (https://gatk.broadinstitute.org/hc/en-us/articles/360035535932-Germline-short-variant-discovery-SNPs-Indels) on WT HEK293FT exome libraries (>50× coverage). In brief, the aligned reads were deduplicated using Picard v2.27.5. HaplotypeCaller (GATK4) was used for calling variants, and known variants in dbSNP version 138 were used during base-quality recalibration. The chromosomal coordinates where SNPs were detected were excluded from subsequent analysis.
To quantify the per-base editing rate of the exome, the base pileup at each base was calculated using samtools mpileup (v1.15.1), followed by postprocessing using mpileup2readcounts (https://github.com/IARCbioinfo/mpileup2readcounts). Bases with <50 total read depth were excluded from subsequent analysis. The genome was binned into 100-kb bins using bcftools v1.15.1. The off-target C>T editing rates for each genomic bin were obtained using a custom R script by counting the number of C and T bases in each bin. Fisher’s exact test was used to quantify significant changes in editing for each bin relative to cells transfected with only nCas9, using the false discovery rate correction to adjust for multiple hypothesis testing. Significant off-target sites are listed in table S5.
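The per-bin statistical comparison can be illustrated with a standard-library Python sketch. The original analysis used a custom R script that is not shown here, so everything below is an assumption for illustration: the function names, the 2x2 table layout (T and C counts in the edited sample versus the nCas9-only control), and the use of the Benjamini-Hochberg procedure as the false discovery rate correction.

```python
from math import exp, lgamma
from typing import Dict, List, Tuple


def bin_start(position: int, bin_size: int = 100_000) -> int:
    """Start coordinate of the fixed-width bin containing position."""
    return (position // bin_size) * bin_size


def _log_choose(n: int, k: int) -> float:
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def log_pmf(x: int) -> float:
        return _log_choose(row1, x) + _log_choose(row2, col1 - x) - _log_choose(n, col1)

    lo, hi = max(0, col1 - row2), min(row1, col1)
    observed = log_pmf(a)
    # Sum the probabilities of all tables at least as unlikely as the observed one.
    p = sum(exp(lp) for lp in map(log_pmf, range(lo, hi + 1)) if lp <= observed + 1e-7)
    return min(1.0, p)


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def bin_q_values(counts: Dict[str, Tuple[int, int, int, int]]) -> Dict[str, float]:
    """counts maps a bin label to (T_edited, C_edited, T_control, C_control)."""
    labels = list(counts)
    pvals = [fisher_exact_two_sided(*counts[label]) for label in labels]
    return dict(zip(labels, benjamini_hochberg(pvals)))
```

The exact test enumerates every possible table, which is acceptable for a sketch but slow for bins with very deep coverage; a production analysis would normally rely on an established statistics library.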
MEK1 inhibitor-resistance screen
A375 cells were diversified for 3 days by transfection of HE variant AID-PcrA M6-UGI, nCas9 D10A, and sgRNAs targeting exons 2, 3, and 6 of the MEK1 gene using TransIT-2020 (Mirus Bio). Approximately 5 million cells in a 15-cm dish were placed under selection with either 100 nM selumetinib or 5 nM trametinib for 20 days. A portion of preselection cells was harvested as a control. Cells were passaged every 3 days to ensure that they were maintained at <70% confluency. After selection, cells were harvested, and genomic DNA was extracted using QuickExtract (Lucigen). The MEK1 exons were amplified with exon-specific primers (table S2) using Phusion U Hot Start Master Mix. Concurrently, we also harvested RNA from selected cells using the Qiagen RNeasy Mini Plus Kit (Cat 74134). The cDNA was generated by reverse transcription using Maxima H Minus Reverse Transcriptase (Thermo Fisher). Sequencing libraries for cDNA were generated using the modified Nextera XT Kit protocol described in the “High-throughput DNA sequencing of genomic DNA samples” section above. All libraries were sequenced on a NextSeq (Illumina) with paired-end reads (read 1, 160 bp; index 1, 8 bp; index 2, 8 bp; read 2, 160 bp). The substitution rate (allele frequency) for each base of the MEK1 sequence was calculated for both pre- and postselection samples. Significant mutations were identified by comparing the base counts between pre- and postselection samples using a Fisher’s exact test (table S6). We compared the substitution rate between RNA and DNA samples and found that they had a high correlation.
SRE reporter assay
pEF1a-MEK1 wild type, pEF1a-MEK1G128D, pEF1a-MEK1G202E, and pEF1a-MEK1E203K were generated using Gibson assembly. Sequences of MEK1-derived constructs are available in table S7. The SRE reporter assay was performed using the SRE reporter kit (BPS Bioscience) according to the manufacturer’s protocols. In brief, ~10,000 HEK293FT cells in 100 µl of growth medium were seeded in 96-well white opaque assay plates. The cells were transfected with 60 ng of reporter plasmid and 40 ng of the respective MEK1 plasmids. The culture medium was replaced 6 hours after transfection with 50 µl of trametinib-containing medium with 0.5% FBS. After 12 hours, the cells were washed and incubated with 50 µl of 0.5% FBS-containing culture medium supplemented with recombinant human epidermal growth factor protein (Life Technologies) at a final concentration of 10 ng/ml. After 6 hours of incubation, the reporter activity was assayed using a dual luciferase (Firefly-Renilla) assay system (BPS Bioscience) according to the manufacturer’s instructions using a SpectraMax M5 (Molecular Devices) plate reader. The ratio between Firefly luminescence and Renilla luminescence intensity was calculated for each well after background subtraction.
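The per-well normalization amounts to a ratio of background-subtracted signals. A minimal sketch follows; the function name and the use of a separate background value per channel (for example, from cell-free wells) are assumptions, because the text does not specify how background was measured.

```python
def firefly_renilla_ratio(
    firefly: float,
    renilla: float,
    firefly_background: float,
    renilla_background: float,
) -> float:
    """Background-subtracted Firefly:Renilla luminescence ratio for one well."""
    renilla_signal = renilla - renilla_background
    if renilla_signal <= 0:
        raise ValueError("Renilla signal does not exceed background")
    return (firefly - firefly_background) / renilla_signal


# Toy readings in arbitrary luminescence units.
ratio = firefly_renilla_ratio(1200.0, 600.0, 200.0, 100.0)
```

Dividing by the Renilla signal corrects each well for differences in cell number and transfection efficiency.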
Design of SF3B1 splicing minigene reporter
The minigene reporter to probe SF3B1 function was constructed by Gibson assembly of a synthetic minigene sequence (synthesized by Twist Biosciences) into a custom bicistronic mCherry/GFP reporter plasmid. To construct the minigene reporter, we fused the VCP exon 10 sequence and 150 bp of its immediate downstream intron with DLST exon 6 and 97 bp of its immediate upstream intron. We also appended an “ATG” start codon at the beginning of the sequence. The open reading frame was adjusted such that correct splicing in WT cells results in premature termination before the GFP. In contrast, the alternative 3’ss usage in SF3B1 mutant cells results in full-length GFP expression. The minigene reporter sequences are annotated in table S8.
SF3B1 missplicing screen
HEK293FT cells were diversified for 3 days by cotransfection of HE variants AID-PcrA M6-UGI and TadA-PcrA M6-UGI, nCas9 D10A, sgRNAs targeting SF3B1 exons 13 to 17, and the splicing minigene reporter. Cells transfected with only the minigene reporter were used as the undiversified control. The experiment was performed in triplicate, with ~10 million cells transfected per replicate. After diversification, cells were prepared for flow sorting by washing and resuspending in 1× phosphate-buffered saline (PBS) with 2% bovine serum albumin. Cells were sorted using a SONY MA900 sorter, where mCherry-positive cells were sorted into GFP⁻ and GFP⁺ bins. At least 1 million cells were collected for each cell population. After flow sorting, the RNA of the cells was extracted using the Qiagen RNeasy Mini Plus Kit. The cDNA was generated by reverse transcription using Maxima H Minus Reverse Transcriptase. Sequencing libraries for cDNA were generated using the modified Nextera XT Kit protocol and sequenced on a NextSeq. Fold enrichment was calculated by dividing the substitution rate in GFP⁺ samples by that in GFP⁻ samples. The significant mutations were identified using a Fisher’s exact test and are shown in table S9. The clinical mutations that are observed in SF3B1 were retrieved from COSMIC (34). A mutation was considered high frequency if there were at least three observations in the dataset.
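The fold-enrichment calculation can be written out explicitly. The sketch below assumes per-position read counts as input and returns `None` when no substitutions were observed in the GFP⁻ bin; the function name and that zero-denominator handling are illustrative assumptions, because the text does not describe how such positions were treated.

```python
from typing import Optional


def fold_enrichment(
    alt_gfp_pos: int,
    depth_gfp_pos: int,
    alt_gfp_neg: int,
    depth_gfp_neg: int,
) -> Optional[float]:
    """Substitution rate in the GFP+ bin divided by the rate in the GFP- bin."""
    if depth_gfp_pos <= 0 or depth_gfp_neg <= 0:
        raise ValueError("read depth must be positive in both bins")
    rate_pos = alt_gfp_pos / depth_gfp_pos
    rate_neg = alt_gfp_neg / depth_gfp_neg
    if rate_neg == 0:
        return None  # undefined ratio; handling here is an assumption
    return rate_pos / rate_neg


# Toy counts: 2% substitution in GFP+ versus 0.5% in GFP-.
enrichment = fold_enrichment(40, 2000, 10, 2000)
```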
CD69 enhancer tiling for functional bases
K562 cells were nucleofected with 2.5 µg of HE and 2.5 µg of nCas9 and sgRNA plasmids using the SF Cell Line 4D-Nucleofector X Kit L (Lonza V4XC-2024), following the manufacturer’s protocol. Each plasmid contained a fluorescent protein reporter (sgRNA mCherry, nCas9 GFP, HE BFP). Approximately 1.5 × 10⁶ to 2 × 10⁶ cells were used per nucleofection reaction. After 24 hours, cells were sorted using either a SONY SH800 or BD Aria flow cytometry sorter to isolate cells expressing all plasmid components.
On day 7 after nucleofection, cells were stimulated with PMA/ionomycin for 2 to 3 hours. The cells were stained with the antibody cocktail in a staining buffer consisting of a 1:1 mix of PBS and Brilliant Stain Buffer (BD 566349) at room temperature for 20 min or at 4°C for 30 min. The following antibodies and dyes from BioLegend were used: Brilliant Violet 510 anti-human CD69 Antibody (310936), APC anti-human CD69 Antibody (310910), and Zombie NIR Fixable Viability Kit (423106). Cells were washed once in PBS with 1% FBS and then resuspended in the same buffer to prepare for flow sorting. Subsequently, the top 40% of cells showing high CD69 expression (CD69^high) and the bottom 20% with low CD69 expression (CD69^low) were sorted using the SONY SH800 flow cytometer. A minimum of 100,000 cells were collected per tube. Genomic DNA was then isolated from these cells either by using the QIAGEN DNA micro isolation kit (Cat #56304) or by lysis buffer [0.5% Triton X-100, 0.1 AU/ml QIAGEN Protease (Cat 19157) in H₂O]. The lysis process involved incubation at 56°C for 20 min and at 72°C for 20 min at 600 rpm on a thermo shaker. Amplicon PCR for the genomic DNA was performed using the KAPA HiFi HotStart ReadyMix PCR Kit (Roche, KR0370). The following program was used: 95°C for 5 min; 30 cycles of 95°C for 30 s, 60°C for 30 s, 72°C for 30 s; 72°C for 5 min; then held at 4°C. The amplicon libraries were sequenced on a NextSeq.
To identify enriched bases, the %C>T or %G>A was first calculated for both the CD69^high and CD69^low groups (%high or %low, respectively). Then the log₂ odds ratio of CD69^high versus CD69^low was calculated as $\log_2 \mathrm{OR} = \log_2\{[\%\mathrm{high}/(1-\%\mathrm{high})]/[\%\mathrm{low}/(1-\%\mathrm{low})]\}$. The correlation of technical replicates was plotted using GraphPad Prism 10.0. The top hits are recorded in table S10.
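The odds-ratio formula can be checked numerically with a short function. In this sketch the editing rates are passed as fractions between 0 and 1 rather than percentages, and rates of exactly 0 or 1 are rejected because the odds are undefined there; both choices, and the function name, are assumptions for illustration.

```python
from math import log2


def log2_odds_ratio(frac_high: float, frac_low: float) -> float:
    """log2 of the odds of editing in CD69-high over the odds in CD69-low."""
    for value in (frac_high, frac_low):
        if not 0.0 < value < 1.0:
            raise ValueError("editing fractions must lie strictly between 0 and 1")
    odds_high = frac_high / (1.0 - frac_high)
    odds_low = frac_low / (1.0 - frac_low)
    return log2(odds_high / odds_low)


# Odds of 1 (50% edited) versus odds of 0.25 (20% edited) give a ratio of 4.
example_lor = log2_odds_ratio(0.5, 0.2)
```

A positive value indicates that the substitution is more frequent in CD69^high cells, and a negative value indicates that it is more frequent in CD69^low cells.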
Base editing validation
The following base editors were used in this study: pRDA_478 (Addgene 179096), pRDA_479 (Addgene 179099), pCAG-CBE4max-SpG-P2A-EGFP (Addgene: RTW4552/139998), pCAG-CBE4max-SpRY-P2A-EGFP (Addgene: RTW5133/139999), pCMV-T7-ABE8.20m-nSpCas9-NG-P2A-EGFP (Addgene: KAC1164/185919), and pCMV-T7-ABE8.20m-nSpRY-P2A-EGFP (Addgene: KAC1335/185917). The validation sgRNAs are listed in table S11. The sgRNA sequences were cloned into pCMV-BFP-U6-sgRNA (Addgene: 196725, gift from B. Schmierer) or directly into pRDA_478 or pRDA_479. Mutations for sgRNAs and bystander editing rates were quantified using CRISPResso2 (58).
To validate MEK1 variants, 67 ng of cytidine base editor and 33 ng of sgRNA plasmids were transfected into A375 cells per well in a 96-well format. A nontargeting sgRNA was used as a control. After diversification for 3 days, the cells were selected with either 100 nM selumetinib or 5 nM trametinib for 14 days. The pre- and postselection substitution rates were analyzed by amplicon sequencing. All experiments were conducted in triplicate.
For validation of SF3B1 variants, HEK293FT cells in a 96-well format were transfected with 67 ng of the base editor-sgRNA plasmid and 33 ng of the minigene splicing reporter per well. A nontargeting sgRNA was used as a control. Cells were diversified for 3 days, then the GFP:mCherry ratio in each well was quantified by confocal microscopy using a custom cell segmentation and quantification pipeline. Briefly, individual cells were segmented by watershed segmentation using the mCherry channel. For each segmented cell, the total pixel area and mean intensity of the pixels were computed for the GFP (488 nm) and mCherry (561 nm) channels to obtain a “pseudo-flow cytometry” dataset. The fluorescence background for each channel was subtracted from all conditions in that channel, and aggregated values for each condition were divided by area to obtain average fluorescence intensity. Standard deviation was computed from the average values of three technical transfection replicates. Editing at each sgRNA was quantified by amplicon sequencing of genomic DNA samples from each well. All experiments were conducted in triplicate.
For validation of CD69 variants, 2 µg of the base editor plasmid and 2 µg of the sgRNA plasmid were nucleofected into 1.5 × 10⁶ K562 cells using the SF Cell Line 4D-Nucleofector X Kit L according to the manufacturer’s protocol. At 24 hours after nucleofection, the cells coexpressing base editor and sgRNA were sorted on the basis of reporter expression. On day 4 after nucleofection, cells were stimulated with PMA/ionomycin for 2 to 3 hours, and the top 40% of CD69 high expression cells and bottom 20% of CD69 low expression cells were sorted using a Sony SH800 flow cytometer, collecting at least 10,000 cells per tube. Genomic DNA was isolated from the sorted cells using the QIAGEN DNA Micro Kit (Cat# 56304) and prepared for amplicon sequencing using the protocol described in the “High-throughput DNA sequencing of genomic DNA samples” section. The substitution rate at each locus was quantified using CRISPResso2.
Prime editing validation
The following prime editor plasmids were used in this study: pCMV-PEmax-P2A-hMLH1dn (Addgene: 174828), pCMV-PEmax-P2A-GFP (Addgene: 180020), and pEF1a-hMLH1dn (Addgene: 174824). Desired pegRNA and nickase sgRNA sequences were designed using PrimeDesign (60). The engineered pegRNA (epegRNA) overhang was designed using pegLIT (61). Sequences of pegRNAs are shown in table S12.
For prime editing validation of SF3B1 variants, HEK293FT cells in a 96-well format were transfected with 150 ng of PEmax, 50 ng of epegRNA, 25 ng of nicking sgRNA, and 50 ng of minigene splicing reporter using 0.5 µl of TransIT-LT1 per well. Cells were diversified for 3 days, then the GFP:mCherry ratio in each well was quantified by confocal microscopy as described above. Editing at each sgRNA was quantified by amplicon sequencing of genomic DNA samples from each well using CRISPResso2.
For prime editing validation of CD69 enhancer variants in K562 cells, 2 µg of the prime editor plasmid, 1 µg of hMLH1dn plasmid, 1 µg of epegRNA plasmid, and 0.5 µg of nickase sgRNA plasmid were nucleofected into 1.5 × 10⁶ cells using the SF Cell Line 4D-Nucleofector X Kit L according to the manufacturer’s protocols. After 24 hours, the cells that were positive for both prime editor and epegRNA were sorted according to the GFP and mCherry reporters and cultured in regular complete RPMI media. A second round of nucleofection and sorting was performed 4 days after transfection to increase prime editing efficiency. On day 5 after the second nucleofection, CD69 expression levels were quantified by flow cytometry. Genomic DNA was harvested from CD69^high (top 40%) and CD69^low (bottom 20%) cells, and the editing efficiency was quantified by performing amplicon sequencing using the protocols described above. The substitution rate at each locus was quantified using CRISPResso2.
REFERENCES AND NOTES
D. M. Fowler, S. Fields, Deep mutational scanning: A new style of protein science. Nat. Methods 11, 801-807 (2014). doi: 10.1038/nmeth.3027; pmid: 25075907
J. G. English et al., VEGAS as a platform for facile directed evolution in mammalian cells. Cell 178, 748-761.e17 (2019). doi: 10.1016/j.cell.2019.05.051; pmid: 31280962
H. Chen et al., Efficient, continuous mutagenesis in human cells using a pseudo-random DNA editor. Nat. Biotechnol. 38, 165-168 (2020). doi: 10.1038/s41587-019-0331-8; pmid: 31844291
A. Cravens, OK, Jamil, D. Kong, JT Sockolosky, CD Smolke, 聚合酶引导的碱基编辑可实现体内诱变和快速蛋白质工程。Nat. Commun.12, 1579 (2021).doi: 10.1038/s41467-021-21876-z;PMID:33707425 A.Cravens, OK, Jamil, D. Kong, JT Sockolosky, CD Smolke, 聚合酶引导的碱基编辑可实现体内诱变和快速蛋白质工程。Nat.Commun.12,1579 (2021).doi: 10.1038/s41467-021-21876-z;PMID:33707425
R. Cuella-Martin 等人,使用碱基编辑筛选对 DNA 损伤反应变体进行功能询问。单元格 184, 1081-1097 e19 (2021)。doi: 10.1016/j.cell.2021.01.041;PMID:33606978 R.Cuella-Martin 等人,使用碱基编辑筛选对 DNA 损伤反应变体进行功能询问。单元格 184, 1081-1097 e19 (2021)。doi: 10.1016/j.cell.2021.01.041;PMID:33606978
R. E. Hanna et al., Massively parallel assessment of human variants with base editor screens. Cell 184, 1064-1080.e20 (2021). doi: 10.1016/j.cell.2021.01.012; pmid: 33606977
M. Gavrilov et al., Engineered helicase replaces thermocycler in DNA amplification while retaining desired PCR characteristics. Nat. Commun. 13, 6312 (2022). doi: 10.1038/s41467-022-34076-0; pmid: 36274095
A. C. Komor, Y. B. Kim, M. S. Packer, J. A. Zuris, D. R. Liu, Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage. Nature 533, 420-424 (2016). doi: 10.1038/nature17946; pmid: 27096365
M. S. Dillingham, P. Soultanas, P. Wiley, M. R. Webb, D. B. Wigley, Defining the roles of individual residues in the single-stranded DNA binding site of PcrA helicase. Proc. Natl. Acad. Sci. U.S.A. 98, 8381-8387 (2001). doi: 10.1073/pnas.131009598; pmid: 11459979
J. Park et al., PcrA helicase dismantles RecA filaments by reeling in DNA in uniform steps. Cell 142, 544-555 (2010). doi: 10.1016/j.cell.2010.07.016; pmid: 20723756
T. A. Kunkel, Considering the cancer consequences of altered DNA polymerase function. Cancer Cell 3, 105-110 (2003). doi: 10.1016/S1535-6108(03)00027-8; pmid: 12620405
L. W. Koblan et al., Improving cytidine and adenine base editors by expression optimization and ancestral reconstruction. Nat. Biotechnol. 36, 843-846 (2018). doi: 10.1038/nbt.4172; pmid: 29813047
M. F. Richter et al., Phage-assisted evolution of an adenine base editor with improved Cas domain compatibility and activity. Nat. Biotechnol. 38, 883-891 (2020). doi: 10.1038/s41587-020-0453-z; pmid: 32433547
M. E. Neugebauer et al., Evolution of an adenine base editor into a small, efficient cytosine base editor with low off-target activity. Nat. Biotechnol. 41, 673-685 (2023). doi: 10.1038/s41587-022-01533-6; pmid: 36357719
M. Gu, C. M. Rice, Three conformational snapshots of the hepatitis C virus NS3 helicase reveal a ratchet translocation mechanism. Proc. Natl. Acad. Sci. U.S.A. 107, 521-528 (2010). doi: 10.1073/pnas.0913380107; pmid: 20080715
J. L. Kim et al., Hepatitis C virus NS3 RNA helicase domain with a bound oligonucleotide: The crystal structure provides insights into the mode of unwinding. Structure 6, 89-100 (1998). doi: 10.1016/S0969-2126(98)00010-0; pmid: 9493270
C. Yan, R. Wan, R. Bai, G. Huang, Y. Shi, Structure of a yeast activated spliceosome at 3.5 Å resolution. Science 353, 904-911 (2016). doi: 10.1126/science.aag0291; pmid: 27445306
R. B. Darman et al., Cancer-associated SF3B1 hotspot mutations induce cryptic 3′ splice site selection through use of a different branch point. Cell Rep. 13, 1033-1045 (2015). doi: 10.1016/j.celrep.2015.09.053; pmid: 26565915
M. Seiler et al., Somatic mutational landscape of splicing factor genes and their functional consequences across 33 cancer types. Cell Rep. 23, 282-296.e4 (2018). doi: 10.1016/j.celrep.2018.01.088; pmid: 29617667
J. G. Tate et al., COSMIC: The Catalogue Of Somatic Mutations In Cancer. Nucleic Acids Res. 47, D941-D947 (2019). doi: 10.1093/nar/gky1015; pmid: 30371878
D. Inoue et al., Spliceosomal disruption of the non-canonical BAF complex in cancer. Nature 574, 432-436 (2019). doi: 10.1038/s41586-019-1646-9; pmid: 31597964
H. Makishima et al., Mutations in the spliceosome machinery, a novel and ubiquitous pathway in leukemogenesis. Blood 119, 3203-3210 (2012). doi: 10.1182/blood-2011-12-399774; pmid: 22323480
K. North et al., Synthetic introns enable splicing factor mutation-dependent targeting of cancer cells. Nat. Biotechnol. 40, 1103-1113 (2022). doi: 10.1038/s41587-022-01224-2; pmid: 35241838
C. Cretu et al., Molecular architecture of SF3b and structural consequences of its cancer-related mutations. Mol. Cell 64, 307-319 (2016). doi: 10.1016/j.molcel.2016.08.036; pmid: 27720643
Z. Chen et al., Integrative dissection of gene regulatory elements at base resolution. Cell Genomics 3, 100318 (2023). doi: 10.1016/j.xgen.2023.100318; pmid: 37388913
J. D. Martin-Rufino et al., Massively parallel base editing to map variant effects in human hematopoiesis. Cell 186, 2456-2474.e24 (2023). doi: 10.1016/j.cell.2023.03.035; pmid: 37137305
J. A. Morris et al., Discovery of target genes and pathways at GWAS loci by pooled single-cell CRISPR screens. Science 380, eadh7699 (2023). doi: 10.1126/science.adh7699; pmid: 37141313
D. Cibrián, F. Sánchez-Madrid, CD69: From activation marker to metabolic gatekeeper. Eur. J. Immunol. 47, 946-953 (2017). doi: 10.1002/eji.201646837; pmid: 28475283
D. Sancho, M. Gómez, F. Sánchez-Madrid, CD69 is an immunoregulatory molecule induced following activation. Trends Immunol. 26, 136-140 (2005). doi: 10.1016/j.it.2004.12.006; pmid: 15745855
T. Sathaliyawala et al., Distribution and compartmentalization of human circulating and tissue-resident memory T cell subsets. Immunity 38, 187-197 (2013). doi: 10.1016/j.immuni.2012.09.020; pmid: 23260195
J. M. Schenkel, D. Masopust, Tissue-resident memory T cells. Immunity 41, 886-897 (2014). doi: 10.1016/j.immuni.2014.12.007; pmid: 25526304
T. Laguna et al., New insights on the transcriptional regulation of CD69 gene through a potent enhancer located in the conserved non-coding sequence 2. Mol. Immunol. 66, 171-179 (2015). doi: 10.1016/j.molimm.2015.02.031; pmid: 25801305
S. Wahlen et al., The transcription factor RUNX2 drives the generation of human NK cells and promotes tissue residency. eLife 11, e80320 (2022). doi: 10.7554/eLife.80320; pmid: 35793229
H. A. Rees, D. R. Liu, Base editing: Precision chemistry on the genome and transcriptome of living cells. Nat. Rev. Genet. 19, 770-788 (2018). doi: 10.1038/s41576-018-0059-1; pmid: 30323312
S. J. Hendel, M. D. Shoulders, Directed evolution in mammalian cells. Nat. Methods 18, 346-357 (2021). doi: 10.1038/s41592-021-01090-x; pmid: 33828274
R. S. Molina et al., In vivo hypermutation and continuous evolution. Nat. Rev. Methods Primers 2, 36 (2022). doi: 10.1038/s43586-022-00119-5; pmid: 37073402
S. Arslan, R. Khafizov, C. D. Thomas, Y. R. Chemla, T. Ha, Engineering of a superhelicase through conformational control. Science 348, 344-347 (2015). doi: 10.1126/science.aaa0445; pmid: 25883358
D. Zhao et al., Glycosylase base editors enable C-to-A and C-to-G base changes. Nat. Biotechnol. 39, 35-40 (2021). doi: 10.1038/s41587-020-0592-2; pmid: 32690970
I. C. Kurt et al., CRISPR C-to-G base editors for inducing targeted DNA transversions in human cells. Nat. Biotechnol. 39, 41-46 (2021). doi: 10.1038/s41587-020-0609-x; pmid: 32690971
H. Tong et al., Programmable A-to-Y base editing by fusing an adenine base editor with an N-methylpurine DNA glycosylase. Nat. Biotechnol. 41, 1080-1084 (2023). doi: 10.1038/s41587-022-01595-6; pmid: 36624150
55. B. Bushnell, “BBMap: A fast, accurate, splice-aware aligner,” LBNL-7065E (Lawrence Berkeley National Lab, 2014); https://www.osti.gov/biblio/1241166.
56. H. Li, seqtk: Toolkit for processing sequences in FASTA/Q formats, GitHub (2023); https://github.com/lh3/seqtk.
57. B. Langmead, S. L. Salzberg, Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357-359 (2012). doi: 10.1038/nmeth.1923; pmid: 22388286
58. K. Clement et al., CRISPResso2 provides accurate and rapid genome editing sequence analysis. Nat. Biotechnol. 37, 224-226 (2019). doi: 10.1038/s41587-019-0032-3; pmid: 30809026
59. D. Kim, J. M. Paggi, C. Park, C. Bennett, S. L. Salzberg, Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat. Biotechnol. 37, 907-915 (2019). doi: 10.1038/s41587-019-0201-4; pmid: 31375807
60. J. Y. Hsu et al., PrimeDesign software for rapid and simplified design of prime editing guide RNAs. Nat. Commun. 12, 1034 (2021). doi: 10.1038/s41467-021-21337-7; pmid: 33589617
61. J. W. Nelson et al., Engineered pegRNAs improve prime editing efficiency. Nat. Biotechnol. 40, 402-410 (2022). doi: 10.1038/s41587-021-01039-7; pmid: 34608327
62. D. Chen, Helicase-assisted continuous editing for programmable mutagenesis of endogenous genomes, Zenodo (2024); https://doi.org/10.5281/zenodo.11436318.
¹Gene Regulation Observatory, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA. ²Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, MA 02138, USA. ³Systems, Synthetic, and Quantitative Biology PhD Program, Harvard University, Cambridge, MA 02138, USA. ⁴Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA 02215, USA. ⁵Department of Cell Biology and Pathology, Harvard Medical School, Boston, MA 02115, USA. ⁶Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA 02138, USA. ⁷Cancer Program, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA. ⁸Cecil H. and Ida Green Center for Reproductive Biology Sciences, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA. ⁹Department of Obstetrics and Gynecology, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA. ¹⁰The Novo Nordisk Foundation Center for Genomic Mechanisms of Disease, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA.
*Corresponding author. Email: chenf@broadinstitute.org (F.C.); bradley_bernstein@dfci.harvard.edu (B.E.B.)
†These authors contributed equally to this work.