A High-Resolution 20K SNP Array for Comprehensive Genotyping and Genetic Mapping in Nicotiana tabacum L.

Shizhou Yu1#*, Zhixiao Yang1#, Jie Zhang1, Linggai Cao1, Jie Liu1, Peng Lu2,3, Jiemeng Tao2,3, Jufen Wan1,4, Qingdong Zeng4, Tenghang Xu1,5, Peijian Cao2,3, Jingjing Jin2,3*, Xueliang Ren1*

1 Molecular Genetics Key Laboratory of China Tobacco, Guizhou Academy of Tobacco Science, Guiyang 550081, China

2 Beijing Life Science Academy, Beijing 102200, China

3 China Tobacco Gene Research Center, Zhengzhou Tobacco Research Institute of CNTC, Zhengzhou 450001, China

4 State Key Laboratory for Crop Stress Resistance and High-Efficiency Production, Northwest A&F University, Yangling, 712100, China

5 College of Agronomy and Biotechnology, Hebei Normal University of Science and Technology, Qinhuangdao 066004, China

#These authors contributed equally to this work.

*Correspondence

Email:

Shizhou Yu, yusz@nwafu.edu.cn

Zhixiao Yang, linyingxian2006@126.com

Jie Zhang, zhechen361023@163.com

Linggai Cao, caolinggai@126.com

Jie Liu, jackdnew@163.com

Peng Lu, penglu2004@hotmail.com

Jiemeng Tao, taojiemeng_zz@163.com

Jufen Wan, 2236202754@qq.com

Qingdong Zeng, zengqd@nwafu.edu.cn

Tenghang Xu, pfgipf3089@163.com

Peijian Cao, caopj@blsa.com.cn

Jingjing Jin, jinjj@ztri.com.cn

Xueliang Ren, renxuel@126.com

Abstract

Nicotiana tabacum (2n = 4x = 48) is cultivated in approximately 120 countries and is both an economically important non-food crop and a model plant. Although tobacco genomic resources are abundant, efficiently genotyping novel germplasm and breeding progeny remains challenging. Here, we introduce Ta-LD-SC, a 20K SNP Affymetrix Axiom® array designed for large-scale single nucleotide polymorphism (SNP) genotyping of tobacco. The final design includes 20,213 unique and informative SNPs covering over 90% of the tobacco genome (Nitab4.5 and NtaSR1). The probes are distributed uniformly at a density of no more than 5 SNPs per 200 kb. To validate the effectiveness of the Ta-LD-SC array, we genotyped two populations: 866 tobacco accessions (natural population, NP) and 288 F2 individuals derived from a cross between K326 and Oxford 26 (genetic population, GP). In the NP, a high proportion of SNPs (79.79%) were classified as 'PolyHighResolution'. Repeated experiments showed a technical error rate below 1% for most samples. We also used the Ta-LD-SC array to explore the population structure of the NP. Genome-wide association studies identified 62 important genes associated with eight traits, demonstrating the array's usefulness in genetic research. Furthermore, we constructed a high-density genetic map for the GP, comprising 4,553 SNPs spanning 6,606.08 cM, which can help improve tobacco genome assembly. Taken together, the newly developed SNP array could serve as a vital tool in future tobacco genetic research and breeding.

Keywords: cultivated tobacco, SNP array, population structure, differentiation patterns, breeding and genetic improvement.

Introduction

Tobacco (Nicotiana tabacum L.) originated in Central and South America. It was introduced to Europe in the early 1500s by Columbus and his successors, along with crops such as maize, potato, tomato, eggplant, and pepper (Ivanov et al., 2020). Today, tobacco is one of the most economically important non-food crops and is cultivated in approximately 120 countries (Sierro et al., 2014). According to FAOSTAT 2022, global tobacco production reached about 5.8 million tons of leaf, grown on 3.1 million hectares (http://www.fao.org/faostat/en/#data/QCL). Based on curing method and biochemical characteristics, tobacco is classified into several major types, including flue-cured, sun-/air-cured, burley, cigar, and oriental tobacco (Fricano et al., 2012; Yang et al., 2007). As with other crops, systematic tobacco breeding began in the early 20th century. Since the 1940s, significant progress has been made in increasing the yield and disease resistance of flue-cured and burley tobacco. By contrast, sun-/air-cured, cigar, and oriental tobacco have changed little over the past century, and many traditional varieties are still widely used (Lewis and Nicholson, 2007; Moon et al., 2009). Their combined cultivation area is also small, accounting for less than 10% of the global tobacco cultivation area. A crucial issue for tobacco biologists is therefore how to effectively use these rare or endangered germplasm resources to breed superior tobacco varieties.

Breeding and selection of superior genotypes, together with intensified cultivation practices, have reduced the genetic diversity of cultivated tobacco (Lewis, 2023; Lewis and Nicholson, 2007). For instance, Wang and Zhou (1995) analyzed over 300 modern flue-cured tobacco varieties developed by breeders in the USA and China. They found that more than 97% of them originated from a limited number of backbone breeding parents. Similarly, Yang et al. (2013) found that most tobacco varieties cultivated in China since 1983 share a genetic lineage with G28, K326, NC89, and NC82. Conserving and using genetic resources to improve cultivated tobacco is therefore becoming increasingly important. The Tobacco Research Institute of the Chinese Academy of Agricultural Sciences (TRI) maintains the world's largest tobacco collection, with over 5,300 accessions (Wang et al., 2014; Jiao et al., 2019). This collection is considerably larger than the repositories in the United States, Turkey, Poland, Zimbabwe, and elsewhere (Lewis and Nicholson, 2007; Czubaeka, 2022; Saygili et al., 2021; Shava et al., 2020). Between 1998 and 2000, TRI identified 446 core tobacco germplasm accessions based on variety type, origin, characteristics, and ecological distribution (Wang et al., 2014). However, this selection strategy may be inefficient and of limited value. This is especially true when pedigree information is lacking and phenotypic data come solely from single-environment assessments.

Advances in sequencing technology have opened a new era, in which germplasm resources can be characterized and essential functional genes discovered by large-scale genotyping of thousands of genome-wide molecular markers (Marrano et al., 2019). Tobacco molecular markers began with Restriction Fragment Length Polymorphism (RFLP) markers, which were initially used to study a few cloned genes (Widrlechner, 1995). They later shifted to PCR-based markers, including:
- Random Amplified Polymorphic DNA (RAPD) (Arslan and Okumus, 2006; D’hoop et al., 2010; Denduangboripant et al., 2010; Evanno et al., 2005; Xu et al., 1998; Del Piano et al., 2000; Sarala and Rao, 2008; SivaRaju et al., 2008; Zhang et al., 2005; Zhang et al., 2008);
- Amplified Fragment Length Polymorphism (AFLP) (Liu et al., 2009; Huang et al., 2008; Zhang et al., 2008);
- Simple Sequence Repeats (SSR) (Bindler et al., 2011; Bindler et al., 2007; Cai et al., 2015; Madhav et al., 2015; Tong et al., 2016; Tong et al., 2012);
- Inter-Simple Sequence Repeats (ISSR) (Cai et al., 2015; Qi et al., 2006);
- Interspersed Repetitive Sequence PCR (IRAP) (Yang et al., 2007);
- Sequence-Related Amplified Polymorphism (SRAP) (Nie et al., 2012).

However, these markers are labor-intensive and time-consuming to assay, and they are not uniformly distributed across the genome. This limits their utility for genome-based approaches such as genomic selection (GS) and association mapping. Single Nucleotide Polymorphisms (SNPs) are preferred over these markers because of their cost-effectiveness and relative stability. Recent studies have identified large numbers of SNPs in tobacco (Thimmegowda et al., 2018; Tong et al., 2020b; Xiao et al., 2015). These SNPs are being used to characterize germplasm, develop molecular maps, and investigate the structure and organization of the genome. However, they were obtained mainly through whole-genome resequencing, so their reliability depends largely on the quality of the reference genome. They are also affected by the highly repetitive and polyploid nature of the tobacco genome. Without additional experimental validation, other researchers cannot readily apply these markers directly. Tobacco genetic research therefore still lacks an accessible, streamlined, and efficient method for SNP detection and genotyping.

An SNP array can overcome these limitations of whole-genome resequencing (Merot-L'anthoene et al., 2019), but the distinct characteristics of the tobacco genome constrain its widespread application. The genome is large (3.99 to 4.5 Gb), more than 70% of it consists of repetitive elements, and it is allopolyploid (2n = 4x = 48) (Edwards et al., 2017; Sierro et al., 2024; Sierro et al., 2014; Wang et al., 2024). As a result, genome-wide SNP markers are still of limited use in tobacco genetic research (Ikram et al., 2022; Leventhal et al., 2022; Tong et al., 2020a; Tong et al., 2020b; Xu et al., 2022). Here, we present the development and validation of a tobacco 20K SNP Affymetrix Axiom® array, named Ta-LD-SC, comprising 20,213 unique and informative SNPs spanning the entire tobacco genome. The SNP positions and probe sequences were initially based on the Nitab4.5 genome (Edwards et al., 2017). We then updated them to align with the latest NtaSR1 genome sequence (Wang et al., 2024). We also validated the utility of the Ta-LD-SC array through several genetic analyses: tobacco germplasm characterization, genome-wide association studies (GWAS), and genetic linkage map construction. The Ta-LD-SC array represents a significant advance for tobacco genetic research and breeding practice.

Materials and Methods

2.1 Plant material

This study used two experimental groups of N. tabacum germplasm: a natural population (NP) and a genetic population (GP). The NP comprised 866 tobacco accessions, and the GP comprised 288 F2 individuals. Both groups were used to design and validate the tobacco 20K SNP Affymetrix Axiom® array (Table S5). The 866 accessions comprised 496 flue-cured, 348 sun-/air-cured, 15 burley, 4 cigar, and 3 oriental tobaccos. Of these, 416 accessions belonged to the important germplasm that represents most of the genetic diversity in the 5,700 accessions held by TRI (Wang et al., 2014). The flue-cured tobaccos included landraces from different Chinese regions, modern cultivars, and cultivars introduced from the United States, Canada, Japan, and other countries. The 288 F2 individuals formed a genetic subpopulation, designated TGP03, derived from a cross between K326 and Oxford 26. They were used for SNP array evaluation and genetic map construction.

2.2 SNP detection and filtering

A total of 150 tobacco accessions were resequenced for SNP calling (Yu et al., 2023). SNPs were called by mapping the short reads to the reference genome (Edwards et al., 2017), using a previously established workflow (Yu et al., 2017). The SNP detection and filtering process is shown in Figure 1. Sites were retained if they met all of the following: variant quality ≥ 50, mapping quality ≥ 20, read depth between 5 and 200, and genotype quality ≥ 20. A set of 1,216,749 polymorphic SNPs was then identified using these criteria:
- minor allele frequency (MAF) > 0.05;
- missing rate < 0.1;
- a Hardy-Weinberg equilibrium (HWE) test threshold of 0.05;
- polymorphic information content (PIC) between 0.2 and 0.45;
- heterozygosity frequency (HF) ≤ 0.1.
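
The thresholds above can be written as a simple per-SNP filter. This sketch is illustrative only and is not the authors' workflow. Its assumptions:
- Per-SNP summary statistics have already been computed.
- The `SnpStats` field names are hypothetical.
- The quality thresholds are lower bounds and the depth range is inclusive.
- The HWE criterion is omitted, because the direction of its 0.05 cut-off is not stated.

PIC is computed with the standard formula from allele frequencies.

```python
from dataclasses import dataclass
from typing import Sequence


def pic(freqs: Sequence[float]) -> float:
    """Polymorphic information content from allele frequencies (standard formula)."""
    homozygosity = sum(p * p for p in freqs)
    # Correction term: sum over allele pairs i < j of 2 * p_i^2 * p_j^2
    correction = sum(
        2 * freqs[i] ** 2 * freqs[j] ** 2
        for i in range(len(freqs))
        for j in range(i + 1, len(freqs))
    )
    return 1.0 - homozygosity - correction


@dataclass
class SnpStats:
    """Hypothetical per-SNP summary statistics (field names are illustrative)."""
    qual: float      # variant quality
    mapq: float      # mapping quality
    depth: int       # read depth
    gq: float        # genotype quality
    maf: float       # minor allele frequency
    missing: float   # fraction of accessions without a call
    het: float       # heterozygosity frequency


def passes_filters(s: SnpStats) -> bool:
    # Site-level thresholds (assumed lower bounds; depth range inclusive).
    site_ok = s.qual >= 50 and s.mapq >= 20 and 5 <= s.depth <= 200 and s.gq >= 20
    # Population-level thresholds for a biallelic SNP; HWE test omitted.
    pic_value = pic([s.maf, 1.0 - s.maf])
    pop_ok = (
        s.maf > 0.05
        and s.missing < 0.1
        and 0.2 <= pic_value <= 0.45
        and s.het <= 0.1
    )
    return site_ok and pop_ok
```

For a biallelic SNP, this PIC formula peaks at 0.375 (at MAF = 0.5), so the lower bound of 0.2 is the binding PIC constraint. In this sketch it removes SNPs with a MAF below roughly 0.13.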

2.3 SNP selection and SNP array design

To select SNPs for the array, the initial set of 1,216,749 raw SNPs went through a multi-step filtering process.
1. We extracted the 50-bp flanking sequences on each side of every SNP and aligned them against the reference genome using BLASTn (Camacho et al., 2009). This step excluded SNPs located in duplicated regions of the genome, so that only unique, non-duplicated SNP loci were retained.
2. We used SNP_Primer_Pipeline (https://github.com/pinbo/SNP_Primer_Pipeline) to design KASP primers for all raw SNPs. We then took the intersection of this SNP set with the unique SNPs retained after the BLAST step.
3. We partitioned the genome into 200-bp sliding windows with a 50-bp step and counted the SNPs retained by the preceding steps in each window. Priority was given to SNPs in windows containing 5 or fewer sites.
4. For windows with more than 5 SNPs, we applied several further criteria:
   - No other SNP, INDEL, or SSR could occur within the 50-bp flanking region.
   - SNPs in the coding sequences (CDS) of genes annotated across the genome were prioritized.
   - Pairwise r² values were calculated between SNPs within each window, with the aim of maximizing r² among the retained SNPs.
   - SNPs with higher PIC values were favored.
   - Mutation types A/G, A/C, T/G, and T/C were given priority over A/T and G/C.
5. Only SNPs whose flanking sequences had a GC content of 30% to 70% were retained.

High-quality SNPs meeting these criteria, including important markers, were submitted to the Affymetrix Axiom® myDesign GW bioinformatics pipeline provided by Thermo Fisher Scientific Inc., USA. After careful evaluation of the Affymetrix Bioinformatics Service results (Table S1), 20,213 recommended SNPs were retained. These SNPs constitute the Ta-LD-SC array.
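
The following is a minimal sketch of the window-density and sequence checks described above. It assumes SNP positions on one chromosome are given as integers and flanking sequences as strings. Several parts are illustrative assumptions, not the authors' implementation: the function names, the use of a per-SNP "maximum window count" to decide priority, and the tuple-based ranking. The LD (r²) and CDS criteria are not modeled.

```python
from bisect import bisect_left
from typing import Dict, List

WINDOW = 200        # window length (bp), as stated in the text
STEP = 50           # window step (bp)
MAX_PER_WINDOW = 5  # windows with <= 5 SNPs get priority
PREFERRED = {frozenset("AG"), frozenset("AC"), frozenset("TG"), frozenset("TC")}


def max_window_density(positions: List[int]) -> Dict[int, int]:
    """For each SNP, the largest SNP count among the sliding windows covering it."""
    pos = sorted(positions)
    density = {}
    for x in pos:
        # Window starts are multiples of STEP with start <= x < start + WINDOW.
        first = max(0, ((x - WINDOW) // STEP + 1) * STEP)
        best = 0
        for start in range(first, x + 1, STEP):
            n = bisect_left(pos, start + WINDOW) - bisect_left(pos, start)
            best = max(best, n)
        density[x] = best
    return density


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def flank_ok(flank: str) -> bool:
    """Keep SNPs whose flanking sequence has 30-70% GC."""
    return 0.30 <= gc_fraction(flank) <= 0.70


def priority_key(ref: str, alt: str, pic_value: float, density: int) -> tuple:
    """Smaller key = higher priority (hypothetical ordering of the stated criteria)."""
    return (
        density > MAX_PER_WINDOW,               # sparse windows first
        frozenset(ref + alt) not in PREFERRED,  # A/T and G/C ranked last
        -pic_value,                             # higher PIC first
    )
```

Passing this key to `sorted(..., key=...)` would list candidate SNPs from sparse windows first, then non-A/T, non-G/C SNPs, then higher-PIC SNPs.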

2.4 SNP array evaluation

To assess the efficacy of the Ta-LD-SC array, we used it to genotype the tobacco accessions in the diverse Nicotiana germplasm under study. Genomic DNA was extracted from frozen young leaves using the CTAB method. Each sample was required to have a concentration of ≥10 ng/μl and a total amount of ≥600 ng. DNA samples also had to meet the following quality standards:
- fragment size ≥10 kb;
- OD260/OD280 ratio between 1.7 and 2.1;
- OD260/OD230 ratio ≥1.4.

The DNA was then hybridized to the Ta-LD-SC array.

Alleles were called from the '.CEL' files using Axiom Analysis Suite (version 5.3). Sample quality control thresholds were a DQC value of ≥0.82 and a call rate (CR) of ≥0.95. SNP quality control thresholds used the default parameters for polyploid species: cr-cutoff ≥95; fld-cutoff ≥3.6; het-so-cutoff ≥-0.3; hom-ro-1-cutoff ≥0.6; hom-ro-2-cutoff ≥0.3; hom-ro-3-cutoff ≥-0.9. Because tobacco is self-pollinating, the 'Inbred penalty score' parameter was set to 4.
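
The sample-level and SNP-level cut-offs above are threshold checks on metrics that Axiom Analysis Suite reports. This is a hypothetical sketch: it assumes those metrics have been exported into Python data structures (the input layout is an assumption) and omits the HomRO cut-offs for brevity. It does not reproduce the suite's own genotype-calling algorithm.

```python
from typing import Dict, List, Tuple

# Sample-level thresholds stated in the text.
DQC_MIN = 0.82
SAMPLE_CR_MIN = 0.95
# SNP-level thresholds stated in the text (HomRO cut-offs omitted here).
SNP_CR_MIN = 95.0  # percent
FLD_MIN = 3.6
HETSO_MIN = -0.3


def passing_samples(metrics: Dict[str, Tuple[float, float]]) -> List[str]:
    """metrics maps sample ID -> (DQC, call rate as a fraction)."""
    return sorted(
        sample
        for sample, (dqc, cr) in metrics.items()
        if dqc >= DQC_MIN and cr >= SAMPLE_CR_MIN
    )


def snp_passes(cr_percent: float, fld: float, het_so: float) -> bool:
    """Simplified SNP-level check on call rate, FLD, and HetSO."""
    return cr_percent >= SNP_CR_MIN and fld >= FLD_MIN and het_so >= HETSO_MIN
```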

SNPs were classified into six categories: 'PolyHighResolution', 'OTV', 'NoMinorHom', 'CallRateBelowThreshold', 'MonoHighResolution', and 'Other'. The first three categories were considered high-quality genotype calls, and the remaining three lower-quality calls (Burridge et al., 2024; Merot-L'anthoene et al., 2019; Wei et al., 2022). To ensure the most accurate results, we used only SNPs labeled 'PolyHighResolution' that had consistent positions in both the Nitab4.5 and NtaSR1 genomes for subsequent analyses.
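
Selecting the SNPs used downstream amounts to filtering on the category label and on whether a SNP has a position in both assemblies. The sketch below assumes each SNP is a dictionary with hypothetical keys (`id`, `category`, `pos_nitab45`, `pos_ntasr1`), with `None` marking an unplaced SNP. It also simplifies "consistent positions" to "placed in both genomes".

```python
from collections import Counter
from typing import Any, Dict, Iterable, List

Snp = Dict[str, Any]  # hypothetical record layout


def category_counts(snps: Iterable[Snp]) -> Counter:
    """Number of SNPs in each Axiom category."""
    return Counter(s["category"] for s in snps)


def select_for_analysis(snps: Iterable[Snp]) -> List[str]:
    """IDs of 'PolyHighResolution' SNPs placed in both Nitab4.5 and NtaSR1."""
    return [
        s["id"]
        for s in snps
        if s["category"] == "PolyHighResolution"
        and s["pos_nitab45"] is not None
        and s["pos_ntasr1"] is not None
    ]
```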

To evaluate technical variation, we randomly selected 127 individuals from the NP (Table S5) and genotyped them a second time with the Ta-LD-SC array. In addition, a subset of 3,209 of the 20,213 array SNPs was validated by KASP genotyping on the LGC Genomics platform, following the platform's protocol.

2.5 SNP array application

2.5.1 Population structure analysis

The genotype file for the 866 tobacco accessions was generated by Axiom Analysis Suite (5.3). We first converted it into PLINK 'ped' and 'map' files using an in-house Perl script. PLINK v2.0 (https://www.cog-genomics.org/plink/2.0/) was then used to produce a VCF file for downstream analysis and data manipulation. Population structure was assessed with ADMIXTURE, with the number of clusters (K) set from 2 to 10 (Alexander et al., 2009). Principal component analysis (PCA) was conducted using GCTA (Yang et al., 2011). Neighbor-joining (NJ) phylogenetic trees were constructed from p-distances with 1,000 bootstrap replicates using MEGA-CC (Kumar et al., 2012). LD estimation and LD decay analysis followed the methods described by Yu et al. (2020).
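
To make the distance and LD steps concrete, the sketch below computes two quantities: a p-distance between two accessions and a genotype-based r² between two SNPs. It assumes genotypes are coded as 0/1/2 alternative-allele counts, with `None` for missing calls. It illustrates the quantities only and is not the MEGA-CC or Yu et al. (2020) implementation. MEGA computes p-distance over aligned sequences, and the dosage-correlation r² used here is one common approximation for unphased data.

```python
from itertools import combinations
from typing import Dict, List, Optional, Tuple

Genotype = Optional[int]  # 0, 1, 2 copies of the alternative allele; None = missing


def p_distance(a: List[Genotype], b: List[Genotype]) -> float:
    """Proportion of differing calls among sites genotyped in both accessions."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        return float("nan")
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(geno: Dict[str, List[Genotype]]) -> Dict[Tuple[str, str], float]:
    """Pairwise p-distances, e.g. as input for a neighbor-joining tool."""
    return {(i, j): p_distance(geno[i], geno[j]) for i, j in combinations(sorted(geno), 2)}


def r_squared(x: List[Genotype], y: List[Genotype]) -> float:
    """Squared Pearson correlation of allele dosages between two SNPs."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    if len(pairs) < 2:
        return float("nan")
    mx = sum(a for a, _ in pairs) / len(pairs)
    my = sum(b for _, b in pairs) / len(pairs)
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    return sxy * sxy / (sxx * syy) if sxx and syy else float("nan")
```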

2.5.2 Genome-wide association study (GWAS)

In 2022, all NP accessions were grown in pots. Three plants of each accession were planted individually in three round pots (42 cm in diameter) in Zunyi (112°13′ E, 24°53′ N). Throughout the growing season, we recorded eight agronomic traits: plant height (PH), leaf number (LN), stem girth (SG), internodal distance (ID), leaf length (LL), leaf width (LW), days from transplanting to budding (DB), and axillary bud weight (WB). GWAS was performed using a mixed linear model (MLM) in GEMMA (Zhou and Stephens, 2012). Significant SNPs were displayed in Manhattan plots and P-value distributions in quantile-quantile (Q-Q) plots, both generated with the qqman package in R (version 4.1.2). The significance threshold was set at 4.20, i.e., -log10(1/n), where n is the total number of high-quality SNPs (Wang et al., 2016). For each trait, candidate genes located within 500 kb upstream and downstream of significant SNPs were identified using the NtaSR1 genome (Wang et al., 2024).
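
The threshold and the candidate-gene window can be checked with a few lines of arithmetic. This sketch assumes n = 15,819, the number of 'PolyHighResolution' SNPs used for the population analyses of the NP. This value reproduces the stated threshold of 4.20. The gene-table format passed to `candidate_genes` is hypothetical.

```python
from math import log10
from typing import Iterable, List, Tuple

Gene = Tuple[str, str, int, int]  # (gene_id, chromosome, start, end); hypothetical layout


def gwas_threshold(n_snps: int) -> float:
    """Significance threshold -log10(1/n) on the -log10(P) scale."""
    return -log10(1.0 / n_snps)


def candidate_genes(chrom: str, pos: int, genes: Iterable[Gene], flank: int = 500_000) -> List[str]:
    """Genes overlapping the interval [pos - flank, pos + flank] on the same chromosome."""
    lo, hi = pos - flank, pos + flank
    return [gid for gid, c, start, end in genes if c == chrom and start <= hi and end >= lo]


print(round(gwas_threshold(15_819), 2))  # 4.2
```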

2.5.3 Genetic map construction

Samples from the GP were genotyped using the Ta-LD-SC SNP array. Quality control of the raw SNPs and 'BIN' analysis were first performed using QTL IciMapping (Meng et al., 2015). SNPs deviating significantly from the expected 1:2:1 segregation ratio in a chi-square test (P < 0.01) were removed. The genetic map was constructed using the R/ASMap package (Taylor and Butler, 2017). Linkage groups were formed using the function 'mstmap.cross' with the Kosambi distance function, a p-value cut-off of 1e-10, and a missing-data threshold of 0.3. Separate linkage groups from the same chromosome were merged based on SNP positions in NtaSR1, and each linkage group was named after the corresponding NtaSR1 chromosome. Genetic maps were drawn using the R package LinkageMapView (Ouellette et al., 2018).
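
The segregation filter and the Kosambi map function can be sketched with the standard library alone. A 1:2:1 test has three genotype classes and therefore 2 degrees of freedom, and a chi-square statistic with 2 degrees of freedom has the closed-form upper-tail probability exp(-χ²/2). This is an illustrative sketch, not the QTL IciMapping or ASMap code, and the genotype counts used with it are hypothetical.

```python
from math import exp, log


def segregation_p(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Chi-square goodness-of-fit P-value against the 1:2:1 F2 ratio (2 df)."""
    n = n_aa + n_ab + n_bb
    expected = (n / 4, n / 2, n / 4)
    chi2 = sum((o - e) ** 2 / e for o, e in zip((n_aa, n_ab, n_bb), expected))
    return exp(-chi2 / 2)  # survival function of chi-square with 2 df


def keep_marker(n_aa: int, n_ab: int, n_bb: int, alpha: float = 0.01) -> bool:
    """Drop markers showing significant segregation distortion (P < alpha)."""
    return segregation_p(n_aa, n_ab, n_bb) >= alpha


def kosambi_cm(r: float) -> float:
    """Kosambi map distance in cM for recombination fraction 0 <= r < 0.5."""
    return 25.0 * log((1 + 2 * r) / (1 - 2 * r))
```

For example, a marker with counts 120:100:60 in 280 F2 plants gives χ² ≈ 48.6, far beyond the P < 0.01 cut-off, so it would be discarded.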

Results

3.1 Characterization and Genomic Insights from the Tobacco 20K SNP Array

As illustrated in Figure 1, resequencing data from 150 genotypes were aligned to the tobacco reference genome, identifying 1,216,749 SNPs. We then selected a refined set of 20,213 high-quality SNPs, approximately 1.66% of the initial set, and placed them on the tobacco 20K SNP array, named the Ta-LD-SC array (Table S1). Based on the Nitab4.5 genome, these 20,213 SNPs provided even coverage of the tobacco genome. Of these, 14,779 were distributed across all 24 chromosomes, ranging from 380 on chromosome Nt03 to 1,261 on chromosome Nt17 (Figure S1a,b). The remaining 5,434 SNPs were located on 3,614 scaffolds, with 1 to 13 SNPs per scaffold.

We aligned the flanking sequences (50 bp up- and downstream of each SNP) from the Nitab4.5 genome to the NtaSR1 genome using BLASTn. Of these, 19,916 (98.53%) matched exact positions in the NtaSR1 genome, and only 297 failed to map to an exact position (Table S1). In the NtaSR1 genome, 19,784 (97.88%) SNPs were distributed across all 24 chromosomes (Figure S1c,d), and 132 SNPs were located on 32 scaffolds. The SNPs were broadly and fairly evenly spaced along the NtaSR1 chromosomes. The average spacing between adjacent loci was 200.27 kb, and the inter-SNP spacings were distributed as follows:

| Inter-SNP spacing | Number of intervals | Percentage |
| --- | --- | --- |
| < 100 kb | 7,569 | 38.3% |
| 100–200 kb | 7,261 | 36.7% |
| 200–300 kb | 2,337 | 11.8% |
| 300–500 kb | 1,637 | 8.3% |
| 500–1,000 kb | 534 | 2.7% |
| > 1,000 kb | 422 | 2.1% |

When the genome was divided into 1-Mb segments, approximately 90% of the segments contained SNPs. These markers could serve as conserved landmarks for studying collinearity between different versions of the tobacco genome.
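
The spacing statistics above can be reproduced from sorted SNP coordinates. The sketch below assumes positions are given per chromosome in bp (the input dictionary is hypothetical). A gap falling exactly on a bin edge is placed in the higher bin, a boundary convention the text does not specify. Because n SNPs on one chromosome yield n − 1 gaps, 19,784 SNPs on 24 chromosomes give 19,760 gaps. This equals the sum of the six reported bin counts.

```python
from bisect import bisect_right
from collections import Counter
from typing import Dict, List, Tuple

# Upper bin edges in kb, following the categories reported in the text.
EDGES_KB = [100, 200, 300, 500, 1000]
LABELS = ["<100", "100-200", "200-300", "300-500", "500-1000", ">1000"]


def spacing_summary(chrom_positions: Dict[str, List[int]]) -> Tuple[float, Dict[str, int]]:
    """Mean gap (kb) between adjacent SNPs and the number of gaps per spacing bin."""
    gaps_kb = []
    for positions in chrom_positions.values():
        pos = sorted(positions)
        gaps_kb.extend((b - a) / 1000 for a, b in zip(pos, pos[1:]))
    counts = Counter(LABELS[bisect_right(EDGES_KB, g)] for g in gaps_kb)
    mean_kb = sum(gaps_kb) / len(gaps_kb) if gaps_kb else float("nan")
    return mean_kb, {label: counts.get(label, 0) for label in LABELS}
```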

A comparison of marker physical positions in the Nitab4.5 and NtaSR1 genomes showed general consistency between the two assemblies, except for chromosome 17. Markers on chromosome 17 of Nitab4.5 were located primarily on chromosomes 3 and 4 of NtaSR1, with a small portion on chromosomes 1, 5, 6, and 11 (Figure 2a and Table S2). Additionally, 19,546 (96.70%) SNPs in the NtaSR1 genome were found to be located within genes (Figure 2b). Of these, 6,189 (31.66%) were located in the annotated genic regions of 5,930 genes, including UTRs, splice sites, exons, introns, and the 5-kb regions upstream and downstream of genes (Figure 2b and Table S3). These 5,930 genes were further annotated using the Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG), and NCBI non-redundant (NR) databases (Table S3). The most frequent GO terms in each category (Figure 2c) were:
- Biological process: “cellular process” (2,306 genes, 59.74%), “metabolic process” (2,288 genes, 59.27%), “single-organism process” (2,026 genes, 52.49%), and “response to stimulus” (1,177 genes, 30.49%).
- Cellular component: “cell” (2,180 genes, 56.48%), “cell part” (2,174 genes, 56.32%), and “organelle” (1,816 genes, 47.05%).
- Molecular function: “catalytic activity” (1,883 genes, 48.78%), “binding” (1,716 genes, 44.46%), and “transporter activity” (233 genes, 6.03%).

3.2 Genotyping Validation and Efficiency of Ta-LD-SC Array

To assess the effectiveness of the Ta-LD-SC array, 866 tobacco accessions (natural population, NP) and 288 F2 individuals (genetic population, GP) were genotyped independently. All 866 NP accessions met the quality standards, with DQC values ranging from 0.82 to 0.99 and call rates from 93.6% to 99.8%. Following the procedure described in the Methods, we classified the 20,213 SNPs into six categories (Table 1 and Table S1). In the NP, most loci (16,127, 79.79%) were classified as 'PolyHighResolution', and 2,558 (12.8%) were classified as 'Other'. The remaining loci were distributed as follows: 566 (2.8%) 'CallRateBelowThreshold', 477 (2.3%) 'OTV', 368 (1.8%) 'MonoHighResolution', and 117 (0.6%) 'NoMinorHom'. In the GP, 7 samples were excluded from analysis because their DQC values were below 0.82. Overall, 11,362 (56.2%) polymorphic SNPs were detected in the GP, followed by 3,848 (19.0%) SNPs classified as 'NoMinorHom'. The remaining loci showed poor genotyping quality and poor cluster separation and were classified as 'OTV', 'CallRateBelowThreshold', or 'Other'.

Stability within each trial is a crucial measure of reproducibility between technical replicates. In our study, the number of discordant SNP calls per sample among the 127 replicated samples ranged from 4 to 616, with a mean of approximately 111 (Figure 3a). Notably, 108 (85.0%) of these samples showed fewer than 200 discordant SNPs between replicate runs, corresponding to a technical error rate below 1%. This performance is consistent with other studies using Axiom arrays, indicating that the Ta-LD-SC array is reliable and performs comparably in terms of accuracy and consistency (Burridge et al., 2024). Additionally, 3,209 SNPs on the array that had previously been targeted by Kompetitive allele-specific PCR (KASP) assays were used for validation. The genotypes called at these 3,209 SNPs were 100% concordant with those obtained from the KASP assays (Table S4). Among them, 501 SNPs classified as low quality were also successfully validated by KASP genotyping (Figure 3b,c).
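The link between fewer than 200 discordant SNPs and a technical error rate below 1% holds if the rate is taken over all 20,213 array SNPs (200 / 20,213 ≈ 0.99%). The sketch below assumes that denominator, which is our reading rather than a stated formula; it also shows how array-versus-KASP concordance could be computed on jointly called SNPs. The `--` missing-call code and the function names are hypothetical.

```python
N_SNPS = 20_213  # number of SNP probes on the Ta-LD-SC array


def discordance(calls_a, calls_b, missing="--"):
    """Compare two genotype call vectors for the same SNPs.

    Returns (n_discordant, n_compared), skipping SNPs missing in either vector.
    """
    if len(calls_a) != len(calls_b):
        raise ValueError("call vectors must cover the same SNPs")
    n_discordant = n_compared = 0
    for a, b in zip(calls_a, calls_b):
        if a == missing or b == missing:
            continue
        n_compared += 1
        n_discordant += a != b  # bool adds as 0 or 1
    return n_discordant, n_compared


def concordance(calls_a, calls_b, missing="--"):
    """Fraction of jointly called SNPs with identical genotypes (e.g. array vs. KASP)."""
    n_discordant, n_compared = discordance(calls_a, calls_b, missing)
    return 1 - n_discordant / n_compared if n_compared else float("nan")


def technical_error_rate(n_discordant, n_snps=N_SNPS):
    """Discordant calls between technical replicates as a fraction of all array SNPs."""
    return n_discordant / n_snps


print(f"{technical_error_rate(200):.2%}")  # 200 discordant SNPs out of 20,213
```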

3.3 Population structure and genetic diversity of tobacco germplasm

We genotyped 866 tobacco accessions using the Ta-LD-SC array. A total of 15,819 'PolyHighResolution' SNPs with positions on both the Nitab4.5 and NtaSR1 genomes were used to construct a phylogenetic tree and to perform principal component analysis (PCA) and linkage disequilibrium (LD) analysis. The phylogenetic analysis classified all accessions into three major clades: Group I, consisting of 275 sun-/air-cured tobacco accessions; Group III, comprising 460 flue-cured and 15 Burley tobacco accessions; and Group II, a transitional clade of 116 accessions bridging sun-/air-cured and flue-cured tobaccos (Figure 4a and Table S5). Group II comprised 73 sun-/air-cured, 36 flue-cured, 4 cigar, and 3 oriental tobacco accessions. PCA (Figure 4b) and model-based clustering analysis (K = 3, 4, 5; Figure 4c) further corroborated this population structure, clearly separating the accessions into Group I (predominantly sun-/air-cured tobacco), Group III (primarily flue-cured tobacco), and Group II (mainly transitional mixed types). The cross-validation error indicated that the optimal K was 3, and the Group III accessions consistently formed six clusters (Sp1-Sp6) at K > 8 (Figure 4c and Table S5). Clusters Sp1 and Sp2 consisted primarily of varieties developed or cultivated domestically before the 1970s and 1980s. Their genetic background derived predominantly from traditional domestic tobaccos and was closely related to certain domestic air-cured tobaccos. Notably, at K = 3, these varieties clustered with Group II. Sp3 included superior flue-cured tobacco varieties introduced from overseas after the 1970s. Sp4, Sp5, and Sp6 arose from modern tobacco breeding and include elite foreign varieties as well as hybrids produced by crossing these international varieties with domestic tobacco germplasm.
Notably, the three varieties with the largest cultivation areas in China (Yunyan 87, K326, and Yunyan 85) all fell within Sp6. The extent of LD decay, measured as r², was estimated for Groups I, II, and III (Figure 4d) and for each chromosome in Group III (Figure S2). At an r² threshold of 0.2, LD decayed more slowly in Group III (28 Mb) than in Group I (23 Mb).
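LD decay distances such as the 28 Mb and 23 Mb reported above are read from pairwise r² values plotted against physical distance. The Python sketch below is a minimal illustration under stated assumptions: genotypes are unphased 0/1/2 allele dosages, r² is approximated by the squared Pearson correlation of dosages, and pairs are averaged in 1-Mb distance bins. The software and binning actually used in this study are not described in this section.

```python
import numpy as np


def genotype_r2(x, y):
    """r^2 between two SNPs coded as 0/1/2 allele dosages (NaN = missing).

    Uses the squared Pearson correlation of dosages, a common approximation
    for unphased genotype data.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    keep = ~(np.isnan(x) | np.isnan(y))
    x, y = x[keep], y[keep]
    if x.size < 2 or x.std() == 0 or y.std() == 0:
        return float("nan")  # monomorphic or too few shared calls
    r = np.corrcoef(x, y)[0, 1]
    return float(r * r)


def ld_decay_distance(distances_bp, r2_values, threshold=0.2, bin_bp=1_000_000):
    """Lower edge of the first distance bin whose mean r^2 falls below `threshold`."""
    d = np.asarray(distances_bp, dtype=float)
    r2 = np.asarray(r2_values, dtype=float)
    keep = ~np.isnan(r2)
    d, r2 = d[keep], r2[keep]
    if d.size == 0:
        return None
    bins = (d // bin_bp).astype(int)
    for b in range(bins.max() + 1):
        in_bin = r2[bins == b]
        if in_bin.size and in_bin.mean() < threshold:
            return b * bin_bp
    return None  # r^2 never drops below the threshold in the sampled range


print(ld_decay_distance([0.5e6, 1.5e6, 2.5e6], [0.8, 0.5, 0.1]))
```

In practice, r² would be computed for many SNP pairs per chromosome, and the distance/r² pairs for each group passed to `ld_decay_distance`.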

3.4 GWAS analysis

To evaluate the performance of the Ta-LD-SC array, we conducted a preliminary GWAS for gene/QTL mapping of eight agronomic traits in 866 tobacco accessions grown in a pot experiment in 2022. A total of 119 loci influencing the eight traits were detected, distributed across 19 chromosomes (Figure S3 and Table S6). The numbers of significant loci (p < 6.31×10⁻⁵) associated with each trait were: 27 for plant height (PH), 9 for leaf number (LN), 8 for stem girth (SG), 10 for internode distance (ID), 29 for leaf length (LL), 12 for leaf width (LW), 5 for days from transplanting to budding (DB), and 20 for weight of axillary buds (BW). Regions within 500 kb upstream and downstream of the significant SNPs were searched for candidate genes, and each significantly associated region was designated a Qts (quantitative trait locus). In total, 62 candidate genes associated with the eight traits were identified in 41 Qts regions (Table S7). Seven genes in Qts4-4 and Qts4-7, which harbor 11 significant SNPs on chromosome 4, were significantly associated with multiple traits: SG, LW, and BW in Qts4-4, and PH and ID in Qts4-7. Two genes in the Qts4-4 region, NPY2 and SHR, have been reported to be associated with auxin polar transport (Li et al., 2011) and gibberellin regulation (Hirsch and Oldroyd, 2009), respectively. Moreover, the effect of SHR on the development of leaves and axillary buds has been confirmed (Dhondt et al., 2010; Winter et al., 2024; Yi et al., 2022). Within the Qts4-7 region (Figure 5a,b,c), we identified five candidate genes: SOX-4, XTH8, GA2OX8, BHLH105, and PSBQ-2. Among them, GA2OX8 and XTH8 have been reported to play important roles in plant height and leaf development. For example, GA2OX8 participates in the regulation of plant height in soybean and barley (Cheng et al., 2024; Wang et al., 2021).
XTH8 encodes a xyloglucan endotransglucosylase/hydrolase that is thought to facilitate leaf cell expansion in Arabidopsis (Miura et al., 2010). Analysis of all significant SNPs in the Qts4-7 region revealed that the SNP locus AX-106649506 (Chr04:112214617) is located in an exon of the GA2OX8 gene. A change in genotype at this locus from "CC" to "GG" was associated with increased plant height and internode length in tobacco (Figure 5d, e). Of the other three genes, BHLH105 is a regulator of Fe homeostasis (Gao et al., 2020), whereas SOX-4 and PSBQ-2 remain unannotated and require further investigation.
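Defining candidate regions from ±500-kb windows around significant SNPs amounts to an interval-merging step. A minimal Python sketch, assuming that overlapping windows on the same chromosome are merged into one region (the exact rule used to delimit Qts regions is not given here). Apart from Chr04:112214617 (AX-106649506), the positions in the usage lines are hypothetical, as are the function names.

```python
WINDOW_BP = 500_000  # 500 kb up/downstream of each significant SNP


def merge_windows(snps, window=WINDOW_BP):
    """Merge +/- window intervals around significant SNPs into candidate regions.

    snps: iterable of (chromosome, position) pairs.
    Returns sorted (chromosome, start, end) tuples; overlapping windows are merged.
    """
    regions = []
    for chrom, pos in sorted(snps):
        start, end = max(1, pos - window), pos + window
        if regions and regions[-1][0] == chrom and start <= regions[-1][2]:
            prev_chrom, prev_start, prev_end = regions[-1]
            regions[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            regions.append((chrom, start, end))
    return regions


def genes_in_region(region, genes):
    """Return IDs of genes overlapping a region; `genes` maps id -> (chrom, start, end)."""
    chrom, start, end = region
    return [gid for gid, (c, s, e) in genes.items() if c == chrom and s <= end and e >= start]


hits = [("Chr04", 112_214_617), ("Chr04", 112_600_000), ("Chr04", 120_000_000)]
for region in merge_windows(hits):
    print(region)
```

Sorting by (chromosome, position) assumes zero-padded chromosome names such as "Chr04", so that lexical order matches numeric order.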

3.5 Construction of genetic maps

Two parental lines (K326 and Oxford 26) and 281 of their F2 progeny (GP panel) were used to construct a genetic linkage map (LinkMap, Table S8). Only the 11,362 'PolyHighResolution' SNPs from the genotyping results were selected as raw data. Quality control and bin analysis of the raw data were performed using QTL IciMapping, yielding a total of 5,863 SNP bin markers. Chi-square testing and linkage map construction were then conducted using R/ASMap. Finally, 4,553 SNP bin markers from 220 lines were successfully mapped to 24 linkage groups (LGs) covering 6,606.08 cM, with an average marker density of 1.45 cM/marker. LG11 had the greatest genetic length among the 24 LGs, spanning 394.07 cM with 291 linked SNP markers. Eleven LGs were longer than 300 cM, 7 LGs ranged from 200 to 300 cM, and the remaining 6 LGs ranged from 100 to 200 cM (Figure 6 and Figure S4).
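Segregation distortion in an F2 population is commonly screened with a chi-square goodness-of-fit test against the 1:2:1 ratio expected for a codominant SNP. R/ASMap and QTL IciMapping provide their own implementations; the standalone Python sketch below only illustrates the statistic and does not reproduce their interfaces. The function name and the genotype counts in the usage line are hypothetical.

```python
import math


def f2_segregation_test(n_aa, n_ab, n_bb):
    """Chi-square goodness-of-fit test of F2 genotype counts against 1:2:1.

    With 2 degrees of freedom the chi-square survival function is exp(-x / 2),
    so the p-value needs no statistics library.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotype calls")
    observed = (n_aa, n_ab, n_bb)
    expected = (n / 4, n / 2, n / 4)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2, math.exp(-chi2 / 2)


chi2, p = f2_segregation_test(100, 100, 80)  # hypothetical counts for one marker
print(f"chi2 = {chi2:.2f}, p = {p:.2e}, distorted = {p < 0.05}")
```

The closed-form p-value applies only because a three-class 1:2:1 test has exactly 2 degrees of freedom. As a separate arithmetic check on the map summary, 6,606.08 cM / 4,553 markers ≈ 1.45 cM per marker.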

The genetic map positions of the markers were compared with their physical positions in the Nitab4.5 and NtaSR1 genomes. Of the 4,553 markers on the linkage map, 3,379 were mapped to the 24 chromosomes of the Nitab4.5 genome, while 1,174 were assigned to 980 scaffolds (Figure 7 and Table S9). On the NtaSR1 genome, 4,536 markers were mapped to the 24 chromosomes and 17 markers were assigned to 13 scaffolds. We then used the 3,368 markers mapped to chromosomes in both the Nitab4.5 and NtaSR1 genomes to compare positional consistency among the genetic map and the two assemblies. We focused on the markers located on chromosome 17 of the Nitab4.5 genome, analyzing their positions on the genetic map and their corresponding locations on the Nitab4.5 and NtaSR1 genomes. Marker positions on the NtaSR1 genome were highly consistent with those on the genetic map. Although a few markers showed discrepancies between their linkage groups and chromosomes, the vast majority corresponded completely, with identical positional order. In particular, comparing the markers on Nitab4.5 chromosome 17 with their distribution on the genetic map and the NtaSR1 genome showed that our genetic map agrees more closely with the NtaSR1 assembly and its chromosomal arrangement. We therefore expect the SNP markers selected for the Ta-LD-SC array and the constructed genetic map to be valuable for future tobacco genome assembly and research.
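The positional comparison above is qualitative. One way to quantify collinearity between a linkage group and its chromosome is the Spearman rank correlation between the genetic (cM) and physical (bp) positions of shared markers; values near 1 (or -1 for a reversed orientation) indicate consistent order. This Python sketch illustrates that idea under our own assumptions and is not the analysis performed in this study; ties in cM receive average ranks, and the marker positions in the example are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the block of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(genetic_cm, physical_bp):
    """Spearman correlation between marker order on a linkage group and on a chromosome."""
    if len(genetic_cm) != len(physical_bp) or len(genetic_cm) < 2:
        raise ValueError("need two equal-length lists with at least two markers")
    rx, ry = average_ranks(genetic_cm), average_ranks(physical_bp)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical linkage group with one locally inverted marker pair.
print(spearman_rho([0.0, 1.2, 3.5, 7.0, 9.1], [1e5, 2e6, 9e6, 5e6, 1.2e7]))
```

Running this per linkage group, once against each assembly, would give a numeric counterpart to the comparison in Figure 7.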

Discussion

Over the past decade, molecular markers have become crucial tools in genetic research and modern breeding programs, with applications in genetic diversity analysis, evolutionary studies, genomic selection, QTL analysis, and marker-assisted selection (Hasan et al., 2021). With the growing availability of genome assemblies, the accessibility of NGS, and its cost-effectiveness, SNP markers have become indispensable in crop genetic studies (Yang et al., 2023b). However, a key drawback of NGS-based SNP genotyping is that effective data analysis requires extensive computational resources and specialized expertise. In addition, the concordance of SNP calls between different SNP-calling programs is very low (Balagué-Dobón et al., 2022; Howard et al., 2021). In contrast, SNP arrays are fast, reproducible, user-friendly, and cost-efficient tools for generating and analyzing genotyping data (Arca et al., 2023). Hundreds of SNP arrays now serve a range of research objectives in many crops (Burridge et al., 2024). In tobacco research, however, only a few SNP arrays have been used, namely the Tobacco 430K SNP array (Xu et al., 2022; Yuan et al., 2023) and the 30K Infinium HD array (Ivanov et al., 2020), and the probe sequences of these arrays are not publicly available, which limits their further use by other researchers. In this study, we combined tobacco genome resequencing data with the latest publicly available tobacco reference genomes (Edwards et al., 2017; Wang et al., 2024) to design and develop a tobacco 20K SNP array (Ta-LD-SC) for genetic and breeding applications. Our objective was to develop a cost-effective, low-density tobacco SNP array with evenly distributed markers and genome-wide coverage to facilitate tobacco breeding.

4.1 Challenges in SNP Array Development for Polyploid Species

Developing SNP arrays for polyploid species is challenging because variants must be aligned and identified across multiple sets of homologous chromosomes. Different software algorithms and parameter settings can produce inconsistent results, and the large genome sizes and high repeat content typical of polyploids further complicate development and analysis (Walkowiak et al., 2020; Wang et al., 2023). To ensure successful probe design for SNP genotyping arrays, it is crucial to avoid repetitive genomic regions (Burridge et al., 2024; Montanari et al., 2023). We used SNP data from tobacco genome resequencing and extracted the 50-bp flanking sequences upstream and downstream of each SNP. These sequences were aligned against the entire assembled genome, and only uniquely matched fragments were retained, which ensures high probe specificity, reduces cross-hybridization, and improves genotyping accuracy. For species with multiple reference genomes, aligning fragments to all available assemblies could further increase the success of probe design (Makhoul et al., 2020). This approach is vital for maintaining the reliability and precision of genetic analyses in polyploid species, in which repetitive elements complicate genomic studies. Its importance is also highlighted by other studies showing that designing SNP probes from flanking sequences enhances genotyping reliability (Clarke et al., 2016; Yang et al., 2023a). Advances in probe design and SNP array development underscore the critical role of avoiding repetitive genomic regions in improving the precision and efficiency of genetic analysis (Bennetzen and Wang, 2014; Scheben et al., 2017). By ensuring that SNP probes are highly specific and by avoiding regions prone to cross-hybridization, the accuracy and effectiveness of genotyping can be substantially improved, which is essential for robust genetic studies and accurate inference of population structure.
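The flank-uniqueness filter described above can be illustrated with exact string matching on both strands. This is a simplified Python sketch: the study aligned sequences to the whole assembly, whereas here a flank counts as unique only if it occurs exactly once as an exact match. We also assume that both flanks must pass, that sequences are uppercase, and that the genome is held in memory as a dict of sequences. All function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def flanks(genome, chrom, pos, size=50):
    """Upstream and downstream flanks of `size` bp around a 1-based SNP position."""
    seq = genome[chrom]
    i = pos - 1  # 0-based index of the SNP base
    if i - size < 0 or i + 1 + size > len(seq):
        return None  # SNP too close to a sequence end
    return seq[i - size:i], seq[i + 1:i + 1 + size]


def count_occurrences(genome, query):
    """Exact, possibly overlapping hits of `query` on both strands of all sequences."""
    hits = 0
    for seq in genome.values():
        for q in {query, revcomp(query)}:  # a palindrome is counted once per site
            start = seq.find(q)
            while start != -1:
                hits += 1
                start = seq.find(q, start + 1)
    return hits


def is_unique_probe(genome, chrom, pos, size=50):
    """Keep a SNP only if both flanks occur exactly once in the genome."""
    f = flanks(genome, chrom, pos, size)
    return f is not None and all(count_occurrences(genome, s) == 1 for s in f)


print(is_unique_probe({"chr1": "ACGGTCA"}, "chr1", 4, size=3))
```

Exact matching misses near-identical repeats that an aligner would flag, so it is a lower bound on cross-hybridization risk rather than a substitute for alignment against the full assembly.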

4.2 Comparison between tobacco Ta-LD-SC and SNP array in other crops

Designing a SNP array is complex because marker density must be balanced against cost (Xu et al., 2017). Higher marker density has been reported to improve the resolution of genetic mapping and diversity analysis, but it also increases genotyping costs (Ongom et al., 2024). The Ta-LD-SC array consists of 20,213 probes covering over 90% of the tobacco genome, distributed uniformly at a density of no more than 5 SNPs per 200 kb. Its low density makes the Ta-LD-SC array a cost-effective option for genomic selection, and the uniform marker distribution further enhances its effectiveness. Statistical analysis showed that 75% of adjacent SNP markers are within 200 kb of each other and approximately 98% are within 1 Mb. Using this array, we genotyped 866 tobacco germplasm accessions (NP) and 288 F2 progeny (GP), and only seven F2 samples failed genotyping. The array achieved an average QC call rate above 99%, a missing rate below 1%, and a technical error rate below 1%. Notably, SNPs in the 'PolyHighResolution' category, considered the most valuable markers, accounted for 79.8% and 56.2% of the genotyping results for the tobacco NP and GP materials, respectively. These proportions are substantially higher than those of comparable SNP arrays, such as the Sugarcane100K SNP array (5.17%-5.96%) (You et al., 2019), the chickpea CicerSNP Array (25.89%-35.70%) (Roorkiwal et al., 2018), the pear 200K PyrSNP array (17%) (Li et al., 2019), and the wheat TaNG v1.1 array (21.0%) (Burridge et al., 2024). This reflects not only the suitability of the materials studied but also the successful design strategy of this SNP array. Montanari et al. (2019) developed and validated a 70K Affymetrix Axiom genotyping array for pear based on re-sequencing data from 55 accessions and reported that over 90% of the SNPs were high quality and polymorphic (PolyHighResolution), in good agreement with our findings.
In comparison, the Axiom Apple480K array, developed from high-depth re-sequencing of 63 Malus × domestica cultivars, contained 74% PolyHighResolution SNPs, 54% of which were further validated as robust PolyHighResolution SNPs (Bianco et al., 2016). Although that array contains far more SNPs, our design keeps the Ta-LD-SC array highly cost-efficient for repeated use. Furthermore, all PolyHighResolution SNPs in the Ta-LD-SC array were robust, facilitating straightforward and reproducible genotypic data analysis across studies.
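The spacing statistics above (the share of adjacent-SNP gaps within 200 kb and within 1 Mb, and at most 5 SNPs per 200 kb) can be checked per chromosome from sorted SNP positions. A minimal Python sketch with hypothetical function names; the positions in the example are made up, and genome-wide figures would pool the gaps from all chromosomes.

```python
from bisect import bisect_left


def gap_fractions(positions, limits=(200_000, 1_000_000)):
    """Fraction of adjacent-marker gaps that are <= each limit (one chromosome)."""
    pos = sorted(positions)
    gaps = [b - a for a, b in zip(pos, pos[1:])]
    if not gaps:
        return {}
    return {limit: sum(g <= limit for g in gaps) / len(gaps) for limit in limits}


def max_markers_per_window(positions, window=200_000):
    """Largest number of markers in any half-open window [p, p + window).

    Checking only windows that start at a marker is sufficient, because any
    densest window can be shifted right until its left edge reaches a marker.
    """
    pos = sorted(positions)
    best = 0
    for i, p in enumerate(pos):
        j = bisect_left(pos, p + window)  # first marker at or beyond the window end
        best = max(best, j - i)
    return best


chrom_snps = [0, 100_000, 150_000, 400_000, 1_500_000]  # hypothetical positions
print(gap_fractions(chrom_snps))
print(max_markers_per_window(chrom_snps) <= 5)
```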

4.3 Possible application of SNP array in tobacco genetic research

Most tobacco varieties planted in China were derived from a few germplasm sources, all of which were introduced from abroad (Wang et al., 2014). By analyzing the population structure and evolutionary relationships of tobacco germplasm resources preserved in China, we gained a comprehensive understanding of the genetic foundation and evolutionary history of Chinese tobacco germplasm. These findings indicate that two major introductions of foreign germplasm have profoundly shaped the genetic structure of Chinese tobacco germplasm resources. The first occurred during the Ming Dynasty, when tobacco reached China through the Columbian Exchange. These tobaccos spread widely within China, giving rise to numerous local sun-/air-cured tobacco landraces, categorized in this study as Group I. The second major introduction occurred after the founding of the People's Republic of China, with the large-scale import of high-quality flue-cured tobacco varieties, particularly from the United States, which now form the genetic foundation of China's main flue-cured cultivars (Group III, Sp3). The elite alleles carried by these introduced cultivars constitute the genetic basis of tobacco breeding in China, but they have also resulted in a narrow genetic base and low diversity. Greater attention should therefore be paid to discovering agronomically valuable genes in diverse genetic resources, including wild species, and transferring them into sun-/air-cured tobacco cultivars. In addition, this study found that the genome-wide average LD decay distance (r² threshold = 0.2) exceeds 20 Mb for both flue-cured and sun-/air-cured tobacco (Figure 4d). These values are similar to the LD decay distance observed in the wheat (Triticum aestivum L.) genome (Roncallo et al., 2021). Fricano et al. (2012) genotyped a global collection of 312 tobacco accessions with 422 SSR markers to assess LD.
They found that LD in tobacco can extend up to 75 cM, with notable variation among linkage groups, which is consistent with our results (Figure S2). The high level of LD in tobacco may result from factors such as low genetic diversity within germplasm, selective breeding pressure, self-pollination, and a highly structured population (Fricano et al., 2012; Ivanov et al., 2020). Together, these factors shape the linkage disequilibrium observed in tobacco.

The low genetic polymorphism and high genome-wide LD of tobacco germplasm make it difficult to obtain polymorphic molecular markers that can effectively break tight linkage. The Ta-LD-SC array has demonstrated its value and reliability by providing a high-quality set of SNP markers, which are essential for identifying marker-trait associations for desirable traits and for constructing dense genetic maps. Using the array, we conducted a GWAS that identified significant marker-trait associations for eight agronomic traits in 866 tobacco accessions, yielding 62 candidate genes associated with these traits in 41 Qts regions. Among them, GA2OX8, located in Qts4-7 and closely associated with plant height and internode length, was identified together with its superior haplotypes (Figure 5). This lays a foundation for future gene function studies and breeding applications.

The Ta-LD-SC SNP array also showed advantages for constructing tobacco genetic maps. Tong et al. (2016) reviewed several earlier tobacco genetic maps and identified the best SSR linkage map as one derived from an intertype tobacco cross. That map comprised 2,318 SSR markers defining 2,363 loci on 24 clearly defined linkage groups, with a total length of 3,270 cM (Bindler et al., 2011). The first SNP genetic map of tobacco, published in 2015, contained 4,138 SNP markers mapped to 24 linkage groups with a total length of 1,944.74 cM (Xiao et al., 2015). Subsequently, Gong et al. (2016) generated a high-density genetic map of tobacco containing 4,215 SNPs and 194 SSRs distributed across 24 linkage groups, with a total length of 2,662.43 cM. Using whole-genome resequencing data, Tong et al. (2020b, 2021) developed maps with 7,038 and 4,895 SNP bin markers and total lengths of 3,486.78 and 2,885.36 cM, respectively. In this study, we constructed a tobacco genetic map with 4,553 SNP bin markers on 24 linkage groups, covering 6,606.08 cM with an average marker density of 1.45 cM/marker. The quality of this map is comparable to that of the best current tobacco SNP genetic maps, and it was obtained more cost-effectively. We also compared the genetic map positions of the markers with their physical positions in the Nitab4.5 and NtaSR1 genomes. Our genetic map aligned well with NtaSR1; in particular, markers on Nitab4.5 chromosome 17 aligned more closely with the NtaSR1 genome, supporting the ability of our genetic map to reflect the chromosomal arrangement of the NtaSR1 assembly. Overall, this genetic map and the identified SNP markers should greatly aid tobacco genome assembly and genetic research, in line with findings from similar studies (Guden et al., 2023; Jiang et al., 2024; Li et al., 2023).

Funding

This work was supported by the Guizhou Provincial Basic Research Program (Natural Science) [(2024) 648], the Program of China National Tobacco Corporation (110202101032 (JY-09), 110202201003 (JY-03)), and the Program of the Guizhou Branch of China National Tobacco Corporation (2023XM02, 2021XM05, 2022XM05, 2024XM01).

Author contributions

S.Y. and Z.Y.: Conceptualization, Methodology, Supervision, Writing – review and editing, and Funding acquisition. J.Z., L.C., and J.L.: Data curation, Visualization. P.L., J.T., and P.C.: Data curation, Supervision. J.W., Q.Z., and T.X.: Development of KASP markers, Supervision. J.J. and X.R.: Writing – review and editing, Supervision, and Funding acquisition. All authors read and approved the final manuscript.

Data availability statement

The raw sequence data reported in this paper have been deposited in the Genome Sequence Archive of the National Genomics Data Center (NGDC) under accession number PRJCA028842.

Conflict of interest

The authors have no conflicts of interest to declare.

References

Alexander, D.H., Novembre, J. and Lange, K. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 19 (2009) 1655-1664.

Arca, M., Gouesnard, B., Mary-Huard, T., Le Paslier, M.-C., Bauland, C., Combes, V., Madur, D., Charcosset, A. and Nicolas, S.D. Genotyping of DNA pools identifies untapped landraces and genomic regions to develop next-generation varieties. Plant Biotechnol. J. 21 (2023) 1123-1139.

Arslan, B. and Okumus, A. Genetic and geographic polymorphism of cultivated tobaccos (Nicotiana tabacum) in Turkey. Russ. J. Genet. 42 (2006) 667-671.

Balagué-Dobón, L., Cáceres, A. and González, J.R. Fully exploiting SNP arrays: a systematic review on the tools to extract underlying genomic structure. Brief. Bioinform. 23 (2022) bbac043.

Bennetzen, J.L. and Wang, H. The contributions of transposable elements to the structure, function, and evolution of plant genomes. Annu. Rev. Plant Biol. 65 (2014) 505-530.

Berger, B. and Yu, Y. Navigating bottlenecks and trade-offs in genomic data analysis. Nat. Rev. Genet. 24 (2023) 235-250.

Bianco, L., Cestaro, A., Linsmith, G., Muranty, H., Denancé, C., Théron, A., Poncet, C., Micheletti, D., Kerschbamer, E. and Di Pierro, E.A. Development and validation of the Axiom® Apple480K SNP genotyping array. Plant J. 86 (2016) 62-74.

Bindler, G., Plieske, J., Bakaher, N., Gunduz, I., Ivanov, N., Van der Hoeven, R., Ganal, M. and Donini, P. A high density genetic map of tobacco (Nicotiana tabacum L.) obtained from large scale microsatellite marker development. Theor. Appl. Genet. 123 (2011) 219-230.

Bindler, G., van der Hoeven, R., Gunduz, I., Plieske, J., Ganal, M., Rossi, L., Gadani, F. and Donini, P. A microsatellite marker based linkage map of tobacco. Theor. Appl. Genet. 114 (2007) 341-349.

Burridge, A.J., Winfield, M., Przewieslik‐Allen, A., Edwards, K.J., Siddique, I., Barral‐Arca, R., Griffiths, S., Cheng, S., Huang, Z. and Feng, C. Development of a next generation SNP genotyping array for wheat. Plant Biotechnol. J. 22 (2024) 2235-2247.

Cai, C., Yang, Y., Cheng, L., Tong, C. and Feng, J. Development and assessment of EST-SSR marker for the genetic diversity among tobaccos (Nicotiana tabacum L.). Genetika 51 (2015) 694-703.

Camacho, C., Coulouris, G., Avagyan, V., Ma, N., Papadopoulos, J., Bealer, K. and Madden, T.L. BLAST+: architecture and applications. BMC Bioinformatics 10 (2009) 421.

Cheng, J., Jia, Y., Hill, C., He, T., Wang, K., Guo, G., Shabala, S., Zhou, M., Han, Y. and Li, C. Diversity of Gibberellin 2-oxidase genes in the barley genome offers opportunities for genetic improvement. J. Adv. Res. (2024) In press.

Clarke, W.E., Higgins, E.E., Plieske, J., Wieseke, R., Sidebottom, C., Khedikar, Y., Batley, J., Edwards, D., Meng, J., Li, R., et al. A high-density SNP genotyping array for Brassica napus and its ancestral diploid species based on optimised selection of single-locus markers in the allotetraploid genome. Theor. Appl. Genet. 129 (2016) 1887-1899.

Czubacka, A. The use of the Polish germplasm collection of Nicotiana tabacum in research and tobacco breeding for disease resistance. Agriculture 12 (2022) 1994.

D’hoop, B.B., Paulo, M.J., Kowitwanich, K., Sengers, M., Visser, R.G., van Eck, H.J. and van Eeuwijk, F.A. Population structure and linkage disequilibrium unravelled in tetraploid potato. Theor. Appl. Genet. 121 (2010) 1151-1170.

Del Piano, L., Abet, M., Sorrentino, C., Acanfora, F., Cozzolino, E. and Di Muro, A. Genetic variability in Nicotiana tabacum and Nicotiana species as revealed by RAPD markers: Development of the RAPD procedure. Contrib. Tob. Res. 19 (2000) 1-15.

Denduangboripant, J., Piteekan, T. and Nantharat, M. Genetic polymorphism between tobacco cultivar-groups revealed by amplified fragment length polymorphism analysis. J. Agric. Sci. 2 (2010) 41-48.

Dhondt, S., Coppens, F., De Winter, F., Swarup, K., Merks, R.M., Inzé, D., Bennett, M.J. and Beemster, G.T. SHORT-ROOT and SCARECROW regulate leaf growth in Arabidopsis by stimulating S-phase progression of the cell cycle. Plant Physiol. 154 (2010) 1183-1195.

Edwards, K.D., Fernandez-Pozo, N., Drake-Stowe, K., Humphry, M., Evans, A.D., Bombarely, A., Allen, F., Hurst, R., White, B., Kernodle, S.P., Bromley, J.R., et al. A reference genome for Nicotiana tabacum enables map-based cloning of homeologous loci implicated in nitrogen utilization efficiency. BMC Genomics 18 (2017) 448.

Evanno, G., Regnaut, S. and Goudet, J. Detecting the number of clusters of individuals using the software STRUCTURE: a simulation study. Mol. Ecol. 14 (2005) 2611-2620.

Fricano, A., Bakaher, N., Del Corvo, M., Piffanelli, P., Donini, P., Stella, A., Ivanov, N.V. and Pozzi, C. Molecular diversity, population structure, and linkage disequilibrium in a worldwide collection of tobacco (Nicotiana tabacum L.) germplasm. BMC Genet. 13 (2012) 18.

Gao, F., Robe, K., Bettembourg, M., Navarro, N., Rofidal, V., Santoni, V., Gaymard, F., Vignols, F., Roschzttardtz, H., Izquierdo, E. and Dubos, C. The transcription factor bHLH121 interacts with bHLH105 (ILR3) and its closest homologs to regulate iron homeostasis in Arabidopsis. Plant Cell 32 (2020) 508-524.

Gong, D., Huang, L., Xu, X., Wang, C., Ren, M., Wang, C. and Chen, M. Construction of a high-density SNP genetic map in flue-cured tobacco based on SLAF-seq. Mol. Breed. 36 (2016) 1-12.

Guden, B., Yol, E., Erdurmus, C., Lucas, S.J. and Uzun, B. Construction of a high-density genetic linkage map and QTL mapping for bioenergy-related traits in sweet sorghum [Sorghum bicolor (L.) Moench]. Front. Plant Sci. 14 (2023) 1081931.

Hasan, N., Choudhary, S., Naaz, N., Sharma, N. and Laskar, R.A. Recent advancements in molecular marker-assisted selection and applications in plant breeding programmes. J. Genet. Eng. Biotechnol. 19 (2021) 128.

Hirsch, S. and Oldroyd, G.E. GRAS-domain transcription factors that regulate plant development. Plant Signal. Behav. 4 (2009) 698-700.

Howard, N.P., Troggio, M., Durel, C.-E., Muranty, H., Denancé, C., Bianco, L., Tillman, J. and Van de Weg, E. Integration of Infinium and Axiom SNP array data in the outcrossing species Malus × domestica and causes for seemingly incompatible calls. BMC Genomics 22 (2021) 1-18.

Huang, W., Guo, J., Wan, F., Gao, B. and Xie, X. AFLP analyses on genetic diversity and structure of Eupatorium adenophorum populations in China. J. Agric. Biotechnol. 5 (2008) 33-41.

Ikram, M., Lai, R., Xia, Y., Li, R., Zhao, W., Siddique, K.H.M., Chen, J. and Guo, P. Genetic dissection of tobacco (Nicotiana tabacum L.) plant height using single-locus and multi-locus genome-wide association studies. Agronomy 12 (2022) 1047.

Ivanov, N.V., Sierro, N. and Peitsch, M.C. The Tobacco Plant Genome. Springer International Publishing, 2020.

Jiang, Y., Dong, L., Li, H., Liu, Y., Wang, X. and Liu, G. Genetic linkage map construction and QTL analysis for plant height in proso millet (Panicum miliaceum L.). Theor. Appl. Genet. 137 (2024) 78.

Jiao, F., Wu, X., Chen, X., Xu, M. and Li, Y. Collection, identification and application of tobacco resources in China. Tob. Sci. Technol. 10 (2019) 1-7.

Kumar, S., Stecher, G., Peterson, D. and Tamura, K. MEGA-CC: computing core of molecular evolutionary genetics analysis program for automated and iterative data analysis. Bioinformatics 28 (2012) 2685-2686.

Leventhal, A.M., Conti, D.V., Ray, L.A., Baurley, J.W., Bello, M.S., Cho, J., Zhang, Y., Pester, M.S., Lebovitz, L., Budiarto, A., et al. A genetic association study of tobacco withdrawal endophenotypes in African Americans. Exp. Clin. Psychopharmacol. 30 (2022) 673-681.

Lewis, R.S. Use of exotic Nicotiana tabacum germplasm for confronting an inverse genetic correlation in flue-cured tobacco. Crop Sci. 63 (2023) 1397-1407.

Lewis, R.S. and Nicholson, J.S. Aspects of the evolution of Nicotiana tabacum L. and the status of the United States Nicotiana Germplasm Collection. Genet. Resour. Crop Evol. 54 (2007) 727-740.

Li, M., He, Y., Rong, L., Guan, L., Dong, W., Ji, Y., Xin, Y., Huang, S., Wang, C. and Yu, M. Construction of SNP genetic maps based on targeted next-generation sequencing and QTL mapping of vital agronomic traits in faba bean (Vicia faba L.). J. Integr. Agr. 22 (2023) 2648-2659.

Li, X., Singh, J., Qin, M., Li, S., Zhang, X., Zhang, M., Khan, A., Zhang, S. and Wu, J. Development of an integrated 200K SNP genotyping array and application for genetic mapping, genome assembly improvement and genome wide association studies in pear (Pyrus). Plant Biotechnol. J. 17 (2019) 1582-1594.

Li, Y., Dai, X., Cheng, Y. and Zhao, Y. NPY genes play an essential role in root gravitropic responses in Arabidopsis. Mol. Plant 4 (2011) 171-179.

Liu, X., He, C., Yang, Y. and Zhang, H.Y. Genetic diversity among flue-cured tobacco cultivars on the basis of AFLP markers. Czech J. Genet. Plant Breed. 45 (2009) 155-159.

Madhav, M., Raju, K.S., Gaikwad, K., Vishalakshi, B., Murthy, T. and Umakanth, B. Development of new set of microsatellite markers in cultivated tobacco and their transferability in other Nicotiana spp. Mol. Plant Breed. 16 (2015) 1-13.

Makhoul, M., Rambla, C., Voss-Fels, K.P., Hickey, L.T., Snowdon, R.J. and Obermeier, C. Overcoming polyploidy pitfalls: a user guide for effective SNP conversion into KASP markers in wheat. Theor. Appl. Genet. 133 (2020) 2413-2430.

Marrano, A., Martinez-Garcia, P.J., Bianco, L., Sideli, G.M., Di Pierro, E.A., Leslie, C.A., Stevens, K.A., Crepeau, M.W., Troggio, M., Langley, C.H., et al. A new genomic tool for walnut (Juglans regia L.): development and validation of the high-density Axiom J. regia 700K SNP genotyping array. Plant Biotechnol. J. 17 (2019) 1027-1036.

Meng, L., Li, H., Zhang, L. and Wang, J. QTL IciMapping: Integrated software for genetic linkage map construction and quantitative trait locus mapping in biparental populations. Crop J. 3 (2015) 269-283.

Merot-L'anthoene, V., Tournebize, R., Darracq, O., Rattina, V., Lepelley, M., Bellanger, L., Tranchant-Dubreuil, C., Coulee, M., Pegard, M., Metairon, S., Fournier, C., et al. Development and evaluation of a genome-wide Coffee 8.5K SNP array and its application for high-density genetic mapping and for investigating the origin of Coffea arabica L. Plant Biotechnol. J. 17 (2019) 1418-1430.

Miura, K., Lee, J., Miura, T. and Hasegawa, P.M. SIZ1 controls cell growth and plant development in Arabidopsis through salicylic acid. Plant Cell Physiol. 51 (2010) 103-113.

Montanari, S., Bianco, L., Allen, B.J., Martínez-García, P.J., Bassil, N.V., Postman, J., Knäbel, M., Kitson, B., Deng, C. and Chagné, D. Development of a highly efficient Axiom™ 70 K SNP array for Pyrus and evaluation for high-density mapping and germplasm characterization. BMC Genomics 20 (2019) 1-18.

Montanari, S., Deng, C., Koot, E., Bassil, N.V., Zurn, J.D., Morrison-Whittle, P., Worthington, M.L., Aryal, R., Ashrafi, H., Pradelles, J., Wellenreuther, M. and Chagné, D. A multiplexed plant–animal SNP array for selective breeding and species conservation applications. G3: Genes, Genomes, Genetics 13 (2023) jkad170.

Moon, H.S., Nicholson, J.S., Heineman, A., Lion, K., van der Hoeven, P., Hayes, A.J. and Lewis, R.S. Changes in Genetic Diversity of US Flue-Cured Tobacco Germplasm over Seven Decades of Cultivar Development. Crop Sci. 49 (2009) 498-508.

Nie, Q., Huang, X., Meng, J., Li, X. and Liu, R. Genetic diversity and genetic relatives analysis of tobacco germplasm based on SRAP. Southwest China J. Agric. Sci. 25 (2012) 1578-1584.

Ongom, P.O., Fatokun, C., Togola, A., Garcia-Oliveira, A.L., Ng, E.H., Kilian, A., Lonardi, S., Close, T.J. and Boukar, O. A mid‐density single‐nucleotide polymorphism panel for molecular applications in cowpea (Vigna unguiculata (L.) Walp). Int. J. Genomics 2024 (2024) 9912987.

Ouellette, L.A., Reid, R.W., Blanchard, S.G. and Brouwer, C.R. LinkageMapView—rendering high-resolution linkage and QTL maps. Bioinformatics 34 (2018) 306-307.

Qi, J., Wang, T., Chen, S., Zhou, D., Z., Fang, P., Tao, A., Liang, J. and Wu, W. Genetic diversity and genetic relatives analysis of tobacco germplasm based on inter-simple sequence repeat (ISSR). Acta Agronomica Sinica 32 (2006) 373-378.

Roncallo, P.F., Larsen, A.O., Achilli, A.L., Pierre, C.S., Gallo, C.A., Dreisigacker, S. and Echenique, V. Linkage disequilibrium patterns, population structure and diversity analysis in a worldwide durum wheat collection including Argentinian genotypes. BMC Genomics 22 (2021) 1-17.

Roorkiwal, M., Jain, A., Kale, S.M., Doddamani, D., Chitikineni, A., Thudi, M. and Varshney, R.K. Development and evaluation of high-density Axiom® CicerSNP Array for high-resolution genetic mapping and breeding applications in chickpea. Plant Biotechnol. J. 16 (2018) 890-901.

Sarala, K. and Rao, S. Genetic diversity in Indian FCV and burley tobacco cultivars. J. Genet. 87 (2008) 159.

Saygili, I., Kinay, A., Kurt, D. and Kandemir, N. Genetic and agronomic diversity of Basma tobacco (Nicotiana tabacum L.) landrace in Turkey. Biotechnol. Agron. Soc. Environ. 25 (2021) 279-290.

Scheben, A., Batley, J. and Edwards, D. Genotyping-by-sequencing approaches to characterize crop genomes: choosing the right tool for the right application. Plant Biotechnol. J. 15 (2017) 149-161.

Shava, J.G., Richardson-Kageler, S., Dari, S., Magama, F. and Rukuni, D. Breeding for yield, quality and associated traits in the Zimbabwean flue-cured tobacco (Nicotiana tabacum L.). Agric. Rev. 41 (2020) 79-84.

Sierro, N., Auberson, M., Dulize, R. and Ivanov, N.V. Chromosome-level genome assemblies of Nicotiana tabacum, Nicotiana sylvestris, and Nicotiana tomentosiformis. Sci. Data 11 (2024) 135.

Sierro, N., Battey, J.N., Ouadi, S., Bakaher, N., Bovet, L., Willig, A., Goepfert, S., Peitsch, M.C. and Ivanov, N.V. The tobacco genome sequence and its comparison with those of tomato and potato. Nat. Commun. 5 (2014) 3833.

SivaRaju, K., Madhav, M., Sharma, R., Murthy, T., Singh, N., Bansal, K., Koundal, K. and Mohapatra, T. Molecular diversity in Indian tobacco types as revealed by randomly amplified DNA polymorphism. J. Plant Biochem. Biotechnol. 17 (2008) 51-56.

Taylor, J. and Butler, D. R package ASMap: efficient genetic linkage map construction and diagnosis. J. Stat. Softw. 79 (2017) 1-29.

Thimmegowda, G.C., Ramadoss, S.K., Kaikala, V., Rathinavelu, R., Thamalampudi, V.R., Dhavala, V. and Saiprasad, G. Whole genome resequencing of tobacco (Nicotiana tabacum L.) genotypes and high-throughput SNP discovery. Mol. Breed. 38 (2018) 1-10.

Tong, Z., Fang, D., Chen, X., Jiao, F., Zhang, Y., Li, Y. and Xiao, B. Genome-wide association study of leaf chemistry traits in tobacco. Breed. Sci. 70 (2020a) 253-264.

Tong, Z., Jiang, S., He, W., Chen, X., Yin, L., Fang, D., Hu, Y., Jiao, F., Zhang, C., Zeng, J. and Wu, X. Construction of high-density genetic map and QTL mapping in Nicotiana tabacum backcrossing BC4F3 population using whole-genome sequencing. Czech J. Genet. Plant Breed. 57 (2021) 102-112.

Tong, Z., Xiao, B., Jiao, F., Fang, D., Zeng, J., Wu, X., Chen, X., Yang, J. and Li, Y. Large-scale development of SSR markers in tobacco and construction of a linkage map in flue-cured tobacco. Breed. Sci. 66 (2016) 381-390.

Tong, Z., Yang, Z., Chen, X., Jiao, F., Li, X., Wu, X., Gao, Y., Xiao, B. and Wu, W. Large‐scale development of microsatellite markers in Nicotiana tabacum and construction of a genetic map of flue‐cured tobacco. Plant Breed. 131 (2012) 674-680.

Tong, Z., Zhou, J., Xiu, Z., Jiao, F., Hu, Y., Zheng, F., Chen, X., Li, Y., Fang, D. and Li, S. Construction of a high-density genetic map with whole genome sequencing in Nicotiana tabacum L. Genomics 112 (2020b) 2028-2033.

Walkowiak, S., Gao, L., Monat, C., Haberer, G., Kassa, M.T., Brinton, J., Ramirez-Gonzalez, R.H., Kolodziej, M.C., Delorean, E., Thambugala, et al. Multiple wheat genomes reveal global variation in modern breeding. Nature 588 (2020) 277-283.

Wang, J., Chu, S., Zhang, H., Zhu, Y., Cheng, H. and Yu, D. Development and application of a novel genome-wide SNP array reveals domestication history in soybean. Sci. Rep. 6 (2016) 20728.

Wang, J., Zhang, Q., Tung, J., Zhang, X., Liu, D., Deng, Y., Tian, Z., Chen, H., Wang, T. and Yin, W. High-quality assembled and annotated genomes of Nicotiana tabacum and Nicotiana benthamiana reveal chromosome evolution and changes in defense arsenals. Mol. Plant 17 (2024) 423-437.

Wang, X., Li, M.W., Wong, F.L., Luk, C.Y., Chung, C.Y.L., Yung, W.S., Wang, Z., Xie, M., Song, S. and Chung, G. Increased copy number of gibberellin 2‐oxidase 8 genes reduced trailing growth and shoot length during soybean domestication. Plant J. 107 (2021) 1739-1755.

Wang, Y. and Zhou, J. Parentage analysis of major tobacco varieties and tobacco breeding in America and China. Acta Tabacaria Sinica 1 (1995) 11-22.

Wang, Y., Yu, J., Jiang, M., Lei, W., Zhang, X. and Tang, H. Sequencing and assembly of polyploid genomes. In: Van de Peer, Y. (ed.) Polyploidy: Methods and Protocols. Springer US, New York, 2023, pp. 429-458.

Wang, Z., Zhang, X. and Liu, Y. (eds) Atlas of Chinese Tobacco Core Collection. Scientific and Technical Documents Publishing House, Beijing, 2014.

Wei, K., Wang, X., Hao, X., Qian, Y., Li, X., Xu, L., Ruan, L., Wang, Y., Zhang, Y., Bai, P., et al. Development of a genome-wide 200K SNP array and its application for high-density genetic mapping and origin analysis of Camellia sinensis. Plant Biotechnol. J. 20 (2022) 414-416.

Widrlechner, M.P. Genetic markers and plant genetic resource management. Plant Breed. Rev. 13 (1995) 11-86.

Winter, C.M., Szekely, P., Popov, V., Belcher, H., Carter, R., Jones, M., Fraser, S.E., Truong, T.V. and Benfey, P.N. SHR and SCR coordinate root patterning and growth early in the cell cycle. Nature 626 (2024) 611-616.