Value Type Name
0x7 itf8 codec id
0x2 itf8 number of bytes to follow
0x0 itf8 offset
0x1 itf8 K parameter| Value | Type | Name |
| :--- | :--- | :--- |
| 0x7 | itf8 | codec id |
| 0x2 | itf8 | number of bytes to follow |
| 0x0 | itf8 | offset |
| 0x1 | itf8 | K parameter |
size in bytes N key 1 value 1 key... value ... key N value N| size in bytes | N | key 1 | value 1 | key... | value ... | key N | value N |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
the block content identifier used to
associate external data blocks with
data series| the block content identifier used to |
| :--- |
| associate external data blocks with |
| data series |
coding of byte arrays with array
length| coding of byte arrays with array |
| :--- |
| length |
BYTE_ARRAY_STOP
5
byte stop, int 外部块内容 ID
byte stop, int external block
content id| byte stop, int external block |
| :--- |
| content id |
用 stopvalue 对字节数组进行编码
coding of byte arrays with a stop
value| coding of byte arrays with a stop |
| :--- |
| value |
BETA
6
int 偏移,int 位数
二进制编码
SUBEXP
7
int 偏移,int K
次指数编码
已停用(GOLOMB_RICE)
8
int 偏移,int log_(2)m\log _{2} \mathrm{~m}
戈隆-瑞斯编码
GAMMA
9
int 偏移
埃利亚斯伽马编码
Codec ID Parameters Comment
NULL 0 none series not preserved
EXTERNAL 1 int block content id "the block content identifier used to
associate external data blocks with
data series"
Deprecated (GOLOMB) 2 int offset, int M Golomb coding
HUFFMAN 3 array<int>, array<int> coding with int/byte values
BYTE_ARRAY_LEN 4 "encoding<int> array length,
encoding<byte> bytes" "coding of byte arrays with array
length"
BYTE_ARRAY_STOP 5 "byte stop, int external block
content id" "coding of byte arrays with a stop
value"
BETA 6 int offset, int number of bits binary coding
SUBEXP 7 int offset, int K subexponential coding
Deprecated (GOLOMB_RICE) 8 int offset, int log_(2)m Golomb-Rice coding
GAMMA 9 int offset Elias gamma coding| Codec | ID | Parameters | Comment |
| :--- | :--- | :--- | :--- |
| NULL | 0 | none | series not preserved |
| EXTERNAL | 1 | int block content id | the block content identifier used to <br> associate external data blocks with <br> data series |
| Deprecated (GOLOMB) | 2 | int offset, int M | Golomb coding |
| HUFFMAN | 3 | array<int>, array<int> | coding with int/byte values |
| BYTE_ARRAY_LEN | 4 | encoding<int> array length, <br> encoding<byte> bytes | coding of byte arrays with array <br> length |
| BYTE_ARRAY_STOP | 5 | byte stop, int external block <br> content id | coding of byte arrays with a stop <br> value |
| BETA | 6 | int offset, int number of bits | binary coding |
| SUBEXP | 7 | int offset, int K | subexponential coding |
| Deprecated (GOLOMB_RICE) | 8 | int offset, int $\log _{2} \mathrm{~m}$ | Golomb-Rice coding |
| GAMMA | 9 | int offset | Elias gamma coding |
有关上述所有编码算法及其参数的详细说明,请参见第 13 节。
4 校验和
校验和用于确保数据完整性。CRAM 中使用了以下校验和算法。
4.1 CRC32
这是一个循环冗余校验和,长度为 32 位,多项式为 0x04C11DB7。详情请参阅 ITU-T V. 42。CRC32 哈希函数的值写成整数。
Data type Name Value
byte[4] format magic number CRAM (0x43 0x52 0x41 0x4d)
unsigned byte major format number 3(0x3)
unsigned byte minor format number 1 (0x1)
byte[20] file id CRAM file identifier (e.g. file name or SHA1 checksum)| Data type | Name | Value |
| :--- | :--- | :--- |
| byte[4] | format magic number | CRAM (0x43 0x52 0x41 0x4d) |
| unsigned byte | major format number | $3(0 x 3)$ |
| unsigned byte | minor format number | 1 (0x1) |
| byte[20] | file id | CRAM file identifier (e.g. file name or SHA1 checksum) |
有效的 CRAM 主版本号和次版本号如下:
1.0 最初的公开 CRAM 版本。
2.0 第一个以 Java 和 C 语言实现的 CRAM 版本;整理了 1.0 版本中实现与规范之间的差异。
the sum of the lengths of all blocks in this container
(headers and data) and any padding bytes (CRAM header
container only); equal to the total byte length of the
container minus the byte length of this header structure| the sum of the lengths of all blocks in this container |
| :--- |
| (headers and data) and any padding bytes (CRAM header |
| container only); equal to the total byte length of the |
| container minus the byte length of this header structure |
itf8
参考序列 ID
该容器中的所有片段都必须有一个与该值匹配的参考序列 ID。
reference sequence identifier or
-1 for unmapped reads
-2 for multiple reference sequences.
All slices in this container must have a reference sequence
id matching this value.| reference sequence identifier or |
| :--- |
| -1 for unmapped reads |
| -2 for multiple reference sequences. |
| All slices in this container must have a reference sequence |
| id matching this value. |
itf8
起始位置
starting position on the
reference| starting position on the |
| :--- |
| reference |
the locations of slices in this container as byte offsets from
the end of this container header, used for random access
indexing. For sequence data containers, the landmark
count must equal the slice count.
Since the block before the first slice is the compression
header, landmarks[0] is equal to the byte length of the
compression header.| the locations of slices in this container as byte offsets from |
| :--- |
| the end of this container header, used for random access |
| indexing. For sequence data containers, the landmark |
| count must equal the slice count. |
| Since the block before the first slice is the compression |
| header, landmarks[0] is equal to the byte length of the |
| compression header. |
int
crc32
容器中前面所有字节的 CRC32 哈希值。
字节
大厦
容器中包含的区块。
Data type Name Value
int32 length "the sum of the lengths of all blocks in this container
(headers and data) and any padding bytes (CRAM header
container only); equal to the total byte length of the
container minus the byte length of this header structure"
itf8 reference sequence id "reference sequence identifier or
-1 for unmapped reads
-2 for multiple reference sequences.
All slices in this container must have a reference sequence
id matching this value."
itf8 "starting position on the
reference" the alignment start position
itf8 alignment span the length of the alignment
itf8 number of records number of records in the container
ltf8 record counter 1-based sequential index of records in the file/stream.
ltf8 bases number of read bases
itf8 number of blocks the total number of blocks in this container
array<itf8> landmarks "the locations of slices in this container as byte offsets from
the end of this container header, used for random access
indexing. For sequence data containers, the landmark
count must equal the slice count.
Since the block before the first slice is the compression
header, landmarks[0] is equal to the byte length of the
compression header."
int crc32 CRC32 hash of the all the preceding bytes in the container.
byte[ blocks The blocks contained within the container.| Data type | Name | Value |
| :---: | :---: | :---: |
| int32 | length | the sum of the lengths of all blocks in this container <br> (headers and data) and any padding bytes (CRAM header <br> container only); equal to the total byte length of the <br> container minus the byte length of this header structure |
| itf8 | reference sequence id | reference sequence identifier or <br> -1 for unmapped reads <br> -2 for multiple reference sequences. <br> All slices in this container must have a reference sequence <br> id matching this value. |
| itf8 | starting position on the <br> reference | the alignment start position |
| itf8 | alignment span | the length of the alignment |
| itf8 | number of records | number of records in the container |
| ltf8 | record counter | 1-based sequential index of records in the file/stream. |
| ltf8 | bases | number of read bases |
| itf8 | number of blocks | the total number of blocks in this container |
| array<itf8> | landmarks | the locations of slices in this container as byte offsets from <br> the end of this container header, used for random access <br> indexing. For sequence data containers, the landmark <br> count must equal the slice count. <br> Since the block before the first slice is the compression <br> header, landmarks[0] is equal to the byte length of the <br> compression header. |
| int | crc32 | CRC32 hash of the all the preceding bytes in the container. |
| byte[ | blocks | The blocks contained within the container. |
Data type Name Value
byte method the block compression method (and first CRAM version):
0: raw (none)*
1: gzip
2: bzip2 (v2.0)
3: lzma (v3.0)
4: rans4x8 (v3.0)
5: rans4x16 (v3.1)
6: adaptive arithmetic coder (v3.1)
7: fqzcomp (v3.1)
8: name tokeniser (v3.1)
byte block content type id the block content type identifier
itf8 size in bytes* the block content identifier used to associate external data
raw size in bytes* blocks with data series
itf8 block data size of the block data after applying block compression
itf8 the data stored in the before applying block compression
byte[] ・ bit stream of CRAM records (core data block)
∙ byte stream (external data block)
CRC32 additional fields ( header blocks)
byte[4] CRC32 hash value for all preceding bytes in the block | Data type | Name | Value |
| :--- | :--- | :--- |
| byte | method | the block compression method (and first CRAM version): |
| | | 0: raw (none)* |
| | | 1: gzip |
| | | 2: bzip2 (v2.0) |
| | | 3: lzma (v3.0) |
| | | 4: rans4x8 (v3.0) |
| | | 5: rans4x16 (v3.1) |
| | | 6: adaptive arithmetic coder (v3.1) |
| | | 7: fqzcomp (v3.1) |
| | | 8: name tokeniser (v3.1) |
| byte | block content type id | the block content type identifier |
| itf8 | size in bytes* | the block content identifier used to associate external data |
| | raw size in bytes* | blocks with data series |
| itf8 | block data | size of the block data after applying block compression |
| itf8 | | the data stored in the before applying block compression |
| byte[] | ・ bit stream of CRAM records (core data block) | |
| | | $\bullet$ byte stream (external data block) |
| | CRC32 | additional fields ( header blocks) |
| byte[4] | CRC32 hash value for all preceding bytes in the block | |
SAM 首部数据块的格式是一个 32 位小二进制整数,其长度为 SAM 首部文本的长度减去无效终止字节,然后是文本本身。虽然是 32 位,但允许的最大值是 2^(31)2^{31} ,而且所有长度都必须是正数。
以下限制适用于 SAM 标头文本:
除非参考序列已嵌入文件,否则需要 SQ:MD5 校验和。
8.4 压缩头块
压缩头块由 3 部分组成:保存图、数据序列编码图和标签编码图。
保护地图
保存映射包含 CRAM 文件中哪些数据被保存的信息。它以字节[2]键的映射形式存储:
钥匙
值数据类型
名称
价值
RN
bool
阅读名称包括
如果所有读取都保留读取名称,则为 true
AP
bool
AP 数据系列 delta
如果 AP 数据序列为 delta,则为 true,否则为 false
RR
bool
所需参考资料
如果需要参考序列才能完全恢复数据,则为 true
true if reference sequence is required to restore
the data completely| true if reference sequence is required to restore |
| :--- |
| the data completely |
SM
字节[5]
置换矩阵
置换矩阵
TD
数组<byte>
标签 ID 词典
标签 ID 列表,参见标签编码部分
Key Value data type Name Value
RN bool read names included true if read names are preserved for all reads
AP bool AP data series delta true if AP data series is delta, false otherwise
RR bool reference required "true if reference sequence is required to restore
the data completely"
SM byte[5] substitution matrix substitution matrix
TD array<byte> tag ids dictionary a list of lists of tag ids, see tag encoding section| Key | Value data type | Name | Value |
| :--- | :--- | :--- | :--- |
| RN | bool | read names included | true if read names are preserved for all reads |
| AP | bool | AP data series delta | true if AP data series is delta, false otherwise |
| RR | bool | reference required | true if reference sequence is required to restore <br> the data completely |
| SM | byte[5] | substitution matrix | substitution matrix |
| TD | array<byte> | tag ids dictionary | a list of lists of tag ids, see tag encoding section |
如果 AP-Delta = true:与前一条记录中的 AP 值的对齐起始三角洲值为 0。注意该三角洲值可能为负,例如在多参考分片中切换参考时。如果 AP-Delta = false:直接编码对齐起始位置。
if AP-Delta = true: 0-based alignment start
delta from the AP value in the previous record.
Note this delta may be negative, for example
when switching references in a multi-reference
slice. When the record is the first in the slice, the
previous position used is the slice alignment-start
field (hence the first delta should be zero for
single-reference slices, or the AP value itself for
multi-reference slices).
if AP-Delta = false: encodes the alignment start
position directly| if AP-Delta = true: 0-based alignment start |
| :--- |
| delta from the AP value in the previous record. |
| Note this delta may be negative, for example |
| when switching references in a multi-reference |
| slice. When the record is the first in the slice, the |
| previous position used is the slice alignment-start |
| field (hence the first delta should be zero for |
| single-reference slices, or the AP value itself for |
| multi-reference slices). |
| if AP-Delta = false: encodes the alignment start |
| position directly |
RG
编码<int>
阅读小组
读取组。特殊值"-1 "代表无组。
read groups. Special value ' -1 ' stands for no
group.| read groups. Special value ' -1 ' stands for no |
| :--- |
| group. |
RN^(a)\mathrm{RN}^{\mathrm{a}}
编码<byte[ ]>
阅读名称
阅读名称
MF
编码<int>
下一个队友位标志
参见具体章节
NS
编码<int>
下一个片段参考序列 ID
next fragment
reference sequence id| next fragment |
| :--- |
| reference sequence id |
下一个片段的参考序列 ID
NP
编码<int>
下一个对齐开始
next mate alignment
start| next mate alignment |
| :--- |
| start |
下一个片段的对齐位置
TS
编码<int>
模板尺寸
模板尺寸
NF
编码<int>
到下一个分段的距离
distance to next
fragment| distance to next |
| :--- |
| fragment |
跳转到下一个片段的记录数 ^(b){ }^{b}
TL^(C)\mathrm{TL}^{\mathrm{C}}
编码<int>
标签 id
标签 id 列表,参见标签编码部分
FN
编码<int>
阅读次数
number of read
features| number of read |
| :--- |
| features |
每条记录中的读取特征数
FC
编码<byte>
读取功能代码
见单独章节
FP
编码<int>
读入位置
读取特征的位置;最后一个位置的正 delta 值(从零开始)
positions of the read features; a positive delta to
the last position (starting with zero)| positions of the read features; a positive delta to |
| :--- |
| the last position (starting with zero) |
DL
编码<int>
删除长度
碱基对缺失长度
BB
编码<byte[]>
基地
基地
QQ
编码<byte[ ]>
质量分数线的长度
stretches of quality
scores| stretches of quality |
| :--- |
| scores |
质量得分
BS
编码<byte>
基本替换代码
base substitution
codes| base substitution |
| :--- |
| codes |
碱基替换码
IN
编码<byte[]>
插入
插入式基座
RS
编码<int>
参考跳读长度
N "读数特征的跳过碱基数
PD
编码<int>
衬垫
垫底数量
HC
编码<int>
硬夹
硬剪切基数
SC
编码<byte[ ]>
软夹
软剪裁底座
MQ
编码<int>
绘图质量
绘制质量分数
BA
编码<byte>
基地
基地
QS
编码<byte>
质量得分
质量得分
TC^(d)\mathrm{TC}^{\mathrm{d}}
不适用
遗留字段
置之不理
TN^(d)\mathrm{TN}^{\mathrm{d}}
不适用
遗留字段
置之不理
Key Value data type Name Value
BF encoding<int> BAM bit flags see separate section
CF encoding<int> CRAM bit flags see specific section
RI encoding<int> reference id record reference id from the SAM file header
RL encoding<int> read lengths read lengths
AP encoding<int> in-seq positions "if AP-Delta = true: 0-based alignment start
delta from the AP value in the previous record.
Note this delta may be negative, for example
when switching references in a multi-reference
slice. When the record is the first in the slice, the
previous position used is the slice alignment-start
field (hence the first delta should be zero for
single-reference slices, or the AP value itself for
multi-reference slices).
if AP-Delta = false: encodes the alignment start
position directly"
RG encoding<int> read groups "read groups. Special value ' -1 ' stands for no
group."
RN^(a) encoding<byte[ ]> read names read names
MF encoding<int> next mate bit flags see specific section
NS encoding<int> "next fragment
reference sequence id" reference sequence ids for the next fragment
NP encoding<int> "next mate alignment
start" alignment positions for the next fragment
TS encoding<int> template size template sizes
NF encoding<int> "distance to next
fragment" number of records to skip to the next fragment ^(b)
TL^(C) encoding<int> tag ids list of tag ids, see tag encoding section
FN encoding<int> "number of read
features" number of read features in each record
FC encoding<byte> read features codes see separate section
FP encoding<int> in-read positions "positions of the read features; a positive delta to
the last position (starting with zero)"
DL encoding<int> deletion lengths base-pair deletion lengths
BB encoding<byte[]> stretches of bases bases
QQ encoding<byte[ ]> "stretches of quality
scores" quality scores
BS encoding<byte> "base substitution
codes" base substitution codes
IN encoding<byte[]> insertion inserted bases
RS encoding<int> reference skip length number of skipped bases for the ' N ' read feature
PD encoding<int> padding number of padded bases
HC encoding<int> hard clip number of hard clipped bases
SC encoding<byte[ ]> soft clip soft clipped bases
MQ encoding<int> mapping qualities mapping quality scores
BA encoding<byte> bases bases
QS encoding<byte> quality scores quality scores
TC^(d) N/A legacy field to be ignored
TN^(d) N/A legacy field to be ignored| Key | Value data type | Name | Value |
| :---: | :---: | :---: | :---: |
| BF | encoding<int> | BAM bit flags | see separate section |
| CF | encoding<int> | CRAM bit flags | see specific section |
| RI | encoding<int> | reference id | record reference id from the SAM file header |
| RL | encoding<int> | read lengths | read lengths |
| AP | encoding<int> | in-seq positions | if AP-Delta = true: 0-based alignment start <br> delta from the AP value in the previous record. <br> Note this delta may be negative, for example <br> when switching references in a multi-reference <br> slice. When the record is the first in the slice, the <br> previous position used is the slice alignment-start <br> field (hence the first delta should be zero for <br> single-reference slices, or the AP value itself for <br> multi-reference slices). <br> if AP-Delta = false: encodes the alignment start <br> position directly |
| RG | encoding<int> | read groups | read groups. Special value ' -1 ' stands for no <br> group. |
| $\mathrm{RN}^{\mathrm{a}}$ | encoding<byte[ ]> | read names | read names |
| MF | encoding<int> | next mate bit flags | see specific section |
| NS | encoding<int> | next fragment <br> reference sequence id | reference sequence ids for the next fragment |
| NP | encoding<int> | next mate alignment <br> start | alignment positions for the next fragment |
| TS | encoding<int> | template size | template sizes |
| NF | encoding<int> | distance to next <br> fragment | number of records to skip to the next fragment ${ }^{b}$ |
| $\mathrm{TL}^{\mathrm{C}}$ | encoding<int> | tag ids | list of tag ids, see tag encoding section |
| FN | encoding<int> | number of read <br> features | number of read features in each record |
| FC | encoding<byte> | read features codes | see separate section |
| FP | encoding<int> | in-read positions | positions of the read features; a positive delta to <br> the last position (starting with zero) |
| DL | encoding<int> | deletion lengths | base-pair deletion lengths |
| BB | encoding<byte[]> | stretches of bases | bases |
| QQ | encoding<byte[ ]> | stretches of quality <br> scores | quality scores |
| BS | encoding<byte> | base substitution <br> codes | base substitution codes |
| IN | encoding<byte[]> | insertion | inserted bases |
| RS | encoding<int> | reference skip length | number of skipped bases for the ' N ' read feature |
| PD | encoding<int> | padding | number of padded bases |
| HC | encoding<int> | hard clip | number of hard clipped bases |
| SC | encoding<byte[ ]> | soft clip | soft clipped bases |
| MQ | encoding<int> | mapping qualities | mapping quality scores |
| BA | encoding<byte> | bases | bases |
| QS | encoding<byte> | quality scores | quality scores |
| $\mathrm{TC}^{\mathrm{d}}$ | N/A | legacy field | to be ignored |
| $\mathrm{TN}^{\mathrm{d}}$ | N/A | legacy field | to be ignored |
tag values (names and types are
available in the data series code)| tag values (names and types are |
| :--- |
| available in the data series code) |
dots\ldots
dots\ldots
dots\ldots
标签 ID n:标签类型 n
编码<byte[]>
读标签 N
dots\ldots
Key Value data type Name Value
TAG ID 1:TAG TYPE 1 encoding<byte[ ]> read tag 1 "tag values (names and types are
available in the data series code)"
dots dots dots
TAG ID N:TAG TYPE N encoding<byte[]> read tag N dots| Key | Value data type | Name | Value |
| :--- | :--- | :--- | :--- |
| TAG ID 1:TAG TYPE 1 | encoding<byte[ ]> | read tag 1 | tag values (names and types are <br> available in the data series code) |
| $\ldots$ | | $\ldots$ | $\ldots$ |
| TAG ID N:TAG TYPE N | encoding<byte[]> | read tag N | $\ldots$ |
reference sequence identifier or
-1 for unmapped reads
-2 for multiple reference sequences.
This value must match that of its enclosing
container.| reference sequence identifier or |
| :--- |
| -1 for unmapped reads |
| -2 for multiple reference sequences. |
| This value must match that of its enclosing |
| container. |
itf8
对齐开始
对齐起始位置
itf8
对齐跨度
长度
itf8
记录数
片段中的记录数
ltf8
记数器
文件/流中记录的基于 1 的顺序索引
1-based sequential index of records in the
file/stream| 1-based sequential index of records in the |
| :--- |
| file/stream |
itf8
块数
分片中的区块数
itf8[]
嵌入式基准块内容 ID
切片中区块的内容 id,表示嵌入式引用序列库的区块内容 id,无则为 -1
block content ids of the blocks in the slice
block content id for the embedded reference
sequence bases or -1 for none| block content ids of the blocks in the slice |
| :--- |
| block content id for the embedded reference |
| sequence bases or -1 for none |
MD5 checksum of the reference bases within
the slice boundaries. If this slice has
reference sequence id of -1 (unmapped) or -2
(multi-ref) the MD5 should be 16 bytes of \\0.
For embedded references, the MD5 can either
be all-zeros or the MD5 of the embedded
sequence.| MD5 checksum of the reference bases within |
| :--- |
| the slice boundaries. If this slice has |
| reference sequence id of -1 (unmapped) or -2 |
| (multi-ref) the MD5 should be 16 bytes of $\backslash 0$. |
| For embedded references, the MD5 can either |
| be all-zeros or the MD5 of the embedded |
| sequence. |
字节[16]
以 BAM 辅助字段形式编码的一系列标记、类型、值元组。
a series of tag,type,value tuples encoded as
per BAM auxiliary fields.| a series of tag,type,value tuples encoded as |
| :--- |
| per BAM auxiliary fields. |
byte[]
可选标签
Data type Name Value
itf8 reference sequence id "reference sequence identifier or
-1 for unmapped reads
-2 for multiple reference sequences.
This value must match that of its enclosing
container."
itf8 alignment start the alignment start position
itf8 alignment span the length of the alignment
itf8 number of records the number of records in the slice
ltf8 record counter "1-based sequential index of records in the
file/stream"
itf8 number of blocks the number of blocks in the slice
itf8[] embedded reference bases block content id "block content ids of the blocks in the slice
block content id for the embedded reference
sequence bases or -1 for none"
itf8 reference md5 "MD5 checksum of the reference bases within
the slice boundaries. If this slice has
reference sequence id of -1 (unmapped) or -2
(multi-ref) the MD5 should be 16 bytes of \\0.
For embedded references, the MD5 can either
be all-zeros or the MD5 of the embedded
sequence."
byte[16] "a series of tag,type,value tuples encoded as
per BAM auxiliary fields."
byte[] optional tags | Data type | Name | Value |
| :--- | :--- | :--- |
| itf8 | reference sequence id | reference sequence identifier or <br> -1 for unmapped reads <br> -2 for multiple reference sequences. <br> This value must match that of its enclosing <br> container. |
| itf8 | alignment start | the alignment start position |
| itf8 | alignment span | the length of the alignment |
| itf8 | number of records | the number of records in the slice |
| ltf8 | record counter | 1-based sequential index of records in the <br> file/stream |
| itf8 | number of blocks | the number of blocks in the slice |
| itf8[] | embedded reference bases block content id | block content ids of the blocks in the slice <br> block content id for the embedded reference <br> sequence bases or -1 for none |
| itf8 | reference md5 | MD5 checksum of the reference bases within <br> the slice boundaries. If this slice has <br> reference sequence id of -1 (unmapped) or -2 <br> (multi-ref) the MD5 should be 16 bytes of $\backslash 0$. <br> For embedded references, the MD5 can either <br> be all-zeros or the MD5 of the embedded <br> sequence. |
| byte[16] | | a series of tag,type,value tuples encoded as <br> per BAM auxiliary fields. |
| byte[] | optional tags | |
只有当切片的映射数据与单个参照(参照序列 ID >=0>=0 )对齐时,才应在解码过程中使用对齐起始值和对齐跨度值。对于多参考片段或具有未映射数据的片段,建议将这些字段的值填为 0。
如果存储的校验和为全零,则不应验证 MD5sum。在计算 MD5sum 之前,嵌入式参考文献应遵循与外部参考文献相同的大小写和字母规则。如果使用嵌入式参考文献,并不要求其与用于序列比对的参考文献完全匹配。例如,它可能包含没有覆盖的 "N "碱基,也可能对 SNP 变异有不同的碱基调用。因此,在使用嵌入序列时,MD5sum 指的是嵌入序列的校验和,而不应根据任何外部参照文件进行验证。
Data type Name Value
bit[ ] CRAM record 1 The first CRAM record
dots dots dots
bit[ ] CRAM record N The Nth CRAM record| Data type | Name | Value |
| :--- | :--- | :--- |
| bit[ ] | CRAM record 1 | The first CRAM record |
| $\ldots$ | $\ldots$ | $\ldots$ |
| bit[ ] | CRAM record N | The Nth CRAM record |
"Data series
type" "Data series
name" Field Description
int BF BAM bit flags see BAM bit flags below
int CF CRAM bit flags see CRAM bit flags below
- - Positional data See section 10.2
- - Read names See section 10.3
- - Mate records See section 10.4
- - Auxiliary tags See section 10.5
- - Sequences See sections 10.6 and 10.7| Data series <br> type | Data series <br> name | Field | Description |
| :--- | :--- | :--- | :--- |
| int | BF | BAM bit flags | see BAM bit flags below |
| int | CF | CRAM bit flags | see CRAM bit flags below |
| - | - | Positional data | See section 10.2 |
| - | - | Read names | See section 10.3 |
| - | - | Mate records | See section 10.4 |
| - | - | Auxiliary tags | See section 10.5 |
| - | - | Sequences | See sections 10.6 and 10.7 |
BAM 位标志(BF 数据系列)
以下标志与 SAM 和 BAM 规范中的标志重复,含义相同。但需要注意的是,其中一些标志可以在解码过程中产生,因此在 CRAM 文件中可以省略,并根据位于同一切片内的对端库的两个读数计算位数。
位标志
评论
说明
0x1
在测序中具有多重段的模板
template having multiple
segments in sequencing| template having multiple |
| :--- |
| segments in sequencing |
0x2
根据校准器正确校准每个片段
each segment properly aligned
according to the aligner