| Value | Type | Name |
| :--- | :--- | :--- |
| 0x7 | itf8 | codec id |
| 0x2 | itf8 | number of bytes to follow |
| 0x0 | itf8 | offset |
| 0x1 | itf8 | K parameter |
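
The byte layout in the table above can be reproduced with a short sketch. It assumes the ITF8 rules defined elsewhere in this specification (values below 0x80 take one byte; larger values use leading 1-bits in the first byte to signal extra bytes). The names `itf8_encode` and `encoding_descriptor` are illustrative only and are not part of the format.

```python
def itf8_encode(value: int) -> bytes:
    """Encode a signed 32-bit integer as ITF8 (1 to 5 bytes)."""
    v = value & 0xFFFFFFFF  # negative values are written as unsigned 32-bit
    if v < 0x80:
        return bytes([v])
    if v < 0x4000:
        return bytes([0x80 | (v >> 8), v & 0xFF])
    if v < 0x200000:
        return bytes([0xC0 | (v >> 16), (v >> 8) & 0xFF, v & 0xFF])
    if v < 0x10000000:
        return bytes([0xE0 | (v >> 24), (v >> 16) & 0xFF, (v >> 8) & 0xFF, v & 0xFF])
    # 5-byte form: only the low 4 bits of the last byte are used
    return bytes([0xF0 | (v >> 28), (v >> 20) & 0xFF, (v >> 12) & 0xFF,
                  (v >> 4) & 0xFF, v & 0x0F])


def encoding_descriptor(codec_id: int, params: list[int]) -> bytes:
    """Codec id, then the parameter byte count, then the parameters."""
    payload = b"".join(itf8_encode(p) for p in params)
    return itf8_encode(codec_id) + itf8_encode(len(payload)) + payload


subexp = encoding_descriptor(7, [0, 1])  # SUBEXP with offset 0 and K = 1
print(subexp.hex(" "))  # 07 02 00 01
```

The sketch only covers codecs whose parameters are plain integers; codecs with array or nested-encoding parameters would need their own serialisation.
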

| size in bytes | N | key 1 | value 1 | key ... | value ... | key N | value N |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |

| Codec | ID | Parameters | Comment |
| :--- | :--- | :--- | :--- |
| NULL | 0 | none | series not preserved |
| EXTERNAL | 1 | int block content id | the block content identifier used to <br> associate external data blocks with <br> data series |
| Deprecated (GOLOMB) | 2 | int offset, int M | Golomb coding |
| HUFFMAN | 3 | array<int>, array<int> | coding with int/byte values |
| BYTE_ARRAY_LEN | 4 | encoding<int> array length, <br> encoding<byte> bytes | coding of byte arrays with array <br> length |
| BYTE_ARRAY_STOP | 5 | byte stop, int external block <br> content id | coding of byte arrays with a stop <br> value |
| BETA | 6 | int offset, int number of bits | binary coding |
| SUBEXP | 7 | int offset, int K | subexponential coding |
| Deprecated (GOLOMB_RICE) | 8 | int offset, int $\log_2 m$ | Golomb-Rice coding |
| GAMMA | 9 | int offset | Elias gamma coding |
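
See section 13 for more detailed descriptions of all the above coding algorithms and their parameters.
4 Checksums
Checksums are used to ensure data integrity. The following checksum algorithms are used in CRAM.
4.1 CRC32
This is a 32-bit cyclic redundancy check with the polynomial 0x04C11DB7. See ITU-T V.42 for more details. The value of the CRC32 hash function is written as an integer.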
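
As an illustration, the sketch below computes this checksum bit by bit and compares it with Python's `zlib.crc32`, which implements the same CRC-32. The function name `crc32_bitwise` is hypothetical, and the little-endian packing of the result is an assumption based on how this specification serialises 32-bit integers elsewhere.

```python
import struct
import zlib


def crc32_bitwise(data: bytes) -> int:
    """Reference CRC-32 using the reflected form of polynomial 0x04C11DB7."""
    poly = 0xEDB88320  # 0x04C11DB7 with its bits reversed
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ poly if crc & 1 else crc >> 1
    return crc ^ 0xFFFFFFFF


checksum = crc32_bitwise(b"123456789")
print(hex(checksum))                      # 0xcbf43926
print(checksum == zlib.crc32(b"123456789"))  # True

# Assumption: the value is stored as a 4-byte little-endian integer.
stored = struct.pack("<I", checksum)
```

In practice `zlib.crc32` is the faster choice; the bitwise version is shown only to make the polynomial explicit.
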
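Containers consist of one or more blocks. The first container, called the CRAM header container, stores the textual header described in the SAM specification (see section 7.1). This container may carry additional padding bytes so that the SAM header can be rewritten in place when its size changes slightly. The content of these padding bytes is undefined, but we recommend filling them with nuls. The padding can take the form of explicit uncompressed block structures, or of unallocated extra space where the container size is larger than the combined size of the blocks held within it.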

| Data type | Name | Value |
| :--- | :--- | :--- |
| byte[4] | format magic number | CRAM (0x43 0x52 0x41 0x4d) |
| unsigned byte | major format number | 3 (0x3) |
| unsigned byte | minor format number | 1 (0x1) |
| byte[20] | file id | CRAM file identifier (e.g. file name or SHA1 checksum) |
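
A minimal sketch of reading this 26-byte structure follows. The names `FileDefinition` and `parse_file_definition` are illustrative, and padding the file id with NUL bytes in the example is an assumption; the table only requires 20 bytes.

```python
import struct
from typing import NamedTuple


class FileDefinition(NamedTuple):
    major: int
    minor: int
    file_id: bytes


def parse_file_definition(buf: bytes) -> FileDefinition:
    """Parse the magic number, version bytes and 20-byte file id."""
    if len(buf) < 26:
        raise ValueError("file definition needs 26 bytes")
    magic, major, minor, file_id = struct.unpack_from("<4sBB20s", buf, 0)
    if magic != b"CRAM":
        raise ValueError("bad magic number, not a CRAM file")
    return FileDefinition(major, minor, file_id)


# Hypothetical example: version 3.1 with a file name as identifier.
example = b"CRAM" + bytes([3, 1]) + b"example.cram".ljust(20, b"\0")
definition = parse_file_definition(example)
print(definition.major, definition.minor)  # 3 1
```
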

| Data type | Name | Value |
| :---: | :---: | :---: |
| int32 | length | the sum of the lengths of all blocks in this container <br> (headers and data) and any padding bytes (CRAM header <br> container only); equal to the total byte length of the <br> container minus the byte length of this header structure |
| itf8 | reference sequence id | reference sequence identifier or <br> -1 for unmapped reads <br> -2 for multiple reference sequences. <br> All slices in this container must have a reference sequence <br> id matching this value. |
| itf8 | starting position on the <br> reference | the alignment start position |
| itf8 | alignment span | the length of the alignment |
| itf8 | number of records | number of records in the container |
| ltf8 | record counter | 1-based sequential index of records in the file/stream. |
| ltf8 | bases | number of read bases |
| itf8 | number of blocks | the total number of blocks in this container |
| array<itf8> | landmarks | the locations of slices in this container as byte offsets from <br> the end of this container header, used for random access <br> indexing. For sequence data containers, the landmark <br> count must equal the slice count. <br> Since the block before the first slice is the compression <br> header, landmarks[0] is equal to the byte length of the <br> compression header. |
| int | crc32 | CRC32 hash of all the preceding bytes in the container. |
| byte[ ] | blocks | The blocks contained within the container. |
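
The container header fields can be read in table order, as in the sketch below. It relies on several assumptions taken from other parts of this specification rather than from the table itself: `int32` and `int` are little-endian, ITF8 and LTF8 use leading 1-bits in the first byte to signal extra bytes, and `array<itf8>` is an ITF8 count followed by that many elements. All function and dictionary key names are hypothetical.

```python
import struct
import zlib


def read_itf8(buf, pos):
    """Return (signed 32-bit value, next position)."""
    b0 = buf[pos]
    extra = 0
    while extra < 4 and b0 & (0x80 >> extra):
        extra += 1
    if extra < 4:
        v = b0 & (0x7F >> extra)
        for b in buf[pos + 1 : pos + 1 + extra]:
            v = (v << 8) | b
    else:  # 5-byte form: the last byte contributes only 4 bits
        v = ((b0 & 0x0F) << 28) | (buf[pos + 1] << 20) | (buf[pos + 2] << 12) \
            | (buf[pos + 3] << 4) | (buf[pos + 4] & 0x0F)
    if v >= 1 << 31:
        v -= 1 << 32
    return v, pos + 1 + extra


def read_ltf8(buf, pos):
    """Return (signed 64-bit value, next position)."""
    b0 = buf[pos]
    extra = 0
    while extra < 8 and b0 & (0x80 >> extra):
        extra += 1
    v = b0 & (0xFF >> (extra + 1))
    for b in buf[pos + 1 : pos + 1 + extra]:
        v = (v << 8) | b
    if v >= 1 << 63:
        v -= 1 << 64
    return v, pos + 1 + extra


def parse_container_header(buf, pos=0):
    """Return (header dict, position of the first block)."""
    start = pos
    (length,) = struct.unpack_from("<i", buf, pos)
    pos += 4
    hdr = {"length": length}
    for name in ("ref_seq_id", "start", "span", "n_records"):
        hdr[name], pos = read_itf8(buf, pos)
    for name in ("record_counter", "bases"):
        hdr[name], pos = read_ltf8(buf, pos)
    hdr["n_blocks"], pos = read_itf8(buf, pos)
    count, pos = read_itf8(buf, pos)
    hdr["landmarks"] = []
    for _ in range(count):
        landmark, pos = read_itf8(buf, pos)
        hdr["landmarks"].append(landmark)
    (stored_crc,) = struct.unpack_from("<I", buf, pos)
    hdr["crc_ok"] = stored_crc == zlib.crc32(buf[start:pos])
    return hdr, pos + 4


# Hand-built header: ref 0, start 1, span 10, 2 records, counter 1,
# 20 bases, 3 blocks, one landmark at offset 30.
body = struct.pack("<i", 100) + bytes([0, 1, 10, 2, 1, 20, 3, 1, 30])
header, end = parse_container_header(body + struct.pack("<I", zlib.crc32(body)))
print(header["landmarks"], header["crc_ok"], end)  # [30] True 17
```

Because landmarks are offsets from the end of the container header, slice `i` would start at `end + header["landmarks"][i]` in this sketch.
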

| Data type | Name | Value |
| :--- | :--- | :--- |
| byte | method | the block compression method (and first CRAM version): <br> 0: raw (none)* <br> 1: gzip <br> 2: bzip2 (v2.0) <br> 3: lzma (v3.0) <br> 4: rans4x8 (v3.0) <br> 5: rans4x16 (v3.1) <br> 6: adaptive arithmetic coder (v3.1) <br> 7: fqzcomp (v3.1) <br> 8: name tokeniser (v3.1) |
| byte | block content type id | the block content type identifier |
| itf8 | block content id | the block content identifier used to associate external data blocks with data series |
| itf8 | size in bytes* | size of the block data after applying block compression |
| itf8 | raw size in bytes* | size of the block data before applying block compression |
| byte[ ] | block data | the data stored in the block: <br> $\bullet$ bit stream of CRAM records (core data block) <br> $\bullet$ byte stream (external data block) <br> $\bullet$ additional fields (header blocks) |
| byte[4] | CRC32 | CRC32 hash value for all preceding bytes in the block |
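
The sketch below reads one block following the field order above and decompresses it with the Python standard library. It handles only methods 0 to 3; the CRAM-specific codecs (4 to 8) are left unimplemented. The little-endian CRC32 and the ITF8 decoding rules are assumptions carried over from other parts of this specification, and the names `parse_block` and `DECOMPRESSORS` are illustrative.

```python
import bz2
import gzip
import lzma
import struct
import zlib


def read_itf8(buf, pos):
    """Return (signed 32-bit value, next position)."""
    b0 = buf[pos]
    extra = 0
    while extra < 4 and b0 & (0x80 >> extra):
        extra += 1
    if extra < 4:
        v = b0 & (0x7F >> extra)
        for b in buf[pos + 1 : pos + 1 + extra]:
            v = (v << 8) | b
    else:  # 5-byte form: the last byte contributes only 4 bits
        v = ((b0 & 0x0F) << 28) | (buf[pos + 1] << 20) | (buf[pos + 2] << 12) \
            | (buf[pos + 3] << 4) | (buf[pos + 4] & 0x0F)
    if v >= 1 << 31:
        v -= 1 << 32
    return v, pos + 1 + extra


# Only the general-purpose methods are covered by the standard library.
DECOMPRESSORS = {
    0: lambda data: data,  # raw
    1: gzip.decompress,
    2: bz2.decompress,
    3: lzma.decompress,
}


def parse_block(buf, pos=0):
    """Return (block dict, position just past the block's CRC32)."""
    start = pos
    method, content_type = buf[pos], buf[pos + 1]
    pos += 2
    content_id, pos = read_itf8(buf, pos)
    size, pos = read_itf8(buf, pos)
    raw_size, pos = read_itf8(buf, pos)
    payload = bytes(buf[pos : pos + size])
    pos += size
    (stored_crc,) = struct.unpack_from("<I", buf, pos)
    if stored_crc != zlib.crc32(buf[start:pos]):
        raise ValueError("block CRC32 mismatch")
    if method not in DECOMPRESSORS:
        raise NotImplementedError(f"compression method {method}")
    data = DECOMPRESSORS[method](payload)
    if len(data) != raw_size:
        raise ValueError("raw size does not match decompressed data")
    block = {"method": method, "content_type": content_type,
             "content_id": content_id, "data": data}
    return block, pos + 4


# Hand-built raw block: method 0, content type 4, content id 7, 4 bytes of data.
body = bytes([0, 4, 7, 4, 4]) + b"ACGT"
block, end = parse_block(body + struct.pack("<I", zlib.crc32(body)))
print(block["content_id"], block["data"], end)  # 7 b'ACGT' 13
```
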

| Key | Value data type | Name | Value |
| :--- | :--- | :--- | :--- |
| RN | bool | read names included | true if read names are preserved for all reads |
| AP | bool | AP data series delta | true if AP data series is delta, false otherwise |
| RR | bool | reference required | true if reference sequence is required to restore <br> the data completely |
| SM | byte[5] | substitution matrix | substitution matrix |
| TD | array<byte> | tag ids dictionary | a list of lists of tag ids, see tag encoding section |

| Key | Value data type | Name | Value |
| :---: | :---: | :---: | :---: |
| BF | encoding<int> | BAM bit flags | see separate section |
| CF | encoding<int> | CRAM bit flags | see specific section |
| RI | encoding<int> | reference id | record reference id from the SAM file header |
| RL | encoding<int> | read lengths | read lengths |
| AP | encoding<int> | in-seq positions | if AP-Delta = true: 0-based alignment start <br> delta from the AP value in the previous record. <br> Note this delta may be negative, for example <br> when switching references in a multi-reference <br> slice. When the record is the first in the slice, the <br> previous position used is the slice alignment-start <br> field (hence the first delta should be zero for <br> single-reference slices, or the AP value itself for <br> multi-reference slices). <br> if AP-Delta = false: encodes the alignment start <br> position directly |
| RG | encoding<int> | read groups | read groups. Special value ' -1 ' stands for no <br> group. |
| $\mathrm{RN}^{\mathrm{a}}$ | encoding<byte[ ]> | read names | read names |
| MF | encoding<int> | next mate bit flags | see specific section |
| NS | encoding<int> | next fragment <br> reference sequence id | reference sequence ids for the next fragment |
| NP | encoding<int> | next mate alignment <br> start | alignment positions for the next fragment |
| TS | encoding<int> | template size | template sizes |
| NF | encoding<int> | distance to next <br> fragment | number of records to skip to the next fragment ${ }^{b}$ |
| $\mathrm{TL}^{\mathrm{c}}$ | encoding<int> | tag ids | list of tag ids, see tag encoding section |
| FN | encoding<int> | number of read <br> features | number of read features in each record |
| FC | encoding<byte> | read features codes | see separate section |
| FP | encoding<int> | in-read positions | positions of the read features; a positive delta to <br> the last position (starting with zero) |
| DL | encoding<int> | deletion lengths | base-pair deletion lengths |
| BB | encoding<byte[]> | stretches of bases | bases |
| QQ | encoding<byte[ ]> | stretches of quality <br> scores | quality scores |
| BS | encoding<byte> | base substitution <br> codes | base substitution codes |
| IN | encoding<byte[]> | insertion | inserted bases |
| RS | encoding<int> | reference skip length | number of skipped bases for the ' N ' read feature |
| PD | encoding<int> | padding | number of padded bases |
| HC | encoding<int> | hard clip | number of hard clipped bases |
| SC | encoding<byte[ ]> | soft clip | soft clipped bases |
| MQ | encoding<int> | mapping qualities | mapping quality scores |
| BA | encoding<byte> | bases | bases |
| QS | encoding<byte> | quality scores | quality scores |
| $\mathrm{TC}^{\mathrm{d}}$ | N/A | legacy field | to be ignored |
| $\mathrm{TN}^{\mathrm{d}}$ | N/A | legacy field | to be ignored |
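
The AP rule in the table above can be sketched as follows. The function name `decode_alignment_starts` and its arguments are hypothetical; the inputs are assumed to be AP values already decoded from the data series for one slice.

```python
def decode_alignment_starts(ap_values, ap_delta, slice_alignment_start):
    """Turn decoded AP values into absolute alignment start positions."""
    if not ap_delta:
        # AP-Delta = false: values are the alignment starts themselves.
        return list(ap_values)
    positions = []
    previous = slice_alignment_start  # seed for the first record in the slice
    for delta in ap_values:
        previous += delta  # the delta may be negative
        positions.append(previous)
    return positions


# Single-reference slice starting at 1000: the first delta is zero.
print(decode_alignment_starts([0, 5, 3], True, 1000))    # [1000, 1005, 1008]
# Multi-reference slice with alignment start 0: the first delta is the AP value.
print(decode_alignment_starts([100, 20, -70], True, 0))  # [100, 120, 50]
```
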

| Key | Value data type | Name | Value |
| :--- | :--- | :--- | :--- |
| TAG ID 1:TAG TYPE 1 | encoding<byte[ ]> | read tag 1 | tag values (names and types are <br> available in the data series code) |
| $\ldots$ | | $\ldots$ | $\ldots$ |
| TAG ID N:TAG TYPE N | encoding<byte[]> | read tag N | $\ldots$ |
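
Slices with the multiple reference flag (-2) set as the sequence id in the header may contain reads mapped to multiple external references, including unmapped${ }^{3}$ reads (placed on these references or unplaced), but multiple embedded references cannot be combined in this way. When multiple references are used, the RI data series is used to determine the reference sequence id for each record. This data series is not present when only a single reference is used within a slice.
The unmapped (-1) sequence id in the header is for slices containing only unplaced unmapped${ }^{3}$ reads.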
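
A small sketch of how a decoder might assign a reference sequence id to every record under these rules follows. The function name and the constants `MULTI_REF` and `UNMAPPED` are illustrative, and the RI values are assumed to have been decoded already.

```python
MULTI_REF = -2  # slice holds reads from several references
UNMAPPED = -1   # slice holds only unplaced unmapped reads


def record_reference_ids(slice_ref_id, n_records, ri_values=None):
    """Return one reference sequence id per record in the slice."""
    if slice_ref_id == MULTI_REF:
        # The RI data series carries the id of each record.
        if ri_values is None or len(ri_values) != n_records:
            raise ValueError("multi-reference slice needs one RI value per record")
        return list(ri_values)
    # Single-reference or unmapped slice: RI is absent, use the header value.
    return [slice_ref_id] * n_records


print(record_reference_ids(3, 2))                      # [3, 3]
print(record_reference_ids(MULTI_REF, 3, [0, 0, -1]))  # [0, 0, -1]
print(record_reference_ids(UNMAPPED, 2))               # [-1, -1]
```
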

| Data type | Name | Value |
| :--- | :--- | :--- |
| itf8 | reference sequence id | reference sequence identifier or <br> -1 for unmapped reads <br> -2 for multiple reference sequences. <br> This value must match that of its enclosing <br> container. |
| itf8 | alignment start | the alignment start position |
| itf8 | alignment span | the length of the alignment |
| itf8 | number of records | the number of records in the slice |
| ltf8 | record counter | 1-based sequential index of records in the <br> file/stream |
| itf8 | number of blocks | the number of blocks in the slice |
| itf8[ ] | block content ids | block content ids of the blocks in the slice |
| itf8 | embedded reference bases block content id | block content id for the embedded reference sequence bases or -1 for none |
| byte[16] | reference md5 | MD5 checksum of the reference bases within the slice boundaries. If this slice has reference sequence id of -1 (unmapped) or -2 (multi-ref) the MD5 should be 16 bytes of \0. For embedded references, the MD5 can either be all-zeros or the MD5 of the embedded sequence. |
| byte[ ] | optional tags | a series of tag,type,value tuples encoded as per BAM auxiliary fields. |
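
The alignment start and alignment span values should only be used during decoding if the slice has mapped data aligned to a single reference (reference sequence id >= 0). For multi-reference slices or those with unmapped data, it is recommended to fill these fields with the value 0.
The MD5 checksum should not be validated if the stored checksum is all-zero. Embedded references should follow the same capitalisation and alphabet rules as are applied to external references prior to MD5 checksum calculation. If an embedded reference is used, it is not required to match exactly the reference used for sequence alignment. For example, it may contain 'N' bases where coverage is absent, or it may have different base calls for SNP variants. Hence, when embedded sequences are used, the MD5 checksum refers to the checksum of the embedded sequence and should not be validated against any external reference files.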
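
The validation rule can be sketched as below. The function name `check_reference_md5` is hypothetical, and upper-casing the bases before hashing is an assumption standing in for the capitalisation and alphabet rules mentioned above, which are defined outside this section.

```python
import hashlib
from typing import Optional


def check_reference_md5(stored_md5: bytes, ref_bases: bytes) -> Optional[bool]:
    """Return None if validation is skipped, else whether the MD5 matches."""
    if len(stored_md5) != 16:
        raise ValueError("reference md5 must be 16 bytes")
    if stored_md5 == bytes(16):
        return None  # all-zero checksum: do not validate
    # Assumption: bases are upper-cased before the checksum is computed.
    return hashlib.md5(ref_bases.upper()).digest() == stored_md5


bases = b"acgtn"
good = hashlib.md5(b"ACGTN").digest()
print(check_reference_md5(good, bases))       # True
print(check_reference_md5(bytes(16), bases))  # None
```

For a slice with an embedded reference, `ref_bases` would be the embedded sequence rather than bases from an external reference file.
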

| Data type | Name | Value |
| :--- | :--- | :--- |
| bit[ ] | CRAM record 1 | The first CRAM record |
| $\ldots$ | $\ldots$ | $\ldots$ |
| bit[ ] | CRAM record N | The Nth CRAM record |

| Data series type | Data series name | Field | Description |
| :--- | :--- | :--- | :--- |
| int | BF | BAM bit flags | see BAM bit flags below |
| int | CF | CRAM bit flags | see CRAM bit flags below |
| - | - | Positional data | See section 10.2 |
| - | - | Read names | See section 10.3 |
| - | - | Mate records | See section 10.4 |
| - | - | Auxiliary tags | See section 10.5 |
| - | - | Sequences | See sections 10.6 and 10.7 |
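
BAM bit flags (BF data series)
The following flags are duplicated from the SAM and BAM specifications, with identical meaning. Note, however, that some of these flags can be derived during decoding, so they may be omitted from the CRAM file, with the bits computed from both reads of a paired-end library residing within the same slice.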

| Bit flag | Comment | Description |
| :--- | :--- | :--- |
| 0x1 | | template having multiple segments in sequencing |