File format#
CpG start indexed format#
The CpG Start Indexed (CGS) format comprises two mandatory columns along with several informational columns:
Sequence name [chrom]: chromosome number or sequence names
CpG index [cpg_index]: This column lists the sequential positions of CpG sites., 1-based, end-inclusive position of the CpG index
Informational columns [optional]: column 3 to column n,
chr1 1 c3 c4
chr1 2 c3 c4
chr1 3 c3 c4
chr1 4 c3 c4
The file header is prefixed with a # symbol. The entire file is compressed using bgzip and indexed with tabix. Below is an example:
bgzip demo.cgs
tabix -C -s 1 -b 2 -e 2 demo.cgs.gz
PAT format#
PAT is a CGS-based format, containing 4 columns:
Sequence name
CpG index
Methylation motif: This column denotes the methylation status of the CpG site
‘C’: methylated CpGs
‘T’: unmethylated CpGs
‘.’: unknown methylation status
Motif count: This column records the frequency of each motif described in column 3
chr1 755 CCCTCCCCTCTTCCT 1
chr1 755 TTTT 2
chr1 756 CCCCCTC 1
chr1 756 CCCCCT....CCCC 10
chr1 758 CCCCCCCCCCCC 4
chr1 758 CCTTCCCTCCC 1
MV format#
MV (methylation vector) format is a CGS-based format
MVC format#
MVC (methylation vector cluster) format is a CGS-based format
MVM format#
MVM (methylation vector matrix) format is a CGS-based format