File format
CpG start indexed format
The CpG Start Indexed (CGS) format consists of two mandatory columns, optionally followed by informational columns:
Sequence name [chrom]: the chromosome or sequence name (e.g. chr1).
CpG index [cpg_index]: the sequential (ordinal) position of the CpG site; 1-based and end-inclusive.
Informational columns [optional]: columns 3 to n.
chr1	1	c3	c4
chr1	2	c3	c4
chr1	3	c3	c4
chr1	4	c3	c4
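To make the column layout concrete, here is a minimal Python sketch that splits one tab-separated CGS data line into its mandatory and informational columns. `CGSRecord` and `parse_cgs_line` are hypothetical names used only for illustration, not part of any published API, and keeping the informational columns as raw strings is an assumption.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CGSRecord:
    """One data line of a CGS file (hypothetical helper type)."""
    chrom: str                                      # column 1: sequence name
    cpg_index: int                                  # column 2: 1-based CpG index
    info: List[str] = field(default_factory=list)   # columns 3..n, kept as raw strings


def parse_cgs_line(line: str) -> CGSRecord:
    """Split one tab-separated CGS data line into its columns."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 2:
        raise ValueError(f"expected at least 2 columns, got {len(fields)}")
    cpg_index = int(fields[1])
    if cpg_index < 1:
        # indices are 1-based, so 0 or negative values cannot be valid
        raise ValueError(f"CpG index must be >= 1, got {cpg_index}")
    return CGSRecord(chrom=fields[0], cpg_index=cpg_index, info=fields[2:])


print(parse_cgs_line("chr1\t2\tc3\tc4"))
# CGSRecord(chrom='chr1', cpg_index=2, info=['c3', 'c4'])
```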
Header lines are prefixed with #. The whole file is compressed with bgzip and indexed with tabix, for example:
bgzip demo.cgs
tabix -C -s 1 -b 2 -e 2 demo.cgs.gz
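Because `-b 2 -e 2` makes column 2 both the begin and the end coordinate, a tabix region such as `chr1:2-3` is interpreted as a range of CpG indices rather than base-pair positions, so `tabix demo.cgs.gz chr1:2-3` should return the rows with CpG index 2 and 3 (tabix skips lines starting with # by default). The sketch below imitates that lookup with a plain linear scan over an uncompressed, in-memory copy of the example; `query_cgs` and the header text used in the demo are hypothetical.

```python
import io
from typing import Iterator, List, TextIO


def query_cgs(handle: TextIO, chrom: str, start: int, end: int) -> Iterator[List[str]]:
    """Yield CGS rows on `chrom` whose CpG index lies in [start, end] (1-based, inclusive).

    A linear scan for illustration only; the tabix index serves the same kind
    of query without reading the whole file.
    """
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue  # header and blank lines carry no records
        fields = line.rstrip("\n").split("\t")
        # begin and end are both column 2, so each row covers exactly one CpG index
        if fields[0] == chrom and start <= int(fields[1]) <= end:
            yield fields


# hypothetical header line followed by the example rows
demo = io.StringIO(
    "#chrom\tcpg_index\tc3\tc4\n"
    "chr1\t1\tc3\tc4\n"
    "chr1\t2\tc3\tc4\n"
    "chr1\t3\tc3\tc4\n"
    "chr1\t4\tc3\tc4\n"
)
for row in query_cgs(demo, "chr1", 2, 3):
    print(row)
```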
PAT format
PAT is a CGS-based format with four columns:
Sequence name
CpG index
Methylation motif: the methylation status of successive CpG sites, one character per site, beginning at the CpG index in column 2:
‘C’: methylated CpGs
‘T’: unmethylated CpGs
‘.’: unknown methylation status
Motif count: the number of times the motif in column 3 occurs at this CpG index.
chr1	755	CCCTCCCCTCTTCCT	1
chr1	755	TTTT	2
chr1	756	CCCCCTC	1
chr1	756	CCCCCT....CCCC	10
chr1	758	CCCCCCCCCCCC	4
chr1	758	CCTTCCCTCCC	1
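As an illustration of how the motif and count columns combine, the sketch below expands each PAT row into per-CpG calls and sums them into methylated and covered counts per CpG index. It assumes, consistently with the example above, that character i of the motif describes CpG index `cpg_index + i`, and that '.' sites are excluded from coverage; `expand_pat` and `methylation_levels` are hypothetical helpers, not part of any published API.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple


def expand_pat(chrom: str, cpg_index: int, motif: str, count: int) -> Iterator[Tuple[str, int, str, int]]:
    """Yield (chrom, cpg, status, count) for every character of a PAT motif.

    Assumption: motif character i describes CpG index cpg_index + i.
    """
    for offset, status in enumerate(motif):
        yield chrom, cpg_index + offset, status, count


def methylation_levels(pat_lines: Iterable[str]) -> Dict[Tuple[str, int], Tuple[int, int]]:
    """Sum calls over PAT rows into {(chrom, cpg): (methylated, covered)}.

    Each call is weighted by the row's motif count; '.' (unknown) calls are skipped.
    """
    totals = defaultdict(lambda: [0, 0])  # [methylated, covered]
    for line in pat_lines:
        chrom, idx, motif, count = line.split()  # splits on tabs or runs of spaces
        for c, cpg, status, n in expand_pat(chrom, int(idx), motif, int(count)):
            if status == "C":      # methylated: counts toward both totals
                totals[(c, cpg)][0] += n
                totals[(c, cpg)][1] += n
            elif status == "T":    # unmethylated: counts toward coverage only
                totals[(c, cpg)][1] += n
    return {key: (m, cov) for key, (m, cov) in totals.items()}


example = [
    "chr1\t755\tCCCTCCCCTCTTCCT\t1",
    "chr1\t755\tTTTT\t2",
    "chr1\t756\tCCCCCTC\t1",
    "chr1\t756\tCCCCCT....CCCC\t10",
    "chr1\t758\tCCCCCCCCCCCC\t4",
    "chr1\t758\tCCTTCCCTCCC\t1",
]
levels = methylation_levels(example)
print(levels[("chr1", 755)])  # (1, 3)
print(levels[("chr1", 756)])  # (12, 14)
```

For CpG 755, the first row contributes one methylated call and the `TTTT` row two unmethylated calls, so dividing methylated by covered counts gives a methylation level of 1/3 under these assumptions.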
MV format
MV (methylation vector) is a CGS-based format.
MVC format
MVC (methylation vector cluster) is a CGS-based format.
MVM format
MVM (methylation vector matrix) is a CGS-based format.