MAVIS standard input file format
These requirements pertain to the columns of the input files from the various tools you want to merge. The input files should be tab-delimited text files. Comments may be included at the top of the file. Each comment line should begin with two hash marks. Comment lines are ignored when the file is read.
## This is a comment
The header row contains the column names and is the first row following the comments (or the first row of the file if there are no comments). The header row may optionally begin with a single hash mark, which is stripped when the file is read.
## This is a comment
## this is another comment
# this is the header row
A simple input file might look as follows:
## File created at: 2018-01-02
## Generated by: MAVIS v1.0.0
#break1_chromosome break1_position_start break1_position_end break2_chromosome break2_position_start break2_position_end
X 1234 1234 X 77965 77965
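The reading rules above can be sketched in a few lines of Python. This is a minimal illustration, not the MAVIS reader itself: the function name `read_mavis_input` is hypothetical, and skipping blank lines is an assumption that the text does not state.

```python
import io


def read_mavis_input(handle):
    """Return (header, rows) parsed from a tab-delimited file-like object."""
    header = None
    rows = []
    for raw in handle:
        line = raw.rstrip("\r\n")
        if header is None:
            # Skip leading '##' comment lines (and blank lines, by assumption).
            if not line.strip() or line.startswith("##"):
                continue
            # The header row may begin with a single hash, which is stripped.
            if line.startswith("#"):
                line = line[1:]
            header = [name.strip() for name in line.split("\t")]
        elif line.strip():
            # Every later non-blank line is a data row keyed by column name.
            rows.append(dict(zip(header, line.split("\t"))))
    return header, rows


example = io.StringIO(
    "## File created at: 2018-01-02\n"
    "## Generated by: MAVIS v1.0.0\n"
    "#break1_chromosome\tbreak1_position_start\tbreak1_position_end\t"
    "break2_chromosome\tbreak2_position_start\tbreak2_position_end\n"
    "X\t1234\t1234\tX\t77965\t77965\n"
)
header, rows = read_mavis_input(example)
print(header)
print(rows[0])
```

All values are kept as strings here; converting positions to integers is left to the caller.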
Required Columns
- break1_chromosome
- break1_position_start
- break1_position_end (can be the same as break1_position_start)
- break2_chromosome
- break2_position_start
- break2_position_end (can be the same as break2_position_start)
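A quick check for the required columns might look like the sketch below. The helper name `check_required` is hypothetical, and the rule that an end position must not be smaller than its start position is an assumption inferred from the note that the two may be equal.

```python
REQUIRED_COLUMNS = [
    "break1_chromosome",
    "break1_position_start",
    "break1_position_end",
    "break2_chromosome",
    "break2_position_start",
    "break2_position_end",
]


def check_required(row):
    """Return a list of problems found in one row (dict of column name -> value)."""
    problems = ["missing column: " + c for c in REQUIRED_COLUMNS if c not in row]
    if problems:
        return problems
    for prefix in ("break1", "break2"):
        # int() raises ValueError on non-numeric input; not handled in this sketch.
        start = int(row[prefix + "_position_start"])
        end = int(row[prefix + "_position_end"])
        # Assumption: the end may equal the start but should not precede it.
        if start > end:
            problems.append(prefix + ": position_end is before position_start")
    return problems


row = {
    "break1_chromosome": "X",
    "break1_position_start": "1234",
    "break1_position_end": "1234",
    "break2_chromosome": "X",
    "break2_position_start": "77965",
    "break2_position_end": "77965",
}
print(check_required(row))
```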
Optional Columns
Optional columns that are not given as input will be added during the clustering stage of MAVIS, using default values or values taken from command line parameters, because some of them are required by subsequent pipeline steps.
- break1_strand (defaults to not-specified during clustering)
- break1_orientation (expanded to all possible values during clustering)
- break2_strand (defaults to not-specified during clustering)
- break2_orientation (expanded to all possible values during clustering)
- opposing_strands (expanded to all possible values during clustering)
- stranded (defaults to False during clustering)
- library (defaults to command line library parameter during clustering)
- protocol (defaults to command line protocol parameter during clustering)
- tools (defaults to an empty string during clustering)
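The defaulting behaviour listed above can be illustrated with a small sketch. It is not the MAVIS clustering code: the function `fill_optional` is hypothetical, and the `"?"` placeholder used for a not-specified strand is an assumption. The columns that are "expanded to all possible values" are deliberately left out, since expanding them produces several rows rather than one default.

```python
def fill_optional(row, library, protocol, not_specified="?"):
    """Return a copy of row with missing optional columns set to defaults."""
    defaults = {
        "break1_strand": not_specified,  # assumed placeholder value
        "break2_strand": not_specified,  # assumed placeholder value
        "stranded": False,
        "library": library,    # taken from the command line parameter
        "protocol": protocol,  # taken from the command line parameter
        "tools": "",
    }
    filled = dict(row)
    for column, value in defaults.items():
        # Values given in the input take precedence over defaults.
        filled.setdefault(column, value)
    return filled


filled = fill_optional(
    {"break1_chromosome": "X", "library": "libA"},
    library="libB",
    protocol="genome",
)
print(filled)
```

Using `setdefault` keeps any value supplied in the input file, so the command line parameters only apply where a column is absent.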
Summary by Pipeline Step
The different pipeline steps of MAVIS have different input column requirements. These are summarized below for the pipeline steps that can act as the start of the pipeline.
column name | cluster | annotate | validate |
---|---|---|---|
break1_chromosome | ✔ | ✔ | ✔ |
break1_position_start | ✔ | ✔ | ✔ |
break1_position_end | ✔ | ✔ | ✔ |
break2_chromosome | ✔ | ✔ | ✔ |
break2_position_start | ✔ | ✔ | ✔ |
break2_position_end | ✔ | ✔ | ✔ |
break1_strand | | | |
break1_orientation | ✔ | ✔ | |
break2_strand | | | |
break2_orientation | ✔ | ✔ | |
opposing_strands | | | |
stranded | | | |
library | | | |
protocol | | | |
tools | | | |
event_type | ✔ | | |
Some native tool outputs are supported and have built-in methods to convert them to the above format. Any unsupported tool can be used as long as the user converts the tool's native output to match the above format.
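For an unsupported tool, the conversion could be as small as the following sketch. The record keys `chr_a`, `pos_a`, `chr_b` and `pos_b` are hypothetical stand-ins for whatever the tool reports, and `write_mavis_input` is an illustrative helper, not part of MAVIS. Only the required columns are written, with each end position set equal to its start position.

```python
import csv
import io

MAVIS_COLUMNS = [
    "break1_chromosome",
    "break1_position_start",
    "break1_position_end",
    "break2_chromosome",
    "break2_position_start",
    "break2_position_end",
]


def write_mavis_input(records, handle, tool_name):
    """Write hypothetical tool records as a tab-delimited input file."""
    # Optional comment line, marked with two hash marks.
    handle.write("## Converted from: " + tool_name + "\n")
    # Header row, here with the optional leading hash.
    handle.write("#" + "\t".join(MAVIS_COLUMNS) + "\n")
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    for rec in records:
        writer.writerow([
            rec["chr_a"], rec["pos_a"], rec["pos_a"],
            rec["chr_b"], rec["pos_b"], rec["pos_b"],
        ])


buffer = io.StringIO()
write_mavis_input(
    [{"chr_a": "X", "pos_a": 1234, "chr_b": "X", "pos_b": 77965}],
    buffer,
    "example_tool",
)
print(buffer.getvalue())
```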