MAVIS standard input file format

These requirements pertain to the columns of input files from the various tools you want to merge. The input files should be tab-delimited text files. Comments may be included at the top of the file. Each comment line should begin with two hash marks; comment lines are ignored when the file is read.

## This is a comment

The header row contains the column names and is the first row following the comments (or the first row if no comments are included). The header row may optionally begin with a single hash, which is stripped when the file is read.

## This is a comment
## this is another comment
# this is the header row

A simple input file might look as follows:

## File created at: 2018-01-02
## Generated by: MAVIS v1.0.0
#break1_chromosome  break1_position_start   break1_position_end break2_chromosome break2_position_start break2_position_end
X   1234    1234    X   77965   77965
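
The reading rules described above (skip leading `##` comments, take the next row as the header, strip an optional leading hash) can be sketched in a few lines of Python. This is only an illustration of the file layout, not the reader MAVIS itself uses; the function name `read_mavis_style_table` is hypothetical.

```python
import csv
import io


def read_mavis_style_table(handle):
    """Return (header, rows) from a tab-delimited file with '##' comments.

    Hypothetical helper for illustration; not part of MAVIS.
    """
    lines = iter(handle)
    header_line = None
    for line in lines:
        if line.startswith("##"):
            continue  # leading comment lines are ignored
        header_line = line.rstrip("\r\n")
        break
    if header_line is None:
        raise ValueError("no header row found")
    # The header may optionally start with a single hash, which is stripped
    if header_line.startswith("#"):
        header_line = header_line[1:]
    header = [name.strip() for name in header_line.split("\t")]
    # Remaining lines are data rows; values are kept as strings
    reader = csv.DictReader(lines, fieldnames=header, delimiter="\t")
    rows = [dict(row) for row in reader]
    return header, rows


example = (
    "## File created at: 2018-01-02\n"
    "## Generated by: MAVIS v1.0.0\n"
    "#break1_chromosome\tbreak1_position_start\tbreak1_position_end\t"
    "break2_chromosome\tbreak2_position_start\tbreak2_position_end\n"
    "X\t1234\t1234\tX\t77965\t77965\n"
)
header, rows = read_mavis_style_table(io.StringIO(example))
```

Values are left as strings here; converting positions to integers is left to the caller.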

Required Columns

  • break1_chromosome
  • break1_position_start
  • break1_position_end
  • break2_chromosome
  • break2_position_start
  • break2_position_end

Optional Columns

Optional columns that are not given as input are added during the clustering stage of MAVIS, using either a default value or the corresponding command line parameter, because some of them are required by subsequent pipeline steps.

  • break1_strand (defaults to not-specified during clustering)
  • break1_orientation (expanded to all possible values during clustering)
  • break2_strand (defaults to not-specified during clustering)
  • break2_orientation (expanded to all possible values during clustering)
  • opposing_strands (expanded to all possible values during clustering)
  • stranded (defaults to False during clustering)
  • library (defaults to command line library parameter during clustering)
  • protocol (defaults to command line protocol parameter during clustering)
  • tools (defaults to an empty string during clustering)
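
The single-valued defaults in the list above can be illustrated with a small sketch. It is not how MAVIS implements clustering: the function name `fill_scalar_defaults` and the `NOT_SPECIFIED` token are assumptions made for this example, and the columns that are "expanded to all possible values" (the orientations and opposing_strands) are left out on purpose because they produce several rows rather than one default.

```python
# Placeholder token (assumption); the actual value MAVIS writes for a
# not-specified strand is not stated in this section.
NOT_SPECIFIED = "not-specified"


def fill_scalar_defaults(row, library, protocol):
    """Return a copy of `row` with missing single-valued optional columns added.

    `library` and `protocol` stand in for the command line parameters.
    Hypothetical helper for illustration; not part of MAVIS.
    """
    defaults = {
        "break1_strand": NOT_SPECIFIED,
        "break2_strand": NOT_SPECIFIED,
        "stranded": False,
        "library": library,
        "protocol": protocol,
        "tools": "",
    }
    filled = dict(row)
    for column, value in defaults.items():
        # Values already present in the input take precedence
        filled.setdefault(column, value)
    return filled


# Illustrative values only
filled = fill_scalar_defaults(
    {"break1_chromosome": "X", "library": "from_input"},
    library="from_command_line",
    protocol="genome",
)
```

A value supplied in the input file wins over the command line parameter in this sketch, which matches the description that defaults apply only to columns that were not given.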

Summary by Pipeline Step

The different pipeline steps of MAVIS have different input column requirements. These are summarized below (for the pipeline steps which can act as the pipeline start)

column name cluster annotate validate
break1_chromosome
break1_position_start
break1_position_end
break2_chromosome
break2_position_start
break2_position_end
break1_strand      
break1_orientation  
break2_strand      
break2_orientation  
opposing_strands      
stranded      
library      
protocol      
tools      
event_type    

Some native tool outputs are supported and have built-in methods to convert them to the above format. An unsupported tool can still be used as long as the user converts the tool's native output to match the above format.
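
As a rough sketch of such a manual conversion, the following Python writes records from an unsupported tool as a tab-delimited file using the six breakpoint columns from the simple example earlier in this document. The input field names (`chrA`, `posA`, `chrB`, `posB`) and the function `convert_rows` are hypothetical; a real tool's fields will differ, and any strand or orientation information it reports would need to be mapped as well.

```python
import csv
import io

MAVIS_COLUMNS = [
    "break1_chromosome",
    "break1_position_start",
    "break1_position_end",
    "break2_chromosome",
    "break2_position_start",
    "break2_position_end",
]


def convert_rows(tool_rows, out_handle):
    """Write hypothetical tool records in the tab-delimited layout shown above."""
    # Comment lines start with two hash marks and are ignored on read
    out_handle.write("## Converted from a hypothetical tool output\n")
    writer = csv.DictWriter(
        out_handle, fieldnames=MAVIS_COLUMNS, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    for rec in tool_rows:
        writer.writerow(
            {
                "break1_chromosome": rec["chrA"],
                # A single-position call gets start == end
                "break1_position_start": rec["posA"],
                "break1_position_end": rec["posA"],
                "break2_chromosome": rec["chrB"],
                "break2_position_start": rec["posB"],
                "break2_position_end": rec["posB"],
            }
        )


buffer = io.StringIO()
convert_rows([{"chrA": "X", "posA": 1234, "chrB": "X", "posB": 77965}], buffer)
converted = buffer.getvalue()
```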