Supported SV Callers¶
MAVIS supports output from a wide variety of SV callers. Assumptions are made for each tool based on interpretation of its output and its publication. The tools and versions currently supported are given below. The versions listed are those whose output files have been tested as input to MAVIS.
Name | Source | Version(s) | Output File(s) |
---|---|---|---|
Chimerascan [Iyer-2011] | code.google.com/archive/p/chimerascan | 0.4.5 | *.bedpe |
DeFuse [McPherson-2011] | bitbucket.org/dranew/defuse | 0.6.2 | results/results.classify.tsv |
DELLY [Rausch-2012] | github.com/dellytools/delly | 0.7.3 | |
Manta [Chen-2016] | github.com/Illumina/manta | 1.0.0 | {diploidSV,somaticSV}.vcf |
Pindel [Ye-2009] | github.com/genome/pindel | 0.2.5b9 | |
Trans-ABySS [Robertson-2010] | github.com/bcgsc/transabyss | 1.4.8 (custom) | fusions/*.tsv |
Note
The Trans-ABySS version used was an in-house development version. However, its output columns are compatible with 1.4.8, the version it was branched from.
DELLY Post-processing¶
Some post-processing of the DELLY output files is generally done to filter the calls before they are passed to MAVIS as input.
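The filtering criteria are not specified here. As a minimal sketch only, assuming the DELLY calls have already been converted to uncompressed VCF text and that keeping records marked PASS in the FILTER column is the desired filter (both are assumptions, and `filter_pass_records` is a hypothetical helper, not part of MAVIS or DELLY), the step could look like this:

```python
def filter_pass_records(vcf_lines):
    """Keep header lines and records whose FILTER column is PASS.

    The PASS criterion is an assumption made for this sketch.
    """
    kept = []
    for line in vcf_lines:
        line = line.rstrip('\n')
        if not line:
            continue  # skip blank lines
        if line.startswith('#'):
            kept.append(line)  # header lines are kept unchanged
            continue
        fields = line.split('\t')
        # the seventh VCF column (index 6) is FILTER
        if len(fields) > 6 and fields[6] == 'PASS':
            kept.append(line)
    return kept


# made-up records, for illustration only
example = [
    '##fileformat=VCFv4.2',
    '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO',
    '1\t1000\tDEL0001\tN\t<DEL>\t.\tPASS\tSVTYPE=DEL;END=2000',
    '1\t5000\tDEL0002\tN\t<DEL>\t.\tLowQual\tSVTYPE=DEL;END=5500',
]
filtered = filter_pass_records(example)
```

Any other criterion (for example, read support) would replace the FILTER check in the same place.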
Writing A Custom Conversion Script¶
Logic Example - Chimerascan¶
The following is a description of how the conversion script for Chimerascan was generated. While this is now a built-in conversion command, the logic could also have been put in an external script. As mentioned above, a number of assumptions had to be made about the tool's output to convert it to the standard MAVIS format. These assumptions were then verified by reviewing a series of called events in IGV. In the current example, the Chimerascan output has six columns of interest that were used in the conversion:
- start3p
- end3p
- strand3p
- start5p
- end5p
- strand5p
The above columns describe two segments which are joined. MAVIS requires the position of the join. It was assumed that the segments are always joined as a sense fusion. Using this assumption there are four logical cases to determine the position of the breakpoints.
For example, in the first case both strands are positive, so the end of the five-prime segment (end5p) is the first breakpoint and the start of the three-prime segment (start3p) is the second breakpoint.
The logic for all cases is shown in the code below.
    # SUPPORTED_TOOL, TRACKING_COLUMN and ORIENT are constants defined within MAVIS
    def _parse_chimerascan(row):
        """
        transforms the chimerascan output into the common format for expansion. Maps the input column
        names to column names which MAVIS can read
        """
        std_row = {}
        # carry the gene annotations over, prefixed with the tool name
        for retained_column in ['genes5p', 'genes3p']:
            if retained_column in row:
                std_row['{}_{}'.format(SUPPORTED_TOOL.CHIMERASCAN, retained_column)] = row[retained_column]
        # build a tracking ID from the cluster ID if the row does not already have one
        if TRACKING_COLUMN not in row:
            std_row[TRACKING_COLUMN] = '{}-{}'.format(SUPPORTED_TOOL.CHIMERASCAN, row['chimera_cluster_id'])

        std_row.update({'chr1': row['chrom5p'], 'chr2': row['chrom3p']})
        # first breakpoint: taken from the five-prime segment
        if row['strand5p'] == '+':
            std_row['pos1_start'] = row['end5p']
            std_row['orient1'] = ORIENT.LEFT
        else:
            std_row['pos1_start'] = row['start5p']
            std_row['orient1'] = ORIENT.RIGHT
        # second breakpoint: taken from the three-prime segment
        if row['strand3p'] == '+':
            std_row['pos2_start'] = row['start3p']
            std_row['orient2'] = ORIENT.RIGHT
        else:
            std_row['pos2_start'] = row['end3p']
            std_row['orient2'] = ORIENT.LEFT
        std_row['opposing_strands'] = row['strand5p'] != row['strand3p']
        return std_row
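To make the four strand cases concrete, the following stand-alone sketch applies the same breakpoint logic to one made-up pair of segments under each strand combination. It runs without MAVIS installed: `LEFT` and `RIGHT` are stand-ins for the MAVIS `ORIENT` constants (their values here are assumed), `chimerascan_breakpoints` is a hypothetical helper, and the coordinates are invented for illustration.

```python
# Stand-ins for the MAVIS orientation constants (values assumed for this sketch)
LEFT, RIGHT = 'L', 'R'


def chimerascan_breakpoints(row):
    """Return (pos1, orient1, pos2, orient2, opposing_strands) for one row."""
    # first breakpoint comes from the five-prime segment
    if row['strand5p'] == '+':
        pos1, orient1 = row['end5p'], LEFT
    else:
        pos1, orient1 = row['start5p'], RIGHT
    # second breakpoint comes from the three-prime segment
    if row['strand3p'] == '+':
        pos2, orient2 = row['start3p'], RIGHT
    else:
        pos2, orient2 = row['end3p'], LEFT
    return pos1, orient1, pos2, orient2, row['strand5p'] != row['strand3p']


# made-up segment coordinates: five-prime 100-200, three-prime 500-600
base = {'start5p': 100, 'end5p': 200, 'start3p': 500, 'end3p': 600}
cases = {}
for strand5p in '+-':
    for strand3p in '+-':
        row = dict(base, strand5p=strand5p, strand3p=strand3p)
        cases[(strand5p, strand3p)] = chimerascan_breakpoints(row)

for strands, result in sorted(cases.items()):
    print(strands, result)
```

Each printed line pairs a strand combination with the breakpoint positions and orientations chosen for it, which can be compared against the events reviewed in IGV.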
Calling A Custom Conversion Script¶
Custom conversion scripts can be specified during automatic config generation using the --external_conversion option.
Note
Any external conversion script must take a -o option which requires a single outputfile argument. This outputfile must be the converted file output by the script.
Additionally, the conversion script must be specified by its full path name and have executable permissions.
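A minimal sketch of a script that satisfies the -o requirement is shown below. Everything specific in it is an assumption: the input column names (`chrom_a`, `pos_a`, and so on) belong to a hypothetical caller, and the output columns simply reuse the names from the Chimerascan example above, so they should be checked against the input format of the MAVIS version in use.

```python
import argparse
import csv

# Output column names reused from the Chimerascan example (assumed, not verified
# against any particular MAVIS version)
OUTPUT_COLUMNS = [
    'chr1', 'pos1_start', 'orient1',
    'chr2', 'pos2_start', 'orient2',
    'opposing_strands',
]


def convert_row(row):
    """Map one row of a hypothetical caller's output onto the output columns."""
    return {
        'chr1': row['chrom_a'],
        'pos1_start': row['pos_a'],
        'orient1': row['orient_a'],
        'chr2': row['chrom_b'],
        'pos2_start': row['pos_b'],
        'orient2': row['orient_b'],
        'opposing_strands': row['strand_a'] != row['strand_b'],
    }


def main(argv=None):
    parser = argparse.ArgumentParser(description='convert caller output (sketch)')
    parser.add_argument('input', help='tab-delimited output of the caller')
    # the single output file argument required of external conversion scripts
    parser.add_argument('-o', dest='output', required=True, metavar='OUTPUTFILE')
    args = parser.parse_args(argv)

    with open(args.input, newline='') as src, open(args.output, 'w', newline='') as dest:
        reader = csv.DictReader(src, delimiter='\t')
        writer = csv.DictWriter(
            dest, fieldnames=OUTPUT_COLUMNS, delimiter='\t', lineterminator='\n'
        )
        writer.writeheader()
        for row in reader:
            writer.writerow(convert_row(row))
```

Saved as an executable file with a call to `main()` under the usual `if __name__ == '__main__':` guard, a script like this could be passed to --external_conversion by its full path.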
In the following example the user has created a custom conversion script, my_convert_script.py, to which they are passing an input file named my_input1.txt.
mavis config --external_conversion my_converted_input1 "my_convert_script.py my_input1.txt ... "
The script will then be called during the pipeline step as
my_convert_script.py my_input1.txt ... -o /path/to/output/dir/converted_inputs/my_converted_input1.tab
You can also re-use the same conversion script for multiple inputs by specifying a different alias for each
mavis config \
--external_conversion my_converted_input1 "my_convert_script.py my_input1.txt" \
--external_conversion my_converted_input2 "my_convert_script.py my_input2.txt"
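When there are many inputs, the repeated option can be generated rather than typed. The sketch below only assembles and prints the command line using the --external_conversion option described above. `build_config_args` is a hypothetical helper, and any other options that `mavis config` needs are not shown.

```python
import shlex


def build_config_args(conversions):
    """Build the argument list for `mavis config` from (alias, command) pairs."""
    args = ['mavis', 'config']
    for alias, command in conversions:
        # each conversion contributes the option, its alias, and the quoted command
        args.extend(['--external_conversion', alias, command])
    return args


conversions = [
    ('my_converted_input1', 'my_convert_script.py my_input1.txt'),
    ('my_converted_input2', 'my_convert_script.py my_input2.txt'),
]
args = build_config_args(conversions)

# shlex.quote keeps each conversion command together as a single shell argument
print(' '.join(shlex.quote(arg) for arg in args))
```

Keeping the arguments as a list until the last moment avoids quoting mistakes when a conversion command contains spaces.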