Glossary
General Terms
- 2bit
- File format specification. See https://genome.ucsc.edu/FAQ/FAQformat#format7.
- bam
- File format specification. See https://genome.ucsc.edu/FAQ/FAQformat#format5.1.
- bed
- File format specification. See https://genome.ucsc.edu/FAQ/FAQformat#format1.
- blat
- Alignment tool. See https://genome.ucsc.edu/FAQ/FAQblat.html.
- breakpoint
- A breakpoint is a genomic position (interval) on some reference/template/chromosome which has a strand and orientation. The orientation describes the portion of the reference that is retained.
- breakpoint pair
- Basic definition of a structural variant. Does not automatically imply a classification/type.
- BWA
- BWA is an alignment tool. See https://github.com/lh3/bwa.
- event
- Used interchangeably with structural variant.
- event type
- Classification for a structural variant. See event_type.
- fasta
- File format specification. See https://genome.ucsc.edu/FAQ/FAQformat#format18.
- flanking read pair
- A pair of reads where one read maps to one side of a set of breakpoints and its mate maps to the other.
- half-mapped read
- A read whose mate is unaligned. Generally this refers to reads in the evidence stage that are mapped next to a breakpoint.
- HGVS
- Community-based standard of recommendations for variant notation. See http://varnomen.hgvs.org/
- IGV
- Integrative Genomics Viewer is a visualization tool. See http://software.broadinstitute.org/software/igv.
- IGV batch file
- A file format defined by IGV. See running IGV with a batch file.
- JSON
- JSON (JavaScript Object Notation) is a data file format. See https://www.w3schools.com/js/js_json_intro.asp.
- psl
- File format specification. See https://genome.ucsc.edu/FAQ/FAQformat#format2.
- pslx
- Extended format of a psl.
- SGE
- Sun Grid Engine (SGE) is a job scheduling system for cluster management. See http://star.mit.edu/cluster/docs/0.93.3/guides/sge.html.
- SLURM
- SLURM is a job scheduling system for cluster management. See https://slurm.schedmd.com/quickstart.html.
- spanning read
- A read which spans both breakpoints. Applies primarily to small structural variants.
- split read
- A read which aligns next to a breakpoint and is soft-clipped on one or both sides.
- structural variant
- A genomic alteration that can be described by a pair of breakpoints and an event type. The two breakpoints represent regions in the genome that are broken apart and reattached together.
- SVG
- SVG (Scalable Vector Graphics) is an image format. See https://www.w3schools.com/graphics/svg_intro.asp.
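The breakpoint and breakpoint pair definitions above can be pictured as a small data structure. The following is a minimal sketch only: the class names, field names, and the 'L'/'R' and '+'/'-' encodings are assumptions made for illustration and are not taken from the MAVIS source.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Breakpoint:
    """Hypothetical container for a breakpoint (not the MAVIS class)."""

    chromosome: str
    start: int        # inclusive, 1-based
    end: int          # inclusive, 1-based
    orientation: str  # assumed encoding: 'L' or 'R', the side of the reference retained
    strand: str       # assumed encoding: '+' or '-'

    def __len__(self) -> int:
        # A breakpoint is an interval, so it has a length of at least 1.
        return self.end - self.start + 1


@dataclass(frozen=True)
class BreakpointPair:
    """Two breakpoints; the event type is optional, as a pair alone implies no classification."""

    break1: Breakpoint
    break2: Breakpoint
    event_type: Optional[str] = None

    @property
    def interchromosomal(self) -> bool:
        return self.break1.chromosome != self.break2.chromosome

    @property
    def opposing_strands(self) -> bool:
        return self.break1.strand != self.break2.strand


pair = BreakpointPair(
    Breakpoint("1", 100, 110, "L", "+"),
    Breakpoint("2", 500, 500, "R", "-"),
)
print(pair.interchromosomal, pair.opposing_strands, pair.event_type)
```

Leaving `event_type` unset mirrors the glossary: a breakpoint pair becomes a structural variant only once a classification is attached.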
Configurable Settings
- aligner
SUPPORTED_ALIGNER
- The aligner to use to map the contigs/reads back to the reference, e.g. blat or bwa. The corresponding environment variable is MAVIS_ALIGNER and the default value is 'blat'. Accepted values include: 'bwa mem', 'blat'
- annotation_filters
str
- A comma-separated list of filters to apply to putative annotations. The corresponding environment variable is MAVIS_ANNOTATION_FILTERS and the default value is 'choose_more_annotated,choose_transcripts_by_priority'
- annotation_memory
int
- Default memory limit (MB) for the annotation stage. The corresponding environment variable is MAVIS_ANNOTATION_MEMORY and the default value is 12000
- assembly_include_flanking_pairs
bool
- If true then when the split reads are assembled, any flanking read pairs will also be added. The corresponding environment variable is MAVIS_ASSEMBLY_INCLUDE_FLANKING_PAIRS and the default value is True
- assembly_include_half_mapped_reads
bool
- If true then when the split reads are assembled, any half-mapped read mates will also be added. The corresponding environment variable is MAVIS_ASSEMBLY_INCLUDE_HALF_MAPPED_READS and the default value is True
- assembly_max_kmer_size
int
- The smaller of this value and the length of the shortest input sequence is used as the kmer size for assembling the De Bruijn graph. If this is not set (any value less than 0 is considered not set), the default is 75% of the length of the shortest input sequence. The corresponding environment variable is MAVIS_ASSEMBLY_MAX_KMER_SIZE and the default value is -1
- assembly_max_kmer_strict
bool
- If true then any sequences input to the assembly algorithm that cannot create a kmer of this size will be discarded. If false, then the kmer size will be reduced to the minimum input and all input sequences will be used in the assembly algorithm. The corresponding environment variable is MAVIS_ASSEMBLY_MAX_KMER_STRICT and the default value is True
- assembly_max_paths
int
- The maximum number of paths to resolve. This limits the work done when there is a messy assembly graph to resolve. The assembly will pre-calculate the number of paths (or putative assemblies) and stop if it is greater than the given setting. The corresponding environment variable is MAVIS_ASSEMBLY_MAX_PATHS and the default value is 4
- assembly_min_edge_weight
int
- Discards all edges with a weight/frequency less than this from the De Bruijn graph. The corresponding environment variable is MAVIS_ASSEMBLY_MIN_EDGE_WEIGHT and the default value is 2
- assembly_min_exact_match_to_remap
int
- The minimum length of exact matches to initiate remapping a read to a contig. The corresponding environment variable is MAVIS_ASSEMBLY_MIN_EXACT_MATCH_TO_REMAP and the default value is 15
- assembly_min_nc_edge_weight
int
- Discards all non-cutting edges with a weight/frequency less than this from the De Bruijn graph. The corresponding environment variable is MAVIS_ASSEMBLY_MIN_NC_EDGE_WEIGHT and the default value is 4
- assembly_min_remap_coverage
float_fraction
- Minimum fraction of the contig sequence which the remapped sequences must align over. The corresponding environment variable is MAVIS_ASSEMBLY_MIN_REMAP_COVERAGE and the default value is 0.9
- assembly_min_remapped_seq
int
- The minimum number of input sequences that must remap for an assembled contig to be used. The corresponding environment variable is MAVIS_ASSEMBLY_MIN_REMAPPED_SEQ and the default value is 3
- assembly_min_tgt_to_exclude_half_map
int
- The minimum number of split reads aligning to both breakpoints in order to exclude half-mapped reads from the assembly input. The corresponding environment variable is MAVIS_ASSEMBLY_MIN_TGT_TO_EXCLUDE_HALF_MAP and the default value is 7
- assembly_min_uniq
float_fraction
- Minimum percent uniqueness required to keep assembled contigs separate. If contigs are more similar than this, the lower-scoring contig is dropped. The corresponding environment variable is MAVIS_ASSEMBLY_MIN_UNIQ and the default value is 0.01
- assembly_strand_concordance
float_fraction
- When the numbers of remapped reads from each strand are compared, the ratio must be above this number to decide on the strand. The corresponding environment variable is MAVIS_ASSEMBLY_STRAND_CONCORDANCE and the default value is 0.51
- blat_limit_top_aln
int
- Number of results to return from blat (ranking based on score). The corresponding environment variable is MAVIS_BLAT_LIMIT_TOP_ALN and the default value is 10
- blat_min_identity
float_fraction
- The minimum percent identity match required for blat results when aligning contigs. The corresponding environment variable is MAVIS_BLAT_MIN_IDENTITY and the default value is 0.9
- breakpoint_color
str
- Breakpoint outline color. The corresponding environment variable is MAVIS_BREAKPOINT_COLOR and the default value is '#000000'
- call_error
int
- Buffer zone for the evidence window. The corresponding environment variable is MAVIS_CALL_ERROR and the default value is 10
- cluster_initial_size_limit
int
- The maximum cumulative size of both breakpoints for breakpoint pairs to be used in the initial clustering phase (combining based on overlap). The corresponding environment variable is MAVIS_CLUSTER_INITIAL_SIZE_LIMIT and the default value is 25
- cluster_radius
int
- Maximum distance allowed between paired breakpoint pairs. The corresponding environment variable is MAVIS_CLUSTER_RADIUS and the default value is 100
- contig_aln_max_event_size
int
- Relates to determining breakpoints when pairing contig alignments. For any given read in a putative pair, the soft clipping is extended to include any events of greater than this size. The soft clipping is added to the side of the alignment as indicated by the breakpoint we are assigning pairs to. The corresponding environment variable is MAVIS_CONTIG_ALN_MAX_EVENT_SIZE and the default value is 50
- contig_aln_merge_inner_anchor
int
- The minimum number of consecutive exact match base pairs to not merge events within a contig alignment. The corresponding environment variable is MAVIS_CONTIG_ALN_MERGE_INNER_ANCHOR and the default value is 20
- contig_aln_merge_outer_anchor
int
- Minimum consecutively aligned exact matches to anchor an end for merging internal events. The corresponding environment variable is MAVIS_CONTIG_ALN_MERGE_OUTER_ANCHOR and the default value is 15
- contig_aln_min_anchor_size
int
- The minimum number of aligned bases for a contig (M or =) in order to simplify. The bases do not have to be consecutive. The corresponding environment variable is MAVIS_CONTIG_ALN_MIN_ANCHOR_SIZE and the default value is 50
- contig_aln_min_extend_overlap
int
- Minimum number of bases the query coverage interval must be extended by in order to pair alignments as a single split alignment. The corresponding environment variable is MAVIS_CONTIG_ALN_MIN_EXTEND_OVERLAP and the default value is 10
- contig_aln_min_query_consumption
float_fraction
- Minimum fraction of the original query sequence that must be used by the read(s) of the alignment. The corresponding environment variable is MAVIS_CONTIG_ALN_MIN_QUERY_CONSUMPTION and the default value is 0.9
- contig_aln_min_score
float_fraction
- Minimum score for a contig to be used as evidence in a call by contig. The corresponding environment variable is MAVIS_CONTIG_ALN_MIN_SCORE and the default value is 0.9
- contig_call_distance
int
- The maximum distance allowed between breakpoint pairs (called by contig) in order for them to pair. The corresponding environment variable is MAVIS_CONTIG_CALL_DISTANCE and the default value is 0
- domain_color
str
- Domain fill color. The corresponding environment variable is MAVIS_DOMAIN_COLOR and the default value is '#ccccb3'
- domain_mismatch_color
str
- Domain fill color on 0% match. The corresponding environment variable is MAVIS_DOMAIN_MISMATCH_COLOR and the default value is '#b2182b'
- domain_name_regex_filter
str
- The regular expression used to select domains to be displayed (filtered by name). The corresponding environment variable is MAVIS_DOMAIN_NAME_REGEX_FILTER and the default value is '^PF\\d+$'
- domain_scaffold_color
str
- The color of the domain scaffold. The corresponding environment variable is MAVIS_DOMAIN_SCAFFOLD_COLOR and the default value is '#000000'
- draw_fusions_only
bool
- Flag to indicate if events which do not produce a fusion transcript should produce illustrations. The corresponding environment variable is MAVIS_DRAW_FUSIONS_ONLY and the default value is True
- draw_non_synonymous_cdna_only
bool
- Flag to indicate if events which are synonymous at the cdna level should produce illustrations. The corresponding environment variable is MAVIS_DRAW_NON_SYNONYMOUS_CDNA_ONLY and the default value is True
- drawing_width_iter_increase
int
- The amount (in pixels) by which to increase the drawing width upon failure to fit. The corresponding environment variable is MAVIS_DRAWING_WIDTH_ITER_INCREASE and the default value is 500
- fetch_min_bin_size
int
- The minimum size of any bin for reading from a bam file. Increasing this number will result in smaller bins being merged or fewer bins being created (depending on the fetch method). The corresponding environment variable is MAVIS_FETCH_MIN_BIN_SIZE and the default value is 50
- fetch_reads_bins
int
- Number of bins to split an evidence window into to ensure more even sampling of high coverage regions. The corresponding environment variable is MAVIS_FETCH_READS_BINS and the default value is 5
- fetch_reads_limit
int
- Maximum number of reads (cap) to loop over for any given evidence window. The corresponding environment variable is MAVIS_FETCH_READS_LIMIT and the default value is 3000
- filter_cdna_synon
bool
- Filter all annotations synonymous at the cdna level. The corresponding environment variable is MAVIS_FILTER_CDNA_SYNON and the default value is True
- filter_min_flanking_reads
int
- Minimum number of flanking pairs for a call by flanking pairs. The corresponding environment variable is MAVIS_FILTER_MIN_FLANKING_READS and the default value is 10
- filter_min_linking_split_reads
int
- Minimum number of linking split reads for a call by split reads. The corresponding environment variable is MAVIS_FILTER_MIN_LINKING_SPLIT_READS and the default value is 1
- filter_min_remapped_reads
int
- Minimum number of remapped reads for a call by contig. The corresponding environment variable is MAVIS_FILTER_MIN_REMAPPED_READS and the default value is 5
- filter_min_spanning_reads
int
- Minimum number of spanning reads for a call by spanning reads. The corresponding environment variable is MAVIS_FILTER_MIN_SPANNING_READS and the default value is 5
- filter_min_split_reads
int
- Minimum number of split reads for a call by split reads. The corresponding environment variable is MAVIS_FILTER_MIN_SPLIT_READS and the default value is 5
- filter_protein_synon
bool
- Filter all annotations synonymous at the protein level. The corresponding environment variable is MAVIS_FILTER_PROTEIN_SYNON and the default value is True
- filter_secondary_alignments
bool
- Filter secondary alignments when gathering read evidence. The corresponding environment variable is MAVIS_FILTER_SECONDARY_ALIGNMENTS and the default value is True
- flanking_call_distance
int
- The maximum distance allowed between breakpoint pairs (called by flanking pairs) in order for them to pair. The corresponding environment variable is MAVIS_FLANKING_CALL_DISTANCE and the default value is 0
- fuzzy_mismatch_number
int
- The number of events/mismatches allowed to be considered a fuzzy match. The corresponding environment variable is MAVIS_FUZZY_MISMATCH_NUMBER and the default value is 1
- gene1_color
str
- The color of genes near the first gene. The corresponding environment variable is MAVIS_GENE1_COLOR and the default value is '#657e91'
- gene1_color_selected
str
- The color of the first gene. The corresponding environment variable is MAVIS_GENE1_COLOR_SELECTED and the default value is '#518dc5'
- gene2_color
str
- The color of genes near the second gene. The corresponding environment variable is MAVIS_GENE2_COLOR and the default value is '#325556'
- gene2_color_selected
str
- The color of the second gene. The corresponding environment variable is MAVIS_GENE2_COLOR_SELECTED and the default value is '#4c9677'
- import_env
bool
- Flag to import environment variables. The corresponding environment variable is MAVIS_IMPORT_ENV and the default value is True
- input_call_distance
int
- The maximum distance allowed between breakpoint pairs (called by input tools, not validated) in order for them to pair. The corresponding environment variable is MAVIS_INPUT_CALL_DISTANCE and the default value is 5
- label_color
str
- The label color. The corresponding environment variable is MAVIS_LABEL_COLOR and the default value is '#000000'
- limit_to_chr
ChrListString
- A semicolon-delimited list of chromosome names to use. Breakpoint pairs on other chromosomes will be filtered out. For example, ‘1;2;3;4’ would filter out events/breakpoint pairs on any chromosomes but 1, 2, 3, and 4. The corresponding environment variable is MAVIS_LIMIT_TO_CHR and the default value is ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19', '20', '21', '22', 'X', 'Y']
- mask_fill
str
- Color of mask (for deleted region etc.). The corresponding environment variable is MAVIS_MASK_FILL and the default value is '#ffffff'
- mask_opacity
float_fraction
- Opacity of the mask layer. The corresponding environment variable is MAVIS_MASK_OPACITY and the default value is 0.7
- max_drawing_retries
int
- The maximum number of retries for attempting a drawing. On each iteration the width is extended. If it is still insufficient after this number of retries, a gene-level only drawing will be output. The corresponding environment variable is MAVIS_MAX_DRAWING_RETRIES and the default value is 3
- max_files
int
- The maximum number of files to output from clustering/splitting. The corresponding environment variable is MAVIS_MAX_FILES and the default value is 200
- max_orf_cap
int
- The maximum number of ORFs to return (best putative ORFs will be retained). The corresponding environment variable is MAVIS_MAX_ORF_CAP and the default value is 3
- max_proximity
int
- The maximum distance away from an annotation before the region is considered to be uninformative. The corresponding environment variable is MAVIS_MAX_PROXIMITY and the default value is 5000
- max_sc_preceeding_anchor
int
- When remapping a soft-clipped read this determines the amount of soft clipping allowed on the side opposite of where we expect it. For example, for a soft-clipped read on a breakpoint with a left orientation this limits the amount of soft clipping that is allowed on the right. If this is set to none then there is no limit on soft clipping. The corresponding environment variable is MAVIS_MAX_SC_PRECEEDING_ANCHOR and the default value is 6
- memory_limit
int
- The maximum number of megabytes (MB) any given job is allowed. The corresponding environment variable is MAVIS_MEMORY_LIMIT and the default value is 16000
- min_anchor_exact
int
- Applies to re-aligning soft-clipped reads to the opposing breakpoint. The minimum number of consecutive exact matches to anchor a read to initiate targeted realignment. The corresponding environment variable is MAVIS_MIN_ANCHOR_EXACT and the default value is 6
- min_anchor_fuzzy
int
- Applies to re-aligning soft-clipped reads to the opposing breakpoint. The minimum length of a fuzzy match to anchor a read to initiate targeted realignment. The corresponding environment variable is MAVIS_MIN_ANCHOR_FUZZY and the default value is 10
- min_anchor_match
float_fraction
- Minimum percent match for a read to be kept as evidence. The corresponding environment variable is MAVIS_MIN_ANCHOR_MATCH and the default value is 0.9
- min_clusters_per_file
int
- The minimum number of breakpoint pairs to output to a file. The corresponding environment variable is MAVIS_MIN_CLUSTERS_PER_FILE and the default value is 50
- min_domain_mapping_match
float_fraction
- A number between 0 and 1 representing the minimum percent match a domain must map to the fusion transcript to be displayed. The corresponding environment variable is MAVIS_MIN_DOMAIN_MAPPING_MATCH and the default value is 0.9
- min_double_aligned_to_estimate_insertion_size
int
- The minimum number of reads which map soft-clipped to both breakpoints to assume the size of the untemplated sequence between the breakpoints is at most the read length - 2 * min_softclipping. The corresponding environment variable is MAVIS_MIN_DOUBLE_ALIGNED_TO_ESTIMATE_INSERTION_SIZE and the default value is 2
- min_flanking_pairs_resolution
int
- The minimum number of flanking reads required to call a breakpoint by flanking evidence. The corresponding environment variable is MAVIS_MIN_FLANKING_PAIRS_RESOLUTION and the default value is 10
- min_linking_split_reads
int
- The minimum number of split reads which aligned to both breakpoints. The corresponding environment variable is MAVIS_MIN_LINKING_SPLIT_READS and the default value is 2
- min_mapping_quality
int
- The minimum mapping quality of reads to be used as evidence. The corresponding environment variable is MAVIS_MIN_MAPPING_QUALITY and the default value is 5
- min_non_target_aligned_split_reads
int
- The minimum number of split reads aligned to a breakpoint by the input bam, and not forced by local alignment to the target region, to call a breakpoint by split read evidence. The corresponding environment variable is MAVIS_MIN_NON_TARGET_ALIGNED_SPLIT_READS and the default value is 1
- min_orf_size
int
- The minimum length (in amino acids) to retain a putative open reading frame (ORF). The corresponding environment variable is MAVIS_MIN_ORF_SIZE and the default value is 300
- min_sample_size_to_apply_percentage
int
- Minimum number of aligned bases to compute a match percent. If there are fewer than this number of aligned bases (match or mismatch) the percent comparator is not used. The corresponding environment variable is MAVIS_MIN_SAMPLE_SIZE_TO_APPLY_PERCENTAGE and the default value is 10
- min_softclipping
int
- Minimum number of soft-clipped bases required for a read to be used as soft-clipped evidence. The corresponding environment variable is MAVIS_MIN_SOFTCLIPPING and the default value is 6
- min_spanning_reads_resolution
int
- Minimum number of spanning reads required to call an event by spanning evidence. The corresponding environment variable is MAVIS_MIN_SPANNING_READS_RESOLUTION and the default value is 5
- min_splits_reads_resolution
int
- Minimum number of split reads required to call a breakpoint by split reads. The corresponding environment variable is MAVIS_MIN_SPLITS_READS_RESOLUTION and the default value is 3
- novel_exon_color
str
- Novel exon fill color. The corresponding environment variable is MAVIS_NOVEL_EXON_COLOR and the default value is '#000000'
- outer_window_min_event_size
int
- The minimum size of an event in order for flanking read evidence to be collected. The corresponding environment variable is MAVIS_OUTER_WINDOW_MIN_EVENT_SIZE and the default value is 125
- queue
str
- The queue jobs are to be submitted to. The corresponding environment variable is MAVIS_QUEUE and the default value is ''
- scaffold_color
str
- The color used for the gene/transcripts scaffolds. The corresponding environment variable is MAVIS_SCAFFOLD_COLOR and the default value is '#000000'
- scheduler
SCHEDULER
- The scheduler being used. The corresponding environment variable is MAVIS_SCHEDULER and the default value is 'SLURM'. Accepted values include: 'SGE', 'SLURM'
- spanning_call_distance
int
- The maximum distance allowed between breakpoint pairs (called by spanning reads) in order for them to pair. The corresponding environment variable is MAVIS_SPANNING_CALL_DISTANCE and the default value is 5
- splice_color
str
- Splicing lines color. The corresponding environment variable is MAVIS_SPLICE_COLOR and the default value is '#000000'
- split_call_distance
int
- The maximum distance allowed between breakpoint pairs (called by split reads) in order for them to pair. The corresponding environment variable is MAVIS_SPLIT_CALL_DISTANCE and the default value is 10
- stdev_count_abnormal
float
- The number of standard deviations away from the normal considered expected and therefore not qualifying as flanking reads. The corresponding environment variable is MAVIS_STDEV_COUNT_ABNORMAL and the default value is 3.0
- strand_determining_read
int
- 1 or 2. The read in the pair which determines if (assuming a stranded protocol) the first or second read in the pair matches the strand sequenced. The corresponding environment variable is MAVIS_STRAND_DETERMINING_READ and the default value is 2
- time_limit
int
- The time in seconds any given job is allowed. The corresponding environment variable is MAVIS_TIME_LIMIT and the default value is 57600
- trans_validation_memory
int
- Default memory limit (MB) for the validation stage (for transcriptomes). The corresponding environment variable is MAVIS_TRANS_VALIDATION_MEMORY and the default value is 18000
- uninformative_filter
bool
- Flag that determines if breakpoint pairs which are not within max_proximity to any annotations are filtered out prior to clustering. The corresponding environment variable is MAVIS_UNINFORMATIVE_FILTER and the default value is True
- validation_memory
int
- Default memory limit (MB) for the validation stage. The corresponding environment variable is MAVIS_VALIDATION_MEMORY and the default value is 16000
- width
int
- The drawing width in pixels. The corresponding environment variable is MAVIS_WIDTH and the default value is 1000
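Every setting in this section follows the same pattern: the environment variable is the setting name upper-cased with a MAVIS_ prefix, and a default applies when the variable is absent. The sketch below illustrates that lookup together with the kmer size rule described for assembly_max_kmer_size. It is not MAVIS code: the function names are hypothetical, the boolean parsing rule is an assumption, and truncating 75% of the shortest length to an integer is also an assumption.

```python
import os


def env_name(setting):
    """Environment variable name for a setting, e.g. 'width' -> 'MAVIS_WIDTH'."""
    return "MAVIS_" + setting.upper()


def parse_bool(text):
    # Assumed parsing rule for bool settings; the real accepted spellings may differ.
    return text.strip().lower() in ("1", "true", "t", "yes", "y")


def resolve_setting(setting, default, cast=str, environ=None):
    """Return the cast environment value if present, otherwise the default."""
    environ = os.environ if environ is None else environ
    raw = environ.get(env_name(setting))
    return default if raw is None else cast(raw)


def kmer_size(assembly_max_kmer_size, sequence_lengths):
    """Kmer size rule as described for assembly_max_kmer_size."""
    shortest = min(sequence_lengths)
    if assembly_max_kmer_size < 0:
        # Not set: fall back to 75% of the shortest input (integer truncation assumed).
        return int(shortest * 0.75)
    return min(assembly_max_kmer_size, shortest)


fake_env = {"MAVIS_ANNOTATION_MEMORY": "8000", "MAVIS_LIMIT_TO_CHR": "1;2;X"}
print(resolve_setting("annotation_memory", 12000, int, fake_env))
print(resolve_setting("width", 1000, int, fake_env))
print(resolve_setting("limit_to_chr", ["1"], lambda s: s.split(";"), fake_env))
print(kmer_size(-1, [100, 80]), kmer_size(50, [100, 80]))
```

Passing a plain dictionary as `environ` keeps the example independent of the real process environment.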
Column Names
List of column names and their definitions. The types indicated here are the expected types in a row for a given column name.
- annotation_figure
FILEPATH
- File path to the SVG drawing representing the annotation
- annotation_figure_legend
JSON
- JSON data for the figure legend
- annotation_id
- Identifier for the annotation step
- break1_chromosome
str
- The name of the chromosome on which breakpoint 1 is situated
- break1_ewindow
int-int
- Window where evidence was gathered for the first breakpoint
- break1_ewindow_count
int
- Number of reads processed/looked-at in the first evidence window
- break1_ewindow_practical_coverage
float
- break1_ewindow_count / len(break1_ewindow). Not the actual coverage, as bins are sampled within the window and there is a read limit cutoff
- break1_homologous_seq
str
- Sequence in common at the first breakpoint and other side of the second breakpoint
- break1_orientation
ORIENT
- The side of the breakpoint wrt the positive/forward strand that is retained.
- break1_position_end
int
- End (inclusive, 1-based integer) of the range representing breakpoint 1
- break1_position_start
int
- Start (inclusive, 1-based integer) of the range representing breakpoint 1
- break1_seq
str
- The sequence up to and including the breakpoint. Always given wrt the positive/forward strand
- break1_split_reads
int
- Number of split reads that call the exact breakpoint given
- break1_split_reads_forced
int
- Number of split reads which were aligned to the opposite breakpoint window using a targeted alignment
- break1_strand
STRAND
- The strand wrt the reference positive/forward strand at this breakpoint.
- break2_chromosome
- The name of the chromosome on which breakpoint 2 is situated
- break2_ewindow
int-int
- Window where evidence was gathered for the second breakpoint
- break2_ewindow_count
int
- Number of reads processed/looked-at in the second evidence window
- break2_ewindow_practical_coverage
float
- break2_ewindow_count / len(break2_ewindow). Not the actual coverage, as bins are sampled within the window and there is a read limit cutoff
- break2_homologous_seq
str
- Sequence in common at the second breakpoint and other side of the first breakpoint
- break2_orientation
ORIENT
- The side of the breakpoint wrt the positive/forward strand that is retained.
- break2_position_end
int
- End (inclusive, 1-based integer) of the range representing breakpoint 2
- break2_position_start
int
- Start (inclusive, 1-based integer) of the range representing breakpoint 2
- break2_seq
str
- The sequence up to and including the breakpoint. Always given wrt the positive/forward strand
- break2_split_reads
int
- Number of split reads that call the exact breakpoint given
- break2_split_reads_forced
int
- Number of split reads which were aligned to the opposite breakpoint window using a targeted alignment
- break2_strand
STRAND
- The strand wrt the reference positive/forward strand at this breakpoint.
- call_method
CALL_METHOD
- The method used to call the breakpoints
- cdna_synon
- Semicolon-delimited list of transcript ids which have an identical cdna sequence to the cdna sequence of the current fusion product
- cluster_id
- Identifier for the merging/clustering step
- cluster_size
int
- The number of breakpoint pair calls that were grouped in creating the cluster
- contig_alignment_cigar
- The cigar string(s) representing the contig alignment. Semi-colon delimited
- contig_alignment_query_name
- The query name for the contig alignment. Should match the ‘read’ name(s) in the .contigs.bam output file
- contig_alignment_reference_start
- The reference start(s) <chr>:<position> of the contig alignment. Semi-colon delimited
- contig_alignment_score
float
- A rank, based on the alignment tool (blat etc.), of the alignment being used. An average if split alignments were used. Lower numbers indicate a better alignment. If it was the best alignment possible then this would be zero.
- contig_build_score
int
- Score representing the edge weights of all edges used in building the sequence
- contig_remap_coverage
float
- Fraction of the contig sequence which is covered by the remapped reads
- contig_remap_score
float
- Score representing the number of sequences from the set of sequences given to the assembly algorithm that were aligned to the resulting contig with an acceptable scoring based on user-set thresholds. For any sequence its contribution to the score is divided by the number of mappings to give less weight to multimaps
- contig_remapped_read_names
- Read query names for the reads that were remapped. A -1 or -2 has been appended to the end of the name to indicate if this is the first or second read in the pair
- contig_remapped_reads
int
- The number of reads from the input bam that map to the assembled contig
- contig_seq
str
- Sequence of the current contig wrt the positive/forward strand if not strand specific
- contig_strand_specific
bool
- A flag to indicate if it was possible to resolve the strand for this contig
- contigs_aligned
int
- Number of contigs that were able to align
- contigs_assembled
int
- Number of contigs that were built from split read sequences
- event_type
SVTYPE
- The classification of the event
- flanking_median_fragment_size
int
- The median fragment size of the flanking reads being used as evidence
- flanking_pairs
int
- Number of read-pairs where one read aligns to the first breakpoint window and the second read aligns to the other. The count here is based on the number of unique query names
- flanking_pairs_compatible
int
- Number of flanking pairs of a compatible orientation type. This applies to insertions and duplications. Flanking pairs supporting an insertion will be compatible to a duplication and flanking pairs supporting a duplication will be compatible to an insertion (possibly indicating an internal translocation)
- flanking_stdev_fragment_size
float
- The standard deviation in fragment size of the flanking reads being used as evidence
- fusion_cdna_coding_end
int
- Position wrt the 5’ end of the fusion transcript where coding ends (last base of the stop codon)
- fusion_cdna_coding_start
int
- Position wrt the 5’ end of the fusion transcript where coding begins (first base of the Met amino acid).
- fusion_mapped_domains
JSON
- List of domains in JSON format where each domain's start and end positions are given wrt the fusion transcript and the mapping quality is the number of matching amino acid positions over the total number of amino acids. The sequence is the amino acid sequence of the domain on the reference/original transcript
- fusion_protein_hgvs
str
- Describes the fusion protein in HGVS notation. Will be None if the change is not an indel or is synonymous
- fusion_sequence_fasta_file
FILEPATH
- Path to the corresponding fasta output file
- fusion_sequence_fasta_id
- The sequence identifier for the cdna sequence output fasta file
- fusion_splicing_pattern
SPLICE_TYPE
- Type of splicing pattern used to create the fusion cDNA.
- gene1
- Gene for the current annotation at the first breakpoint
- gene1_aliases
- Other gene names associated with the current annotation at the first breakpoint
- gene1_direction
PRIME
- The direction/prime of the gene
- gene2
- Gene for the current annotation at the second breakpoint
- gene2_aliases
- Other gene names associated with the current annotation at the second breakpoint
- gene2_direction
PRIME
- The direction/prime of the gene
- gene_product_type
GENE_PRODUCT_TYPE
- Describes if the putative fusion product will be sense or anti-sense
- genes_encompassed
- Applies to intrachromosomal events only. List of genes which overlap any region that occurs between both breakpoints. For example in a deletion event these would be deleted genes.
- genes_overlapping_break1
- List of genes which overlap the first breakpoint
- genes_overlapping_break2
- List of genes which overlap the second breakpoint
- genes_proximal_to_break1
- List of genes near the first breakpoint and the distance away from the breakpoint
- genes_proximal_to_break2
- List of genes near the second breakpoint and the distance away from the breakpoint
- inferred_pairing
- A semicolon-delimited list of event identifiers, i.e. <annotation_id>_<splicing pattern>_<cds start>_<cds end>, which were paired to the current event based on predicted products
- library
- Identifier for the library/source
- linking_split_reads
int
- Number of split reads that align to both breakpoints
- net_size
int-int
- The net size of an event. For translocations and inversions this will always be 0. For indels it will be negative for deletions and positive for insertions. It is a range to accommodate non-specific events.
- opposing_strands
bool
- Specifies if breakpoints are on opposite strands wrt the reference. Expects a boolean
- pairing
- A semicolon-delimited list of event identifiers, i.e. <annotation_id>_<splicing pattern>_<cds start>_<cds end>, which were paired to the current event based on breakpoint positions
- product_id
- Unique identifier of the final fusion including splicing and ORF decision from the annotation step
- protein_synon
- Semicolon-delimited list of transcript ids which produce a translation with an identical amino-acid sequence to the current fusion product
- protocol
PROTOCOL
- Specifies the type of library
- raw_break1_split_reads
int
- Number of split reads before calling the breakpoint
- raw_break2_split_reads
int
- Number of split reads before calling the breakpoint
- raw_flanking_pairs
int
- Number of flanking reads before calling the breakpoint. The count here is based on the number of unique query names
- raw_spanning_reads
int
- Number of spanning reads collected during evidence collection before calling the breakpoint
- spanning_read_names
- Read query names of the spanning reads which support the current event
- spanning_reads
int
- The number of spanning reads which support the event
- stranded
bool
- Specifies if the sequencing protocol was strand specific or not. Expects a boolean
- tools
- The tools that called the event originally from the cluster step. Should be a semi-colon delimited list of <tool name>_<tool version>
- tracking_id
- Column used to store input identifiers from the original SV calls. Used to track calls from the input files to the final outputs.
- transcript1
- Transcript for the current annotation at the first breakpoint
- transcript2
- Transcript for the current annotation at the second breakpoint
- untemplated_seq
str
- The untemplated/novel sequence between the breakpoints
- validation_id
- Identifier for the validation step
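Several of the column types above are encoded as strings: int-int ranges such as break1_ewindow, and semicolon-delimited lists such as tools. The sketch below shows one way a row might be decoded and how the practical coverage ratio (count divided by window length) can be recomputed. The helper names and the example row are hypothetical, the window length is assumed to be inclusive (end - start + 1), and the range parser only handles non-negative bounds, so it would not cover a negative net_size.

```python
def parse_range(text):
    """Parse a non-negative 'start-end' string such as '1000-1499' into a tuple of ints."""
    start, end = text.split("-")
    return int(start), int(end)


def split_list(text):
    """Split a semicolon-delimited column into its items, ignoring empty entries."""
    return [item for item in text.split(";") if item]


def practical_coverage(read_count, window):
    """Reads counted in the evidence window divided by the window length (inclusive, assumed)."""
    start, end = window
    return read_count / (end - start + 1)


# Hypothetical row; the tool names are made up for illustration.
row = {
    "break1_ewindow": "1000-1499",
    "break1_ewindow_count": "250",
    "tools": "toolA_v1;toolB_v2",
}

window = parse_range(row["break1_ewindow"])
coverage = practical_coverage(int(row["break1_ewindow_count"]), window)
print(window, coverage, split_list(row["tools"]))
```

As the column description notes, this ratio is not the true coverage, because bins are sampled within the window and the number of reads examined is capped.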