variant module
class mavis.annotate.variant.Annotation(bpp, transcript1=None, transcript2=None, proximity=5000, data=None, **kwargs)
Bases: mavis.breakpoint.BreakpointPair
A fusion of two transcripts created by the associated breakpoint pair. Also holds the other annotations for overlapping, encompassed, and nearest genes.
Holds a breakpoint call and a set of transcripts; other information is gathered relative to these.
Parameters:
- bpp (BreakpointPair) – the breakpoint pair call. Will be adjusted and then stored based on the transcripts
- transcript1 (Transcript) – transcript at the first breakpoint
- transcript2 (Transcript) – transcript at the second breakpoint
- data (dict) – optional dictionary to hold related attributes
- event_type (SVTYPE) – the type of event
add_gene(input_gene)
Adds input_gene to the current set of annotations. Checks which set it should be added to.
Parameters: input_gene (input_gene) – the input_gene being added
class mavis.annotate.variant.IndelCall(refseq, mutseq)
Bases: object
Given two sequences, and assuming there is a single difference between them, calls an indel which accounts for the change.
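The idea of reducing two sequences to a single changed region can be sketched by trimming their shared prefix and suffix. This is only an illustration of the general technique, not the IndelCall implementation: the function name `single_difference` and its return shape are hypothetical, and no MAVIS code is used.

```python
def single_difference(refseq, mutseq):
    """Return (start, deleted, inserted) for the one region where the sequences differ.

    start is the 0-based index in refseq at which the change begins.
    Hypothetical helper for illustration; not part of MAVIS.
    """
    limit = min(len(refseq), len(mutseq))

    # Length of the longest shared prefix.
    prefix = 0
    while prefix < limit and refseq[prefix] == mutseq[prefix]:
        prefix += 1

    # Length of the longest shared suffix that does not overlap the prefix.
    suffix = 0
    while suffix < limit - prefix and refseq[-1 - suffix] == mutseq[-1 - suffix]:
        suffix += 1

    deleted = refseq[prefix:len(refseq) - suffix]
    inserted = mutseq[prefix:len(mutseq) - suffix]
    return prefix, deleted, inserted


print(single_difference("MKTAYIAK", "MKTGGIAK"))  # (3, 'AY', 'GG')
```

A pure deletion yields an empty `inserted` string and a pure insertion yields an empty `deleted` string. When the change sits in a repeated region its position is ambiguous, and this sketch simply reports the rightmost placement.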
mavis.annotate.variant.annotate_events(bpps, annotations, reference_genome, max_proximity=5000, min_orf_size=200, min_domain_mapping_match=0.95, max_orf_cap=3, log=<function devnull>, filters=None)
Parameters:
- bpps (list of BreakpointPair) – list of events
- annotations – reference annotations
- reference_genome (dict of string by string) – dictionary of reference sequences by name
- max_proximity (int) – see max_proximity
- min_orf_size (int) – see min_orf_size
- min_domain_mapping_match (float) – see min_domain_mapping_match
- max_orf_cap (int) – see max_orf_cap
- log (callable) – callable function to take in strings and time_stamp args
- filters (list of callable) – list of functions taking in a list and returning a list for filtering
Returns: list of the putative annotations
Return type: list of Annotation
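The `filters` and `log` parameters are both plain callables, so their expected shapes can be sketched as follows. This is a minimal illustration under stated assumptions: `apply_filters`, `drop_short`, `drop_duplicates`, and `collect` are hypothetical names, plain strings stand in for Annotation objects, and the order in which annotate_events applies its filters is assumed to be list order.

```python
def apply_filters(annotations, filters=None, log=lambda *parts, **kwargs: None):
    """Pass the list through each filter in turn and return what is left.

    Hypothetical helper: each filter takes a list and returns a list.
    """
    for filter_func in filters or []:
        before = len(annotations)
        annotations = filter_func(annotations)
        name = getattr(filter_func, "__name__", repr(filter_func))
        log("filter", name, "kept", len(annotations), "of", before)
    return annotations


def drop_short(names):
    # Example filter: keep entries longer than three characters.
    return [name for name in names if len(name) > 3]


def drop_duplicates(names):
    # Example filter: remove repeats while preserving order.
    return list(dict.fromkeys(names))


messages = []


def collect(*parts, time_stamp=False):
    # Example logger: accepts strings plus a time_stamp keyword argument.
    messages.append(" ".join(str(part) for part in parts))


kept = apply_filters(
    ["EWSR1", "FLI1", "abc", "FLI1"], [drop_short, drop_duplicates], log=collect
)
print(kept)  # ['EWSR1', 'FLI1']
```

The default logger in this sketch discards its arguments, which mirrors the role of the `devnull` default shown in the signature.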
mavis.annotate.variant.call_protein_indel(ref_translation, fusion_translation, reference_genome=None)
Compares the fusion protein/aa sequence to the reference protein/aa sequence and returns an HGVS notation indel call.
Parameters:
- ref_translation (Translation) – the reference protein/translation
- fusion_translation (Translation) – the fusion protein/translation
- reference_genome – the reference genome object used to fetch the reference translation AA sequence
Returns: the HGVS protein indel notation
Return type:
mavis.annotate.variant.choose_more_annotated(ann_list)
For a given set of annotations, if there are annotations which contain transcripts and annotations that are simply intergenic regions, discard the intergenic region annotations.
Similarly, if there are annotations where both breakpoints fall in a transcript and annotations where one or more breakpoints land in an intergenic region, discard those that land in the intergenic region.
Parameters: ann_list (list of Annotation) – list of input annotations
Warning
Input annotations are assumed to be the same event (the same validation_id); the logic used would not apply to different events.
Returns: the filtered list
Return type: list of Annotation
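The two rules above amount to keeping the annotations whose breakpoints land in the most transcripts. A minimal sketch of that selection follows, assuming a hypothetical `StubAnnotation` stand-in in which `None` marks a breakpoint in an intergenic region; the real Annotation class represents intergenic regions differently, and `keep_most_annotated` is not a MAVIS function.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StubAnnotation:
    """Hypothetical stand-in: None marks a breakpoint in an intergenic region."""
    transcript1: Optional[str]
    transcript2: Optional[str]


def keep_most_annotated(ann_list):
    """Keep the annotations with the most breakpoints inside a transcript."""
    def transcript_count(ann):
        return sum(t is not None for t in (ann.transcript1, ann.transcript2))

    if not ann_list:
        return []
    best = max(transcript_count(ann) for ann in ann_list)
    return [ann for ann in ann_list if transcript_count(ann) == best]


candidates = [
    StubAnnotation("ENST0001", "ENST0002"),  # both breakpoints in transcripts
    StubAnnotation("ENST0001", None),        # one intergenic breakpoint
    StubAnnotation(None, None),              # fully intergenic
]
print(keep_most_annotated(candidates))
```

Only the first candidate survives here. If no candidate had both breakpoints in a transcript, the one with a single transcript would be kept instead.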
mavis.annotate.variant.choose_transcripts_by_priority(ann_list)
For each set of annotations with the same combination of genes, choose the annotation with the most “best_transcripts” or the most “alphanumeric” choices of transcript. Throws an error if they are identical.
Parameters: ann_list (list of Annotation) – input annotations
Warning
Input annotations are assumed to be the same event (the same validation_id); the logic used would not apply to different events.
Returns: the filtered list
Return type: list of Annotation
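One way to read this selection is as a group-then-rank step. The sketch below is an interpretation, not the MAVIS implementation: the stub classes, the `is_best_transcript` flag, and the reading of “alphanumeric” as ascending order of transcript names are all assumptions made for illustration.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class StubTranscript:
    """Hypothetical stand-in for a transcript."""
    name: str
    gene: str
    is_best_transcript: bool = False


@dataclass(frozen=True)
class PairAnnotation:
    """Hypothetical stand-in for an annotation with two transcripts."""
    transcript1: StubTranscript
    transcript2: StubTranscript


def pick_by_priority(ann_list):
    """Return one annotation per gene pair, chosen by transcript priority."""
    def gene_pair(ann):
        return (ann.transcript1.gene, ann.transcript2.gene)

    def priority(ann):
        transcripts = (ann.transcript1, ann.transcript2)
        best_count = sum(t.is_best_transcript for t in transcripts)
        # More best transcripts rank first; names break remaining ties (assumed).
        return (-best_count, tuple(t.name for t in transcripts))

    chosen = []
    for _, group in groupby(sorted(ann_list, key=gene_pair), key=gene_pair):
        ranked = sorted(group, key=priority)
        if len(ranked) > 1 and priority(ranked[0]) == priority(ranked[1]):
            raise ValueError("annotations have identical priority")
        chosen.append(ranked[0])
    return chosen


best_a = StubTranscript("ENST0001", "GENE_A", is_best_transcript=True)
other_a = StubTranscript("ENST0002", "GENE_A")
best_b = StubTranscript("ENST0003", "GENE_B", is_best_transcript=True)

picked = pick_by_priority(
    [PairAnnotation(other_a, best_b), PairAnnotation(best_a, best_b)]
)
print(picked[0].transcript1.name)  # ENST0001
```

Both inputs share the gene pair (GENE_A, GENE_B), so only the annotation whose transcripts are both flagged as best is returned.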
mavis.annotate.variant.flatten_fusion_translation(translation)
For a given fusion product (translation), gather the information to be output to the tabbed files.
Parameters: translation (Translation) – the translation which is on the fusion transcript
Returns: the dictionary of column names to values
Return type: dict