variant module
-
class mavis.annotate.variant.Annotation(bpp, transcript1=None, transcript2=None, proximity=5000, data=None, **kwargs)
Bases: mavis.breakpoint.BreakpointPair
A fusion of two transcripts created by the associated breakpoint pair. Also holds the other annotations for overlapping, encompassed, and nearest genes.
Holds a breakpoint call and a set of transcripts; other information is gathered relative to these.
Parameters:
- bpp (BreakpointPair) – the breakpoint pair call; it is adjusted based on the transcripts and then stored
- transcript1 (Transcript) – transcript at the first breakpoint
- transcript2 (Transcript) – transcript at the second breakpoint
- data (dict) – optional dictionary to hold related attributes
- event_type (SVTYPE) – the type of event
-
class mavis.annotate.variant.FusionTranscript
Bases: mavis.annotate.genomic.usTranscript
-
classmethod build(ann, REFERENCE_GENOME, min_orf_size=None, max_orf_cap=None, min_domain_mapping_match=None)
Parameters:
- ann (Annotation) – the annotation object we want to build a FusionTranscript for
- REFERENCE_GENOME (dict of Bio.SeqRecord by str) – dict of reference sequence by template/chr name
Returns: the newly built fusion transcript
Return type: FusionTranscript
-
exon_number(exon)
Parameters: exon (Exon) – the exon to be numbered
Returns: the number of the exon in the original transcript (prior to fusion)
Return type: int
-
mavis.annotate.variant.annotate_events(bpps, annotations, reference_genome, max_proximity=5000, min_orf_size=200, min_domain_mapping_match=0.95, max_orf_cap=3, log=<function devnull>, filters=None)
Parameters:
- bpps (list of BreakpointPair) – list of events
- annotations – reference annotations
- reference_genome (dict of string by string) – dictionary of reference sequences by name
- max_proximity (int) – see max_proximity
- min_orf_size (int) – see min_orf_size
- min_domain_mapping_match (float) – see min_domain_mapping_match
- max_orf_cap (int) – see max_orf_cap
- log (callable) – a callable that accepts strings and time_stamp args
- filters (list of callable) – list of functions taking in a list and returning a list for filtering
Returns: list of the putative annotations
Return type: list of Annotation
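The filters parameter is described as a list of functions that each take a list and return a list. A minimal sketch of how such a chain could be applied is shown here. The helper name apply_filters and the assumption that the filters run in list order are illustrative only and are not taken from the MAVIS source.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def apply_filters(items: List[T], filters: Sequence[Callable[[List[T]], List[T]]]) -> List[T]:
    """Hypothetical helper: pass the list through each filter in turn.

    Assumes the filters are applied in the order given; each one receives
    the output of the previous one.
    """
    for filter_func in filters:
        items = filter_func(items)
    return items


def keep_even(values: List[int]) -> List[int]:
    """Example filter: list in, list out."""
    return [v for v in values if v % 2 == 0]


result = apply_filters([4, 1, 2, 3], [keep_even, sorted])
```

Because every filter has the same list-in, list-out shape, functions such as choose_more_annotated and choose_transcripts_by_priority documented on this page fit that shape as well.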
-
mavis.annotate.variant.choose_more_annotated(ann_list)
For a given set of annotations, if some annotations contain transcripts and others are simply intergenic regions, discard the intergenic region annotations.
Similarly, if there are annotations where both breakpoints fall in a transcript and annotations where one or more breakpoints land in an intergenic region, discard those that land in the intergenic region.
Parameters: ann_list (list of Annotation) – list of input annotations
Warning
Input annotations are assumed to be the same event (the same validation_id); the logic used would not apply to different events.
Returns: the filtered list
Return type: list of Annotation
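The tiered rule described for choose_more_annotated can be sketched as follows. This is an illustration under stated assumptions, not the MAVIS implementation: Ann is a hypothetical stand-in for Annotation that holds only transcript1 and transcript2, and None stands for a breakpoint that lands in an intergenic region.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Ann:
    """Hypothetical stand-in for Annotation; None marks an intergenic breakpoint."""
    transcript1: Optional[str]
    transcript2: Optional[str]


def choose_more_annotated_sketch(ann_list: List[Ann]) -> List[Ann]:
    """Keep only the annotations from the best-annotated tier that is non-empty."""
    # Tier 1: both breakpoints fall in a transcript
    both = [a for a in ann_list if a.transcript1 is not None and a.transcript2 is not None]
    # Tier 2: exactly one breakpoint falls in a transcript
    one = [a for a in ann_list if (a.transcript1 is None) != (a.transcript2 is None)]
    if both:
        return both
    if one:
        return one
    # Only intergenic annotations exist, so nothing is discarded
    return list(ann_list)
```

As the warning notes, a rule like this only makes sense when every element of ann_list describes the same event.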
-
mavis.annotate.variant.choose_transcripts_by_priority(ann_list)
For each set of annotations with the same combination of genes, choose the annotation with the most “best_transcripts” or the most “alphanumeric” choices of transcript. Throws an error if they are identical.
Parameters: ann_list (list of Annotation) – input annotations
Warning
Input annotations are assumed to be the same event (the same validation_id); the logic used would not apply to different events.
Returns: the filtered list
Return type: list of Annotation
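One possible reading of the selection rule for choose_transcripts_by_priority is sketched below. Everything here is an assumption made for illustration: TxStub and AnnStub are hypothetical stand-ins, both transcripts are assumed to be present, "alphanumeric" is interpreted as ordering by transcript name, and ValueError is a placeholder for whatever error the library actually raises.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class TxStub(NamedTuple):
    """Hypothetical transcript: owning gene, name, and a best-transcript flag."""
    gene: str
    name: str
    is_best: bool


class AnnStub(NamedTuple):
    """Hypothetical annotation with a transcript at each breakpoint."""
    transcript1: TxStub
    transcript2: TxStub


def priority_key(ann: AnnStub) -> Tuple[int, str, str]:
    """More best transcripts sort first, then transcript names in order."""
    best_count = int(ann.transcript1.is_best) + int(ann.transcript2.is_best)
    return (-best_count, ann.transcript1.name, ann.transcript2.name)


def choose_by_priority_sketch(ann_list: List[AnnStub]) -> List[AnnStub]:
    """Pick one annotation per gene combination using priority_key."""
    groups: Dict[Tuple[str, str], List[AnnStub]] = defaultdict(list)
    for ann in ann_list:
        groups[(ann.transcript1.gene, ann.transcript2.gene)].append(ann)
    chosen = []
    for anns in groups.values():
        ranked = sorted(anns, key=priority_key)
        # Assumed behaviour: identical top candidates cannot be separated
        if len(ranked) > 1 and priority_key(ranked[0]) == priority_key(ranked[1]):
            raise ValueError("cannot choose between identical annotations")
        chosen.append(ranked[0])
    return chosen
```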
-
mavis.annotate.variant.determine_prime(transcript, breakpoint)
Determine which side of the transcript (5’ or 3’) is ‘kept’ given the breakpoint.
Parameters:
- transcript (Transcript) – the transcript
- breakpoint (Breakpoint) – the breakpoint
Returns: 5’ or 3’
Return type:
Raises: AttributeError – if the orientation of the breakpoint or the strand of the transcript is not specified
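The relationship between strand, breakpoint orientation, and the retained end can be sketched as a small truth table. The encodings below are assumptions for illustration only: strand is "+" or "-", orientation is "L" (the sequence to the left of the breakpoint is kept) or "R", and the result is the integer 5 or 3. The function takes plain strings rather than Transcript and Breakpoint objects.

```python
def determine_prime_sketch(strand: str, orient: str) -> int:
    """Return 5 or 3 for the end of the transcript that is retained.

    Hypothetical encodings: strand in {"+", "-"}, orient in {"L", "R"}.
    Any other value is treated as 'not specified'.
    """
    if strand not in ("+", "-"):
        raise AttributeError("the strand of the transcript must be specified")
    if orient not in ("L", "R"):
        raise AttributeError("the orientation of the breakpoint must be specified")
    keeps_left = orient == "L"
    forward = strand == "+"
    # On the forward strand the left side is the 5' end; on the reverse
    # strand the left side is the 3' end.
    return 5 if keeps_left == forward else 3
```

For example, a forward-strand transcript with the left side retained keeps its 5’ end, while a reverse-strand transcript with the left side retained keeps its 3’ end.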
-
mavis.annotate.variant.overlapping_transcripts(ref_ann, breakpoint)
Parameters:
- ref_ann (dict of list of Gene by str) – the reference list of genes split by chromosome
- breakpoint (Breakpoint) – the breakpoint in question
Returns: a list of possible transcripts
Return type: list of usTranscript
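The lookup implied by the ref_ann layout (genes grouped by chromosome) can be sketched as below. The stand-in types Tx, GeneStub, and Bp and the closed-interval overlap test are assumptions for illustration; any further criteria the real function applies are not modelled here.

```python
from typing import Dict, List, NamedTuple


class Tx(NamedTuple):
    """Hypothetical transcript with inclusive genomic coordinates."""
    name: str
    start: int
    end: int


class GeneStub(NamedTuple):
    """Hypothetical gene holding its transcripts."""
    name: str
    transcripts: List[Tx]


class Bp(NamedTuple):
    """Hypothetical breakpoint: chromosome plus an inclusive interval."""
    chr: str
    start: int
    end: int


def overlapping_transcripts_sketch(ref_ann: Dict[str, List[GeneStub]], bp: Bp) -> List[Tx]:
    """Collect transcripts on the breakpoint's chromosome that overlap its interval."""
    hits = []
    for gene in ref_ann.get(bp.chr, []):
        for tx in gene.transcripts:
            # Two closed intervals overlap when each starts before the other ends
            if tx.start <= bp.end and bp.start <= tx.end:
                hits.append(tx)
    return hits
```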