genomic module
-
class
mavis.annotate.genomic.
Exon
(start, end, transcript=None, name=None, intact_start_splice=True, intact_end_splice=True, seq=None)[source]¶ Bases:
mavis.annotate.base.BioInterval
Parameters: - start (int) – the genomic start position
- end (int) – the genomic end position
- name (str) – the name of the exon
- transcript (usTranscript) – the ‘parent’ transcript this exon belongs to
- intact_start_splice (bool) – whether the starting splice site is intact (False if it has been abrogated)
- intact_end_splice (bool) – whether the end splice site is intact (False if it has been abrogated)
Raises: AttributeError
– if the exon start > the exon end
Example
>>> Exon(15, 78)
-
acceptor
¶ int – returns the genomic exonic position of the acceptor splice site
-
donor
¶ int – returns the genomic exonic position of the donor splice site
-
transcript
¶ usTranscript
– the transcript this exon belongs to
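The constructor check and the two splice-site properties can be pictured with a small stand-in class. This is a minimal sketch, not the MAVIS implementation: `SketchExon` and its `strand` argument are hypothetical, and the rule that the acceptor is the exon boundary where transcription enters (the start on '+', the end on '-') is an assumption about how strand is resolved.

```python
class SketchExon:
    """Hypothetical stand-in for an exon interval (not the MAVIS class)."""

    def __init__(self, start, end, strand='+'):
        if start > end:
            # mirrors the documented behaviour: start > end is rejected
            raise AttributeError('exon start must be <= exon end')
        self.start = start
        self.end = end
        self.strand = strand

    @property
    def acceptor(self):
        # assumption: the acceptor is the boundary where transcription enters the exon
        return self.start if self.strand == '+' else self.end

    @property
    def donor(self):
        # assumption: the donor is the boundary where transcription leaves the exon
        return self.end if self.strand == '+' else self.start


forward = SketchExon(15, 78, '+')
reverse = SketchExon(15, 78, '-')
print(forward.acceptor, forward.donor)  # 15 78
print(reverse.acceptor, reverse.donor)  # 78 15
```

In this sketch, swapping the strand swaps which boundary plays which role; the coordinates themselves stay in genomic order.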
-
class
mavis.annotate.genomic.
Gene
(chr, start, end, name=None, strand='?', aliases=None, seq=None)[source]¶ Bases:
mavis.annotate.base.BioInterval
Example
>>> Gene('X', 1, 1000, 'ENG0001', '+', ['KRAS'])
-
chr
¶ returns the name of the chromosome that this gene resides on
-
get_seq
(REFERENCE_GENOME, ignore_cache=False)[source]¶ gene sequence is always given with respect to the positive (forward) strand regardless of gene strand
Returns: the sequence of the gene
Return type:
-
spliced_transcripts
¶ list of Transcript
– list of transcripts
-
transcript_priority
(transcript)[source]¶ prioritizes transcripts from 0 to n-1 based on best transcript flag and then alphanumeric name sort
Warning
Lower number means higher priority, so that a default ascending sort puts the preferred transcript first.
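The ranking described above can be sketched with a plain sort. This is an illustration under stated assumptions, not the MAVIS code: `SketchTranscript` and `sketch_priority` are hypothetical names, and the "alphanumeric name sort" is approximated here by ordinary string comparison.

```python
from collections import namedtuple

# hypothetical record; only the two fields used for ranking are modelled
SketchTranscript = namedtuple('SketchTranscript', ['name', 'is_best_transcript'])


def sketch_priority(transcripts, transcript):
    """Return the rank (0 = highest priority) of `transcript` among `transcripts`."""
    # False sorts before True, so negating the flag puts best transcripts first;
    # ties are broken by name
    ordered = sorted(transcripts, key=lambda t: (not t.is_best_transcript, t.name))
    return ordered.index(transcript)


t_a = SketchTranscript('ENST0003', False)
t_b = SketchTranscript('ENST0002', True)
t_c = SketchTranscript('ENST0001', False)
pool = [t_a, t_b, t_c]

ranks = {t.name: sketch_priority(pool, t) for t in pool}
# sorting by rank puts the flagged transcript first
by_priority = sorted(pool, key=lambda t: sketch_priority(pool, t))
print(ranks)  # {'ENST0003': 2, 'ENST0002': 0, 'ENST0001': 1}
```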
-
transcripts
¶ list of usTranscript
– list of unspliced transcripts
-
translations
¶ list of Translation
– list of translations
-
-
class
mavis.annotate.genomic.
IntergenicRegion
(chr, start, end, strand)[source]¶ Bases:
mavis.annotate.base.BioInterval
Example
>>> IntergenicRegion('1', 1, 100, '+')
-
chr
¶ returns the name of the chromosome that this region resides on
-
-
class
mavis.annotate.genomic.
Transcript
(ust, splicing_patt, seq=None, translations=None)[source]¶ Bases:
mavis.annotate.base.BioInterval
splicing pattern is given in genomic coordinates
Parameters: - ust (usTranscript) – the unspliced transcript
- splicing_patt (list of int) – the list of splicing positions
- seq (str) – the cdna sequence
- translations (list of Translation) – the list of translations of this transcript
-
convert_cdna_to_genomic
(pos)[source]¶ Parameters: pos (int) – cdna position
Returns: the genomic equivalent
Return type: int
-
convert_genomic_to_cdna
(pos)[source]¶ Parameters: pos (int) – the genomic position to be converted
Returns: the cdna equivalent
Return type: int
Raises: IndexError
– if the genomic position is not present in the cdna
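The two conversions above can be illustrated on a toy transcript. This is a minimal sketch under stated assumptions rather than the library's code: it handles the positive strand only, takes exons as made-up `(start, end)` inclusive tuples instead of a splicing pattern, and assumes cdna positions are 1-based.

```python
def genomic_to_cdna(pos, exons):
    """Map a genomic position to a 1-based cdna position (positive strand only)."""
    offset = 0  # cdna bases contributed by the exons already passed
    for start, end in sorted(exons):
        if start <= pos <= end:
            return offset + (pos - start) + 1
        offset += end - start + 1
    # intronic or outside the transcript: nothing to map to
    raise IndexError('genomic position is not present in the cdna', pos)


def cdna_to_genomic(pos, exons):
    """Inverse mapping: 1-based cdna position to a genomic position."""
    if pos < 1:
        raise IndexError('cdna positions start at 1', pos)
    remaining = pos
    for start, end in sorted(exons):
        length = end - start + 1
        if remaining <= length:
            return start + remaining - 1
        remaining -= length
    raise IndexError('cdna position is outside the transcript', pos)


exons = [(101, 110), (201, 205)]  # two exons of 10 and 5 bases
print(genomic_to_cdna(201, exons))  # 11
print(cdna_to_genomic(11, exons))   # 201
```

An intronic position such as 150 raises `IndexError` in this sketch, matching the documented failure mode.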
-
get_seq
(REFERENCE_GENOME=None, ignore_cache=False)[source]¶ Returns: the sequence corresponding to the spliced cdna
Return type:
-
unspliced_transcript
¶ usTranscript
– the unspliced transcript this splice variant belongs to
-
class
mavis.annotate.genomic.
usTranscript
(exons, gene=None, name=None, strand=None, spliced_transcripts=None, seq=None, is_best_transcript=False)[source]¶ Bases:
mavis.annotate.base.BioInterval
creates a new transcript object
Parameters: - exons (list of Exon) – the exons that make up the transcript
- genomic_start (int) – genomic start position of the transcript
- genomic_end (int) – genomic end position of the transcript
- gene (Gene) – the gene this transcript belongs to
- name (str) – name of the transcript
- strand (STRAND) – strand the transcript is on, defaults to the strand of the Gene if not specified
- seq (str) – unspliced cDNA seq
-
convert_cdna_to_genomic
(pos, splicing_pattern)[source]¶ Parameters: - pos (int) – cdna position
- splicing_pattern (SplicingPattern) – list of genomic splice sites 3'5' repeating
Returns: the genomic equivalent
Return type:
-
convert_genomic_to_cdna
(pos, splicing_pattern)[source]¶ Parameters: - pos (int) – the genomic position to be converted
- splicing_pattern (SplicingPattern) – list of genomic splice sites 3'5' repeating
Returns: the cdna equivalent
Return type:
Raises: IndexError
– if the genomic position is not present in the cdna
-
convert_genomic_to_nearest_cdna
(pos, splicing_pattern)[source]¶ converts a genomic position to its cdna equivalent or (if intronic) the nearest cdna and shift
Parameters: - pos (int) – the genomic position
- splicing_pattern (SplicingPattern) – the splicing pattern
Returns: - int - the exonic cdna position
- int - the intronic shift
Return type: tuple of int and int
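A rough picture of the "nearest cdna and shift" idea, for a positive-strand transcript only. Everything here is an assumption made for illustration: the function name, the exon tuples, the 1-based cdna numbering, the sign convention (positive shift means downstream of the exonic position, negative means upstream), and the choice to reject positions outside the transcript.

```python
def genomic_to_nearest_cdna(pos, exons):
    """Return (cdna position, intronic shift) for a positive-strand transcript."""
    exons = sorted(exons)
    if not exons[0][0] <= pos <= exons[-1][1]:
        # assumption of this sketch: flanking positions are not handled
        raise IndexError('position is outside the transcript', pos)
    offset = 0
    boundaries = []  # (genomic boundary, cdna position of that boundary)
    for start, end in exons:
        if start <= pos <= end:
            return offset + (pos - start) + 1, 0  # exonic: no shift
        boundaries.append((start, offset + 1))
        offset += end - start + 1
        boundaries.append((end, offset))
    # intronic: pick the closest exon boundary and report the signed distance
    genomic, cdna = min(boundaries, key=lambda b: abs(pos - b[0]))
    return cdna, pos - genomic


exons = [(101, 110), (201, 205)]
print(genomic_to_nearest_cdna(105, exons))  # (5, 0)
print(genomic_to_nearest_cdna(113, exons))  # (10, 3)
print(genomic_to_nearest_cdna(198, exons))  # (11, -3)
```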
-
exon_number
(exon)[source]¶ exon numbering is based on the direction of translation
Parameters: exon (Exon) – the exon to be numbered Returns: the exon number (1 based) Return type: int Raises: AttributeError
– if the strand is not given or the exon does not belong to the transcript
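Strand-aware numbering can be sketched as follows. The function and the tuple representation of exons are hypothetical; the sketch assumes that "direction of translation" means ascending genomic order on '+' and descending order on '-'.

```python
def sketch_exon_number(exons, exon, strand):
    """Return the 1-based number of `exon`, counted along the strand."""
    if strand not in ('+', '-'):
        raise AttributeError('strand must be given to number exons')
    if exon not in exons:
        raise AttributeError('exon does not belong to the transcript')
    # on the reverse strand the right-most exon is read first
    ordered = sorted(exons, reverse=(strand == '-'))
    return ordered.index(exon) + 1


exons = [(1, 10), (20, 30), (40, 50)]
print(sketch_exon_number(exons, (1, 10), '+'))  # 1
print(sketch_exon_number(exons, (1, 10), '-'))  # 3
```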
-
generate_splicing_patterns
()[source]¶ returns a list of splice sites to be connected as a splicing pattern
Returns: List of positions to be spliced together
Return type: list of SplicingPattern
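For intuition only, the simplest case (every splice site intact, no exon skipped, no intron retained) can be sketched as the list of interior exon boundaries in genomic order. This is an assumption-laden simplification: the function below is hypothetical, returns a single flat list of int rather than SplicingPattern objects, and does not attempt the alternative patterns that arise when splice sites are abrogated.

```python
def sketch_normal_splicing_pattern(exons):
    """Interior exon boundaries for a transcript whose splice sites are all intact."""
    ordered = sorted(exons)
    pattern = []
    # pair each exon with its successor: end of one, start of the next
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        pattern.extend([prev_end, next_start])
    return pattern


exons = [(1, 10), (20, 30), (40, 50)]
print(sketch_normal_splicing_pattern(exons))  # [10, 20, 30, 40]
```

The outermost start and end are omitted in this sketch because they are transcript boundaries, not splice sites.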
-
get_cdna_seq
(splicing_pattern, REFERENCE_GENOME=None, ignore_cache=False)[source]¶ Parameters: - splicing_pattern (SplicingPattern) – the list of splicing positions
- REFERENCE_GENOME (dict of Bio.SeqRecord by str) – dict of reference sequence by template/chr name
- ignore_cache (bool) – if True then stored sequences will be ignored and the function will attempt to retrieve the sequence using the positions and the input REFERENCE_GENOME
Returns: the spliced cDNA sequence
Return type:
-
get_seq
(REFERENCE_GENOME=None, ignore_cache=False)[source]¶ Returns: the sequence of the transcript including introns (but relative to strand)
Return type:
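One way to read "relative to strand" is: slice the reference on the forward strand, then reverse-complement the slice for transcripts on '-'. The sketch below assumes that reading. It is not the MAVIS implementation: `sketch_get_seq` is a hypothetical helper, the reference is a dict of plain strings rather than Bio.SeqRecord objects, coordinates are treated as 1-based inclusive, and caching is ignored.

```python
COMPLEMENT = str.maketrans('ACGTacgt', 'TGCAtgca')


def reverse_complement(seq):
    """Complement each base, then reverse the result."""
    return seq.translate(COMPLEMENT)[::-1]


def sketch_get_seq(reference, chrom, start, end, strand):
    """Fetch [start, end] (1-based, inclusive) and orient it to the strand."""
    forward = reference[chrom][start - 1:end]
    return reverse_complement(forward) if strand == '-' else forward


reference = {'1': 'AACCGGTTAC'}  # made-up ten-base "chromosome"
print(sketch_get_seq(reference, '1', 1, 4, '+'))  # AACC
print(sketch_get_seq(reference, '1', 1, 4, '-'))  # GGTT
```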
-
transcripts
¶ list of Transcript
– list of spliced transcripts
-
translations
¶ list of Translation
– list of translations associated with this transcript
- exons (