protein module
-
class mavis.annotate.protein.Domain(name, regions, translation=None, data=None)
Bases: object
Parameters:
- name (str) – the name of the domain, e.g. PF00876
- regions (list of DomainRegion) – the amino acid ranges that are part of the domain
- transcript (Transcript) – the ‘parent’ transcript this domain belongs to
Raises: AttributeError – if the end of any region is less than the start
Example
>>> Domain('DNA binding domain', [(1, 4), (10, 24)], transcript)
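The region check described under Raises can be illustrated without MAVIS installed. The following is a minimal sketch, not the library's implementation: `validate_regions` is a hypothetical helper, and regions are assumed to be plain `(start, end)` tuples with 1-based inclusive coordinates.

```python
from typing import List, Tuple


def validate_regions(regions: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Return the regions sorted by start, rejecting any whose end precedes its start.

    Hypothetical helper for illustration only; not part of mavis.
    """
    for start, end in regions:
        if end < start:
            # Same exception type the Domain documentation names for this case.
            raise AttributeError(f"region end ({end}) is less than start ({start})")
    return sorted(regions)


print(validate_regions([(10, 24), (1, 4)]))  # [(1, 4), (10, 24)]
```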
-
align_seq(input_sequence, REFERENCE_GENOME=None)
Aligns each region to the input sequence, starting with the last one, then takes the subset of the sequence that remains to align the second-last region, and so on. Returns a list of intervals for the alignment. If multiple alignments are found, an error is raised.
Returns: tuple contains
- int: the number of matches
- int: the total number of amino acids to be aligned
- list of DomainRegion: the list of domain regions on the new input sequence
Raises:
- AttributeError – if sequence information is not available
- UserWarning – if a valid alignment could not be found or no best alignment was found
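The right-to-left placement described for align_seq can be sketched as follows. This is a simplified illustration under stated assumptions, not the MAVIS algorithm: it uses exact substring matching only, takes region sequences as plain strings, and `align_regions` is a hypothetical name.

```python
def align_regions(region_seqs, input_sequence):
    """Place each region sequence on input_sequence, working from the last region backwards.

    Returns 1-based inclusive (start, end) tuples in the original region order.
    Exact matching only (an assumption of this sketch).
    """
    placed = []
    remaining_end = len(input_sequence)  # only input_sequence[:remaining_end] is searched
    for seq in reversed(region_seqs):
        window = input_sequence[:remaining_end]
        first = window.find(seq)
        last = window.rfind(seq)
        if first == -1:
            raise UserWarning(f"no alignment found for region {seq!r}")
        if first != last:
            raise UserWarning(f"multiple alignments found for region {seq!r}")
        placed.append((first + 1, first + len(seq)))
        remaining_end = first  # earlier regions must lie before this one
    return placed[::-1]


print(align_regions(["MKTA", "QRQ"], "MKTAYIAKQRQISFVK"))  # [(1, 4), (9, 11)]
```

Shrinking the search window after each placement keeps the regions in their original order on the new sequence.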
-
get_seqs(REFERENCE_GENOME=None, ignore_cache=False)
Returns the amino acid sequences for each of the domain regions associated with this domain, in the order of the regions (sorted by start).
Parameters: REFERENCE_GENOME (dict of Bio.SeqRecord by str) – dict of reference sequence by template/chr name
Returns: list of amino acid sequences for each DomainRegion
Return type: list of str
Raises: AttributeError – if there is not enough sequence information given to determine this
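As a rough illustration of what "sequences for each region, sorted by start" means, the sketch below slices a protein string by region. It assumes 1-based inclusive coordinates and plain tuples; `region_sequences` is a hypothetical helper and does not reflect how MAVIS retrieves or caches sequences.

```python
def region_sequences(aa_seq, regions):
    """Slice aa_seq for each 1-based inclusive (start, end) region, sorted by start."""
    return [aa_seq[start - 1:end] for start, end in sorted(regions)]


aa = "MKTAYIAKQRQISFVKSHFSRQLE"
print(region_sequences(aa, [(10, 12), (1, 4)]))  # ['MKTA', 'RQI']
```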
-
key()
tuple: a tuple representing the items expected to be unique, used for hashing and comparing
-
score_region_mapping(REFERENCE_GENOME=None)
Compares the sequence in each DomainRegion to the sequence collected for that domain region from the translation object.
Parameters: REFERENCE_GENOME (dict of Bio.SeqRecord by str) – dict of reference sequence by template/chr name
Returns: tuple contains
- int: the number of matching amino acids
- int: the total number of amino acids
Return type: tuple of int and int
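The (matches, total) pair can be pictured with a small standalone sketch. It assumes each region carries its own expected sequence and 1-based inclusive coordinates; `score_regions` and the tuple layout are hypothetical and are not the MAVIS data model.

```python
def score_regions(translated_aa, regions):
    """Compare each region's expected sequence against the translated protein.

    regions: list of (start, end, expected_seq), 1-based inclusive coordinates.
    Returns (number of matching amino acids, total number of amino acids).
    """
    matches = 0
    total = 0
    for start, end, expected in regions:
        observed = translated_aa[start - 1:end]
        total += len(expected)
        matches += sum(1 for a, b in zip(expected, observed) if a == b)
    return matches, total


# The second region differs from the translation at one residue (L vs I).
print(score_regions("MKTAYIAK", [(1, 4, "MKTA"), (5, 8, "YLAK")]))  # (7, 8)
```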
-
translation
Translation – the Translation this domain belongs to
-
class mavis.annotate.protein.Translation(start, end, transcript=None, domains=None, seq=None, name=None)
Bases: mavis.annotate.base.BioInterval
Describes the splicing pattern and cds start and end with reference to a particular transcript.
Parameters:
- start (int) – start of the coding sequence (cds) relative to the start of the first exon in the transcript
- end (int) – end of the coding sequence (cds) relative to the start of the first exon in the transcript
- transcript (Transcript) – the transcript this is a Translation of
- domains (list of Domain) – a list of the domains on this translation
- sequence (str) – the cds sequence
-
convert_aa_to_cdna(pos)
Parameters: pos (int) – the amino acid position
Returns: the cdna equivalent (with CODON_SIZE uncertainty)
Return type: Interval
-
convert_cdna_to_aa(pos)
Parameters: pos (int) – the cdna position
Returns: the protein/amino-acid position
Return type: int
Raises: AttributeError – the cdna position is not translated
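The two conversions above are inverse codon arithmetic. The sketch below shows one way to express them, assuming 1-based cdna positions and that `cds_start` and `cds_end` are the cdna positions of the first and last coding bases. The function names and the tuple return value (in place of an Interval) are illustrative assumptions, not the MAVIS API.

```python
CODON_SIZE = 3


def aa_to_cdna(aa_pos, cds_start):
    """Return the (first, last) cdna positions of the codon encoding aa_pos."""
    first = cds_start + (aa_pos - 1) * CODON_SIZE
    return first, first + CODON_SIZE - 1


def cdna_to_aa(cdna_pos, cds_start, cds_end):
    """Return the amino acid position whose codon contains cdna_pos."""
    if not cds_start <= cdna_pos <= cds_end:
        # Positions outside the coding sequence are not translated.
        raise AttributeError("the cdna position is not translated")
    return (cdna_pos - cds_start) // CODON_SIZE + 1


print(aa_to_cdna(2, cds_start=101))                # (104, 106)
print(cdna_to_aa(106, cds_start=101, cds_end=400))  # 2
```

An amino acid maps to a range of three bases, which is why the forward conversion returns an interval while the reverse returns a single position.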
-
convert_genomic_to_cds(pos)
Converts a genomic position to its cds (coding sequence) equivalent.
Parameters: pos (int) – the genomic position
Returns: the cds position (negative if before the initiation start site)
Return type: int
-
convert_genomic_to_cds_notation(pos)
Converts a genomic position to its cds (coding sequence) equivalent using hgvs cds notation.
Parameters: pos (int) – the genomic position
Returns: the cds position notation
Return type: str
Example
>>> tl = Translation(...)
# a position before the translation start
>>> tl.convert_genomic_to_cds_notation(1010)
'-50'
# a position after the translation end
>>> tl.convert_genomic_to_cds_notation(2031)
'*72'
# an intronic position
>>> tl.convert_genomic_to_cds_notation(1542)
'50+10'
>>> tl.convert_genomic_to_cds_notation(1589)
'51-14'
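The four notation forms in the example ('-50', '*72', '50+10', '51-14') can be produced from a cds position plus an intronic shift. The sketch below is a hedged illustration of that formatting step only: `format_cds_notation` and its `cds_length` parameter are hypothetical, and it assumes positions past the stop codon are passed in as values greater than the cds length.

```python
def format_cds_notation(cds_pos, shift, cds_length):
    """Render a cds position and intronic shift in HGVS-style coding notation.

    cds_pos:    cds position (negative if upstream of the start codon)
    shift:      signed distance into the intron from the nearest exonic base (0 if exonic)
    cds_length: length of the coding sequence (assumed known by the caller)
    """
    if cds_pos > cds_length:
        base = f"*{cds_pos - cds_length}"  # downstream of the translation end
    else:
        base = str(cds_pos)                # str() already renders the leading '-'
    if shift > 0:
        return f"{base}+{shift}"
    if shift < 0:
        return f"{base}{shift}"
    return base


for args in [(-50, 0, 300), (372, 0, 300), (50, 10, 300), (51, -14, 300)]:
    print(format_cds_notation(*args))
```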
-
convert_genomic_to_nearest_cds(pos)
Converts a genomic position to its cds equivalent or (if intronic) the nearest cds and shift.
Parameters: pos (int) – the genomic position
Returns:
- int - the cds position
- int - the intronic shift
Return type: tuple of int and int
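To make the (cds position, intronic shift) pair concrete, here is a minimal positive-strand sketch. It is not the MAVIS implementation: exons are assumed to be sorted, non-overlapping, 1-based inclusive genomic ranges, `cds_start` is assumed to be an exonic genomic coordinate, and `genomic_to_nearest_cds` is a hypothetical standalone function.

```python
def genomic_to_nearest_cds(pos, exons, cds_start):
    """Map a genomic position to (cds_position, intronic_shift) on the positive strand."""

    def cdna(g):
        """1-based spliced position of an exonic genomic coordinate."""
        total = 0
        for start, end in exons:
            if start <= g <= end:
                return total + g - start + 1
            total += end - start + 1
        raise ValueError(f"{g} is not exonic")

    shift = 0
    nearest = pos
    if not any(start <= pos <= end for start, end in exons):
        # Not exonic: snap to the closest exon boundary and record the signed distance.
        boundaries = [b for exon in exons for b in exon]
        nearest = min(boundaries, key=lambda b: abs(b - pos))
        shift = pos - nearest
    offset = cdna(nearest) - cdna(cds_start)
    # HGVS has no position 0: the first coding base is 1 and the base before it is -1.
    cds_pos = offset + 1 if offset >= 0 else offset
    return cds_pos, shift


exons = [(100, 199), (300, 399)]
print(genomic_to_nearest_cds(209, exons, cds_start=150))  # (50, 10)
print(genomic_to_nearest_cds(286, exons, cds_start=150))  # (51, -14)
```

A positive shift means the position lies after the nearest exonic base, and a negative shift means it lies before it.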
-
get_AA_seq(REFERENCE_GENOME=None, ignore_cache=False)
Parameters: REFERENCE_GENOME (dict of Bio.SeqRecord by str) – dict of reference sequence by template/chr name
Returns: the amino acid sequence
Return type: str
Raises: AttributeError – if the reference sequence has not been given and is not set
-
get_cds_seq(REFERENCE_GENOME=None, ignore_cache=False)
Parameters: REFERENCE_GENOME (dict of Bio.SeqRecord by str) – dict of reference sequence by template/chr name
Returns: the cds sequence
Return type: str
Raises: AttributeError – if the reference sequence has not been given and is not set
-
get_seq(REFERENCE_GENOME=None, ignore_cache=False)
Wrapper for the sequence method.
Parameters: REFERENCE_GENOME (dict of Bio.SeqRecord by str) – dict of reference sequence by template/chr name
-
transcript
Transcript – the spliced transcript this translation belongs to
-
mavis.annotate.protein.calculate_ORF(spliced_cdna_sequence, min_orf_size=None)
Calculates all possible open reading frames given a spliced cdna sequence (no introns).
Parameters: spliced_cdna_sequence (str) – the sequence
Returns: list of open reading frame positions on the input sequence
Return type: list of Interval
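A simplified picture of ORF detection on a spliced sequence is sketched below. It is an illustration under assumptions, not the MAVIS algorithm: every ATG is treated as a candidate start, the first in-frame stop codon closes the frame, `min_orf_size` is interpreted in nucleotides, and results are plain 1-based inclusive tuples in place of Interval objects. `find_orfs` is a hypothetical name.

```python
import re

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_orf_size=None):
    """Return 1-based inclusive (start, end) for each ATG ... stop reading frame."""
    seq = seq.upper()
    orfs = []
    for match in re.finditer("(?=ATG)", seq):
        start = match.start()
        # Walk codon by codon from the base after the start codon.
        for i in range(start + 3, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                length = i + 3 - start  # includes the start and stop codons
                if min_orf_size is None or length >= min_orf_size:
                    orfs.append((start + 1, i + 3))
                break  # the first in-frame stop ends this frame
    return orfs


print(find_orfs("CCATGAAATGACC"))  # [(3, 11)]
```

In this input the second ATG (position 8) has no in-frame stop codon before the sequence ends, so only one frame is reported.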