cigar module
Holds methods related to processing cigar tuples. Cigar tuples are generally an iterable list of tuples where the first element in each tuple is the CIGAR value (e.g. 1 for an insertion) and the second element is the length of that operation, i.e. the number of consecutive bases it covers.
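As an illustration of the tuple representation, the sketch below lists the integer operation codes defined by the SAM specification (the same codes pysam uses) and renders a list of tuples as a CIGAR string. The names `CigarOp` and `to_string` are hypothetical and are not part of this module.

```python
from enum import IntEnum


class CigarOp(IntEnum):
    """SAM operation codes (hypothetical enum name, for illustration only)."""
    M = 0   # alignment match: may be a sequence match or a mismatch
    I = 1   # insertion relative to the reference
    D = 2   # deletion relative to the reference
    N = 3   # skipped region of the reference
    S = 4   # soft clipping
    H = 5   # hard clipping
    P = 6   # padding
    EQ = 7  # sequence match ('=')
    X = 8   # sequence mismatch


def to_string(cigar):
    """Render (operation, length) tuples as a CIGAR string."""
    symbols = "MIDNSHP=X"  # indexed by operation code
    return "".join(f"{length}{symbols[op]}" for op, length in cigar)


# 5 soft-clipped bases, 10 matches, 1 mismatch, 4 matches
cigar = [(CigarOp.S, 5), (CigarOp.EQ, 10), (CigarOp.X, 1), (CigarOp.EQ, 4)]
```

Plain integers work equally well as the first element of each tuple, which is the form used in the examples on this page.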
-
mavis.bam.cigar.alignment_matches(cigar)
Counts the number of aligned bases irrespective of match/mismatch. This is equivalent to counting all CIGAR.M in a cigar that does not distinguish matches from mismatches.
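A minimal sketch of the counting described above, assuming the SAM integer codes for M, = and X. The function name `count_aligned_bases` is hypothetical and the body is not taken from the MAVIS source.

```python
M, EQ, X = 0, 7, 8  # SAM operation codes for M, '=' and X


def count_aligned_bases(cigar):
    """Sum the lengths of every M, '=' and X block in the cigar tuples."""
    return sum(length for op, length in cigar if op in (M, EQ, X))
```

Insertions, deletions and clipping do not contribute, so `[(4, 5), (7, 10), (8, 1), (1, 2), (7, 4)]` counts 15 aligned bases.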
-
mavis.bam.cigar.compute(ref, alt, force_softclipping=True, min_exact_to_stop_softclipping=6)
Given a ref and alt sequence, computes the cigar string representing the alt.
Returns the cigar tuples along with the start position of the alt relative to the ref.
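The sketch below shows one way a cigar can be derived from two sequences. It assumes that ref and alt have already been aligned to equal length with `-` as the gap character; that input convention is an assumption rather than something stated here. It also omits the softclipping step and the returned start position. The name `cigar_from_aligned` is hypothetical.

```python
from itertools import groupby

I, D, EQ, X = 1, 2, 7, 8  # SAM operation codes
GAP = "-"                 # assumed gap character


def cigar_from_aligned(ref, alt):
    """Compare two gapped, equal-length sequences column by column."""
    if len(ref) != len(alt):
        raise ValueError("ref and alt must have the same length")
    ops = []
    for r, a in zip(ref, alt):
        if r == GAP and a == GAP:
            continue              # column carries no information
        if r == GAP:
            ops.append(I)         # base present only in alt
        elif a == GAP:
            ops.append(D)         # base present only in ref
        elif r == a:
            ops.append(EQ)
        else:
            ops.append(X)
    # collapse runs of the same operation into (operation, length) tuples
    return [(op, len(list(group))) for op, group in groupby(ops)]
```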
-
mavis.bam.cigar.convert_for_igv(cigar)
IGV does not support the extended CIGAR values for match vs. mismatch, so = and X are converted to M.
Example
>>> convert_for_igv([(7, 4), (8, 1), (7, 5)])
[(0, 10)]
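The conversion in the example can be sketched as follows: every = or X block is relabelled as M, and neighbouring blocks that end up with the same operation are merged. `collapse_to_m` is a hypothetical name and this is not the MAVIS implementation.

```python
M, EQ, X = 0, 7, 8  # SAM operation codes


def collapse_to_m(cigar):
    """Replace '=' and X with M, merging adjacent blocks of the same operation."""
    result = []
    for op, length in cigar:
        if op in (EQ, X):
            op = M
        if result and result[-1][0] == op:
            # same operation as the previous block: extend it
            result[-1] = (op, result[-1][1] + length)
        else:
            result.append((op, length))
    return result
```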
-
mavis.bam.cigar.extend_softclipping(cigar, min_exact_to_stop_softclipping)
Given some input cigar, extends softclipping if there are mismatches/insertions/deletions close to the end of the aligned portion. The stopping point is defined by the min_exact_to_stop_softclipping parameter. This function will throw an error if there is no exact-match aligned portion long enough to signal a stop.
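One plausible reading of the behaviour described above is sketched here. The first and last exact-match blocks of at least `min_exact` bases act as stopping points, and everything outside them is folded into soft clipping. The function name, the second return value (how far the reference start moves) and the `ValueError` are assumptions. Hard clipping is not handled.

```python
S, EQ = 4, 7                       # SAM codes for soft clip and '='
QUERY_CONSUMING = {0, 1, 4, 7, 8}  # M, I, S, '=', X
REF_CONSUMING = {0, 2, 3, 7, 8}    # M, D, N, '=', X


def extend_softclipping_sketch(cigar, min_exact):
    """Soft clip everything outside the outermost long exact-match blocks."""
    anchors = [i for i, (op, length) in enumerate(cigar)
               if op == EQ and length >= min_exact]
    if not anchors:
        raise ValueError("no exact match long enough to stop softclipping")
    first, last = anchors[0], anchors[-1]
    # query bases before/after the anchors become soft-clipped bases
    head = sum(n for op, n in cigar[:first] if op in QUERY_CONSUMING)
    tail = sum(n for op, n in cigar[last + 1:] if op in QUERY_CONSUMING)
    # reference bases skipped at the start shift the alignment start
    ref_shift = sum(n for op, n in cigar[:first] if op in REF_CONSUMING)
    result = list(cigar[first:last + 1])
    if head:
        result.insert(0, (S, head))
    if tail:
        result.append((S, tail))
    return result, ref_shift
```

With `min_exact=6`, the cigar `[(7, 2), (8, 1), (7, 10), (1, 2), (7, 3)]` keeps only the 10-base exact match and clips 3 bases on the left and 5 on the right.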
-
mavis.bam.cigar.hgvs_standardize_cigar(read, reference_seq)
Extends alignments as long as matches are possible. Calls insertions before deletions.
-
mavis.bam.cigar.join(*pos)
Given a number of cigar lists, joins them and merges any consecutive tuples with the same cigar value.
Example
>>> join([(1, 1), (4, 7)], [(4, 3), (2, 4)])
[(1, 1), (4, 10), (2, 4)]
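The join-and-merge behaviour in the example can be sketched like this. `join_cigars` is a hypothetical stand-in written for illustration, not the module's own code.

```python
from itertools import chain


def join_cigars(*cigars):
    """Concatenate cigar lists, merging neighbours that share an operation."""
    merged = []
    for op, length in chain.from_iterable(cigars):
        if merged and merged[-1][0] == op:
            # extend the previous block instead of starting a new one
            merged[-1] = (op, merged[-1][1] + length)
        else:
            merged.append((op, length))
    return merged
```

Merging happens across list boundaries as well as within a single list, which is why the two soft-clip blocks in the example become one block of length 10.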
-
mavis.bam.cigar.longest_exact_match(cigar)
Returns the longest consecutive exact match.
Parameters: cigar (list of tuple of int and int) – the cigar tuples
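A minimal sketch, assuming the cigar has already been merged so that consecutive = blocks do not occur. `longest_eq_run` is a hypothetical name.

```python
EQ = 7  # SAM operation code for '='


def longest_eq_run(cigar):
    """Return the length of the longest '=' block, or 0 if there is none."""
    return max((length for op, length in cigar if op == EQ), default=0)
```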
-
mavis.bam.cigar.longest_fuzzy_match(cigar, max_fuzzy_interupt=1)
Computes the longest sequence of exact matches allowing for ‘x’ event interrupts, where x is max_fuzzy_interupt.
Parameters:
- cigar – cigar tuples
- max_fuzzy_interupt (int) – number of mismatches allowed
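One plausible reading of the fuzzy match is sketched below. Starting from each = block, it keeps adding later = blocks until more than `max_interrupts` non-= blocks have been crossed. Whether an interrupt is counted per block or per base is an assumption here (per block), and `longest_fuzzy_run` is a hypothetical name.

```python
EQ = 7  # SAM operation code for '='


def longest_fuzzy_run(cigar, max_interrupts=1):
    """Longest total of '=' bases spanning at most max_interrupts other blocks."""
    best = 0
    for start, (op, length) in enumerate(cigar):
        if op != EQ:
            continue
        total, interrupts = length, 0
        for next_op, next_length in cigar[start + 1:]:
            if next_op == EQ:
                total += next_length
            else:
                interrupts += 1
                if interrupts > max_interrupts:
                    break
        best = max(best, total)
    return best
```

With `max_interrupts=0` this reduces to the longest single exact-match block.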
-
mavis.bam.cigar.match_percent(cigar)
Calculates the percent of aligned bases (matches or mismatches) that are matches.
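The calculation can be sketched as the number of = bases divided by the number of = and X bases. Whether the real function returns a fraction between 0 and 1 or a value between 0 and 100 is not stated here, so the fraction below is an assumption, as are the name `fraction_matched` and the error raised when nothing is aligned.

```python
EQ, X = 7, 8  # SAM operation codes for '=' and X


def fraction_matched(cigar):
    """Fraction of '='/X bases that are '='; other operations are ignored."""
    matches = sum(length for op, length in cigar if op == EQ)
    mismatches = sum(length for op, length in cigar if op == X)
    if matches + mismatches == 0:
        raise ValueError("cigar has no '=' or X bases")
    return matches / (matches + mismatches)
```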
-
mavis.bam.cigar.merge_internal_events(cigar, inner_anchor=10, outer_anchor=10)
Merges events (insertions, deletions, mismatches) within a cigar if they are between exact matches on either side (anchors) and separated by fewer exact matches than the given parameter.
Returns: new list of cigar tuples with merged events
Example
>>> merge_internal_events([(CIGAR.EQ, 10), (CIGAR.X, 1), (CIGAR.EQ, 2), (CIGAR.D, 1), (CIGAR.EQ, 10)])
[(CIGAR.EQ, 10), (CIGAR.I, 3), (CIGAR.D, 4), (CIGAR.EQ, 10)]
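The lengths in the example follow from how each operation consumes bases: the merged insertion covers every query base in the run (1 + 2 = 3) and the merged deletion covers every reference base (1 + 2 + 1 = 4). The sketch below reproduces only that arithmetic for a run that has already been selected for merging. It does not implement the anchor logic, and `collapse_run` is a hypothetical helper.

```python
I, D = 1, 2                     # SAM codes for insertion and deletion
QUERY_CONSUMING = {0, 1, 7, 8}  # M, I, '=', X
REF_CONSUMING = {0, 2, 7, 8}    # M, D, '=', X


def collapse_run(run):
    """Collapse a run of cigar tuples into one insertion and one deletion."""
    inserted = sum(n for op, n in run if op in QUERY_CONSUMING)
    deleted = sum(n for op, n in run if op in REF_CONSUMING)
    result = []
    if inserted:
        result.append((I, inserted))  # insertion is reported first
    if deleted:
        result.append((D, deleted))
    return result
```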
-
mavis.bam.cigar.recompute_cigar_mismatch(read, ref)
For cigar tuples where M is used, recomputes them to replace M with X/= for increased utility and specificity.
Parameters:
- read (pysam.AlignedSegment) – the input read
- ref (str) – the reference sequence
Returns: the cigar tuple
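The idea can be sketched without pysam by working on plain values: the cigar tuples, the query sequence, the reference sequence and the reference start position. These stand in for what would be read from the `pysam.AlignedSegment`. The function name `split_m_blocks` and the case-sensitive comparison are assumptions.

```python
M, I, D, N, S, EQ, X = 0, 1, 2, 3, 4, 7, 8  # SAM operation codes


def split_m_blocks(cigar, query_seq, ref_seq, ref_start=0):
    """Rewrite every M block as '=' and X blocks by comparing the sequences."""
    result = []

    def push(op, length):
        # append a block, merging with the previous one when possible
        if result and result[-1][0] == op:
            result[-1] = (op, result[-1][1] + length)
        else:
            result.append((op, length))

    qpos, rpos = 0, ref_start
    for op, length in cigar:
        if op == M:
            for offset in range(length):
                same = query_seq[qpos + offset] == ref_seq[rpos + offset]
                push(EQ if same else X, 1)
            qpos += length
            rpos += length
            continue
        push(op, length)
        if op in (I, S):
            qpos += length      # consumes query only
        elif op in (D, N):
            rpos += length      # consumes reference only
        elif op in (EQ, X):
            qpos += length
            rpos += length
    return result
```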
-
mavis.bam.cigar.score(cigar, **kwargs)
Scoring based on Smith-Waterman (sw) alignment properties with gap extension penalties.
Returns: the score value
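A sketch of affine-gap scoring over cigar tuples is given below. The keyword names and their default values are hypothetical and chosen only for illustration; the keywords actually accepted through `**kwargs` are not listed here.

```python
I, D, EQ, X = 1, 2, 7, 8  # SAM operation codes


def affine_score(cigar, match=2, mismatch=-1, gap_open=-4, gap_extend=-1):
    """Score a cigar: per-base match/mismatch, open + extend for each gap."""
    total = 0
    for op, length in cigar:
        if op == EQ:
            total += match * length
        elif op == X:
            total += mismatch * length
        elif op in (I, D):
            # the first gap base pays gap_open, every further base gap_extend
            total += gap_open + gap_extend * (length - 1)
    return total
```

Under these assumed weights, a single 3-base deletion costs -6, while three separate 1-base deletions would cost -12, which is the point of a gap extension penalty.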
-
mavis.bam.cigar.smallest_nonoverlapping_repeat(s)
For a given string, returns the smallest substring that is a repeat consuming the entire string.
Example
>>> smallest_nonoverlapping_repeat('ATATATA')
'ATATATA'
>>> smallest_nonoverlapping_repeat('ATATAT')
'AT'
>>> smallest_nonoverlapping_repeat('CCCCCCCC')
'C'
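The behaviour in the examples can be reproduced by trying each prefix length that divides the string length and returning the first prefix that rebuilds the whole string when repeated. `smallest_repeat_unit` is a hypothetical name and this is a sketch rather than the module's implementation.

```python
def smallest_repeat_unit(s):
    """Return the shortest prefix that tiles s exactly."""
    for size in range(1, len(s) + 1):
        # only prefixes whose length divides len(s) can tile it exactly
        if len(s) % size == 0 and s[:size] * (len(s) // size) == s:
            return s[:size]
    return s  # only reached for the empty string
```

'ATATATA' has length 7, so the only candidate sizes are 1 and 7, and only the full string tiles itself.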