cigar module

holds functions for processing cigar tuples. A cigar is generally represented as an iterable list of tuples where the first element in each tuple is the CIGAR operation value (e.g. 1 for an insertion), and the second element is the number of consecutive bases that operation covers
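
As a rough illustration of this representation, the sketch below assumes the numeric operation codes of the SAM format (the same ones pysam uses). The `CIGAR` class and the two `*_CONSUMING` sets are stand-ins written for this example and are not taken from the mavis source.

```python
from enum import IntEnum


class CIGAR(IntEnum):
    """Assumed SAM-style operation codes (illustrative stand-in)."""
    M = 0   # aligned, match or mismatch
    I = 1   # insertion relative to the reference
    D = 2   # deletion relative to the reference
    N = 3   # skipped reference region
    S = 4   # soft clipping
    H = 5   # hard clipping
    P = 6   # padding
    EQ = 7  # exact match
    X = 8   # mismatch


# Operations that advance along the read / along the reference.
QUERY_CONSUMING = {CIGAR.M, CIGAR.I, CIGAR.S, CIGAR.EQ, CIGAR.X}
REFERENCE_CONSUMING = {CIGAR.M, CIGAR.D, CIGAR.N, CIGAR.EQ, CIGAR.X}

# 5 soft-clipped bases, 20 exact matches, a 1 bp insertion, 24 exact matches
cigar = [(CIGAR.S, 5), (CIGAR.EQ, 20), (CIGAR.I, 1), (CIGAR.EQ, 24)]

query_length = sum(count for op, count in cigar if op in QUERY_CONSUMING)
reference_length = sum(count for op, count in cigar if op in REFERENCE_CONSUMING)
```

Here the read spans 50 bases but only 44 reference positions, because soft clipping and insertions consume the read without consuming the reference.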

mavis.bam.cigar.alignment_matches(cigar)[source]

counts the number of aligned bases, irrespective of match or mismatch. This is equivalent to counting all CIGAR.M
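
A minimal sketch of this count, assuming SAM-style numeric codes (0 for M, 7 for =, 8 for X). The function name `alignment_matches_sketch` is hypothetical and the body is not the mavis implementation.

```python
# Assumed SAM-style codes; only the aligned states are needed here.
M, EQ, X = 0, 7, 8


def alignment_matches_sketch(cigar):
    """Sum the counts of every aligned block, matched or mismatched."""
    return sum(count for op, count in cigar if op in (M, EQ, X))


# soft clip (4), 10 exact, 1 mismatch, 2 inserted (1), 4 exact
example = [(4, 5), (7, 10), (8, 1), (1, 2), (7, 4)]
aligned = alignment_matches_sketch(example)
```

Soft-clipped and inserted bases are excluded, so `aligned` is 15 for this input.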

mavis.bam.cigar.compute(ref, alt, force_softclipping=True, min_exact_to_stop_softclipping=6)[source]

given a ref and an alt sequence, computes the cigar representing the alt

returns the cigar tuples along with the start position of the alt relative to the ref

mavis.bam.cigar.convert_for_igv(cigar)[source]

IGV does not support the extended CIGAR values that distinguish match (=) from mismatch (X), so these are converted to M and adjacent blocks are merged

Example

>>> convert_for_igv([(7, 4), (8, 1), (7, 5)])
[(0, 10)]
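
The conversion in the example above could be sketched roughly as follows, assuming SAM-style numeric codes (0 for M, 7 for =, 8 for X). `convert_for_igv_sketch` is a hypothetical name and the body is only one plausible way to obtain the documented result.

```python
# Assumed SAM-style codes.
M, EQ, X = 0, 7, 8


def convert_for_igv_sketch(cigar):
    """Replace =/X with M and merge neighbouring blocks with the same code."""
    result = []
    for op, count in cigar:
        if op in (EQ, X):
            op = M
        if result and result[-1][0] == op:
            # same operation as the previous block: extend it
            result[-1] = (op, result[-1][1] + count)
        else:
            result.append((op, count))
    return result
```

Operations other than = and X pass through unchanged, so an insertion between two converted blocks keeps them separate.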

mavis.bam.cigar.convert_string_to_cigar(string)[source]

mavis.bam.cigar.extend_softclipping(cigar, min_exact_to_stop_softclipping)[source]

given some input cigar, extends softclipping if there are mismatches/insertions/deletions close to the end of the aligned portion. The stopping point is defined by the min_exact_to_stop_softclipping parameter. This function raises an error if there is no exactly matching aligned portion long enough to signal a stop

Parameters:
  • cigar (list of CIGAR and int) – the input cigar
  • min_exact_to_stop_softclipping (int) – number of exact matches to terminate extension
Returns:

  • list of CIGAR and int - new cigar list
  • int - shift from the original start position

Return type:

tuple
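
A simplified sketch of the idea, under several assumptions: SAM-style numeric codes, only = blocks count as stopping anchors, hard clipping is not handled, and a `ValueError` stands in for whatever error the real function raises. `extend_softclipping_sketch` is a hypothetical name.

```python
# Assumed SAM-style codes.
M, I, D, N, S, EQ, X = 0, 1, 2, 3, 4, 7, 8
QUERY_CONSUMING = {M, I, S, EQ, X}
REFERENCE_CONSUMING = {M, D, N, EQ, X}


def extend_softclipping_sketch(cigar, min_exact_to_stop_softclipping):
    """Soft-clip everything outside the first and last long exact-match block."""
    anchors = [
        i for i, (op, count) in enumerate(cigar)
        if op == EQ and count >= min_exact_to_stop_softclipping
    ]
    if not anchors:
        raise ValueError("no exact-match block long enough to stop soft clipping")
    first, last = anchors[0], anchors[-1]
    left, kept, right = cigar[:first], cigar[first:last + 1], cigar[last + 1:]

    # read bases that end up clipped on each side
    left_clip = sum(c for op, c in left if op in QUERY_CONSUMING)
    right_clip = sum(c for op, c in right if op in QUERY_CONSUMING)
    # reference positions skipped by clipping the left side
    shift = sum(c for op, c in left if op in REFERENCE_CONSUMING)

    new_cigar = []
    if left_clip:
        new_cigar.append((S, left_clip))
    new_cigar.extend(kept)
    if right_clip:
        new_cigar.append((S, right_clip))
    return new_cigar, shift
```

For `[(S, 2), (EQ, 3), (X, 1), (EQ, 10), (D, 2), (EQ, 2)]` with a threshold of 6, only the 10-base block qualifies as an anchor, so the sketch returns `[(S, 6), (EQ, 10), (S, 2)]` and a shift of 4.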

mavis.bam.cigar.hgvs_standardize_cigar(read, reference_seq)[source]

extends alignments for as long as matches are possible, and calls insertions before deletions

mavis.bam.cigar.join(*pos)[source]

given a number of cigar lists, joins them and merges any consecutive tuples with the same cigar value

Example

>>> join([(1, 1), (4, 7)], [(4, 3), (2, 4)])
[(1, 1), (4, 10), (2, 4)]
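
The merging behaviour in the example above might be sketched like this. `join_sketch` is a hypothetical name, and the body is an illustration rather than the mavis implementation.

```python
def join_sketch(*cigars):
    """Concatenate cigar lists, merging neighbours that share an operation."""
    merged = []
    for cigar in cigars:
        for op, count in cigar:
            if merged and merged[-1][0] == op:
                # same operation as the previous tuple: add the counts
                merged[-1] = (op, merged[-1][1] + count)
            else:
                merged.append((op, count))
    return merged
```

Because merging is applied tuple by tuple, consecutive duplicates inside a single input list are merged as well, not only those at the boundary between two lists.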

mavis.bam.cigar.longest_exact_match(cigar)[source]

returns the length of the longest run of consecutive exact matches

Parameters: cigar (list of tuple of int and int) – the cigar tuples
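
One way to picture this, assuming the SAM-style code 7 for an exact match and treating adjacent = tuples as a single run. `longest_exact_match_sketch` is a hypothetical name.

```python
EQ = 7  # assumed SAM-style code for an exact match


def longest_exact_match_sketch(cigar):
    """Length of the longest uninterrupted run of exact matches."""
    longest = current = 0
    for op, count in cigar:
        # any non-exact tuple resets the running total
        current = current + count if op == EQ else 0
        longest = max(longest, current)
    return longest
```

For `[(7, 4), (8, 1), (7, 5), (7, 2), (1, 1)]` the two adjacent = tuples combine into a run of 7, which is the value returned.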

mavis.bam.cigar.longest_fuzzy_match(cigar, max_fuzzy_interupt=1)[source]

computes the longest sequence of exact matches, allowing it to be interrupted by up to max_fuzzy_interupt non-matching events

Parameters:
  • cigar – cigar tuples
  • max_fuzzy_interupt (int) – number of mismatches allowed
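
A sketch of one possible reading, assuming the SAM-style code 7 for an exact match and assuming that each non-exact tuple counts as a single interrupt regardless of its length. `longest_fuzzy_match_sketch` is a hypothetical name, and the real function may count interrupts differently.

```python
EQ = 7  # assumed SAM-style code for an exact match


def longest_fuzzy_match_sketch(cigar, max_fuzzy_interupt=1):
    """Most exact-match bases in a stretch with a bounded number of interrupts."""
    best = 0
    for start, (op, _) in enumerate(cigar):
        if op != EQ:
            continue  # a stretch must begin on an exact match
        matched = interrupts = 0
        for next_op, count in cigar[start:]:
            if next_op == EQ:
                matched += count
            else:
                interrupts += 1
                if interrupts > max_fuzzy_interupt:
                    break
        best = max(best, matched)
    return best


example = [(7, 4), (8, 1), (7, 5), (1, 2), (7, 3)]
```

With `example`, allowing zero interrupts gives 5, one interrupt gives 9 (4 + 5 across the mismatch), and two interrupts give 12.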

mavis.bam.cigar.match_percent(cigar)[source]

calculates the percent of aligned bases (matches or mismatches) that are matches
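
The ratio described above could be computed along these lines, assuming SAM-style codes 7 (=) and 8 (X). `match_percent_sketch` is a hypothetical name; the sketch returns a fraction between 0 and 1, and raising `ValueError` when nothing is aligned is a choice made for this example only.

```python
EQ, X = 7, 8  # assumed SAM-style codes


def match_percent_sketch(cigar):
    """Fraction of aligned bases (matches plus mismatches) that are matches."""
    matches = sum(count for op, count in cigar if op == EQ)
    mismatches = sum(count for op, count in cigar if op == X)
    aligned = matches + mismatches
    if aligned == 0:
        raise ValueError("cigar contains no aligned bases")
    return matches / aligned
```

Insertions, deletions and clipping are left out of both the numerator and the denominator, so `[(7, 9), (8, 1), (1, 3)]` evaluates to 0.9.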


mavis.bam.cigar.merge_indels(cigar)[source]

mavis.bam.cigar.merge_internal_events(cigar, inner_anchor=10, outer_anchor=10)[source]

merges events (insertions, deletions, mismatches) within a cigar if they are between exact matches on either side (anchors) and separated by fewer exact matches than the given parameter

Parameters:
  • cigar (list) – a list of tuples of cigar states and counts
  • inner_anchor (int) – minimum number of consecutive exact matches separating events
  • outer_anchor (int) – minimum consecutively aligned exact matches to anchor an end for merging
Returns:

new list of cigar tuples with merged events

Return type:

list

Example

>>> merge_internal_events([(CIGAR.EQ, 10), (CIGAR.X, 1), (CIGAR.EQ, 2), (CIGAR.D, 1), (CIGAR.EQ, 10)])
[(CIGAR.EQ, 10), (CIGAR.I, 3), (CIGAR.D, 4), (CIGAR.EQ, 10)]
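
The counts in the example above can be checked with a small sketch of the collapsing step alone. It assumes SAM-style numeric codes and does not implement the anchor logic; `collapse_events_sketch` is a hypothetical helper, not part of mavis.

```python
# Assumed SAM-style codes.
M, I, D, EQ, X = 0, 1, 2, 7, 8


def collapse_events_sketch(events):
    """Collapse a run of tuples into one insertion and one deletion."""
    # read bases covered by the run become the inserted sequence
    query_bases = sum(c for op, c in events if op in (M, I, EQ, X))
    # reference bases covered by the run become the deleted sequence
    reference_bases = sum(c for op, c in events if op in (M, D, EQ, X))
    collapsed = []
    if query_bases:
        collapsed.append((I, query_bases))
    if reference_bases:
        collapsed.append((D, reference_bases))
    return collapsed
```

The region between the two 10-base anchors is `[(X, 1), (EQ, 2), (D, 1)]`: it covers 3 read bases (1 + 2) and 4 reference bases (1 + 2 + 1), which gives the `(CIGAR.I, 3), (CIGAR.D, 4)` pair shown in the example.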

mavis.bam.cigar.recompute_cigar_mismatch(read, ref)[source]

for cigar tuples where M is used, recompute to replace with X/= for increased utility and specificity

Parameters:
  • read – the read whose cigar is recomputed
  • ref – the reference sequence the read is aligned to
Returns:

the cigar tuple

Return type:

list of tuple of int and int
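
To illustrate the M to X/= split, the sketch below works on plain strings rather than a read object. The signature (`cigar`, `query_seq`, `ref_seq`, `ref_start`) and the name `recompute_mismatch_sketch` are assumptions made for this example and differ from the `(read, ref)` signature documented above. The comparison is case-sensitive.

```python
# Assumed SAM-style codes 0..8.
M, I, D, N, S, H, P, EQ, X = range(9)


def recompute_mismatch_sketch(cigar, query_seq, ref_seq, ref_start=0):
    """Split every M block into = and X blocks by comparing bases."""
    result = []
    q = 0          # position in the read
    r = ref_start  # position in the reference
    for op, count in cigar:
        if op == M:
            for offset in range(count):
                same = query_seq[q + offset] == ref_seq[r + offset]
                new_op = EQ if same else X
                if result and result[-1][0] == new_op:
                    result[-1] = (new_op, result[-1][1] + 1)
                else:
                    result.append((new_op, 1))
        else:
            result.append((op, count))
        # advance the read and/or reference according to the operation
        if op in (M, I, S, EQ, X):
            q += count
        if op in (M, D, N, EQ, X):
            r += count
    return result


recomputed = recompute_mismatch_sketch([(M, 4), (D, 1), (M, 3)], "ACCTCGT", "ACGTACGT")
```

In this call the first M block contains one mismatch (C against G), so it becomes `(EQ, 2), (X, 1), (EQ, 1)`, while the second M block matches entirely and becomes `(EQ, 3)`.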

mavis.bam.cigar.score(cigar, **kwargs)[source]

scoring based on Smith-Waterman (SW) alignment properties with gap extension penalties

Parameters:
  • cigar (list of CIGAR and int) – list of cigar tuple values
  • MISMATCH (int) – mismatch penalty
  • MATCH (int) – match score
  • GAP (int) – initial gap penalty
  • GAP_EXTEND (int) – gap extension penalty
Returns:

the score value

Return type:

int
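
A sketch of affine-gap scoring over cigar tuples, assuming SAM-style numeric codes. The keyword names are lower-cased versions of the parameters above, and the default values are placeholders chosen for this example, not the library's defaults. Skipping soft clipping and rejecting other operations are likewise assumptions. `score_sketch` is a hypothetical name.

```python
# Assumed SAM-style codes.
I, D, S, EQ, X = 1, 2, 4, 7, 8


def score_sketch(cigar, match=2, mismatch=-1, gap=-4, gap_extend=-1):
    """Score a cigar with per-base match/mismatch values and affine gaps."""
    total = 0
    for op, count in cigar:
        if op == EQ:
            total += match * count
        elif op == X:
            total += mismatch * count
        elif op in (I, D):
            # the first gap base pays the opening cost, the rest the extension cost
            total += gap + gap_extend * (count - 1)
        elif op == S:
            continue  # assumption: clipped bases do not affect the score
        else:
            raise ValueError("operation not handled by this sketch: %r" % (op,))
    return total
```

With these placeholder values, `[(7, 10), (8, 1), (1, 3), (7, 5)]` scores 20 - 1 - 6 + 10 = 23.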

mavis.bam.cigar.smallest_nonoverlapping_repeat(s)[source]

for a given string, returns the smallest substring that, when repeated, reproduces the entire string

Example

>>> smallest_nonoverlapping_repeat('ATATATA')
'ATATATA'
>>> smallest_nonoverlapping_repeat('ATATAT')
'AT'
>>> smallest_nonoverlapping_repeat('CCCCCCCC')
'C'
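
The behaviour in the examples above can be reproduced with a short brute-force sketch. `smallest_nonoverlapping_repeat_sketch` is a hypothetical name and the approach is only one straightforward option, not necessarily the one used in mavis.

```python
def smallest_nonoverlapping_repeat_sketch(s):
    """Shortest prefix that tiles the whole string when repeated."""
    for size in range(1, len(s) + 1):
        # only prefix lengths that divide the string length can tile it
        if len(s) % size == 0 and s[:size] * (len(s) // size) == s:
            return s[:size]
    return s  # only reached for the empty string
```

Candidate lengths are tried from shortest to longest, so the first prefix that tiles the string is the smallest one. A string such as 'ATATATA' has a prime length and is not a single repeated base, so it is returned unchanged.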