breakpoint module
class mavis.breakpoint.Breakpoint(chr, start, end=None, orient='?', strand='?', seq=None)
Bases: mavis.interval.Interval
Class for storing information about a structural variant (SV) breakpoint. Coordinates are given as 1-indexed.
Examples
>>> Breakpoint('1', 1, 2)
>>> Breakpoint('1', 1)
>>> Breakpoint('1', 1, 2, 'R', '+')
>>> Breakpoint('1', 1, orient='R')
key
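The constructor semantics above can be pictured with a small stand-in class. This is a hypothetical sketch (`BreakpointSketch` is not part of MAVIS). It assumes that `end=None` means a single-position breakpoint, that `orient` takes 'L', 'R', or '?' and `strand` takes '+', '-', or '?' (as in the examples), and that the length of a 1-indexed inclusive interval is end - start + 1.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BreakpointSketch:
    """Hypothetical stand-in for Breakpoint; not the MAVIS implementation."""
    chr: str
    start: int
    end: Optional[int] = None
    orient: str = '?'  # assumed values: 'L', 'R', or '?' (not specified)
    strand: str = '?'  # assumed values: '+', '-', or '?' (not specified)

    def __post_init__(self):
        if self.end is None:
            # no end given: a single-position ("specific") breakpoint
            self.end = self.start
        if not 1 <= self.start <= self.end:
            raise ValueError('expected 1 <= start <= end (1-indexed, inclusive)')
        if self.orient not in ('L', 'R', '?'):
            raise ValueError('orient must be L, R, or ?')
        if self.strand not in ('+', '-', '?'):
            raise ValueError('strand must be +, -, or ?')

    def __len__(self):
        # number of positions covered by the inclusive interval start..end
        return self.end - self.start + 1
```

Under these assumptions `BreakpointSketch('1', 1, 2)` covers two possible positions, while `BreakpointSketch('1', 1, orient='R')` is a specific breakpoint of length 1.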
class mavis.breakpoint.BreakpointPair(b1, b2, stranded=False, opposing_strands=None, untemplated_seq=None, data={})
Bases: object
Parameters:
- b1 (Breakpoint) – the first breakpoint
- b2 (Breakpoint) – the second breakpoint
- stranded (bool) – if not stranded, then +/- is equivalent to -/+
- opposing_strands (bool) – whether the strands at the breakpoints are opposite, i.e. +/- instead of +/+
- untemplated_seq (str) – sequence between the breakpoints that is not part of either breakpoint
- data (dict) – optional dictionary of attributes associated with this pair
Note
untemplated_seq should always be given with respect to the positive/forward reference strand.
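If an inserted sequence is only known in negative-strand orientation, it has to be reverse complemented before being passed as untemplated_seq. A minimal sketch in plain Python (the helper name `reverse_complement` and the example sequence are illustrative; Biopython's `Seq.reverse_complement()` does the same job):

```python
# Complement table for upper- and lower-case DNA bases, plus N.
_COMPLEMENT = str.maketrans('ACGTNacgtn', 'TGCANtgcan')


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


# hypothetical inserted sequence known only on the negative strand
observed_on_negative_strand = 'AACG'
untemplated_seq = reverse_complement(observed_on_negative_strand)  # 'CGTT'
```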
Example
>>> BreakpointPair(Breakpoint('1', 1), Breakpoint('1', 9999), opposing_strands=True)
>>> BreakpointPair(Breakpoint('1', 1, strand='+'), Breakpoint('1', 9999, strand='-'))
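The two examples suggest that opposing_strands is either passed explicitly (when the breakpoint strands are unknown) or follows from the breakpoint strands. The sketch below illustrates that reading, plus one way to express the stranded flag: when unstranded, flipping both strands gives an equivalent pair, so -/+ collapses onto +/- (and, in this sketch, -/- onto +/+). Both helpers are hypothetical and only illustrate the parameter descriptions; they are not MAVIS's own logic.

```python
from typing import Optional, Tuple

_FLIP = {'+': '-', '-': '+'}


def infer_opposing_strands(strand1: str, strand2: str,
                           opposing_strands: Optional[bool] = None) -> bool:
    """Derive opposing_strands from two breakpoint strands ('+', '-', or '?')."""
    if '?' in (strand1, strand2):
        # strands unknown: the caller must say whether they are opposite
        if opposing_strands is None:
            raise ValueError('opposing_strands is required when a strand is unknown')
        return opposing_strands
    inferred = strand1 != strand2
    if opposing_strands is not None and opposing_strands != inferred:
        raise ValueError('opposing_strands conflicts with the breakpoint strands')
    return inferred


def canonical_strands(strand1: str, strand2: str, stranded: bool) -> Tuple[str, str]:
    """Key under which equivalent strand combinations compare equal."""
    if stranded or strand1 != '-':
        return (strand1, strand2)
    # unstranded: flip both strands so the first one reads '+' (-/+ -> +/-)
    return ('+', _FLIP.get(strand2, strand2))
```

With these helpers the first example needs opposing_strands=True because both strands are '?', while the second gets opposing strands from '+' and '-'.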
breakpoint_sequence_homology(REFERENCE_GENOME)
For a given pair of breakpoints, matches the sequence at each breakpoint against the sequence opposite the partner breakpoint. The comparison is made against the reference genome only and does not use novel or untemplated sequence. For this reason, insertions will never return any homologous sequence.
small duplication event CTT => CTTCTT

GATACATTTCTTCTTGAAAA   reference
---------<==========   first breakpoint
===========>--------   second breakpoint
---------CT-CT------   first break homology
-------TT-TT--------   second break homology
Parameters: REFERENCE_GENOME (dict of Bio.SeqRecord by str) – dict of reference sequences keyed by template/chromosome name
Return type: tuple
Raises: AttributeError – for non-specific breakpoints
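The diagram above can be reproduced with a simplified, hypothetical sketch. It assumes specific (single-position) breakpoints on one reference string, a non-inverted L/R or R/L orientation pair, and one stopping rule chosen here to match the diagram: a comparison window may not step onto the other breakpoint's position. `sequence_homology_sketch` and its helpers are not MAVIS functions, and MAVIS's own stopping rules may differ.

```python
def _walk(start, step, forbidden, length):
    """Yield 1-indexed positions from start, moving by step, until leaving
    the sequence or reaching a forbidden position."""
    pos = start
    while 1 <= pos <= length and pos not in forbidden:
        yield pos
        pos += step


def _homology(ref, kept_start, kept_step, lost_start, lost_step, breakpoints):
    """Match the retained side of one breakpoint against the sequence just
    past its partner; return the matched bases in forward-strand order."""
    kept = _walk(kept_start, kept_step, breakpoints - {kept_start}, len(ref))
    lost = _walk(lost_start, lost_step, breakpoints - {lost_start}, len(ref))
    matched = 0
    for kept_pos, lost_pos in zip(kept, lost):
        if ref[kept_pos - 1] != ref[lost_pos - 1]:
            break
        matched += 1
    if matched == 0:
        return ''
    end = kept_start + kept_step * (matched - 1)
    return ref[min(kept_start, end) - 1:max(kept_start, end)]


def sequence_homology_sketch(ref, pos1, orient1, pos2, orient2):
    """Return (first break homology, second break homology) for two specific,
    non-inverted breakpoints on the same sequence."""
    if {orient1, orient2} != {'L', 'R'}:
        raise ValueError('sketch only handles L/R or R/L orientation pairs')
    kept_step = {'L': -1, 'R': 1}  # direction the retained sequence extends
    lost_step = {'L': 1, 'R': -1}  # direction the discarded sequence extends
    bps = {pos1, pos2}
    first = _homology(ref, pos1, kept_step[orient1],
                      pos2 + lost_step[orient2], lost_step[orient2], bps)
    second = _homology(ref, pos2, kept_step[orient2],
                       pos1 + lost_step[orient1], lost_step[orient1], bps)
    return first, second
```

For the duplication drawn above (first breakpoint at position 10 with orient 'R', second at 12 with orient 'L'), `sequence_homology_sketch('GATACATTTCTTCTTGAAAA', 10, 'R', 12, 'L')` returns ('CT', 'TT'), the first and second break homology rows of the diagram.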
classmethod call_breakpoint_pair(read1, read2=None, REFERENCE_GENOME=None)
Calls a breakpoint pair from a single pysam-style read or a pair of them.
Parameters:
- read1 (pysam.AlignedSegment) – the first read
- read2 (pysam.AlignedSegment) – the second read (optional)
Returns: the newly called breakpoint pair from the contig
Return type: BreakpointPair
Todo
Return multiple events, not just the major event.
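As background for calling from a single read, a read that is soft-clipped on one end implies one breakpoint at the edge of its aligned bases. The sketch below is a hypothetical illustration of that idea only, not how call_breakpoint_pair works internally. It takes the values pysam exposes as `AlignedSegment.reference_start` (0-based) and `AlignedSegment.cigarstring`, and assumes that orient names the side of the break that the aligned bases retain.

```python
import re

# CIGAR operations that consume reference positions (per the SAM specification).
_REF_CONSUMING = set('MDN=X')
_CIGAR_RE = re.compile(r'(\d+)([MIDNSHP=X])')


def breakpoint_from_clipped_alignment(reference_start, cigar):
    """Return (position, orient) for an alignment soft-clipped on one end.

    reference_start is 0-based, like pysam's AlignedSegment.reference_start;
    the returned position is 1-indexed, like Breakpoint coordinates.
    """
    # hard clips do not change the aligned span, so drop them
    ops = [(int(n), op) for n, op in _CIGAR_RE.findall(cigar) if op != 'H']
    if not ops:
        raise ValueError('empty CIGAR')
    ref_span = sum(n for n, op in ops if op in _REF_CONSUMING)
    first_aligned = reference_start + 1
    last_aligned = reference_start + ref_span
    left_clipped = ops[0][1] == 'S'
    right_clipped = ops[-1][1] == 'S'
    if left_clipped == right_clipped:
        raise ValueError('expected exactly one soft-clipped end')
    if right_clipped:
        # aligned bases lie left of the break, so the left side is retained
        return last_aligned, 'L'
    # aligned bases lie right of the break, so the right side is retained
    return first_aligned, 'R'
```

For example, a read with reference_start 99 and CIGAR 50M30S is aligned to positions 100 to 149, so this sketch reports a breakpoint at 149 with orient 'L'.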
classmethod classify(pair)
Uses the chromosomes, orientations, and strands to determine the possible structural variant types that this pair could support.
Parameters: pair (BreakpointPair) – the pair to classify
Returns: a list of possible SVTYPE values
Return type: list of SVTYPE
Example
>>> bpp = BreakpointPair(Breakpoint('1', 1), Breakpoint('1', 9999), opposing_strands=True)
>>> BreakpointPair.classify(bpp)
['inversion']
>>> bpp = BreakpointPair(Breakpoint('1', 1, orient='L'), Breakpoint('1', 9999, orient='R'), opposing_strands=False)
>>> BreakpointPair.classify(bpp)
['deletion', 'insertion']
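The kind of rule set classify applies can be sketched as a small decision function. This hypothetical `classify_sketch` reproduces the two documented examples. The duplication rule and the interchromosomal type names ('translocation', 'inverted translocation') are assumptions, and invalid combinations (for example L/L on the same strand) are not rejected here as a real implementation might do.

```python
def classify_sketch(chr1, orient1, chr2, orient2, opposing_strands):
    """Return the SV types a pair could support (simplified, hypothetical rules)."""
    if chr1 != chr2:
        # assumed type names for interchromosomal events
        return ['inverted translocation'] if opposing_strands else ['translocation']
    if opposing_strands:
        return ['inversion']
    # same chromosome, same strand: the orientations separate the types
    if orient1 == 'L' or orient2 == 'R':
        return ['deletion', 'insertion']
    if orient1 == 'R' or orient2 == 'L':
        return ['duplication']
    # both orientations unknown ('?'): any same-strand type is possible
    return ['deletion', 'insertion', 'duplication']
```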
mavis.breakpoint.read_bpp_from_input_file(filename, expand_ns=True, force_stranded=False, **kwargs)
Reads a file using the TSV module. Each row is converted to a breakpoint pair, and the remaining column data is stored in the pair's data attribute.
Parameters: filename (str) – path to the input file
Returns: a list of pairs
Return type: list of BreakpointPair
Example
>>> read_bpp_from_input_file('filename')
[BreakpointPair(), BreakpointPair(), ...]
Other expected columns that will go into the data attribute can also be validated using the usual arguments to the TSV.read_file function:
>>> read_bpp_from_input_file('filename', cast={'index': int})
[BreakpointPair(), BreakpointPair(), ...]
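The row-to-pair conversion can be pictured with the standard csv module. This is a hypothetical sketch: the column names, `PairSketch`, and `read_pairs_sketch` are invented for illustration and do not describe the MAVIS input format or the TSV module. The `cast` argument is loosely modeled on the example above, and expand_ns and force_stranded are not modeled.

```python
import csv
import io
from dataclasses import dataclass, field

# Hypothetical column names, chosen for illustration only.
CORE_COLUMNS = ('break1_chromosome', 'break1_position',
                'break2_chromosome', 'break2_position')


@dataclass
class PairSketch:
    chr1: str
    pos1: int
    chr2: str
    pos2: int
    data: dict = field(default_factory=dict)  # every non-core column


def read_pairs_sketch(handle, cast=None):
    """Convert each TSV row to a PairSketch; extra columns go into data."""
    cast = cast or {}
    pairs = []
    for row in csv.DictReader(handle, delimiter='\t'):
        # non-core columns are kept, cast when a converter is given
        data = {col: cast.get(col, str)(value)
                for col, value in row.items() if col not in CORE_COLUMNS}
        pairs.append(PairSketch(row['break1_chromosome'], int(row['break1_position']),
                                row['break2_chromosome'], int(row['break2_position']),
                                data))
    return pairs


example_tsv = ('break1_chromosome\tbreak1_position\tbreak2_chromosome\tbreak2_position\tindex\n'
               '1\t1\t1\t9999\t7\n')
example = read_pairs_sketch(io.StringIO(example_tsv), cast={'index': int})
# example == [PairSketch(chr1='1', pos1=1, chr2='1', pos2=9999, data={'index': 7})]
```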