base module
-
class mavis.validate.base.Evidence(break1, break2, bam_cache, REFERENCE_GENOME, read_length, stdev_fragment_size, median_fragment_size, stranded=False, opposing_strands=None, untemplated_seq=None, data={}, classification=None, **kwargs)
Bases: mavis.breakpoint.BreakpointPair
Parameters: - breakpoint_pair (BreakpointPair) – the breakpoint pair to collect evidence for
- bam_cache (BamCache) – the bam cache (and associated file) to collect evidence from
- REFERENCE_GENOME (dict of Bio.SeqRecord by str) – dict of reference sequence by template/chr name
- data (dict) – a dictionary of data to associate with the evidence object
- classification (SVTYPE) – the event type
- protocol (PROTOCOL) – genome or transcriptome
-
assemble_contig(log=<function devnull>)
uses the split reads and the partners of the half mapped reads to create a contig representing the sequence across the breakpoints
if it is not strand specific then sequences are sorted alphanumerically and only the first of a pair is kept (paired by sequence)
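The non-strand-specific deduplication described above can be pictured with a minimal sketch. It assumes that "paired by sequence" means a sequence and its reverse complement are treated as one pair. The helper names `reverse_complement` and `dedupe_unstranded` are hypothetical and are not part of MAVIS.

```python
# Hypothetical illustration only; these helpers are not MAVIS code.
_COMPLEMENT = str.maketrans('ACGTN', 'TGCAN')


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def dedupe_unstranded(sequences):
    """Keep one representative per (sequence, reverse complement) pair.

    Each pair is sorted and only its first member is kept, so a sequence
    and its reverse complement collapse to the same representative.
    """
    kept = set()
    for seq in sequences:
        seq = seq.upper()
        pair = sorted([seq, reverse_complement(seq)])
        kept.add(pair[0])
    return sorted(kept)


print(dedupe_unstranded(['TTTG', 'CAAA', 'ACGG']))
```

Here 'TTTG' and 'CAAA' are reverse complements of each other, so only one of them survives.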
-
collect_compatible_flanking_pair(read, mate, compatible_type)
checks if a given read pair meets the minimum quality criteria to be counted as evidence and stored as support for this event
Parameters: - read (pysam.AlignedSegment) – the read to add
- mate (pysam.AlignedSegment) – the mate
- compatible_type (SVTYPE) – the type we are collecting for
Returns: - True: the pair was collected and stored in the current evidence object
- False: the pair was not collected
Return type: bool
Raises: ValueError – if the input reads are not a valid pair
-
collect_flanking_pair(read, mate)
checks if a given read pair meets the minimum quality criteria to be counted as evidence and stored as support for this event
Parameters: - read (pysam.AlignedSegment) – the read to add
- mate (pysam.AlignedSegment) – the mate
Returns: - True: the pair was collected and stored in the current evidence object
- False: the pair was not collected
Return type: bool
Raises: ValueError – if the input reads are not a valid pair
-
collect_from_outer_window()
determines if evidence should be collected from the outer window (looking for flanking evidence) or should be limited to the inner window (split/spanning/contig only)
Returns: True or False
Return type: bool
-
collect_half_mapped(read, mate)
Parameters: - read (pysam.AlignedSegment) – the read to add
- mate (pysam.AlignedSegment) – the unmapped mate
Returns: - True: the read was collected and stored in the current evidence object
- False: the read was not collected
Return type: bool
Raises: AssertionError – if the mate is not unmapped
-
collect_spanning_read(read)
spanning read: a read covering BOTH breakpoints
This is only applicable to small events. There is no need to look for soft-clipped reads here since they will already have been collected.
Parameters: read (pysam.AlignedSegment) – the putative spanning read
Returns: - True: the read was collected and stored in the current evidence object
- False: the read was not collected
Return type: bool
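The definition of a spanning read (a read covering both breakpoints) can be illustrated with a small sketch. It simplifies each breakpoint to a single position and the read to an inclusive aligned interval. The function `spans_both_breakpoints` and its parameters are hypothetical and do not reproduce the criteria MAVIS actually applies.

```python
# Hypothetical illustration only; not the MAVIS implementation.
def spans_both_breakpoints(read_start, read_end, break1_pos, break2_pos):
    """Return True if the inclusive interval [read_start, read_end]
    covers both breakpoint positions."""
    first, second = sorted((break1_pos, break2_pos))
    return read_start <= first and second <= read_end


# A 151 bp read aligned at 100-250 covers breakpoints at 120 and 200.
print(spans_both_breakpoints(100, 250, 120, 200))
# A shorter alignment ending at 150 reaches only the first breakpoint.
print(spans_both_breakpoints(100, 150, 120, 200))
```

This also shows why the check only makes sense for small events: the distance between the breakpoints has to fit inside a single read alignment.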
-
collect_split_read(read, first_breakpoint)
adds a split read if it passes the criteria filters and raises a warning if it does not
Parameters: - read (pysam.AlignedSegment) – the read to add
- first_breakpoint (bool) – add to the first breakpoint (or second if false)
Returns: - True: the read was collected and stored in the current evidence object
- False: the read was not collected
Return type: bool
Raises: NotSpecifiedError – if the breakpoint orientation is not specified
-
compatible_window1
Interval – the window/region where reads in a compatible flanking pair are expected to be found (mate must be in compatible_window2)
-
compatible_window2
Interval – the window/region where reads in a compatible flanking pair are expected to be found (mate must be in compatible_window1)
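The relationship between the two compatible windows (a read in one window must have its mate in the other) can be sketched as follows. Windows are modelled here as plain inclusive `(start, end)` tuples rather than Interval objects, and the functions `in_window` and `is_compatible_flanking_pair` are hypothetical names used only for illustration.

```python
# Hypothetical illustration only; windows are plain (start, end) tuples.
def in_window(pos, window):
    """Return True if pos lies inside the inclusive window."""
    start, end = window
    return start <= pos <= end


def is_compatible_flanking_pair(read_pos, mate_pos, window1, window2):
    """A pair qualifies when one read falls in each window."""
    return (
        (in_window(read_pos, window1) and in_window(mate_pos, window2))
        or (in_window(read_pos, window2) and in_window(mate_pos, window1))
    )


window1, window2 = (100, 200), (800, 1000)
print(is_compatible_flanking_pair(150, 900, window1, window2))
print(is_compatible_flanking_pair(150, 500, window1, window2))
```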
-
compute_fragment_size(read, mate)
Parameters: - read (pysam.AlignedSegment) –
- mate (pysam.AlignedSegment) –
Returns: interval representing the range of possible fragment sizes for this read pair
Return type:
-
decide_sequenced_strand(reads)
given a set of reads, determines the sequenced strand (if possible) and then returns the majority strand found
Parameters: reads (set of pysam.AlignedSegment) – set of reads
Returns: the sequenced strand
Return type: STRAND
Raises: ValueError – if the input was an empty set or the ratio was not sufficient to decide on a strand
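The majority rule described above can be sketched in a few lines. This version works on a list of per-read strand calls ('+' or '-') instead of pysam reads, and the `min_ratio` threshold of 0.8 is an assumed value chosen for illustration; the ratio MAVIS actually requires is not stated here. The name `decide_majority_strand` is hypothetical.

```python
from collections import Counter


# Hypothetical illustration only; min_ratio is an assumed threshold.
def decide_majority_strand(strands, min_ratio=0.8):
    """Return the most common strand call, or raise ValueError when the
    input is empty or the majority is not strong enough."""
    if not strands:
        raise ValueError('cannot decide a strand from an empty input')
    counts = Counter(strands)
    strand, count = counts.most_common(1)[0]
    if count / len(strands) < min_ratio:
        raise ValueError('strand ratio is not sufficient to decide')
    return strand


print(decide_majority_strand(['+'] * 9 + ['-']))
```

An evenly split input such as `['+', '-']` raises ValueError under this sketch, mirroring the documented behaviour for an insufficient ratio.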
-
load_evidence(log=<function devnull>)
opens the associated bam file, then reads and stores the evidence. Also does some preliminary read-quality filtering
-
classmethod load_multiple(evidence, log=<function devnull>)
loads evidence from the bam file for multiple evidence objects at once
Parameters: evidence (list of Evidence) – list of evidence objects to collect evidence for
Warning: this is not exactly equivalent to multiple calls of load_evidence because it does not keep a running total of reads between bins and cannot adjust dynamically. Effectively, this means load_evidence may collect more evidence
-
max_expected_fragment_size
-
min_expected_fragment_size