base module

class mavis.validate.base.Evidence(break1, break2, bam_cache, reference_genome, read_length, stdev_fragment_size, median_fragment_size, stranded=False, opposing_strands=None, untemplated_seq=None, data={}, classification=None, **kwargs)

Bases: mavis.breakpoint.BreakpointPair

Parameters:
  • breakpoint_pair (BreakpointPair) – the breakpoint pair to collect evidence for
  • bam_cache (BamCache) – the bam cache (and associated file) to collect evidence from
  • reference_genome (dict of Bio.SeqRecord by str) – dict of reference sequence by template/chr name
  • data (dict) – a dictionary of data to associate with the evidence object
  • classification (SVTYPE) – the event type
  • protocol (PROTOCOL) – genome or transcriptome
assemble_contig(log=<function devnull>)

uses the split reads and the partners of the half-mapped reads to create a contig representing the sequence across the breakpoints

if the data is not strand specific, the sequences are sorted alphanumerically and only the first of each pair is kept (paired by sequence)
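
The deduplication described above for non-stranded data can be sketched as follows. This is a minimal illustration only, assuming that "paired by sequence" means a sequence is paired with its reverse complement. The helper names reverse_complement and dedupe_unstranded are hypothetical and are not part of MAVIS.

```python
# Hypothetical helpers for illustration only; not part of the MAVIS API.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def dedupe_unstranded(sequences):
    """Keep one representative per (sequence, reverse complement) pair.

    Assumption: each sequence is paired with its reverse complement, the
    pair is sorted, and only the first member of the pair is kept.
    """
    kept = set()
    for seq in sequences:
        pair = sorted([seq, reverse_complement(seq)])
        kept.add(pair[0])
    return sorted(kept)


# "TTTG" and "CAAA" are reverse complements, so only one of them survives.
print(dedupe_unstranded(["TTTG", "CAAA", "ACGT"]))
```

Without strand information a sequence and its reverse complement describe the same molecule, so keeping a single sorted representative avoids counting it twice.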

collect_compatible_flanking_pair(read, mate, compatible_type)

checks if a given read pair meets the minimum quality criteria to be counted as evidence and stored as support for this event

Parameters:
Returns:

  • True: the pair was collected and stored in the current evidence object
  • False: the pair was not collected

Return type:

bool

Raises:

ValueError – if the input reads are not a valid pair

see theory - types of flanking evidence

collect_flanking_pair(read, mate)

checks if a given read pair meets the minimum quality criteria to be counted as evidence and stored as support for this event

Parameters:
Returns:

  • True: the pair was collected and stored in the current evidence object
  • False: the pair was not collected

Return type:

bool

Raises:

ValueError – if the input reads are not a valid pair

see theory - types of flanking evidence

collect_from_outer_window()

determines if evidence should be collected from the outer window (looking for flanking evidence) or should be limited to the inner window (split/spanning/contig only)

Returns: True or False
Return type: bool
collect_half_mapped(read, mate)
Parameters:
Returns:

  • True: the read was collected and stored in the current evidence object
  • False: the read was not collected

Return type:

bool

Raises:

AssertionError – if the mate is not unmapped

collect_spanning_read(read)

spanning read: a read covering BOTH breakpoints

This is only applicable to small events. There is no need to look for soft-clipped reads here since they will already be collected

Parameters: read (pysam.AlignedSegment) – the putative spanning read
Returns:
  • True: the read was collected and stored in the current evidence object
  • False: the read was not collected
Return type: bool
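
The idea of a read covering both breakpoints can be sketched with plain coordinates. This is an illustration only: the function name covers_both_breakpoints and the use of inclusive (start, end) tuples are assumptions, and the sketch ignores the quality criteria that the real method applies.

```python
# Illustrative sketch only: plain integers stand in for alignments and breakpoints.
def covers_both_breakpoints(read_start, read_end, break1, break2):
    """Return True if [read_start, read_end] contains both breakpoint intervals.

    Each breakpoint is an inclusive (start, end) tuple on the same chromosome
    as the read. All coordinates are inclusive.
    """
    left = min(break1[0], break2[0])
    right = max(break1[1], break2[1])
    return read_start <= left and right <= read_end


# The first read reaches past both breakpoints; the second ends before the second breakpoint.
print(covers_both_breakpoints(100, 250, (120, 125), (200, 205)))
print(covers_both_breakpoints(100, 150, (120, 125), (200, 205)))
```

Because a single read must contain both breakpoints, this kind of evidence can only exist when the distance between them is shorter than the read, which is why it applies to small events.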
collect_split_read(read, first_breakpoint)

adds a split read if it passes the criteria filters and raises a warning if it does not

Parameters:
  • read (pysam.AlignedSegment) – the read to add
  • first_breakpoint (bool) – add to the first breakpoint (or second if false)
Returns:

  • True: the read was collected and stored in the current evidence object
  • False: the read was not collected

Return type:

bool

Raises:

NotSpecifiedError – if the breakpoint orientation is not specified

compatible_window1

Interval – the window/region where it is expected to find reads in a compatible flanking pair (mate must be in compatible_window2)

see theory - calculating the evidence window

compatible_window2

Interval – the window/region where it is expected to find reads in a compatible flanking pair (mate must be in compatible_window1)

see theory - calculating the evidence window

compute_fragment_size(read, mate)
Parameters:
Returns:

interval representing the range of possible fragment sizes for this read pair

Return type:

Interval

copy()
decide_sequenced_strand(reads)

given a set of reads, determines the sequenced strand (if possible) and then returns the majority strand found

Parameters: reads (set of pysam.AlignedSegment) – set of reads
Returns: the sequenced strand
Return type: STRAND
Raises: ValueError – input was an empty set or the ratio was not sufficient to decide on a strand
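
The majority decision and its two failure modes can be sketched as below. This is a simplified illustration that takes strand labels directly rather than reads; the function name majority_strand and the min_ratio threshold are hypothetical and do not reflect the values MAVIS uses.

```python
from collections import Counter


def majority_strand(strands, min_ratio=0.75):
    """Return the most common strand label, or raise ValueError if undecidable.

    `strands` is a list of labels such as "+" and "-", one per read.
    `min_ratio` is a hypothetical threshold chosen for illustration.
    """
    if not strands:
        raise ValueError("cannot decide a strand from an empty input")
    strand, count = Counter(strands).most_common(1)[0]
    if count / len(strands) < min_ratio:
        raise ValueError("the strand ratio is not sufficient to decide on a strand")
    return strand


print(majority_strand(["+", "+", "+", "-"]))
```

Raising an error instead of guessing keeps an ambiguous strand call from silently propagating into later steps.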
flatten()
get_bed_repesentation()
inner_window1

Interval – the window where evidence will be gathered for the first breakpoint

inner_window2

Interval – the window where evidence will be gathered for the second breakpoint

load_evidence(log=<function devnull>)

opens the associated bam file, then reads and stores the evidence. Also does some preliminary read-quality filtering

classmethod load_multiple(evidence_list, log=<function devnull>)

loads evidence from the bam file for multiple evidence objects at once

Parameters: evidence_list (list of Evidence) – list of evidence objects to collect evidence for

Warning

this is not exactly equivalent to multiple calls to load_evidence because it does not keep a running total of reads between bins and so cannot adjust dynamically. In effect, load_evidence may collect more evidence

max_expected_fragment_size
min_expected_fragment_size
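
The constructor takes median_fragment_size and stdev_fragment_size. One common way to turn these into expected bounds is to allow a fixed number of standard deviations on either side of the median. The sketch below assumes that approach; the function name and the stdev_count multiplier are hypothetical, and the calculation MAVIS actually performs may differ.

```python
def expected_fragment_size_bounds(median_fragment_size, stdev_fragment_size, stdev_count=3):
    """Return (min, max) expected fragment sizes as integers.

    Assumption: the bounds are the median plus or minus `stdev_count`
    standard deviations, with the lower bound clamped at zero.
    `stdev_count` is a hypothetical parameter used for illustration.
    """
    spread = stdev_fragment_size * stdev_count
    lower = max(int(round(median_fragment_size - spread)), 0)
    upper = int(round(median_fragment_size + spread))
    return lower, upper


print(expected_fragment_size_bounds(400, 100))
print(expected_fragment_size_bounds(200, 100))
```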
outer_window1

Interval – the window where evidence will be gathered for the first breakpoint

see theory - calculating the evidence window

outer_window2

Interval – the window where evidence will be gathered for the second breakpoint

see theory - calculating the evidence window

putative_event_types()
Returns: list of the possible classifications
Return type: list of SVTYPE
standardize_read(read)
supporting_reads()

convenience method to return all flanking, split and spanning reads associated with an evidence object
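
What such a convenience method amounts to can be sketched as a union over the three evidence types. The function name and the container layouts below (pairs as tuples, split reads kept per breakpoint) are assumptions made for illustration, and strings stand in for read objects.

```python
def gather_supporting_reads(flanking_pairs, split_reads, spanning_reads):
    """Return one set holding every read from the three evidence types.

    Assumed (hypothetical) layout:
      flanking_pairs: iterable of (read, mate) tuples
      split_reads: pair of collections, one per breakpoint
      spanning_reads: collection of reads
    """
    result = set()
    for read, mate in flanking_pairs:
        result.update((read, mate))
    for reads_at_breakpoint in split_reads:
        result.update(reads_at_breakpoint)
    result.update(spanning_reads)
    return result


reads = gather_supporting_reads(
    flanking_pairs=[("r1", "r1_mate")],
    split_reads=({"r2"}, {"r3", "r2"}),
    spanning_reads={"r4"},
)
print(sorted(reads))
```

Using a set means a read that supports the event in more than one way, such as r2 above, is only returned once.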