base module¶

class mavis.validate.base.Evidence(break1, break2, bam_cache, REFERENCE_GENOME, read_length, stdev_fragment_size, median_fragment_size, stranded=False, opposing_strands=None, untemplated_seq=None, data={}, classification=None, **kwargs)[source]¶

Bases: mavis.breakpoint.BreakpointPair

Parameters:

breakpoint_pair (BreakpointPair) – the breakpoint pair to collect evidence for
bam_cache (BamCache) – the bam cache (and associated file) to collect evidence from
REFERENCE_GENOME (dict of Bio.SeqRecord by str) – dict of reference sequence by template/chr name
data (dict) – a dictionary of data to associate with the evidence object
classification (SVTYPE) – the event type
protocol (PROTOCOL) – genome or transcriptome

assemble_contig(log=<function devnull>)[source]¶

uses the split reads and the partners of the half mapped reads to create a contig representing the sequence across the breakpoints

if it is not strand specific then sequences are sorted alphanumerically and only the first of a pair is kept (paired by sequence)
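The non-strand-specific rule above can be illustrated with a minimal sketch. It assumes that "paired by sequence" means each sequence is paired with its reverse complement; that reading, and the helper names `reverse_complement` and `dedupe_unstranded`, are illustrative assumptions and not part of the mavis API.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T/N only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def dedupe_unstranded(sequences):
    """Keep one representative per (sequence, reverse complement) pair.

    Hypothetical helper: each sequence is paired with its reverse
    complement (an assumption), the pair is sorted alphanumerically,
    and only the first member of the pair is kept.
    """
    kept = set()
    for seq in sequences:
        seq = seq.upper()
        first, _ = sorted([seq, reverse_complement(seq)])
        kept.add(first)
    return sorted(kept)


# "AACG" and "CGTT" are reverse complements, so they collapse to one entry
print(dedupe_unstranded(["AACG", "CGTT", "GGGA"]))  # ['AACG', 'GGGA']
```

Collapsing each pair to a single canonical member means the same fragment read from either strand is only counted once.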

collect_compatible_flanking_pair(read, mate, compatible_type)[source]¶

checks if a given read meets the minimum quality criteria to be counted as evidence and stored as support for this event

Parameters:

read (pysam.AlignedSegment) – the read to add
mate (pysam.AlignedSegment) – the mate
compatible_type (SVTYPE) – the type we are collecting for

Returns:	True: the pair was collected and stored in the current evidence object; False: the pair was not collected
Return type:	bool
Raises:	`ValueError` – if the input reads are not a valid pair

see theory - types of flanking evidence

collect_flanking_pair(read, mate)[source]¶

checks if a given read meets the minimum quality criteria to be counted as evidence and stored as support for this event

Parameters:

read (pysam.AlignedSegment) – the read to add
mate (pysam.AlignedSegment) – the mate

Returns:	True: the pair was collected and stored in the current evidence object; False: the pair was not collected
Return type:	bool
Raises:	`ValueError` – if the input reads are not a valid pair

see theory - types of flanking evidence

collect_from_outer_window()[source]¶

determines if evidence should be collected from the outer window (looking for flanking evidence) or should be limited to the inner window (split/spanning/contig only)

Returns:	True or False
Return type:	bool

collect_half_mapped(read, mate)[source]¶

Parameters:

read (pysam.AlignedSegment) – the read to add
mate (pysam.AlignedSegment) – the unmapped mate

Returns:	True: the read was collected and stored in the current evidence object; False: the read was not collected
Return type:	bool
Raises:	`AssertionError` – if the mate is not unmapped

collect_spanning_read(read)[source]¶

spanning read: a read covering BOTH breakpoints

This is only applicable to small events. There is no need to look for soft-clipped reads here since they will already have been collected

Parameters:	read (pysam.AlignedSegment) – the putative spanning read
Returns:	True: the read was collected and stored in the current evidence object; False: the read was not collected
Return type:	bool
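As a rough illustration of what "covering BOTH breakpoints" can mean, the sketch below tests whether a read's aligned reference span contains both breakpoint intervals. The function name `covers_both_breakpoints`, the plain `(start, end)` tuples, and the inclusive coordinates are assumptions made for this example; the actual method applies its own criteria to a `pysam.AlignedSegment`.

```python
def covers_both_breakpoints(read_start, read_end, break1, break2):
    """Return True if [read_start, read_end] contains both breakpoint intervals.

    Hypothetical helper: break1 and break2 are (start, end) tuples with
    inclusive coordinates on the same reference sequence (an assumption).
    """
    leftmost = min(break1[0], break2[0])
    rightmost = max(break1[1], break2[1])
    return read_start <= leftmost and read_end >= rightmost


# a read aligned to 100-250 spans a small event with breakpoints at 120 and 200-205
print(covers_both_breakpoints(100, 250, (120, 120), (200, 205)))  # True
# a read aligned to 100-150 reaches only the first breakpoint
print(covers_both_breakpoints(100, 150, (120, 120), (200, 205)))  # False
```

A check of this kind can only succeed when the event is shorter than the read, which is consistent with spanning reads being applicable to small events only.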

collect_split_read(read, first_breakpoint)[source]¶

adds a split read if it passes the criteria filters and raises a warning if it does not

Parameters:

read (pysam.AlignedSegment) – the read to add
first_breakpoint (bool) – add to the first breakpoint (or second if false)

Returns:	True: the read was collected and stored in the current evidence object; False: the read was not collected
Return type:	bool
Raises:	`NotSpecifiedError` – if the breakpoint orientation is not specified

compatible_window1¶

Interval – the window/region where it is expected to find reads in a compatible flanking pair (mate must be in compatible_window2)

see theory - calculating the evidence window

compatible_window2¶

Interval – the window/region where it is expected to find reads in a compatible flanking pair (mate must be in compatible_window1)

see theory - calculating the evidence window

compute_fragment_size(read, mate)[source]¶

Parameters:

read (pysam.AlignedSegment) – the read
mate (pysam.AlignedSegment) – the mate of the read
Returns:	interval representing the range of possible fragment sizes for this read pair
Return type:	Interval

copy()[source]¶

decide_sequenced_strand(reads)[source]¶

given a set of reads, determines the sequenced strand (if possible) and then returns the majority strand found

Parameters:	reads (set of `pysam.AlignedSegment`) – set of reads
Returns:	the sequenced strand
Return type:	STRAND
Raises:	`ValueError` – input was an empty set or the ratio was not sufficient to decide on a strand
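The majority-vote behaviour described above can be sketched as follows. This is a minimal sketch that works on strand labels instead of `pysam.AlignedSegment` objects; the function name `majority_strand` and the `min_ratio` threshold of 0.75 are hypothetical and are not taken from the mavis source.

```python
from collections import Counter


def majority_strand(strands, min_ratio=0.75):
    """Return the most common strand label, or raise ValueError.

    Hypothetical helper: `strands` is a list of '+' / '-' labels, one per
    read, and `min_ratio` is an assumed cutoff for a decisive majority.
    """
    if not strands:
        raise ValueError("cannot decide the strand of an empty input")
    strand, count = Counter(strands).most_common(1)[0]
    if count / len(strands) < min_ratio:
        raise ValueError("ratio of strands is not sufficient to decide")
    return strand


print(majority_strand(["+", "+", "+", "-"]))  # +
```

Raising in both failure cases mirrors the documented `ValueError`, so a caller can tell an undecidable strand apart from a real result.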

flatten()[source]¶

get_bed_repesentation()[source]¶

inner_window1¶

Interval – the window where evidence will be gathered for the first breakpoint

inner_window2¶

Interval – the window where evidence will be gathered for the second breakpoint

load_evidence(log=<function devnull>)[source]¶

opens the associated bam file, then reads and stores the evidence; also does some preliminary read-quality filtering

classmethod load_multiple(evidence, log=<function devnull>)[source]¶

loads evidence from the bam file for multiple evidence objects at once

Parameters:	evidence (list of `Evidence`) – list of evidence objects to collect evidence for

Warning

this is not exactly equivalent to multiple calls of load_evidence because it does not keep a running total of reads between bins and cannot adjust dynamically. In practice this means load_evidence may collect more evidence

max_expected_fragment_size¶

min_expected_fragment_size¶

outer_window1¶

Interval – the window where evidence will be gathered for the first breakpoint

see theory - calculating the evidence window

outer_window2¶

Interval – the window where evidence will be gathered for the second breakpoint

see theory - calculating the evidence window

putative_event_types()[source]¶

Returns:	list of the possible classifications
Return type:	list of `SVTYPE`

standardize_read(read)[source]¶

supporting_reads()[source]¶

convenience method to return all flanking, split and spanning reads associated with an evidence object