cache module

class mavis.bam.cache.BamCache(bamfile, stranded=False)

Bases: object

Caches reads by name so that read mates can be retrieved without jumping around the file again when that section has already been read.

Parameters: bamfile (str) – path to the input BAM file
add_read(read)
Parameters: read (pysam.AlignedSegment) – the read to add to the cache
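A minimal sketch of a name-keyed read cache, assuming reads are grouped by `query_name` in a dict of sets. `Read` and `NameCache` are hypothetical stand-ins, not MAVIS or pysam classes, and storing reads in sets assumes the read objects are hashable (the frozen dataclass below guarantees this).

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass(frozen=True)
class Read:
    """Hypothetical stand-in for pysam.AlignedSegment with only the fields used here."""
    query_name: str
    reference_id: int
    reference_start: int
    is_read1: bool


class NameCache:
    """Hypothetical name-keyed cache: all alignments sharing a query name live together."""

    def __init__(self) -> None:
        self.cache: Dict[str, Set[Read]] = {}

    def add_read(self, read: Read) -> None:
        # setdefault creates the bucket on first use; the set drops exact duplicates
        self.cache.setdefault(read.query_name, set()).add(read)

    def reads_named(self, name: str) -> Set[Read]:
        # an empty set means the name has not been seen yet
        return self.cache.get(name, set())


cache = NameCache()
r1 = Read("pair1", 0, 100, is_read1=True)
r2 = Read("pair1", 0, 350, is_read1=False)
cache.add_read(r1)
cache.add_read(r1)  # adding the same read twice does not create a second entry
cache.add_read(r2)
print(len(cache.reads_named("pair1")))  # 2
```

Grouping by name puts both ends of a pair (and any secondary or supplementary alignments) in the same bucket, which turns a later mate lookup into a dictionary access instead of a file seek.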
chr(read)
Parameters: read (pysam.AlignedSegment) – the read we want the chromosome name for
Returns: the name of the chromosome
Return type: str
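In a BAM file each alignment stores its reference as an integer index into the header's list of reference names (-1 when there is none), so recovering the chromosome name amounts to an index lookup. A sketch assuming the names are available as a tuple in header order, as pysam's `AlignmentFile.references` provides; `chrom_name` and the `Read` stand-in are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Read:
    """Hypothetical stand-in for pysam.AlignedSegment."""
    query_name: str
    reference_id: int  # -1 when the read has no reference (unmapped)


def chrom_name(references: Sequence[str], read: Read) -> str:
    """Return the reference name for a read, using header-ordered reference names."""
    if read.reference_id < 0:
        raise ValueError(f"read {read.query_name} is not assigned to a reference")
    return references[read.reference_id]


references = ("chr1", "chr2", "chrX")
print(chrom_name(references, Read("r1", 2)))  # chrX
```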
close()

Close the BAM file handle.
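Because closing releases the underlying file handle, wrapping the cache in the standard library's `contextlib.closing` guarantees `close()` runs even if an exception is raised while reading. `contextlib.closing` only needs the object to have a `close()` method, so the pattern would apply to `BamCache` as well; the class below is a hypothetical stand-in so the sketch runs without a BAM file.

```python
import contextlib


class FakeBamCache:
    """Hypothetical stand-in for BamCache that only records whether close() ran."""

    def __init__(self, bamfile: str) -> None:
        self.bamfile = bamfile
        self.closed = False

    def close(self) -> None:
        self.closed = True


handle = FakeBamCache("example.bam")  # hypothetical path; nothing is opened
try:
    with contextlib.closing(handle):
        raise RuntimeError("simulated failure while reading")
except RuntimeError:
    pass

print(handle.closed)  # True: close() ran despite the exception
```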

fetch(chrom, start, stop, read_limit=10000, cache=False, sample_bins=3, cache_if=<function BamCache.<lambda>>, bin_gap_size=0, filter_if=<function BamCache.<lambda>>)

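Wrapper around the fetch method. Reads are collected into a container before being returned, rather than yielded one at a time, to avoid errors caused by the file pointer position changing from within the loop. Also caches reads if requested and can cap the number of reads parsed (see read_limit).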

Parameters:
  • chrom (str) – the chromosome
  • start (int) – the start position
  • stop (int) – the end position
  • read_limit (int) – the maximum number of reads to parse
  • cache (bool) – flag to store reads
  • sample_bins (int) – number of bins to split the region into
  • cache_if (callable) – function to check against a read to determine if it should be cached
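  • bin_gap_size (int) – gap between the bins for the fetch area
  • filter_if (callable) – function to check against a read to determine if it should be filtered (excluded) from the returned reads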
Returns: set of reads gathered from the region
Return type: set of pysam.AlignedSegment
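The parameter list does not spell out how `sample_bins`, `bin_gap_size`, and `read_limit` interact. A sketch under one plausible reading, not the MAVIS implementation: the region is split into `sample_bins` bins separated by `bin_gap_size` bases, the read limit is shared evenly across bins so reads are sampled from the whole region rather than only its start, and each bin's iterator is materialized before its reads are processed. `split_region`, `fetch_binned`, the `fetch_fn` callable, and the `store` dict (standing in for the `cache` flag: a dict means caching on, `None` means off) are all hypothetical.

```python
import itertools
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional, Set, Tuple


@dataclass(frozen=True)
class Read:
    """Hypothetical stand-in for pysam.AlignedSegment."""
    query_name: str
    reference_start: int


def split_region(start: int, stop: int, sample_bins: int, bin_gap_size: int = 0) -> List[Tuple[int, int]]:
    """Split [start, stop) into sample_bins bins separated by bin_gap_size bases."""
    if sample_bins < 1:
        raise ValueError("sample_bins must be at least 1")
    usable = (stop - start) - bin_gap_size * (sample_bins - 1)
    if usable < sample_bins:
        return [(start, stop)]  # too small to split: fall back to one bin
    bin_size = usable // sample_bins
    bins = []
    pos = start
    for i in range(sample_bins):
        end = stop if i == sample_bins - 1 else pos + bin_size  # last bin absorbs the remainder
        bins.append((pos, end))
        pos = end + bin_gap_size
    return bins


def fetch_binned(
    fetch_fn: Callable[[str, int, int], Iterable[Read]],
    chrom: str,
    start: int,
    stop: int,
    read_limit: int = 10000,
    sample_bins: int = 3,
    bin_gap_size: int = 0,
    store: Optional[Dict[str, Set[Read]]] = None,
    cache_if: Callable[[Read], bool] = lambda read: True,
    filter_if: Callable[[Read], bool] = lambda read: False,
) -> Set[Read]:
    bins = split_region(start, stop, sample_bins, bin_gap_size)
    per_bin_limit = max(1, read_limit // len(bins))
    result: Set[Read] = set()
    for bin_start, bin_stop in bins:
        # materialize up to per_bin_limit reads before doing anything else with the file
        reads = list(itertools.islice(fetch_fn(chrom, bin_start, bin_stop), per_bin_limit))
        for read in reads:
            if store is not None and cache_if(read):
                store.setdefault(read.query_name, set()).add(read)
            if not filter_if(read):
                result.add(read)
    return result


reads_on_chr1 = [Read(f"r{i}", i) for i in range(100)]


def fake_fetch(chrom: str, bin_start: int, bin_stop: int) -> Iterable[Read]:
    # hypothetical fetch: yields reads starting inside [bin_start, bin_stop)
    return (r for r in reads_on_chr1 if bin_start <= r.reference_start < bin_stop)


store: Dict[str, Set[Read]] = {}
picked = fetch_binned(fake_fetch, "chr1", 0, 100, read_limit=6, store=store)
print(sorted(r.reference_start for r in picked))  # [0, 1, 33, 34, 66, 67]
```

Sharing the limit across bins keeps a dense pile-up at the start of the region from consuming the entire read budget.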

get_mate(read, primary_only=True, allow_file_access=True)
Parameters:
  • read (pysam.AlignedSegment) – the read
  • primary_only (bool) – ignore secondary alignments
  • allow_file_access (bool) – whether the BAM file may be accessed to try to find the mate
Returns: list of mates of the input read
Return type: list of pysam.AlignedSegment
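A sketch of a cache-first mate lookup, assuming a mate is identified by sharing the query name, being the other end of the pair, and sitting at the position the read records for its mate (`next_reference_id` and `next_reference_start`, which pysam exposes on `AlignedSegment`). `find_mates`, the `Read` stand-in, and the `fetch_mate` callback used when file access is allowed are hypothetical, and returning an empty list when nothing is found is also an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set


@dataclass(frozen=True)
class Read:
    """Hypothetical stand-in for pysam.AlignedSegment (only the fields used here)."""
    query_name: str
    reference_id: int
    reference_start: int
    next_reference_id: int
    next_reference_start: int
    is_read1: bool
    is_secondary: bool = False
    is_supplementary: bool = False


def find_mates(
    read: Read,
    cache: Dict[str, Set[Read]],
    primary_only: bool = True,
    allow_file_access: bool = True,
    fetch_mate: Optional[Callable[[Read], Read]] = None,
) -> List[Read]:
    mates: List[Read] = []
    for candidate in cache.get(read.query_name, set()):
        if candidate.is_read1 == read.is_read1:
            continue  # same end of the pair, so not the mate
        if (candidate.reference_id, candidate.reference_start) != (
            read.next_reference_id,
            read.next_reference_start,
        ):
            continue  # not at the position this read records for its mate
        if primary_only and (candidate.is_secondary or candidate.is_supplementary):
            continue
        mates.append(candidate)
    if not mates and allow_file_access and fetch_mate is not None:
        # cache miss: go to the file (with pysam, AlignmentFile.mate(read) is one option)
        mate = fetch_mate(read)
        cache.setdefault(mate.query_name, set()).add(mate)
        mates.append(mate)
    return mates


r1 = Read("pair1", 0, 100, 0, 300, is_read1=True)
r2 = Read("pair1", 0, 300, 0, 100, is_read1=False)
r2_secondary = Read("pair1", 0, 300, 0, 100, is_read1=False, is_secondary=True)  # synthetic
cache = {"pair1": {r1, r2, r2_secondary}}
print(find_mates(r1, cache) == [r2])  # True
```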

reference_id(chrom)
Parameters: chrom (str) – the chromosome/reference name
Returns: the reference id corresponding to the input chromosome name
Return type: int
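The lookup from a chromosome name to its integer reference id can be sketched as a dictionary built once from the header-ordered reference names. pysam's `AlignmentFile.get_tid()` performs this lookup directly and returns -1 for an unknown name; the `ReferenceIndex` class and its choice to raise `KeyError` instead are hypothetical.

```python
from typing import Dict, Sequence


class ReferenceIndex:
    """Hypothetical name -> id lookup built from header-ordered reference names."""

    def __init__(self, references: Sequence[str]) -> None:
        # the position in the header is the reference id stored in each alignment
        self._ids: Dict[str, int] = {name: tid for tid, name in enumerate(references)}

    def reference_id(self, chrom: str) -> int:
        try:
            return self._ids[chrom]
        except KeyError:
            raise KeyError(f"reference {chrom!r} is not in the BAM header") from None


index = ReferenceIndex(("chr1", "chr2", "chrX"))
print(index.reference_id("chrX"))  # 2
```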