cache module
-
class mavis.bam.cache.BamCache(bamfile, stranded=False)
Bases: object
Caches reads by name so that read mates can be retrieved without seeking back through the file when that section has already been read.
Parameters: bamfile (str) – path to the input BAM file
-
add_read(read)
Parameters: read (pysam.AlignedSegment) – the read to add to the cache
-
fetch(input_chrom, start, stop, limit=10000, cache_if=<function BamCache.<lambda>>, filter_if=<function BamCache.<lambda>>, stop_on_cached_read=False)
Parameters:
- input_chrom (str) – chromosome name
- start (int) – start position
- stop (int) – end position
- limit (int) – maximum number of reads to fetch
- cache_if (function) – if it returns True, the read is added to the cache
- filter_if (function) – if it returns True, the read is excluded from the result
- stop_on_cached_read (bool) – stop reading at the first read found that is already in the cache
Note
cache_if and filter_if may be any function that takes a read as input and returns a boolean
Returns: a set of reads which overlap the input region
Return type: set of pysam.AlignedSegment
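The roles of cache_if and filter_if described above can be illustrated with a minimal sketch. It does not use MAVIS or pysam: the reads are SimpleNamespace stand-ins that model only a few pysam.AlignedSegment attributes (query_name, mapping_quality, is_unmapped), the quality threshold of 10 is an arbitrary value chosen for illustration, and apply_predicates is a hypothetical helper that assumes the two callbacks are evaluated independently for each read. How fetch combines them internally is not stated here.

```python
from types import SimpleNamespace

# Stand-ins for pysam.AlignedSegment; only the attributes used below are modeled.
reads = [
    SimpleNamespace(query_name="r1", mapping_quality=60, is_unmapped=False),
    SimpleNamespace(query_name="r2", mapping_quality=3, is_unmapped=False),
    SimpleNamespace(query_name="r3", mapping_quality=0, is_unmapped=True),
]


def cache_if(read):
    """Return True for reads that should be cached (here: every mapped read)."""
    return not read.is_unmapped


def filter_if(read):
    """Return True for reads to exclude (here: an illustrative quality cutoff)."""
    return read.mapping_quality < 10


def apply_predicates(reads, cache_if, filter_if):
    """Hypothetical helper: assumes each callback is applied independently per read."""
    cached, returned = [], []
    for read in reads:
        if cache_if(read):
            cached.append(read.query_name)
        if not filter_if(read):
            returned.append(read.query_name)
    return cached, returned


cached, returned = apply_predicates(reads, cache_if, filter_if)
print(cached)    # ['r1', 'r2']
print(returned)  # ['r1']
```

Predicates written this way would be passed to fetch through its cache_if and filter_if keyword arguments.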
-
fetch_from_bins(input_chrom, start, stop, read_limit=10000, cache=False, sample_bins=3, cache_if=<function BamCache.<lambda>>, min_bin_size=10, filter_if=<function BamCache.<lambda>>)
Wrapper around the fetch method. Returns a list to avoid errors caused by changing the file pointer position from within the loop. Also caches reads if requested and can limit the number of reads returned.
Parameters:
- input_chrom (str) – the chromosome
- start (int) – the start position
- stop (int) – the end position
- read_limit (int) – the maximum number of reads to parse
- cache (bool) – flag to store reads
- sample_bins (int) – number of bins to split the region into
- cache_if (callable) – function applied to a read to determine whether it should be cached
- bin_gap_size (int) – gap between the bins for the fetch area
Returns: set of reads gathered from the region
Return type: set of pysam.AlignedSegment
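The idea of sampling a region through a fixed number of bins can be sketched as follows. This is one plausible partitioning, not the method's actual algorithm, which is not described here: split_into_bins is a hypothetical helper, and it assumes contiguous, inclusive bins whose count is reduced when the region is too short to give each bin at least min_bin_size positions. The parameter names sample_bins and min_bin_size are borrowed from the signature above.

```python
def split_into_bins(start, stop, sample_bins=3, min_bin_size=10):
    """Split the inclusive range [start, stop] into up to sample_bins bins.

    Hypothetical helper: bins are contiguous, and fewer bins are used when
    the region cannot give every bin at least min_bin_size positions.
    """
    length = stop - start + 1
    if length < 1:
        raise ValueError("stop must be >= start")
    # Use as many bins as requested, but never fewer than one.
    n_bins = max(1, min(sample_bins, length // min_bin_size))
    base, extra = divmod(length, n_bins)
    bins = []
    pos = start
    for i in range(n_bins):
        # Spread the remainder over the first `extra` bins.
        size = base + (1 if i < extra else 0)
        bins.append((pos, pos + size - 1))
        pos += size
    return bins


print(split_into_bins(1, 30))  # [(1, 10), (11, 20), (21, 30)]
print(split_into_bins(1, 25))  # [(1, 13), (14, 25)]
print(split_into_bins(5, 9))   # [(5, 9)]
```

Under this assumption, a per-bin read budget could be derived by dividing read_limit by the number of bins, so that no single high-coverage stretch consumes the whole limit.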
-
get_mate(read, primary_only=True, allow_file_access=False)
Parameters:
- read (pysam.AlignedSegment) – the read
- primary_only (bool) – ignore secondary alignments
- allow_file_access (bool) – whether the BAM file may be accessed to look for the mate
Returns: list of mates of the input read
Return type: list of pysam.AlignedSegment
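A name-keyed cache makes mate lookup a dictionary access rather than a file seek. The sketch below illustrates that idea only; NameCache and make_read are hypothetical names, not the BamCache implementation. The stand-in reads model four pysam.AlignedSegment attributes (query_name, is_read1, is_read2, is_secondary), and the sketch assumes a mate is any cached read with the same name from the opposite end of the pair. File access and the behaviour when no mate is cached are left out.

```python
from collections import defaultdict
from types import SimpleNamespace


def make_read(name, is_read1, is_secondary=False):
    """Stand-in for pysam.AlignedSegment with only the attributes used here."""
    return SimpleNamespace(
        query_name=name,
        is_read1=is_read1,
        is_read2=not is_read1,
        is_secondary=is_secondary,
    )


class NameCache:
    """Hypothetical minimal cache keyed on read name."""

    def __init__(self):
        self._by_name = defaultdict(list)

    def add_read(self, read):
        self._by_name[read.query_name].append(read)

    def get_mate(self, read, primary_only=True):
        """Return cached reads sharing the name but from the other end of the pair."""
        mates = []
        for other in self._by_name.get(read.query_name, []):
            if other.is_read1 == read.is_read1:
                continue  # same end of the pair, so not a mate
            if primary_only and other.is_secondary:
                continue  # skip secondary alignments when requested
            mates.append(other)
        return mates


cache = NameCache()
first = make_read("pair1", is_read1=True)
second = make_read("pair1", is_read1=False)
secondary = make_read("pair1", is_read1=False, is_secondary=True)
for r in (first, second, secondary):
    cache.add_read(r)

print(len(cache.get_mate(first)))                      # 1
print(len(cache.get_mate(first, primary_only=False)))  # 2
```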
-
get_read_reference_name(read)
Parameters: read (pysam.AlignedSegment) – the read we want the chromosome name for
Returns: the name of the chromosome
Return type: str
-