main module
mavis.cluster.main.main(inputs, output, strand_specific, library, protocol, disease_status, masking, annotations, limit_to_chr=['1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19', '20', '21', '22', 'X', 'Y'], cluster_initial_size_limit=25, cluster_radius=100, uninformative_filter=True, max_proximity=5000, min_clusters_per_file=50, max_files=200, log_args=False, batch_id=None, split_only=False, start_time=1512085692, **kwargs)

Parameters:
- inputs (List of str) – list of input files to read
- output (str) – path to the output directory
- strand_specific (bool) – whether the BAM file was generated using a strand-specific protocol
- library (str) – the library to look for in each of the input files
- protocol (PROTOCOL) – the sequence protocol (genome or transcriptome)
- masking (object) – see load_masking_regions()
- cluster_clique_size (int) – the maximum size of cliques to search for using the exact algorithm
- cluster_radius (int) – distance between breakpoint pairs (bpps) used in deciding whether to join them in a cluster
- uninformative_filter (bool) – if True, clusters that are not within max_proximity of any annotation are filtered out
- max_proximity (int) – the maximum distance between a cluster and its nearest annotation; clusters farther away than this are filtered out when uninformative_filter is True
- annotations (object) – see load_reference_genes()
- min_clusters_per_file (int) – the minimum number of clusters to output to a file
- max_files (int) – the maximum number of files to split clusters into
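The uninformative_filter / max_proximity behaviour described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each cluster and each annotation is reduced to a single genomic interval; `Interval`, `interval_distance` and `filter_uninformative` are hypothetical names, not part of MAVIS. MAVIS clusters are breakpoint pairs and its annotations come from load_reference_genes(), so the real check may differ.

```python
from typing import Iterable, List, NamedTuple, Tuple


class Interval(NamedTuple):
    """Hypothetical stand-in for a cluster or an annotation region (not a MAVIS class)."""
    chrom: str
    start: int
    end: int


def interval_distance(a: Interval, b: Interval) -> float:
    """Gap in bases between two intervals: 0 if they overlap, inf if on different chromosomes."""
    if a.chrom != b.chrom:
        return float("inf")
    if a.end < b.start:
        return b.start - a.end
    if b.end < a.start:
        return a.start - b.end
    return 0


def filter_uninformative(
    clusters: Iterable[Interval], annotations: List[Interval], max_proximity: int = 5000
) -> Tuple[List[Interval], List[Interval]]:
    """Return (kept, filtered): a cluster is kept if any annotation lies within max_proximity."""
    kept, filtered = [], []
    for cluster in clusters:
        # Distance to the closest annotation; inf when there are no annotations at all.
        nearest = min(
            (interval_distance(cluster, ann) for ann in annotations), default=float("inf")
        )
        if nearest <= max_proximity:
            kept.append(cluster)
        else:
            filtered.append(cluster)
    return kept, filtered


# Example: one annotated gene on chromosome 1 and three candidate clusters.
genes = [Interval("1", 10_000, 20_000)]
clusters = [
    Interval("1", 22_000, 22_100),  # 2,000 bp from the gene: kept
    Interval("1", 40_000, 40_100),  # 20,000 bp away: filtered
    Interval("2", 15_000, 15_100),  # different chromosome: filtered
]
kept, filtered = filter_uninformative(clusters, genes, max_proximity=5000)
```

The linear scan over annotations keeps the sketch short; with a full reference annotation set, an interval index per chromosome would likely be a better fit.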
mavis.cluster.main.split_clusters(clusters, outputdir, batch_id, min_clusters_per_file=0, max_files=1, write_bed_summary=True)

For a set of clusters, creates a BED file representation of all clusters. Also splits the clusters evenly into multiple files based on the user parameters (min_clusters_per_file, max_files).
Returns: list of output file names (not including the BED file)
Return type: list
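The behaviour described for split_clusters (a BED summary plus an even split governed by min_clusters_per_file and max_files) can be sketched as follows. This is not the MAVIS implementation: the cluster tuple layout, the file-count rule in the hypothetical `choose_file_count`, the hypothetical `split_clusters_sketch` function and the output file names are all assumptions made for illustration.

```python
import os
import tempfile
from typing import List, Sequence, Tuple

# Hypothetical cluster record: (chrom, start, end, name) in 0-based, half-open BED coordinates.
Cluster = Tuple[str, int, int, str]


def choose_file_count(n_clusters: int, min_clusters_per_file: int, max_files: int) -> int:
    """Assumed rule: as many files as possible, each holding at least
    min_clusters_per_file clusters, but never more than max_files."""
    if n_clusters == 0:
        return 0
    if min_clusters_per_file <= 0:
        return min(max_files, n_clusters)
    return max(1, min(max_files, n_clusters // min_clusters_per_file))


def split_evenly(items: Sequence, n_parts: int) -> List[list]:
    """Split items into n_parts contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(items), n_parts)
    chunks, start = [], 0
    for i in range(n_parts):
        size = base + (1 if i < extra else 0)
        chunks.append(list(items[start:start + size]))
        start += size
    return chunks


def split_clusters_sketch(
    clusters: Sequence[Cluster],
    outputdir: str,
    batch_id: str,
    min_clusters_per_file: int = 0,
    max_files: int = 1,
    write_bed_summary: bool = True,
) -> List[str]:
    """Write an optional BED summary plus evenly split cluster files; return the
    split file names (the BED file is not included in the returned list)."""
    if write_bed_summary:
        bed_path = os.path.join(outputdir, f"{batch_id}.bed")
        with open(bed_path, "w") as bed:
            for chrom, start, end, name in clusters:
                bed.write(f"{chrom}\t{start}\t{end}\t{name}\n")

    n_files = choose_file_count(len(clusters), min_clusters_per_file, max_files)
    if n_files == 0:
        return []

    output_files = []
    for i, chunk in enumerate(split_evenly(clusters, n_files), start=1):
        path = os.path.join(outputdir, f"{batch_id}-{i}.tab")  # hypothetical naming scheme
        with open(path, "w") as fh:
            for chrom, start, end, name in chunk:
                fh.write(f"{name}\t{chrom}\t{start}\t{end}\n")
        output_files.append(path)
    return output_files


# Example: 10 clusters, at least 3 per file, at most 5 files -> 10 // 3 = 3 files (sizes 4, 3, 3).
clusters = [("1", i * 1000, i * 1000 + 100, f"cluster-{i}") for i in range(10)]
with tempfile.TemporaryDirectory() as tmp:
    files = split_clusters_sketch(clusters, tmp, "batch1", min_clusters_per_file=3, max_files=5)
    print(len(files))  # 3
```

Contiguous chunks keep neighbouring clusters together; a round-robin assignment would be an equally plausible way to balance the files.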