biofx.variants package

Submodules

biofx.variants.Genes module

class biofx.variants.Genes.Gene[source]

Bases: object

class biofx.variants.Genes.Inheritance[source]

Bases: enum.Enum

An enumeration.

COMPLEX = 'COMPLEX'
DOMINANT = 'DOMINANT'
RECESSIVE = 'RECESSIVE'
UNKNOWN = 'UNKNOWN'
class biofx.variants.Genes.Zygosity[source]

Bases: enum.Enum

An enumeration.

HEMIZYGOUS = 'HEMIZYGOUS'
HETEROZYGOUS = 'HETEROZYGOUS'
HOMOZYGOUS = 'HOMOZYGOUS'
NOT_FOUND = 'NOT FOUND'
UNKNOWN = 'UNKNOWN'
biofx.variants.Genes.is_complex(symbol)[source]
biofx.variants.Genes.is_recessive(symbol)[source]

biofx.variants.attributes module

class biofx.variants.attributes.Zygosity[source]

Bases: enum.Enum

An enumeration.

HEMIZYGOUS = 'HEMIZYGOUS'
HETEROZYGOUS = 'HETEROZYGOUS'
HOMOZYGOUS = 'HOMOZYGOUS'
NOT_FOUND = 'NOT FOUND'
UNKNOWN = 'UNKNOWN'
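Note that the NOT_FOUND member carries the value 'NOT FOUND' (with a space), so lookup by name and lookup by value use different strings. The sketch below mirrors the documented members in a local enum to show the difference; `parse_zygosity` is a hypothetical helper and is not part of biofx.

```python
from enum import Enum


class Zygosity(Enum):
    """Local mirror of the documented members, for illustration only."""
    HEMIZYGOUS = "HEMIZYGOUS"
    HETEROZYGOUS = "HETEROZYGOUS"
    HOMOZYGOUS = "HOMOZYGOUS"
    NOT_FOUND = "NOT FOUND"
    UNKNOWN = "UNKNOWN"


def parse_zygosity(text):
    """Hypothetical helper: map a raw string to a member by value.

    Falls back to UNKNOWN when the string matches no member value.
    """
    try:
        return Zygosity(text.strip().upper())
    except ValueError:
        return Zygosity.UNKNOWN


by_name = Zygosity["NOT_FOUND"]          # lookup by member name
by_value = parse_zygosity("not found")   # lookup by member value
```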

biofx.variants.features module

Created: July, 2014

@author: Carolyn Ch’ng

Todo

Think about whether logging should be part of the module.

  • SNV, Indel, and CNV inherit from Variant
class biofx.variants.features.CNV(chromosome, start, end)[source]

Bases: biofx.variants.features.Variant

Generic CNV class with basic chromosome, start, and end attributes, as well as annotation attributes. Inherits from Variant.

GAIN = 'copy gain'
LOSS = 'copy loss'
NEUTRAL = 'copy neutral'
get_type(ploidy=2)[source]

Infer copy number type

Parameters:ploidy (int) – ploidy or ploidy model for assessment. Default is 2 (human).
Raise:
ValueError:
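One plausible reading of get_type is a comparison of an observed copy number against the ploidy. The sketch below works under that assumption. The `copy_number` argument, the function name, and the conditions that raise ValueError are all hypothetical, since the documentation does not say where the copy number comes from or what triggers the error.

```python
# Class constants as documented for CNV.
GAIN = "copy gain"
LOSS = "copy loss"
NEUTRAL = "copy neutral"


def infer_cnv_type(copy_number, ploidy=2):
    """Hypothetical stand-in for CNV.get_type().

    Assumes the type is decided by comparing copy_number to ploidy.
    """
    if ploidy <= 0:
        raise ValueError("ploidy must be positive")
    if copy_number < 0:
        raise ValueError("copy number cannot be negative")
    if copy_number > ploidy:
        return GAIN
    if copy_number < ploidy:
        return LOSS
    return NEUTRAL
```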
class biofx.variants.features.Indel(chromosome, start, end=None)[source]

Bases: biofx.variants.features.Variant

Generic Indel class with basic chromosome, start, end attributes, as well as optional ref/alt alleles, annotation attributes. Inherits methods from Variant.

get_allele_frequencies(mpileup_record)[source]
Parameters:

mpileup_record (string) – output from Variants.get_variant_mpileup() or single mpileup record

Returns:

dictionary containing:

  • ref_count (int): reference allele count. 0 if empty record
  • alt_count (int): alternate allele count. 0 if empty record

Return type:

dict

Raises:
  • ValueError – more than one record provided
  • ValueError – if reference and/or alternate allele not set

pileup format specifications

Note:

Reference counts for indels are currently obtained by subtracting alternative allele counts from total coverage. Usage of total coverage (self.coverage) is preferred over reference counts.
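The subtraction described in the note can be sketched as follows. The function name and its arguments are hypothetical; the real method derives its inputs from an mpileup record. The sketch also shows why total coverage is the safer figure: any read that supports neither allele is still counted as reference.

```python
def indel_allele_counts(coverage, alt_count):
    """Hypothetical helper: derive the reference count by subtraction.

    Reads that support some third allele, or that are ambiguous at the
    site, end up in ref_count, so it can overstate reference support.
    """
    if alt_count < 0 or alt_count > coverage:
        raise ValueError("alt_count must be between 0 and coverage")
    return {"ref_count": coverage - alt_count, "alt_count": alt_count}
```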

get_length()[source]

Get indel length.

Todo

think about definitions of start and end

Raises:ValueError – if reference and/or alternate allele not set
set_alt(alt)[source]

Set the alternative allele. Whitespace is stripped.

Parameters:alt (string) – alt string
set_ref(ref)[source]
set_state(state)[source]
Parameters:state (string) – indel type, usually from genome validator
Raises:ValueError – unrecognized state
vt_normalize()[source]

Implementation of algorithm in http://genome.sph.umich.edu/wiki/Variant_Normalization.

Modified for samtools/bcftools output, which is left-aligned but not parsimonious.

Raises:
  • AssertionError – normalization cannot be done if the ref and alt bases are not set
  • AssertionError – the algorithm failed. This state should be unreachable and has never been observed.

Reference: http://genome.sph.umich.edu/wiki/Variant_Normalization

See also: http://bioinformatics.oxfordjournals.org/content/31/13/2202.full.
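For orientation, the normalization algorithm described at the reference above can be sketched for a single ref/alt pair as below. This is a minimal, standalone sketch of the published procedure, not the biofx implementation: the function name, the 1-based `pos` convention, and passing the reference as an in-memory string are assumptions, and the samtools/bcftools-specific modification mentioned above is not reproduced.

```python
def normalize_variant(pos, ref, alt, reference):
    """Left-align and trim one ref/alt pair.

    pos is the 1-based position of the first ref base in `reference`
    (a plain string standing in for the chromosome sequence).
    """
    if ref == alt:
        raise ValueError("ref and alt alleles are identical")
    changed = True
    while changed:
        changed = False
        # Step 1: drop a shared rightmost base.
        if ref and alt and ref[-1] == alt[-1]:
            ref, alt = ref[:-1], alt[:-1]
            changed = True
        # Step 2: if an allele became empty, extend both one base left.
        if not ref or not alt:
            if pos <= 1:
                raise ValueError("cannot extend past the sequence start")
            pos -= 1
            base = reference[pos - 1]
            ref, alt = base + ref, base + alt
            changed = True
    # Step 3: drop shared leftmost bases while both alleles keep >= 1 base.
    while len(ref) >= 2 and len(alt) >= 2 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


# A CA deletion inside a CA repeat, written at a non-normalized position.
sequence = "GGGCACACACAGGG"
shifted = normalize_variant(8, "CAC", "C", sequence)
padded = normalize_variant(3, "GCACA", "GCA", sequence)
```

Both calls describe the same deletion and reduce to the same representation, which is the purpose of normalizing before variants are compared or used as keys.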

class biofx.variants.features.SNV(chromosome, start, end=None)[source]

Bases: biofx.variants.features.Variant

Generic SNV class with basic chromosome, start, and end attributes, as well as optional ref/alt alleles and annotation attributes. Inherits from Variant.

get_allele_frequencies(mpileup_record)[source]

Get allele frequencies for each base at the position of this SNV. Currently supports samtools 0.1.18.

Parameters:mpileup_record (string) – output from Variants.get_variant_mpileup() or single mpileup record
Returns:frequency of each base
Return type:base_frequencies (dict)
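As an illustration of what counting bases from a pileup record involves, the sketch below tallies A, C, G, and T from the read-bases column of a single tab-delimited record. It is not the biofx implementation: the function name is hypothetical, the six-column layout (chromosome, position, reference base, depth, read bases, qualities) is assumed, and indel and read-boundary markers are simply skipped.

```python
import re


def count_bases(mpileup_record):
    """Hypothetical helper: count bases in one pileup record."""
    fields = mpileup_record.rstrip("\n").split("\t")
    ref_base = fields[2].upper()
    bases = fields[4]
    counts = {base: 0 for base in "ACGT"}
    i = 0
    while i < len(bases):
        ch = bases[i]
        if ch == "^":
            # Read start marker; the next character is a mapping quality.
            i += 2
            continue
        if ch in "+-":
            # Indel: sign, length digits, then that many inserted/deleted bases.
            length = re.match(r"\d+", bases[i + 1:]).group()
            i += 1 + len(length) + int(length)
            continue
        if ch in ".,":
            # Match to the reference on the forward or reverse strand.
            counts[ref_base] = counts.get(ref_base, 0) + 1
        elif ch.upper() in counts:
            counts[ch.upper()] += 1
        # Anything else ('$', '*', 'N', ...) is not counted.
        i += 1
    return counts


record = "chr1\t100\tA\t5\t.,^].G$+2ACt-1c\tIIIII"
base_counts = count_bases(record)
```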
get_alt_count(**kwargs)[source]
Parameters:**kwargs – used kwargs: mpileup_params
Returns:alt base count
Return type:int
get_ref_count(**kwargs)[source]

Get reference base count.

Parameters:**kwargs – used kwargs: mpileup_params
Returns:ref base count
Return type:int
is_snp()[source]

True if this is an SNV instance.

Returns:True on an SNV instance.
Return type:bool
set_alt(alt)[source]
Parameters:alt (string) – alt base
Raises:ValueError – alt length must be 1 and one of ATCG.
set_ref(ref)[source]
Parameters:ref (string) – reference base
Raises:ValueError – reference length must be 1 and one of ATCG.
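The check documented for set_ref and set_alt (a single character, one of ATCG) can be sketched as below. `validate_base` is a hypothetical helper; whether the real setters accept lowercase input or strip whitespace is not stated, so this sketch rejects anything other than a single uppercase base.

```python
VALID_BASES = frozenset("ATCG")


def validate_base(base, label="base"):
    """Hypothetical helper: return base unchanged if it is a single A/T/C/G."""
    if len(base) != 1 or base not in VALID_BASES:
        raise ValueError(
            "%s must be a single character, one of ATCG: %r" % (label, base)
        )
    return base
```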
class biofx.variants.features.Variant(chromosome, start, end=None)[source]

Bases: object

A base class for describing variants. Currently supports small mutations.

chromosome

string

start

int

end

int

ref

string – reference base

alt

string – alternative base

ref_count

int – reference base count

alt_count

int – alternative base count

coverage

int – total coverage

state

string – insertion, deletion, duplication etc.

misc

list

effects

string

snpeff_effects
identifier

string – external IDs associated with variant. Usually column 3 in vcf

provenance

string – if applicable, tool used to call variant

cosmic

string – cosmic ID

dbsnp

string – dbSNP ID

bamfile

string – path to bamfile where this variant came from

source

string – source ID where this variant came from

is_special

bool – True if special characters (not ATCG) in ref/alt base

homopolymer

bool – True if homopolymer

clnsig_values

string – bar delimited integers

cgl_pathogenicity

string – PATHOGENIC, VUS, BENIGN

info

? – vcf info

add_comment(comment)[source]

Adds a new comment to a list of comments

Parameters:comment (string) – comment string
add_misc(**kwargs)[source]

Add miscellaneous info.

Parameters:**kwargs – key word args for additional info.
get_chromosome_index()[source]

1-based

get_comments()[source]
Returns:semi-colon delimited comments
Return type:string
get_coordinates()[source]
get_key()[source]
get_pathogenicity()[source]
get_variant_info(data, samtools, fasta_file)[source]

Get read support and total coverage for variant.

Parameters:
  • self (Variant) – an instance of biofx.variants.Variants.Variant
  • data (dict) – dictionary with library ID as keys. Required nested keys: sample_prefix, tumour_content, bam
  • samtools (basestring) – path to samtools executable
  • fasta_file (basestring) – path to fasta file
Returns:

a dictionary with library IDs as keys and attribute keywords

Return type:

variant_info (dict)

Raises:

AssertionError – each bam file in bams is assumed to come from a different library

get_variant_mpileup(bamfile, samtools='/gsc/software/linux-x86_64-centos6/samtools-0.1.19/samtools', reference='/projects/alignment_references/9606/hg19a/genome/bwa_32/hg19a.fa', mpileup_params=['--ff', '1540', '-BQ0'])[source]

Get mpileup record for variant.

Todo

add functionality for region? move executables to a configs module/file or something like that

Parameters:
  • bamfile (string) – path to bamfile
  • samtools (string) – samtools executable. Default: /gsc/software/linux-x86_64-centos6/samtools-0.1.19
  • reference (string) – path to reference fasta file. Default: /projects/alignment_references/9606/hg19a/genome/bwa_32/GRCh37-lite.fa
  • mpileup_params (list) – list of mpileup parameters
static has_pathogenicity_conflict(p1, p2)[source]

Check for pathogenicity conflict between two sources.

Parameters:
  • p1 (string) – clinvar pathogenicity
  • p2 (string) – cgl pathogenicity

Returns:True if has conflict
Return type:boolean
is_snp()[source]
normalize_splice_site()[source]

Checks the hgvs effect type attributes and changes the value to a normalized splice site string if the effect is a splice site.

set_allele_model(gmaf, maf_cutoff)[source]

Get allele model string from gmaf.

Todo

enumerate this?

Parameters:
  • gmaf (float) – allele model gmaf
  • maf_cutoff (float) – cut-off for rare/common

Returns:allele model string
Return type:allele_model (string)
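A cut-off comparison of this kind might look like the sketch below. The returned labels ("rare", "common", "unknown"), the handling of a missing gmaf, and the treatment of a value exactly at the cut-off are all assumptions, since the documentation gives only the parameter names.

```python
def allele_model_from_gmaf(gmaf, maf_cutoff):
    """Hypothetical stand-in for the comparison behind set_allele_model()."""
    if gmaf is None:
        # Assumed: no frequency data means no call either way.
        return "unknown"
    if not 0.0 <= gmaf <= 1.0:
        raise ValueError("gmaf must be a frequency between 0 and 1")
    # Assumed: a value exactly at the cut-off counts as common.
    return "rare" if gmaf < maf_cutoff else "common"
```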
set_alternative(eff_maps, model, to_string=True)[source]

Set alternative (not best/canonical) effect descriptions.

Parameters:
  • eff_maps (list) – a list of dictionaries
  • model (string) – transcript or gene
  • to_string
Raises:

ValueError – invalid model

set_alternative_genes(eff_maps, to_string=True)[source]

Todo

merge set alternative transcripts and set alternative genes

Parameters:
  • eff_maps
  • to_string

Returns:

set_alternative_transcripts(eff_maps, to_string=True)[source]

Todo

merge set alternative transcripts and set alternative genes

Parameters:
  • eff_maps (list) – a list of dictionaries
  • to_string

Returns:

set_cosmic(cosmic)[source]
Parameters:cosmic (string) – cosmic ID
Raises:ValueError – not a cosmic ID
set_dbsnp(dbsnp)[source]
Parameters:dbsnp (string) – dbSNP ID
Raises:ValueError – not a dbsnp ID
set_eff(eff_map, hgvs=True, classic=False)[source]
Parameters:
  • eff_map (dict) – a dictionary with snpeff annotation key values (See output of parse_effect)
  • hgvs (bool) – is hgvs eff_map (parse_effect with hgvs=True)
  • classic (bool) – is classic eff_map

Note: Set both hgvs and classic to True if it is a merged eff map from merge_eff_maps

set_effects(effects)[source]
Parameters:effects (str) – snpeff effect string
set_end(pos)[source]
set_id(identifier)[source]
set_provenance(provenance)[source]
set_start(pos)[source]
set_zygosity(genotype)[source]

Assign zygosity to variant.

Parameters:
  • v (Variant) – an instance of biofx.variants.Variants
  • genotype (string) – genotype from vcf record
Raises:

biofx.variants.operations module

biofx.variants.operations.chunk_variants_by_chromosome(infile, outdir='.')[source]

Split a file with chromosome positions into separate files grouped by chromosome.

Parameters:
  • infile (string) – input file path. a tab delimited file with chromosomes in column 1 and positions in column 2.
  • outdir (string) – output file directory.
Returns:

a dictionary of chromosome as keys and positions as values.

Return type:

var_by_chromosome (dict)
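The grouping step can be sketched as follows. This is a standalone illustration rather than the biofx code: the function name, the `<chromosome>.txt` output naming, and the skipping of blank and `#` lines are assumptions.

```python
import os
from collections import defaultdict


def chunk_by_chromosome(infile, outdir="."):
    """Group positions by chromosome and write one file per chromosome."""
    positions = defaultdict(list)
    with open(infile) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if not line or line.startswith("#"):
                continue  # assumed: blank lines and comments are ignored
            chromosome, position = line.split("\t")[:2]
            positions[chromosome].append(int(position))

    os.makedirs(outdir, exist_ok=True)
    for chromosome, chromosome_positions in positions.items():
        # Assumed naming scheme: <outdir>/<chromosome>.txt
        path = os.path.join(outdir, chromosome + ".txt")
        with open(path, "w") as out:
            for position in chromosome_positions:
                out.write("%s\t%d\n" % (chromosome, position))
    return dict(positions)
```

Positions keep their input order within each chromosome; sort the lists before writing if downstream tools expect sorted input.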

biofx.variants.operations.generate_mpileup(bamfile, variants, output=None, region=None, samtools='/gsc/software/linux-x86_64-centos6/samtools-0.1.19/samtools', reference='/projects/alignment_references/9606/hg19a/genome/bwa_32/hg19a.fa', mpileup_params=['--ff', '1540', '-BQ0'])[source]

Generate mpileup file for a set of variants.

Parameters:
  • bamfile (string) – path to bamfile
  • variants (string) – path to file containing list of positions (chr pos) or regions (BED)
  • output (string) – mpileup output file
  • region (string) – region in the format of chr:start-end
  • samtools (string) – samtools executable. Default: /gsc/software/linux-x86_64-centos6/samtools-0.1.19
  • reference (string) – path to reference fasta file.
  • mpileup_params (list) – list of mpileup parameters
Returns:

tuple containing:
  • (string): mpileup record
  • (string): mpileup command executed

Return type:

tuple
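The command that such a call assembles might look like the sketch below, which only builds the argument list and does not run samtools. The function name and the example paths are hypothetical. The flags used are standard `samtools mpileup` options: `-f` for the reference fasta, `-l` for a positions or BED file, and `-r` for a region. The sketch takes the extra parameters as a tuple so that the default is never shared and mutated between calls, a risk that comes with a list default.

```python
import shlex


def build_mpileup_command(bamfile, reference, samtools="samtools",
                          variants=None, region=None,
                          mpileup_params=("--ff", "1540", "-BQ0")):
    """Hypothetical helper: return (argument list, printable command)."""
    cmd = [samtools, "mpileup"]
    cmd.extend(mpileup_params)
    cmd.extend(["-f", reference])
    if variants:
        cmd.extend(["-l", variants])   # file of positions (chr pos) or BED
    if region:
        cmd.extend(["-r", region])     # chr:start-end
    cmd.append(bamfile)
    return cmd, " ".join(shlex.quote(part) for part in cmd)


# Placeholder paths, for illustration only.
args, printable = build_mpileup_command(
    "sample.bam", "genome.fa", variants="positions.txt", region="chr1:100-200"
)
```

Keeping the argument list and the printable string separate lets the list be passed to `subprocess.run` without a shell while the string is logged or returned alongside the output.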

Module contents