biofx.annotator package

Submodules

biofx.annotator.GSCannotation module

Module to retrieve annotations from GSC resources. @cchng

Created in January 2015.

class biofx.annotator.GSCannotation.Ensembl(resource='/projects/yshen_prj/db/Ensembl_69_transcripts.txt')

Bases: object

Legacy resource. Avoid if possible.

get_ensembl_gene_id(enst)
get_ensembl_transcript_id(ensg)
class biofx.annotator.GSCannotation.Hugo(resource='/projects/yshen_prj/db/hugo_genenames_08Mar2012.txt')

Bases: object

Legacy resource. Avoid if possible.

get_gene_name(symbol)
class biofx.annotator.GSCannotation.WTSSGeneExon(ensg2genesymbol, resource='/projects/wtsspipeline/resources/Homo_sapiens/bfa_NCBI-37-TCGA/transcript_coverage/ens69_mito_as_MT_no_LRG_genes/gene_exon.txt')

Bases: object

Information from WTSS gene_exon.txt file

get_map()
Returns:map of gene symbol to exon coordinates
Return type:dict
class biofx.annotator.GSCannotation.WTSSGeneInfo(resource='/projects/wtsspipeline/resources/Homo_sapiens/bfa_NCBI-37-TCGA/transcript_coverage/ens69_mito_as_MT_no_LRG_genes/gene_info.txt')

Bases: object

Parse and retrieve information from WTSS gene_info.txt file

Parameters:resource (string) – path to gene_info.txt file in WTSS formatting
Raises:IOError – resource file provided does not exist
get_ensg2symbol()
Returns:a mapping with ensembl gene ID as keys and gene symbol as values
Return type:dict
Raises:RuntimeError – Something went wrong with populating mappings
get_ensg_from_enst(enst)

Get Ensembl gene ID from Ensembl transcript ID.

Parameters:enst (string) – Ensembl transcript ID
Returns:Ensembl gene ID. NA if Ensembl transcript ID not found
Return type:string
get_ensg_from_symbol(symbol)

Get Ensembl gene ID from gene symbol. One to many mapping.

Parameters:symbol (string) – Gene symbol
Returns:Ensembl gene ID, comma-delimited if the symbol maps to multiple IDs. "NA" if the symbol is not found.
Return type:string
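
The one-to-many behaviour described above can be sketched without the package itself. This is a minimal illustration, not the module's implementation: the helper names build_symbol2ensg and ensg_from_symbol, the gene IDs, and the symbols are all hypothetical, and the ordering of IDs in the joined string is an assumption.

```python
from collections import defaultdict

# Hypothetical (Ensembl gene ID, gene symbol) pairs. In the real module
# these would be parsed from gene_info.txt, whose layout is not shown here.
PAIRS = [
    ("ENSG00000000001", "GENE_A"),
    ("ENSG00000000002", "GENE_B"),
    ("ENSG00000000003", "GENE_B"),
]


def build_symbol2ensg(pairs):
    """Group Ensembl gene IDs by gene symbol, keeping input order."""
    symbol2ensg = defaultdict(list)
    for ensg, symbol in pairs:
        symbol2ensg[symbol].append(ensg)
    return dict(symbol2ensg)


def ensg_from_symbol(symbol2ensg, symbol):
    """Return comma-delimited gene IDs, or the string "NA" if unknown."""
    ids = symbol2ensg.get(symbol)
    return ",".join(ids) if ids else "NA"
```

Callers that need individual IDs would split the returned string on "," after first checking for "NA".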
get_enst2ensg()
Returns:a mapping with ensembl transcript ID as keys and ensembl gene ID as values
Return type:dict
Raises:RuntimeError – Something went wrong with populating mappings
get_gene_symbol(ensg)

Get gene symbol from Ensembl gene ID

Parameters:ensg (string) – Ensembl gene ID
Returns:gene symbol. NA if gene symbol not found
Return type:string
get_strand_from_gene_symbol(gene_symbol)
Parameters:gene_symbol (string) – gene symbol
Returns:strand. NA if gene symbol not found
Return type:string
get_symbol2ensg()
Returns:a mapping with gene symbol as keys and ensembl gene ID as values
Return type:dict
Raises:RuntimeError – Something went wrong with populating mappings
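
Because the single-value lookups in this class are documented to return the string NA rather than raise, chained lookups need an explicit check at each step. The sketch below assumes plain dicts standing in for the results of get_enst2ensg() and get_ensg2symbol(); the function name symbol_from_enst and all IDs are hypothetical.

```python
NA = "NA"  # sentinel string documented for failed lookups


def symbol_from_enst(enst, enst2ensg, ensg2symbol):
    """Resolve transcript ID -> gene ID -> gene symbol, propagating "NA"."""
    ensg = enst2ensg.get(enst, NA)
    if ensg == NA:
        # Stop early so "NA" is never used as a key in the second lookup.
        return NA
    return ensg2symbol.get(ensg, NA)


# Hypothetical stand-ins for the dicts the getters are documented to return.
enst2ensg = {"ENST00000000011": "ENSG00000000001"}
ensg2symbol = {"ENSG00000000001": "GENE_A"}
```

Comparing against the sentinel explicitly avoids treating "NA" as a real gene symbol further downstream.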

biofx.annotator.VariantAnnotation module

@cchng

class biofx.annotator.VariantAnnotation.SnpEff(java_mem='Xmx4g', version=None)

Bases: object

A wrapper for snpeff/snpsift 4.1 execution.

annotate_with_snpeff(input_file, output_file, genome, snpeff=None)

Annotate input with snpeff. Runs the classic and hgvs annotations concurrently.

Parameters:
  • input_file (string) – input vcf file path
  • output_file (string) – output vcf file path. hgvs output has an hgvs suffix appended
  • genome (string) – genome used for snpeff
  • snpeff (string) – snpeff executable
Returns:revals – list of return values
Return type:list
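
Running two annotation passes concurrently and collecting a list of return values can be sketched with the standard library alone. This is an illustration under assumptions, not the wrapper's code: run_concurrently is a hypothetical helper, and the actual snpeff command lines the wrapper builds are not shown in this documentation, so stand-in commands are used.

```python
import subprocess
import sys


def run_concurrently(commands):
    """Start every command first, then wait for all of them.

    Returns the exit codes in the same order as `commands`.
    """
    # Popen returns immediately, so all processes run side by side.
    procs = [subprocess.Popen(cmd) for cmd in commands]
    return [proc.wait() for proc in procs]


if __name__ == "__main__":
    # Stand-in commands; a real call would pass the two snpeff invocations.
    stand_ins = [
        [sys.executable, "-c", "pass"],
        [sys.executable, "-c", "import sys; sys.exit(3)"],
    ]
    print(run_concurrently(stand_ins))
```

A caller would typically treat any non-zero entry in the returned list as a failed pass.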

annotate_with_snpsift(input_file, output_file, annotate, snpsift=None)

Annotate input with snpsift.

Parameters:
  • input_file (string) – input vcf file path
  • output_file (string) – output vcf file path
  • annotate (string) – vcf file used for snpsift annotation
  • snpsift (string) – snpsift executable
Returns:reval – return value
Return type:int
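
For orientation, a SnpSift annotation run is commonly invoked as "java -jar SnpSift.jar annotate <database.vcf> <input.vcf>", with the annotated VCF written to standard output. The sketch below builds such a command and redirects its output to a file. The helper names are hypothetical, and it is an assumption that the wrapper calls a jar directly with a java_mem value such as Xmx4g; the snpsift argument may instead be a site-specific launcher script.

```python
import subprocess


def build_snpsift_command(snpsift_jar, annotate_vcf, input_vcf, java_mem="Xmx4g"):
    """Build the argument list for a SnpSift 'annotate' run (assumed layout)."""
    return [
        "java", "-" + java_mem,   # e.g. -Xmx4g
        "-jar", snpsift_jar,
        "annotate", annotate_vcf, input_vcf,
    ]


def run_to_file(command, output_path):
    """Run a command, writing its stdout to output_path; return the exit code."""
    with open(output_path, "w") as out:
        return subprocess.call(command, stdout=out)
```

Passing the command as a list avoids shell quoting problems with file paths that contain spaces.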

Module contents