Birol lab

The Bioinformatics Technology Lab at Canada's Michael Smith Genome Sciences Centre at BC Cancer is a computational biology research group. We develop novel algorithms, data structures and genome analysis software and offer a complete and scalable solution for de novo genome assembly. The bioinformatics tools we build find applications in cancer research, and are the foundation of our genome research program.

We are located at Canada's Michael Smith Genome Sciences Centre, Echelon Technology Platform.

570 West 7th Avenue 
Vancouver, British Columbia 
V5Z 4S6 


Short read sequencing

Dr. Birol's lab builds high-throughput analysis methods to process large volumes of reads in diverse DNA sequencing projects, from high-profile international cancer genome mapping initiatives to the generation of reference genomes of non-model species.

Selected Publications

ntEdit: scalable genome sequence polishing.

Bioinformatics (Oxford, England), 2019
Warren, René L, Coombe, Lauren, Mohamadi, Hamid, Zhang, Jessica, Jaquish, Barry, Isabel, Nathalie, Jones, Steven J M, Bousquet, Jean, Bohlmann, Joerg, Birol, Inanç
In the modern genomics era, genome sequence assemblies are routine practice. However, depending on the methodology, resulting drafts may contain considerable base errors. Although utilities exist for genome base polishing, they work best with high read coverage and do not scale well. We developed ntEdit, a Bloom filter-based genome sequence editing utility that scales to large mammalian and conifer genomes.

ORCA: a comprehensive bioinformatics container environment for education and research.

Bioinformatics (Oxford, England), 2019
Jackman, Shaun D, Mozgacheva, Tatyana, Chen, Susie, O'Huiginn, Brendan, Bailey, Lance, Birol, Inanc, Jones, Steven J M
The ORCA bioinformatics environment is a Docker image that contains hundreds of bioinformatics tools and their dependencies. The ORCA image and accompanying server infrastructure provide a comprehensive bioinformatics environment for education and research. The ORCA environment on a server is implemented using Docker containers, but without requiring users to interact directly with Docker, suitable for novices who may not yet have familiarity with managing containers. ORCA has been used successfully to provide a private bioinformatics environment to external collaborators at a large genome institute, for teaching an undergraduate class on bioinformatics targeted at biologists, and to provide a ready-to-go bioinformatics suite for a hackathon. Using ORCA eliminates time that would be spent debugging software installation issues, so that time may be better spent on education and research.

Complete Chloroplast Genome Sequence of an Engelmann Spruce (Picea engelmannii, Genotype Se404-851) from Western Canada.

Microbiology resource announcements, 2019
Lin, Diana, Coombe, Lauren, Jackman, Shaun D, Gagalova, Kristina K, Warren, René L, Hammond, S Austin, McDonald, Helen, Kirk, Heather, Pandoh, Pawan, Zhao, Yongjun, Moore, Richard A, Mungall, Andrew J, Ritland, Carol, Doerksen, Trevor, Jaquish, Barry, Bousquet, Jean, Jones, Steven J M, Bohlmann, Joerg, Birol, Inanc
Engelmann spruce (Picea engelmannii) is a conifer found primarily on the west coast of North America. Here, we present the complete chloroplast genome sequence of genotype Se404-851. This chloroplast sequence will benefit future conifer genomic research and contribute resources to further species conservation efforts.

Complete Chloroplast Genome Sequence of a White Spruce (Picea glauca, Genotype WS77111) from Eastern Canada.

Microbiology resource announcements, 2019
Lin, Diana, Coombe, Lauren, Jackman, Shaun D, Gagalova, Kristina K, Warren, René L, Hammond, S Austin, Kirk, Heather, Pandoh, Pawan, Zhao, Yongjun, Moore, Richard A, Mungall, Andrew J, Ritland, Carol, Jaquish, Barry, Isabel, Nathalie, Bousquet, Jean, Jones, Steven J M, Bohlmann, Joerg, Birol, Inanc
Here, we present the complete chloroplast genome sequence of white spruce (Picea glauca, genotype WS77111), a coniferous tree widespread in the boreal forests of North America. This sequence contributes to genomic and phylogenetic analyses of the genus Picea that are part of ongoing research to understand their adaptation to environmental stress.

Antimicrobial peptides from Rana [Lithobates] catesbeiana: Gene structure and bioinformatic identification of novel forms from tadpoles.

Scientific reports, 2019
Helbing, Caren C, Hammond, S Austin, Jackman, Shireen H, Houston, Simon, Warren, René L, Cameron, Caroline E, Birol, Inanç
Antimicrobial peptides (AMPs) exhibit broad-spectrum antimicrobial activity, and have promise as new therapeutic agents. While the adult North American bullfrog (Rana [Lithobates] catesbeiana) is a prolific source of high-potency AMPs, the aquatic tadpole represents a relatively untapped source for new AMP discovery. The recent publication of the bullfrog genome and transcriptomic resources provides an opportune bridge between known AMPs and bioinformatics-based AMP discovery. The objective of the present study was to identify novel AMPs with therapeutic potential using a combined bioinformatics and wet lab-based approach. In the present study, we identified seven novel AMP precursor-encoding transcripts expressed in the tadpole. Comparison of their amino acid sequences with known AMPs revealed evidence of mature peptide sequence conservation with variation in the prepro sequence. Two mature peptide sequences were unique and demonstrated bacteriostatic and bactericidal activity against Mycobacteria but not Gram-negative or Gram-positive bacteria. Nine known and seven novel AMP-encoding transcripts were detected in premetamorphic tadpole back skin, olfactory epithelium, liver, and/or tail fin. Treatment of tadpoles with 10 nM 3,5,3'-triiodothyronine for 48 h did not affect transcript abundance in the back skin, and had limited impact on these transcripts in the other three tissues. Gene mapping revealed considerable diversity in size (1.6-15 kbp) and exon number (one to four) of AMP-encoding genes with clear evidence of alternative splicing leading to both prepro and mature amino acid sequence diversity. These findings verify the accuracy and utility of the bullfrog genome assembly, and set a firm foundation for bioinformatics-based AMP discovery.

The Genome of the North American Brown Bear or Grizzly: Ursus arctos ssp. horribilis.

Genes, 2018
Taylor, Gregory A, Kirk, Heather, Coombe, Lauren, Jackman, Shaun D, Chu, Justin, Tse, Kane, Cheng, Dean, Chuah, Eric, Pandoh, Pawan, Carlsen, Rebecca, Zhao, Yongjun, Mungall, Andrew J, Moore, Richard, Birol, Inanc, Franke, Maria, Marra, Marco A, Dutton, Christopher, Jones, Steven J M
The grizzly bear (Ursus arctos ssp. horribilis) represents the largest population of brown bears in North America. Its genome was sequenced using a microfluidic partitioning library construction technique, and these data were supplemented with sequencing from a nanopore-based long read platform. The final assembly was 2.33 Gb with a scaffold N50 of 36.7 Mb, and the genome is of comparable size to that of its close relative the polar bear (2.30 Gb). An analysis using 4104 highly conserved mammalian genes indicated that 96.1% were found to be complete within the assembly. An automated annotation of the genome identified 19,848 protein coding genes. Our study shows that the combination of the two sequencing modalities that we used is sufficient for the construction of highly contiguous reference quality mammalian genomes. The assembled genome sequence and the supporting raw sequence reads are available from the NCBI (National Center for Biotechnology Information) under the bioproject identifier PRJNA493656, and the assembly described in this paper is version QXTK01000000.

Tigmint: correcting assembly errors using linked reads from large molecules.

BMC bioinformatics, 2018
Jackman, Shaun D, Coombe, Lauren, Chu, Justin, Warren, René L, Vandervalk, Benjamin P, Yeo, Sarah, Xue, Zhuyi, Mohamadi, Hamid, Bohlmann, Joerg, Jones, Steven J M, Birol, Inanc
Genome sequencing yields the sequence of many short snippets of DNA (reads) from a genome. Genome assembly attempts to reconstruct the original genome from which these reads were derived. This task is difficult due to gaps and errors in the sequencing data, repetitive sequence in the underlying genome, and heterozygosity. As a result, assembly errors are common. In the absence of a reference genome, these misassemblies may be identified by comparing the sequencing data to the assembly and looking for discrepancies between the two. Once identified, these misassemblies may be corrected, improving the quality of the assembled sequence. Although tools exist to identify and correct misassemblies using Illumina paired-end and mate-pair sequencing, no such tool yet exists that makes use of the long distance information of the large molecules provided by linked reads, such as those offered by the 10x Genomics Chromium platform. We have developed the tool Tigmint to address this gap.

TAP: a targeted clinical genomics pipeline for detecting transcript variants using RNA-seq data.

BMC medical genomics, 2018
Chiu, Readman, Nip, Ka Ming, Chu, Justin, Birol, Inanc
RNA-seq is a powerful and cost-effective technology for molecular diagnostics of cancer and other diseases, and it can reach its full potential when coupled with validated clinical-grade informatics tools. Despite recent advances in long-read sequencing, transcriptome assembly of short reads remains a useful and cost-effective methodology for unveiling transcript-level rearrangements and novel isoforms. One of the major concerns for adopting the proven de novo assembly approach for RNA-seq data in clinical settings has been the analysis turnaround time. To address this concern, we have developed a targeted approach to expedite assembly and analysis of RNA-seq data.

Recurrent tumor-specific regulation of alternative polyadenylation of cancer-related genes.

BMC genomics, 2018
Xue, Zhuyi, Warren, René L, Gibb, Ewan A, MacMillan, Daniel, Wong, Johnathan, Chiu, Readman, Hammond, S Austin, Yang, Chen, Nip, Ka Ming, Ennis, Catherine A, Hahn, Abigail, Reynolds, Sheila, Birol, Inanc
Alternative polyadenylation (APA) results in messenger RNA molecules with different 3' untranslated regions (3' UTRs), affecting the molecules' stability, localization, and translation. APA is pervasive and implicated in cancer. Earlier reports on APA focused on 3' UTR length modifications and commonly characterized APA events as 3' UTR shortening or lengthening. However, such characterization oversimplifies the processing of 3' ends of transcripts and fails to adequately describe the various scenarios we observe.


Readman Chiu

Bioinformatics Coordinator

Dr. Anat Yani

Research Associate

Lauren Coombe

Assistant Bioinformatics Coordinator

Vladimir Nikolic

Research Programmer

Johnathan Wong

Research Programmer


Amirhossein Afshinfard

Graduate Student

Kristina Gagalova

Graduate Student

Talha Goktas

Graduate Student

Saber Hafezqorani

Graduate Student

Chenkai Li

Graduate Student

Janet Xin Li

Graduate Student

Diana Lin

Graduate Student

Theodora Lo

Graduate Student

Ka Ming Nip

Graduate Student

Darcy Sutherland

Graduate Student

Kristina Wright

Graduate Student

Chen Yang

Graduate Student

Cecilia Yang

Student Researcher