In order to diagnose a cancer, pathologists use a variety of techniques to analyze the features of a tissue biopsy. This works best when the specimen being analyzed is of high quality, contains many cells and has clearly identifiable features. Unfortunately, this is not always the case, and rare cancers are especially difficult to diagnose using traditional diagnostic methods.
Cancers are diseases of the genome. What if, armed with genetic data from tens of thousands of other consenting cancer patients, we could train computers to provide fast and accurate diagnoses?
A new study led by scientists at BC Cancer’s Genome Sciences Centre and published in the Journal of the American Medical Association (JAMA) Network Open demonstrates that, yes, computers can provide precise cancer diagnoses, including for cases that had previously failed human assessment.
This paper shows a novel use of whole-genome sequencing and machine learning techniques, specifically measuring the expression of all of the genes in the genome, to provide a cancer diagnosis with quantifiable confidence.
“Our analysis highlights the progress machine learning approaches have made in fields previously considered to be the domain of highly skilled human expertise,” says Dr. Steve Jones, co-director of Canada’s Michael Smith Genome Sciences Centre and principal investigator for the study. “It also demonstrates where computational approaches can not only augment but improve upon clinical decision making.”
In this study, scientists trained computers to look across 17,688 genes in the human genome and generate a diagnosis, with a confidence score, out of a set of 40 different cancer types. They found that the method had approximately 99 per cent accuracy in identifying cancers with mixed tissue types, and a success rate of 80 to 86 per cent in the most challenging cases (cancers of unknown origin and advanced cancers) that had already failed human assessment or were extremely difficult for a human expert to diagnose.
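To make the idea of "a diagnosis with a confidence score" concrete, here is a minimal sketch in Python. It is not the study's model: it assumes a single linear layer followed by a softmax, and the weights, labels, and expression values are randomly generated placeholders. Only the two sizes (17,688 genes and 40 cancer types) come from the article; the names `softmax` and `diagnose` are hypothetical.

```python
import math
import random

N_GENES = 17_688   # genes per sample, as reported in the article
N_TYPES = 40       # cancer types, as reported in the article


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def diagnose(expression, weights, biases, labels):
    """Return (label, confidence) for one sample's expression vector.

    Assumption: a single linear layer with one weight row per cancer
    type. The published method is a neural network whose architecture
    is not described in this article.
    """
    scores = [
        sum(w * x for w, x in zip(row, expression)) + b
        for row, b in zip(weights, biases)
    ]
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


# Toy, randomly generated inputs: not patient data, not trained weights.
rng = random.Random(0)
labels = [f"type_{i}" for i in range(N_TYPES)]
weights = [[rng.gauss(0, 0.01) for _ in range(N_GENES)] for _ in range(N_TYPES)]
biases = [0.0] * N_TYPES
sample = [rng.random() for _ in range(N_GENES)]

label, confidence = diagnose(sample, weights, biases, labels)
print(f"{label}: confidence {confidence:.2f}")
```

The confidence here is simply the largest softmax probability, so a sample the model finds ambiguous yields a low score that could be flagged for review by a pathologist.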
While gene panels looking across smaller gene subsets are already available to help pathologists make more accurate cancer diagnoses, their application is restricted to commonly occurring cancers. Rarer cancers, metastatic cancers, and cancers of unknown origin are all challenging to diagnose using traditional pathology and through gene panels.
“This illustrates that there is huge potential to develop interpretable machine-learning methods using the entirety of sequencing data,” says Jasleen Grewal, lead author on the study and a graduate student in Dr. Jones’ lab. “Algorithms that incorporate this high-resolution data as a whole to provide insights into cancers can serve as a powerful means of analysis and decision-making.”
Machine learning methods are only as good as the data they have to train on. Efforts are needed to properly curate and sequence rare and advanced cancers so that scientists can better incorporate them into training data and improve pathologists’ ability to diagnose them. Future research will examine the ability to leverage genomic data for other manually driven cancer analysis tasks, such as alignment with appropriate therapies.
“For me, the most interesting finding will be when we start to dig into what these algorithms are learning about cancer,” says Grewal. “I’m interested to learn whether machine learning at this scale can be used to uncover subtleties and facets about cancer that have been eluding us.”
Article: Grewal, J et al. (2019) Application of a Neural Network Whole Transcriptome-Based Pan-Cancer Method for Diagnosis of Primary and Metastatic Cancers. JAMA Network Open. DOI: 10.1001/jamanetworkopen.2019.2597
Authors: Jasleen K Grewal, Basile Tessier-Cloutier, Martin Jones, Sitanshu Gakkhar, Yussanne Ma, Richard Moore, Andrew J Mungall, Yongjun Zhao, Michael D Taylor, Karen Gelmon, Howard Lim, Daniel Renouf, Janessa Laskin, Marco Marra, Stephen Yip, Steven JM Jones
Funding: BC Cancer Foundation, Genome British Columbia, Genome Canada, Canada Foundation for Innovation, BC Knowledge Development Fund
Data: dbGaP: The Cancer Genome Atlas, managed by the National Cancer Institute and National Human Genome Research Institute