SimRVSequences: an R package to simulate genetic sequence data for pedigrees.

Bioinformatics (Oxford, England), 2020
Nieuwoudt, Christina, Brooks-Wilson, Angela, Graham, Jinko
We present the R package SimRVSequences to simulate sequence data for pedigrees. SimRVSequences allows for simulations of large numbers of single-nucleotide variants (SNVs) and scales well with increasing numbers of pedigrees. Users provide a sample of pedigrees and SNV data from a sample of unrelated individuals.

Sample Tracking Using Unique Sequence Controls.

The Journal of molecular diagnostics : JMD, 2020
Moore, Richard A, Zeng, Thomas, Docking, T Roderick, Bosdet, Ian, Butterfield, Yaron S, Munro, Sarah, Li, Irene, Swanson, Lucas, Starks, Elizabeth R, Tse, Kane, Mungall, Andrew J, Holt, Robert A, Karsan, Aly
Sample tracking and identity are essential when processing multiple samples in parallel. Sequencing applications often involve high sample numbers, and the data are frequently used in a clinical setting. As such, a simple and accurate intrinsic sample tracking process through a sequencing pipeline is essential. Various solutions have been implemented to verify sample identity, including variant detection at the start and end of the pipeline using arrays or genotyping, bioinformatic comparisons, and optical barcoding of samples. None of these approaches are optimal. To establish a more effective approach using genetic barcoding, we developed a panel of unique DNA sequences cloned into a common vector. A unique DNA sequence is added to the sample when it is first received and can be detected by PCR and/or sequencing at any stage of the process. The control sequences are approximately 200 bases long with low identity to any sequence in the National Center for Biotechnology Information nonredundant database (<30 bases) and contain no homopolymer stretches longer than 7 bases. When a spiked next-generation sequencing library is sequenced, sequence reads derived from this control sequence are generated along with the standard sequencing run and are used to confirm sample identity and determine cross-contamination levels. This approach is used in our targeted clinical diagnostic whole-genome and RNA-sequencing pipelines and is an inexpensive, flexible, and platform-agnostic solution.
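The checks described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the exact k-mer matching used to assign reads, the function names, the value of k, and the toy control sequences are all assumptions made for illustration (real controls are about 200 bases long, and a production workflow would more likely align reads with a standard aligner). The sketch shows a homopolymer-length check on a candidate control, a count of reads attributable to each control, and the fraction of control-derived reads that do not come from the expected control, as a rough proxy for cross-contamination.

```python
from itertools import groupby


def longest_homopolymer(seq):
    """Length of the longest run of a single repeated base in seq."""
    return max((len(list(run)) for _, run in groupby(seq.upper())), default=0)


def count_control_reads(reads, controls, k=12):
    """Count reads that share an exact k-mer with exactly one control.

    Hypothetical helper: exact k-mer lookup stands in for whatever
    alignment or matching step a real pipeline would use.
    """
    kmer_to_control = {}
    for name, seq in controls.items():
        for i in range(len(seq) - k + 1):
            kmer_to_control.setdefault(seq[i:i + k], name)

    counts = {name: 0 for name in controls}
    for read in reads:
        hits = set()
        for i in range(len(read) - k + 1):
            name = kmer_to_control.get(read[i:i + k])
            if name is not None:
                hits.add(name)
        if len(hits) == 1:  # skip reads that match no control or several
            counts[hits.pop()] += 1
    return counts


def unexpected_fraction(counts, expected):
    """Fraction of control-derived reads not from the expected control."""
    total = sum(counts.values())
    if total == 0:
        return None  # no control reads seen, so identity cannot be confirmed
    return 1 - counts[expected] / total


# Toy data (invented for this sketch, far shorter than real controls).
controls = {
    "ctrl_A": "ACGTTGCAAGGCTTAACCGGTATCGATCGGA",
    "ctrl_B": "TTGACCATGGCATTCAGGACTTGACGTACCA",
}
reads = [
    controls["ctrl_A"][0:20],
    controls["ctrl_A"][5:25],
    controls["ctrl_A"][3:23],
    controls["ctrl_B"][2:22],  # stray read from another sample's control
]

counts = count_control_reads(reads, controls)
fraction = unexpected_fraction(counts, expected="ctrl_A")
```

In this toy run, three of the four control-derived reads come from the expected control, so the sample identity is supported and the unexpected fraction is 0.25. Any acceptance threshold for that fraction would need to be set and validated by the laboratory; none is given in the abstract.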