ntCard is a streaming algorithm for estimating the frequencies of k-mers in genomics datasets. At its core, ntCard uses the ntHash algorithm to compute hash values for streamed sequences efficiently: ntHash is recursive, so the hash of each successive k-mer is derived from the hash of the previous one. ntCard then samples the calculated hash values to build a reduced-representation multiplicity table describing the sample distribution. Finally, it uses a statistical model to reconstruct the population distribution from the sample distribution.
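The hashing and sampling steps described above can be sketched in Python. This is a simplified illustration under stated assumptions, not ntCard's implementation: the per-base seeds in `SEED` are hypothetical stand-ins for ntHash's own constants, the input is assumed to contain only A/C/G/T, and the final step simply scales the sample histogram by `2**s` rather than applying ntCard's statistical reconstruction model. All function names are hypothetical.

```python
from collections import Counter
from collections.abc import Iterable, Iterator

MASK = (1 << 64) - 1

# Hypothetical 64-bit per-base seeds (NOT ntHash's real constants).
SEED = {
    "A": 0x3C8BFBB395C60474,
    "C": 0x3193C18562A02B4C,
    "G": 0x20323ED082572324,
    "T": 0x295549F54BE24456,
}
COMP = {"A": "T", "C": "G", "G": "C", "T": "A"}


def rol(x: int, n: int) -> int:
    """Rotate a 64-bit integer left by n bits."""
    n %= 64
    return ((x << n) | (x >> (64 - n))) & MASK


def ror(x: int, n: int) -> int:
    """Rotate a 64-bit integer right by n bits."""
    return rol(x, 64 - (n % 64))


def direct_hashes(kmer: str) -> tuple[int, int]:
    """Forward and reverse-complement hashes of one k-mer, computed from scratch."""
    k = len(kmer)
    fwd = rev = 0
    for j, base in enumerate(kmer):
        fwd ^= rol(SEED[base], k - 1 - j)
        rev ^= rol(SEED[COMP[base]], j)
    return fwd, rev


def canonical_hashes(seq: str, k: int) -> Iterator[int]:
    """Yield one canonical (strand-independent) hash per k-mer.

    After the first k-mer, each hash is updated in O(1) from the previous
    one by rotating, removing the outgoing base and adding the incoming base.
    """
    if len(seq) < k:
        return
    fwd, rev = direct_hashes(seq[:k])
    yield min(fwd, rev)
    for i in range(len(seq) - k):
        out_b, in_b = seq[i], seq[i + k]
        fwd = rol(fwd, 1) ^ rol(SEED[out_b], k) ^ SEED[in_b]
        rev = ror(rev, 1) ^ ror(SEED[COMP[out_b]], 1) ^ rol(SEED[COMP[in_b]], k - 1)
        yield min(fwd, rev)


def sample_histogram(reads: Iterable[str], k: int, s: int) -> Counter:
    """Keep k-mers whose top s hash bits are zero (rate 1/2**s) and return
    {multiplicity: number of distinct sampled k-mers}."""
    counts: Counter = Counter()
    for read in reads:
        for h in canonical_hashes(read, k):
            if h >> (64 - s) == 0:
                counts[h] += 1
    return Counter(counts.values())


def estimate_histogram(reads: Iterable[str], k: int, s: int) -> dict[int, int]:
    """Naive population estimate: scale the sample histogram by 2**s.
    (A simple stand-in for ntCard's statistical model.)"""
    sample = sample_histogram(reads, k, s)
    return {m: n * (1 << s) for m, n in sorted(sample.items())}
```

Because a k-mer is kept or dropped based only on its hash, every occurrence of a kept k-mer is counted, so its multiplicity in the sample is exact; only the number of distinct k-mers is subsampled, at a rate of roughly one in `2**s`.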
Publications
- Hamid Mohamadi, Hamza Khan, and Inanc Birol. ntCard: a streaming algorithm for cardinality estimation in genomics data. Bioinformatics (2017) 33 (9): 1324-1330. doi:10.1093/bioinformatics/btw832
- Hamid Mohamadi, Justin Chu, Benjamin P Vandervalk, and Inanc Birol. ntHash: recursive nucleotide hashing. Bioinformatics (2016) 32 (22): 3492-3494. doi:10.1093/bioinformatics/btw397
Current Release
GitHub release page for ntCard
All Releases
Version | Released | Description | Licenses | Status |
---|---|---|---|---|
1.0.2 | Sep 04, 2018 | Higher periodicity ntHash | BSD | final |
1.0.1 | Jan 29, 2018 | Change license to MIT License; fix bugs and improve ops | BSD | final |
1.0.0 | Jan 11, 2017 | See the ntCard GitHub page for details | GPLv3 for non-commercial usage | final |