ntCard is a streaming algorithm for estimating the frequencies of k-mers in genomics datasets. At its core, ntCard uses the ntHash algorithm to efficiently compute hash values for streamed sequences. It then samples the calculated hash values to build a reduced-representation multiplicity table that describes the sample distribution. Finally, it uses a statistical model to reconstruct the population distribution from the sample distribution.
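The hash-then-sample step described above can be illustrated with a minimal Python sketch. It is not ntCard's implementation: the function names `kmer_hash` and `sample_multiplicity_table` and the `sample_bits` parameter are hypothetical, BLAKE2b stands in for ntHash (which is a rolling hash), reverse complements are not handled, and the statistical model that reconstructs the population distribution is not shown.

```python
import hashlib
from collections import Counter


def kmer_hash(kmer: str) -> int:
    """Return a 64-bit hash of a k-mer (a stand-in for ntHash, not ntHash itself)."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def sample_multiplicity_table(seqs, k, sample_bits):
    """Build a multiplicity histogram from a hash-based sample of k-mers.

    A k-mer is kept only if the low `sample_bits` bits of its hash are zero,
    so roughly 1 in 2**sample_bits distinct k-mers is sampled. The result maps
    multiplicity -> number of sampled distinct k-mers with that multiplicity.
    """
    mask = (1 << sample_bits) - 1
    counts = Counter()
    for seq in seqs:
        # Stream over every k-mer in the sequence.
        for i in range(len(seq) - k + 1):
            h = kmer_hash(seq[i:i + k])
            if h & mask == 0:
                counts[h] += 1
    # Collapse per-k-mer counts into a "count of counts" table.
    return Counter(counts.values())
```

Because the sampling decision depends only on the hash value, every occurrence of a sampled k-mer is counted, so the multiplicities in the sample are exact for the k-mers that are kept. With `sample_bits=0` nothing is filtered out and the table describes the full input.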

Publications

  • Hamid Mohamadi, Hamza Khan, and Inanc Birol. ntCard: a streaming algorithm for cardinality estimation in genomics data. Bioinformatics (2017) 33 (9): 1324-1330. doi:10.1093/bioinformatics/btw832

  • Hamid Mohamadi, Justin Chu, Benjamin P Vandervalk, and Inanc Birol. ntHash: recursive nucleotide hashing. Bioinformatics (2016) 32 (22): 3492-3494. doi:10.1093/bioinformatics/btw397

Current Release

GitHub release page for ntCard

All Releases

Version | Released     | Description                                               | Licenses                       | Status
1.0.2   | Sep 04, 2018 | Higher periodicity ntHash                                 | BSD                            | final
1.0.1   | Jan 29, 2018 | Change License to MIT License; fixing bugs and improving ops | BSD                         | final
1.0.0   | Jan 11, 2017 | See ntCard GitHub page for details                        | GPLv3 for non-commercial usage | final