Software Open Access

ekzhu/datasketch: Improved performance for MinHash and MinHashLSH

Eric Zhu; Vadim Markovtsev; aastafiev; Wojciech Łukasiewicz; ae-foster; Jordan Martin; Ekevoo; Kevin Mann; Keyur Joshi; Spandan Thakur; Stefano Ortolani; Titusz; Vojtech Letal; Zac Bentley; fpug

  • Performance improvement for MinHash's update method.
  • Make MinHash updates 4.5X faster by using the update_batch method for bulk updates on a MinHash. See API doc.
  • Further performance gains from generating MinHash objects in bulk with MinHash.bulk or MinHash.generator. See API doc and pull request.
  • Optional compression for the MinHash LSH index by hashing the bucket key produced by MinHashLSH._H. See pull request. This reduces the memory and storage space used by the index.
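
The idea behind the bucket-key compression in the last item can be illustrated with a minimal standard-library sketch. It is not datasketch's own code: the helper names band_keys and compress, the signature values, the band layout, and the choice of BLAKE2b with an 8-byte digest are all assumptions made for illustration. It only shows why replacing a raw band key with a short digest shrinks the keys an index has to store.

```python
import hashlib
import struct

def band_keys(signature, bands, rows):
    """Split a signature into `bands` raw bucket keys of `rows` 64-bit values each.

    Hypothetical helper: it mimics the general LSH banding idea only.
    """
    return [
        struct.pack(f"<{rows}Q", *signature[i * rows:(i + 1) * rows])
        for i in range(bands)
    ]

def compress(key, digest_size=8):
    """Replace a raw bucket key with a short fixed-size digest (assumed: BLAKE2b)."""
    return hashlib.blake2b(key, digest_size=digest_size).digest()

# Made-up signature of 128 hash values, split into 16 bands of 8 rows.
signature = list(range(1, 129))
bands, rows = 16, 8

raw_keys = band_keys(signature, bands, rows)
small_keys = [compress(k) for k in raw_keys]

raw_bytes = sum(len(k) for k in raw_keys)      # 16 keys * 64 bytes each
small_bytes = sum(len(k) for k in small_keys)  # 16 keys * 8 bytes each
print(raw_bytes, small_bytes)
```

In this sketch each raw key is 64 bytes and each compressed key is 8 bytes. The trade-off is that a shorter digest makes collisions between distinct keys possible, which can only add extra candidates to a query result, never remove true ones.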

Thank you @Sinusoidal36!
