Software Open Access

ekzhu/datasketch: Improved performance for MinHash and MinHashLSH

Eric Zhu; Vadim Markovtsev; aastafiev; Wojciech Łukasiewicz; ae-foster; Jordan Martin; Ekevoo; Kevin Mann; Keyur Joshi; Spandan Thakur; Stefano Ortolani; Titusz; Vojtech Letal; Zac Bentley; fpug

  • Performance improvement for MinHash's update method.
  • MinHash updates are 4.5x faster when using the update_batch method for bulk updates. See API doc.
  • Further performance gains from bulk generation of MinHash objects using MinHash.bulk or MinHash.generator. See API doc and pull request.
  • Optional compression for the MinHashLSH index by hashing the bucket keys produced by MinHashLSH._H. See pull request. This reduces the memory/storage space used by the index.

Thank you @Sinusoidal36!

                   All versions   This version
Views              1,421          26
Downloads          226            1
Data volume        273.1 MB       795.7 kB
Unique views       1,230          25
Unique downloads   110            1


Cite as