Published December 15, 2020 | Version 1.5.2
Software
Open
ekzhu/datasketch: Improved performance for MinHash and MinHashLSH
Creators
- 1. @athenianco
- 2. Six Five Design
- 3. University of Illinois, Urbana-Champaign
- 4. Adobe
- 5. @blindspot-ai
- 6. Klaviyo
Description
- Performance improvement for MinHash's update method.
- Make MinHash updates 4.5X faster by using the `update_batch` method for bulk updates on MinHash. See the [API doc](http://ekzhu.com/datasketch/documentation.html#datasketch.MinHash.update_batch).
- Further performance gain from bulk generation of MinHash objects using `MinHash.bulk` or `MinHash.generator`. See API doc and pull request.
- Optional compression for the MinHash LSH index by hashing the bucket key produced by `MinHashLSH._H`. This saves memory/storage space used by the index. See pull request.
Thank you @Sinusoidal36!
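The idea behind the optional index compression can be illustrated with a toy example. This is not the library's implementation: `band_key`, `compress_key`, the 8-byte digest size, and the sample hash values are all hypothetical, chosen only to show how hashing a long bucket key down to a short digest reduces the space each key occupies.

```python
import hashlib
import struct
from collections import defaultdict


def band_key(hashvalues):
    """Serialize one band of hash values into a bytes key (8 bytes per value)."""
    return struct.pack(f"<{len(hashvalues)}Q", *hashvalues)


def compress_key(key, digest_size=8):
    """Hypothetical compressor: replace a bucket key with a short fixed-size digest."""
    return hashlib.blake2b(key, digest_size=digest_size).digest()


# Made-up band of 8 hash values taken from some sketch.
band = [11, 22, 33, 44, 55, 66, 77, 88]

raw = band_key(band)       # 8 values * 8 bytes each
small = compress_key(raw)  # fixed-size digest, independent of band width

# A toy bucket table keyed by the compressed key.
buckets = defaultdict(set)
buckets[small].add("doc-1")

# A second sketch with an identical band maps to the same bucket.
buckets[compress_key(band_key(list(band)))].add("doc-2")
```

The trade-off is that distinct bands can occasionally share a digest, so shorter keys may introduce a small number of extra false-positive candidates. For the actual option exposed by `MinHashLSH`, refer to the pull request and API documentation.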
Files
Name | Size | Checksum
---|---|---
ekzhu/datasketch-1.5.2.zip | 795.7 kB | md5:a3d3bce4aa309dab4bcd0ed08870cbc8
Additional details
Related works
- Is supplement to: https://github.com/ekzhu/datasketch/tree/1.5.2