Software Open Access

ekzhu/datasketch: Improved performance for MinHash and MinHashLSH

Eric Zhu; Vadim Markovtsev; aastafiev; Wojciech Łukasiewicz; ae-foster; Jordan Martin; Ekevoo; Kevin Mann; Keyur Joshi; Spandan Thakur; Stefano Ortolani; Titusz; Vojtech Letal; Zac Bentley; fpug


Dublin Core Export

<?xml version='1.0' encoding='utf-8'?>
<oai_dc:dc xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
  <dc:creator>Eric Zhu</dc:creator>
  <dc:creator>Vadim Markovtsev</dc:creator>
  <dc:creator>aastafiev</dc:creator>
  <dc:creator>Wojciech Łukasiewicz</dc:creator>
  <dc:creator>ae-foster</dc:creator>
  <dc:creator>Jordan Martin</dc:creator>
  <dc:creator>Ekevoo</dc:creator>
  <dc:creator>Kevin Mann</dc:creator>
  <dc:creator>Keyur Joshi</dc:creator>
  <dc:creator>Spandan Thakur</dc:creator>
  <dc:creator>Stefano Ortolani</dc:creator>
  <dc:creator>Titusz</dc:creator>
  <dc:creator>Vojtech Letal</dc:creator>
  <dc:creator>Zac Bentley</dc:creator>
  <dc:creator>fpug</dc:creator>
  <dc:date>2020-12-15</dc:date>
  <dc:description>
Performance improvements for MinHash's update method.
MinHash updates are 4.5x faster when items are added in bulk with the new update_batch method. See the [API doc](http://ekzhu.com/datasketch/documentation.html#datasketch.MinHash.update_batch).
Further performance gains come from generating MinHash objects in bulk with MinHash.bulk or MinHash.generator. See the API doc and pull request.
Optional compression of the MinHashLSH index by hashing the bucket keys produced by MinHashLSH._H (see pull request). This reduces the memory/storage space used by the index.

Thank you @Sinusoidal36!</dc:description>
  <dc:identifier>https://zenodo.org/record/4323502</dc:identifier>
  <dc:identifier>10.5281/zenodo.4323502</dc:identifier>
  <dc:identifier>oai:zenodo.org:4323502</dc:identifier>
  <dc:relation>url:https://github.com/ekzhu/datasketch/tree/1.5.2</dc:relation>
  <dc:relation>doi:10.5281/zenodo.598238</dc:relation>
  <dc:rights>info:eu-repo/semantics/openAccess</dc:rights>
  <dc:title>ekzhu/datasketch: Improved performance for MinHash and MinHashLSH</dc:title>
  <dc:type>info:eu-repo/semantics/other</dc:type>
  <dc:type>software</dc:type>
</oai_dc:dc>
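The release notes describe update_batch only at the API level. The following is a minimal pure-Python sketch, not datasketch's code. The class TinyMinHash, the helper sha1_hash32 and the (a*x + b) mod p permutation scheme are assumptions chosen for illustration. The sketch shows why a batch update can replace a loop of single updates: each slot keeps a minimum, and the minimum over a whole batch equals the result of feeding the items one at a time. The signature is therefore identical either way. In pure Python this gives no speedup by itself. The gain reported for datasketch presumably comes from doing the batch work as vectorized array operations.

```python
import hashlib
import random
import struct

# Simulated permutations (an assumption): h(x) = (a*x + b) mod p, truncated to 32 bits.
MERSENNE_PRIME = (1 << 61) - 1
MAX_HASH = (1 << 32) - 1


def sha1_hash32(data: bytes) -> int:
    """Hypothetical 32-bit base hash: first 4 bytes of SHA-1, little-endian."""
    return struct.unpack("<I", hashlib.sha1(data).digest()[:4])[0]


class TinyMinHash:
    """Hypothetical pure-Python MinHash, for illustration only."""

    def __init__(self, num_perm: int = 64, seed: int = 1):
        rng = random.Random(seed)
        # One (a, b) pair per simulated permutation; same seed -> same permutations.
        self.params = [
            (rng.randint(1, MERSENNE_PRIME - 1), rng.randint(0, MERSENNE_PRIME - 1))
            for _ in range(num_perm)
        ]
        self.hashvalues = [MAX_HASH] * num_perm

    @staticmethod
    def _permute(hv: int, a: int, b: int) -> int:
        return ((a * hv + b) % MERSENNE_PRIME) & MAX_HASH

    def update(self, item: bytes) -> None:
        # One item at a time: lower each slot if the permuted value is smaller.
        hv = sha1_hash32(item)
        for i, (a, b) in enumerate(self.params):
            self.hashvalues[i] = min(self.hashvalues[i], self._permute(hv, a, b))

    def update_batch(self, items) -> None:
        # Whole batch: hash every item once, then take one minimum per permutation.
        hvs = [sha1_hash32(item) for item in items]
        if not hvs:
            return
        for i, (a, b) in enumerate(self.params):
            batch_min = min(self._permute(hv, a, b) for hv in hvs)
            self.hashvalues[i] = min(self.hashvalues[i], batch_min)

    def jaccard(self, other: "TinyMinHash") -> float:
        # Estimated Jaccard similarity = fraction of matching slots.
        same = sum(x == y for x, y in zip(self.hashvalues, other.hashvalues))
        return same / len(self.hashvalues)


words = [w.encode("utf8") for w in "the quick brown fox jumps over the lazy dog".split()]

one_by_one = TinyMinHash()
for w in words:
    one_by_one.update(w)

batched = TinyMinHash()
batched.update_batch(words)

print(one_by_one.hashvalues == batched.hashvalues)  # True
print(batched.jaccard(one_by_one))  # 1.0
```

Both objects use the same seed, so they share permutations. Because min is order-independent, the batched and incremental signatures match exactly.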
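The index-compression item can be pictured with a minimal sketch under stated assumptions. This is not the MinHashLSH._H implementation. The helpers band_keys and compress_key are hypothetical, and the 8-byte packing and 4-byte truncated SHA-1 digest are arbitrary choices. First, a signature is split into bands. Each band is then serialized into a bucket key. Finally, each key is replaced by a short hash of itself. Identical bands still land in the same bucket, while every stored key shrinks.

```python
import hashlib
import random
import struct


def band_keys(hashvalues, num_bands, rows):
    """Hypothetical helper: serialize each band of a signature into a raw bytes key."""
    keys = []
    for band in range(num_bands):
        chunk = hashvalues[band * rows:(band + 1) * rows]
        # 8 bytes per value, so a raw key costs 8 * rows bytes.
        keys.append(struct.pack(f"<{rows}Q", *chunk))
    return keys


def compress_key(key: bytes, nbytes: int = 4) -> bytes:
    """Hypothetical compression: keep only the first nbytes of the key's SHA-1 digest."""
    return hashlib.sha1(key).digest()[:nbytes]


rng = random.Random(0)
sig_a = [rng.getrandbits(32) for _ in range(128)]
# sig_b shares only its first band (values 0..7) with sig_a.
sig_b = sig_a[:8] + [rng.getrandbits(32) for _ in range(120)]

raw_a = band_keys(sig_a, num_bands=16, rows=8)
raw_b = band_keys(sig_b, num_bands=16, rows=8)
small_a = [compress_key(k) for k in raw_a]
small_b = [compress_key(k) for k in raw_b]

print(len(raw_a[0]), len(small_a[0]))  # 64 4
print(small_a[0] == small_b[0])  # True: identical bands still share a bucket
```

The trade-off is that distinct bands can now collide in the smaller key space, which may add false-positive candidates. In datasketch, the hash function is supplied as an option to MinHashLSH; check the API doc for the exact argument.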
                  All versions   This version
Views             1,453          31
Downloads         228            1
Data volume       276.3 MB       795.7 kB
Unique views      1,261          30
Unique downloads  111            1