There is a newer version of this record available.

Software Open Access

ekzhu/datasketch: hashfunc to replace hashobj

Eric Zhu; Vadim Markovtsev; aastafiev; ae-foster; fpug; Wojciech Łukasiewicz; Titusz; Spandan Thakur; Kevin Mann

MARC21 XML Export

<?xml version='1.0' encoding='UTF-8'?>
<record xmlns="">
  <controlfield tag="005">20201215205807.0</controlfield>
  <controlfield tag="001">2532820</controlfield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">@src-d</subfield>
    <subfield code="a">Vadim Markovtsev</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="a">aastafiev</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="a">ae-foster</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="a">fpug</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">Free University of Berlin</subfield>
    <subfield code="a">Wojciech Łukasiewicz</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="a">Titusz</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">Adobe</subfield>
    <subfield code="a">Spandan Thakur</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">Six Five Design</subfield>
    <subfield code="a">Kevin Mann</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="s">2542995</subfield>
    <subfield code="z">md5:4c4b76e205742a3590df4291d6bc520d</subfield>
    <subfield code="u"></subfield>
  </datafield>
  <datafield tag="542" ind1=" " ind2=" ">
    <subfield code="l">open</subfield>
  </datafield>
  <datafield tag="260" ind1=" " ind2=" ">
    <subfield code="c">2019-01-06</subfield>
  </datafield>
  <datafield tag="909" ind1="C" ind2="O">
    <subfield code="p">software</subfield>
    <subfield code="o"></subfield>
  </datafield>
  <datafield tag="100" ind1=" " ind2=" ">
    <subfield code="u">University of Toronto</subfield>
    <subfield code="a">Eric Zhu</subfield>
  </datafield>
  <datafield tag="245" ind1=" " ind2=" ">
    <subfield code="a">ekzhu/datasketch: hashfunc to replace hashobj</subfield>
  </datafield>
  <datafield tag="540" ind1=" " ind2=" ">
    <subfield code="a">Other (Open)</subfield>
  </datafield>
  <datafield tag="650" ind1="1" ind2="7">
    <subfield code="a">cc-by</subfield>
    <subfield code="2"></subfield>
  </datafield>
  <datafield tag="520" ind1=" " ind2=" ">
    <subfield code="a">&lt;p&gt;MinHash and HyperLogLog now support the &lt;code&gt;hashfunc&lt;/code&gt; parameter. The old &lt;code&gt;hashobj&lt;/code&gt; parameter has been removed.&lt;/p&gt;
&lt;pre&gt;&lt;code class="lang-python"&gt;# Let&amp;#39;s use MurmurHash3.
import mmh3
from datasketch import MinHash

# We need to define a new hash function that outputs an integer that
# can be encoded in 32 bits.
def _hash_func(d):
    # mmh3.hash returns a 32-bit integer; signed=False keeps it non-negative.
    return mmh3.hash(d, signed=False)

# Use this function in the MinHash constructor.
m = MinHash(hashfunc=_hash_func)
&lt;/code&gt;&lt;/pre&gt;</subfield>
  </datafield>
  <datafield tag="773" ind1=" " ind2=" ">
    <subfield code="n">url</subfield>
    <subfield code="i">isSupplementTo</subfield>
    <subfield code="a"></subfield>
  </datafield>
  <datafield tag="773" ind1=" " ind2=" ">
    <subfield code="n">doi</subfield>
    <subfield code="i">isVersionOf</subfield>
    <subfield code="a">10.5281/zenodo.598238</subfield>
  </datafield>
  <datafield tag="024" ind1=" " ind2=" ">
    <subfield code="a">10.5281/zenodo.2532820</subfield>
    <subfield code="2">doi</subfield>
  </datafield>
  <datafield tag="980" ind1=" " ind2=" ">
    <subfield code="a">software</subfield>
  </datafield>
</record>
                  All versions   This version
Views             1,417          72
Downloads         226            14
Data volume       273.1 MB       35.6 MB
Unique views      1,227          58
Unique downloads  110            8

