Conference paper Open Access

Improving Named Entity Linking Corpora Quality

Weichselbraun, Albert; Brasoveanu, Adrian; Kuntschik, Philipp; Nixon, Lyndon

MARC21 XML Export

<?xml version='1.0' encoding='UTF-8'?>
<record xmlns="http://www.loc.gov/MARC21/slim">
  <controlfield tag="005">20200120165109.0</controlfield>
  <controlfield tag="001">3404911</controlfield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">MODUL Technology</subfield>
    <subfield code="a">Brasoveanu, Adrian</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">HTW Chur</subfield>
    <subfield code="a">Kuntschik, Philipp</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">MODUL Technology</subfield>
    <subfield code="0">(orcid)0000-0001-7091-4543</subfield>
    <subfield code="a">Nixon, Lyndon</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="s">617043</subfield>
    <subfield code="z">md5:22c72ed321971362f8c971a285c511c9</subfield>
    <subfield code="u">https://zenodo.org/record/3404911/files/ranlp2019_poster (2).pdf</subfield>
  </datafield>
  <datafield tag="542" ind1=" " ind2=" ">
    <subfield code="l">open</subfield>
  </datafield>
  <datafield tag="260" ind1=" " ind2=" ">
    <subfield code="c">2019-09-11</subfield>
  </datafield>
  <datafield tag="909" ind1="C" ind2="O">
    <subfield code="p">openaire</subfield>
    <subfield code="p">user-retv-h2020</subfield>
    <subfield code="o">oai:zenodo.org:3404911</subfield>
  </datafield>
  <datafield tag="100" ind1=" " ind2=" ">
    <subfield code="u">HTW Chur</subfield>
    <subfield code="a">Weichselbraun, Albert</subfield>
  </datafield>
  <datafield tag="245" ind1=" " ind2=" ">
    <subfield code="a">Improving Named Entity Linking Corpora Quality</subfield>
  </datafield>
  <datafield tag="980" ind1=" " ind2=" ">
    <subfield code="a">user-retv-h2020</subfield>
  </datafield>
  <datafield tag="536" ind1=" " ind2=" ">
    <subfield code="c">780656</subfield>
    <subfield code="a">Enhancing and Re-Purposing TV Content for Trans-Vector Engagement</subfield>
  </datafield>
  <datafield tag="540" ind1=" " ind2=" ">
    <subfield code="u">https://creativecommons.org/licenses/by/4.0/legalcode</subfield>
    <subfield code="a">Creative Commons Attribution 4.0 International</subfield>
  </datafield>
  <datafield tag="650" ind1="1" ind2="7">
    <subfield code="a">cc-by</subfield>
    <subfield code="2">opendefinition.org</subfield>
  </datafield>
  <datafield tag="520" ind1=" " ind2=" ">
    <subfield code="a">&lt;p&gt;Gold standard corpora and competitive evaluations play a key role in benchmarking named entity linking (NEL) performance and driving the development of more sophisticated NEL systems.&lt;br&gt;
The quality of the used corpora and the used evaluation metrics are crucial in this process. We, therefore, assess the quality of three popular evaluation corpora, identifying four major issues which affect these gold standards: (i) the use of different annotation styles, (ii) incorrect and missing annotations, (iii) Knowledge Base evolution, (iv) and differences in annotating co-occurrences.&lt;br&gt;
This paper addresses these issues by formalizing NEL annotations and corpus versioning which allows standardizing corpus creation, supports corpus evolution, and paves the way for the use of lenses to automatically transform between different corpus configurations. In addition, the use of clearly defined scoring rules and evaluation metrics ensures a better comparability of evaluation results.&lt;/p&gt;</subfield>
  </datafield>
  <datafield tag="773" ind1=" " ind2=" ">
    <subfield code="n">url</subfield>
    <subfield code="i">references</subfield>
    <subfield code="a">https://github.com/orbis-eval/corpus_quality_paper</subfield>
  </datafield>
  <datafield tag="773" ind1=" " ind2=" ">
    <subfield code="n">doi</subfield>
    <subfield code="i">isVersionOf</subfield>
    <subfield code="a">10.5281/zenodo.3404910</subfield>
  </datafield>
  <datafield tag="024" ind1=" " ind2=" ">
    <subfield code="a">10.5281/zenodo.3404911</subfield>
    <subfield code="2">doi</subfield>
  </datafield>
  <datafield tag="980" ind1=" " ind2=" ">
    <subfield code="a">publication</subfield>
    <subfield code="b">conferencepaper</subfield>
  </datafield>
</record>