10.5281/zenodo.3404911
https://zenodo.org/records/3404911
oai:zenodo.org:3404911
Weichselbraun, Albert
Albert
Weichselbraun
HTW Chur
Brasoveanu, Adrian
Adrian
Brasoveanu
MODUL Technology
Kuntschik, Philipp
Philipp
Kuntschik
HTW Chur
Nixon, Lyndon
Lyndon
Nixon
0000-0001-7091-4543
MODUL Technology
Improving Named Entity Linking Corpora Quality
Zenodo
2019
2019-09-11
https://github.com/orbis-eval/corpus_quality_paper
10.5281/zenodo.3404910
https://zenodo.org/communities/retv-h2020
https://zenodo.org/communities/eu
Creative Commons Attribution 4.0 International
Gold standard corpora and competitive evaluations play a key role in benchmarking named entity linking (NEL) performance and driving the development of more sophisticated NEL systems.
The quality of the corpora and evaluation metrics used is crucial in this process. We therefore assess the quality of three popular evaluation corpora and identify four major issues that affect these gold standards: (i) the use of different annotation styles, (ii) incorrect and missing annotations, (iii) Knowledge Base evolution, and (iv) differences in annotating co-occurrences.
This paper addresses these issues by formalizing NEL annotations and corpus versioning, which allows standardizing corpus creation, supports corpus evolution, and paves the way for the use of lenses to automatically transform between different corpus configurations. In addition, the use of clearly defined scoring rules and evaluation metrics ensures better comparability of evaluation results.
European Commission
10.13039/501100000780
780656
Enhancing and Re-Purposing TV Content for Trans-Vector Engagement