2024-03-29T10:24:28Z
https://zenodo.org/oai2d
oai:zenodo.org:4073462
2020-10-09T13:13:36Z
openaire
user-scriptnet
Romein, Christel Annemieke
2020-10-09
<p>This is webinar #3 on Transkribus</p>
<ul>
<li>PyLaia and how to use it. (This might get a bit boring, but I will explain the parameters so that you have this information while official documentation is still missing.)</li>
<li>Tables and Layout analysis (a quick tweak to make life easier)</li>
<li>How to remove tiny regions (noise).</li>
<li>How to create a sample set to function as a sturdy basis for your models.</li>
<li>And a quick update on one new feature in P2PaLA and Read and Search.</li>
</ul>
https://doi.org/10.5281/zenodo.4073462
oai:zenodo.org:4073462
eng
Zenodo
https://zenodo.org/communities/scriptnet
https://doi.org/10.5281/zenodo.4073461
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
digital scholarship
Digital Humanities
Computational linguistics
Handwriting Text Recognition
Transkribus
Webinar
Transkribus webinar #3 update & pylaia (powerpoint)
info:eu-repo/semantics/lecture
oai:zenodo.org:4626501
2021-03-22T12:27:20Z
openaire
user-scriptnet
Stefano Bazzaco
2021-03-22
<p>Recording and PowerPoint presentation of the second day of the seminar "DE LAS OBRAS EN PAPEL A LAS EDICIONES ACADÉMICAS DIGITALES. RECURSOS Y NUEVAS COMPETENCIAS - Jornada 2: Introducción a la Edición Académica Digital" (From paper works to digital scholarly editions: resources and new skills. Day 2: Introduction to digital scholarly editing). The seminar was organized by the following research groups: Progetto Mambrino (University of Verona), BIDISO (University of A Coruña) and Comedic (University of Zaragoza), and sponsored by the Department of Foreign Languages and Literatures of Verona and the Excellence Project of the University of Verona.</p>
https://doi.org/10.5281/zenodo.4626501
oai:zenodo.org:4626501
spa
Zenodo
https://zenodo.org/communities/scriptnet
https://doi.org/10.5281/zenodo.4626500
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
DE LAS OBRAS EN PAPEL A LAS EDICIONES ACADÉMICAS DIGITALES. RECURSOS Y NUEVAS COMPETENCIAS, University of Verona, 26/02/2021
Progetto Mambrino (University of Verona)
Comedic (University of Zaragoza)
BIDISO (University of A Coruña)
Modern Age literature
Spanish Golden Age literature
XML TEI
Digital Scholarly editing
Digital text theory
Digital Editions
Digital Philology
DE LAS OBRAS EN PAPEL A LAS EDICIONES ACADÉMICAS DIGITALES. RECURSOS Y NUEVAS COMPETENCIAS - Jornada 2: Introducción a la Edición Académica Digital
info:eu-repo/semantics/lecture
oai:zenodo.org:3760331
2020-04-23T18:02:52Z
user-scriptnet
Dr. C.A. Romein
J. Walcher
2020-04-21
<p>Creating transcriptions can be a tedious job. The computer tool Transkribus can ease the process after some training. In this digital workshop, the basics of the program will be demonstrated, and some practical examples will be given. During this workshop, several features will be discussed: how to get started, what all the buttons are for, how to create TextRegions and Baselines, and what is necessary to create a model. This workshop is organised through a Zoom meeting; additional questions can be asked through the chat function. The workshop will be given in English by Annemieke Romein, a postdoctoral researcher at HuygensING and an experienced Transkribus user.</p>
<p>Date: April 21st, 2020; 1-4pm CEST. Location: online.</p>
<p>Location online: https://youtu.be/5YCfaFNMol4 </p>
<p>Slides: https://doi.org/10.5281/zenodo.3759787</p>
<p>Intro to the webinar 00:00-05:52</p>
<p>Theory OCR and HTR 05:53-28:33</p>
<p>How to get started: registration and installation. 28:34-33:42</p>
<p>How to upload files? 33:43-41:06</p>
<p>How does the desktop-version work? 41:07-46:04</p>
<p>What are all the buttons in Transkribus? 46:05-52:28</p>
<p>How to perform a Layout Analysis (manually and automatically) 52:29-1:11:04</p>
<p>How and where to transcribe? 1:11:05-1:16:18</p>
<p>How to collaborate within Transkribus? 1:16:19-1:24:28</p>
<p>What to do with abbreviations, Italics and conventions? 1:24:29-1:31:52</p>
<p>How to apply OCR or an HTR model? 1:31:53-1:45:43</p>
<p>How to create a model? 1:45:44-1:58:28</p>
<p>How to export from Transkribus? 1:58:29-2:01:03</p>
<p>How to do full-text searches or keyword spotting? 2:01:04-2:05:08</p>
<p>What is the READ-COOP? By Johanna Walcher 2:05:11-2:20:00</p>
<p>Questions and Answers 2:20:00-end</p>
https://doi.org/10.5281/zenodo.3760331
oai:zenodo.org:3760331
eng
Zenodo
https://zenodo.org/communities/scriptnet
https://doi.org/10.5281/zenodo.3760330
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
digital scholarship
Digital Humanities
Computational linguistics
Handwriting Text Recognition
Transkribus
Webinar
KNAW HuygensING
Digi-workshop. Transkribus: getting started. The basics (video)
info:eu-repo/semantics/other
oai:zenodo.org:4106571
2020-11-27T15:07:31Z
openaire
user-scriptnet
Huff, Dorothee
2020-10-19
<p>Presentation of how Transkribus is used in the OCR-BW project, focusing on model training and comparing two approaches. For the corpus of Edwin Hennig's diaries, a model specialised in a narrow time span was created first and then extended through further training, whereas for the collection of Martin Crusius's Greek sermon notes, material from all years was used for model training from the outset.</p>
https://doi.org/10.5281/zenodo.4106571
oai:zenodo.org:4106571
deu
Zenodo
https://zenodo.org/communities/scriptnet
https://doi.org/10.5281/zenodo.4106570
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
Dokumentenerbe digital - Digitalisierung historischer Bestände baden-württembergischer Bibliotheken. OCR-Workshop, Heidelberg, Germany, 16 October 2020
OCR-BW
OCR
HTR
Transkribus
Automatic text recognition
Model training
Texterkennung von Handschriften mit Transkribus. Modelltraining – Herangehensweisen und Problematiken
info:eu-repo/semantics/lecture
oai:zenodo.org:3726603
2020-04-02T12:24:39Z
openaire
user-scriptnet
user-eu
Tobias Hodel
2020-03-25
<p>The slides for the Transkribus best-practices paper «Erkennung alter Drucke und Handschriften» (Recognition of old prints and manuscripts) presented at DHd 2020. The accompanying paper can be found in the DHd Book of Abstracts, Schöch (ed.): https://doi.org/10.5281/zenodo.3666689, pp. 84-87.</p>
https://doi.org/10.5281/zenodo.3726603
oai:zenodo.org:3726603
deu
Zenodo
https://zenodo.org/communities/scriptnet
https://zenodo.org/communities/eu
https://doi.org/10.5281/zenodo.3726602
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
dhd2020, DHd 2020 Spielräume: Digital Humanities zwischen Modellierung und Interpretation, Paderborn, 02-06 March
handwritten text recognition
ocr
Best-practices zur Erkennung alter Drucke und Handschriften – Die Nutzung von Transkribus large- und small-scale
info:eu-repo/semantics/lecture
oai:zenodo.org:3834737
2020-05-20T06:53:30Z
user-scriptnet
Romein, Christel Annemieke
2020-05-19
<p>Creating transcriptions can be a tedious job. The computer tool Transkribus can ease the process after some training. In this digital workshop, we will build on the knowledge you gained in <a href="https://youtu.be/5YCfaFNMol4">the basic training</a>. You will now learn about judging the quality of a model, structure tags and automatic training of structure through P2PaLA, combining existing transcriptions with images (T2I), future options with tables, and NER tagging.</p>
<p>This workshop was organised through a Zoom meeting. The workshop is in English and is given by Annemieke Romein, a postdoctoral researcher at Huygens ING and an experienced Transkribus user.</p>
<p>The video can be found at: </p>
<p>Date: May 19th, 2020; 1-4pm CEST. Location: online.</p>
https://doi.org/10.5281/zenodo.3834737
oai:zenodo.org:3834737
eng
Zenodo
https://zenodo.org/communities/scriptnet
https://doi.org/10.5281/zenodo.3834736
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
digital scholarship
Digital Humanities
Computational linguistics
Handwriting Text Recognition
Transkribus
Webinar
KNAW HuygensING
Digi-workshop. Transkribus: advanced (video)
info:eu-repo/semantics/other
oai:zenodo.org:8108347
2023-07-04T12:06:09Z
openaire_data
user-scriptnet
Hillebrand Verkroost
Bart Cohen
Evelien Bachrach
Marjo Janssens
Cocky Sietses
Milan van Lange
Annelies van Nispen
Carlijn Keijzer
Muriël Bouman
2023-02-16
<p>The HTR model ‘NIOD_WarLet_1935-1950_NoBasemodel’ was trained using 968 ‘Ground Truth’ transcriptions of high-resolution scans of various handwritten letters. These letters are all written in Dutch and originate from the period 1935-1950. The training set contains personal correspondence from a wide variety of letter writers (e.g., children, soldiers, Jewish people in hiding). These personal correspondences are all part of the archival collection known as ‘247 Correspondentie’ held by the NIOD Institute for War, Holocaust, and Genocide Studies in Amsterdam.<br>
<br>
This model was created as part of the project ‘First-Hand Accounts of War: War letters (1935-1950) from NIOD digitised’. All documents used for training and validation were scanned and transcribed within this project. This project ran from 2020 to 2023 and was funded by the Mondriaan Fund, the Dutch Ministry of Health, Welfare, and Sport, and the NIOD Institute for War, Holocaust, and Genocide Studies in Amsterdam.<br>
<br>
The ‘Ground Truth’ training set was created by project members Annelies van Nispen, Carlijn Keijzer and Milan van Lange. Additional transcription and correction of ‘Ground Truth’ transcriptions was performed by citizen scientists Hillebrand Verkroost, Bart Cohen, Evelien Bachrach, Marjo Janssens, and Cocky Sietses, under the supervision of Muriël Bouman. The validation set contains a sample of 17 ‘Ground Truth’ transcriptions from various writers and sub-collections. Due to legal restrictions, only a limited sample of the training set is publicly available.<br>
<br>
The model was trained using PyLaia HTR for a maximum of 500 epochs (321 epochs were actually trained) with a learning rate of 0.0003. No base model was used. See also: https://readcoop.eu/model/niod_warlet_1945-1950_nobasemodel/</p>
https://doi.org/10.5281/zenodo.8108347
oai:zenodo.org:8108347
nld
Zenodo
https://zenodo.org/communities/scriptnet
https://doi.org/10.5281/zenodo.8108346
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
HTR
Transkribus
Handwritten Text
Egodocuments
Dutch
World War II
HTR model NIOD_WarLet_1935-1950_NoBasemodel
info:eu-repo/semantics/other
oai:zenodo.org:4888927
2023-01-07T11:26:51Z
openaire_data
user-scriptnet
Stefano Bazzaco (coord.)
Federica Zoppi
Giada Blasut
Nuria Aranda García
Ángela Torralba Ruberte
Ana-Milagros Jiménez Ruiz
Pedro Monteiro
2021-06-01
<pre>The SpanishGothic_XV-XVI_extended DATASET is intended to be uploaded to the Transkribus
platform (READ Coop) to train an HTR+ model for the automatic
recognition of Spanish printed documents in Gothic script published in the 15th-16th centuries.
For further information, please see the README file here:
https://github.com/stefanobazzaco/HTR-model-SpanishGothic_XV-XVI_extended
For system requirements and information about Transkribus platform, go to:
https://readcoop.eu/transkribus/
https://readcoop.eu/transkribus/resources/
https://github.com/Transkribus</pre>
https://doi.org/10.5281/zenodo.4888927
oai:zenodo.org:4888927
spa
Zenodo
https://hdl.handle.net/ISSN 2254-7290/https://www.janusdigital.es/articulo.htm?id=160
https://github.com/stefanobazzaco/HTR-model-SpanishGothic_XV-XVI_extended-DATASET
https://zenodo.org/communities/scriptnet
https://doi.org/10.5281/zenodo.4888926
info:eu-repo/semantics/restrictedAccess
HTR
Transkribus
Spanish Gothic
Spanish printed documents
XV-XVI Century documents
Progetto Mambrino
BIDISO
COMEDIC
HTR model SpanishGothic_XV-XVI_extended DATASET
info:eu-repo/semantics/other
oai:zenodo.org:1208366
2020-01-24T19:26:01Z
openaire_data
user-scriptnet
user-cvl
user-eu
Diem, Markus
Kleber, Florian
Fiel, Stefan
Grüning, Tobias
Gatos, Basilis
2017-01-23
<p>This dataset contains the training and test set for the ICDAR 2017 Competition on Baseline Detection in Archival Documents (cBAD).</p>
<p>The basis of cBAD is a newly created, freely available real-world dataset of 2035 annotated document page images collected from 9 different archives. Two competition tracks test different characteristics of the submitted methods. Track A [Simple Documents] is published with annotated text regions and therefore tests the quality of a method's text line segmentation. The more challenging Track B [Complex Documents] provides only the page area; hence, baseline detection algorithms need to correctly locate text lines in the presence of marginalia, tables, and noise.</p>
<p>The dataset comprises images with additional PAGE XMLs. The PAGE XMLs contain text regions and baseline annotations.</p>
<p>Competition Website: https://scriptnet.iit.demokritos.gr/competitions/5/</p>
<p>Version 4 also contains the page region and, in the case of a double page, the page split as a separator.</p>
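Since the cBAD ground truth ships as PAGE XML with text regions and baselines, the following minimal Python sketch (standard library only) shows one way to collect the baseline polylines per text region. It assumes the usual PAGE layout, in which `TextLine` elements are direct children of `TextRegion` and each carries a `Baseline` child whose `points` attribute is a space-separated list of `x,y` pairs. It matches on local tag names so that the PAGE namespace version does not matter. The function names and the sample page are hypothetical illustrations, not part of the dataset's tooling.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Drop the '{namespace}' prefix so any PAGE schema version matches."""
    return tag.rsplit("}", 1)[-1]


def parse_points(points: str) -> list[tuple[int, int]]:
    """Turn a PAGE 'points' string such as '10,50 900,55' into (x, y) tuples."""
    return [(int(x), int(y)) for x, y in (p.split(",") for p in points.split())]


def extract_baselines(page_xml: str) -> dict[str, list[list[tuple[int, int]]]]:
    """Map each TextRegion id to the baseline polylines of its TextLines."""
    root = ET.fromstring(page_xml)
    regions: dict[str, list[list[tuple[int, int]]]] = {}
    for region in root.iter():
        if local_name(region.tag) != "TextRegion":
            continue
        baselines = []
        for line in region:  # assumption: TextLines are direct children of their TextRegion
            if local_name(line.tag) != "TextLine":
                continue
            for child in line:
                if local_name(child.tag) == "Baseline":
                    baselines.append(parse_points(child.get("points", "")))
        regions[region.get("id")] = baselines
    return regions


# Hypothetical one-line page, for illustration only.
SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15">
  <Page imageFilename="page.jpg" imageWidth="1000" imageHeight="1500">
    <TextRegion id="r1">
      <Coords points="0,0 1000,0 1000,200 0,200"/>
      <TextLine id="l1">
        <Coords points="10,10 900,10 900,60 10,60"/>
        <Baseline points="10,50 900,55"/>
      </TextLine>
    </TextRegion>
  </Page>
</PcGts>"""

if __name__ == "__main__":
    print(extract_baselines(SAMPLE))  # {'r1': [[(10, 50), (900, 55)]]}
```

For Track B pages, which provide only the page area, the same walk simply yields no regions until a detector has produced them.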
This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 674943
https://doi.org/10.5281/zenodo.1208366
oai:zenodo.org:1208366
Zenodo
https://zenodo.org/communities/cvl
https://zenodo.org/communities/scriptnet
https://zenodo.org/communities/eu
https://doi.org/10.5281/zenodo.746925
info:eu-repo/semantics/openAccess
Creative Commons Attribution Share Alike 4.0 International
https://creativecommons.org/licenses/by-sa/4.0/legalcode
Baseline Detection
Text Line Segmentation
Historical Documents
ICDAR 2017 Competition
cBAD
ScriptNet: ICDAR 2017 Competition on Baseline Detection in Archival Documents (cBAD)
info:eu-repo/semantics/other
oai:zenodo.org:835441
2020-01-24T19:25:38Z
openaire_data
user-scriptnet
user-cvl
user-eu
Diem, Markus
Kleber, Florian
Fiel, Stefan
Grüning, Tobias
Gatos, Basilis
2017-01-23
<p>This dataset contains the training and test set for the ICDAR 2017 Competition on Baseline Detection in Archival Documents (cBAD).</p>
<p>The basis of cBAD is a newly created, freely available real-world dataset of 2035 annotated document page images collected from 9 different archives. Two competition tracks test different characteristics of the submitted methods. Track A [Simple Documents] is published with annotated text regions and therefore tests the quality of a method's text line segmentation. The more challenging Track B [Complex Documents] provides only the page area; hence, baseline detection algorithms need to correctly locate text lines in the presence of marginalia, tables, and noise.</p>
<p>The dataset comprises images with additional PAGE XMLs. The PAGE XMLs contain text regions and baseline annotations.</p>
<p>Competition Website: https://scriptnet.iit.demokritos.gr/competitions/5/</p>
This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 674943
https://doi.org/10.5281/zenodo.835441
oai:zenodo.org:835441
Zenodo
https://zenodo.org/communities/cvl
https://zenodo.org/communities/scriptnet
https://zenodo.org/communities/eu
https://doi.org/10.5281/zenodo.746925
info:eu-repo/semantics/openAccess
Creative Commons Attribution Share Alike 4.0 International
https://creativecommons.org/licenses/by-sa/4.0/legalcode
Baseline Detection
Text Line Segmentation
Historical Documents
ICDAR 2017 Competition
cBAD
ScriptNet: ICDAR 2017 Competition on Baseline Detection in Archival Documents (cBAD)
info:eu-repo/semantics/other
oai:zenodo.org:1164028
2020-01-24T19:25:13Z
openaire_data
user-scriptnet
user-eu
Toselli, A.H.
Romero, V.
Villegas, M.
Vidal, E.
Sánchez, J.A.
2018-02-01
<p>Test set corresponding to the HTR competition held at ICFHR 2016</p>
https://doi.org/10.5281/zenodo.1164028
oai:zenodo.org:1164028
gml
Zenodo
https://zenodo.org/communities/scriptnet
https://zenodo.org/communities/eu
https://doi.org/10.5281/zenodo.1164027
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
Handwriting Text Recognition
HTR Test ICFHR 2016
info:eu-repo/semantics/other
oai:zenodo.org:3258194
2020-01-24T19:25:18Z
openaire_data
user-scriptnet
user-cvl
user-eu
Diem Markus
Kleber Florian
Gatos Basilis
2019-02-17
<p>This dataset contains the training, evaluation, and test set for the ICDAR 2019 Competition on Baseline Detection (cBAD).</p>
<p>The basis of cBAD is a newly created, freely available real-world dataset of 3021 annotated document page images collected from seven European archives. The baselines in all images were manually annotated. The training and evaluation sets contain PAGE XMLs with annotated text regions and baselines.</p>
<p>Competition Website: https://scriptnet.iit.demokritos.gr/competitions/11/</p>
https://doi.org/10.5281/zenodo.3258194
oai:zenodo.org:3258194
eng
Zenodo
https://zenodo.org/communities/cvl
https://zenodo.org/communities/scriptnet
https://zenodo.org/communities/eu
https://doi.org/10.5281/zenodo.2567397
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
Baseline Detection
cBAD
ICDAR2019 Competition
Historical documents
ICDAR 2019 Competition on Baseline Detection (cBAD)
info:eu-repo/semantics/other
oai:zenodo.org:1297399
2020-01-24T19:25:56Z
openaire_data
user-scriptnet
user-eu
Quirós, Lorenzo
Toselli, A.H.
Romero, V.
Villegas, M.
Vidal, E.
Sánchez, J.A.
2018-02-01
<p>This dataset arises from the READ project (Horizon 2020).</p>
<p>The dataset consists of a subset of documents from the Ratsprotokolle collection, which is composed of minutes of council meetings held from 1470 to 1805 (about 30,000 pages) and will be used in the READ project. This dataset is written in Early Modern German. The number of writers is unknown. Handwriting in this collection is complex enough to challenge the HTR software.</p>
<p>The training dataset is composed of 400 pages; most of the pages consist of a single text block that poses many difficulties for line detection and extraction. The ground truth in this set is in PAGE format and is annotated at line level in the PAGE files.</p>
<p>The previous version of this dataset is the one located at https://zenodo.org/record/218236#.WnLhaCHhBGF</p>
<p>The new file includes the test set corresponding to the HTR competition held at ICFHR 2016</p>
Main updates in Version 1.2.0 (Author: Lorenzo Quirós)
1) TextRegions have been labeled into four different structural types
(page-number, marginalia, paragraph and heading).
2) The surrounding polygons of some TextRegions have been modified to avoid
overlaps between regions, as well as oversized and undersized regions.
3) Spurious regions have been deleted.
https://doi.org/10.5281/zenodo.1297399
oai:zenodo.org:1297399
gml
Zenodo
https://zenodo.org/communities/scriptnet
https://zenodo.org/communities/eu
https://doi.org/10.5281/zenodo.1164027
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
Handwriting Text Recognition
HTR Dataset ICFHR 2016
info:eu-repo/semantics/other
oai:zenodo.org:3568023
2020-01-24T19:25:11Z
openaire_data
user-scriptnet
user-cvl
user-eu
Diem Markus
Kleber Florian
Gatos Basilis
2019-02-17
<p>This dataset contains the training, evaluation, and test set for the ICDAR 2019 Competition on Baseline Detection (cBAD).</p>
<p>The basis of cBAD is a newly created, freely available real-world dataset of 3021 annotated document page images collected from seven European archives. The baselines in all images were manually annotated. The training and evaluation sets contain PAGE XMLs with annotated text regions and baselines.</p>
<p>Competition Website: https://scriptnet.iit.demokritos.gr/competitions/11/</p>
https://doi.org/10.5281/zenodo.3568023
oai:zenodo.org:3568023
eng
Zenodo
https://zenodo.org/communities/cvl
https://zenodo.org/communities/scriptnet
https://zenodo.org/communities/eu
https://doi.org/10.5281/zenodo.2567397
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
Baseline Detection
cBAD
ICDAR2019 Competition
Historical documents
ICDAR 2019 Competition on Baseline Detection (cBAD)
info:eu-repo/semantics/other
oai:zenodo.org:1226879
2020-01-24T19:24:35Z
openaire_data
user-scriptnet
user-eu
Déjean Hervé
Lang Eva
Kleber Florian
2018-04-23
<p>Datasets used in the publication: Comparing Machine Learning Approaches for Table Recognition in Historical Register Books, Hervé Déjean, Jean-Luc Meunier, Stéphane Clinchant, Eva Maria Lang and Florian Kleber, 13th IAPR International Workshop on Document Analysis Systems (DAS 2018), Vienna, Austria</p>
<p>dataset111<br>
img/ images<br>
xml/ READ pagexml with BIESO annotation</p>
<p>dataset150<br>
img/ images<br>
GT_xml: READ pagexml with BIESO annotation<br>
WK_xml: workflow version: pagexml corresponding to the workflow outputs (textlines are automatically recognised, columns as well)<br>
ROWREF: GT for the row regions</p>
<p>Tagset (attribute of the TextLine element)</p>
<p>Type: deprecated</p>
<p>DU_row:<br>
B: first element of cell<br>
I: inside a cell<br>
E: last element of a cell<br>
S: single element of a cell<br>
O: outside the table</p>
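The DU_row tags above form a BIESO sequence, so the lines belonging to each cell can be recovered by a single left-to-right scan. Below is a minimal Python sketch of that grouping. It assumes the caller has already read the tag of each TextLine (in these files it is given as the `DU_row` attribute of the TextLine element, per the tagset description) and has put the lines in reading order; both the input format and the function name `group_bieso` are hypothetical. Malformed sequences (an I or E without a preceding B) are tolerated by starting a new cell rather than raising an error.

```python
def group_bieso(lines: list[tuple[str, str]]) -> list[list[str]]:
    """Group (line_id, tag) pairs into cells using B/I/E/S/O tags.

    Assumption: `lines` is already in reading order.
    """
    cells: list[list[str]] = []
    current: list[str] = []

    def flush() -> None:
        # Close the currently open cell, if any.
        nonlocal current
        if current:
            cells.append(current)
            current = []

    for line_id, tag in lines:
        if tag == "B":      # first element of a cell: close any open cell, start a new one
            flush()
            current = [line_id]
        elif tag == "I":    # inside a cell: extend the open cell
            current.append(line_id)
        elif tag == "E":    # last element of a cell: extend, then close it
            current.append(line_id)
            flush()
        elif tag == "S":    # single-element cell
            flush()
            cells.append([line_id])
        elif tag == "O":    # outside the table: only closes an open cell
            flush()
        else:
            raise ValueError(f"unknown BIESO tag {tag!r} on line {line_id}")
    flush()
    return cells


if __name__ == "__main__":
    tagged = [("l1", "B"), ("l2", "I"), ("l3", "E"), ("l4", "S"),
              ("l5", "O"), ("l6", "B"), ("l7", "E")]
    print(group_bieso(tagged))  # [['l1', 'l2', 'l3'], ['l4'], ['l6', 'l7']]
```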
https://doi.org/10.5281/zenodo.1226879
oai:zenodo.org:1226879
Zenodo
https://zenodo.org/communities/scriptnet
https://zenodo.org/communities/eu
https://doi.org/10.5281/zenodo.1226878
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
READ Table Understanding Handwritten Texts
READ ABP Table datasets
info:eu-repo/semantics/other
oai:zenodo.org:3387369
2020-01-24T19:24:52Z
openaire_data
user-newseye
user-scriptnet
user-eu
Guenter Muehlberger
Guenter Hackl
2019-09-05
<p>The dataset comprises Austrian newspaper pages from the 19th and early 20th centuries with carefully corrected text. The page images were provided by the <a href="http://onb.ac.at/">Austrian National Library</a> and comprise 148 pages (training set) and 13 pages (validation set). The data are formatted according to the PAGE format (cf. <a href="https://github.com/PRImA-Research-Lab/PAGE-XML/">https://github.com/PRImA-Research-Lab/PAGE-XML/</a>) and were produced with the <a href="http://read.transkribus.eu/">Transkribus</a> platform with support from the <a href="http://newseye.eu/">NewsEye</a> and <a href="http://read.transkribus.eu/">READ</a> projects.</p>
https://doi.org/10.5281/zenodo.3387369
oai:zenodo.org:3387369
deu
Zenodo
https://zenodo.org/communities/newseye
https://zenodo.org/communities/scriptnet
https://zenodo.org/communities/eu
https://doi.org/10.5281/zenodo.3387368
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
OCR, Text Recognition, Transkribus
NewsEye / READ OCR training dataset from Austrian Newspapers (19th C.)
info:eu-repo/semantics/other
oai:zenodo.org:4293602
2021-03-12T10:10:14Z
openaire_data
user-newseye
user-scriptnet
user-eu
Muehlberger, Guenter
Hackl, Guenter
2020-11-27
<p>The dataset comprises French newspaper pages from the 18th, 19th and early 20th centuries with carefully corrected text. The page images were provided by the <a href="https://www.bnf.fr/en">French National Library</a> and comprise 127 pages (training set) and 8 pages (validation set). The data are formatted according to the PAGE format (cf. <a href="https://github.com/PRImA-Research-Lab/PAGE-XML/">https://github.com/PRImA-Research-Lab/PAGE-XML/</a>) and were produced with the <a href="http://read.transkribus.eu/">Transkribus</a> platform with support from the <a href="http://newseye.eu/">NewsEye</a> and <a href="http://read.transkribus.eu/">READ</a> projects.</p>
https://doi.org/10.5281/zenodo.4293602
oai:zenodo.org:4293602
fra
Zenodo
https://zenodo.org/communities/newseye
https://zenodo.org/communities/scriptnet
https://zenodo.org/communities/eu
https://doi.org/10.5281/zenodo.4293601
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
OCR, Text Recognition, Transkribus
NewsEye / READ OCR training dataset from French Newspapers (18th, 19th, early 20th C.)
info:eu-repo/semantics/other
oai:zenodo.org:2649217
2022-04-05T12:44:36Z
openaire_data
user-scriptnet
user-cvl
user-eu
user-iapr-tc11
Déjean, Hervé
Meunier, Jean-Luc
Gao, Liangcai
Huang, Yilun
Fang, Yu
Kleber, Florian
Lang, Eva-Maria
2019-04-23
<p>The aim of this competition is to evaluate the performance of state-of-the-art methods for table detection (TRACK A) and table recognition (TRACK B). For the first track, document images containing one or several tables are provided. For TRACK B, two subtracks exist: the first subtrack (B.1) provides the table region, so only the table structure has to be recognised. The second subtrack (B.2) provides no a priori information, which means that both the table region and the table structure have to be detected. The Ground Truth is provided in a format similar to that of the ICDAR 2013 competition (see [2]):</p>
<pre><?xml version="1.0" encoding="UTF-8"?>
<document filename='filename.jpg'>
  <table id='Table_1540517170416_3'>
    <Coords points="180,160 4354,160 4354,3287 180,3287"/>
    <cell id='TableCell_1540517477147_58' start-row='0' start-col='0' end-row='1' end-col='2'>
      <Coords points="180,160 177,456 614,456 615,163"/>
    </cell>
    ...
  </table>
  ...
</document></pre>
<p>The difference from Göbel et al. [2] is the Coords tag, which defines a table or cell as a polygon specified by a list of coordinates. For B.1, the table and its coordinates are given together with the input image.</p>
<p>Important Note:</p>
<p>For the modern dataset, the convex hull of the content describes a cell region. For the historical dataset, the output region of a cell is required to be the cell boundary. This is necessary because handwritten text often overlaps neighbouring cells.</p>
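To illustrate the modern-dataset convention, a cell region can be derived from the points of its content with a standard convex hull. The Python sketch below uses Andrew's monotone chain algorithm. Which content points to feed in (word boxes, line polygons, glyph outlines) is an assumption, since the description does not specify it, and the competition's own evaluation tool may compute regions differently. The function name is hypothetical.

```python
def convex_hull(points: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Andrew's monotone chain convex hull.

    Returns hull vertices counter-clockwise in a y-up coordinate system
    (on screen, with image coordinates where y points down, the same list
    runs clockwise). Collinear points on the hull boundary are dropped.
    """
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); > 0 means a left turn
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower: list[tuple[int, int]] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    upper: list[tuple[int, int]] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # The last point of each chain is the first point of the other one.
    return lower[:-1] + upper[:-1]


if __name__ == "__main__":
    # Corners of some hypothetical content boxes inside one cell.
    content = [(10, 10), (50, 10), (50, 30), (10, 30), (30, 20), (60, 40)]
    print(convex_hull(content))  # [(10, 10), (50, 10), (60, 40), (10, 30)]
```

For the historical dataset this step would not apply, since the expected output there is the drawn cell boundary rather than a hull of the text.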
<p>See also: http://sac.founderit.com/tasks.html</p>
<p>The evaluation tool is available at github: https://github.com/cndplab-founder/ctdar_measurement_tool</p>
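As a starting point for working with the ground truth format shown above, here is a minimal Python sketch (standard library only) that reads a cTDaR-style file into cells with their logical grid positions and polygons. It follows the element and attribute names of the example (`document`, `table`, `cell`, `Coords`, `start-row`, `start-col`, `end-row`, `end-col`, `points`). The `Cell` class and `parse_ctdar` function are hypothetical. For brevity, the sketch skips the table-level Coords polygon and does not interpret whether row/column end indices are inclusive. The official evaluation tool linked above remains the reference for scoring.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Cell:
    """One table cell: its logical grid position plus its polygon."""
    cell_id: str
    start_row: int
    start_col: int
    end_row: int
    end_col: int
    polygon: list[tuple[int, int]] = field(default_factory=list)


def parse_points(points: str) -> list[tuple[int, int]]:
    """Turn 'x1,y1 x2,y2 ...' into a list of (x, y) integer tuples."""
    return [(int(x), int(y)) for x, y in (p.split(",") for p in points.split())]


def parse_ctdar(xml_text: str) -> dict[str, list[Cell]]:
    """Map each table id to the cells it contains."""
    root = ET.fromstring(xml_text)  # the <document> element
    tables: dict[str, list[Cell]] = {}
    for table in root.findall("table"):
        cells = []
        for cell in table.findall("cell"):
            coords = cell.find("Coords")
            cells.append(Cell(
                cell_id=cell.get("id"),
                start_row=int(cell.get("start-row")),
                start_col=int(cell.get("start-col")),
                end_row=int(cell.get("end-row")),
                end_col=int(cell.get("end-col")),
                polygon=parse_points(coords.get("points")) if coords is not None else [],
            ))
        tables[table.get("id")] = cells
    return tables


# The example from the description, with the elided parts removed.
SAMPLE = """<document filename='filename.jpg'>
  <table id='Table_1540517170416_3'>
    <Coords points="180,160 4354,160 4354,3287 180,3287"/>
    <cell id='TableCell_1540517477147_58' start-row='0' start-col='0' end-row='1' end-col='2'>
      <Coords points="180,160 177,456 614,456 615,163"/>
    </cell>
  </table>
</document>"""

if __name__ == "__main__":
    for table_id, cells in parse_ctdar(SAMPLE).items():
        for c in cells:
            print(table_id, c.cell_id, (c.start_row, c.start_col), (c.end_row, c.end_col))
```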
http://sac.founderit.com/
https://doi.org/10.5281/zenodo.2649217
oai:zenodo.org:2649217
Zenodo
https://zenodo.org/communities/iapr-tc11
https://zenodo.org/communities/cvl
https://zenodo.org/communities/scriptnet
https://zenodo.org/communities/eu
https://doi.org/10.5281/zenodo.2649216
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
ICDAR, International Conference on Document Analysis and Recognition, Sydney, Australia, 2019
ICDAR 2019 Competition on Table Detection and Recognition (cTDaR)
info:eu-repo/semantics/other
oai:zenodo.org:4599624
2021-03-15T12:34:59Z
openaire_data
user-newseye
user-scriptnet
user-eu
Muehlberger, Guenter
Hackl, Guenter
2021-03-11
<p>The dataset comprises Swedish newspaper pages from the late 18th to the early 20th century with carefully corrected text. The page images were provided by the <a href="https://www.kansalliskirjasto.fi/en/">National Library of Finland</a> (NLF) and comprise 255 pages (training set) and 6 pages (validation set). The data are formatted according to the PAGE format (cf. <a href="https://github.com/PRImA-Research-Lab/PAGE-XML/">https://github.com/PRImA-Research-Lab/PAGE-XML/</a>) and were produced with the <a href="http://read.transkribus.eu/">Transkribus</a> platform with support from the <a href="http://newseye.eu/">NewsEye</a> and <a href="http://read.transkribus.eu/">READ</a> projects.</p>
https://doi.org/10.5281/zenodo.4599624
oai:zenodo.org:4599624
swe
Zenodo
https://zenodo.org/communities/newseye
https://zenodo.org/communities/scriptnet
https://zenodo.org/communities/eu
https://doi.org/10.5281/zenodo.4599623
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
OCR
Text Recognition
Transkribus
NewsEye / READ OCR training dataset from Swedish Newspapers (18th, 19th, early 20th C.)
info:eu-repo/semantics/other
oai:zenodo.org:3759787
2020-05-24T08:18:10Z
openaire
user-scriptnet
Dr. C.A. Romein
J. Walcher
2020-04-21
<p>Creating transcriptions can be a tedious job. The computer tool Transkribus can ease the process after some training. In this digital workshop, the basics of the program will be demonstrated, and some practical examples will be given. During this workshop, several features will be discussed: how to get started, what all the buttons are for, how to create TextRegions and Baselines, and what is necessary to create a model. This workshop is organised through a Zoom meeting; additional questions can be asked through the chat function. The workshop will be given in English by Annemieke Romein, a postdoctoral researcher at HuygensING and an experienced Transkribus user.</p>
<p>Date: April 21st, 2020; 1-4pm CEST. Location: online.</p>
<p>Location online: https://youtu.be/5YCfaFNMol4 </p>
<p>Video: https://doi.org/10.5281/zenodo.3760331</p>
https://doi.org/10.5281/zenodo.3759787
oai:zenodo.org:3759787
eng
Zenodo
https://zenodo.org/communities/scriptnet
https://doi.org/10.5281/zenodo.3759786
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
digital scholarship
Digital Humanities
Computational linguistics
Handwriting Text Recognition
Transkribus
Webinar
KNAW HuygensING
Digi-workshop. Transkribus: getting started. The basics (powerpoint)
info:eu-repo/semantics/lecture
oai:zenodo.org:3886989
2020-06-23T13:04:10Z
openaire
user-scriptnet
Romein, Christel Annemieke
Bazzaco, Stefano
Walcher, Johanna
Terbul, Tamara
2020-02-06
<p>What are the experiences of giving Transkribus training sessions?</p>
https://doi.org/10.5281/zenodo.3886989
oai:zenodo.org:3886989
eng
Zenodo
https://zenodo.org/communities/scriptnet
https://doi.org/10.5281/zenodo.3886988
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
TUC2020, Transkribus User Meeting 2020, Innsbruck, 6-7 February 2020
digital scholarship
Digital Humanities
Computational linguistics
Handwriting Text Recognition
Transkribus
Want to become a trainer for Transkribus?
info:eu-repo/semantics/lecture
oai:zenodo.org:4626480
2021-03-22T12:27:20Z
openaire
user-scriptnet
Stefano Bazzaco
2021-03-22
<p>Recording and PowerPoint presentation of the first day of the seminar "DE LAS OBRAS EN PAPEL A LAS EDICIONES ACADÉMICAS DIGITALES. RECURSOS Y NUEVAS COMPETENCIAS - Jornada 1: Introducción al reconocimiento automático de textos hispánicos con la plataforma Transkribus" (From paper works to digital scholarly editions: resources and new skills. Day 1: Introduction to the automatic recognition of Hispanic texts with the Transkribus platform) (25/02/2021). The seminar was organized by the following research groups: Progetto Mambrino (University of Verona), BIDISO (University of A Coruña) and Comedic (University of Zaragoza), and sponsored by the Department of Foreign Languages and Literatures of Verona and the Excellence Project of the University of Verona.</p>
https://doi.org/10.5281/zenodo.4626480
oai:zenodo.org:4626480
spa
Zenodo
https://zenodo.org/communities/scriptnet
https://doi.org/10.5281/zenodo.4626479
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
DE LAS OBRAS EN PAPEL A LAS EDICIONES ACADÉMICAS DIGITALES. RECURSOS Y NUEVAS COMPETENCIAS, University of Verona, 25/02/2021
Transkribus
Spanish printed books
HTR
Text Recognition
READ Coop SCE
Progetto Mambrino
Comedic
BIDISO
Modern Age gothic scripts
Modern Age round scripts
OCR
DE LAS OBRAS EN PAPEL A LAS EDICIONES ACADÉMICAS DIGITALES. RECURSOS Y NUEVAS COMPETENCIAS. Jornada 1: Introducción al reconocimiento automático de textos hispánicos con la plataforma Transkribus
info:eu-repo/semantics/lecture
oai:zenodo.org:3834205
2020-05-19T16:11:17Z
user-scriptnet
Romein, Christel Annemieke
2020-05-19
<p>Creating transcriptions can be a tedious job. The computer tool Transkribus can ease the process after some training. In this digital workshop, we will build on the knowledge you gained in <a href="https://youtu.be/5YCfaFNMol4">the basic training</a>. You will now learn about judging the quality of a model, structure tags and the automatic training of structure through P2PaLA, combining existing transcriptions with images (T2I), future options with tables, and NER tagging.</p>
<p>This workshop was organised through a Zoom meeting. The workshop is in English and is given by Annemieke Romein, a postdoctoral researcher at Huygens ING and an experienced Transkribus user.</p>
<p>The video can be found at: </p>
<p>Date: May 19th, 2020; 1-4pm CEST. Location: online.</p>
https://doi.org/10.5281/zenodo.3834205
oai:zenodo.org:3834205
eng
Zenodo
https://zenodo.org/communities/scriptnet
https://doi.org/10.5281/zenodo.3834204
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
digital scholarship
Digital Humanities
Computational linguistics
Handwriting Text Recognition
Transkribus
Webinar
KNAW HuygensING
Digi-workshop. Transkribus: advanced (powerpoint)
info:eu-repo/semantics/lecture
oai:zenodo.org:218236
2020-01-24T19:25:13Z
openaire_data
user-scriptnet
Sánchez, Joan Andreu
Romero, Verónica
Toselli, Alejandro H.
Vidal, Enrique
2016-12-22
<p>This dataset arises from the READ project (Horizon 2020).</p>
<p>The dataset consists of a subset of documents from the Ratsprotokolle collection, composed of minutes of the council meetings held from 1470 to 1805 (about 30,000 pages), which will be used in the READ project. This dataset is written in Early Modern German. The number of writers is unknown. Handwriting in this collection is complex enough to challenge the HTR software.</p>
<p>The training dataset is composed of 400 pages; most of the pages consist of a single text block that poses many difficulties for line detection and extraction. The ground truth in this set is provided in PAGE format, annotated at line level.</p>
https://doi.org/10.5281/zenodo.218236
oai:zenodo.org:218236
Zenodo
https://zenodo.org/communities/scriptnet
https://doi.org/
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
ICFHR2016 Competition on Handwritten Text Recognition on the READ Dataset
READ dataset Bozen
info:eu-repo/semantics/other
oai:zenodo.org:7512295
2023-03-04T07:25:01Z
openaire_data
user-scriptnet
Stefano Bazzaco (coord.)
Gaetano Lalomia
Daniela Santonocito
Manuel Garrobo Peral
Mónica Martín Molares
Carlota Cristina Fernández Travieso
Giulia Tomasi
Alessia Fichera
Soledad Castaño Santos
Almudena Izquierdo Andreu
2023-01-07
<pre>The SpanishRedonda_XVI-XVII_extended DATASET is designed to be uploaded to the Transkribus platform
(READ Coop) to train an HTR+ model for the automated recognition of Spanish
printed documents in Round script published in the 16th and 17th centuries.
For further information, please see the README file here:
https://github.com/stefanobazzaco/HTR-model-SpanishRedonda_XVI-XVII_extended
For system requirements and information about Transkribus platform, go to:
https://readcoop.eu/transkribus/
https://readcoop.eu/transkribus/resources/
https://github.com/Transkribus</pre>
https://doi.org/10.5281/zenodo.7512295
oai:zenodo.org:7512295
spa
Zenodo
https://github.com/stefanobazzaco/HTR-model-SpanishRedonda_XVI-XVII_extended-DATASET
https://zenodo.org/communities/scriptnet
https://doi.org/10.5281/zenodo.4889217
info:eu-repo/semantics/restrictedAccess
HTR
Transkribus
Spanish Redonda
Spanish printed documents
XVI-XVII Century documents
Progetto Mambrino
BIDISO
COMEDIC
HTR model SpanishRedonda_XVI-XVII_extended DATASET (v1.2)
info:eu-repo/semantics/other
oai:zenodo.org:854353
2020-01-24T19:26:19Z
openaire_data
user-scriptnet
user-eu
Fiel, Stefan
Kleber, Florian
Diem, Markus
Christlein, Vincent
Louloudis, Georgios
Stamatopoulos, Nikos
Gatos, Basilis
2017-08-30
<p>This dataset contains the test set for the ICDAR2017 Competition on Historical Document Writer Identification (Historical-WI).</p>
<p>The dataset used in this competition consists of 3600 handwritten pages originating from the 13th to the 20th century. It contains manuscripts from 720 different writers, each of whom contributed five pages.</p>
<p>Competition Website: https://scriptnet.iit.demokritos.gr/competitions/6/</p>
https://doi.org/10.5281/zenodo.854353
oai:zenodo.org:854353
Zenodo
https://zenodo.org/communities/scriptnet
https://zenodo.org/communities/eu
https://doi.org/10.5281/zenodo.854352
info:eu-repo/semantics/openAccess
Creative Commons Attribution Share Alike 4.0 International
https://creativecommons.org/licenses/by-sa/4.0/legalcode
Writer Identification
ScriptNet: ICDAR2017 Competition on Historical Document Writer Identification (Historical-WI)
info:eu-repo/semantics/other
oai:zenodo.org:1324999
2020-01-24T19:26:11Z
openaire_data
user-scriptnet
user-eu
Fiel, Stefan
Kleber, Florian
Diem, Markus
Christlein, Vincent
Louloudis, Georgios
Stamatopoulos, Nikos
Gatos, Basilis
2017-08-30
<p>This dataset contains the test set for the ICDAR2017 Competition on Historical Document Writer Identification (Historical-WI).</p>
<p>The dataset used in this competition consists of 3600 handwritten pages originating from the 13th to the 20th century. It contains manuscripts from 720 different writers, each of whom contributed five pages.</p>
<p>Competition Website: https://scriptnet.iit.demokritos.gr/competitions/6/</p>
<p>Changes, August 1st, 2018: uploaded the training set in color and binarized versions.</p>
<p>If you use the dataset, please cite the ICDAR2017 Competition on Historical Document Writer Identification (Historical-WI) paper: https://doi.org/10.1109/ICDAR.2017.225</p>
https://doi.org/10.5281/zenodo.1324999
oai:zenodo.org:1324999
Zenodo
https://zenodo.org/communities/scriptnet
https://zenodo.org/communities/eu
https://doi.org/10.5281/zenodo.854352
info:eu-repo/semantics/openAccess
Creative Commons Attribution Share Alike 4.0 International
https://creativecommons.org/licenses/by-sa/4.0/legalcode
Writer Identification
ScriptNet: ICDAR2017 Competition on Historical Document Writer Identification (Historical-WI)
info:eu-repo/semantics/other
oai:zenodo.org:2567398
2020-01-24T19:25:10Z
openaire_data
user-scriptnet
user-cvl
user-eu
Diem Markus
Kleber Florian
Gatos Basilis
2019-02-17
<p>This dataset contains the training, evaluation, and test set for the ICDAR 2019 Competition on Baseline Detection (cBAD).</p>
<p>The basis of cBAD is a newly created, freely available, real-world dataset of 3021 annotated document page images collected from seven European archives. The baselines in all images were manually annotated. The training and evaluation sets contain PAGE XMLs with annotated text regions and baselines. The ground truth for the test set will be published after the competition deadline (May 2019).</p>
<p>Competition Website: https://scriptnet.iit.demokritos.gr/competitions/11/</p>
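In PAGE XML, each text line's baseline is stored as a `points` attribute holding space-separated `x,y` pairs. The sketch below is a minimal reader under that assumption, using only Python's standard library. It matches elements by local name so that it does not depend on which PAGE schema version (namespace) a file declares; the sample fragment, its ids and its coordinates are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Illustrative PAGE XML fragment (hypothetical ids and coordinates). Real
# cBAD files carry more metadata and may declare another schema version.
SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15">
  <Page imageFilename="page.jpg" imageWidth="2000" imageHeight="3000">
    <TextRegion id="r1">
      <Coords points="100,100 1900,100 1900,400 100,400"/>
      <TextLine id="l1">
        <Coords points="110,110 1890,110 1890,190 110,190"/>
        <Baseline points="110,180 1000,182 1890,185"/>
      </TextLine>
      <TextLine id="l2">
        <Baseline points="110,280 1890,290"/>
      </TextLine>
    </TextRegion>
  </Page>
</PcGts>"""


def local_name(tag):
    """Drop the '{namespace}' prefix so matching ignores the schema version."""
    return tag.rsplit("}", 1)[-1]


def parse_points(points):
    """Turn 'x1,y1 x2,y2 ...' into [(x1, y1), (x2, y2), ...]."""
    coords = []
    for pair in points.split():
        x, y = pair.split(",")
        coords.append((int(x), int(y)))
    return coords


def read_baselines(xml_text):
    """Map each TextLine id to its baseline polyline."""
    root = ET.fromstring(xml_text)
    baselines = {}
    for line in root.iter():
        if local_name(line.tag) != "TextLine":
            continue
        for child in line:
            if local_name(child.tag) == "Baseline":
                baselines[line.get("id")] = parse_points(child.get("points"))
    return baselines


print(read_baselines(SAMPLE)["l2"])  # [(110, 280), (1890, 290)]
```

Text lines without a `Baseline` child are simply skipped, which keeps the reader usable on test-set files where only part of the annotation may be present.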
https://doi.org/10.5281/zenodo.2567398
oai:zenodo.org:2567398
eng
Zenodo
https://zenodo.org/communities/cvl
https://zenodo.org/communities/scriptnet
https://zenodo.org/communities/eu
https://doi.org/10.5281/zenodo.2567397
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
Baseline Detection
cBAD
Historical documents
ICDAR2019 Competition
ICDAR 2019 Competition on Baseline Detection (cBAD)
info:eu-repo/semantics/other
oai:zenodo.org:1442182
2020-01-24T19:24:39Z
openaire_data
user-scriptnet
user-eu
Strauss, Tobias
Leifert, Gundram
Labahn, Roger
Hodel, Tobias
Mühlberger, Günter
2018-10-01
<p>The main idea of this dataset is to analyse the impact of training data: how much training data specific to the document being transcribed is necessary?</p>
<p><strong>general data: </strong>This is a collection of heterogeneous documents to train an initial system. For each text line there is an image file of that line, a file with the ground truth text and an information file containing an automatically generated surrounding polygon.</p>
<p><strong>specific data: </strong>The specific data contains documents related to the test data. For the specific systems, only the images in the train list may be used. The files are of the same type as the general data.</p>
<p><strong>test data: </strong>The test data contains only the images and the information files.</p>
<p>More information, published results and an evaluation procedure are available at https://scriptnet.iit.demokritos.gr/competitions/10/</p>
https://doi.org/10.5281/zenodo.1442182
oai:zenodo.org:1442182
deu
Zenodo
https://zenodo.org/communities/scriptnet
https://zenodo.org/communities/eu
https://doi.org/10.5281/zenodo.1442181
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
text recognition, adaptation to new hands
Dataset for ICFHR2018 Competition on Automated Text Recognition on a READ Dataset
info:eu-repo/semantics/other
oai:zenodo.org:1490009
2020-01-24T19:25:37Z
openaire_data
user-scriptnet
user-eu
Emilio Granell
Carlos-D. Martínez-Hinarejos
2018-11-16
<p>The Rodrigo corpus was obtained from the digitisation of the book “Historia de España del arçobispo Don Rodrigo”, written in ancient Spanish in 1545. It is a single-writer book in which most pages consist of a single block of well-separated lines of calligraphic text.</p>
<p>This dataset is freely available for research purposes. It contains 15,010 images of text lines with their paleographic transcription, divided into three partitions: 9000 text lines for training, 1000 for validation and 5010 for testing.</p>
Work partially supported by projects READ: Recognition and Enrichment of Archival Documents - 674943 (European Union's H2020) and CoMUN-HaT: Context, Multimodality and User Collaboration in Handwritten Text Processing - TIN2015-70924-C2-1-R (MINECO/FEDER).
https://doi.org/10.5281/zenodo.1490009
oai:zenodo.org:1490009
osp
Zenodo
https://zenodo.org/communities/scriptnet
https://zenodo.org/communities/eu
https://doi.org/10.5281/zenodo.1490008
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
Historical manuscript
Handwritten text recognition
The Rodrigo corpus
info:eu-repo/semantics/other
oai:zenodo.org:7512383
2023-01-08T02:26:52Z
openaire_data
user-scriptnet
Stefano Bazzaco (coord.)
Federica Zoppi
Giada Blasut
Nuria Aranda García
Ángela Torralba Ruberte
Ana-Milagros Jiménez Ruiz
Pedro Monteiro
José Manuel Fradejas
Eduardo Camero Santos
Laura Lecina Nogués
Almudena Izquierdo Andreu
2023-01-07
<pre>The SpanishGothic_XV-XVI_extended DATASET is designed to be uploaded to the Transkribus
platform (READ Coop) to train an HTR+ model for the automated
recognition of Spanish printed documents in Gothic script published in the 15th and 16th centuries.
For further information, please see the README file here:
https://github.com/stefanobazzaco/HTR-model-SpanishGothic_XV-XVI_extended
For system requirements and information about Transkribus platform, go to:
https://readcoop.eu/transkribus/
https://readcoop.eu/transkribus/resources/
https://github.com/Transkribus</pre>
https://doi.org/10.5281/zenodo.7512383
oai:zenodo.org:7512383
spa
Zenodo
https://hdl.handle.net/ISSN 2254-7290/https://www.janusdigital.es/articulo.htm?id=160
https://github.com/stefanobazzaco/HTR-model-SpanishGothic_XV-XVI_extended-DATASET
https://zenodo.org/communities/scriptnet
https://doi.org/10.5281/zenodo.4888926
info:eu-repo/semantics/restrictedAccess
HTR
Transkribus
Spanish Gothic
Spanish printed documents
XV-XVI Century documents
Progetto Mambrino
BIDISO
COMEDIC
HTR model SpanishGothic_XV-XVI_extended DATASET (v1.2)
info:eu-repo/semantics/other
oai:zenodo.org:1491441
2020-01-24T19:25:49Z
openaire_data
user-scriptnet
user-cvl
user-eu
Diem, Markus
Kleber, Florian
Fiel, Stefan
Grüning, Tobias
Gatos, Basilis
2017-01-23
<p>This dataset contains the training and test set for the ICDAR 2017 Competition on Baseline Detection in Archival Documents (cBAD).</p>
<p>The basis of cBAD is a newly created, freely available, real-world dataset of 2035 annotated document page images collected from 9 different archives. Two competition tracks test different characteristics of the submitted methods. Track A [Simple Documents] is published with annotated text regions and therefore tests the quality of a method's text line segmentation. The more challenging Track B [Complex Documents] provides only the page area. Hence, baseline detection algorithms need to correctly locate text lines in the presence of marginalia, tables, and noise.</p>
<p>The dataset comprises images with additional PAGE XMLs. The PAGE XMLs contain text regions and baseline annotations.</p>
<p>Competition Website: https://scriptnet.iit.demokritos.gr/competitions/5/</p>
<p>Version 3 is the version used in the cBAD competition.</p>
<p>Version 4 additionally contains the page region and, for double pages, the page split as a separator.</p>
This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 674943
https://doi.org/10.5281/zenodo.1491441
oai:zenodo.org:1491441
Zenodo
https://zenodo.org/communities/cvl
https://zenodo.org/communities/scriptnet
https://zenodo.org/communities/eu
https://doi.org/10.5281/zenodo.746925
info:eu-repo/semantics/openAccess
Creative Commons Attribution Share Alike 4.0 International
https://creativecommons.org/licenses/by-sa/4.0/legalcode
Baseline Detection
Text Line Segmentation
Historical Documents
ICDAR 2017 Competition
cBAD
ScriptNet: ICDAR 2017 Competition on Baseline Detection in Archival Documents (cBAD)
info:eu-repo/semantics/other
oai:zenodo.org:4599472
2021-03-15T12:35:02Z
openaire_data
user-newseye
user-scriptnet
user-eu
Muehlberger, Guenter
Hackl, Guenter
2021-03-11
<p>The dataset comprises Finnish newspaper pages from the late 18th to the early 20th century with carefully corrected text. The page images were provided by the <a href="https://www.kansalliskirjasto.fi/en/">National Library of Finland</a> (NLF) and comprise 526 pages (training set) and 8 pages (validation set). The data follow the PAGE format (cf. <a href="https://github.com/PRImA-Research-Lab/PAGE-XML/">https://github.com/PRImA-Research-Lab/PAGE-XML/</a>) and were produced with the <a href="http://read.transkribus.eu/">Transkribus</a> platform with support from the <a href="http://newseye.eu/">NewsEye</a> and <a href="http://read.transkribus.eu/">READ</a> projects.</p>
https://doi.org/10.5281/zenodo.4599472
oai:zenodo.org:4599472
fin
Zenodo
https://zenodo.org/communities/newseye
https://zenodo.org/communities/scriptnet
https://zenodo.org/communities/eu
https://doi.org/10.5281/zenodo.4599471
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
OCR
Text Recognition
Transkribus
NewsEye / READ OCR training dataset from Finnish Newspapers (18th, 19th, early 20th C.)
info:eu-repo/semantics/other
oai:zenodo.org:1243098
2020-01-24T19:24:34Z
openaire_data
user-scriptnet
user-eu
Déjean Hervé
Lang Eva
Kleber Florian
2018-04-23
<p>Datasets used in the publication: Comparing Machine Learning Approaches for Table Recognition in Historical Register Books, Hervé Déjean, Jean-Luc Meunier, Stéphane Clinchant, Eva Maria Lang and Florian Kleber, 13th IAPR International Workshop on Document Analysis Systems (DAS 2018), Vienna, Austria</p>
<p>CHANGES:</p>
<p>07/05/2018: second version (the dataset150 images were missing from the first version)</p>
<p>dataset111<br>
img/ images<br>
xml/ READ pagexml with BIESO annotation</p>
<p>dataset150<br>
img/ images<br>
GT_xml: READ pagexml with BIESO annotation<br>
WK_xml: workflow version: pagexml corresponding to the workflow outputs (textlines are automatically recognised, columns as well)<br>
ROWREF: GT for the row regions</p>
<p>Tagset (attribute of the TextLine element)</p>
<p>Type: deprecated</p>
<p>DU_row:<br>
B: first element of cell<br>
I: inside a cell<br>
E: last element of a cell<br>
S: single element of a cell<br>
O: outside the table</p>
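A minimal sketch of how the `DU_row` tags could be turned back into cells, assuming that the TextLine elements appear in reading order and that each line carries one of the five tags listed above. The sample fragment is hypothetical, and the way unterminated or malformed sequences are handled is an illustrative choice, not part of the dataset's specification.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: namespace, Coords and text omitted for brevity.
SAMPLE = """<Page>
  <TextRegion id="t1">
    <TextLine id="a" DU_row="B"/>
    <TextLine id="b" DU_row="I"/>
    <TextLine id="c" DU_row="E"/>
    <TextLine id="d" DU_row="S"/>
    <TextLine id="e" DU_row="O"/>
    <TextLine id="f" DU_row="B"/>
    <TextLine id="g" DU_row="E"/>
  </TextRegion>
</Page>"""


def read_du_row(xml_text):
    """Collect (line id, DU_row tag) pairs in document order."""
    root = ET.fromstring(xml_text)
    return [(el.get("id"), el.get("DU_row"))
            for el in root.iter()
            if el.tag.rsplit("}", 1)[-1] == "TextLine" and el.get("DU_row")]


def decode_bieso(tagged_lines):
    """Group lines into cells: B opens a cell, I continues it, E closes it,
    S is a one-line cell, O (outside the table) closes any open cell.
    An I or E with no open cell starts a new one, so malformed input
    still yields groups instead of raising."""
    cells, current = [], None
    for line_id, tag in tagged_lines:
        if tag in ("B", "S", "O") and current:
            cells.append(current)  # close an unterminated cell
            current = None
        if tag == "S":
            cells.append([line_id])
        elif tag == "B":
            current = [line_id]
        elif tag in ("I", "E"):
            current = (current or []) + [line_id]
            if tag == "E":
                cells.append(current)
                current = None
    if current:
        cells.append(current)
    return cells


print(decode_bieso(read_du_row(SAMPLE)))  # [['a', 'b', 'c'], ['d'], ['f', 'g']]
```

Decoding leniently rather than rejecting invalid sequences is convenient when the tags come from a classifier's predictions instead of the ground truth.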
https://doi.org/10.5281/zenodo.1243098
oai:zenodo.org:1243098
Zenodo
https://zenodo.org/communities/scriptnet
https://zenodo.org/communities/eu
https://doi.org/10.5281/zenodo.1226878
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
READ Table Understanding Handwritten Texts
READ ABP Table datasets
info:eu-repo/semantics/other
oai:zenodo.org:3895674
2020-06-23T13:04:07Z
user-scriptnet
user-kbnl
Romein, Christel Annemieke
2020-06-16
<p>Presentation for the TTM (Things that Matter 6) summer school in Durham</p>
https://doi.org/10.5281/zenodo.3895674
oai:zenodo.org:3895674
eng
Zenodo
https://zenodo.org/communities/kbnl
https://zenodo.org/communities/scriptnet
https://doi.org/10.5281/zenodo.3895673
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
digital scholarship
KB Lab
Digital Humanities
Data
Books
Entangled Histories
From book to data. Tensions and opportunities
info:eu-repo/semantics/article
oai:zenodo.org:10810509
2024-03-12T19:29:13Z
software
user-scriptnet
Engels, James
Barnett, Robert
Erhard, Franz Xaver
Hill, Nathan
2024-03-05
<p>This utility accepts Transkribus PageXML as input and then interprets the text regions on each page/image (such as headers, titles, blocks of text, etc.) as "paragraphs" and returns the raw text of each paragraph along with its metadata.</p>
<p><strong>Paragraph Extractor </strong>was developed by James Engels of SOAS University of London for the <a href="https://research.uni-leipzig.de/diverge/">Divergent Discourses</a> project. The project is a joint study involving SOAS University of London and Leipzig University, funded by the AHRC in the UK and the DFG in Germany.</p>
<p>Please acknowledge the project in any use of these materials. Copyright for the project resides with the two universities.</p>
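The general idea (each text region of a PageXML page becomes one "paragraph" returned with its metadata) can be sketched with Python's standard library. This is not the Paragraph Extractor's own code. It assumes that each region's full text is stored in a region-level `TextEquiv/Unicode` element, and that the page's `imageFilename` and the region's `id`, `type` and `custom` attributes are the metadata of interest; the sample page is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical Transkribus-style PageXML page with two text regions.
SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15">
  <Page imageFilename="p001.jpg">
    <TextRegion id="r1" type="heading" custom="readingOrder {index:0;}">
      <TextLine id="r1l1"><TextEquiv><Unicode>Title line</Unicode></TextEquiv></TextLine>
      <TextEquiv><Unicode>Title line</Unicode></TextEquiv>
    </TextRegion>
    <TextRegion id="r2" type="paragraph" custom="readingOrder {index:1;}">
      <TextEquiv><Unicode>First line
second line</Unicode></TextEquiv>
    </TextRegion>
  </Page>
</PcGts>"""


def local_name(tag):
    """Drop the '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def region_text(region):
    """Return the region-level TextEquiv/Unicode text, ignoring line-level text."""
    for child in region:  # direct children only, so TextLine text is skipped
        if local_name(child.tag) == "TextEquiv":
            for grandchild in child:
                if local_name(grandchild.tag) == "Unicode":
                    return grandchild.text or ""
    return ""


def extract_paragraphs(xml_text):
    """One dict per TextRegion, in document order (not necessarily reading order)."""
    root = ET.fromstring(xml_text)
    image = None
    paragraphs = []
    for el in root.iter():
        name = local_name(el.tag)
        if name == "Page":
            image = el.get("imageFilename")
        elif name == "TextRegion":
            paragraphs.append({
                "image": image,
                "id": el.get("id"),
                "type": el.get("type"),
                "custom": el.get("custom"),
                "text": region_text(el),
            })
    return paragraphs


for para in extract_paragraphs(SAMPLE):
    print(para["id"], para["type"], repr(para["text"]))
```

If a tool needs strict reading order, it would have to sort on the index stored in `custom` (or on the page's reading-order element) rather than rely on document order as this sketch does.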
https://doi.org/10.5281/zenodo.10810509
oai:zenodo.org:10810509
Zenodo
https://github.com/Divergent-Discourses
https://zenodo.org/communities/scriptnet
https://doi.org/10.5281/zenodo.10810508
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
Transkribus
OCR
HTR
Text regions
Lay-out
Transkribus_utils: Paragraph Extractor
info:eu-repo/semantics/other
oai:zenodo.org:4030907
2020-10-08T07:33:58Z
openaire
user-scriptnet
Stefano Bazzaco
2020-09-15
<p>Theoretical presentation of OCR/HTR history, developments and techniques</p>
<p>Comparison (pros and cons) of OCR/HTR systems</p>
<p>Progetto Mambrino results using Transkribus with Spanish Gothic, Spanish Romana and Italian Italic scripts of the 15th and 16th centuries</p>
<p>Presented on 05/05/2020 at the University of Verona</p>
https://doi.org/10.5281/zenodo.4030907
oai:zenodo.org:4030907
eng
Zenodo
https://doi.org/10.13136/2284-2667/89
https://zenodo.org/communities/scriptnet
https://doi.org/10.5281/zenodo.4030906
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
OPTICAL CHARACTER RECOGNITION (OCR) SOFTWARES FOR SEMI AUTOMATED TRANSCRIPTION OF PRINTED AND HANDWRITTEN TEXTS, University of Verona, 05 may 2020
HTR
OCR
Transkribus
READ
Progetto Mambrino (University of Verona)
Theoretical OCR/HTR and Transkribus presentation for PhD Seminar (University of Verona 05/05/20)
info:eu-repo/semantics/lecture
oai:zenodo.org:1421600
2022-04-05T12:45:07Z
openaire_data
user-scriptnet
user-cvl
user-eu
user-iapr-tc11
Fiel, Stefan
Kleber, Florian
Lang, Eva-Maria
Fronhöfer, Wolfgang
2018-09-19
<p>A hand is usually considered a unique characteristic of a person. However, it may change slightly over the person's lifespan, for instance because of physical or mental issues. To the best of our knowledge, no available dataset covers this evolution of the handwriting of a single person.</p>
<p>When dealing with archival documents, it is important to show that methods are invariant to these changes, or to investigate how much of this variation they cover. Thus, a new dataset was created with data from the Passau Diocesan Archives (ABP, <a href="https://www.bistum-passau.de/bistum/archiv">https://www.bistum-passau.de/bistum/archiv</a>).</p>
<p>The documents originate from death records of different villages or towns in the Diocese of Passau. Usually the writer of these records (mostly the priest) remains the same over several years. In total, the dataset consists of 1766 pages, which originate from 28 different writers. The number of pages per writer varies from 7 up to 311. For some writers, we only have data from 3 different years, whereas the largest time span between two documents of the same writer is 31 years.</p>
<p>The dataset is organized as follows:</p>
<p>[ID]_[Name]\[YEAR]\[ID]_filename.png</p>
<p>The corresponding PAGE XML file is provided along with the dataset and contains the regions of the image that include text. It can be used to compute writer features from the handwriting alone rather than from the table lines.</p>
<p>Currently no research tasks are defined on the dataset; we leave this up to the community. Drop us a note on how you are using this dataset.</p>
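Because the directory layout encodes both the writer and the year, it is enough to group pages for experiments across decades. A minimal sketch, assuming the backslash-separated `[ID]_[Name]\[YEAR]\[ID]_filename.png` pattern given above and that the writer ID is the text before the first underscore; the writer names and paths in the example are made up.

```python
from collections import defaultdict
from pathlib import PureWindowsPath


def parse_page_path(path):
    r"""Split '[ID]_[Name]\[YEAR]\[ID]_filename.png' into its fields."""
    # PureWindowsPath treats the backslash as the separator on any OS.
    writer_dir, year_dir, filename = PureWindowsPath(path).parts[-3:]
    # Assumption: the writer ID is everything before the first underscore.
    writer_id, name = writer_dir.split("_", 1)
    return {"writer_id": writer_id, "name": name,
            "year": int(year_dir), "file": filename}


def year_span_per_writer(paths):
    """Years between each writer's earliest and latest page."""
    years = defaultdict(list)
    for path in paths:
        info = parse_page_path(path)
        years[info["writer_id"]].append(info["year"])
    return {writer: max(ys) - min(ys) for writer, ys in years.items()}


# Made-up paths that follow the documented pattern.
paths = [
    r"03_Huber\1850\03_0001.png",
    r"03_Huber\1871\03_0002.png",
    r"07_Maier\1820\07_0001.png",
]
print(parse_page_path(paths[0]))
print(year_span_per_writer(paths))  # {'03': 21, '07': 0}
```

Raw strings (`r"..."`) matter here: in a normal Python string, a sequence such as `\1850` would be read as an escape code rather than a backslash followed by a year.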
This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 674943
https://doi.org/10.5281/zenodo.1421600
oai:zenodo.org:1421600
Zenodo
https://zenodo.org/communities/iapr-tc11
https://zenodo.org/communities/cvl
https://zenodo.org/communities/scriptnet
https://zenodo.org/communities/eu
https://doi.org/10.5281/zenodo.1421599
info:eu-repo/semantics/openAccess
Creative Commons Attribution Share Alike 4.0 International
https://creativecommons.org/licenses/by-sa/4.0/legalcode
Writer Identification
Writing Style over decades
Writer Retrieval
READ ABP WI Dataset - Writer Identification over decades
info:eu-repo/semantics/other
oai:zenodo.org:3239032
2022-04-05T12:44:36Z
openaire_data
user-scriptnet
user-cvl
user-eu
user-iapr-tc11
Déjean, Hervé
Meunier, Jean-Luc
Gao, Liangcai
Huang, Yilun
Fang, Yu
Kleber, Florian
Lang, Eva-Maria
2019-04-23
<p>The aim of this competition is to evaluate the performance of state-of-the-art methods for table detection (TRACK A) and table recognition (TRACK B). For the first track, document images containing one or several tables are provided. TRACK B has two subtracks: the first (B.1) provides the table region, so only the table structure has to be recognized. The second (B.2) provides no a priori information, so both the table region and the table structure have to be detected. The ground truth is provided in a format similar to that of the ICDAR 2013 competition (see [2]):</p>
<pre>&lt;?xml version="1.0" encoding="UTF-8"?&gt;
&lt;document filename='filename.jpg'&gt;
  &lt;table id='Table_1540517170416_3'&gt;
    &lt;Coords points="180,160 4354,160 4354,3287 180,3287"/&gt;
    &lt;cell id='TableCell_1540517477147_58' start-row='0' start-col='0' end-row='1' end-col='2'&gt;
      &lt;Coords points="180,160 177,456 614,456 615,163"/&gt;
    &lt;/cell&gt;
    ...
  &lt;/table&gt;
  ...
&lt;/document&gt;</pre>
<p>The difference from Göbel et al. [2] is the Coords tag, which defines a table or cell as a polygon specified by a list of coordinates. For B.1, the table and its coordinates are given together with the input image.</p>
<p>Important Note:</p>
<p>For the modern dataset, the convex hull of the content describes a cell region. For the historical dataset, the output region of a cell must be the cell boundary. This is necessary because handwritten text often overlaps several cells.</p>
<p>See also: http://sac.founderit.com/tasks.html</p>
<p>The evaluation tool is available on GitHub: https://github.com/cndplab-founder/ctdar_measurement_tool</p>
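A minimal sketch for loading ground truth in this format with Python's standard library, for instance to inspect cell spans before running the official evaluation tool. It assumes the element and attribute names of the example in the description (`document`, `table`, `cell`, `Coords`, `start-row`, `end-col` and so on), no XML namespace, and integer coordinates; the sample below reuses the example's values.

```python
import xml.etree.ElementTree as ET

# The example from the description above, without the XML declaration.
SAMPLE = """<document filename='filename.jpg'>
  <table id='Table_1540517170416_3'>
    <Coords points="180,160 4354,160 4354,3287 180,3287"/>
    <cell id='TableCell_1540517477147_58' start-row='0' start-col='0' end-row='1' end-col='2'>
      <Coords points="180,160 177,456 614,456 615,163"/>
    </cell>
  </table>
</document>"""


def parse_points(points):
    """Turn 'x1,y1 x2,y2 ...' into a list of (x, y) integer tuples."""
    coords = []
    for pair in points.split():
        x, y = pair.split(",")
        coords.append((int(x), int(y)))
    return coords


def read_ctdar(xml_text):
    """Return the image filename and a list of tables with their cells."""
    root = ET.fromstring(xml_text)
    tables = []
    for table in root.findall("table"):
        coords = table.find("Coords")  # treated as optional in this sketch
        cells = []
        for cell in table.findall("cell"):
            cells.append({
                "id": cell.get("id"),
                "rows": (int(cell.get("start-row")), int(cell.get("end-row"))),
                "cols": (int(cell.get("start-col")), int(cell.get("end-col"))),
                "polygon": parse_points(cell.find("Coords").get("points")),
            })
        tables.append({
            "id": table.get("id"),
            "polygon": parse_points(coords.get("points")) if coords is not None else None,
            "cells": cells,
        })
    return root.get("filename"), tables


filename, tables = read_ctdar(SAMPLE)
print(filename, tables[0]["cells"][0]["rows"], tables[0]["cells"][0]["cols"])
```

The row and column indices are kept exactly as written in the file; whether `end-row` and `end-col` are inclusive should be checked against the competition's documentation before deriving span sizes.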
http://sac.founderit.com/
https://doi.org/10.5281/zenodo.3239032
oai:zenodo.org:3239032
Zenodo
https://zenodo.org/communities/iapr-tc11
https://zenodo.org/communities/cvl
https://zenodo.org/communities/scriptnet
https://zenodo.org/communities/eu
https://doi.org/10.5281/zenodo.2649216
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
ICDAR, International Conference on Document Analysis and Recognition, Sydney, Australia, 2019
ICDAR 2019 Competition on Table Detection and Recognition (cTDaR)
info:eu-repo/semantics/other
oai:zenodo.org:4889218
2023-03-04T07:25:00Z
openaire_data
user-scriptnet
Stefano Bazzaco (coord.)
Gaetano Lalomia
Daniela Santonocito
Manuel Garrobo Peral
Mónica Martín Molares
Carlota Cristina Fernández Travieso
2021-06-01
<pre>The SpanishRedonda_XVI-XVII_extended DATASET is designed to be uploaded to the Transkribus platform
(READ Coop) to train an HTR+ model for the automated recognition of Spanish
printed documents in Round script published in the 16th and 17th centuries.
For further information, please see the README file here:
https://github.com/stefanobazzaco/HTR-model-SpanishRedonda_XVI-XVII_extended
For system requirements and information about Transkribus platform, go to:
https://readcoop.eu/transkribus/
https://readcoop.eu/transkribus/resources/
https://github.com/Transkribus</pre>
https://doi.org/10.5281/zenodo.4889218
oai:zenodo.org:4889218
spa
Zenodo
https://github.com/stefanobazzaco/HTR-model-SpanishRedonda_XVI-XVII_extended-DATASET
https://zenodo.org/communities/scriptnet
https://doi.org/10.5281/zenodo.4889217
info:eu-repo/semantics/restrictedAccess
HTR
Transkribus
Spanish Redonda
Spanish printed documents
XVI-XVII Century documents
Progetto Mambrino
BIDISO
COMEDIC
HTR model SpanishRedonda_XVI-XVII_extended DATASET
info:eu-repo/semantics/other
oai:zenodo.org:4073478
2020-10-09T13:13:34Z
user-scriptnet
Romein, Christel Annemieke
2020-10-09
<p>This is webinar #3 on Transkribus</p>
<ul>
<li>PyLaia and how to use it. (This might get a bit boring, but I will explain the parameters so that you have this information while documentation is still lacking.)</li>
<li>Tables and Layout analysis (a quick tweak to make life easier)</li>
<li>How to remove tiny regions (noise).</li>
<li>How to create a sample set to function as a sturdy basis for your models.</li>
<li>And a quick update on one new feature in P2PaLA and Read and Search.</li>
</ul>
https://doi.org/10.5281/zenodo.4073478
oai:zenodo.org:4073478
eng
Zenodo
https://zenodo.org/communities/scriptnet
https://doi.org/10.5281/zenodo.4073477
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
digital scholarship
Digital Humanities
Computational linguistics
Handwriting Text Recognition
Transkribus
Webinar
Transkribus webinar #3 update & pylaia (video)
info:eu-repo/semantics/other
oai:zenodo.org:248733
2020-01-24T19:25:56Z
user-18thcenturybritishhistory
openaire_data
user-scriptnet
user-eu
Sánchez, J.A.
Toselli, A.H.
Romero, V.
Vidal, E.
2017-01-17
<p>This dataset comprises the data used for the ICDAR 2015 Competition on Handwritten Text Recognition on the tranScriptorium Dataset. The handwritten images for this contest were drawn from the English “Bentham collection” dataset used in the tranScriptorium project. The selected data were written by several hands and entail significant variability and difficulties regarding the quality of the text images, writing styles and crossed-out text. This contest is clearly more difficult than the first edition, both for training and for testing. A portion of the training dataset and the full test dataset were provided as carefully segmented line images along with the corresponding transcripts. Another portion of the training dataset was provided as raw images with their corresponding transcripts at region level.</p>
<p>ICDAR 2015 competition HTRtS: handwritten text recognition on the tranScriptorium dataset<br>
JA Sánchez, AH Toselli, V Romero, E Vidal. In International Conference on Document Analysis and Recognition (ICDAR), pp. 1166-1170, 2015.</p>
https://doi.org/10.5281/zenodo.248733
oai:zenodo.org:248733
Zenodo
https://zenodo.org/communities/18thcenturybritishhistory
https://zenodo.org/communities/scriptnet
https://zenodo.org/communities/eu
https://doi.org/
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
Handwritten Text Recognition of Historical Documents, Pattern Recognition, Machine Learning
ICDAR 2015 Competition HTRtS: Handwritten Text Recognition on the tranScriptorium Dataset
info:eu-repo/semantics/other
oai:zenodo.org:1322666
2020-01-24T19:25:59Z
openaire_data
user-scriptnet
user-eu
Quirós, Lorenzo
Serrano, Lluís
Bosch, Vicente
Toselli, Alejandro H.
Congost, Rosa
Saguer, Enric
Vidal, Enrique
2018-07-27
<p>This dataset is a subset of 596 documents from the <em>Registre d'Hipoteques de Girona</em> of 1769 collection, held by the <a href="http://xac.gencat.cat/ca/llista_arxius_comarcals/girones/"><em>Arxiu Històric de Girona</em></a>. This collection is composed of hundreds of thousands of notarial deeds from the 18th and 19th centuries (1768-1862). Sales, redemptions of censuses, inheritances and matrimonial chapters are among the most common documentary typologies in the collection.</p>
<p>This dataset is composed of more than 23,700 text lines written by a single hand, covering more than 50 different topics (documentary typologies) and a vocabulary of more than 2400 different words. The documents are transcribed using so-called diplomatic criteria. Additionally, transcripts were tagged with extra enriching/complementary information (e.g. expansion of abbreviations, hyphen marks, etc.). Along with the transcripts, the layout of the document was detected and recorded. Pages have been labeled using six different layout regions.</p>
<p>The images, along with their respective ground truth, were compiled in PAGE-compliant XML format by the <a href="http://www2.udg.edu/tabid/11296/Default.aspx"><em>Centre de Recerca d'Història Rural</em></a> and the HTR group of the <a href="https://www.prhlt.upv.es">Pattern Recognition and Human Language Technologies Research Center</a>.</p>
This work is partially funded by READ project (Ref. 674943), Spanish Ministry of Science and Innovation project HAR2014-54891-P/HIST, ICREA Acadèmia 2013 and Fundación BBVA project EXPLORHIST.
https://doi.org/10.5281/zenodo.1322666
oai:zenodo.org:1322666
osp
Zenodo
https://zenodo.org/communities/scriptnet
https://zenodo.org/communities/eu
https://doi.org/10.5281/zenodo.1322665
info:eu-repo/semantics/openAccess
Creative Commons Attribution Non Commercial 4.0 International
https://creativecommons.org/licenses/by-nc/4.0/legalcode
handwritten text document
handwritten text recognition
document layout analysis
XVIII-XIX century
notarial deeds
Oficio de Hipotecas de Girona. A dataset of Spanish notarial deeds (18th Century) for Handwritten Text Recognition and Layout Analysis of historical documents.
info:eu-repo/semantics/other
oai:zenodo.org:257972
2020-01-24T19:26:00Z
openaire_data
user-scriptnet
user-cvl
user-eu
Diem, Markus
Kleber, Florian
Fiel, Stefan
Grüning, Tobias
Gatos, Basilis
2017-01-23
<p>This dataset contains the training and test set for the ICDAR 2017 Competition on Baseline Detection in Archival Documents (cBAD).</p>
<p>Two newly created, freely available, real-world datasets form the basis of the competition, which has two tracks. The first track deals with basic baseline detection in handwritten text in paragraph form. In total, 750 pages of handwritten archival documents (without tables or marginalia) with manually annotated baselines and text regions (paragraphs) are provided. The second track consists of more challenging data, including tables, marginalia and noisy document images, and text lines can be skewed by up to 180°. About 1,200 pages of archival documents (handwritten and printed) have been manually annotated. For both tracks, the images come from 9 different archives and document collections.</p>
<p>The training set comprises images with accompanying PAGE XML files, while the test set consists of images only. The PAGE XML files contain text regions (e.g. paragraphs).</p>
<p>Competition Website: https://scriptnet.iit.demokritos.gr/competitions/5/</p>
This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 674943
https://doi.org/10.5281/zenodo.257972
oai:zenodo.org:257972
Zenodo
https://zenodo.org/communities/cvl
https://zenodo.org/communities/scriptnet
https://zenodo.org/communities/eu
https://doi.org/10.5281/zenodo.746925
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
Baseline Detection, Archival Documents, ICDAR 2017 Competition, cBAD
ScriptNet: ICDAR 2017 Competition on Baseline Detection in Archival Documents (cBAD)
info:eu-repo/semantics/other
oai:zenodo.org:5381739
2021-09-13T12:51:42Z
user-scriptnet
Ball, Rachael
Parker, Geoffrey
The Hispanic Society of America
Romein, Christel Annemieke
2021-09-03
<p><strong>Samples of Charles V/ Carlos V’s handwriting </strong></p>
<p>It is based upon the book: Rachael Ball and Geoffrey Parker (eds.), <em>Cómo ser rey. Instrucciones del emperador Carlos V a su hijo Felipe. Mayo de 1543</em>. Spanish-English bilingual edition; jointly published with the CSA and The Hispanic Society of America; 2014; ISBN 978-84-15245-45-2. You can find an HTR+-model in Transkribus as of 3 September 2021 named <em>Carlos V/ Charles V – </em>Early modern Spanish – 15<sup>th</sup> century (1543).</p>
<p>The samples below give an indication of what Charles V’s handwriting looked like in 1543. These examples are provided with The Hispanic Society of America’s permission.</p>
https://doi.org/10.5281/zenodo.5381739
oai:zenodo.org:5381739
osp
Zenodo
https://zenodo.org/communities/scriptnet
https://doi.org/10.5281/zenodo.5381738
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
Transkribus
Digital Humanities
HTR
Charles V
Sample of Charles V handwriting
info:eu-repo/semantics/other
oai:zenodo.org:1164045
2020-01-24T19:25:56Z
openaire_data
user-scriptnet
user-eu
Toselli, A.H.
Romero, V.
Villegas, M.
Vidal, E.
Sánchez, J.A.
2018-02-01
<p>This dataset arises from the READ project (Horizon 2020).</p>
<p>The dataset consists of a subset of documents from the Ratsprotokolle collection, which is composed of minutes of council meetings held from 1470 to 1805 (about 30,000 pages) and will be used in the READ project. This dataset is written in Early Modern German. The number of writers is unknown. The handwriting in this collection is complex enough to challenge HTR software.</p>
<p>The training dataset is composed of 400 pages. Most of the pages consist of a single block that poses many difficulties for line detection and extraction. The ground truth in this set is provided in PAGE format, annotated at line level.</p>
<p>The previous version of this dataset is available at https://zenodo.org/record/218236#.WnLhaCHhBGF</p>
<p>The new file also includes the test set from the HTR competition held at ICFHR 2016.</p>
https://doi.org/10.5281/zenodo.1164045
oai:zenodo.org:1164045
gml
Zenodo
https://zenodo.org/communities/scriptnet
https://zenodo.org/communities/eu
https://doi.org/10.5281/zenodo.1164027
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
Handwriting Text Recognition
HTR Dataset ICFHR 2016
info:eu-repo/semantics/other
oai:zenodo.org:4780947
2021-05-25T08:14:22Z
openaire_data
user-scriptnet
Peter Stotz
Phillip Ströbel
2021-05-22
<p>This is ground truth for Rudolf Gwalther's (1519-1586) handwriting, taken from his book "Lateinische Gedichte", in which he collected writings between 1540 and 1580.</p>
<p>Data collection and ground truth creation:</p>
<p>At the time we collected the data, we found 150 images with corresponding transcriptions by Peter Stotz on <a href="https://www.e-manuscripta.ch/zuz/content/titleinfo/1111284">e-manuscripta</a> (reference: Gwalther, Rudolf: Lateinische Gedichte. Zürich, 1540-1580. Zentralbibliothek Zürich, Ms D 152, <a href="https://doi.org/10.7891/e-manuscripta-26750">https://doi.org/10.7891/e-manuscripta-26750</a> / Public Domain Mark). We removed 8 images containing too many corrections or vertical text. Next, we uploaded the images into the <a href="https://readcoop.eu/de/transkribus/">Transkribus</a> platform, applied the line recognition tool and manually copied the transcribed text lines into the recognised line boxes. During this process, we made some corrections, mainly to fix inconsistencies in punctuation and capitalisation.</p>
<p>Key figures<br>
<strong>Language </strong>Latin</p>
<p><strong>Images </strong>142</p>
<p><strong>Lines </strong>4,037</p>
<p><strong>Words </strong>26,088</p>
<p>Contact:</p>
<p>Phillip Ströbel <a href="mailto:pstroebel@cl.uzh.ch">pstroebel@cl.uzh.ch</a> - in case of questions</p>
https://doi.org/10.5281/zenodo.4780947
oai:zenodo.org:4780947
lat
Zenodo
https://github.com/bullinger-digital/gwalther-handwriting-ground-truth/tree/v1.0
https://zenodo.org/communities/scriptnet
https://doi.org/10.5281/zenodo.4780946
info:eu-repo/semantics/openAccess
Other (Open)
Digital Humanities
Handwritten Text Recognition
Handwriting
Latin
Reformation
bullinger-digital/gwalther-handwriting-ground-truth: Initial release
info:eu-repo/semantics/other
oai:zenodo.org:215383
2020-01-24T19:25:26Z
openaire_data
user-scriptnet
user-eu
Tobias Grüning, Gundram Leifert, Johannes Michael, Tobias Strauß, Max Weidemann, Roger Labahn
2016-12-21
<p>This dataset arises from the READ project (Horizon 2020).<br>
<br>
Images were provided and enriched under the direction of Dr. Dirk Alvermann (Universitätsarchiv Greifswald, Germany).<br>
In total, this dataset contains 8,770 transcribed text lines of handwritten historical documents from the late 18th century.</p>
<p>Besides the images and PAGE files (containing geometric text line information and transcripts), lists dividing the dataset into training and test data are provided. Each list element contains the corresponding image, text region and text line identifiers, so every list element can be mapped explicitly to a text line. Sublists of the training list are also provided.</p>
https://doi.org/10.5281/zenodo.215383
oai:zenodo.org:215383
Zenodo
https://zenodo.org/communities/scriptnet
https://zenodo.org/communities/eu
https://doi.org/
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
handwritten, historical, HTR
read_dataset_german_konzilsprotokolle
info:eu-repo/semantics/other
oai:zenodo.org:3234502
2020-01-24T19:25:18Z
openaire_data
user-scriptnet
user-cvl
user-eu
Diem, Markus
Kleber, Florian
Gatos, Basilis
2019-02-17
<p>This dataset contains the training, evaluation, and test set for the ICDAR 2019 Competition on Baseline Detection (cBAD).</p>
<p>The competition is based on a newly created, freely available, real-world dataset of 3,021 annotated document page images collected from seven European archives. The baselines in all images were manually annotated. The training and evaluation sets contain PAGE XMLs with annotated text regions and baselines.</p>
<p>Competition Website: https://scriptnet.iit.demokritos.gr/competitions/11/</p>
https://doi.org/10.5281/zenodo.3234502
oai:zenodo.org:3234502
eng
Zenodo
https://zenodo.org/communities/cvl
https://zenodo.org/communities/scriptnet
https://zenodo.org/communities/eu
https://doi.org/10.5281/zenodo.2567397
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
Baseline Detection
cBAD
ICDAR2019 Competition
Historical documents
ICDAR 2019 Competition on Baseline Detection (cBAD)
info:eu-repo/semantics/other
oai:zenodo.org:6542056
2022-05-30T07:08:15Z
openaire_data
user-scriptnet
user-iapr-tc11
Maarand, Martin
Beyer, Yngvil
Kåsen, Andre
2022-05-20
<p>The dataset comprises line images and text from Norwegian letters and diaries of the 19th and early 20th centuries.</p>
https://doi.org/10.5281/zenodo.6542056
oai:zenodo.org:6542056
nor
Zenodo
https://zenodo.org/communities/iapr-tc11
https://zenodo.org/communities/scriptnet
https://doi.org/10.5281/zenodo.6542055
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
NorHand / Dataset for Handwritten Text Recognition in Norwegian
info:eu-repo/semantics/other