Frequently asked questions

What is the IMPACT Centre of Competence?

The IMPACT Centre of Competence in Digitisation is a not-for-profit organisation, made up of public and private institutions, whose mission is to make the digitisation of historical printed text “better, faster, cheaper”. It provides tools, services and facilities to advance the state of the art in document imaging, language technology and the processing of historical text.

IMPACT is governed by a Board of institutions (premium members) and is based at the facilities of the Fundación General de la Universidad de Alicante in Alicante. The daily management of the IMPACT Centre is carried out by a Manager, a Director, a Scientific and Technological Director and the Chair of the Executive Board.

What is the IMPACT Dataset?

The IMPACT dataset contains more than half a million representative text-based images compiled by a number of major European libraries. Covering texts from as early as 1500, and containing material from newspapers, books, pamphlets and typewritten notes, the dataset is an invaluable resource for future research into imaging technology, OCR and language enrichment.

A carefully selected subset of these images has been reproduced with accompanying “ground truth”.

What is Ground Truth?

In digital imaging and OCR, ground truth is the objective verification of the particular properties of a digital image, used to test the accuracy of automated image analysis processes. The ground truth of an image’s text content, for instance, is the complete and accurate record of every character and word in the image.

This can be compared to the output of an OCR engine and used to assess the engine’s accuracy, and how important any deviation from ground truth is in that instance.
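As an illustration of such a comparison, the sketch below computes a character error rate: the number of character insertions, deletions and substitutions needed to turn the OCR output into the ground truth, divided by the length of the ground truth. This is one common accuracy measure and not necessarily the one used by the IMPACT Centre; the function names and the sample strings are hypothetical.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance between two strings (hypothetical helper)."""
    # previous[j] holds the distance between the reference prefix processed
    # so far and the first j characters of the hypothesis.
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution_cost = 0 if ref_char == hyp_char else 1
            current.append(min(
                previous[j] + 1,                      # deletion
                current[j - 1] + 1,                   # insertion
                previous[j - 1] + substitution_cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


def character_error_rate(ground_truth: str, ocr_output: str) -> float:
    """Edit distance normalised by the ground-truth length."""
    if not ground_truth:
        raise ValueError("ground truth must not be empty")
    return edit_distance(ground_truth, ocr_output) / len(ground_truth)


# Made-up sample: the OCR engine misread one character out of ten.
print(character_error_rate("historical", "histcrical"))
```

A rate of 0 means the OCR output matches the ground truth exactly. The rate can exceed 1 when the OCR output contains many spurious characters.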

The ground truth provided by the IMPACT Centre of Competence is stored and exchanged via XML instances in the Page Analysis and Ground-truth Elements (PAGE) format, which was developed by the University of Salford and is maintained at:

Which licences are available for the resources?

The IMPACT dataset is distributed mainly under an attribution, non-commercial, share-alike licence, but please check each dataset for details of its licensing scheme.

Can I add my resources to the IMPACT Dataset?

Yes! Should you be interested in adding your resources to the IMPACT Dataset, please contact us at
