DOI 10.5281/zenodo.13986218
All data were imported from the Transkribus platform, on which the AI model for automatic transcription “Portuguese Handwriting 16th-19th c.” was last trained in July 2023 with the PyLaia recognition engine; the model is now available for use. The data are divided into ten folders: one for each of the nine trainings, from the initial to the definitive one, plus one set for final validation. The eight previous trainings were carried out between June 2022 and May 2023. The history of all trainings can be read on e-Inquisition. Each folder corresponds to one collection on the platform; every collection contains a number of documents, and every document a number of images, or pages, as indicated below. The ten uploaded folders (zip) are distributed as follows:
- nine Training Sets (TS) (ca. 92% of the whole data; status of the TS transcriptions: Ground Truth);
- the final Validation Set (VS) (ca. 8% of the whole data; status of the VS transcriptions: Ground Truth).

Each TS folder contains only the new data added for the following training (that is, on top of the data from the previous trainings). Only the last VS, which is complete (505 p.), is provided.
One document = images / transcribed pages (Ground Truth: transcription
made by the members of the TraPrInq project (Transcrever os processos da
Inquisição portuguesa, 1536-1821 | Transcribing the court records of the
Portuguese Inquisition, 1536-1821), which lasted from January 2023 to
July 2024).
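The page counts given for each folder can be checked against the downloaded archives. The following is a minimal sketch, not part of the dataset itself. It assumes that each zip holds one subfolder per document with the page images inside it, which may not match the actual export layout. The function name `count_pages` and the list of image suffixes are hypothetical choices made for illustration.

```python
import zipfile
from collections import Counter
from pathlib import PurePosixPath

# Assumption: page images use one of these file extensions.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}


def count_pages(zip_path):
    """Return a Counter mapping each document folder to its number of page images."""
    pages = Counter()
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            path = PurePosixPath(name)
            if path.suffix.lower() in IMAGE_SUFFIXES:
                # Assumption: the parent folder of an image identifies its document.
                pages[str(path.parent)] += 1
    return pages
```

If the layout assumption holds, summing the values returned for the validation archive should give the 505 pages stated above.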
