Automatic medieval charters structure detection : A Bi-LSTM linear segmentation approach - Laboratoire d'Informatique Médicale et Ingénierie des Connaissances en e-Santé
Journal article. Journal of Data Mining and Digital Humanities. Year: 2022

Automatic medieval charters structure detection : A Bi-LSTM linear segmentation approach

Abstract

This paper presents a model for automatically detecting sections in medieval Latin charters. These legal documents are among the most important sources for medieval studies because they reflect economic and social dynamics as well as legal and institutional writing practices. Automatic linear segmentation can greatly facilitate charter indexing and speed up the retrieval of evidence supporting historical hypotheses by enabling fine-grained queries on these raw, rarely structured sources. Our model is a Bi-LSTM with a final CRF layer, trained on a large annotated collection of medieval charters (4,700 documents) from Lombard monasteries: the CDLM corpus (11th-12th centuries). The evaluation shows high performance on most sections, both on the test set and on an external evaluation corpus consisting of the Montecassino abbey charters (10th-12th centuries). We describe the model architecture and the main problems related to processing medieval Latin and formulaic discourse, and we discuss some implications of the results for record-keeping practices in the High Middle Ages.
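
"Linear segmentation" here means that each charter is partitioned into contiguous, non-overlapping sections. As a minimal sketch, assuming the tagger emits one section label per token, per-token predictions could be collapsed into segments as follows. The function name `labels_to_segments`, the sample tokens, and the label names are illustrative assumptions, not taken from the paper's code or annotation scheme.

```python
from itertools import groupby


def labels_to_segments(tokens, labels):
    """Collapse per-token labels into contiguous segments.

    Returns a list of (label, start, end, text) tuples, where
    tokens[start:end] is the run of tokens sharing that label.
    """
    if len(tokens) != len(labels):
        raise ValueError("tokens and labels must have the same length")
    segments = []
    position = 0
    # groupby yields one group per run of identical consecutive labels
    for label, run in groupby(labels):
        end = position + len(list(run))
        segments.append((label, position, end, " ".join(tokens[position:end])))
        position = end
    return segments


# Hypothetical toy input: the label names are placeholders for section tags.
tokens = "In nomine Domini . Ego Petrus dono terram meam . Actum Mediolani".split()
labels = ["invocatio"] * 4 + ["dispositio"] * 6 + ["datatio"] * 2

for segment in labels_to_segments(tokens, labels):
    print(segment)
```

Because the segments are built from consecutive runs of identical labels, they cover the token sequence exactly once, which is the property a linear segmentation requires.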
Main file
Diplomatics_parts_bilsmt_Journal_data_mining (1).pdf (2.49 MB)
Origin: Files produced by the author(s)

Dates and versions

hal-03410057 , version 1 (30-10-2021)
hal-03410057 , version 2 (20-07-2022)

Cite

Sergio Torres Aguilar, Pierre Chastang, Xavier Tannier. Automatic medieval charters structure detection : A Bi-LSTM linear segmentation approach. Journal of Data Mining and Digital Humanities, 2022, 2022, ⟨10.46298/jdmdh.8646⟩. ⟨hal-03410057v2⟩
