Terminologies augmented recurrent neural network model for clinical named entity recognition

Ivan Lerner; Nicolas Paris; Xavier Tannier

doi:10.1016/j.jbi.2019.103356

Journal article, Journal of Biomedical Informatics, Year: 2020

Ivan Lerner

Role: Author
PersonId: 799102
ORCID: 0000-0002-5466-1707

Nicolas Paris

Role: Author

Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur

Assistance publique - Hôpitaux de Paris (AP-HP)

Xavier Tannier

Role: Author
PersonId: 18076
IdHAL: xtannier
ORCID: 0000-0002-2452-8868
IdRef: 113391722

Laboratoire d'Informatique Médicale et Ingénierie des Connaissances en e-Santé

Abstract

OBJECTIVE: We aimed to enhance the performance of a supervised model for clinical named-entity recognition (NER) using medical terminologies. To evaluate our system in French, we built a corpus annotated for 5 types of clinical entities.

METHODS: We used a terminology-based system built upon UMLS and SNOMED as the baseline. We then evaluated a biGRU-CRF and a hybrid system that uses the predictions of the terminology-based system as a feature for the biGRU-CRF. In French, we built APcNER, a corpus of 147 documents annotated for 5 entities (Drug names, Signs or symptoms, Diseases or disorders, Diagnostic procedures or lab tests, and Therapeutic procedures). We evaluated each NER system using exact-match and partial-match definitions of the F-measure for NER. APcNER contains 4,837 entities, which took 28 h to annotate. The inter-annotator agreement, as measured by Cohen's kappa, was substantial for non-exact match (κ = 0.61) and moderate for exact match (κ = 0.42). In English, we evaluated the NER systems on the i2b2-2009 Medication Challenge for drug name recognition, which contained 8,573 entities in 268 documents, and on i2b2-small, a version reduced to match the number of entities in APcNER.

RESULTS: For drug name recognition on both i2b2-2009 and APcNER, the biGRU-CRF performed better than the terminology-based system, with an exact-match F-measure of 91.1% versus 73% and 81.9% versus 75%, respectively. For i2b2-small and APcNER, the hybrid system outperformed the biGRU-CRF, with an exact-match F-measure of 87.8% versus 85.6% and 86.4% versus 81.9%, respectively. On the APcNER corpus, the micro-averaged F-measure of the hybrid system over the 5 entities was 69.5% in exact match and 84.1% in non-exact match.

CONCLUSION: APcNER is a French corpus for clinical NER of five types of entities that covers a large variety of document types. Extending the supervised model with terminology yielded an easy increase in performance, especially for rare entities, and established near state-of-the-art results on the i2b2-2009 corpus.
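The difference between the exact-match and non-exact-match F-measures reported above can be illustrated with a minimal sketch. It is not the authors' evaluation code: the abstract does not define partial matching, so the sketch assumes that a prediction counts as a partial match when it has the same label as a gold entity and overlaps it by at least one position. The names `Entity`, `overlaps`, and `f_measure` are hypothetical.

```python
from typing import Set, Tuple

# An entity is (start, end, label) with an exclusive end offset.
# Offsets may be token or character indices; this is an assumption of the sketch.
Entity = Tuple[int, int, str]


def overlaps(a: Entity, b: Entity) -> bool:
    """True when two entities share a label and at least one position."""
    return a[2] == b[2] and a[0] < b[1] and b[0] < a[1]


def f_measure(gold: Set[Entity], pred: Set[Entity], exact: bool = True) -> float:
    """F1 over entity sets, using exact or overlap-based (assumed) matching."""
    if exact:
        # Exact match: boundaries and label must be identical.
        matched_pred = len(gold & pred)
        matched_gold = matched_pred
    else:
        # Partial match: any same-label overlap counts on both sides.
        matched_pred = sum(any(overlaps(p, g) for g in gold) for p in pred)
        matched_gold = sum(any(overlaps(g, p) for p in pred) for g in gold)
    precision = matched_pred / len(pred) if pred else 0.0
    recall = matched_gold / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy example (invented data, not taken from APcNER or i2b2-2009).
gold = {(0, 2, "Drug"), (5, 8, "Disease")}
pred = {(0, 2, "Drug"), (6, 8, "Disease"), (10, 11, "Drug")}
print(f_measure(gold, pred, exact=True))
print(f_measure(gold, pred, exact=False))
```

In this toy example the second prediction misses the left boundary of the gold entity, so it is rewarded only under the overlap criterion. This is the kind of boundary disagreement that produces a gap between exact and non-exact scores.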

Keywords

APcNER; Clinical natural language processing; Information extraction; Machine learning; Named entity recognition

Domains

Computation and Language [cs.CL]

Main file

S1532046419302734.pdf (580.66 KB)

Origin: Files produced by the author(s)

Elsevier CCSD agreement

https://hal.science/hal-02428771

Submitted on: Thursday, July 21, 2022, 09:43:58

Last modified on: Thursday, December 14, 2023, 13:50:37

Long-term archiving on: Saturday, October 22, 2022, 20:32:38

Dates and versions

hal-02428771, version 1 (21-07-2022)

License

Attribution - NonCommercial

Identifiers

HAL Id: hal-02428771, version 1
arXiv: 1904.11473
DOI: 10.1016/j.jbi.2019.103356
PII: S1532-0464(19)30273-4
PubMed: 31837473

Cite

Ivan Lerner, Nicolas Paris, Xavier Tannier. Terminologies augmented recurrent neural network model for clinical named entity recognition. Journal of Biomedical Informatics, 2020, 102, pp.103356. ⟨10.1016/j.jbi.2019.103356⟩. ⟨hal-02428771⟩

Collections

INSERM UNIV-PARIS13 CNRS LIMSI LIMICS USPC UNIV-PARIS-SACLAY SORBONNE-UNIVERSITE SU-SCIENCES SORBONNE-PARIS-NORD LISN GS-ENGINEERING GS-COMPUTER-SCIENCE
