CardioBERTpt: Transformer-based Models for Cardiology Language Representation in Portuguese

dc.contributor: Sistema FMUSP-HC: Faculdade de Medicina da Universidade de São Paulo (FMUSP) e Hospital das Clínicas da FMUSP
dc.contributor.author: SCHNEIDER, Elisa Terumi Rubel
dc.contributor.author: GUMIEL, Yohan Bonescki
dc.contributor.author: SOUZA, Joao Vitor Andrioli de
dc.contributor.author: MUKAI, Lilian Mie
dc.contributor.author: OLIVEIRA, Lucas Emanuel Silva e
dc.contributor.author: REBELO, Marina de Sa
dc.contributor.author: GUTIERREZ, Marco Antonio
dc.contributor.author: KRIEGER, Jose Eduardo
dc.contributor.author: TEODORO, Douglas
dc.contributor.author: MORO, Claudia
dc.contributor.author: PARAISO, Emerson Cabrera
dc.date.accessioned: 2023-11-16T20:01:57Z
dc.date.available: 2023-11-16T20:01:57Z
dc.date.issued: 2023
dc.description.abstract: Contextual word embeddings and the Transformer architecture have achieved state-of-the-art results in many natural language processing (NLP) tasks and improved the adaptation of models to multiple domains. Despite progress in reusing and building models, few resources have been developed for the Portuguese language, especially in the health domain. Furthermore, the clinical models available for the language are not representative enough of all medical specialties. This work explores deep contextual embedding models for the Portuguese language to support clinical NLP tasks. We transferred learned information from electronic health records of a Brazilian tertiary hospital specialized in cardiology and pre-trained multiple clinical BERT-based models. We evaluated the performance of these models in named entity recognition experiments, fine-tuning them on two annotated corpora containing clinical narratives. Our pre-trained models outperformed previous multilingual and Portuguese BERT-based models in cardiology and multi-specialty settings, reaching the state of the art for the analyzed corpora, with an F1 score improvement of 5.5% on the TempClinBr corpus (all entities) and 1.7% on the SemClinBr corpus (Disorder entity). Hence, we demonstrate that data representativeness and a high volume of training data can improve results on clinical tasks, in line with results for other languages. [eng]
dc.description.conferencedate: JUN 22-24, 2023
dc.description.conferencelocal: L'Aquila, ITALY
dc.description.conferencename: 36th IEEE International Symposium on Computer-Based Medical Systems (CBMS)
dc.description.index: PubMed
dc.description.index: WoS
dc.description.sponsorship: Coordenacao de Aperfeicoamento de Pessoal de Nivel Superior - Brasil (CAPES) [001]
dc.description.sponsorship: Foxconn Brazil
dc.description.sponsorship: Zerbini Foundation as part of the research project Machine Learning in Cardiovascular Medicine
dc.identifier.citation: 2023 IEEE 36TH INTERNATIONAL SYMPOSIUM ON COMPUTER-BASED MEDICAL SYSTEMS, CBMS, p.378-381, 2023
dc.identifier.doi: 10.1109/CBMS58004.2023.00075
dc.identifier.isbn: 979-8-3503-1224-9
dc.identifier.issn: 2372-9198
dc.identifier.uri: https://observatorio.fm.usp.br/handle/OPI/56999
dc.language.iso: eng
dc.publisher: IEEE COMPUTER SOC [eng]
dc.relation.ispartof: 2023 IEEE 36th International Symposium on Computer-Based Medical Systems, CBMS
dc.relation.ispartofseries: IEEE International Symposium on Computer-Based Medical Systems
dc.rights: restrictedAccess [eng]
dc.rights.holder: Copyright IEEE COMPUTER SOC [eng]
dc.subject: natural language processing [eng]
dc.subject: transformer [eng]
dc.subject: clinical texts [eng]
dc.subject: language model [eng]
dc.subject.wos: Computer Science, Artificial Intelligence [eng]
dc.subject.wos: Computer Science, Information Systems [eng]
dc.subject.wos: Engineering, Biomedical [eng]
dc.title: CardioBERTpt: Transformer-based Models for Cardiology Language Representation in Portuguese [eng]
dc.type: conferenceObject [eng]
dc.type.category: proceedings paper [eng]
dc.type.version: publishedVersion [eng]
dspace.entity.type: Publication
hcfmusp.affiliation.country: Switzerland
hcfmusp.affiliation.countryiso: ch
hcfmusp.author.external: SCHNEIDER, Elisa Terumi Rubel: Pontificia Univ Catolica Parana, Curitiba, Parana, Brazil
hcfmusp.author.external: GUMIEL, Yohan Bonescki: Pontificia Univ Catolica Parana, Inst Heart, InCor, HC FMUSP, Curitiba, Parana, Brazil
hcfmusp.author.external: SOUZA, Joao Vitor Andrioli de: Comsentimento, Curitiba, Parana, Brazil
hcfmusp.author.external: MUKAI, Lilian Mie: Pontificia Univ Catolica Parana, Inst Heart, InCor, HC FMUSP, Curitiba, Parana, Brazil
hcfmusp.author.external: OLIVEIRA, Lucas Emanuel Silva e: Comsentimento, Curitiba, Parana, Brazil
hcfmusp.author.external: TEODORO, Douglas: Univ Geneva, Geneva, Switzerland
hcfmusp.author.external: MORO, Claudia: Pontificia Univ Catolica Parana, Curitiba, Parana, Brazil
hcfmusp.author.external: PARAISO, Emerson Cabrera: Pontificia Univ Catolica Parana, Curitiba, Parana, Brazil
hcfmusp.contributor.author-fmusphc: MARINA DE FATIMA DE SA REBELO
hcfmusp.contributor.author-fmusphc: MARCO ANTONIO GUTIERREZ
hcfmusp.contributor.author-fmusphc: JOSE EDUARDO KRIEGER
hcfmusp.description.beginpage: 378
hcfmusp.description.endpage: 381
hcfmusp.origem: WOS
hcfmusp.origem.wos: WOS:001037777900066
hcfmusp.publisher.city: LOS ALAMITOS [eng]
hcfmusp.publisher.country: USA [eng]
hcfmusp.relation.reference: Alsentzer Emily, 2019, P 2 CLIN NATURAL LAN, DOI 10.18653/V1/W19-1909 [eng]
hcfmusp.relation.reference: Brown T. B., 2020, P ADV NEUR INF PROC, V33, P1877 [eng]
hcfmusp.relation.reference: Devlin J, 2019, 2019 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL HLT 2019), VOL. 1, P4171 [eng]
hcfmusp.relation.reference: Kalyan KS, 2022, J BIOMED INFORM, V126, DOI 10.1016/j.jbi.2021.103982 [eng]
hcfmusp.relation.reference: Laparra Egoitz, 2021, Yearb Med Inform, V30, P239, DOI 10.1055/s-0041-1726522 [eng]
hcfmusp.relation.reference: Lee J, 2020, BIOINFORMATICS, V36, P1234, DOI 10.1093/bioinformatics/btz682 [eng]
hcfmusp.relation.reference: Schneider ETR, 2020, P 3 CLIN NATURAL LAN, P65, DOI 10.18653/V1/2020.CLINICALNLP-1.7 [eng]
hcfmusp.relation.reference: Oliveira LESE, 2022, J BIOMED SEMANT, V13, DOI 10.1186/s13326-022-00269-1 [eng]
hcfmusp.relation.reference: Souza Fabio, 2020, Intelligent Systems. 9th Brazilian Conference, BRACIS 2020. Proceedings. Lecture Notes in Artificial Intelligence. Subseries of Lecture Notes in Computer Science (LNAI 12319), P403, DOI 10.1007/978-3-030-61377-8_28 [eng]
hcfmusp.relation.reference: Tamine L, 2021, ACM COMPUT SURV, V54, DOI 10.1145/3462476 [eng]
hcfmusp.relation.reference: TempClinBr, 2023, US [eng]
hcfmusp.relation.reference: Turchioe MR, 2022, HEART, V108, P909, DOI 10.1136/heartjnl-2021-319769 [eng]
hcfmusp.relation.reference: Yu Gu, 2022, ACM Transactions on Computing and Healthcare, V3, DOI 10.1145/3458754 [eng]
relation.isAuthorOfPublication: 6c2f9752-a260-4ff7-a5f9-15cf4a282060
relation.isAuthorOfPublication: 23ec3b55-50df-4630-902e-bedbb470fecb
relation.isAuthorOfPublication: a970d450-bcd4-4662-94d6-ad1c6d043b3c
relation.isAuthorOfPublication.latestForDiscovery: 6c2f9752-a260-4ff7-a5f9-15cf4a282060