CardioBERTpt: Transformer-based Models for Cardiology Language Representation in Portuguese
dc.contributor | Sistema FMUSP-HC: Faculdade de Medicina da Universidade de São Paulo (FMUSP) e Hospital das Clínicas da FMUSP | |
dc.contributor.author | SCHNEIDER, Elisa Terumi Rubel | |
dc.contributor.author | GUMIEL, Yohan Bonescki | |
dc.contributor.author | SOUZA, Joao Vitor Andrioli de | |
dc.contributor.author | MUKAI, Lilian Mie | |
dc.contributor.author | OLIVEIRA, Lucas Emanuel Silva e | |
dc.contributor.author | REBELO, Marina de Sa | |
dc.contributor.author | GUTIERREZ, Marco Antonio | |
dc.contributor.author | KRIEGER, Jose Eduardo | |
dc.contributor.author | TEODORO, Douglas | |
dc.contributor.author | MORO, Claudia | |
dc.contributor.author | PARAISO, Emerson Cabrera | |
dc.date.accessioned | 2023-11-16T20:01:57Z | |
dc.date.available | 2023-11-16T20:01:57Z | |
dc.date.issued | 2023 | |
dc.description.abstract | Contextual word embeddings and the Transformer architecture have reached state-of-the-art results in many natural language processing (NLP) tasks and improved the adaptation of models to multiple domains. Despite the improvement in the reuse and construction of models, few resources have been developed for the Portuguese language, especially in the health domain. Furthermore, the clinical models available for the language are not representative enough for all medical specialties. This work explores deep contextual embedding models for the Portuguese language to support clinical NLP tasks. We transferred learned information from electronic health records of a Brazilian tertiary hospital specialized in cardiology and pre-trained multiple clinical BERT-based models. We evaluated the performance of these models in named entity recognition experiments, fine-tuning them on two annotated corpora containing clinical narratives. Our pre-trained models outperformed previous multilingual and Portuguese BERT-based models for cardiology and multi-specialty environments, reaching the state of the art on the analyzed corpora, with an F1 score improvement of 5.5% on the TempClinBr corpus (all entities) and 1.7% on the SemClinBr corpus (Disorder entity). Hence, we demonstrate that data representativeness and a high volume of training data can improve the results for clinical tasks, consistent with results for other languages. | eng |
dc.description.conferencedate | JUN 22-24, 2023 | |
dc.description.conferencelocal | L'Aquila, ITALY | |
dc.description.conferencename | 36th IEEE International Symposium on Computer-Based Medical Systems (CBMS) | |
dc.description.index | PubMed | |
dc.description.index | WoS | |
dc.description.sponsorship | Coordenacao de Aperfeicoamento de Pessoal de Nivel Superior - Brasil (CAPES) [001] | |
dc.description.sponsorship | Foxconn Brazil | |
dc.description.sponsorship | Zerbini Foundation as part of the research project Machine Learning in Cardiovascular Medicine | |
dc.identifier.citation | 2023 IEEE 36TH INTERNATIONAL SYMPOSIUM ON COMPUTER-BASED MEDICAL SYSTEMS, CBMS, p.378-381, 2023 | |
dc.identifier.doi | 10.1109/CBMS58004.2023.00075 | |
dc.identifier.isbn | 979-8-3503-1224-9 | |
dc.identifier.issn | 2372-9198 | |
dc.identifier.uri | https://observatorio.fm.usp.br/handle/OPI/56999 | |
dc.language.iso | eng | |
dc.publisher | IEEE COMPUTER SOC | eng |
dc.relation.ispartof | 2023 IEEE 36th International Symposium on Computer-Based Medical Systems, CBMS | |
dc.relation.ispartofseries | IEEE International Symposium on Computer-Based Medical Systems | |
dc.rights | restrictedAccess | eng |
dc.rights.holder | Copyright IEEE COMPUTER SOC | eng |
dc.subject | natural language processing | eng |
dc.subject | transformer | eng |
dc.subject | clinical texts | eng |
dc.subject | language model | eng |
dc.subject.wos | Computer Science, Artificial Intelligence | eng |
dc.subject.wos | Computer Science, Information Systems | eng |
dc.subject.wos | Engineering, Biomedical | eng |
dc.title | CardioBERTpt: Transformer-based Models for Cardiology Language Representation in Portuguese | eng |
dc.type | conferenceObject | eng |
dc.type.category | proceedings paper | eng |
dc.type.version | publishedVersion | eng |
dspace.entity.type | Publication | |
hcfmusp.affiliation.country | Switzerland | |
hcfmusp.affiliation.countryiso | ch | |
hcfmusp.author.external | SCHNEIDER, Elisa Terumi Rubel:Pontificia Univ Catolica Parana, Curitiba, Parana, Brazil | |
hcfmusp.author.external | GUMIEL, Yohan Bonescki:Pontificia Univ Catolica Parana, Inst Heart, InCor, HC FMUSP, Curitiba, Parana, Brazil | |
hcfmusp.author.external | SOUZA, Joao Vitor Andrioli de:Comsentimento, Curitiba, Parana, Brazil | |
hcfmusp.author.external | MUKAI, Lilian Mie:Pontificia Univ Catolica Parana, Inst Heart, InCor, HC FMUSP, Curitiba, Parana, Brazil | |
hcfmusp.author.external | OLIVEIRA, Lucas Emanuel Silva e:Comsentimento, Curitiba, Parana, Brazil | |
hcfmusp.author.external | TEODORO, Douglas:Univ Geneva, Geneva, Switzerland | |
hcfmusp.author.external | MORO, Claudia:Pontificia Univ Catolica Parana, Curitiba, Parana, Brazil | |
hcfmusp.author.external | PARAISO, Emerson Cabrera:Pontificia Univ Catolica Parana, Curitiba, Parana, Brazil | |
hcfmusp.contributor.author-fmusphc | MARINA DE FATIMA DE SA REBELO | |
hcfmusp.contributor.author-fmusphc | MARCO ANTONIO GUTIERREZ | |
hcfmusp.contributor.author-fmusphc | JOSE EDUARDO KRIEGER | |
hcfmusp.description.beginpage | 378 | |
hcfmusp.description.endpage | 381 | |
hcfmusp.origem | WOS | |
hcfmusp.origem.wos | WOS:001037777900066 | |
hcfmusp.publisher.city | LOS ALAMITOS | eng |
hcfmusp.publisher.country | USA | eng |
hcfmusp.relation.reference | Alsentzer Emily, 2019, P 2 CLIN NATURAL LAN, DOI 10.18653/V1/W19-1909 | eng |
hcfmusp.relation.reference | Brown T. B., 2020, P ADV NEUR INF PROC, V33, P1877 | eng |
hcfmusp.relation.reference | Devlin J, 2019, 2019 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL HLT 2019), VOL. 1, P4171 | eng |
hcfmusp.relation.reference | Kalyan KS, 2022, J BIOMED INFORM, V126, DOI 10.1016/j.jbi.2021.103982 | eng |
hcfmusp.relation.reference | Laparra Egoitz, 2021, Yearb Med Inform, V30, P239, DOI 10.1055/s-0041-1726522 | eng |
hcfmusp.relation.reference | Lee J, 2020, BIOINFORMATICS, V36, P1234, DOI 10.1093/bioinformatics/btz682 | eng |
hcfmusp.relation.reference | Schneider ETR, 2020, P 3 CLIN NATURAL LAN, P65, DOI 10.18653/V1/2020.CLINICALNLP-1.7 | eng |
hcfmusp.relation.reference | Oliveira LESE, 2022, J BIOMED SEMANT, V13, DOI 10.1186/s13326-022-00269-1 | eng |
hcfmusp.relation.reference | Souza Fabio, 2020, Intelligent Systems. 9th Brazilian Conference, BRACIS 2020. Proceedings. Lecture Notes in Artificial Intelligence. Subseries of Lecture Notes in Computer Science (LNAI 12319), P403, DOI 10.1007/978-3-030-61377-8_28 | eng |
hcfmusp.relation.reference | Tamine L, 2021, ACM COMPUT SURV, V54, DOI 10.1145/3462476 | eng |
hcfmusp.relation.reference | TempClinBr, 2023, US | eng |
hcfmusp.relation.reference | Turchioe MR, 2022, HEART, V108, P909, DOI 10.1136/heartjnl-2021-319769 | eng |
hcfmusp.relation.reference | Yu Gu, 2022, ACM Transactions on Computing and Healthcare, V3, DOI 10.1145/3458754 | eng |
relation.isAuthorOfPublication | 6c2f9752-a260-4ff7-a5f9-15cf4a282060 | |
relation.isAuthorOfPublication | 23ec3b55-50df-4630-902e-bedbb470fecb | |
relation.isAuthorOfPublication | a970d450-bcd4-4662-94d6-ad1c6d043b3c | |
relation.isAuthorOfPublication.latestForDiscovery | 6c2f9752-a260-4ff7-a5f9-15cf4a282060 |