REPLICCAR II Study: Data Quality Audit in the Paulista Cardiovascular Surgery Registry

Background: Electronic health record databases are important sources of data for research and health practice. The aim of this study was to assess the quality of the data in REPLICCAR II, the Brazilian cardiovascular surgery database based in São Paulo State. Study Design: The REPLICCAR II database contains data from 9 institutions in São Paulo, with more than 700 variables. We audited data entry at 6 months (n = 107 records) and 1 year (n = 2229 records) after the start of data collection. We present a modified Aggregate Data Quality Score (ADQ) for 30 variables in this analysis. Results: The agreement between the data independently entered by a database operator and a researcher was good for categorical data (Cohen κ = 0.70, 95% CI 0.59, 0.83). For continuous data, the intraclass correlation coefficient (ICC) was high, with only 2 of 15 continuous variables having an ICC of less than 0.90. In an indirect audit, 74% of the selected variables (n = 23) showed a good ADQ score regarding completeness and reliability. Conclusions: Data entry in the REPLICCAR II database is satisfactory and can provide accurate and reliable data for research in cardiovascular surgery in Brazil.

showing that just adhering to a quality improvement initiative could already impact mortality rates 19.

The development of the Paulista Registry of Cardiovascular Surgery (REPLICCAR II), a multicenter prospective cohort study coordinated by the Instituto do Coração do Estado de São Paulo (InCor) and aimed at evaluating morbidity and mortality predictors in patients undergoing coronary artery bypass graft (CABG) surgery, constitutes a definite example of the concept. Data collection and analysis were performed according to guidelines set by professionals from different areas, forming an interface between research and clinical practice. The adoption of quality-oriented data analysis then becomes imperative to assure the validity of its outcomes, with the intent of enhancing its prospective clinical impact 6.

The aim of the present study was to present the results of direct and indirect audits of the data quality of the registries included in the REPLICCAR II database after 6 months and

REPLICCAR II includes more than 700 variables, among which are factors related to general facts about the patients, their pre-, intra-, and postoperative assessments and their […] participant center responsible for mobilizing a team for the task, as well as being free to designate the person responsible, usually a medical resident. All researchers responsible for data gathering were previously trained on how to fill out the forms correctly.

Data gathering was performed using the online platform REDCap-HCFMUSP (Vanderbilt, Tennessee, USA/https://redcap.hc.fm.usp.br/), accessible from any computer with an internet connection, with access restricted to selected researchers. The data are stored in real time on a secure server at the University of São Paulo Medical School. This

Direct audit

A direct audit was carried out after 6 months of data collection; 7% (107 records) of the data collected at each center until February 2018 were randomly selected with STATA 13.1 software (StataCorp, Texas, USA) for re-collection by experienced independent investigators (auditors) within the team, who visited each center for this task.
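The selection itself was done in STATA; purely as an illustration, the per-center random draw described above can be sketched in Python. The function name `sample_for_audit`, the `(record_id, center)` input layout, the fixed seed, and the choice to round up with at least one record per center are all assumptions made for this sketch, not details reported in the study.

```python
import random
from collections import defaultdict


def sample_for_audit(records, percent=7, seed=2018):
    """Draw a simple random sample of `percent`% of the records of each center.

    `records` is an iterable of (record_id, center) pairs (hypothetical layout).
    Returns a dict mapping each center to the sorted list of sampled record ids.
    """
    rng = random.Random(seed)  # fixed seed so the draw can be reproduced

    # Group record ids by center so that every center is sampled separately.
    by_center = defaultdict(list)
    for record_id, center in records:
        by_center[center].append(record_id)

    selected = {}
    for center in sorted(by_center):
        ids = by_center[center]
        # Integer ceiling of percent% of the center's records, at least one
        # record (assumption: the source does not state the rounding rule).
        k = max(1, (percent * len(ids) + 99) // 100)
        selected[center] = sorted(rng.sample(ids, k))
    return selected


if __name__ == "__main__":
    demo = [(i, "Center A") for i in range(100)]
    demo += [(i, "Center B") for i in range(100, 150)]
    for center, ids in sample_for_audit(demo).items():
        print(center, len(ids), ids)
```

Sampling within each center, rather than from the pooled registry, keeps every institution represented in the audit regardless of its volume.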

The auditors, with full access to each center's own previously available database, re-collected these data under two fundamental conditions: (i) that they were blinded to the original record and (ii) that each one did not re-collect the same data they had originally input. The original and the re-collected data then underwent statistical analysis to check for accuracy in data collection.
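For categorical variables, agreement between the original and the re-collected entries is summarized in this study with Cohen's κ. A minimal sketch of that statistic follows, assuming two equal-length lists of category labels for the same records; the function name `cohen_kappa` and the example values are hypothetical, and the confidence interval reported in the study is not computed here.

```python
from collections import Counter


def cohen_kappa(original, recollected):
    """Cohen's kappa for two lists of categorical labels on the same records."""
    if len(original) != len(recollected) or not original:
        raise ValueError("both inputs must be non-empty and of equal length")
    n = len(original)

    # Observed agreement: share of records where both entries match.
    observed = sum(a == b for a, b in zip(original, recollected)) / n

    # Chance agreement: product of the two marginal proportions, per category.
    freq_a = Counter(original)
    freq_b = Counter(recollected)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)

    if expected == 1.0:
        # Both sources used one identical category throughout; kappa is
        # formally undefined (0/0). Returning 1.0 is a convention chosen here.
        return 1.0
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    entered = ["yes", "yes", "no", "no"]
    audited = ["yes", "no", "no", "no"]
    print(cohen_kappa(entered, audited))
```

In the small example, three of four records agree (observed agreement 0.75) while chance agreement is 0.5, so κ is 0.5.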

In this sample, the data for glycated hemoglobin, total bilirubin, and albumin levels were insufficient for analysis, but these variables are not mandatory in the registry.

We propose criteria and definitions (Table 4) […]

The outcome variable (operative death) had 85% completeness and 92% reliability. Among the inconsistencies related to mortality, we verified that cases of intraoperative death were neglected in the variable operative death. To rectify this unreliability, we inserted in the system a script that counts cases of surgery without admission to the intensive care unit in the immediate postoperative period as death on the day of surgery.
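The correction rule just described can be illustrated with a short sketch. This is not the script actually installed in the registry system: the `Record` class, its field names, and the use of a missing ICU admission date to represent "no admission to the intensive care unit" are assumptions made for illustration.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Record:
    """Hypothetical subset of a registry record (field names are assumptions)."""
    surgery_date: date
    icu_admission_date: Optional[date]  # None: no ICU admission was recorded
    operative_death: Optional[bool] = None
    death_date: Optional[date] = None


def apply_intraoperative_death_rule(record: Record) -> Record:
    """Count surgery without postoperative ICU admission as death on surgery day."""
    if record.icu_admission_date is None:
        # No ICU admission after surgery: treat as intraoperative death.
        return replace(record, operative_death=True,
                       death_date=record.surgery_date)
    # ICU admission exists: leave the record exactly as it was entered.
    return record


if __name__ == "__main__":
    no_icu = Record(surgery_date=date(2018, 1, 10), icu_admission_date=None)
    print(apply_intraoperative_death_rule(no_icu))
```

Returning a corrected copy instead of editing the record in place keeps the originally entered values available for later audits.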

Patients with unknown or incomplete 30-day mortality status could potentially introduce bias. However, "in-hospital mortality" was almost completely recorded, and

However, we must remember that these results do not present improvement strategies for the quality of the records, because we cannot pinpoint the data collection error based solely on these parameters. Newly proposed parameters, such as the ADQ, may provide faster, more practical, and lower-cost analysis of generic data quality. Another evaluation that could be performed is the ADQ by each center, which could then be used […]

This work shows the seriousness of and commitment to this project, which is concerned not only with its development and implementation but also with the quality of its data.

The reliability and completeness of medical records are essential to the validity and reliability of the results obtained. Indirect auditing gave clear directions for data improvement, without the need to re-collect a sample to evaluate concordance.

In our experience, the best strategies to improve data quality, such that the information can be reviewed at the moment the investigator is entering the data, are periodic reports with detailed feedback and, above all, maintaining a sound scientific partnership, with regular meetings to integrate well with the working groups in each institution.

Finding discrepancies in the data only reinforces the need for quality-oriented statistical studies, because data quality directly influences the validity, the analysis, and the conclusions of research. In places where such studies and their application are still underdeveloped, as in Brazil, studies in this field become even more indispensable.

A focus on data quality ultimately leads to a more efficient and safer healthcare system, and it will surely play an increasingly major role in its development. The main objective of the present work was to implement improvement actions in a way that guarantees the safety and validity of the results, as well as allowing feedback on REPLICCAR II itself. As an STS-based database, this project could provide the basis for a wider and reliable quality-focused program, with the prospect of a positive impact on clinical outcomes.

Our experience reinforces the importance of training, incentives, and standardization for the staff who collect the data and fill out the forms, which brings greater benefits at substantially lower cost than direct auditing with the still traditional Raters Agreement Analysis. The latter demands more investigators to collect the data at each institution and extensive data analysis periods, and its results relate to the understanding of concepts and criteria. Indirect auditing was more practical for elaborating data quality strategies and shows the best parameters of data quality in prospective observational studies. It is therefore expected to attract more attention in studies yet to come, although there is still a lot of room for research on parameters for measuring data quality in the healthcare sciences.

Funding: This work was supported by the Programa de Pesquisa do SUS (PPSUS).

Conflict of interest: none declared.