Evaluation of data quality practices

Data quality is critical to avoid the ‘garbage in, garbage out’ paradigm. We summarise a recent systematic review, published in JAMIA, that assessed the practice of data quality assessment.

Data quality

In recent years, there has been a surge of national and international clinical research networks (CRNs) curating large collections of real-world data (RWD). One prominent example is the National Patient-Centered Clinical Research Network (PCORnet). This CRN contains data on more than 66 million patients across the United States.

RWD is becoming increasingly important to support a wide range of healthcare and regulatory decisions. Nonetheless, there are ongoing concerns regarding the quality of RWD. Data quality (DQ) issues, such as incompleteness, inconsistency and inaccuracy, are widely reported and discussed. To maximise the potential of RWD, DQ must be systematically assessed and understood.

Several DQ frameworks have been developed over time. However, the most recent review of DQ assessment for electronic health record (EHR) data dates from 2013, and few studies have explored the practice of DQ assessment in large clinical networks.

Evaluation of practices

In this study, researchers aimed to identify gaps in the existing PCORnet data characterisation process. They first conducted a systematic review of existing DQ literature related to RWD. Then, they organised the existing DQ dimensions and the methods used to assess these dimensions. Finally, they reviewed the dimensions and corresponding methods used in the PCORnet data characterisation process to assess the DQ practice in PCORnet and how it has evolved.

They analysed a total of 3 reviews, 20 DQ frameworks and 226 DQ studies, and extracted 14 DQ dimensions and 10 assessment methods. They found that completeness, concordance and correctness/accuracy were the most commonly assessed dimensions. Common DQ assessment methods included element presence, validity check and conformance. These methods were also the main focus of the PCORnet data checks.
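
To make these method names more concrete, the following is a minimal sketch of what element presence, conformance and validity checks might look like on a handful of toy records. It is an illustration only, not the PCORnet data checks or the review's methodology: the field names (`birth_date`, `age`), the ISO date format, the 0 to 120 age range and all function names are assumptions chosen for the example.

```python
import re
from typing import Callable, Iterable, Mapping, Optional

# Hypothetical record shape: field name -> string value (or None if missing).
Record = Mapping[str, Optional[str]]


def element_presence(records: Iterable[Record], field: str) -> float:
    """Completeness: share of records in which `field` has a non-empty value."""
    records = list(records)
    if not records:
        return 0.0
    present = sum(1 for r in records if r.get(field) not in (None, ""))
    return present / len(records)


def pass_rate(records: Iterable[Record], field: str,
              check: Callable[[str], bool]) -> float:
    """Share of populated values of `field` that satisfy `check`."""
    values = [r[field] for r in records if r.get(field) not in (None, "")]
    if not values:
        return 0.0
    return sum(1 for v in values if check(v)) / len(values)


# Conformance (assumed rule): dates must be formatted as YYYY-MM-DD.
ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def conforms_to_iso_date(value: str) -> bool:
    return ISO_DATE.fullmatch(value) is not None


# Validity (assumed rule): age must be a whole number between 0 and 120.
def plausible_age(value: str) -> bool:
    return value.isdecimal() and 0 <= int(value) <= 120


# Toy data, invented for this example.
patients = [
    {"birth_date": "1980-02-17", "age": "44"},
    {"birth_date": "17/02/1980", "age": "230"},
    {"birth_date": "", "age": "51"},
    {"birth_date": None, "age": None},
]

print("birth_date presence:", element_presence(patients, "birth_date"))
print("birth_date conformance:", pass_rate(patients, "birth_date", conforms_to_iso_date))
print("age presence:", element_presence(patients, "age"))
print("age validity:", pass_rate(patients, "age", plausible_age))
```

Reporting each check as a rate over populated values, rather than as a single pass or fail flag, keeps completeness separate from conformance and validity, so a field that is rarely filled in is not also counted as badly formatted.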

Moreover, they found that definitions of DQ dimensions and methods were not consistent in the literature. In addition, attention was unevenly distributed across dimensions; for example, usability and ease-of-use were rarely discussed.

Although DQ assessment is already practised, several challenges remain. As research using RWD is adopted and promoted more widely, DQ issues will become increasingly important. Future work is needed to produce DQ measures that are understandable, executable and reusable.
