LEGEND: Large-scale Evidence Generation and Evaluation across a Network of Databases

The Observational Health Data Sciences and Informatics (OHDSI) international collaborative have launched a research initiative known as LEGEND. The aim of this initiative is to generate evidence on the effects of medical interventions, using observational healthcare databases.

Observational study bias

Real-world evidence (RWE) can fill evidence gaps within medicine. Experts typically obtain RWE from existing healthcare data, such as electronic health records (EHRs). However, many argue that these data cannot be used to estimate causal treatment effects because of the potential for observational study bias. One of the main sources of this bias is confounding. Because researchers do not assign treatment randomly, one treatment group may differ from another in ways that affect the risk of the outcome, so observational studies are prone to detecting spurious effects. Although observational studies often attempt to correct for this, many confounders are unknown or unmeasured and therefore cannot be adjusted for correctly.

Other concerns with observational research are P hacking and publication bias. P hacking is when a researcher runs multiple variations of an analysis until they obtain their desired result. Publication bias occurs when journals selectively publish statistically significant results or when authors only submit studies with positive effects. Both can increase false positive rates in published research.


To address these issues, a team from the OHDSI international collaborative have developed a new initiative. In the Journal of the American Medical Informatics Association, the team presented LEGEND.

A key element of LEGEND is that evidence is generated at scale, which allows researchers to address many research questions in a single study. The team believe this shift to large-scale analyses will enhance the comprehensiveness of the evidence base. They also suggest that disseminating all generated evidence without filtering will prevent P hacking and publication bias. In addition, applying a systematic approach to all these questions allows researchers to evaluate the performance of the evidence generation process. LEGEND includes control questions where the answer is known. This allows researchers to measure operating characteristics, such as how often a confidence interval contains the known answer, and to calibrate confidence intervals and P values. Importantly, the team highlight that performing the analysis in a network of databases allows researchers to observe whether findings in one database replicate in another. In turn, this will enhance the reproducibility of the findings.
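To make the idea of measuring operating characteristics concrete, here is a minimal Python sketch. It is not LEGEND's code: the function names and the numbers are hypothetical, and it assumes each control question yields a log effect estimate with a standard error and that the true effect for these controls is known to be 1 (no effect). It computes coverage, the share of 95% confidence intervals that contain the known answer.

```python
import math

Z_95 = 1.959963984540054  # two-sided 95% quantile of the standard normal


def ci_95(log_rr, se):
    """Return the 95% confidence interval on the ratio scale."""
    return math.exp(log_rr - Z_95 * se), math.exp(log_rr + Z_95 * se)


def coverage(estimates, true_rr=1.0):
    """Fraction of control estimates whose 95% CI contains the known effect.

    estimates: list of (log_rr, se) pairs, one per control question.
    """
    hits = 0
    for log_rr, se in estimates:
        lower, upper = ci_95(log_rr, se)
        if lower <= true_rr <= upper:
            hits += 1
    return hits / len(estimates)


# Hypothetical estimates for four control questions with no true effect.
negative_controls = [(0.05, 0.10), (0.30, 0.10), (-0.02, 0.20), (0.45, 0.15)]
print(coverage(negative_controls))
```

If a procedure were unbiased, coverage on such controls would be close to 95%. A much lower value, as in this made-up example, would point to systematic error that the stated confidence intervals do not account for.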

Guiding principles and study overview

Below are the guiding principles of LEGEND:

  1. LEGEND will generate evidence at a large scale.
  2. Dissemination of the evidence will not depend on the estimated effects.
  3. LEGEND will generate evidence using a prespecified analysis design.
  4. It will generate evidence by applying a systematic process across all research questions.
  5. It will generate evidence using best practices.
  6. LEGEND will include empirical evaluation through the use of control questions.
  7. LEGEND will generate evidence using open-source software that is freely available to all.
  8. It will not be used to evaluate new methods.
  9. It will generate evidence across a network of multiple databases.
  10. LEGEND will maintain data confidentiality. Researchers should not share patient-level data between sites in the network.

A typical LEGEND study begins with defining a large set of research questions (Principle 1). Researchers would also define control questions with known answers (Principle 6). Next, researchers would apply a systematic causal effect estimation procedure that reflects best practices (Principles 5 and 8). This would generate estimates for all questions (Principle 4) from an international network of healthcare databases (Principle 9). Every site would run the analysis locally and only share aggregated statistics (Principle 10). Investigators would then use the effect estimates for the control questions to estimate systematic error and empirically calibrate the results. Researchers would make the results available in online databases that are accessible through various online applications (Principle 2). They would also prespecify the protocol and make it available online (Principle 3), along with the open-source code for executing the study (Principle 7).
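The calibration step can be illustrated with a simplified Python sketch. This is an assumption-laden illustration, not the procedure LEGEND uses: it treats systematic error as normally distributed, fits it to negative control estimates with a basic method-of-moments calculation, and then recomputes a P value against that empirical null. All function names and numbers are hypothetical.

```python
import math
from statistics import NormalDist, mean, variance


def fit_systematic_error(controls):
    """Estimate the mean and SD of systematic error from negative controls.

    controls: list of (log_rr, se) pairs whose true log effect is 0.
    Assumption: observed variance = systematic variance + average
    sampling variance (a method-of-moments simplification).
    """
    log_rrs = [log_rr for log_rr, _ in controls]
    mu = mean(log_rrs)
    sampling_var = mean(se ** 2 for _, se in controls)
    sigma2 = max(variance(log_rrs) - sampling_var, 0.0)
    return mu, math.sqrt(sigma2)


def uncalibrated_p(log_rr, se):
    """Conventional two-sided P value, assuming no systematic error."""
    z = log_rr / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def calibrated_p(log_rr, se, mu, sigma):
    """Two-sided P value against the empirical null N(mu, sigma^2 + se^2)."""
    z = (log_rr - mu) / math.sqrt(sigma ** 2 + se ** 2)
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical negative control estimates that lean away from no effect.
controls = [(0.05, 0.10), (0.45, 0.12), (0.10, 0.08), (0.40, 0.15), (0.20, 0.10)]
mu, sigma = fit_systematic_error(controls)

# Hypothetical estimate for a question of interest.
p_before = uncalibrated_p(0.30, 0.10)
p_after = calibrated_p(0.30, 0.10, mu, sigma)
print(p_before < 0.05, p_after < 0.05)
```

In this made-up example the estimate of interest looks significant on its own, but it sits well within the spread of the control estimates, so the calibrated P value no longer crosses the 0.05 threshold. Production implementations typically fit the error distribution by maximum likelihood rather than the shortcut used here.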

Concluding comments

LEGEND is a new approach that aims to generate evidence from healthcare data while overcoming weaknesses seen in current processes. Its guiding principles seek to avoid study bias, P hacking and publication bias. LEGEND also aims to increase existing knowledge by generating reliable evidence from existing healthcare data. The researchers designed it to answer many research questions simultaneously, using a transparent, reproducible and systematic approach. The team believe that evidence generated by LEGEND could help inform medical decision-making where evidence is currently lacking.

