A case study of binary outcome data extraction across three systematic reviews of hip arthroplasty: errors and differences of selection

3 February 2014

Abstract

Background

Data extraction is a key stage in systematic review, yet it is the subject of little research. The aim of the present research was to use a small case study to highlight some important issues affecting this fundamental process.

Methods

The authors undertook an analysis of differences in the binary event data extracted and analysed by three systematic reviews on the same topic: a comparison of total hip arthroplasty and hemiarthroplasty. Binary event data were extracted, from those trials common to all three reviews, for three key outcomes also common to all three reviews: dislocation rates, one-year mortality, and revision rates. Differences between the data extracted by the three reviews were categorised as either errors or an issue of data selection. Meta-analysis was performed to assess whether these differences led to differences in summary estimates of effect.

Results

Across the three outcomes, differences in selection accounted for between 8% and 42% of the data differences between reviews, and errors accounted for between 8% and 17%. In none of the cases of data selection was a rationale given for the choice of event data being reported. These differences did lead to small differences in meta-analysed relative risks between the two treatments in the three reviews, but none was significant.

Conclusion

Systematic reviewers should use double data extraction to minimise error and also make every effort to clarify or explain their choice of data, within the scope of their publication. Reviewers frequently exercise selection when faced with a choice of alternative but potentially equally appropriate data for an outcome. However, this selection is rarely made clear by review authors. Systematic review was developed as a method specifically to be both reproducible and transparent. This case study suggests that neither objective is always being achieved.

Background

Data extraction or abstraction is a crucial stage in systematic review. It is the stage that generates the data to be analysed. However, it is a relatively under-researched area of the systematic review process compared to information retrieval, the assessment of bias, and methods of synthesis. The little research that has been conducted has been fundamental to our understanding of some of the principal limitations affecting the process.

This research has found that errors in data extraction can occur frequently. One study found errors in 20 of the 34 Cochrane reviews assessed, and one evaluation of data extraction reported error rates as high as 31%.1 Data extraction errors may occur in any variable extracted for a review, but outcomes appear to generate the most errors: error rates for outcome data have been found to be as high as 77%.2 Errors have been defined in various ways in these studies, principally as inaccuracies, omissions, inadequacies and incomplete data.1,2 Previous assessments have covered all fields in data extraction, from design and inclusion criteria to actual outcomes. Error rates are apparently unaffected by reviewer experience,1 but can be reduced to a small degree by double data extraction.2 Wherever it has been evaluated, the impact of these errors on key outcomes has been found not to be statistically significant.
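
Double data extraction, as discussed above, amounts to two reviewers extracting the same fields independently and then reconciling any disagreements. The following is a minimal sketch of the comparison step only, not a method taken from the studies cited. The record layout (events and totals keyed by trial, arm and outcome), the function name `find_discrepancies` and all values shown are hypothetical and chosen purely for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical layout: (trial, arm, outcome) -> (events, total participants).
Key = Tuple[str, str, str]
Counts = Tuple[int, int]
Extraction = Dict[Key, Counts]


def find_discrepancies(
    first: Extraction, second: Extraction
) -> List[Tuple[Key, Optional[Counts], Optional[Counts]]]:
    """List every data point on which two independent extractions disagree.

    A data point extracted by only one reviewer is reported with None
    for the other reviewer, so omissions are flagged as well as mismatches.
    """
    discrepancies = []
    for key in sorted(set(first) | set(second)):
        a = first.get(key)
        b = second.get(key)
        if a != b:
            discrepancies.append((key, a, b))
    return discrepancies


# Made-up values for illustration only; these are not data from any trial.
reviewer_1 = {
    ("Trial A", "THA", "dislocation"): (3, 40),
    ("Trial A", "HA", "dislocation"): (1, 41),
}
reviewer_2 = {
    ("Trial A", "THA", "dislocation"): (3, 40),
    ("Trial A", "HA", "dislocation"): (0, 41),
}

for key, a, b in find_discrepancies(reviewer_1, reviewer_2):
    print(key, "reviewer 1:", a, "reviewer 2:", b)
```

A comparison of this kind only locates disagreements. Deciding whether a disagreement is an error or a legitimate alternative choice of data still requires returning to the trial report.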

The aim of the present work is to contribute to the small body of literature on data extraction, a time-consuming and crucial stage in the systematic review process. In doing so, it focuses only on key numerical outcome data actually employed in a meta-analysis. This approach was taken because these outcome data are more likely to affect synthesis and findings than other data extracted from a study, and because numerical data potentially present less ambiguity than textual data, such as inclusion or exclusion criteria or descriptions of measures.
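
To illustrate why extracted event counts matter for synthesis, the sketch below computes a relative risk and an approximate 95% confidence interval from binary event data for a single trial, using the standard large-sample (Wald) interval on the log scale. It is an illustration under stated assumptions rather than the analysis performed in this study: the function name `relative_risk` is hypothetical, the counts are made up, and no pooling across trials is attempted.

```python
import math
from typing import Tuple


def relative_risk(
    events_a: int, total_a: int, events_b: int, total_b: int, z: float = 1.96
) -> Tuple[float, float, float]:
    """Relative risk of arm A versus arm B with a Wald interval on the log scale.

    Returns (rr, lower, upper). Zero event counts are rejected because the
    log relative risk and its standard error are undefined without a
    continuity correction, which this sketch does not apply.
    """
    if events_a <= 0 or events_b <= 0:
        raise ValueError("zero events: a continuity correction would be needed")
    rr = (events_a / total_a) / (events_b / total_b)
    # Standard error of log(RR) for two independent binomial proportions.
    se = math.sqrt(1 / events_a - 1 / total_a + 1 / events_b - 1 / total_b)
    lower = math.exp(math.log(rr) - z * se)
    upper = math.exp(math.log(rr) + z * se)
    return rr, lower, upper


# Made-up counts: two reviews extracting different event data for the same trial arm.
review_x = relative_risk(10, 100, 5, 100)
review_y = relative_risk(8, 100, 5, 100)
print("Review X: RR %.2f (%.2f to %.2f)" % review_x)
print("Review Y: RR %.2f (%.2f to %.2f)" % review_y)
```

In this made-up example, a difference of two events in one arm changes the point estimate, which is the kind of shift the meta-analyses in this case study were designed to detect.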

Previous studies have reported that error rates are lower when the variable is “simple”, such as authorship, country of study, gender distribution and numbers enrolled, whereas text variables, which can sometimes be lengthy, such as inclusion and outcome assessment criteria, prove more probl