Most Patient Records Within A Large Cancer Registry-Based Real-World Data Source Have Missing Data


A significant number of patient records in large cancer registries are missing data, which can have an impact on real-world data studies.

Investigators found a high prevalence of missing data within a large cancer registry-based real-world data (RWD) source, which emphasizes a need for documentation improvements, according to results of a recent retrospective cohort study published in JAMA Oncology.

RWD sources, which provide routine information on patient health status and delivery of health care, are flexible and cost-effective ways to investigate clinical interventions and to supplement data from clinical trials, which are often high-cost and slow-paced. Cancer registries are considered important sources of RWD that can help provide insights for analysis of different therapies. Data quality is vital when working with registries, especially as emerging data suggests treatment-associated survival outcomes differ between registries and randomized clinical trials.

“There is a need to assess the quality of clinical data generated from registries and other RWD sources and to examine whether these sources have adhered to best data practices,” the study authors wrote.

Missing data can also affect clinical care, as patient information is often used to guide treatment decisions. In this study, the researchers aimed to assess whether the characteristics and overall survival of patients with missing data were comparable to those of patients with complete data.

The researchers examined the prevalence of patient records with missing data and the association with overall survival among patients with cancer in a large cancer registry. They assessed the records of patients with the three most common cancers in the U.S. (non-small cell lung cancer [NSCLC], breast cancer and prostate cancer) and compared overall survival between patients with complete data and those with missing data.

A total of 63 final variables of interest were identified to compare the patients’ data. Among them, there were 14 demographic variables (22.2%), six tumor characteristic variables (9.5%), 13 cancer stage variables (20.6%) and 30 treatment variables (47.6%).

The study authors found differences in demographic characteristics, cancer stage and treatments received. Among the 851,295 patients with NSCLC who had missing data, 95,560 (11.2%) were Black, 25,102 (2.9%) were Hispanic, 375,298 (44.1%) had stage 4 disease and 178,671 (21.0%) underwent surgery at the primary tumor site. Among the 1,161,096 patients with breast cancer who had missing data, 137,369 (11.8%) were Black, 66,997 (5.8%) were Hispanic, 51,889 (4.5%) had stage 4 disease and 1,046,754 (90.2%) underwent surgery at the primary tumor site. Among the 460,167 patients with prostate cancer who had missing data, 67,160 (14.6%) were Black, 20,141 (4.4%) were Hispanic, 44,650 (9.7%) had stage 4 disease and 249,492 (54.2%) underwent surgery at the primary tumor site.

When evaluating the association between missing data and overall survival, the researchers found an absolute 2-year overall survival difference between patients with complete and missing data (that is, the difference in the proportion of patients still alive at two years, with death from any cause counted) of 18.4% for patients with NSCLC, 0.7% for patients with breast cancer and 4.6% for patients with prostate cancer. Among patients with nonmetastatic cancer, the absolute survival differences were smaller for breast cancer (0.4%) and prostate cancer (1.1%). Among patients with metastatic disease, the survival differences were larger: 4.5% for breast cancer and 16.7% for prostate cancer.
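
For readers who want to see the arithmetic, an absolute survival difference is a subtraction of two proportions. The sketch below is illustrative only: the function names and the counts are hypothetical and are not taken from the study. It also assumes every patient's status at two years is known, whereas a real analysis would typically need to account for patients with shorter follow-up.

```python
def two_year_os(alive_at_two_years: int, total: int) -> float:
    """Proportion of a group still alive at two years (death from any cause)."""
    if total <= 0:
        raise ValueError("total must be positive")
    return alive_at_two_years / total


def absolute_os_difference(os_complete: float, os_missing: float) -> float:
    """Absolute difference between two survival proportions, in percentage points."""
    return abs(os_complete - os_missing) * 100


# Hypothetical counts for illustration; these are NOT the study's data.
complete = two_year_os(alive_at_two_years=600, total=1000)
missing = two_year_os(alive_at_two_years=450, total=1000)
difference = absolute_os_difference(complete, missing)
print(f"{difference:.1f} percentage points")
```

An absolute difference is reported in percentage points, so it should not be read as a relative change in survival.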

A possible limitation of the study was that the authors could not draw conclusions on other outcomes such as toxic effects, disease recurrences or factors associated with death.

According to the authors, these findings suggest substantial gaps in documenting and capturing data for patients with cancer, especially with regard to demographic characteristics, tumor characteristics and treatments received. Missing data was more prevalent among Black patients and patients from other racial and ethnic minority groups, reflecting health care access and treatment disparities. Records of patients with fewer comorbid conditions were more likely to have missing data, which could be attributed to fewer medical visits. Patients with advanced-stage cancer also had missing data more frequently, which the authors noted may be due to the increased complexity of care needed for these patients.

“The high prevalence of missing data suggests that continued investment in data exchange standards remains an important step toward addressing the missing RWD problem for patients with cancer,” the authors wrote. They suggested several methods to address missing data, such as a missing data indicator, better adherence to data entry and natural language processing tools. Several of these efforts are already underway.
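
As a rough illustration of what a missing data indicator can look like in practice, the sketch below adds a true/false flag next to each field of a patient record so that missingness is recorded explicitly rather than silently dropped. The function name and the field names (`race`, `stage`, `surgery`) are hypothetical, and this is not the method used by the study authors.

```python
from typing import Any


def add_missing_indicators(record: dict[str, Any], fields: list[str]) -> dict[str, Any]:
    """Return a copy of the record with a '<field>_missing' flag for each field."""
    flagged = dict(record)  # copy so the original record is left unchanged
    for field in fields:
        value = record.get(field)  # None when the field is absent entirely
        flagged[f"{field}_missing"] = value is None or value == ""
    return flagged


# Hypothetical record for illustration; not real patient data.
record = {"stage": "4", "race": None}
flagged = add_missing_indicators(record, ["stage", "race", "surgery"])
print(flagged)
```

Keeping the flags alongside the original values lets an analysis compare records with and without a given piece of information instead of excluding incomplete records outright.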
