Application of TrialTranslator, a framework that emulates randomised clinical trials (RCTs) across three prognostic phenotypes identified through machine learning models, showed that, across a range of cancer types and risk groups, the median overall survival treatment benefit for real-world patients was, on average, 3 months lower than that reported in RCTs, and that patients with high-risk phenotypes fared even worse.
Patients with high-risk phenotypes consistently exhibited survival times below published RCT results, according to Dr Qi Long of the Perelman School of Medicine, University of Pennsylvania; Penn Center for Cancer Care Innovation, Abramson Cancer Center; and Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, all in Philadelphia, PA, US; Dr Ravi B. Parikh of the Emory University School of Medicine and Winship Cancer Institute, both in Atlanta, GA, US; and colleagues, who published the findings on 3 January 2025 in Nature Medicine.
Restrictive eligibility criteria are frequently cited as a reason for the limited generalisability of results from RCTs evaluating anticancer agents. Approximately one in five real-world patients with cancer is ineligible for a phase III trial. However, restrictive eligibility criteria alone are unlikely to fully explain the generalisability gap.
An alternative explanation is that physicians selectively recruit patients with better prognoses, specifically those who are younger and have fewer comorbidities, irrespective of eligibility criteria. Additionally, preferential recruitment based on factors such as race or socioeconomic status, both of which are linked to prognosis, may also contribute.
Considering the varied survival outcomes among real-world patients with cancer, accurately translating phase III trial results is crucial for treatment decision-making and advance care planning. Improved methods for translating phase III trials to real-world patients with multiple varying characteristics are necessary.
The authors wrote in the background that a combination of well-curated electronic health record data and machine learning phenotyping could help identify real-world patient groups whose treatment effects align with published RCT results. The recent availability of population-level electronic health record data enriched with clinical factors and molecular biomarkers offers the potential for improved trial emulation through enhanced baseline feature balancing between treatment arms. Machine learning models leveraging these granular datasets may uncover subtle patterns and relationships that are not apparent through conventional analysis, potentially revealing more nuanced prognostic groups.
A comprehensive approach to systematically evaluate RCT generalisability across different prognostic groups in oncology is lacking. In this study, the authors developed TrialTranslator, a framework designed to systematically emulate phase III oncology trials across machine learning-identified prognostic phenotypes to uncover treatment effect heterogeneity in real-world patients.
Using the US nationwide Flatiron Health database of electronic health records, the framework emulated RCTs across three prognostic phenotypes identified through machine learning models. The study team applied this approach to 11 landmark RCTs that investigated anticancer regimens considered standard-of-care for the four most prevalent advanced solid malignancies.
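The idea of emulating a trial separately within each prognostic phenotype can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each patient already has a model-predicted mortality risk score, that the three phenotypes are formed by splitting those scores into tertiles, and that treatment benefit is summarised as the difference in Kaplan-Meier median overall survival between arms. It omits the baseline feature balancing between treatment arms that a real emulation would need. All function names and the toy data layout are hypothetical.

```python
from fractions import Fraction
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical patient record: (risk_score, arm, months_followed, died)
Patient = Tuple[float, str, float, int]


def assign_phenotypes(risk_scores: Sequence[float]) -> List[str]:
    """Label patients low/medium/high by tertile of predicted risk (assumed rule)."""
    n = len(risk_scores)
    order = sorted(range(n), key=lambda i: risk_scores[i])
    labels = [""] * n
    for rank, idx in enumerate(order):
        if 3 * rank < n:
            labels[idx] = "low"
        elif 3 * rank < 2 * n:
            labels[idx] = "medium"
        else:
            labels[idx] = "high"
    return labels


def km_median(times: Sequence[float], events: Sequence[int]) -> Optional[float]:
    """Kaplan-Meier median: first time the survival estimate falls to 0.5 or below."""
    at_risk = len(times)
    survival = Fraction(1)  # exact arithmetic avoids rounding at the 0.5 threshold
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        leaving = sum(1 for ti in times if ti == t)  # deaths plus censored at t
        if deaths:
            survival *= Fraction(at_risk - deaths, at_risk)
            if survival <= Fraction(1, 2):
                return t
        at_risk -= leaving
    return None  # median not reached


def benefit_by_phenotype(patients: Sequence[Patient]) -> Dict[str, Optional[float]]:
    """Median survival difference (treatment minus control) within each phenotype."""
    labels = assign_phenotypes([p[0] for p in patients])
    result: Dict[str, Optional[float]] = {}
    for phenotype in ("low", "medium", "high"):
        medians = {}
        for arm in ("treatment", "control"):
            group = [p for p, lab in zip(patients, labels)
                     if lab == phenotype and p[1] == arm]
            medians[arm] = km_median([p[2] for p in group], [p[3] for p in group])
        if medians["treatment"] is None or medians["control"] is None:
            result[phenotype] = None
        else:
            result[phenotype] = medians["treatment"] - medians["control"]
    return result
```

Comparing the per-phenotype values returned by `benefit_by_phenotype` with the benefit reported in the corresponding RCT is the kind of contrast the framework is designed to surface; the tertile cut-offs and the unadjusted comparison here are simplifications for illustration only.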
The analyses reveal that patients in low-risk and medium-risk phenotypes exhibit survival times and treatment-associated survival benefits similar to those observed in RCTs. In contrast, high-risk phenotypes show significantly lower survival times and treatment-associated survival benefits compared to RCTs. Patients with high-risk phenotypes, even when meeting strict eligibility criteria, exhibited treatment effects that differed significantly from RCT benchmarks and lower-risk phenotypes. This suggests that patient prognosis, rather than eligibility criteria, better predicts survival and treatment benefit.
The results were corroborated by a comprehensive robustness assessment, including examinations of specific patient subgroups, holdout validation and semi-synthetic data simulation.
The authors concluded that prognostic heterogeneity among real-world patients with cancer plays a substantial role in the limited generalisability of RCT results. Machine learning frameworks may facilitate individual patient-level decision support and estimation of real-world treatment benefits to guide trial design. Tools such as TrialTranslator can support clinicians and patients in making informed treatment decisions, understanding the expected benefits of novel therapies and planning future care.
Reference
Orcutt X, Chen K, Mamtani R, et al. Evaluating generalizability of oncology trial results to real-world patients using machine learning-based trial emulations. Nature Medicine; Published online 3 January 2025.