Evaluation of survival extrapolation in immuno-oncology using multiple pre-planned data cuts: learnings to aid in model selection

Abstract

Background
Due to the limited duration of follow-up in clinical trials of cancer treatments, estimates of lifetime survival benefits are typically derived using statistical extrapolation methods. To justify the method used, a range of approaches have been proposed, including statistical goodness-of-fit tests and comparing estimates against a previous data cut (i.e. interim data collected). In this study, we extend these approaches by presenting a range of extrapolations fitted to four pre-planned data cuts from the JAVELIN Merkel 200 (JM200) trial. By comparing estimates of survival and goodness-of-fit as the JM200 data matured, we iteratively fitted and re-fitted survival models to retrospectively identify early indications of likely long-term survival.
Methods
Standard and spline-based parametric models were fitted to overall survival data from each JM200 data cut. Goodness-of-fit was determined using an assessment of the estimated hazard function, information theory-based methods and objective comparisons of estimation accuracy. Best-fitting extrapolations were compared to establish which one provided the most accurate estimation, and how statistical goodness-of-fit differed.
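As an illustration of the information theory-based comparison, the following minimal sketch ranks candidate models by Akaike information criterion (AIC) and lists those within a chosen distance of the best score. It assumes AIC is the criterion of interest; the function names, the 2-point default threshold and the log-likelihood values in the usage note are hypothetical and are not taken from the JM200 analysis.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2*logL (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def rank_by_aic(fits):
    """Rank models by AIC.

    `fits` maps a model name to (maximised log-likelihood, number of parameters).
    Returns a list of (name, aic, delta_from_best), sorted best first.
    """
    scores = {name: aic(ll, k) for name, (ll, k) in fits.items()}
    best = min(scores.values())
    ranked = [(name, score, score - best) for name, score in scores.items()]
    return sorted(ranked, key=lambda row: row[1])


def plausible_candidates(fits, threshold=2.0):
    """Names of models whose AIC is within `threshold` points of the best."""
    return [name for name, _, delta in rank_by_aic(fits) if delta <= threshold]
```

For example, with made-up inputs such as `{"weibull": (-250.0, 2), "spline_1knot": (-248.5, 3), "exponential": (-256.0, 1)}`, `plausible_candidates` would keep the Weibull and one-knot spline models and drop the exponential, leaving the final choice between them to clinical plausibility.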
Results
Spline-based models provided the closest fit to the final JM200 data cut, though all extrapolation methods based on the earliest data cut underestimated the ‘true’ long-term survival (difference in restricted mean survival time [RMST] at 36 months: −1.1 to −0.5 months). Goodness-of-fit scores illustrated that an increasingly flexible model was favored as data matured. Given an early data cut, a more flexible model better aligned with clinical expectations could be reasonably justified using a range of metrics, including RMST and goodness-of-fit scores (which were typically within a 2-point range of the statistically ‘best-fitting’ model).
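The RMST comparison can be sketched as the area under a survival curve up to a fixed horizon, with the difference between two curves giving the under- or overestimate. The example below is a rough illustration only: it assumes a Weibull survival function and simple trapezoidal integration, and the helper names and any parameter values passed to them are hypothetical rather than fitted JM200 results.

```python
import math


def weibull_survival(shape, scale):
    """Return S(t) = exp(-(t / scale) ** shape) as a function of time t."""
    def survival(t):
        return math.exp(-((t / scale) ** shape))
    return survival


def rmst(survival, horizon, steps=3600):
    """Restricted mean survival time: integral of S(t) from 0 to `horizon`.

    Uses the trapezoidal rule on an evenly spaced grid with `steps` intervals.
    """
    width = horizon / steps
    total = 0.5 * (survival(0.0) + survival(horizon))
    for i in range(1, steps):
        total += survival(i * width)
    return total * width


def rmst_difference(extrapolated, reference, horizon=36.0):
    """RMST of the extrapolated curve minus RMST of the reference curve.

    A negative value means the extrapolation underestimates survival
    relative to the reference over the chosen horizon.
    """
    return rmst(extrapolated, horizon) - rmst(reference, horizon)
```

In practice the reference curve would come from the most mature data cut (for instance a Kaplan-Meier estimate) rather than from a second parametric function, and time would be measured in months to match the 36-month horizon.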
Conclusions
Survival estimates from the spline-based models were more aligned with clinical expectation and provided a better fit to the JM200 data, despite not exhibiting the definitively ‘best’ statistical goodness-of-fit. Longer-term data are required to further validate extrapolations, though this study illustrates the importance of clinical plausibility when selecting the most appropriate model. In addition, hazard-based plots and goodness-of-fit tests from multiple data cuts are useful approaches for identifying when a more flexible model may be advantageous.

Publication
BMC Medical Research Methodology