Because follow-up in clinical trials of cancer treatments is limited in duration, estimates of lifetime survival benefit are typically derived using statistical extrapolation methods. Several approaches have been proposed to justify the method used, including statistical goodness-of-fit tests and comparison of estimates against a previous data cut (i.e. interim data). In this study, we extend these approaches by presenting a range of extrapolations fitted to four pre-planned data cuts from the JAVELIN Merkel 200 (JM200) trial. We iteratively fitted and re-fitted survival models as the JM200 data matured, comparing survival estimates and goodness-of-fit across data cuts to retrospectively identify early indications of likely long-term survival.
Standard and spline-based parametric models were fitted to overall survival data from each JM200 data cut. Goodness-of-fit was assessed by inspecting the estimated hazard function, by information theory-based methods, and by objective comparisons of estimation accuracy. The best-fitting extrapolations were then compared to establish which provided the most accurate estimate and how statistical goodness-of-fit differed between them.
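As an illustration of an information theory-based comparison, the following minimal sketch fits two standard parametric models (exponential and Weibull) to right-censored data by maximum likelihood and ranks them by Akaike information criterion (AIC). The choice of AIC, the two models, the helper names, and the data are all assumptions made for illustration; they are not taken from the JM200 analysis, and spline-based models are not covered here.

```python
import math

# Hypothetical right-censored data (months); event = 1 means death observed.
times = [1.2, 2.5, 3.1, 4.0, 5.6, 7.3, 9.8, 12.4, 15.0, 18.2, 24.0, 24.0]
events = [1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0]


def exponential_loglik(times, events):
    """Maximised log-likelihood of an exponential model with right censoring."""
    d = sum(events)
    total = sum(times)
    rate = d / total  # closed-form maximum likelihood estimate
    return d * math.log(rate) - rate * total


def weibull_profile_loglik(shape, times, events):
    """Weibull log-likelihood, S(t) = exp(-(t/scale)**shape), scale profiled out."""
    d = sum(events)
    scale_pow = sum(t ** shape for t in times) / d  # MLE of scale**shape given shape
    sum_log_event = sum(math.log(t) for t, e in zip(times, events) if e)
    return (d * math.log(shape) - d * math.log(scale_pow)
            + (shape - 1) * sum_log_event - d)


def golden_max(f, lo, hi, tol=1e-8):
    """Golden-section search for the maximiser of a unimodal function on [lo, hi]."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) > f(d):
            b = d
        else:
            a = c
    return (a + b) / 2


# Search range for the Weibull shape is an arbitrary illustrative choice.
shape_hat = golden_max(lambda k: weibull_profile_loglik(k, times, events), 0.05, 10.0)

# (maximised log-likelihood, number of parameters) per model
loglik = {
    "exponential": (exponential_loglik(times, events), 1),
    "weibull": (weibull_profile_loglik(shape_hat, times, events), 2),
}
aic = {name: 2 * k - 2 * ll for name, (ll, k) in loglik.items()}
best = min(aic.values())

for name, value in sorted(aic.items(), key=lambda kv: kv[1]):
    delta = value - best
    note = "within 2 points of best" if delta <= 2 else ""
    print(f"{name:12s} AIC={value:8.2f}  delta={delta:5.2f}  {note}")
```

Repeating the same comparison on each successive data cut would show whether the ranking of models shifts as follow-up lengthens.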
Spline-based models provided the closest fit to the final JM200 data cut, though all extrapolation methods based on the earliest data cut underestimated the ‘true’ long-term survival (difference in restricted mean survival time [RMST] at 36 months: −1.1 to −0.5 months). Goodness-of-fit scores indicated that increasingly flexible models were favored as the data matured. Given an early data cut, a more flexible model that was better aligned with clinical expectations could be reasonably justified using a range of metrics, including RMST and goodness-of-fit scores (which were typically within a 2-point range of the statistically ‘best-fitting’ model).
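RMST up to a horizon is the area under the survival curve from time zero to that horizon, so a difference in RMST at 36 months can be sketched as the area under an extrapolated curve minus the area under an observed Kaplan-Meier curve. The example below is a minimal illustration under stated assumptions: the data, the exponential "early" model and its rate, and all function names are hypothetical and are not JM200 results.

```python
import math


def km_curve(times, events):
    """Kaplan-Meier estimate as a list of (event_time, survival) steps."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    steps = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects sharing the same observed time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / n_at_risk
            steps.append((t, surv))
        n_at_risk -= removed
    return steps


def rmst_step(steps, tau):
    """Area under a step survival curve from 0 to tau.

    Assumption: the curve is held flat after the last event time, which is
    one common convention when the last observation is censored before tau.
    """
    area, prev_t, prev_s = 0.0, 0.0, 1.0
    for t, s in steps:
        if t >= tau:
            break
        area += prev_s * (t - prev_t)
        prev_t, prev_s = t, s
    return area + prev_s * (tau - prev_t)


def rmst_function(surv, tau, n=3600):
    """Trapezoidal integral of a smooth survival function from 0 to tau."""
    h = tau / n
    total = 0.5 * (surv(0.0) + surv(tau))
    total += sum(surv(i * h) for i in range(1, n))
    return total * h


# Hypothetical "final" data cut (months); event = 1 means death observed.
final_times = [2.0, 3.5, 5.0, 8.0, 11.0, 14.0, 20.0, 27.0, 33.0, 40.0, 44.0, 48.0]
final_events = [1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

# Hypothetical extrapolation fitted to an earlier data cut (exponential, rate assumed).
def early_model(t):
    return math.exp(-0.06 * t)

tau = 36.0
rmst_observed = rmst_step(km_curve(final_times, final_events), tau)
rmst_extrapolated = rmst_function(early_model, tau)

# A negative difference means the extrapolation underestimates observed survival.
print(f"RMST difference at {tau:.0f} months: {rmst_extrapolated - rmst_observed:.2f}")
```

Restricting the comparison to a horizon covered by the later data cut keeps the reference curve within observed follow-up rather than relying on a second extrapolation.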
Survival estimates from the spline-based models were more closely aligned with clinical expectation and provided a better fit to the JM200 data, even though these models did not exhibit the definitively ‘best’ statistical goodness-of-fit. Longer-term data are required to further validate the extrapolations, though this study illustrates the importance of clinical plausibility when selecting the most appropriate model. In addition, hazard-based plots and goodness-of-fit tests from multiple data cuts offer useful ways of identifying when a more flexible model may be advantageous.