Simulating Multiple Endpoints While Including External Historical Data in Adaptive Oncology Trial Designs


March 3, 2026

Multiple endpoints are now the rule, not the exception

In many contemporary Phase III oncology programs, a single primary endpoint is no longer sufficient. Overall Survival (OS) remains the gold standard, and regulators still view it as the most direct measure of clinical benefit. In practice, however, OS takes time to mature, which leads to long and expensive trials, and in metastatic settings with multiple subsequent lines of therapy, the signal can dilute over time. As a result, sponsors frequently structure confirmatory trials around OS plus an endpoint that is faster to measure, such as Progression-Free Survival (PFS) and sometimes Overall Response Rate (ORR), incorporated either as dual primary endpoints or within a gatekeeping framework.

For example, consider a Phase III trial in non-small cell lung cancer (NSCLC) in which PFS is expected to read out at ~18 months, while OS may require 36 months of follow-up. The sponsor hopes PFS will support earlier regulatory interaction, potentially even forming the basis of accelerated approval, while OS continues to mature for full approval. Accelerated approval may save the sponsor resources, or bring in additional resources, while OS data continue to accrue; regulatory agencies still require the OS evidence for the final claim of success.

Although this seems straightforward, the approach does not account for all the complexities that may affect that final claim. These endpoints are correlated, mature at different rates, and are influenced by post-progression therapy, imaging frequency, and dropout patterns. Designing such a study requires more than separate power computations for each endpoint; it requires understanding how the endpoints behave together. This is where simulation becomes essential.

 

The statistical reality of correlated endpoints

Endpoints such as ORR, PFS, and OS are not independent random variables. They arise from the same underlying disease process. Patients who achieve early tumor shrinkage (i.e., ORR) often experience delayed progression. But that does not guarantee improved OS. Subsequent therapy, crossover, and differential dropout can attenuate survival differences. Many programs begin by assuming independence when calculating sample size or multiplicity adjustments. Unfortunately, that assumption rarely holds once joint behavior is modeled explicitly.

For example:

  • If ORR and PFS have moderate positive correlation (e.g., driven by response durability), the probability of dual success may be higher than naïve calculations suggest.
  • If OS is weakly correlated with PFS due to heavy post-progression treatment, hierarchical strategies may protect alpha but substantially reduce the probability of demonstrating statistical significance on OS.

Note that statisticians usually include a range of correlation coefficients between endpoints to evaluate their impact on overall operating characteristics of the trial.
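
As a minimal sketch of such a sensitivity scan, the Python snippet below estimates the probability of dual success over a range of correlations. It is an illustration under stated assumptions, not a prescribed method: the two final test statistics are treated as bivariate normal, each endpoint has a fixed marginal power, and the function name `dual_success_prob` and the power values (85% and 80%) are hypothetical.

```python
import random
from statistics import NormalDist


def dual_success_prob(power_a, power_b, rho, n_sims=20000, seed=1):
    """Monte Carlo estimate of P(both endpoints significant).

    Assumption: the two test statistics are bivariate normal with
    correlation rho, and each endpoint has the stated marginal power.
    """
    std = NormalDist()
    # An endpoint with marginal power p succeeds when its standardized
    # noise term exceeds -z_p, whatever the significance level is.
    cut_a = -std.inv_cdf(power_a)
    cut_b = -std.inv_cdf(power_b)
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_sims):
        e1 = rng.gauss(0.0, 1.0)
        # Build a second normal draw with correlation rho to the first
        e2 = rho * e1 + (1 - rho ** 2) ** 0.5 * rng.gauss(0.0, 1.0)
        if e1 > cut_a and e2 > cut_b:
            wins += 1
    return wins / n_sims


# Hypothetical marginal powers; scan a range of correlations
for rho in (0.0, 0.3, 0.6, 0.9):
    print(rho, round(dual_success_prob(0.85, 0.80, rho), 3))
```

At zero correlation the estimate sits near the naive product of the marginal powers (0.85 × 0.80 = 0.68), and it rises as the positive correlation strengthens.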

The FDA will typically focus first on control of familywise type I error across endpoints. But during review, questions often shift toward interpretability:

  • How was correlation justified?
  • Were joint distributions modelled based on empirical data?
  • How sensitive are conclusions to deviations in event timing?

Those questions are difficult to answer with closed-form approximations alone.

 

Why closed-form calculations do not apply

Closed testing procedures, alpha recycling, and parallel gatekeeping frameworks are well-established tools for multiplicity control. From a theoretical standpoint, they provide strong familywise error control under specified assumptions, but operating characteristics become non-intuitive once endpoints are correlated and events accrue at different rates.

For example, consider a hierarchical testing strategy in which OS is tested first. If OS fails narrowly due to immature data, PFS may never be formally tested, even if the PFS hazard ratio is clinically meaningful.

Alternatively, reversing the order (i.e., PFS tested first followed by OS) may increase the probability of declaring success on PFS, but now OS significance depends on passing through earlier gates. Power becomes conditional in ways that clinical teams often underestimate.
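
The effect of testing order can be illustrated with a short Python sketch of a fixed-sequence procedure. Everything here is an assumption made for illustration: the statistics are bivariate normal, each endpoint is tested at the full alpha once its gate opens, the function name `fixed_sequence_claims` is hypothetical, and the marginal powers (90% for PFS, 55% for an immature OS) and the correlation of 0.5 are placeholders.

```python
import random
from statistics import NormalDist


def fixed_sequence_claims(order, power, rho, n_sims=20000, seed=7):
    """Probability of a formal claim on each endpoint under fixed-sequence testing.

    order: tuple of endpoint names ("PFS", "OS") in testing order.
    power: dict with keys "PFS" and "OS" giving marginal power at full alpha.
    Assumption: test statistics are bivariate normal with correlation rho.
    """
    std = NormalDist()
    cut = {name: -std.inv_cdf(p) for name, p in power.items()}
    rng = random.Random(seed)
    claims = {name: 0 for name in order}
    for _ in range(n_sims):
        e_pfs = rng.gauss(0.0, 1.0)
        e_os = rho * e_pfs + (1 - rho ** 2) ** 0.5 * rng.gauss(0.0, 1.0)
        noise = {"PFS": e_pfs, "OS": e_os}
        for name in order:
            if noise[name] <= cut[name]:
                break  # gate closed: later endpoints are never formally tested
            claims[name] += 1
    return {name: count / n_sims for name, count in claims.items()}


power = {"PFS": 0.90, "OS": 0.55}  # hypothetical: OS still immature
print("OS first: ", fixed_sequence_claims(("OS", "PFS"), power, rho=0.5))
print("PFS first:", fixed_sequence_claims(("PFS", "OS"), power, rho=0.5))
```

With OS first, the chance of a formal PFS claim is capped by the OS result. With PFS first, PFS keeps its full marginal power and the OS claim becomes conditional on passing the PFS gate.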

Simulating such designs allows evaluation of:

  • Probability of joint success (OS and PFS both significant)
  • Probability of partial success (e.g., showing significant PFS while OS is not yet mature)
  • Impact of varying correlation assumptions
  • Sensitivity to delayed event accrual
  • Effect of interim analyses on overall power

This helps clinical teams focus on actual operating characteristics under realistic assumptions instead of theoretical power under ideal ones. For example, in some settings, the probability of winning on both endpoints may drop from 75% to around 50% once correlation structures are introduced.
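
Sensitivity to event timing, one of the items listed above, can be explored with a sketch like the following. It assumes uniform enrollment, exponential event times, and no dropout, and it treats each endpoint on its own rather than jointly. The function name `months_to_target` and all numeric inputs (600 patients, 18 months of accrual, 350 target events, the medians) are hypothetical.

```python
import math
import random


def months_to_target(n_patients, accrual_months, median_months, target_events, seed=0):
    """Calendar month at which the target event count is reached (one simulated trial).

    Assumptions: uniform enrollment, exponential event times, no dropout.
    """
    rng = random.Random(seed)
    hazard = math.log(2) / median_months  # exponential hazard implied by the median
    # Calendar time of each event = entry time + time from entry to event
    event_calendar = sorted(
        rng.uniform(0, accrual_months) + rng.expovariate(hazard)
        for _ in range(n_patients)
    )
    return event_calendar[target_events - 1]


# Hypothetical scenarios: the analysis is triggered at 350 events
for label, median in (("PFS", 8.0), ("OS", 20.0), ("OS, slower events", 24.0)):
    print(label, round(months_to_target(600, 18.0, median, 350), 1))
```

Repeating this over many seeds, and adding dropout, would give a distribution of readout times rather than a single draw.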

 

Modeling multiple endpoint outcomes

Traditional simulations often generate each endpoint independently from parametric survival distributions (e.g., using Exponential or Weibull curves). This is convenient, but not always clinically realistic. The FDA will often ask how simulation assumptions were calibrated. “We assumed independence” is not persuasive.

Therefore, modelling patient outcomes with a multistate model may generate more credible data that align better with what will be observed in practice. This is certainly not the only approach, but it is one we encourage using alongside the copula approach, in which correlation coefficients between the endpoints must be specified.
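
To make the multistate idea concrete, here is a minimal Python sketch of a three-state illness-death model (stable, progressed, dead) with constant transition hazards. It is one simple instance of a multistate model, not a description of any specific software implementation; the function names and hazard values are hypothetical. In this setup the PFS/OS correlation is not specified directly but emerges from the transitions.

```python
import random


def simulate_illness_death(n, h_prog, h_death_pre, h_death_post, seed=0):
    """Simulate (PFS, OS) pairs from an illness-death model with exponential hazards."""
    rng = random.Random(seed)
    pfs, os_times = [], []
    for _ in range(n):
        t_prog = rng.expovariate(h_prog)        # stable -> progressed
        t_death = rng.expovariate(h_death_pre)  # stable -> dead
        if t_death < t_prog:
            # Death without documented progression: PFS and OS coincide
            pfs.append(t_death)
            os_times.append(t_death)
        else:
            pfs.append(t_prog)
            # progressed -> dead, clock restarts at progression
            os_times.append(t_prog + rng.expovariate(h_death_post))
    return pfs, os_times


def pearson(x, y):
    """Plain Pearson correlation, written out to avoid extra dependencies."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical monthly hazards; a lower post-progression hazard stands in
# for more effective subsequent therapy
for h_post in (0.10, 0.05, 0.02):
    pfs, os_times = simulate_illness_death(5000, 0.08, 0.01, h_post)
    print(h_post, round(pearson(pfs, os_times), 2))
```

In this sketch, longer post-progression survival weakens the correlation between PFS and OS, which mirrors the attenuation from subsequent therapy described earlier.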

Leveraging prior internal data, particularly standard-of-care arms from earlier studies, can anchor assumptions about:

  • Correlation between endpoints
  • Event-time distributions
  • Dropout rates
  • Missing data mechanisms
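
As a small illustration of anchoring an event-time assumption to prior data, the sketch below computes the maximum-likelihood estimate of a constant hazard from right-censored follow-up records (events divided by total exposure). The exponential assumption, the function name `exponential_hazard`, and the three toy records are all hypothetical and are not drawn from any real study.

```python
import math


def exponential_hazard(records):
    """Constant-hazard MLE from a list of (follow_up_months, event_observed) pairs.

    Assumption: event times are exponential and censoring is non-informative.
    """
    events = sum(1 for _, observed in records if observed)
    exposure = sum(months for months, _ in records)
    return events / exposure


# Toy, made-up records for illustration only: (months of follow-up, event seen?)
toy_records = [(10.0, True), (20.0, False), (10.0, True)]
hazard = exponential_hazard(toy_records)
print("hazard per month:", hazard)
print("implied median (months):", round(math.log(2) / hazard, 1))
```

The same estimator can be applied to dropout by treating withdrawal as the event, which gives a dropout hazard for the simulation.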

Alternatively, external historical data can be used for this purpose. However, clinical teams must properly evaluate whether these data are exchangeable with the setting they are meant to inform, especially if disease management has shifted since the data were collected.

 

Multiplicity control considerations

As previously mentioned, testing multiple primary endpoints requires strict familywise type I error control. Common approaches include:

  • Hierarchical gatekeeping
  • Alpha recycling
  • Closed testing procedures
  • Pre-specified adaptive decision rules

Under strong positive correlation, alpha allocation may be conservative relative to realized joint behavior. Under weak correlation, nominal power calculations may overstate the chance of dual success.
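
The conservatism under positive correlation can be checked with a short simulation. The sketch below assumes an equal Bonferroni split of a one-sided 2.5% alpha across two endpoints and bivariate normal test statistics under the global null; the function name `fwer_bonferroni` and the correlation values are hypothetical choices for illustration.

```python
import random
from statistics import NormalDist


def fwer_bonferroni(rho, alpha=0.025, n_sims=200000, seed=3):
    """Simulated familywise error rate for two endpoints, each tested at alpha / 2.

    Assumption: under the global null, the two test statistics are
    standard bivariate normal with correlation rho.
    """
    crit = NormalDist().inv_cdf(1 - alpha / 2)  # one-sided critical value per endpoint
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + (1 - rho ** 2) ** 0.5 * rng.gauss(0.0, 1.0)
        if z1 > crit or z2 > crit:
            rejections += 1  # at least one false positive
    return rejections / n_sims


for rho in (0.0, 0.5, 0.95):
    print(rho, round(fwer_bonferroni(rho), 4))
```

As the correlation grows, the realized familywise error rate falls further below the nominal 2.5%, which is the sense in which a fixed alpha split becomes conservative.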

One area that is often overlooked is how interim analyses interact with multiplicity. Early looks based on PFS may alter the distribution of OS information at final analysis, particularly if enrollment slows after interim data are reviewed. That secondary impact is unfortunately rarely captured.

Simulations that account for the decision rules across multiple endpoints may help characterize type I error control and power trade-offs in more realistic execution scenarios.

 

Integrating external and historical data

In oncology, prior data are often available, particularly for standard-of-care arms. Including empirically derived components, such as correlation and dropout rate assumptions, in simulation makes projections more defensible.

Regulatory agencies may still require conservative assumptions, but a simulation framework grounded in observed data allows transparent discussion of where assumptions are aggressive, where they are conservative, and why.

 

A practical perspective

Multiple primary endpoints introduce scientific opportunity and statistical complexity at the same time. The trade-offs that must be accounted for include, but are not limited to, overcommitting on sample size, conditional power dependencies across endpoints, sensitivity to correlation structures, event timing uncertainty, and interim decision impacts.

Simulation, when built on joint patient-level modelling and calibrated to empirical data, allows these trade-offs to be evaluated prospectively rather than discovered after a database lock.

In our experience, teams that invest early in this level of simulation and endpoint modelling encounter fewer redesign discussions, particularly once regulatory feedback begins. More importantly, cross-functional stakeholders gain a clearer understanding of what “success” actually means across endpoints.

That clarity is often worth as much as the statistical precision itself.

 

Interested in learning more?

Join J. Kyle Wathen, Valeria Mazzanti, and Julija Saltane for their upcoming webinar “Simulating Multiple Endpoints to Drive Late-Stage Oncology Trials” on Thursday, April 2 at 10 AM ET.

Julija Saltane

Senior Innovation & Software Engineer

Julija Saltane is a Senior Innovation & Software Engineer at Cytel, where she works on product innovation and the development of analytical solutions that advance clinical trial design and simulation. She has extensive experience with East Horizon, particularly in implementing complex and bespoke trial designs through R integration. Her work includes developing technical resources that demonstrate East Horizon’s advanced capabilities, evaluating and stress-testing new features, and supporting tools such as EnForeSys for patient enrollment prediction.

Her career spans pharmaceutical research, healthcare analytics consulting, and product-focused data science. Before joining Cytel, Julija led clinical trial simulation initiatives for a healthcare client, developed custom R packages, and collaborated closely with statisticians to conduct operating characteristic simulations. Earlier, she worked in the media sector, building predictive models, churn analyses, and monetization simulations to guide product and growth strategies. She also spent nearly a decade in pharmaceutical analytical development, specializing in the analytical characterization of biologics and small molecules, advancing laboratory automation, and supporting regulatory submissions.

Julija holds an MSc in Data Analytics from the University of Glasgow and an MChem in Chemistry with Mathematics from the University of Southampton.


Kyle Wathen

Vice President, Scientific Strategy and Innovation

Kyle brings experience from a diverse background in academia, consulting, and the life sciences industry to his role at Cytel. Working on the development and application of novel Bayesian methodology for adaptive clinical trial designs, he is involved in each step of developing new adaptive clinical trial designs, starting from initial concept development through software development/trial simulation and completing with trial conduct and data collection.

Kyle has over 25 years of experience in the design of innovative clinical trials, including Bayesian approaches, platform trials, and other adaptive approaches. He has been involved in many innovative clinical trials, especially platform trials, in various disease areas including oncology, neuroscience, infectious diseases, cardiovascular and inflammation. Additionally, he has released several software packages including OCTOPUS, an R package for simulation of platform trials.

Kyle received his M.S. in statistics from Texas A&M University and M.S. and Ph.D. in Biostatistics from the University of Texas: Graduate School of Biomedical Sciences.


Valeria Mazzanti

Associate Director, Customer Success

Valeria Mazzanti is the Associate Director of Customer Success at Cytel. She is an expert in adaptive clinical trial design methodology and software, including our cutting-edge and industry-standard software such as Solara, East, and EnForeSys, and now our more recently launched East Horizon Platform.

Prior to joining Cytel, Valeria worked in several different academic research laboratories and has extensive teaching experience.

Valeria grew up in Milan, Paris and Geneva before completing a Master of Public Health degree specializing in Biostatistics at Columbia University in New York and a Bachelor of Science degree in Behavioral Neuroscience at UCLA.

