
Simulating Multiple Endpoints While Including External Historical Data in Adaptive Oncology Trial Designs

Multiple endpoints are now the rule, not the exception

In many contemporary Phase III oncology programs, a single primary endpoint is no longer sufficient. While Overall Survival (OS) remains the gold standard and regulators still view it as the most direct measure of clinical benefit, in practice OS takes time to mature, leading to very long and expensive clinical trials. In metastatic settings with multiple subsequent lines of therapy, the signal can also dilute over time. As a result, sponsors frequently structure confirmatory trials with OS alongside an endpoint that is faster to measure, such as Progression-Free Survival (PFS) and sometimes Overall Response Rate (ORR), incorporated either as dual primary endpoints or within a gatekeeping framework.

Consider, for example, a Phase III trial in non-small cell lung cancer (NSCLC) where PFS is expected to read out at roughly 18 months, while OS may require 36 months of follow-up. The sponsor hopes PFS will support earlier regulatory interaction, potentially even forming the basis of an accelerated approval, while OS continues to mature for full approval. An accelerated approval may conserve the sponsor's resources, or bring in new ones, while OS data continue to accrue, since OS evidence is still required by regulatory agencies for the final claim of success.

Although this seems straightforward, the approach glosses over complexities that can affect that final claim. These endpoints are correlated, mature at different rates, and are influenced by post-progression therapy, imaging frequency, and dropout patterns. Designing such a study requires more than separate power computations for each endpoint; it requires understanding how the endpoints behave together. This is where simulation becomes essential.

 

The statistical reality of correlated endpoints

Endpoints such as ORR, PFS, and OS are not independent random variables. They arise from the same underlying disease process. Patients who achieve early tumor shrinkage (i.e., ORR) often experience delayed progression. But that does not guarantee improved OS. Subsequent therapy, crossover, and differential dropout can attenuate survival differences. Many programs begin by assuming independence when calculating sample size or multiplicity adjustments. Unfortunately, that assumption rarely holds once joint behavior is modeled explicitly.

For example:

  • If ORR and PFS have moderate positive correlation (e.g., driven by response durability), the probability of dual success may be higher than naïve calculations suggest.
  • If OS is weakly correlated with PFS due to heavy post-progression treatment, hierarchical strategies may protect alpha but substantially reduce the probability of demonstrating statistical significance on OS.

Note that statisticians usually evaluate a range of correlation coefficients between endpoints to assess their impact on the trial's overall operating characteristics.
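
To make this concrete, below is a minimal sketch (in Python) of one common way to induce such correlation in simulation: exponential PFS and OS marginals linked by a Gaussian copula. The medians, copula correlation, and arm parameters are illustrative placeholders, not recommended values.

```python
# Minimal sketch, not a production simulator: correlated PFS and OS times per
# arm via a Gaussian copula with exponential marginals. All parameters below
# are illustrative placeholders.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2024)

def correlated_pfs_os(n, median_pfs, median_os, rho):
    """Draw (PFS, OS) pairs with exponential marginals linked by a Gaussian copula."""
    z = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=n)
    u = norm.cdf(z)                          # correlated uniform marginals
    lam_pfs = np.log(2) / median_pfs         # exponential rate from the median
    lam_os = np.log(2) / median_os
    pfs = -np.log(1.0 - u[:, 0]) / lam_pfs   # inverse-CDF transform
    os_ = -np.log(1.0 - u[:, 1]) / lam_os
    # Crude plausibility constraint (OS cannot precede PFS); note this slightly
    # perturbs the OS marginal.
    return pfs, np.maximum(os_, pfs)

# Illustrative use: control vs experimental arm, then check the correlation
# actually induced after the OS >= PFS constraint.
pfs_c, os_c = correlated_pfs_os(500, median_pfs=6.0, median_os=18.0, rho=0.5)
pfs_t, os_t = correlated_pfs_os(500, median_pfs=9.0, median_os=24.0, rho=0.5)
print(np.corrcoef(pfs_c, os_c)[0, 1])
```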

The FDA will typically focus first on control of familywise type I error across endpoints. But during review, questions often shift toward interpretability:

  • How was correlation justified?
  • Were joint distributions modelled based on empirical data?
  • How sensitive are conclusions to deviations in event timing?

Those questions are difficult to answer with closed-form approximations alone.

 

Why closed-form calculations are not enough

Closed testing procedures, alpha recycling, and parallel gatekeeping frameworks are well-established tools for multiplicity control. From a theoretical standpoint, they provide strong familywise error control under specified assumptions, but operating characteristics become non-intuitive once endpoints are correlated and events accrue at different rates.

For example, under a hierarchical testing strategy in which OS is tested first and narrowly fails because the data are immature, PFS may never formally be tested, even if the PFS hazard ratio is clinically meaningful.

Alternatively, reversing the order (i.e., PFS tested first followed by OS) may increase the probability of declaring success on PFS, but now OS significance depends on passing through earlier gates. Power becomes conditional in ways that clinical teams often underestimate.

Simulating such designs allows evaluation of:

  • Probability of joint success (OS and PFS both significant)
  • Probability of partial success (e.g., showing significant PFS while OS is not yet mature)
  • Impact of varying correlation assumptions
  • Sensitivity to delayed event accrual
  • Effect of interim analyses on overall power

This helps clinical teams focus on actual operating characteristics under realistic assumptions instead of theoretical power under ideal ones. For example, in some settings, probability of winning on both endpoints may drop from 75% to around 50% when introducing correlation structures.
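
As an illustration of how such operating characteristics can be estimated, here is a minimal, statistic-level sketch: PFS and OS test statistics are drawn as correlated normals, a hierarchical gate is applied, and the probabilities of success on the first endpoint and on both endpoints are estimated across correlation assumptions. The effect sizes, critical value, and correlations are purely illustrative.

```python
# Minimal sketch of gatekeeping operating characteristics at the test-statistic
# level (rather than via full patient-level simulation). Illustrative values only.
import numpy as np

rng = np.random.default_rng(7)
Z_CRIT = 1.96          # one-sided 2.5% critical value applied after the gate
N_SIM = 100_000

def gatekeeping_oc(mu_pfs, mu_os, rho, first="PFS"):
    """Estimate joint/partial success probabilities under hierarchical testing."""
    cov = [[1.0, rho], [rho, 1.0]]
    z = rng.multivariate_normal([mu_pfs, mu_os], cov, size=N_SIM)
    z_pfs, z_os = z[:, 0], z[:, 1]
    if first == "PFS":
        win_first = z_pfs > Z_CRIT
        win_both = win_first & (z_os > Z_CRIT)
    else:                                  # OS tested first
        win_first = z_os > Z_CRIT
        win_both = win_first & (z_pfs > Z_CRIT)
    return {"first endpoint": win_first.mean(), "both endpoints": win_both.mean()}

for rho in (0.0, 0.3, 0.6):
    print(rho, gatekeeping_oc(mu_pfs=3.0, mu_os=2.5, rho=rho, first="PFS"))
```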

 

Modeling multiple endpoint outcomes

Traditional simulations often generate each endpoint independently from parametric survival distributions (e.g., using Exponential or Weibull curves). This is convenient, but not always clinically realistic. The FDA will often ask how simulation assumptions were calibrated. “We assumed independence” is not persuasive.

Therefore, modelling patient outcomes with a multistate model may generate more credible data that align better with what is likely to be observed in practice. This is certainly not the only approach, but it is one we encourage alongside the copula approach, in which correlation coefficients between the endpoints must be specified directly.
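
For illustration, here is a minimal sketch of an illness-death multistate generator for PFS and OS. The transition hazards are placeholders; in practice they would be calibrated to prior internal or external control data, and the PFS/OS correlation then emerges from the model rather than being imposed.

```python
# Minimal illness-death (multistate) sketch for generating PFS and OS jointly.
# Transition hazards are illustrative placeholders.
import numpy as np

rng = np.random.default_rng(11)

def illness_death(n, h_prog, h_death_pre, h_death_post):
    """States: 0 = stable, 1 = progressed, 2 = dead. Exponential transition hazards."""
    t_prog = rng.exponential(1.0 / h_prog, n)                # 0 -> 1
    t_death_pre = rng.exponential(1.0 / h_death_pre, n)      # 0 -> 2
    t_death_post = rng.exponential(1.0 / h_death_post, n)    # 1 -> 2 (clock restarts at progression)
    progressed = t_prog < t_death_pre
    pfs = np.where(progressed, t_prog, t_death_pre)
    os_ = np.where(progressed, t_prog + t_death_post, t_death_pre)
    return pfs, os_

pfs, os_ = illness_death(2000, h_prog=0.10, h_death_pre=0.02, h_death_post=0.06)
# The PFS/OS correlation now *emerges* from the transition hazards instead of
# being imposed, which is often easier to defend with empirical data.
print(np.corrcoef(pfs, os_)[0, 1])
```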

Leveraging prior internal data, particularly standard-of-care arms from earlier studies, can anchor assumptions about:

  • Correlation between endpoints
  • Event-time distributions
  • Dropout rates
  • Missing data mechanisms

Alternatively, external historical data can be used for this purpose. However, clinical teams must carefully evaluate whether these data are exchangeable with the setting in which they will be applied, especially if disease management has shifted since the data were collected.

 

Multiplicity control considerations

As previously mentioned, testing multiple primary endpoints requires strict familywise type I error control. Common approaches include:

  • Hierarchical gatekeeping
  • Alpha recycling
  • Closed testing procedures
  • Pre-specified adaptive decision rules

Under strong positive correlation, alpha allocation may be conservative relative to realized joint behavior. Under weak correlation, nominal power calculations may overstate the chance of dual success.

One area that is often overlooked is how interim analyses interact with multiplicity. Early looks based on PFS may alter the distribution of OS information at final analysis, particularly if enrollment slows after interim data are reviewed. That secondary impact is unfortunately rarely captured.

Simulations that account for the multiple-endpoint decision rules can help characterize Type I error control and power trade-offs under more realistic execution scenarios.

 

Integrating external and historical data

In oncology, prior data are often available, particularly for standard-of-care arms. Including empirically derived components, such as correlation and dropout rate assumptions, in simulation makes projections more defensible.

Regulatory agencies may still require conservative assumptions, but a simulation framework grounded in observed data allows transparent discussion of where assumptions are aggressive, where they are conservative, and why.

 

A practical perspective

Multiple primary endpoints introduce scientific opportunity and statistical complexity at the same time. The trade-offs that must be accounted for include, but are not limited to, overcommitting on sample size, conditional power dependencies across endpoints, sensitivity to correlation structures, event-timing uncertainty, and interim decision impacts.

Simulation, when built on joint patient-level modelling and calibrated to empirical data, allows these trade-offs to be evaluated prospectively rather than discovered after a database lock.

In our experience, teams that invest early in this level of simulation and endpoint modelling encounter fewer redesign discussions, particularly once regulatory feedback begins. More importantly, cross-functional stakeholders gain a clearer understanding of what “success” actually means across endpoints.

That clarity is often worth as much as the statistical precision itself.

 

Interested in learning more?

Join J. Kyle Wathen, Valeria Mazzanti, and Julija Saltane for their upcoming webinar “Simulating Multiple Endpoints to Drive Late-Stage Oncology Trials” on Thursday, April 2 at 10 AM ET:

Master Protocols in Oncology Trials

A master protocol is defined as a protocol designed with multiple sub-studies, which may have different objectives and involve coordinated efforts to evaluate one or more investigational drugs in one or more disease subtypes within the overall trial structure. Master protocol trials include three trial designs: basket trials, umbrella trials, and platform trials.

FDA guidance released in March 2022 provides recommendations for master protocol trials.

In this blog, we discuss master protocol trial designs, challenges and best practices, and the benefit of these innovative designs in oncology trials.

 

Types of master protocol trials

Basket trials

Basket trials are designed to test a single investigational drug or drug combination in different populations defined by different cancers, disease stages for a specific cancer, histologies, number of prior therapies, genetic or other biomarkers, or demographic characteristics.

 

Umbrella trials

Umbrella trials are designed to evaluate multiple investigational drugs administered as single drugs or as drug combinations in a single disease population.

 

Platform trials

Platform trials are master protocols in which arm(s) can be dropped or added based on knowledge gained from previously evaluated parts of the trial.

 

Figure 1: Basket Trials, Umbrella Trials, and Platform Trials

Image credit: Park, J. J. H., Siden, E., Zoratti, M. J., Dron, L., Harari, O., Singer, J., Lester, R. T., Thorlund, K., & Mills, E. J. (2019). Systematic review of basket trials, umbrella trials, and platform trials: A landscape analysis of master protocols. Trials, 20.

 

Key challenges with master protocol trials

Master protocol trials are inherently complex due to their expansive scope and varied components. Let’s look at these challenges in more detail:

 

Data management and analysis

  • Large amounts of data need efficient integration and processing.
  • Basket trials involve multiple indications, and endpoint definitions and/or response criteria may vary across indications.
  • Umbrella trials have multiple drugs, leading to complex exposure and safety summaries.
  • Platform trials continuously add new treatment arms, generating a dynamic dataset that requires real-time integration and analysis. This necessitates robust data management systems capable of handling evolving data structures and ensuring consistency across various cohorts.

 

Safety profile considerations

  • Variability in drug effects requires tailored safety monitoring strategies.
  • Adverse events of special interest might need to be defined for each drug separately.

 

Biomarker data complexity

  • Data can be relatively large and complex.
  • Having the data transfer specifications at an early stage is important to ensure that the correct data will be received and in the expected format.
  • Intensive discussion might be needed with biomarker data specialists to define the rules for deriving biomarker/genomic profile of interest.
  • Mapping those data from raw data to SDTM can also be challenging.

 

Statistical Analysis Plan (SAP) and shell development

  • Potential additional complexity for statistical inference (e.g., adaptive features, multiplicity, and Bayesian methods).
  • The team must stay focused on the main objectives of the study; otherwise, the SAP and shells can become very extensive.
  • The number of tables, figures, and listings can grow significantly, making prioritization essential.
  • Layout complexities arise when numerous columns must be displayed across multiple cohorts.

 

Operational and reporting challenges

  • Each cohort may follow different timelines, complicating interim and final analyses.
  • Frequent reporting requires good planning.
  • CSR(s) strategy (e.g., separate CSR for each cohort versus single CSR) should be defined sufficiently early.

Staying focused on the key study objectives is crucial to prevent data overload and inefficiencies in reporting. Exploratory analyses can be planned in a second step.

 

Comparative Overview: Basket vs. Umbrella vs. Platform Trials


 

Final takeaways

Master protocol trials represent a transformative shift in clinical research — enabling the simultaneous evaluation of multiple therapies or disease subtypes under a unified framework. While designs like basket, umbrella, and platform trials offer flexibility and efficiency, they also introduce significant operational, statistical, and data management complexities.

Success is built on early planning, early discussion with safety and biomarker teams, and a focus on core study objectives to ensure meaningful insights and readiness.

Interim Decision-Making in Clinical Trials: A Focus on Sample Size Re-Estimation and Population Enrichment

In the evolving landscape of clinical trial design, flexibility and efficiency have become essential for success. Sample size re-estimation (SSR) and population enrichment — both adaptive trial design methods — use interim data to make informed mid-trial adjustments. While they address different aspects — SSR focusing on how many patients to enroll and population enrichment focusing on which patients to include — both approaches aim to optimize trial outcomes, reduce unnecessary exposure, and make better use of limited resources.

This blog explores how these two methods work, their statistical underpinnings, and how they can be used to build more ethical, targeted, and cost-effective trials.

 

Sample size re-estimation

Sample size re-estimation is a type of clinical trial design adaptation in which the sample size can be reassessed at an interim look, based on accumulated data. Over the years, this method has grown in popularity for several reasons:

  1. SSR designs address variability in an observed treatment effect when the treatment shows some promise, but the effect size is not as pronounced as originally expected.
  2. SSR designs produce more ethical trials, as they limit the number of patients exposed to treatment until sufficient efficacy evidence is collected.
  3. These designs provide flexibility in trial implementation in cases of hard-to-recruit patient populations or rare disease.
  4. They allow for gatekeeping of investment for biotech companies who may undergo additional scrutiny to justify additional R&D spend.
  5. They limit the pursuit of relatively small treatment effects that may not be clinically meaningful.

 

The CHW and CDL statistical methods for SSR

Following the seminal work on adaptive interim analysis by Bauer and Köhne (1994) and others, Cui, Hung, and Wang (1999) proposed a method, now widely accepted in biostatistics and known as CHW, that combines stage-wise statistics with pre-specified weights to preserve Type I error. A second method, proposed by Chen, DeMets, and Lan (2004) and known as CDL, offers an alternative to the weighted statistic in a confirmatory two-arm, two-stage design where the sample size of the second stage is increased based on an unblinded analysis of the first-stage data.

Both CHW and CDL are accepted by regulatory bodies such as the FDA in cases where such an adaptation is deemed appropriate. The CHW method applies a lower weight to the contributions of the second stage of the design relative to those of the first stage, and the CDL method permits the use of conventional statistics for testing the primary endpoint at the end of the study while still preserving Type I error.
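
For readers who want to see the mechanics, below is a minimal sketch of the CHW combination statistic in its textbook form, assuming a two-stage, two-arm design in which the planned stage sizes fix the weights. This illustrates the principle only; it is not any particular software implementation.

```python
# Minimal sketch of the CHW weighted combination statistic. Assumptions: a
# two-stage design, independent stage-wise z-statistics, and weights fixed by
# the *planned* stage sizes even if the second stage is later enlarged.
import numpy as np
from scipy.stats import norm

def chw_statistic(z1, z2, n1_planned, n2_planned):
    """Combine independent stage-wise z-statistics with pre-specified weights.

    Because the weights depend only on the planned stage sizes, the combined
    statistic remains standard normal under H0 after a sample size increase."""
    w1 = np.sqrt(n1_planned / (n1_planned + n2_planned))
    w2 = np.sqrt(n2_planned / (n1_planned + n2_planned))
    return w1 * z1 + w2 * z2

# Illustrative numbers: interim z of 1.2 on the first 100 patients/arm, and a
# second-stage z of 2.1 computed on the (possibly enlarged) second cohort alone.
z_final = chw_statistic(z1=1.2, z2=2.1, n1_planned=100, n2_planned=100)
print(z_final, norm.sf(z_final))   # combined statistic and one-sided p-value
```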

 

Population enrichment

Population enrichment is a clinical trial design adaptation that uses data from an ongoing trial to adjust the sample size of the entire study population or of a promising subpopulation defined by a specific biomarker or other characteristic. At the outset, the overall trial population is enrolled, regardless of biomarker status or other subgroup attribute. At an interim analysis, a decision can be made to continue enrolling the overall population, to restrict enrollment to a subgroup showing promise, or to terminate the entire study for futility. Restricting enrollment to a specific subgroup enriches the data collected for that subpopulation.

There are several benefits for this adaptation, including:

  • Optimizing resource allocation by enriching promising subpopulations while avoiding continued investment in less successful ones.
  • Allowing investigators to examine a larger population while reducing the risk of trial failure or unnecessary drug exposure caused by heterogeneity among the study’s subpopulations.
  • Increasing the probability of success of the study by increasing the sample size of promising subgroups.

 

How to model SSR and population enrichment

Both the CDL and CHW methods for sample size re-estimation, as well as population enrichment, are adaptations that can be modeled using Cytel’s East Horizon™ platform. Find out more by booking a product demonstration.

 

Final takeaways

Sample size re-estimation and population enrichment approaches are powerful adaptations in the biostatistician’s toolbox for advanced, cost-effective, and ethical clinical trial design. They empower sponsors to allocate R&D resources more appropriately towards promising treatments, while limiting exposure of patients to potentially ineffective or harmful treatments.

Innovations in Clinical Trial Design for CNS Disorders

Clinical research in central nervous system (CNS) diseases has long been fraught with challenges. High failure rates, complex pathophysiology, variability in disease progression, strong placebo effects, and difficulties in recruitment and outcome measurement have made CNS disorders one of the riskiest areas for drug development. However, recent innovations in trial design — coupled with advances in digital health and statistical modelling — are transforming how we conduct clinical research in diseases like Huntington’s disease (HD), Alzheimer’s disease (AD), and multiple sclerosis (MS). This blog explores three recent trials that exemplify these innovations and proposes statistical advancements to strengthen their impact.

 

Adaptive designs in Huntington’s disease: The PIVOT-HD trial

Traditional fixed designs often struggle to efficiently explore dose-response relationships or adapt to emerging data. Adaptive trial designs offer a dynamic solution, particularly valuable in neurodegenerative diseases like Huntington’s disease, where treatment response and disease progression can vary widely.

Case study: PIVOT-HD trial (NCT05358717)

The PIVOT-HD trial, led by PTC Therapeutics, is a Phase II adaptive study evaluating the safety, pharmacodynamics, and early signs of efficacy of PTC518, a novel small-molecule HTT-lowering therapy. PTC518 modulates mRNA splicing to reduce levels of the mutant huntingtin protein, a key driver of HD pathology.

What sets this trial apart is its seamless adaptive design. The trial is structured to adjust dosing and the randomization ratios based on interim pharmacodynamic and safety readouts. By incorporating planned decision-making, PIVOT-HD minimizes exposure to ineffective doses and accelerates identification of promising therapeutic windows.

 

Digital biomarkers and remote monitoring in Alzheimer’s disease: The DETECT-AD trial

Cognitive decline in AD is insidious and can be difficult to quantify with infrequent clinic visits and subjective tests. Digital health technologies are revolutionizing outcome assessment through continuous, objective, and sensitive data collection.

Case study: DETECT-AD (Digital Evaluations and Technologies Enabling Clinical Translation in Alzheimer’s Disease)

The DETECT-AD initiative, part of a broader effort supported by the NIH and multiple research institutions, is employing wearables, mobile apps, and speech analysis to detect early signs of Alzheimer’s disease in at-risk populations.

In the DETECT-AD observational study, participants use smartphone apps and passive sensors to monitor activities like walking, typing speed, and even voice characteristics. These digital biomarkers are being correlated with traditional cognitive assessments and brain imaging data to predict cognitive decline before clinical symptoms emerge.

 

Platform trials in multiple sclerosis: The OCTOPUS trial

In diseases like MS, where multiple mechanisms may underlie relapses and progression, traditional “one drug, one trial” designs are increasingly inefficient. Platform trials offer a more flexible and scalable solution.

Case study: The OCTOPUS trial (UK MS Society)

The OCTOPUS (Optimal Clinical Trials Platform for Progressive MS) trial is the world’s first multi-arm, multi-stage platform trial in progressive MS. Spearheaded by the UK MS Society, this innovative study aims to test multiple repurposed therapies simultaneously, using a shared control group and adaptive design principles.

OCTOPUS promises faster answers with fewer patients and more efficient use of resources, particularly crucial in progressive MS where effective treatments are lacking.

 

Statistical challenges and opportunities

Despite these advances, several statistical hurdles remain. Novel designs require equally innovative statistical approaches to preserve validity and ensure robust interpretation.

Broader adoption of Bayesian statistical frameworks

Bayesian approaches allow the integration of prior knowledge (e.g., historical control data or early biomarkers) and offer probabilistic interpretations of trial results. In adaptive and platform trials, Bayesian methods facilitate:

  • Interim analyses with posterior probabilities guiding adaptations.
  • Dynamic borrowing from concurrent or historical control arms.
  • Greater flexibility in endpoint modelling across heterogeneous subgroups.

For example, the GBM AGILE platform trial in glioblastoma (a CNS tumor) successfully uses Bayesian methods to adapt enrollment and determine early stopping rules. A similar framework could benefit complex CNS conditions like MS or AD, where responses are highly individualized.
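
As a flavor of what such machinery can look like, here is a minimal sketch of one ingredient: borrowing historical control data on a binary endpoint through a fixed-weight power prior, with a posterior probability of superiority that could feed an interim adaptation rule. The counts, the borrowing weight, and any implied thresholds are illustrative assumptions only, not a description of GBM AGILE or any specific trial.

```python
# Minimal sketch of Bayesian borrowing via a fixed power prior on a binary
# endpoint, with a posterior probability used to guide an interim decision.
# All counts and the borrowing weight a0 are illustrative.
import numpy as np

rng = np.random.default_rng(3)

def posterior_prob_superiority(x_t, n_t, x_c, n_c, x_hist, n_hist, a0=0.5,
                               n_draws=200_000):
    """P(p_treatment > p_control | data), historical controls downweighted by a0."""
    # Beta(1, 1) priors; historical control events contribute with weight a0.
    post_t = rng.beta(1 + x_t, 1 + n_t - x_t, n_draws)
    post_c = rng.beta(1 + x_c + a0 * x_hist,
                      1 + (n_c - x_c) + a0 * (n_hist - x_hist), n_draws)
    return np.mean(post_t > post_c)

pp = posterior_prob_superiority(x_t=18, n_t=40, x_c=10, n_c=40,
                                x_hist=60, n_hist=250, a0=0.3)
# An interim rule might expand, continue, or drop an arm depending on how pp
# compares with pre-specified thresholds.
print(pp)
```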

Incorporating real-world evidence (RWE) in trial planning and analysis

As clinical trials increasingly occur alongside large electronic health record (EHR) systems, real-world data (RWD) can inform trial design and enhance external validity. Specifically:

  • RWD can help refine eligibility criteria to better represent actual patient populations.
  • Real-world comparators can augment underpowered control groups or offer external validation.
  • Longitudinal RWE provides insight into long-term treatment effects beyond trial duration.

In Alzheimer’s disease, initiatives like the AHEAD 3-45 study are already incorporating observational cohorts and RWE in trial simulation and endpoint modelling.

 

The next generation of neuroscience trials

The future of CNS clinical trials is increasingly adaptive, digital, and data driven. Innovative designs like PIVOT-HD, DETECT-AD, and OCTOPUS illustrate the power of new methodologies to make trials more efficient, sensitive, and patient-centric. However, to fully realize their potential, we must integrate robust statistical techniques such as Bayesian modelling and real-world data frameworks. These tools will help overcome inherent complexities in CNS research and bring transformative treatments closer to patients in need.

As we look ahead, collaboration between statisticians, clinicians, regulators, and technology developers will be essential in shaping the next generation of neuroscience trials — where precision, agility, and real-world relevance are no longer luxuries, but necessities.

 

Interested in learning more?

Register now to watch James Matcham’s on-demand webinar, “Clinical Trial Design Innovation in CNS Disorders.” This webinar features a review of regulatory guidelines and showcases recent successful trials in Alzheimer’s disease and other neurological disorders.

Adaptive Pivotal Clinical Trials with Composite Hierarchical Outcomes

In the world of cardiovascular drug development, getting clear answers from clinical trials isn’t always straightforward, especially when multiple patient outcomes matter. Imagine a treatment that reduces the risk of death and hospitalizations and also improves a patient’s ability to walk. How do we capture that in a single statistic?

Enter hierarchical composite endpoints — a way to prioritize outcomes based on their clinical relevance, like putting survival first, hospitalizations second, and functional improvements third. While these endpoints offer a more holistic view of treatment benefit, they introduce statistical challenges. The Finkelstein-Schoenfeld (FS) statistic is specifically designed for analyzing hierarchical composite endpoints, where outcomes like death, hospitalizations, and functional improvements are prioritized by clinical importance. It’s particularly appropriate for cardiovascular and similar trials because it compares patients pairwise across this hierarchy.
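
To show what that pairwise comparison looks like in practice, here is a minimal sketch of the hierarchical scoring kernel, assuming complete follow-up with no censoring; the real FS test restricts each pair to its shared follow-up time before aggregating the scores.

```python
# Minimal sketch of the pairwise, hierarchical comparison at the heart of the
# FS statistic (and the closely related win ratio). For brevity it ignores
# censoring and assumes complete follow-up.
def compare_pair(pat_a, pat_b):
    """Return +1 if pat_a 'wins', -1 if pat_b wins, 0 if tied at every level.

    Each patient is a dict with: time_to_death (months, inf if alive),
    n_hospitalizations, and walk_improvement (e.g., 6-minute-walk change)."""
    # Level 1: survival -- a later death (or no death) wins.
    if pat_a["time_to_death"] != pat_b["time_to_death"]:
        return 1 if pat_a["time_to_death"] > pat_b["time_to_death"] else -1
    # Level 2: hospitalizations -- fewer wins.
    if pat_a["n_hospitalizations"] != pat_b["n_hospitalizations"]:
        return 1 if pat_a["n_hospitalizations"] < pat_b["n_hospitalizations"] else -1
    # Level 3: functional improvement -- a larger improvement wins.
    if pat_a["walk_improvement"] != pat_b["walk_improvement"]:
        return 1 if pat_a["walk_improvement"] > pat_b["walk_improvement"] else -1
    return 0

def net_score(treated, controls):
    """Sum of pairwise scores: positive values favour the treated group."""
    return sum(compare_pair(a, b) for a in treated for b in controls)
```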

However, adaptive design methods that provide adequate power, control Type I error, and allow for an unblinded sample size re-assessment with these kinds of endpoints remain less well explored. We have developed an adaptive trial design that integrates the FS statistic with adaptive sample size re-estimation (SSR). This method offers a powerful new way to evaluate therapies when multiple outcomes ranked by clinical importance drive decision-making.

 

Why this matters for drug developers

  1. More meaningful endpoints: This clinical trial design reflects real-world clinical priorities. Instead of boiling the trial down to just one event (like time to first event), we use a hierarchical framework to give due weight to death, repeated hospital visits, and patient function — all in one analysis.
  2. Smarter use of resources: Trials often start with a best guess at the right sample size. But guesses can go wrong. With SSR, we analyze the data midway through the trial (without compromising trial integrity) to reassess whether the sample size should be adjusted — helping avoid underpowered or over-enrolled studies.
  3. Pragmatic decision rules: Based on interim results, we categorize trials into zones: “futile,” “promising,” or “favorable.” Promising trials get more patients, boosting their chance of success. Futile ones can stop early, sparing time and cost.
  4. Regulatory-friendly: Our method is built on well-accepted statistical principles, including those shown to preserve Type I error (false-positive rate), making it suitable for late-phase trials that could support drug approval.

 

Real-world utility

The design was inspired by a real-world cardiovascular trial, where the primary endpoint combined death, hospitalizations, and functional improvement over 12 months. Using simulations across various treatment scenarios, we demonstrated that our adaptive approach maintained statistical rigor while offering gains in power and efficiency, especially when treatment effects were modest but clinically meaningful.

 

Looking ahead

As composite and hierarchical endpoints become more common in therapeutic areas like cardiology, adaptive designs like ours will be key to unlocking their full value. They allow sponsors to detect benefits that matter to patients, regulators, and clinicians — without needing to inflate sample sizes unnecessarily.

In a landscape where trial costs are soaring and time-to-decision is critical, our approach offers a statistically sound, operationally feasible, and clinically intuitive path forward.

 

This was presented at the ENAR 2025 Spring Meeting in New Orleans on March 24, 2025.

 

Interested in learning more?

Optimizing Interim Looks in Group Sequential Adaptive Study Designs

What are group sequential study designs?

Group sequential study designs include predetermined interim analyses (interim looks) in an ongoing clinical trial, to allow researchers the potential for stopping the trial earlier than the planned final analysis due to overwhelming evidence for success (efficacy), failure (futility), or safety concerns that arise from accumulating study data. Special considerations must be given to the preservation of Type-I error with the implementation of such interim looks, and several approaches have been developed over the years to control Type-I error, including those by Stuart Pocock, Peter O’Brien, and Thomas Fleming.
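
As a small illustration of how that preservation works, the sketch below finds a single Pocock-style critical value for a design with one interim look at 50% information, by simulating the correlated interim and final test statistics under the null. The one-sided 2.5% level and the two-look structure are illustrative assumptions.

```python
# Minimal sketch: hold the overall one-sided Type I error at 2.5% across two
# equally spaced looks by finding a constant (Pocock-style) boundary via
# simulation under the null. Purely illustrative.
import numpy as np

rng = np.random.default_rng(42)
N_SIM = 500_000

# Under H0, with an interim at 50% information, corr(Z_interim, Z_final) = sqrt(0.5).
cov = [[1.0, np.sqrt(0.5)], [np.sqrt(0.5), 1.0]]
z = rng.multivariate_normal([0.0, 0.0], cov, size=N_SIM)

def overall_alpha(c):
    """Probability of crossing the boundary at either look under the null."""
    return np.mean((z[:, 0] > c) | (z[:, 1] > c))

# Crude bisection for the constant boundary; ~2.18 is the textbook Pocock value.
lo, hi = 1.96, 3.0
for _ in range(40):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if overall_alpha(mid) > 0.025 else (lo, mid)
print(round(0.5 * (lo + hi), 3))
```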

 

What are key considerations of group sequential designs?

There are several advantages for incorporating an interim look or looks in a study design, including the potential for more limited patient exposure, more efficient use of resources, time savings, and increased probability of success. Study design teams must weigh these considerations and agree on their strategic priorities before implementing group sequential design features. Specific points for consideration include the number and timing of interim analyses, and the stopping rules or thresholds used to declare early efficacy or futility.

 

Interim look timing

The timing of an interim look can be critical to the success of the group sequential approach. Performing the analysis too early may mean not enough information is available to make an informed decision; too late, and the benefits of the approach diminish significantly. Running extensive simulations across a variety of potential analysis time points can help in selecting the optimal timing, balancing the team’s strategic priorities. Adding more than one interim look may be the preferred approach, with an early look allowing stopping for futility only and a later look, or looks, focused on early efficacy stopping (see Schematic 1 below).

 

Schematic 1: A study with two interim looks: An early futility and later efficacy assessment

 

Early stopping rules

Setting the correct stopping rules for early efficacy and/or futility is also paramount in designing a robust clinical trial. If an early stopping threshold for futility is set incorrectly, it can lead to the termination of a promising treatment on the basis of limited data. Conversely, a stopping rule for efficacy that is too aggressive may lead to premature trial termination with inaccurate results. Here too, extensive simulation of trials with a variety of stopping rules for both efficacy and futility can help optimize these thresholds and the potential savings from these trial designs.

 

Schematic 2: Stopping boundaries for efficacy and futility: An interim look at 50% information fraction

 

A closer look at the benefits of implementing group sequential designs

Group sequential designs offer several key benefits in clinical trial practice:

  • Design trials that are more ethical: accurate decision rules for early stopping either for futility or efficacy can reduce the number of patients required for enrollment in a clinical trial and reduce unnecessary exposure of patients to potentially ineffective or harmful treatments.
  • Design trials with more efficient resource use: including interim looks in a study can lead to savings in both the duration and cost of clinical trials. Adaptive designs with interim analyses have shorter average durations and lower average costs than similar fixed study designs with no interim analyses. These savings are gained through the thoughtful implementation of early stopping rules.
  • Design trials with a higher probability of success: adaptive designs with interim analyses demonstrate a higher average probability of success than fixed study designs. This benefit is especially pronounced when the true underlying treatment effect is clear at an early study stage (whether beneficial or inefficacious).

 

Overall, interim analyses are an important feature in adaptive clinical trial design, and when well planned and executed, can lead to benefits and savings in clinical trial execution.

 

Group sequential designs now available in the East Horizon™ platform

Cytel’s East Horizon platform now includes a Group Sequential module. This module offers statisticians the ability to compute and simulate single-arm and two-arm study designs with interim looks. The module allows users to select and optimize the number and timing of interim looks and the boundaries for efficacy and futility through advanced simulation and analysis tools.

Cytel’s East Horizon Group Sequential Module is the second in a series of six revamped cornerstone components of Cytel’s new cloud-based trial design platform. In combination with other platform components, the module provides statisticians with the tools needed for design, optimization, and selection of adaptive clinical trials with interim analyses.

Oncology Drug Development Under Project Optimus: Case Studies

Conducting a successful oncology development program under Project Optimus requires increased emphasis on determining the optimal dose for the compound under study. Rather than a singular focus on the maximum tolerated dose (MTD), oncology drug development under Project Optimus requires one to develop an approach based on all available data. This includes safety, response rate, biomarker responses, and pharmacokinetics.

The increased emphasis on determining the optimal dose has led to several changes in how clinical trials for oncology drugs are conducted. Here, we will describe several case studies that will demonstrate how innovative study designs and clinical pharmacology may be used to speed development of oncology assets under Project Optimus.

 

Dose escalation in oncology drug development

There are three main goals of dose escalation in oncology drug development: to determine 1) the dose range where efficacy might be safely explored; 2) the maximum tolerated dose, if obtainable; and 3) the minimum active dose.

A wide range of designs for dose escalation can be used, the majority of which fall into one of two categories: algorithm-based and model-based designs. These design categories differ in several ways, as illustrated in Figure 1.

 

Figure 1: Comparison of algorithm and model-based methods

 

Algorithm-based methods are conventional design methods that use prespecified rules to determine dose escalation and de-escalation. The classic 3+3 design, for example, is still used fairly frequently despite its documented shortcomings: it may recommend a Phase 2 dose that is too high, it cannot include intermediate doses, and it handles cohort sizes that are not multiples of three poorly.

 

Model-based methods, on the other hand, have significant advantages over algorithm-based methods, since prior information may be used. These adaptive design methods may provide information on intermediate doses not studied. However, because the “3+3” design has been in use for so long, there is considerable inertia among trialists to adopt better designs.

Newer algorithmic designs, such as mTPI-2, BOIN, i3+3, and model-based designs, such as BLRM, should be carefully considered.
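
As an example of what these newer interval designs pre-specify, here is a minimal sketch of the BOIN escalation/de-escalation boundaries, using the commonly cited default interval limits of 0.6 and 1.4 times the target DLT rate; the target rate shown is illustrative.

```python
# Minimal sketch of the BOIN escalation/de-escalation boundaries (Liu & Yuan,
# 2015). The target DLT rate and the default 0.6x / 1.4x interval limits are
# used here purely for illustration.
import numpy as np

def boin_boundaries(target, phi1=None, phi2=None):
    """Return (lambda_e, lambda_d): escalate if the observed DLT rate <= lambda_e,
    de-escalate if >= lambda_d, otherwise stay at the current dose."""
    phi1 = 0.6 * target if phi1 is None else phi1   # highest rate deemed sub-therapeutic
    phi2 = 1.4 * target if phi2 is None else phi2   # lowest rate deemed overly toxic
    lam_e = np.log((1 - phi1) / (1 - target)) / np.log(target * (1 - phi1) / (phi1 * (1 - target)))
    lam_d = np.log((1 - target) / (1 - phi2)) / np.log(phi2 * (1 - target) / (target * (1 - phi2)))
    return lam_e, lam_d

# For a 30% target DLT rate this reproduces the published 0.236 / 0.358 boundaries.
lam_e, lam_d = boin_boundaries(0.30)
print(round(lam_e, 3), round(lam_d, 3))
```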

In the age of targeted immune-oncology agents, the concept of the Maximum Tolerated Dose (MTD) is assuming less importance, as the optimally efficacious dose of these products is usually lower than the MTD. There are newer study designs that consider not only toxicity, but also efficacy. These designs, such as J3+3, PRINTE, TEPE, EFFTOX and UBOIN, are more suited for modern targeted agents.

 

Pharmacokinetics and pharmacodynamics in oncology drug development: Case studies

There are many reasons to closely monitor pharmacokinetics (PK) during the initial dose escalation phase, including confirmation of exposure predictions, sufficient bioavailability (for oral or subcutaneous drugs), and the potential need for changes in the infusion rate, sampling scheme, or dosing regimen.

To illustrate this: in one case we encountered, the observed exposure was considerably different than what was predicted, and so considerable re-work of doses and dosing regimens had to be performed. The good news is that this was done early, thus minimizing the number of patients exposed to sub-optimal doses.

In another example, poor oral bioavailability was observed early in a dose escalation trial, thus allowing the trial to close early, again minimizing the number of patients treated with a sub-optimal dosing regimen. Building PK models of your drug early allows quick evaluation of the impact of different regimens and infusion times.

Dosing of oncology agents based on some measure of body size has a long history in oncology. Most immune-oncology agents are dosed based on body weight, as clearance of monoclonal antibodies (mAbs) is proportional to weight. Therefore, weight-based dosing is often used in the initial Phase 1 trial. In later studies, weight-based dosing may result in increased costs, as patient kits will have to contain extra vials of the drug to account for the wide range of patient weights that may be encountered. This can result in considerable waste if these extra vials are not used. Post-approval, weight-based dosing can also result in waste, as more than one vial of the drug may have to be used for larger patients, with the remaining portion in the second vial being discarded. To transition from weight-based to fixed dosing, one should perform simulations of weight-based and fixed doses and choose the fixed dose that most closely matches the weight-based exposure (AUC or Cmax).
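
A minimal sketch of that exposure-matching exercise is shown below: simulate AUC under weight-based dosing for a clearance that scales with body weight, then pick the candidate fixed dose whose median AUC comes closest. Every parameter, including the weight distribution, clearance model, and dose levels, is an illustrative assumption rather than a value for any real compound.

```python
# Minimal sketch: choose a fixed dose that matches the median AUC of a
# weight-based regimen for a mAb whose clearance scales with body weight.
# All PK parameters are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(99)

n = 5000
weight = rng.lognormal(mean=np.log(75), sigma=0.22, size=n)           # body weight, kg
cl = 0.2 * (weight / 75) ** 0.75 * rng.lognormal(0, 0.3, size=n)      # clearance, L/day (allometric + variability)

auc_weight_based = (3.0 * weight) / cl        # 3 mg/kg dosing: AUC = dose / CL

candidate_fixed_doses = np.arange(150, 351, 25)   # candidate fixed doses, mg
median_wb = np.median(auc_weight_based)
best = min(candidate_fixed_doses,
           key=lambda d: abs(np.median(d / cl) - median_wb))
print(best, median_wb, np.median(best / cl))
```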

The role and advantages of measuring pharmacodynamic biomarkers in oncology is sometimes not clear. There are no established surrogate endpoints in oncology, so what do these markers add to your program? The answer is evidence of target engagement. A biomarker closely linked to the mechanism of action of the drug, which changes in response to various doses of the compound, gives added “reason to believe” in the new drug’s mechanism of action.

Exposure-response analyses also help evaluate the relationship between exposure and safety. Evidence of maximum target engagement, coupled with safety and efficacy data, adds credence to the overall data set. (For two case studies illustrating the usefulness of exposure-response data in helping to interpret the overall safety and efficacy data, watch the webinar linked below). One case we encountered showed how exposure-response data for both a safety endpoint and a target engagement endpoint helped with the interpretation of efficacy data, which was promising, but difficult to interpret. The use of biomarkers in an exposure-response context allowed the selection of doses for Phase 2. Another case study showed that, based on a clear exposure-response relationship with a safety endpoint, it would be advantageous to study additional patients at lower doses in order to find a lower dose with the same level of efficacy, but a lower level of toxicity.

 

Final takeaways

The goal of any study under Project Optimus is to determine the optimal dose, not necessarily the MTD, which is of lesser importance. Newer study design options for immune-oncology products developed under Project Optimus should be considered. These designs have significant advantages over the classical “3+3” design. The pharmacokinetics of your compound should be quantified as early as possible in development in order to 1) confirm exposure, 2) confirm sufficient bioavailability (if administered orally), and 3) investigate whether fixed-dosing (rather than weight-based dosing) may be used. Performing exposure-response analyses using reliable biomarkers can aid in decision-making on doses, by showing target engagement. Exposure-response analyses using safety endpoints can also be extremely helpful in determining doses to take further into development.

 

Interested in learning more? Our recent webinar, “Oncology Drug Development Under Project Optimus: Case Studies” gives a full breakdown of the case studies mentioned here, and more. Watch on demand:

Adaptive Population Enrichment Designs in Oncology Trials

Enrichment strategies in oncology clinical trials have become increasingly important in the era of precision medicine. These strategies involve selecting patients with specific pre-treatment characteristics that may make them more likely to respond to a targeted therapy, thereby increasing the efficiency and effectiveness of the clinical trial.

In oncology, enrichment often involves selecting patients based on specific biomarkers or genetic mutations that are associated with the drug’s mechanism of action. For example, trials for drugs targeting HER2/neu in breast cancer or EGFR mutations in lung cancer often use enrichment strategies to include only patients whose tumors express these markers. This approach not only increases the likelihood of detecting efficacy but also helps identify the patient population most likely to benefit from the treatment.

 

FDA guidance on enrichment strategies for clinical trials

The FDA has issued guidance on enrichment strategies for clinical trials. This guidance defines enrichment as the prospective use of patient characteristics to select a study population more likely to demonstrate a drug effect. The guidance outlines three main categories of enrichment strategies:

  1. Strategies to decrease heterogeneity, which aim to reduce variability and increase study power.
  2. Prognostic enrichment strategies, which select patients with a higher likelihood of having a disease-related endpoint event or substantial worsening of condition.
  3. Predictive enrichment strategies, which select patients more likely to respond to the drug based on physiological or disease characteristics, or on previous response to similar drugs.

 

The FDA encourages the use of these strategies to enhance the understanding of the benefit-risk relationship in both the overall and the enriched population. They also emphasize the importance of properly describing study findings in drug labeling.

 

Benefits and trade-offs of adaptive population enrichment designs

Adaptive population enrichment designs offer sponsors additional flexibility, allowing for adjustment of eligibility criteria based on accumulating data during the trial, potentially leading to more efficient drug development and better-targeted therapies for cancer patients.

These designs start by enrolling a broad patient population but have the flexibility to restrict future recruitment after an interim analysis to patient subgroups showing greater treatment benefit. Trials designed in accordance with these principles simultaneously evaluate treatment effects in both the overall population and specific subpopulations of interest, while maintaining statistical power. By allowing for data-driven adjustments to the study population, adaptive population enrichment designs can increase trial efficiency, direct resources toward promising subgroups, and improve the likelihood of identifying effective treatments for specific patient sub-populations.

However, adaptive population enrichment designs also present several statistical challenges that require careful planning and consideration. One of the primary issues is controlling the Type I error rate, as these designs involve interim unblinded analyses and potential changes to the study population. This necessitates the use of specialized statistical methods to ensure the validity of the trial results.

Sample size determination is another critical aspect that demands thorough planning. Sponsors must consider various scenarios, including different treatment effects in subpopulations and potential adaptation decisions, to ensure adequate statistical power for detecting treatment effects in both the overall population and selected subgroups. The pre-specification of adaptation rules, hypothesis tests, and statistical methods for combining data from different stages of the trial is also essential for maintaining the integrity of the study.

Finally, there are also trade-offs and considerations regarding the timing of the interim analyses, the underlying prevalence of the sub-populations, and the magnitude of the differential effects.

 

Final takeaways

Enrichment strategies can increase the efficiency and effectiveness of oncology trials by selecting patients more likely to respond to a targeted therapy. However, while adaptive population enrichment designs allow for adjustments based on interim data, their complexity introduces statistical challenges that require careful planning. Despite these challenges, the ability to direct resources toward subgroups showing promise holds significant potential for accelerating the development of cancer therapies.

 

 

Interested in learning more? Register today for our webinar “Oncology Clinical Trials: Design Considerations in Adaptive Population Enrichment Trials” on October 9, 2024.

The webinar will provide a comprehensive overview of statistical aspects of adaptive enrichment trials, regulatory requirements for pre-specification of design elements, and benefits and trade-offs, as well as insights from past engagements with sponsors and regulatory agencies.

Oncology Clinical Trials: Design Trends in Biomarker Research

Oncology research has seen many changes and advances in recent decades, from new therapies in combination with backbone chemotherapy to novel treatments targeting malignancies and compounds targeting specific disease biomarkers at the genetic mutation level. The latter approach has called into question large, relatively long clinical studies assessing the safety and efficacy of treatments in a broad population defined at the tumor level. Instead, research at the subpopulation or biomarker level has garnered much more interest as targeted treatments are developed.

This focus on subpopulations and biomarkers is changing how researchers approach clinical trials in oncology and helps resolve several issues with larger clinical trials. For example, treatment effects may be diluted in a heterogeneous population, possibly resulting in an underpowered study. Furthermore, a large trial in a heterogeneous population may place patients for whom the drug is ineffective at risk of serious adverse events. On the other hand, restricting enrollment to a target subgroup without sufficient evidence may deny a large segment of the patient population access to a potentially beneficial treatment. This blog post will briefly introduce two statistical approaches addressing the rise of more specific study populations: predefined subpopulation statistical analysis in the context of a larger trial population and population enrichment of the more promising subgroup within an ongoing study. 

Subpopulation Analysis 

Subpopulation testing and analysis is a phase III clinical trial design strategy in which a subset of the study population is selected based on patient characteristics that may be more likely to respond to the treatment under investigation. Identifying and analyzing specific subpopulations allows the researcher to explore whether a treatment leads to different effects in a pre-designated subpopulation. A subpopulation can be defined by any stratification characteristic such as gender or geography, and in oncology clinical trials, specific biomarkers identified within a study population. 

This type of approach addresses several significant issues in oncology studies and offers corresponding benefits:

  • A large trial in a heterogeneous population may place patients for whom the drug is ineffective at risk of serious adverse events. 
  • In a heterogenous population, the treatment effect may be diluted, possibly resulting in an underpowered study. 
  • Restricting enrollment to the targeted subgroup without sufficient statistical evidence of lack of efficacy in the non‐targeted subgroup may eliminate beneficial treatment options for patients. 
  • Subpopulation analysis allows for treatment recommendations based on individual characteristics. 

As with any novel adaptive design approach, subpopulation analysis requires several considerations at the design stage. These considerations include the specific definition of the subpopulations for analysis in the study, the appropriate timing for an interim analysis, the methods used for hypothesis testing and type-1 error preservation, and the sequence of hypothesis testing of the different subpopulations and/or the full study population.  

With these considerations in mind, rigorous planning and testing in the design stage of such a clinical trial is critical. Cytel’s East Horizon adaptive clinical trial design software offers a unique solution for the planning and testing of a clinical trial design that includes subpopulation analysis. In Cytel’s solution, hypothesis testing for the full and subpopulations can be performed using graphical multiple comparison procedures (gMPC) with a weighted Bonferroni procedure employed for closed testing. This method of hypothesis testing uses directed, weighted graphs where each node corresponds to a single hypothesis. A transition matrix is used as a complement to specify the weights and generate an intuitive diagram. Finally, a simple algorithm sequentially tests the individual hypotheses using the specified weights and hierarchies. 
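
For illustration, here is a minimal, generic sketch of that sequentially rejective graphical procedure for two hypotheses (full population and targeted subpopulation). The initial weights and transition matrix are illustrative, and this is the published algorithm in its simplest form rather than the East Horizon implementation.

```python
# Minimal sketch of the sequentially rejective graphical procedure (Bretz et al.)
# for weighted-Bonferroni testing on a graph. Weights and transitions are
# illustrative.
import numpy as np

def graphical_test(p, w, G, alpha=0.025):
    """p: unadjusted p-values; w: initial weights (sum <= 1);
    G: transition matrix, G[i][j] = share of H_i's weight passed to H_j."""
    p, w, G = np.asarray(p, float), np.asarray(w, float), np.asarray(G, float)
    m = len(p)
    active, rejected = [True] * m, [False] * m
    while True:
        # Find any active hypothesis rejectable at its current local level.
        idx = next((i for i in range(m) if active[i] and p[i] <= w[i] * alpha), None)
        if idx is None:
            return rejected
        rejected[idx], active[idx] = True, False
        # Pass the rejected hypothesis' weight along its outgoing edges...
        for j in range(m):
            if active[j]:
                w[j] += w[idx] * G[idx, j]
        w[idx] = 0.0
        # ...and rewire the graph around the rejected node (standard update).
        newG = np.zeros_like(G)
        for j in range(m):
            for k in range(m):
                if j == k or not (active[j] and active[k]):
                    continue
                denom = 1.0 - G[j, idx] * G[idx, j]
                if denom > 0:
                    newG[j, k] = (G[j, k] + G[j, idx] * G[idx, k]) / denom
        G = newG

# Example: split alpha 80/20 between the full population and the subgroup,
# passing all weight to the other hypothesis upon rejection.
print(graphical_test(p=[0.018, 0.004], w=[0.8, 0.2], G=[[0, 1], [1, 0]]))
```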

 

Population Enrichment 

Population Enrichment is an adaptive clinical trial approach that involves the prospective use of any patient characteristic to obtain a study population in which detection of a drug effect is more likely than in an unselected population. There are two types of population enrichment: Prognostic Enrichment, in which a high-risk patient population is identified based on a biomarker, and Predictive Enrichment, in which the researchers identify a patient group more likely to respond to treatment. Some industry trends that have contributed to the popularization of this adaptive design method include the soaring costs of clinical trial execution, a move away from a “one-size-fits-all” approach to clinical development, and the rising interest in individualized medicine. This adaptive design approach has several benefits, including the identification of highly responsive patient populations, the efficient detection of a treatment effect in a smaller sample size, and the ability to identify beneficial treatments for a subgroup of patients that may have failed in a broader population under a more traditional study design.

Population enrichment can be seen as an extension of the sample size re-estimation (SSR) methodology, which we discussed in more depth in a previous blog post. 

In the enrichment adaptive approach, a pre-specified number of subjects from the entire population, designated as cohort 1, is analyzed at an interim analysis, and a data monitoring committee reviews the results to assess efficacy or futility against predetermined thresholds. If the analysis shows promising results only for a specific subpopulation of interest, that population is “enriched”: the remaining subjects of the study, designated as cohort 2, are enrolled from this subgroup only, enhancing data collection for the subgroup of interest and increasing the overall probability of success of the study. As with any adaptive approach, this method has specific considerations, including closed testing with a p-value combination, the preservation of type-1 error, and additional special considerations requiring attention in event-driven trials, as most oncology trials are.
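
To illustrate the p-value combination piece, below is a minimal sketch of a two-stage enrichment test using pre-specified inverse-normal weights and closed testing, with a Simes test for the intersection hypothesis. The stage weights, p-values, and the specific choice of intersection test are illustrative assumptions, not a prescribed method.

```python
# Minimal sketch of closed testing with an inverse-normal p-value combination
# for a two-stage enrichment design. All weights and p-values are illustrative.
import numpy as np
from scipy.stats import norm

W1, W2 = np.sqrt(0.5), np.sqrt(0.5)     # pre-specified stage weights

def inverse_normal(p1, p2):
    """Combined p-value from independent stage-wise p-values."""
    z = W1 * norm.isf(p1) + W2 * norm.isf(p2)
    return norm.sf(z)

def simes(p_values):
    """Simes p-value for an intersection hypothesis."""
    p = np.sort(np.asarray(p_values))
    return np.min(len(p) * p / np.arange(1, len(p) + 1))

# Stage 1 (all comers) and stage 2 (enrollment restricted to subgroup S after the interim):
p1_F, p1_S = 0.20, 0.03       # stage-1 p-values: full population and subgroup
p2_S = 0.01                   # stage-2 p-value: subgroup only
alpha = 0.025

# Closed testing: to claim the subgroup, both H_S and the intersection H_{F,S}
# must be rejected. After enrichment, stage 2 informs the intersection through S.
p_HS = inverse_normal(p1_S, p2_S)
p_HFS = inverse_normal(simes([p1_F, p1_S]), p2_S)
reject_subgroup = (p_HS <= alpha) and (p_HFS <= alpha)
print(p_HS, p_HFS, reject_subgroup)
```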

 

Final Takeaways 

Both subpopulation analysis and population enrichment are adaptive approaches to modern trial designs in oncology that offer great hope for researchers and patients alike. As the focus on specific patient populations narrows, these adaptive design types are gaining industry traction. Software-guided clinical trial design and simulation using tools such as East Horizon ensure adaptive elements are incorporated thoughtfully and are rigorously tested prior to trial launch. 

Learn more about these approaches in our upcoming webinar “Oncology Clinical Trials: Design Trends in Biomarker-Driven Research” with Boaz Adler and Valeria Mazzanti.

The Role of External Data in Oncology Drug Development

Randomized controlled trials (RCTs) remain the gold standard for evaluating the safety and effectiveness of a new treatment. However, in a number of cases, alternative approaches leveraging external data (i.e., data from outside of a clinical trial) — ranging from single-arm trials to augmented RCTs — can be appropriate. Here, we discuss how to leverage and incorporate external data in drug development, focusing on the use of external control arms and Bayesian borrowing.
