Simulating Multiple Endpoints While Including External Historical Data in Adaptive Oncology Trial Designs

Multiple endpoints are now the rule, not the exception

In many contemporary Phase III oncology programs, a single primary endpoint is no longer sufficient. While Overall Survival (OS) remains the gold standard and regulators still view it as the most direct measure of clinical benefit, in practice, OS takes time to mature, leading to very long and expensive clinical trials. In metastatic settings with multiple subsequent lines of therapy, the signal can also dilute over time. As a result, sponsors frequently structure confirmatory trials with OS alongside an endpoint that is faster to measure, such as Progression-Free Survival (PFS) and sometimes Overall Response Rate (ORR), incorporated either as dual primary endpoints or within a gatekeeping framework.

Consider, for example, a Phase III trial in non-small cell lung cancer (NSCLC) in which PFS is expected to read out at ~18 months, while OS may require 36 months of follow-up. The sponsor hopes PFS will support earlier regulatory interaction, potentially even forming the basis of accelerated approval, while OS continues to mature for full approval. Accelerated approval may save the sponsor resources, or bring in additional resources, while OS data continue to accrue, as the OS evidence is still required by regulatory agencies for the final claim of success.

Although this seems straightforward, the approach fails to take into account all the complexities that may impact that final claim. These endpoints are correlated, mature at different rates, and are influenced by post-progression therapy, imaging frequency, and dropout patterns. Designing such a study requires more than separate power computations for each endpoint; it requires understanding how they behave together. This is where simulation becomes essential.

 

The statistical reality of correlated endpoints

Endpoints such as ORR, PFS, and OS are not independent random variables. They arise from the same underlying disease process. Patients who achieve early tumor shrinkage (i.e., ORR) often experience delayed progression. But that does not guarantee improved OS. Subsequent therapy, crossover, and differential dropout can attenuate survival differences. Many programs begin by assuming independence when calculating sample size or multiplicity adjustments. Unfortunately, that assumption rarely holds once joint behavior is modeled explicitly.

For example:

  • If ORR and PFS have moderate positive correlation (e.g., driven by response durability), the probability of dual success may be higher than naïve calculations suggest.
  • If OS is weakly correlated with PFS due to heavy post-progression treatment, hierarchical strategies may protect alpha but substantially reduce the probability of demonstrating statistical significance on OS.

Note that statisticians usually evaluate a range of correlation coefficients between endpoints to assess their impact on the trial’s overall operating characteristics.
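As an illustration, the following R sketch generates correlated PFS and OS times through a Gaussian copula with exponential marginals and then scans a grid of correlation values. The medians, correlation grid, and sample size are illustrative assumptions, not calibrated values:

```r
# Minimal sketch: joint PFS/OS times via a Gaussian copula with
# exponential marginals. Medians and correlations are illustrative.
library(MASS)  # for mvrnorm

simulate_pfs_os <- function(n, median_pfs, median_os, rho) {
  # Correlated standard normals
  z <- mvrnorm(n, mu = c(0, 0), Sigma = matrix(c(1, rho, rho, 1), 2))
  # Transform to uniforms, then to exponential event times
  u   <- pnorm(z)
  pfs <- qexp(u[, 1], rate = log(2) / median_pfs)
  os  <- qexp(u[, 2], rate = log(2) / median_os)
  # A death event also ends progression-free survival
  data.frame(pfs = pmin(pfs, os), os = os)
}

set.seed(42)
for (rho in c(0.2, 0.5, 0.8)) {
  d <- simulate_pfs_os(10000, median_pfs = 10, median_os = 24, rho = rho)
  cat(sprintf("rho = %.1f: Spearman(PFS, OS) = %.2f\n",
              rho, cor(d$pfs, d$os, method = "spearman")))
}
```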

The FDA will typically focus first on control of familywise type I error across endpoints. But during review, questions often shift toward interpretability:

  • How was correlation justified?
  • Were joint distributions modelled based on empirical data?
  • How sensitive are conclusions to deviations in event timing?

Those questions are difficult to answer with closed-form approximations alone.

 

Why closed-form calculations fall short

Closed testing procedures, alpha recycling, and parallel gatekeeping frameworks are well-established tools for multiplicity control. From a theoretical standpoint, they provide strong familywise error control under specified assumptions, but operating characteristics become non-intuitive once endpoints are correlated and events accrue at different rates.

For example, consider a hierarchical testing strategy in which OS is tested first. If OS fails narrowly due to immature data, PFS may never be formally tested, even if the PFS hazard ratio is clinically meaningful.

Alternatively, reversing the order (i.e., PFS tested first followed by OS) may increase the probability of declaring success on PFS, but now OS significance depends on passing through earlier gates. Power becomes conditional in ways that clinical teams often underestimate.

Simulating such designs allows evaluation of:

  • Probability of joint success (OS and PFS both significant)
  • Probability of partial success (e.g., showing significant PFS while OS is not yet mature)
  • Impact of varying correlation assumptions
  • Sensitivity to delayed event accrual
  • Effect of interim analyses on overall power

This helps clinical teams focus on actual operating characteristics under realistic assumptions instead of theoretical power under ideal ones. For example, in some settings, probability of winning on both endpoints may drop from 75% to around 50% when introducing correlation structures.
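To make the conditional-power point concrete, here is a minimal R sketch that simulates correlated PFS and OS test statistics under a hierarchical gate (PFS tested first, OS tested only if PFS succeeds). The effect sizes, correlation, and significance threshold are illustrative assumptions, not a reproduction of the figures quoted above:

```r
# Sketch: operating characteristics of a hierarchical test (PFS gate first,
# then OS) under correlated test statistics. All inputs are illustrative.
set.seed(1)
n_sims <- 100000
rho    <- 0.4          # assumed correlation between the PFS and OS z-statistics
mu_pfs <- 3.0          # expected z-value for PFS under the alternative
mu_os  <- 2.2          # expected z-value for OS under the alternative
z_crit <- qnorm(0.975) # nominal one-sided 2.5% boundary at each gate

# Correlated standard normals, shifted by the assumed effects
z1    <- rnorm(n_sims)
z2    <- rho * z1 + sqrt(1 - rho^2) * rnorm(n_sims)
z_pfs <- z1 + mu_pfs
z_os  <- z2 + mu_os

pfs_win <- z_pfs > z_crit
os_win  <- pfs_win & (z_os > z_crit)  # OS is only tested if PFS passes the gate
cat(sprintf("P(PFS success)             = %.3f\n", mean(pfs_win)))
cat(sprintf("P(OS success, marginal)    = %.3f\n", mean(z_os > z_crit)))
cat(sprintf("P(joint success)           = %.3f\n", mean(os_win)))
cat(sprintf("Naive independence product = %.3f\n",
            mean(pfs_win) * mean(z_os > z_crit)))
```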

 

Modeling multiple endpoint outcomes

Traditional simulations often generate each endpoint independently from parametric survival distributions (e.g., using Exponential or Weibull curves). This is convenient, but not always clinically realistic. The FDA will often ask how simulation assumptions were calibrated. “We assumed independence” is not persuasive.

Therefore, modelling patient outcome data with a multistate model may generate more credible data that aligns better with what is observed in practice. This is certainly not the only approach, but it is one we encourage in addition to the copula approach, in which correlation coefficients between the endpoints must be specified explicitly.
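Below is a minimal sketch of such a multistate data-generation model in R, assuming exponential transition times and illustrative rates. Because responders are assumed to progress more slowly, correlation among ORR, PFS, and OS emerges from the disease process itself rather than from a specified coefficient:

```r
# Sketch: one patient pathway through a simple multistate model
# (response status -> progression -> death). All rates are illustrative.
simulate_patient <- function(p_response = 0.35,
                             rate_prog_resp = log(2) / 14,  # responders progress slower
                             rate_prog_non  = log(2) / 6,
                             rate_death_post_prog = log(2) / 10) {
  responder <- runif(1) < p_response
  rate_prog <- if (responder) rate_prog_resp else rate_prog_non
  t_prog    <- rexp(1, rate_prog)                   # time to progression (PFS)
  t_death   <- t_prog + rexp(1, rate_death_post_prog)  # post-progression survival
  data.frame(responder = responder, pfs = t_prog, os = t_death)
}

set.seed(7)
cohort <- do.call(rbind, replicate(5000, simulate_patient(), simplify = FALSE))
# Correlation between endpoints now emerges from the disease process itself
cor(cohort$pfs, cohort$os, method = "spearman")
```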

Leveraging prior internal data, particularly standard-of-care arms from earlier studies, can anchor assumptions about:

  • Correlation between endpoints
  • Event-time distributions
  • Dropout rates
  • Missing data mechanisms

Alternatively, external historical data can be used for this purpose. However, clinical teams must evaluate whether those data are exchangeable with the population and assumptions they are meant to inform, especially if disease management has shifted since the data were collected.

 

Multiplicity control considerations

As previously mentioned, testing multiple primary endpoints requires strict familywise type I error control. Common approaches include:

  • Hierarchical gatekeeping
  • Alpha recycling
  • Closed testing procedures
  • Pre-specified adaptive decision rules

Under strong positive correlation, alpha allocation may be conservative relative to realized joint behavior. Under weak correlation, nominal power calculations may overstate the chance of dual success.

One area that is often overlooked is how interim analyses interact with multiplicity. Early looks based on PFS may alter the distribution of OS information at final analysis, particularly if enrollment slows after interim data are reviewed. That secondary impact is unfortunately rarely captured.

Simulations that account for these multiple-endpoint decisions can help characterize Type I error control and power trade-offs under more realistic execution scenarios.

 

Integrating external and historical data

In oncology, prior data are often available, particularly for standard-of-care arms. Including empirically derived components, such as correlation and dropout rate assumptions, in simulation makes projections more defensible.

Regulatory agencies may still require conservative assumptions, but a simulation framework grounded in observed data allows transparent discussion of where assumptions are aggressive, where they are conservative, and why.

 

A practical perspective

Multiple primary endpoints introduce scientific opportunity and statistical complexity at the same time. Several trade-offs must be accounted for, including, but not limited to, overcommitting on sample size, conditional power dependencies across endpoints, sensitivity to correlation structures, event-timing uncertainty, and interim decision impacts.

Simulation, when built on joint patient-level modelling and calibrated to empirical data, allows these trade-offs to be evaluated prospectively rather than discovered after a database lock.

In our experience, teams that invest early in this level of simulation and endpoint modelling encounter fewer redesign discussions, particularly once regulatory feedback begins. More importantly, cross-functional stakeholders gain a clearer understanding of what “success” actually means across endpoints.

That clarity is often worth as much as the statistical precision itself.

 

Interested in learning more?

Join J. Kyle Wathen, Valeria Mazzanti, and Julija Saltane for their upcoming webinar “Simulating Multiple Endpoints to Drive Late-Stage Oncology Trials” on Thursday, April 2 at 10 AM ET:

Finding the Optimal Biological Dose with New PKBOIN-12 Method

With the rise of targeted and immunotherapies, we have recently seen a shift away from finding a drug’s maximum tolerated dose (MTD) in early-phase dose-finding studies and toward identifying the optimal biological dose (OBD): the dose that optimally balances safety, tolerability, and early efficacy. A new method, PKBOIN-12, extends the BOIN12 framework to integrate pharmacokinetic (PK) parameters to refine dose finding and final OBD selection.

Here, we discuss PKBOIN-12, recent regulatory shifts regarding dose finding, including the FDA’s Project Optimus, and Cytel’s East Horizon™ dose-finding module.

 

What is PKBOIN-12?

PKBOIN-12, developed by Dr. Hao Sun of Bristol Myers Squibb and Jieqi Tu of the University of Illinois Chicago, is an innovative dose-finding method that enhances the established BOIN12 algorithm by incorporating pharmacokinetic (PK) information into the Optimal Biological Dose (OBD) determination process. In recent years, particularly with the rise of targeted and immunotherapies, the focus in early-phase dose-finding studies has shifted away from finding the Maximum Tolerated Dose (MTD) and toward identifying the OBD, the dose that optimally balances safety, tolerability, and early efficacy.

BOIN12 is one such method that assesses both safety and efficacy, but, like many dose-finding designs, it typically does not formally use auxiliary data. Researchers routinely collect PK measurements to characterize drug exposure at the tested dose levels, but these are not usually incorporated into the risk-benefit analysis when designing clinical trials. PKBOIN-12 addresses this by extending the BOIN12 framework to integrate collected PK data to refine dose finding and final OBD selection.
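To illustrate the kind of risk-benefit trade-off that BOIN12-type designs formalize, the sketch below selects an OBD by maximizing a toxicity-efficacy utility among doses meeting a safety limit. This is not the PKBOIN-12 algorithm itself; the utilities, outcome probabilities, and toxicity cap are illustrative assumptions:

```r
# Generic utility-based OBD selection, illustrating the trade-off that
# BOIN12-type designs formalize. This is NOT the PKBOIN-12 algorithm;
# utilities, rates, and the safety cap are illustrative assumptions.
doses   <- c("D1", "D2", "D3", "D4")
p_tox   <- c(0.05, 0.10, 0.25, 0.45)  # assumed toxicity probabilities
p_eff   <- c(0.15, 0.35, 0.45, 0.50)  # assumed efficacy probabilities
tox_cap <- 0.30                       # maximum acceptable toxicity rate

# Utility of the four (toxicity, efficacy) outcomes on a 0-100 scale
u_eff_safe <- 100; u_eff_tox <- 60; u_noeff_safe <- 40; u_noeff_tox <- 0

utility <- p_eff * (1 - p_tox) * u_eff_safe +
           p_eff * p_tox       * u_eff_tox +
           (1 - p_eff) * (1 - p_tox) * u_noeff_safe +
           (1 - p_eff) * p_tox       * u_noeff_tox

admissible <- p_tox <= tox_cap          # safety screen first
obd <- doses[admissible][which.max(utility[admissible])]
print(data.frame(doses, p_tox, p_eff, utility = round(utility, 1), admissible))
cat("Selected OBD:", obd, "\n")
```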

Indeed, simulation results comparing PKBOIN-12 and BOIN12 demonstrate that the former more effectively identifies the OBD and allocates a greater proportion of patients to that optimal dose.

 

Project Optimus: A regulatory shift toward the OBD

In addition to the general industry trend in collecting and considering a broader set of data in early-phase dose-finding oncology studies, we have seen a real shift in regulatory interest in this area, encapsulated in the FDA’s Project Optimus.

In a previous blog post, James Matcham and Michael Fossler highlight how a recognition of the changing nature of oncology therapies — away from chemotherapies and towards more advanced biologics — necessitated a change in how these products are developed and assessed for efficacy and safety.

Project Optimus posits that the dose-finding paradigm must shift away from safety and tolerability alone and toward incorporating efficacy considerations at this stage. An ideal dose-finding study under the Project Optimus lens emphasizes determination not of the MTD but of the OBD: a dose, or dose range, selected with efficacy, tolerability, safety, and pharmacokinetics all in view.

PKBOIN-12 is therefore well suited to meet the challenges presented by Project Optimus and is indeed at the forefront of both industry trends and regulatory expectations.

 

Dose finding with the East Horizon™ platform

Cytel’s software development teams will soon launch the dose-finding module, the sixth installment of the East Horizon platform. This module completes an almost two-year journey of migrating Cytel’s flagship software heritage, East, into a cloud-native, modern, and updated East Horizon platform. Over these months, our teams worked tirelessly to select, from our wide repertoire of software solutions, the features, methods, and tests most relevant to our user base, and thoughtfully curated additional frequentist and Bayesian methods that are completely new for Cytel software. One such method is the new PKBOIN-12 dose-finding method.

 

Interested in learning more?

On November 18, 2025, Cytel will host Dr. Hao Sun for a webinar to discuss this new method in depth, and to highlight the technical as well as tactical aspects of implementing this method. Register today and join us for a fascinating conversation:

Career Perspectives: A Conversation with Vidyadhar Phadke

In this edition of our Career Perspectives series, we had the pleasure of speaking with Vidyadhar Phadke, Director, Software Statistics at Cytel. Moving from theoretical statistics to applied statistics, Vidyadhar discusses his career journey, the power of revolutionary software to design clinical trials more efficiently and improve patients’ lives, emerging trends, and more.

Can you give us a little background on your career so far?

I was born and brought up in a small village near Pune, India. After completing my Master’s in Statistics from Pune University, I went to Bowling Green State University in Ohio, USA for my PhD in Statistics. My PhD was in theoretical statistics, but I knew I wanted to work within applied statistics. And although I didn’t have a background in clinical trials, my wish to focus on applied statistics is how I ended up working for Cytel, where I’ve been for the last 16 years.

Initially, I worked as an individual contributor, testing statistical algorithms and then started writing specifications for engineering teams, combining my statistical knowledge with software design and development for the first time. As I grew within Cytel, I started managing other statisticians and currently work as Director of Statistics in our Software division.

I really enjoy discussing advanced algorithms and features with highly accomplished experts like Cyrus Mehta, James Bolognese, and Pantelis Vlachos as well as our product managers.

You’ve grown from a role within statistics to a software role at Cytel, eventually becoming Director of Software Statistics. What motivated that shift, and how did it come about?

This shift came about slowly. After working as an individual contributor testing the algorithms, I started appreciating the power of multiplying efforts through collaboration. Not only was our software and product better because of the collaboration between statisticians and developers, but I also felt like through software, the impact of our work could be multiplied. Working as a statistician on a study would allow me to help one study and one therapy at a time, but through the software, we could help multiple studies and organizations at once.

I also developed an interest in leadership and started reading more about related topics. As Cytel grew, I got opportunities to grow alongside it professionally and contribute at a higher level.

As someone who has been with Cytel for 16 years now, what has kept you motivated and engaged throughout this journey?

I really love our founders’ vision of designing clinical trials more efficiently to improve the lives of patients. I also believed that by creating revolutionary software we can multiply this impact. I loved our culture that has always promoted asking questions and valued opinions. The combination of these aspects keeps me enthusiastic and motivated.

You’ve spoken about the power of software to have a large impact, supporting many clients, and ultimately patients, at once. What’s one product or feature you’ve worked on that really exemplifies that, and what was your role in shaping it?

A few years ago, we started working on our East Horizon™ platform, which can simulate a large amount of trial designs and optimize to improve trial duration, ultimately saving crucial time for patients. I had the privilege of working on this platform from its conception while I was in Boston. The brainstorming sessions and interactions with other experts were incredibly enjoyable! Now, our platform has launched, and we’re already able to see the difference it can make for clients and patients.

Could you share a project you have worked on that you feel particularly proud of, and why?

I got an opportunity to work on improving the underlying probability computations at the back end of our proprietary engines for advanced adaptive designs. I enjoyed this work immensely, which involved core mathematics and made our engines more robust.

How do you balance statistical rigor with user-friendly product design when contributing to software development?

Although I am not an expert in UX development, over the years, I have learned that understanding user personas and their pain points is very important when developing software. Presenting complicated ideas in intuitive UX while not overwhelming the user is a form of art.

You’ve been described as instrumental in building a strong statistical foundation at Cytel. What were some of the key challenges and milestones in that journey?

Understanding all our products and the statistical theory behind them was very important to me so I focused on teaching myself that. Then I learned how to discuss those and communicate clearly with non-statisticians.

Each software platform has been a milestone for me. After I joined Cytel, we expanded the capabilities of the East and Xact products with more advanced features, helping our customers design more complicated and efficient adaptive trials. And for the last few years, we have been building our East Horizon™ platform, which allows our customers to benefit from modern technology, cloud compute power, and the use of AI.

These expansions and the launch of a new software platform required me to learn more about advanced biostatistical methods as well as agile methods to deliver faster.

What does a typical day look like for you, and how do you maintain balance between deep technical and expertise work and leadership responsibilities?

A typical day for me involves discussing requirements with experts and product managers. I work with the engineering team by explaining features and answering their questions, and I guide statisticians on developing algorithms and testing them. During my career, I learned the importance of delegation and of mentoring my team to develop a problem-solving attitude. Our work is very niche and often has a steep learning curve, so initially I helped out more. Sometimes, it can be a little tricky to change your mindset quickly and manage your time effectively. For the last few years, I’ve thus focused on learning more about time management. Dave Crenshaw’s time management method is one I find particularly helpful!

How did you navigate the transition from an individual contributor in a statistics-heavy role to managing and leading others? What advice would you give to someone looking to grow into a leadership role from a technical background?

Having an appreciation of the power and influence you have is incredibly important. Honesty, trust, and living by the values you preach are crucial if you want to be an effective leader. I’m not a fan of strict hierarchy, and always found that the culture at Cytel allowed me to ask tough questions ― even to Cyrus Mehta, our co-founder ― which I absolutely love. I think it is useful to remain involved in technical work, at least part-time, even for a leader, because that gives you a better perspective and understanding.

How do you see the role of statisticians evolving in the development of software products at Cytel?

I hope to see statisticians developing better understanding of User Personas and context of clinical trials and working closely together in feature definition. I also feel there is a lot of scope for statisticians to use AI and develop innovative statistical methods.

What emerging trends in biostatistics or software development are you most excited about, and how do you see them influencing Cytel’s roadmap?

AI is going to have a big impact on biostatistics in clinical trials and software development. We already have an AI tool for developing R code, which is integrated with our platform, and we’re developing an AI chatbot for our products. There are a lot of opportunities in making software development and testing more automated and efficient!

What advice would you give to colleagues looking to upskill in statistics or software development, and how do you personally stay ahead of emerging trends?

For upskilling in statistics, I suggest reading about novel methods for design and analysis. There are a lot of opportunities to attend conferences, both offline and online, to keep up with the latest trends and develop connections.

The movement towards building open-source solutions by various sponsors, either individually or collaboratively, is growing as well, which is a huge opportunity to learn and improve your skills.

Finally, there are ample opportunities to learn and develop innovative prototypes using AI technology.

Mentorship is very important to you. What do you think are the most critical skills or mindsets for young professionals entering the corporate world today?

As AI starts doing routine work, I think it will be important to develop a deeper understanding of context and focus on the “why,” which will help us solve problems better. I would recommend young professionals develop a deep understanding of one area as well as a broad understanding of others. This will help them be an expert within their field who is easily capable of working with other teams and specialties.

What’s one piece of career advice you wish you had received earlier?

I think the importance of emotional intelligence is something I only started appreciating late in my career!

Finally, what are your main interests outside of work?

I love reading philosophy and playing chess. I also love reading about time management and agile methodologies. My son, Soham, is 11 years old and I love spending time with him. I often read fantasy books with him, like the Harry Potter and Wings of Fire series.

 

Thank you, Vidyadhar, for sharing your experience!

 

Blending Power and Flexibility: How AI-Generated R Code is Reshaping Clinical Trial Design

In today’s fast-evolving clinical research landscape, designing robust and efficient trials is more critical than ever. As statistical designs grow in sophistication, biostatisticians are increasingly relying on both commercial platforms and open-source tools to meet unique modeling needs. But this hybrid approach also comes with challenges, particularly for those new to advanced simulation software or lacking programming experience.

At Cytel, we’ve been exploring how artificial intelligence (AI) can help bridge this gap. At the 2025 Joint Statistical Meetings (JSM), we will present our latest innovation: AI-powered R code generation for clinical trial design, a feature embedded in our East Horizon™ platform. This assistant, called RCACTS (R Coding Assistant for Clinical Trial Simulation), represents a significant step forward in making custom trial design faster, more accessible, and more reliable.

 

Why talk about this now? The open-source imperative

While commercial clinical trial design software offers rapid design development through validated and user-friendly workflows, it doesn’t always address the full complexity of real-world problems. Trial statisticians often face challenges in areas such as oncology, rare diseases, and adaptive designs that require tailored statistical tests, unique outcome generation models, or alternative randomization techniques.

This is where open-source tools like R become invaluable. R allows statisticians to write custom code to simulate complex trial designs, perform Bayesian analyses, or integrate evolving regulatory guidance. Over the years, a vibrant ecosystem of R packages has emerged, offering a high degree of transparency, flexibility, and academic rigor.

Yet this flexibility comes with trade-offs: code development can be time-consuming, error-prone, and requires significant programming expertise. As a result, many biostatisticians find themselves switching between validated commercial workflows and custom R functions, leading to a process that is often fragmented and inefficient.

Recognizing this, Cytel’s East Horizon platform has introduced R integration points, enabling users to inject custom code directly into validated simulation workflows. This integration delivers the best of both worlds: the speed and structure of commercial software with the creativity and control of open-source.

 

Enter AI: Speed, simplicity, and smarter coding

Our next logical question was: can AI make this process even easier?

The answer, increasingly, is yes. With recent advances in generative AI, particularly large language models (LLMs), it’s now possible to assist in the generation of R code for simulation-based design tasks. At Cytel, we’ve harnessed OpenAI’s GPT-4o via API, securely deployed within Microsoft Azure, to create RCACTS, a coding assistant purpose-built for biostatisticians.

Unlike generic AI tools that produce standalone R scripts, RCACTS generates R code specifically tailored for the East Horizon simulation engine. It ensures that the generated functions:

  • Match expected input/output structures,
  • Include pre-defined parameters as shown in our internal statistical package CyneRgy,
  • Are immediately ready for integration and testing within a live trial design workflow.

With RCACTS, users can simply describe what they want in plain English and receive functioning R code that can be integrated into East Horizon.
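For illustration, the snippet below shows the general shape of such a generated function: a fixed, engine-defined signature with named inputs and a structured return value. The function and argument names here are hypothetical stand-ins, not the actual CyneRgy or East Horizon interface:

```r
# Hypothetical example of the kind of function a coding assistant might
# produce: a custom patient-response generator with a fixed signature.
# Names below are illustrative stand-ins, not the real CyneRgy/East Horizon API.
GeneratePatientResponse <- function(NumSub, TreatmentID, MeanControl, MeanTrt, StdDev) {
  # TreatmentID: 0 = control, 1 = experimental, one entry per subject
  means    <- ifelse(TreatmentID == 0, MeanControl, MeanTrt)
  response <- rnorm(NumSub, mean = means, sd = StdDev)
  # Simulation engines typically expect a named list with an error flag
  list(Response = response, ErrorCode = 0)
}

out <- GeneratePatientResponse(NumSub = 6, TreatmentID = c(0, 1, 0, 1, 0, 1),
                               MeanControl = 0, MeanTrt = 0.5, StdDev = 1)
str(out)
```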

 

Who benefits? Everyone from newcomers to experts

One of the major advantages of this AI-enhanced workflow is lowering the barrier to entry. For a new user unfamiliar with Cytel’s R integration or syntax requirements, writing compatible code from scratch can be daunting. RCACTS significantly reduces the learning curve by providing validated function templates, sensible defaults, and clear parameterization, all supported by generative AI.

At the same time, experienced statisticians benefit by spending less time on repetitive coding tasks, debugging, or remembering function signatures. This allows them to focus on higher-level design questions, such as: What analysis method is most robust? How sensitive is the design to different outcome distributions? What dropout patterns pose the greatest risk?

Our assistant supports a wide range of trial design elements:

  • Simulating patient responses: Binary, Continuous, Time-to-event, and Repeated-measure endpoints.
  • Analyzing simulated data: Statistical analysis for these endpoints.
  • Randomization: Flexible randomization of patients across treatment groups.
  • Enrollment and dropout modeling: Custom mechanisms for realistic patient enrollment and dropout scenarios.
  • Treatment selection: Supporting multi-arm multi-stage (MAMS) trial designs.

 

Balancing innovation with responsibility

Of course, like any AI solution, there are caveats. AI-generated code must be carefully reviewed for correctness, appropriateness, and regulatory readiness. RCACTS includes a built-in testing functionality to flag structural or syntactic errors, but statistical validation remains the user’s responsibility. Also note that all data interactions adhere to Azure OpenAI’s stringent data protection policies to ensure security and compliance.

There’s also a broader concern: will over-reliance on AI limit the creativity and deep statistical thinking that define our profession? At Cytel, we view AI not as a replacement for expertise, but as a tool to amplify it. Our goal is to give statisticians more time and mental space to explore, iterate, and innovate rather than reduce them to prompt engineers.

 

Looking ahead

The future of clinical trial design lies in intelligent integration: combining the strengths of validated commercial tools, flexible open-source frameworks, and AI-powered coding assistance. With East Horizon and RCACTS, we believe we’re building the blueprint for this future, with a platform that supports both scientific rigor and operational speed.

As the field continues to evolve, biostatisticians will need tools that not only keep up with complexity but also support creativity, collaboration, and efficiency. AI-generated R code, embedded within a powerful simulation engine, is one such tool and is already transforming how we approach design flexibility in clinical trials.

 

Catch us at JSM 2025 to learn more about how AI is transforming the future of clinical trial design within Cytel.

Interim Decision-Making in Clinical Trials: A Focus on Sample Size Re-Estimation and Population Enrichment

In the evolving landscape of clinical trial design, flexibility and efficiency have become essential for success. Sample size re-estimation (SSR) and population enrichment — both adaptive trial design methods — use interim data to make informed mid-trial adjustments. While they address different aspects — SSR focusing on how many patients to enroll and population enrichment focusing on which patients to include — both approaches aim to optimize trial outcomes, reduce unnecessary exposure, and make better use of limited resources.

This blog explores how these two methods work, their statistical underpinnings, and how they can be used to build more ethical, targeted, and cost-effective trials.

 

Sample size re-estimation

Sample size re-estimation is a type of clinical trial design adaptation in which the sample size can be reassessed at an interim look, based on accumulated data. Over the years, this method has grown in popularity for several reasons:

  1. SSR designs address variability in an observed treatment effect when the treatment shows some promise, but the effect size is not as pronounced as originally expected.
  2. SSR designs produce more ethical trials, as they limit the number of patients exposed to treatment until sufficient efficacy evidence is collected.
  3. These designs provide flexibility in trial implementation in cases of hard-to-recruit patient populations or rare disease.
  4. They allow for gatekeeping of investment for biotech companies that may undergo additional scrutiny to justify additional R&D spend.
  5. They limit the pursuit of relatively small treatment effects that may not be clinically meaningful.

 

The CHW and CDL statistical methods for SSR

Following the seminal work on adaptive interim analysis by Bauer and Köhne (1994) and others, Cui, Hung, and Wang (1999) proposed a method, now known as CHW, that combines stage-wise statistics with pre-specified weights to preserve Type I error; it is today widely accepted in the field of biostatistics. A second method, proposed by Chen, DeMets, and Lan (2004) and known as CDL, provides an alternative to the weighted statistic in a confirmatory two-arm, two-stage design where the sample size of the second stage is increased based on an unblinded analysis of the data at the first stage.

Both CHW and CDL are accepted by regulatory bodies such as the FDA in cases where such an adaptation is deemed appropriate. The CHW method weights each stage’s contribution according to the originally planned, rather than the realized, sample sizes, while the CDL method permits the use of conventional statistics for testing the primary endpoint at the end of the study while still preserving Type I error.
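Here is a minimal R sketch of the CHW combination: the stage weights are fixed from the originally planned sample sizes, so the combined statistic remains standard normal under the null even if the second-stage sample size is increased at the interim. The numeric inputs are illustrative:

```r
# Sketch of the CHW weighted statistic for a two-stage design.
# Weights come from the ORIGINAL plan (n1 of N planned) and stay fixed
# even if stage 2 is enlarged after the interim look.
chw_statistic <- function(z1, z2, n1_planned, n_total_planned) {
  t  <- n1_planned / n_total_planned  # planned information fraction
  w1 <- sqrt(t)
  w2 <- sqrt(1 - t)
  w1 * z1 + w2 * z2                   # ~ N(0, 1) under the null
}

# Illustrative numbers: interim z of 1.1, incremental stage-2 z of 2.0,
# with 100 of 200 patients originally planned for stage 1
z_chw <- chw_statistic(z1 = 1.1, z2 = 2.0, n1_planned = 100, n_total_planned = 200)
cat(sprintf("Z_CHW = %.3f (reject at one-sided 2.5%% if > %.3f)\n",
            z_chw, qnorm(0.975)))
```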

 

Population enrichment

Population enrichment is a clinical trial design adaptation that uses data from an ongoing trial to adjust the sample size of the entire study population or of a promising subpopulation defined by a specific biomarker or other characteristics. At the outset, the overall trial population is enrolled, regardless of biomarker status or other subgroup attributes. At an interim analysis, a decision can be made to continue enrolling the overall population, restrict enrollment to a subgroup that is showing promise, or terminate the entire study for futility. Restricting enrollment to a specific subgroup enriches the data collected for that subpopulation.

There are several benefits for this adaptation, including:

  • Optimizing resource allocation by enriching promising subpopulations while avoiding continued investment in less successful ones.
  • Allowing investigators to examine a larger population while reducing the risk of trial failure or unnecessary drug exposure due to heterogeneity among the study’s subpopulations.
  • Increasing the probability of success of the study by increasing the sample size of promising subgroups.

 

How to model SSR and population enrichment

The CHW and CDL methods for sample size re-estimation, as well as population enrichment, are adaptations that can be modeled using Cytel’s East Horizon™ platform. Find out more by booking a product demonstration.

 

Final takeaways

Sample size re-estimation and population enrichment approaches are powerful adaptations in the biostatistician’s toolbox for advanced, cost-effective, and ethical clinical trial design. They empower sponsors to allocate R&D resources more appropriately towards promising treatments, while limiting exposure of patients to potentially ineffective or harmful treatments.

Metrics to Assess Clinical Trial Design Strength

The probability of success of a study is a critical metric in assessing the viability of a study design. In simple trial designs, probability of success can be defined as the study’s statistical power. However, more nuanced definitions of success are available, including some that incorporate assumptions of multiplicity (multiple study arms or multiple endpoints) and uncertainties about the true underlying treatment effect. Assurance is one such Bayesian concept, which accounts for potential variability in treatment effect assumptions.

 

What is study power?

It is the conditional probability of rejecting the null hypothesis given an assumed treatment effect.

What is assurance?

One way of defining assurance is the expected power across different treatment effect assumptions. Assurance is especially useful when there is uncertainty around the treatment effect. Rather than calculate power based on a single assumption, we can calculate the power across a series of assumptions, and assurance is the average of the power across all scenarios.

In addition, if some information is available about the likelihood of each of these treatment effect scenarios prevailing, it can be incorporated in this calculation to produce a more realistic expectation of results. In this case, each treatment effect scenario is assigned a likelihood, and this likelihood is included in the assurance calculation. This process gives more weight to the scenarios that are more likely to be the true treatment effect.
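In symbols, with $\theta$ the true treatment effect and $\pi(\theta)$ the prior describing its uncertainty, assurance is the prior-weighted average of power; this is a standard formulation, with the discrete form applying when a finite set of scenarios and likelihoods is specified:

$$
\text{Assurance} \;=\; \int \mathrm{Power}(\theta)\,\pi(\theta)\,d\theta \;\approx\; \sum_{i=1}^{K} p_i\,\mathrm{Power}(\theta_i), \qquad \sum_{i=1}^{K} p_i = 1.
$$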

Cytel has incorporated both concepts, power and assurance, into its East Horizon™ platform. As well-established metrics, both are widely used in the trial design process. In addition, Cytel has developed two concepts related to power and assurance, the performance score and the robustness score, to enhance the design process, offer two additional metrics for assessment, and elevate the role of the statistician as a strategic thinker in clinical development practice. These two metrics are also embedded in Cytel’s software.

 

What is the performance score?

The performance score is a linear weighted function that allows the statistician to integrate strategic priorities (reduction in sample size, reduction in duration, and increase in probability of success) into the clinical trial design process. The statistician can assign weights representing the importance of each strategic priority in assessing a study design, and the resulting score provides an estimate of the relative desirability of a design based on these priorities. Thus, the performance score provides an additional metric by which to assess the viability of a study design.

The weighted function of these performance criteria can be written in the following general form.
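One plausible sketch, assuming each criterion is rescaled so that larger values are more desirable (the exact implementation in East Horizon may differ):

$$
S \;=\; w_{n}\,f_{n}(\text{sample size}) \;+\; w_{t}\,f_{t}(\text{duration}) \;+\; w_{p}\,f_{p}(\text{probability of success}),
$$

with user-chosen weights satisfying $w_{n} + w_{t} + w_{p} = 1$ and $f_{n}, f_{t}, f_{p}$ mapping each criterion onto a common scale.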

What is the robustness score?

Like the concept of assurance, the robustness score is the average of the performance scores across different treatment effect assumptions. If there is uncertainty about the treatment effect, the robustness score takes that potential variability into account.

Here too, if some prior information about the likelihood of treatment effect scenarios is available, it can be incorporated in the calculation of the robustness score to produce a more nuanced expression of robustness, based on the strategic priorities set by the drug development team. If no prior information is available, each scenario automatically gets assigned equal likelihood.
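Following directly from that description, with $S(\theta_i)$ the performance score under treatment effect scenario $\theta_i$ and $p_i$ its assigned likelihood (equal weights when no prior information is available):

$$
R \;=\; \sum_{i=1}^{K} p_i\, S(\theta_i), \qquad \sum_{i=1}^{K} p_i = 1.
$$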

These four concepts together, power, assurance, performance, and robustness scores, are key tools in the statistician’s toolbox for clinical trial design. Crucially, the latter two metrics are a shift in the statistician’s mindset from a purely scientific consideration of trial design, to incorporating strategic thinking in the design process. In addition, the scoring mechanism allows statisticians to identify a multitude of designs that satisfy basic statistical criteria and choose from among those the best-suited design based on strategic priorities. Finally, the score is a powerful communication tool, anchoring the statistician at the center of the discussion about tradeoffs in priorities in the design selection process with a wider cross-functional team.

In this way, the scoring mechanism embedded in East Horizon has transformed the software from a purely statistical design and analysis tool (albeit nuanced and powerful) to a clinical design strategy tool with a solid and potent statistical core. As more life sciences organizations adopt East Horizon and the advanced design tools it affords, we are seeing a gradual shift in the function of the statistician’s role within these organizations to a more central and consultative role in the trial design process.

Addressing Uncertainty in Survival Studies

As we have highlighted in prior blog posts, the ability to augment design characteristics with custom R code is especially relevant to the ever-evolving therapeutic area of oncology. As regulatory guidelines are routinely adjusted to comply with clinical practice and current research, oncology study simulations often require specific analysis approaches and/or patient outcome data generation methods to conform to changing evidence thresholds and to create more realistic simulated scenarios.

 

Defining parameters and addressing uncertainty in survival studies

As in all clinical studies, there is a degree of uncertainty in assessing the treatment effect in trials employing a survival endpoint. For these types of studies, the timing of a patient’s event is typically sampled from a distribution with known parameters such as an exponential distribution with a median time value for each arm in the trial. The assumptions employed in defining these parameters are based on some prior knowledge derived from previous studies, meta-analyses, or other experience of the clinical development team.

 

Why does this matter?

When prior data is scarce, both the assumed distributions and median values are highly uncertain, and may lead to trials that are more costly, longer in duration, and/or with a diminished probability of success. It is therefore important for product development teams to derive meaningful values for these inputs in the design stage of clinical studies.

 

Custom R coding for oncology designs

One approach to derisking such trials is to simulate patient data based on a distribution of possible median time values for each arm rather than one single value. This accounts for the fact that the true value is difficult to estimate before the trial begins and removes the need to select just one value. This approach also provides confidence in additional investment based on more realistic assumptions.

To employ this design approach, we propose using flexible R code in conjunction with Cytel’s East Horizon™ platform to customize the way in which data for each simulated patient are generated. We propose modifying the response generation algorithm to consider a distribution of true treatment effects rather than a single value assumption. The probability of success becomes more conservative but also more informative, as the simulation is more representative of the trial about to take place. This gives the product development team more confidence in trial execution and a better estimate of trial cost and duration.
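A minimal R sketch of this idea follows, assuming lognormal priors on the true median survival in each arm and exponential event times; all numeric inputs are illustrative, and censoring is omitted for brevity:

```r
# Sketch: draw the "true" median survival for each simulated trial from a
# prior distribution instead of fixing a single value. Inputs are illustrative.
set.seed(123)
n_trials  <- 1000
n_per_arm <- 150

simulate_trial <- function() {
  # Uncertainty about the true medians (months), expressed as lognormal priors
  med_ctrl <- rlnorm(1, meanlog = log(10), sdlog = 0.15)
  med_trt  <- rlnorm(1, meanlog = log(14), sdlog = 0.25)
  time <- c(rexp(n_per_arm, log(2) / med_ctrl),
            rexp(n_per_arm, log(2) / med_trt))
  arm  <- rep(c("control", "treatment"), each = n_per_arm)
  # Log-rank test on fully observed event times (no censoring, for brevity)
  fit <- survival::survdiff(survival::Surv(time) ~ arm)
  (1 - pchisq(fit$chisq, df = 1)) < 0.05
}

wins <- replicate(n_trials, simulate_trial())
cat(sprintf("Probability of success across the prior: %.2f\n", mean(wins)))
```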

 

Want to learn more?

Watch J. Kyle Wathen and Valeria Mazzanti’s webinar “A Closer Look at Assurance: Sampling Patient Outcomes from Prior Distributions to Account for Uncertainty in Response Scenarios”:

Harnessing AI-Powered Tools for Clinical Trial Design Coding

The global move towards AI-powered tools is sweeping across the life sciences industry. In particular, the roles biostatisticians play in both clinical trial design and programming lend themselves to AI-based innovations.

Earlier this month, Cytel launched its first AI-driven solution for clinical trial design code generation and joined the artificial intelligence revolution. This innovation is predicated on several years of research and development, coupled with the recent maturing of AI-focused service providers. The solution is designed for optimal functioning within the East Horizon platform and intended to enhance R integration functionalities that are now embedded within our software.

 

What makes Cytel’s AI-powered R coding assistant unique?

The coding assistant generates R code with required parameters for East Horizon. Unlike generic AI-based coding tools that generate standalone R scripts, this solution ensures the generated code includes function templates, expected arguments, and input variable names; is structured for direct integration into East Horizon’s simulation engines; and is aligned with industry best practices for regulatory-compliant clinical trials. In addition, the coding assistant is purpose-built for biostatistics and clinical trial design.

General-purpose AI tools do not innately relate to adaptive trial designs, survival analyses, or clinical trial randomization. Cytel’s AI-powered R coding assistant allows biostatisticians to generate custom statistical tests beyond software-native options; perform advanced patient data modeling such as Quasi-Poisson, longitudinal outcomes, etc.; and apply alternative randomization and dropout modeling methods.

Finally, the coding assistant is embedded within an industry-standard solution for trial design. The solution ensures compatibility with East Horizon’s statistical engine, generating code that is formatted correctly and validation-ready.

 

How does the solution work?

Users interested in augmenting their trial design simulation work can select the R integration features within the software and gain access to the coding assistant. Users then enter prompts in natural language to elicit a response. The user can review the response, iterate and refine with additional queries, and modify the code to fit the task at hand. Once refined, the code can be employed in simulation runs for additional validation and debugging.

 

Why does this matter?

The AI-powered R coding assistant in East Horizon enables biostatisticians to generate complex R code instantly; customize trial simulations with precise statistical methods; and reduce manual coding errors and speed up model validation.

 

Custom R coding for oncology designs

The ability to augment design characteristics with custom R code is especially relevant to the ever-evolving therapeutic area of oncology. As regulatory guidelines are routinely adjusted to comply with clinical practice and current research, oncology studies often require specific analysis approaches and/or patient outcome data generation methods to conform to changing evidence thresholds. For example, the testing method and analysis type chosen for a specific design can be highly sensitive to the underlying distribution of the data. Therefore, simulating designs with a variety of analysis types can help design studies that are robust to a variety of possible data distributions.

With this in mind, using commercial software to generate patient outcome data through simulation takes full advantage of the software’s native workflows and computing power. These data are then analyzed against a variety of analysis types using R code augmentation. This approach to analysis variation also lends itself to advanced Bayesian tests, affording biostatisticians maximum flexibility.

 

Want to learn more?

In our recent webinar, “Evaluating Different Analysis Options for Your Oncology Study Design by Combining East Horizon and R,” J. Kyle Wathen and Valeria Mazzanti discuss clinical trial design using a combination of R coding and Cytel’s proprietary statistical software, with a focus on analysis testing variations:

Driving Innovation in Clinical Trial Design: Open Source, Commercial Software, and AI in 2025

As we usher in a new year, we reflect on 2024’s prominent trends in simulation software for clinical trial design that will continue to drive innovation in the coming year. The two main areas of growth and innovation we see taking the lead in 2025 are:

  1. The combination of open source with commercial software solutions
  2. The increasing use of AI to generate open-source code and augment clinical trial design

 

Commercial software: Confident and quick design capabilities

Commercial software remains a common and popular choice for clinical trial design, with many sponsors opting for these tools. This choice allows for confident and quick design through validated workflows and pre-coded and verified design types. As an accepted choice with a wealth of trial design options, biostatisticians can easily and quickly design and compare a variety of trials. Furthermore, users enjoy access to expert professional support in addition to frequent software releases that ensure updates to methodologies and design types.

 

Open-source code offers a high degree of flexibility

Although commercial software provides numerous benefits to biostatisticians, there are also drawbacks to this choice in isolation. In a complex scientific field, biostatisticians often encounter idiosyncratic problems that require unique and custom solutions. In these cases, validated commercial software may prove insufficient, and custom code must be developed to address the problem at hand. In fact, this need for flexibility is at the root of the rise of open-source software for custom coding using industry-accepted languages like R, Python, or Julia. These languages afford biostatisticians a degree of creativity in their work and go hand-in-hand with the collaborative nature of this highly academic field. Over the years, many code packages have been developed and shared as solutions to unique design aspects, helping to drive and shape industry trends.

However, with this near-limitless flexibility come several drawbacks. Vetting or developing a bespoke solution can be complicated and resource-intensive. Time is required for collecting requirements, writing code, testing, and validating a custom open-source design option. This approach relies on expertise in both software development and statistical methodology. While biostatisticians have deep knowledge and experience in statistics and clinical trial design, they are not typically trained in best practices for software development and programming. These best practices are crucial in developing reliable, robust solutions that can easily be shared with others and that apply to a wide array of trials. Finally, the results derived from open-source code require additional resources for both design selection and communication of results in the context of a multidisciplinary team. The biostatistician’s attention is thus diverted from providing valuable strategic input to the clinical development team towards software development and implementation tasks.

 

Combining open-source code with commercial software

Acknowledging these challenges, the industry is quickly adopting a combined-capabilities approach that incorporates the established, validated backbone of commercial software with the added creativity afforded by open-source code. This approach allows biostatisticians to augment elements of the design such as the choice of analysis type, statistical test, or the distributions used to generate various design inputs, without the need to code an entire design. In addition, clinical trial design professionals benefit from the cloud computing power embedded in some commercial software solutions, eliminating the need for maintaining an expensive internal computational grid. We believe that this integrated future of study design harnesses the benefits of both commercial software and open-source solutions while limiting the drawbacks experienced with each approach individually.

 

The use of artificial intelligence in generating code for clinical trial design

Along with the intensive use of R and other coding languages, we believe that we will see increased interest in using AI solutions for a variety of clinical trial design and execution activities. These applications of AI may include data transformation and cleaning; statistical analysis; protocol writing; clinical data reporting; trial management practice; and efficient code generation and validation for clinical trial design. For the latter, AI solutions powered by Large Language Models (LLMs) can be harnessed to produce analysis-ready custom code based on project specifications. Indeed, over the past few months, Cytel has introduced an AI-driven coding assistant in its newest clinical trial design software to augment study designs with novel approaches via custom code. This approach holds several advantages, among them: the ability to generate code faster; the potential for efficient code validation and editing; and the ability to generate code using natural language prompts.

With the great promise that such tools hold, there are also potential drawbacks and concerns expressed by biostatisticians working in the field. AI-supported code generation requires close review by trained coders to ensure the code created using these tools is sound and applicable to the purpose for which it was created. While code generated by AI can save considerable resources, it requires close supervision and review for validation and application in practice. Over-reliance on code-generation tools may, over time, change the way in which statisticians think through complex coding problems, and limit creativity in this field.

 

Final takeaways

The landscape of clinical trial design is poised for significant advancements in 2025, driven by the integration of commercial software and open-source solutions, as well as the innovative application of AI for code generation. By leveraging the strengths of commercial software — validated workflows, expert support, and computational power — and combining them with the flexibility and creativity of open-source coding, biostatisticians can overcome traditional challenges and design trials more efficiently. Furthermore, AI-powered tools promise to streamline the generation, validation, and customization of code, empowering teams to focus on strategic decision-making and innovation. These trends signal a promising era of collaboration, efficiency, and enhanced capabilities in clinical trial design.

 


Cytel’s East Horizon Platform now includes open-source integration points, allowing users to inject custom analysis types, statistical tests, and patient outcome generation into existing software workflows. In addition, the software includes an advanced AI-driven coding assistant that can generate compatible custom R code using plain language queries for integration in study designs. These new features, in combination with Cytel’s advanced trial simulation tools and cloud computing capabilities, offer a potent, comprehensive solution for clinical trial design and optimization.

Optimizing Interim Looks in Group Sequential Adaptive Study Designs

What are group sequential study designs?

Group sequential study designs include predetermined interim analyses (interim looks) in an ongoing clinical trial, allowing researchers to stop the trial earlier than the planned final analysis due to overwhelming evidence of success (efficacy) or failure (futility), or due to safety concerns that arise from accumulating study data. Special consideration must be given to the preservation of Type I error with the implementation of such interim looks, and several approaches have been developed over the years to control Type I error, including those by Stuart Pocock, Peter O’Brien, and Thomas Fleming.

 

What are key considerations of group sequential designs?

There are several advantages to incorporating one or more interim looks in a study design, including the potential for more limited patient exposure, more efficient use of resources, time savings, and increased probability of success. Study design teams must weigh these considerations and agree on their strategic priorities before implementing group sequential design features. Specific points for consideration include the number and timing of interim analyses, and the stopping rules or thresholds used to declare early efficacy or futility.

 

Interim look timing

The timing of an interim look can be critical to the success of the group sequential approach. Performing the analysis too early may mean not enough information is available to make an informed decision; too late, and the benefits of the approach diminish significantly. Running extensive simulations across a variety of potential analysis time points can prove beneficial in selecting the optimal timeframe, balancing the team’s strategic priorities. Adding more than one interim look may be the preferred approach, with an early look allowing stopping for futility only and a later look, or looks, focused on gains from early efficacy stopping (see Schematic 1 below).

 

Schematic 1: A study with two interim looks: An early futility and later efficacy assessment

 

Early stopping rules

Setting the correct stopping rules for early efficacy and/or futility is also paramount in designing a robust clinical trial. If an early stopping threshold for futility is set incorrectly, it can lead to the termination of a promising treatment based on limited data. Conversely, setting a stopping rule for efficacy that is too aggressive may lead to premature trial termination with inaccurate results. Here too, extensive simulation of trials with a variety of stopping rules for both efficacy and futility can help optimize these thresholds and the potential savings from these trial designs.
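As a sketch of how such boundaries behave, the open-source gsDesign R package can compute efficacy boundaries under O’Brien-Fleming-type and Pocock-type rules; the design settings below (one interim look, one-sided 2.5% alpha, 90% power) are illustrative:

```r
# Sketch: compare efficacy boundaries under O'Brien-Fleming-type and
# Pocock-type rules for a design with one interim look at 50% information.
library(gsDesign)

of  <- gsDesign(k = 2, test.type = 1, alpha = 0.025, beta = 0.1, sfu = "OF")
poc <- gsDesign(k = 2, test.type = 1, alpha = 0.025, beta = 0.1, sfu = "Pocock")

# O'Brien-Fleming spends little alpha early (a strict interim bar, with a
# near-nominal final boundary); Pocock uses a flatter bar across both looks
# and pays for the early spending at the final analysis.
round(of$upper$bound, 3)
round(poc$upper$bound, 3)
```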

 

Schematic 2: Stopping boundaries for efficacy and futility: An interim look at 50% information fraction

 

A closer look at the benefits of implementing group sequential designs

Group sequential designs offer several key benefits in clinical trial practice:

  • Design trials that are more ethical: accurate decision rules for early stopping, whether for futility or efficacy, can reduce the number of patients required for enrollment and reduce unnecessary exposure of patients to potentially ineffective or harmful treatments.
  • Design trials with more efficient resource use: including interim looks in a study can lead to savings in both the duration and cost of clinical trials. Adaptive designs with interim analyses are shorter in overall average duration and lower in average cost when compared to similar fixed study designs with no interim analyses. These savings are gained through the thoughtful implementation of early stopping rules.
  • Design trials with a higher probability of success: adaptive designs with interim analyses demonstrate a higher average probability of success compared to fixed study designs. This benefit is especially pronounced when the true underlying treatment effect is clear at an early study stage (either beneficial or inefficacious).

 

Overall, interim analyses are an important feature in adaptive clinical trial design, and when well planned and executed, can lead to benefits and savings in clinical trial execution.

 

Group sequential designs now available in the East Horizon™ platform

Cytel’s East Horizon platform now includes a Group Sequential module. This module offers statisticians the ability to compute and simulate single-arm and two-arm study designs with interim looks. The module allows users to select and optimize the number and timing of interim looks and the boundaries for efficacy and futility through advanced simulation and analysis tools.

Cytel’s East Horizon Group Sequential Module is the second in a series of six revamped cornerstone components of Cytel’s new cloud-based trial design platform. In combination with other platform components, the module provides statisticians with the tools needed for design, optimization, and selection of adaptive clinical trials with interim analyses.