
Insights from WEPA Amsterdam: When Policy Pressure Meets AI Maturity

The World EPA Congress in Amsterdam did not feel like a conference about isolated trends. It felt like a conference about structural transition.

Across sessions and conversations, one consistent narrative emerged: market access is being reshaped simultaneously by tightening policy frameworks and by the operational maturation of artificial intelligence. These are not parallel stories unfolding independently. They are interacting forces that together are redefining how evidence is generated, how value is assessed, and how global pricing strategies are constructed.

The underlying question throughout WEPA was not whether change is coming. It was whether organizations are structurally prepared to manage both forces at once.

 

1. A policy environment under structural redesign

Joint Clinical Assessment: Harmonization meets operational reality

The first year of Joint Clinical Assessment (JCA) implementation under the EU HTA Regulation represents a historic step toward harmonization of clinical evaluations across Europe. In principle, a single European-level clinical assessment promises efficiency, reduced duplication, and greater consistency in evaluating comparative effectiveness.

Yet the operational reality is more complex. Harmonization does not automatically mean simplification.

Early experience indicates that alignment between EU-level assessments and national reimbursement processes remains incomplete. Questions persist around how Member States will operationalize JCA outputs, how quickly EU HTAR assessors can deliver assessments, and whether national HTA bodies are fully prepared to transition to reliance on joint evaluations.

Methodological challenges are also emerging. PICO multiplicity, expanded evidence requirements, and the risk of unexpected analytical requests are increasing the burden on evidence generation teams, especially for products targeting rare diseases. While duplication of assessments may decrease, the sophistication and coordination required to navigate the system are increasing.

JCA is a milestone in European collaboration. But its success will depend on tighter synchronization between EU-level clinical conclusions and national pricing and reimbursement realities.

 

Real-world evidence: From complementary input to strategic pillar

Alongside JCA, the role of real-world evidence (RWE) is evolving rapidly. Regulators, payers, and clinicians increasingly seek insight into how therapies perform in routine clinical practice across diverse populations. The European Medicines Agency has clearly signaled its ambition to place patient voice and real-world data at the center of regulatory evaluation.

RWE is no longer supplementary. It is becoming central.

However, tension remains within the EU HTAR context. JCA assessments emphasize statistical precision and internal validity, while real-world evidence reflects the inherent heterogeneity of clinical practice. Methodological expectations between regulatory and HTA frameworks are not yet fully synchronized.

Europe now faces a strategic choice: either build robust, interoperable infrastructures for high-quality real-world data sharing across Member States, or risk creating friction between regulatory innovation and HTA conservatism. The credibility of future evidence strategies will depend on resolving this gap.

 

MFN pricing: Global interdependence redefines strategy

At the global level, Most-Favored-Nation (MFN) pricing dynamics are reshaping launch and market access strategies beyond the United States. Pricing has become an interconnected global system rather than a sequence of independent national decisions.

Launch sequencing is being reassessed as companies evaluate exposure to international reference pricing and MFN-linked rules. Markets are increasingly categorized by strategic risk, and cross-market interdependence is intensifying. Decisions taken in one jurisdiction reverberate across others.

Europe, despite its strong regulatory institutions, faces pressure due to fragmented access pathways, evolving JCA processes, and uncertainty in national budget negotiations. The traditional logic of “where to launch first” has become a far more complex strategic equation.

Taken together, JCA implementation, the rise of RWE, and MFN pricing pressures are increasing analytical complexity, accelerating timelines, and demanding greater coordination across functions and geographies. This rising structural pressure forms the backdrop to the second defining theme of WEPA.

 

2. AI moves from experimentation to operating model

From hype to governance

If policy discussions reflected systemic pressure, AI discussions reflected systemic adaptation.

The tone around artificial intelligence at WEPA 2026 was notably mature. The conversation quickly moved beyond questioning whether AI is hype. The focus shifted toward responsible operationalization, governance, and measurable value creation within regulated environments.

The key issue is no longer adoption. It is integration.

Organizations are developing governance frameworks, embedding AI into regulated workflows, and ensuring traceability and auditability of outputs. The emphasis is on scale and accountability rather than isolated experimentation.

 

AI as infrastructure in market access

Across sessions, AI was framed not as a productivity enhancement tool but as part of the operating model of modern market access organizations.

Companies are redesigning processes around AI-enabled capabilities. Evidence synthesis, systematic literature reviews, indirect treatment comparisons, dossier drafting, pricing simulations, and tender strategy development are increasingly supported by automated or semi-automated systems.

This represents a structural shift. AI is moving from peripheral pilot projects to enterprise-level infrastructure embedded within core functions.

In an environment where JCA increases analytical burden and MFN pricing demands multi-country scenario modeling, such capabilities are becoming operationally essential rather than optional.

 

From assistant to strategic copilot

One of the most forward-looking discussions centered on the evolution of AI from drafting assistant to strategic copilot.

The emergence of agentic AI and orchestration systems is enabling decision support in areas such as pricing negotiation, tender simulations, and contracting strategy optimization. Rather than merely accelerating document preparation, AI is beginning to inform strategic decision-making.

However, in highly regulated settings such as HTA and pricing negotiations, transparency and explainability remain non-negotiable. The credibility of AI-driven insights depends on robust governance and clear traceability.

The opportunity is substantial — speed, standardization, and efficiency. The responsibility is equally significant.

 

3. The convergence: Complexity requires capability

The most important insight from WEPA Amsterdam lies not in policy alone, nor in AI alone, but in their convergence.

Policy reforms are increasing complexity. JCA raises expectations for comparative evidence coordination across Europe. Real-world evidence demands stronger data ecosystems. MFN pricing intensifies global interdependence and strategic sensitivity.

At the same time, AI provides the analytical and operational capabilities necessary to manage this complexity. It enables faster synthesis of comparative data, structured analysis of heterogeneous real-world evidence, and dynamic cross-market pricing simulations.

In this sense, policy pressure and AI capability are two sides of the same transformation. The former raises the bar; the latter provides the tools to reach it.

The defining question for market access organizations is whether they can redesign their operating models quickly enough to integrate policy intelligence, evidence generation, pricing foresight, and AI-enabled execution into a coherent system.

WEPA 2026 signaled that the era of treating these dynamics as separate conversations is over. Market access is entering a phase where structural policy reform and technological capability must be managed together.

Those who integrate both dimensions — responsibly, transparently, and strategically — will shape the future of evidence-based access in Europe and beyond.

Central Statistical Monitoring: Transforming Clinical Trial Oversight Through Data Intelligence

As clinical trials grow in complexity — spanning more geographies, more data streams, and more endpoints — the traditional model of on-site monitoring alone is no longer sufficient to ensure data quality and patient safety. Regulatory expectations have evolved, trial budgets are under pressure, and sponsors need earlier, more objective insights into emerging risks.

Central Statistical Monitoring (CSM) sits at the intersection of these demands.

At Cytel, we see first-hand how sponsors are rethinking monitoring strategies to be more risk-based, data-driven, and efficient. Here, we introduce the foundations of CSM, how it supports Risk-Based Quality Management (RBQM), and why it has become a critical component of modern trial oversight.

 

What is Central Statistical Monitoring?

Central Statistical Monitoring can be defined as the statistical detection of anomalies in accumulating clinical trial data to identify sites, patients, or countries that are performing differently from the rest. These differences may signal issues related to data quality, site conduct, or even patient safety.

The origins of CSM can be traced to early work on fraud detection in clinical trials. Fraud, however, is rare and represents only a small part of the picture. In practice, most CSM findings relate to more common and impactful issues such as errors, sloppiness, or data-handling inconsistencies.

The key principle is straightforward: when most sites are performing consistently, statistically unusual patterns may indicate that something warrants a closer look.

Rather than relying solely on Source Data Verification (SDV) or manual review, CSM uses statistical techniques to evaluate patterns within and across sites — often detecting issues that traditional monitoring approaches would miss.

 

Beyond KRIs and QTLs: What makes CSM different?

Central Monitoring typically includes three types of analyses:

• Key Risk Indicators (KRIs): site-level metrics such as adverse event rates or protocol deviations
• Quality Tolerance Limits (QTLs): study-level thresholds for critical KRIs
• Central Statistical Monitoring (CSM): advanced anomaly detection across high-volume data

While KRIs and QTLs focus on predefined metrics, CSM goes further by applying broad statistical tests across many variables — often using unsupervised approaches that are now considered the industry gold standard.

These methods may involve single-variable comparisons (such as means, variability, proportions, rates, digit distributions) as well as multivariate techniques that evaluate patterns across multiple variables simultaneously. The result is a structured framework for identifying outliers in a reproducible, objective way.
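
To make the single-variable idea concrete, here is a deliberately simple sketch that flags sites whose mean for one variable deviates markedly from the pooled mean of all other sites. It is a toy illustration, not any specific CSM product's methodology; the variable, threshold, and data are all assumptions.

```python
import numpy as np
import pandas as pd

def flag_outlier_sites(df, value_col, site_col="site", z_threshold=3.0):
    """Leave-one-site-out check: compare each site's mean for value_col
    against the distribution of the same variable at all other sites."""
    rows = []
    for site, site_df in df.groupby(site_col):
        others = df.loc[df[site_col] != site, value_col]
        # Standard error of this site's mean if it behaved like the other sites
        se = others.std(ddof=1) / np.sqrt(len(site_df))
        z = (site_df[value_col].mean() - others.mean()) / se
        rows.append({"site": site, "n": len(site_df), "z": z})
    out = pd.DataFrame(rows)
    out["flagged"] = out["z"].abs() > z_threshold
    return out.sort_values("z", key=np.abs, ascending=False)

# Illustrative data: systolic blood pressure at 20 sites, one recording low values
rng = np.random.default_rng(7)
demo = pd.DataFrame({
    "site": np.repeat([f"S{i:02d}" for i in range(20)], 30),
    "sbp": rng.normal(130, 15, 600),
})
demo.loc[demo["site"] == "S03", "sbp"] -= 12
print(flag_outlier_sites(demo, "sbp").head())
```

Real CSM systems run many such tests, across many variables and with multivariate extensions, but the logic is the same: quantify how unusual each site looks relative to its peers.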

 

Why does CSM matter now?

Over the past two decades, regulatory authorities have progressively endorsed risk-based and centralized monitoring approaches. FDA, EMA, and MHRA guidance have emphasized the importance of risk-based monitoring, culminating in ICH E6(R2) and most recently ICH E6(R3), which reinforce the role of centralized monitoring in identifying systemic and site-specific issues.

This regulatory evolution reflects a broader shift toward:

• Quality by Design (QbD)
• Identification of critical-to-quality factors
• Ongoing risk assessment
• Adaptive monitoring strategies

Within a Risk-Based Monitoring (RBM) framework, CSM complements KRIs and QTLs to provide a comprehensive view of trial risk. Insights from CSM can guide targeted on-site or remote monitoring, ensuring that resources are focused where they will have the greatest impact.

This approach aligns closely with the Clinical Trials Transformation Initiative’s definition of quality in clinical trials as the “absence of errors that matter to decision making — that is, errors which have a meaningful impact on the safety of trial participants or the credibility of the results.” By identifying anomalies early — before they escalate into systemic issues — CSM helps safeguard critical-to-quality factors.

For sponsors, the benefits are multifaceted:

• More efficient allocation of monitoring resources
• Potential reduction in unnecessary SDV
• Earlier detection of emerging risks
• Increased confidence in data integrity prior to regulatory submission

In short, CSM transforms monitoring from a predominantly reactive activity into a proactive, data-driven strategy.

 

Putting CSM into practice: Operational considerations for successful implementation

Understanding the statistical foundations of CSM is important — but translating that understanding into a well-functioning program requires deliberate operational planning. The following considerations provide a practical framework for teams preparing to implement CSM within a clinical trial.

 

Upfront preparation and governance

A formal CSM kickoff meeting — convened before any analyses begin — is one of the most valuable investments a team can make. This meeting should bring together representatives from biostatistics, data management, clinical operations, medical monitoring, and quality. The goal is to establish shared alignment on the objectives and scope of the CSM program, agree on which critical-to-quality (CtQ) factors will anchor the monitoring strategy, define escalation pathways for signals requiring action, and confirm documentation standards. Equally important is reaching consensus on how CSM integrates within the broader RBQM framework — clarifying how statistical signals will interact with KRI outputs, SDV decisions, and site risk classifications. Without this governance foundation, even technically sound CSM outputs can struggle to gain traction in day-to-day operations.

 

Determining frequency of analyses

The frequency with which CSM analyses are generated should be proportionate to study risk and dynamics. Key factors to consider include the rate of enrollment, total subject count, number of active sites, and overall study duration. Trials with rapid, multi-site enrollment may benefit from more frequent reviews (e.g., bi-monthly) to catch emerging patterns before they compound. Slower-enrolling or smaller studies may reasonably support longer intervals between analyses without compromising oversight. Critically, frequency should not be treated as fixed. As study conditions evolve — sites activate or go on hold, enrollment accelerates, or a new safety signal emerges — the CSM schedule should be revisited. Building in flexibility from the outset ensures the program remains responsive rather than formulaic.

 

Communication and cross-functional review

CSM outputs are most actionable when they are presented in a structured, interpretable format — combining risk scores or site rankings with narrative interpretation that contextualizes what the statistics show and why it may matter. Findings should be reviewed collaboratively with the wider cross-functional team, including Clinical Operations and Clinical Science, whose site-level and medical knowledge is indispensable for determining whether a statistical outlier reflects a genuine quality concern or a legitimate difference. A statistical signal is a prompt for investigation, not a conclusion. The review process should follow a clear feedback loop: identify the signal, evaluate it in context, decide on a response (monitor, query, or escalate), and document the rationale. This structured approach ensures accountability and creates an audit trail that supports both ongoing oversight and regulatory inspection readiness.

Ultimately, CSM delivers the greatest value when it is embedded operationally — treated not as a standalone statistical exercise, but as a living input to risk-based decision-making by the clinical team. When governance, data prioritization, analysis cadence, and cross-functional communication are aligned from the outset, CSM becomes what it is designed to be: an early warning system that enables smarter, more targeted oversight in service of patient safety and data integrity.

 

Interested in learning more?

Join Charles Warne and William Baker for their upcoming webinar, “Advancing Trial Oversight with Central Statistical Monitoring” on April 8 at 9AM ET / 3PM CET.

Central Statistical Monitoring is a practical, regulatory-aligned tool that can materially strengthen trial oversight and quality management.

In our upcoming webinar, we will explore:

• What CSM entails
• When and how CSM adds value to clinical trials
• Operational considerations for implementing CSM services
• Case study examples of CSM in action

Whether you work in biometrics, clinical operations, quality, or regulatory affairs, this session will provide actionable insights into building a smarter, more adaptive monitoring strategy.

The Invisibility Machine of the Women’s Health Gap

A 300-year warning

The global timeline for gender equality is not merely stalling; it is a sobering indictment of our collective priorities as a society. Current estimates from the United Nations reveal a staggering distance to parity: at our current trajectory, it will take 300 years to end child marriage, 286 years to eliminate discriminatory laws and legal protection gaps, 140 years to achieve equal representation in workplace leadership, and 47 years to reach an equal footing in national parliaments.

These are not just social milestones; they are structural barriers that define the “Gender Health Gap.” This gap represents the inequitable, systematic differences in health outcomes between women and men — differences rooted in under-researched medical needs, chronic underfunding, and a “medical model” that has historically treated male biology as the universal baseline. To close this divide, we must recognize that health equity is a strategic imperative for global stability, health capital, and economic prosperity.

 

A ledger of health inequality: The data and the reasons behind the gender gap

Sex is a fundamental genetic modifier of biology, influencing everything from disease susceptibility to treatment response. Yet we remain trapped in a “health-survival paradox”: while women generally live longer than men, they endure higher burdens of morbidity and disability throughout their lives. Some examples are:

  • Diagnostic Delays: On average, women are diagnosed nearly four years later than men for the same diseases.
  • Misdiagnosis: Women are twice as likely to die following a heart attack as men, partly because they have a 50% higher chance of receiving an incorrect initial diagnosis.
  • AI Bias: Modern digital tools often entrench these disparities; AI-powered symptom checkers have been found to flag women experiencing heart attacks as needing psychological care rather than emergency medical intervention.
  • Invisible Conditions: Many women-specific conditions are severely underdiagnosed. For example, 8 in 10 women with menopause and 6 in 10 women with endometriosis remain undiagnosed. Adenomyosis affects up to 35% of women but is often invisible in medical records due to misdiagnosis as fibroids.

 

Some of the key reasons for the gender health gap relate to systematic underinvestment in research and innovation funding, and to the intersection of biology with social factors that have historically displaced women’s equal position in society.

A primary driver of the health gap is the systemic neglect of female biology in scientific research:

  • Underfunding: Only 5% of global research and development funding is allocated to female-related research. Of this, a mere 1% goes toward women-specific conditions like menopause and fertility.
  • Clinical Trial Underrepresentation: The inclusion of women in clinical research only became a requirement in the 1990s. Today, women make up only about 41.2% of participants in key disease clinical trials. In cardiovascular drug trials, female participation averages only 34%, often failing to match the actual disease prevalence in the population.
  • Adverse Drug Reactions: Because many drugs are tested primarily on men, women have a 34% increased risk of severe adverse events. A notable example is the sleep aid zolpidem, which stays in women’s systems longer than men’s; it took until 2013 for the FDA to require reduced dosing for women, after decades of increased emergency room visits.

 

The gap is shaped not only by the complex interplay of biological sex (genetics, hormones), but also by social gender (norms, roles) and by societal roadblocks such as the lack of female representation in leadership positions, which directly shapes inequalities in health policy development — not only for women, but for all marginalized communities.

 

Fact vs. fiction: Debunking women’s health misconceptions

Effective strategy requires dismantling the myths that have long perpetuated gender health inequality.

  • Women’s health is not synonymous with OB/GYN: Progress has been hindered by the misconception that women’s health is limited to reproductive and sexual needs. In reality, the gap spans every disease area, including neurology, immunology, and cardiovascular health, where women present with unique symptoms and risk profiles.
  • Longevity does not equal better health: The “morbidity burden” is a critical indicator of inequity. Women spend more years in poor health, facing higher disability-adjusted life year rates for musculoskeletal, neurological, and mental health disorders.
  • Inequality is not solely about race, but intersectionality is critical: While gender is a standalone driver of health outcomes, it does not exist in a vacuum. For example, Black and Native American women face the highest rates of pregnancy-related mortality, and Black women are three times more likely to die from heart failure than White women. These data points illustrate why an intersectional lens is non-negotiable for any health equity strategist.

Progress has remained largely stagnant over the last decade because women remain “invisible” in methodological and decision-making frameworks. The ICH Guidance on Technical Requirements for Pharmaceuticals for Human Use still refers to women as a “special subgroup” to be considered “when appropriate.” This classification is mathematically and medically absurd: women represent half of the global population. This invisibility fuels a self-perpetuating cycle of data poverty. The recent FDA guidance on addressing sex differences in clinical trials is, however, a positive step toward recognizing this impact in clinical development.

The roadblocks to reforming health technologies and decision-making frameworks to address women’s health needs and considerations are not just scientific — they are structural. They include a lack of political will, the absence of gender indicators for evaluation, and entrenched gender norms and laws that leave women under-protected on health matters and beyond.

 

Conclusion

Health equity does not need to take 300 years, though some of these glacial structural barriers must be addressed for true success to be achieved.

Big data, digital technologies, and advanced analytics provide the means to overcome the challenges to achieving women’s health equity in the coming years. Gender health equity is not an act of morality — it is the foundation of a sustainable, healthy, and economically stable future for all.

FDA’s New Default: One Pivotal Trial for Drug Approval

A Paradigm Shift Sparking Optimism and Questions

In February 2026, the U.S. Food and Drug Administration (FDA) announced a landmark policy change: one adequate and well‑controlled pivotal trial, supplemented by confirmatory evidence, will now serve as the default basis for drug approval. This decisive shift — articulated by FDA Commissioner Marty Makary and CBER director Vinay Prasad in The New England Journal of Medicine — effectively ends a decades‑long “two‑trial dogma” and reframes the evidentiary foundation of U.S. drug regulation.

“Going forward, the FDA’s default position is that one adequate and well-controlled study, combined with confirmatory evidence, will serve as the basis of marketing authorization of novel products. The FDA will carefully examine all aspects of study design with particular focus on controls, end points, effect size, and statistical protocols.”1

It is important to remember that it has always been possible to obtain a marketing authorization on the basis of a single adequate and well-controlled study in combination with confirmatory evidence, but this approach was typically applied in programs with breakthrough therapy designation, accelerated approval, or priority review.

 

Why the FDA is moving away from two trials

Makary and Prasad argue that requiring two trials made sense when biology was poorly understood and therapeutics were often blunt instruments rather than targeted molecular tools. In today’s world, duplicative trials may be unnecessarily costly, slow, and redundant.

The original argument for two clinical trials is a statistical one: if a substance has no efficacy, the chance of showing an effect in two studies is much lower than the chance of showing it in only one. The article quantifies this chance as 0.06% instead of 2.5%, assuming that each test is performed at the typically applied one-sided 2.5% level (a calculation that assumes the two studies are independent of each other, which is not necessarily the case).
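
Spelled out, under those stated assumptions (independent trials, each tested at a one-sided 2.5% level):

```latex
\begin{align*}
P(\text{one trial significant} \mid \text{no effect}) &= 0.025 = 2.5\% \\
P(\text{two trials significant} \mid \text{no effect}) &= 0.025^2 = 0.000625 \approx 0.06\%
\end{align*}
```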

The more important argument is that modern drug development provides much more clarity on a precise mechanism of action, assessed by biomarkers as well as a variety of endpoints, thus supplementing statistical inference with biologic inference.

They emphasize several points:

  • Modern science provides multiple layers of corroboration

Mechanistic data, class‑effect consistency, real‑world evidence, and surrogate endpoints can complement a single pivotal study.

  • Two trials don’t guarantee correctness

Even under the two‑trial regime, the FDA has approved drugs later found ineffective or unsafe — not because of too few trials, but because trial design quality matters more than quantity.

  • Lowering trial count may reduce costs and time

One pivotal trial can cost $30–150M and take years to complete. Reducing this burden may spur innovation and could reduce price‑justification arguments tied to Research & Development investment.

 

Focus on trial design and analysis

The article clearly articulates the importance of various aspects of trial design to support the credibility of trial results, including the use of a contemporary control group, pre-specification of a hypothesis, choice of a primary endpoint, statistical power, randomization, and blinding. These are key statistical aspects documented in the ICH E9 guidance on Statistical Principles for Clinical Trials, and as such they have underpinned drug development for almost 30 years. What is new and encouraging is that the article specifically states that these can also be provided by a Bayesian framework, referencing the recently published draft FDA guidance on this topic, as described by Cytel’s Savina Jaeger.

 

Unclear implications for global drug development

For most companies, drug development is a global business, and as such it’s unclear whether this change in FDA policy will affect the expectations from regulatory authorities in other regions and countries. Will they follow suit or maintain their current expectations? Cytel’s Strategic Consulting group will be monitoring this closely as this will have a fundamental impact on designing trials for global approvals.

It is also uncommon for the FDA to announce a major change in policy through a publication, so we will also monitor FDA’s official channels for further announcements on this topic in the future.

 

Final takeaway: A defining regulatory moment

The FDA’s new one‑trial default represents a significant policy shift in U.S. drug regulation. It aligns with trends in precision medicine, leverages mechanistic and statistical advances, and may unlock faster access and lower development burdens. Yet it also raises profound questions about evidence standards, risk tolerance, and the balance between speed and certainty. Most importantly, though, it reinforces the importance of solid statistical principles underlying credible drug development, with a clear statement that both Bayesian and frequentist approaches can provide them.

FDA’s Bayesian Guidance: Strategic Considerations for Sponsors

The FDA’s January 2026 draft guidance, “Use of Bayesian Methodology in Clinical Trials of Drug and Biological Products,” clarifies how the Agency expects sponsors to justify Bayesian approaches, especially when an informative prior borrows external information to support primary inference. As a draft guidance, it is nonbinding and not for implementation.

This blog highlights strategic considerations that should inform development planning, protocol/SAP design, and FDA engagement.

 

Type I error control is not the only path

The guidance notes that calibrating Bayesian success criteria to a Type I error rate “may not be applicable or appropriate” when borrowing external information. In those settings, sponsors may instead define success using posterior probability criteria (e.g., requiring Pr(d > a | data) > c for a treatment effect d, threshold a, and probability bar c) and, where appropriate, benefit-risk or decision-theoretic frameworks.

At the same time, the draft guidance also recognizes that Bayesian methods are often used within an overall frequentist framework (e.g., to facilitate complex adaptive designs), where Type I error calibration can remain appropriate. Regardless of the framework, success criteria should be pre-specified and justified.
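
As a minimal sketch of such a posterior probability criterion, assuming a conjugate normal model for a treatment-effect estimate (the numbers and names are illustrative, not from the guidance):

```python
from scipy.stats import norm

def posterior_prob_success(est, se, prior_mean, prior_sd, threshold=0.0):
    """Posterior Pr(effect > threshold) when est ~ N(effect, se^2) and the
    effect has a conjugate N(prior_mean, prior_sd^2) prior."""
    prior_prec, data_prec = 1 / prior_sd**2, 1 / se**2
    post_var = 1 / (prior_prec + data_prec)
    post_mean = post_var * (prior_prec * prior_mean + data_prec * est)
    return norm.sf(threshold, loc=post_mean, scale=post_var**0.5)

# A success rule of the form Pr(d > a | data) > c, with a = 0 and c = 0.975
p = posterior_prob_success(est=2.1, se=0.9, prior_mean=1.5, prior_sd=1.0)
print(f"Pr(effect > 0 | data) = {p:.4f} -> success: {p > 0.975}")
```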

 

Strategic implication:

When the FDA and sponsor agree that a design does not need to be calibrated to the Type I error rate (often discussed in pediatrics and rare diseases), the draft guidance describes alternative operating characteristics such as Bayesian power (probability of success averaged over a prior) and the probability of a correct decision (akin to positive predictive value). That flexibility increases the premium on a well-justified analysis prior, credible simulations, and early FDA alignment.
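
These alternative operating characteristics can be estimated by simulation. Below is a minimal Monte Carlo sketch using the same conjugate normal setup as above; the design prior, success rule, and all numbers are assumptions for illustration.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(42)
n_sims, se = 100_000, 0.9        # standard error of the single trial's estimate
mu0, sd0 = 1.5, 1.0              # N(mu0, sd0^2) used as both design and analysis prior

true_effects = rng.normal(mu0, sd0, n_sims)   # draw effects from the design prior
estimates = rng.normal(true_effects, se)      # one simulated trial result per draw

# Conjugate normal update, then the success rule Pr(effect > 0 | data) > 0.975
post_var = 1 / (1 / sd0**2 + 1 / se**2)
post_mean = post_var * (mu0 / sd0**2 + estimates / se**2)
success = norm.sf(0.0, loc=post_mean, scale=np.sqrt(post_var)) > 0.975

print(f"Bayesian power (prior-averaged): {success.mean():.3f}")
ppv = (true_effects[success] > 0).mean()      # akin to positive predictive value
print(f"Pr(effect truly > 0 | trial succeeds): {ppv:.3f}")
```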

 

Prior specification is now a regulatory deliverable

The draft guidance recommends that sponsors pre-specify and justify the prior in the protocol, document external information sources (including exclusions), and quantify prior influence metrics. For informative priors, the FDA emphasizes a systematic, transparent review of the totality of relevant evidence — effectively bringing evidence-synthesis discipline into prior construction.

 

Key expectations:

  • Pre-defined source selection criteria before searching for external data
  • Patient-level data preferred over published summary statistics
  • Randomized controlled evidence is generally preferred over single-arm or observational sources
  • Documentation of sources considered and excluded, with rationale

 

Strategic implication:

Prior construction cannot be a post-hoc exercise. Build the evidence base for your prior prospectively — ideally while Phase 2 is ongoing — and plan early for patient-level data access and any needed re-analyses to align the primary estimand, estimators, and strategies for handling intercurrent events. If patient-level data from prior studies are not accessible, negotiate data-sharing early or design natural history studies with Bayesian use in mind.

 

Dynamic discounting provides protection — with complexity

The draft guidance discusses both static and dynamic discounting approaches for borrowing external information. Dynamic approaches (e.g., commensurate/supervised power priors, mixture priors, Bayesian hierarchical models, elastic priors) can reduce borrowing when prior-data conflict emerges. These approaches can improve robustness but introduce additional parameters and assumptions that need justification. The FDA also notes the applicability of discounting methods is case-by-case and should be discussed with the Agency.

 

Strategic implication:

For rare diseases with uncertain external data relevance, dynamic discounting is often an important safeguard. For common diseases with robust and highly relevant prior data, simpler (static) discounting may suffice and can simplify the regulatory narrative. Either way, determine the discounting approach while still blinded to the results of the trials that will be borrowed — per the guidance’s explicit recommendation — and support the choice with simulations that span plausible degrees of prior data conflict.
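
To make the discounting mechanics concrete, here is a minimal sketch of a two-component robust mixture prior under a normal model: the posterior weight on the informative component falls automatically as the new trial's estimate conflicts with the external data. All values are illustrative, and the guidance's other methods (commensurate priors, hierarchical models, elastic priors) involve more structure.

```python
from scipy.stats import norm

def informative_weight_posterior(est, se, w, m_inf, s_inf, s_vag, m_vag=0.0):
    """Posterior weight on the informative component of the mixture prior
    w * N(m_inf, s_inf^2) + (1 - w) * N(m_vag, s_vag^2), given est ~ N(effect, se^2).
    Each component's weight updates via its prior-predictive density at est."""
    pred_inf = norm.pdf(est, loc=m_inf, scale=(s_inf**2 + se**2) ** 0.5)
    pred_vag = norm.pdf(est, loc=m_vag, scale=(s_vag**2 + se**2) ** 0.5)
    return w * pred_inf / (w * pred_inf + (1 - w) * pred_vag)

prior = dict(w=0.8, m_inf=1.5, s_inf=0.4, s_vag=10.0)  # 80% weight on external data
for est in [1.4, 0.5, -0.8]:  # agreement, mild conflict, strong conflict
    w_post = informative_weight_posterior(est, se=0.5, **prior)
    print(f"trial estimate {est:+.1f} -> weight on informative component: {w_post:.2f}")
```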

 

Effective sample size is a central metric — not Type I error inflation

The draft guidance recommends against using Type I error inflation to measure prior influence, calling it “philosophically inconsistent.” Instead, it highlights Effective Sample Size (ESS) and other metrics (e.g., the prior-only estimate) as more interpretable ways to quantify borrowing. The guidance also notes that multiple ESS calculation methods exist, and that ESS can exceed the source-study sample size when variability in the target population is higher.
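
As a simple illustration, for a normal prior on a mean, one common ESS heuristic is the ratio of the per-patient outcome variance to the prior variance (for a Beta(a, b) prior on a proportion, the analogous figure is a + b). The numbers below are assumptions, and other ESS definitions give different values.

```python
def normal_prior_ess(outcome_sd, prior_sd):
    """ESS of a N(m, prior_sd^2) prior on a mean, relative to data whose
    per-patient standard deviation is outcome_sd: the prior carries roughly
    as much information as this many patients."""
    return (outcome_sd / prior_sd) ** 2

# Outcome SD of 8; prior SD of 1 on the mean -> prior worth about 64 patients
print(normal_prior_ess(outcome_sd=8.0, prior_sd=1.0))  # 64.0
```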

 

Strategic implication:

Quantify and present ESS across a plausible range of outcomes, including summary statistics such as maximum and mean values. For dynamic methods, show how ESS changes with different degrees of prior-data agreement. Be prepared to explain why ESS may differ from the original study’s nominal sample size, and reassess influence after trial completion when dynamic priors are used.

 

Simulation standards are now explicit

The draft guidance recommends providing a comprehensive simulation report (including code, implementation details, and results) across pre-specified, plausible scenarios, including pessimistic assumptions about treatment effect. Simulations should address statistical parameters (e.g., variance, background rate, intercurrent events) as well as operational assumptions such as accrual rate. For MCMC-based analyses, computational settings (warmup/burn-in, iterations, chains, convergence diagnostics) and any other important algorithm-specific settings should be documented for reproducibility.

 

Strategic implication:

Treat simulations and computational reproducibility as submission-grade deliverables, not just internal design exploration. Establish reproducible computational workflows from the start. Pre-specify scenarios and decision rules, and define contingency procedures for implementation issues (e.g., MCMC non-convergence) before the first interim look and before the final analysis.
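
A skeletal example of the kind of scenario grid such a report might cover; the scenario values, decision rule, prior, and seed are all illustrative assumptions, and a submission-grade report would span many more parameters (variance, accrual, missingness) with full documentation.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2026)
n_sims, se = 50_000, 0.9
mu0, sd0 = 1.5, 1.0                              # analysis prior
post_var = 1 / (1 / sd0**2 + 1 / se**2)

def prob_success(true_effect):
    """Fraction of simulated trials meeting Pr(effect > 0 | data) > 0.975."""
    est = rng.normal(true_effect, se, n_sims)
    post_mean = post_var * (mu0 / sd0**2 + est / se**2)
    return (norm.sf(0.0, loc=post_mean, scale=np.sqrt(post_var)) > 0.975).mean()

# Pre-specified scenarios, including pessimistic assumptions about the effect;
# the "null" row plays the role of a Type I error check
for name, effect in [("null", 0.0), ("pessimistic", 0.5), ("target", 1.5)]:
    print(f"{name:>12} (effect = {effect}): Pr(success) = {prob_success(effect):.3f}")
```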

 

Early FDA engagement is essential

The draft guidance states that “the time needed for FDA and the sponsor to align on an appropriate prior should be considered in the development of the intended trial” and recommends submitting information “as early as possible to ensure sufficient time for FDA feedback prior to initiation.” The draft guidance also states that sponsors should have early discussions with the Agency about the planned estimands, estimators, and approaches for handling missing data in the analyses of external data that will be borrowed, and any differences relative to the approaches planned for the prospective trial data.

 

Strategic implication:

Use early interactions (e.g., Pre-IND or End-of-Phase 2 meetings and, where applicable, the Complex Innovative Trial Design (CID) program) to align on prior specification, success criteria, operating characteristics, and simulation strategy before protocol finalization. Include detailed design comparisons in meeting packages — the draft guidance explicitly recommends comparing proposed Bayesian designs against an alternative, including simpler alternatives.

 

Interim analyses: Design the decision points upfront

The guidance emphasizes that in trials with interim decision-making (e.g., group sequential designs), success criteria should be specified for each decision point. When Bayesian success criteria are calibrated to Type I error rate, interim criteria can be constructed to preserve overall control of the family-wise error rate across looks.

For designs not calibrated to Type I error rate, operating characteristics are calculated relative to the prior and can be especially sensitive when the sample size is small — or when an early interim look makes the effective sample size small. The guidance also notes that skeptical (or enthusiastic) priors can be used in adaptive settings to temper early stopping behavior for efficacy (or futility), but the resulting decision framework should be demonstrated via simulation.

 

Key interim analysis considerations:

  • Pre-specify what decisions can be made at each look (e.g., stop for efficacy, stop for futility, adapt) and the exact posterior or predictive criteria that trigger each action.
  • Simulate interim timing under realistic accrual, endpoint maturation, and missing data patterns — not just idealized information fractions.
  • Plan prior sensitivity and robustness checks targeted at early looks, where prior influence is greatest (e.g., alternative priors and alternative borrowing strengths).
  • Operationalize Bayesian computation for interim timelines: reproducible pipelines, diagnostic thresholds, locked code/versioning, and contingency plans for non-convergence.
  • Protect safety and benefit-risk interpretability: consider minimum exposure or follow-up requirements even if an early efficacy threshold is met.

 

Strategic implication:

Treat interim analyses as part of the regulatory-facing Bayesian package, with pre-specified decision rules, simulations that stress-test early looks, and an execution plan that can be reproduced under tight timelines.
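
To make the interim machinery concrete, here is a minimal Beta-Binomial sketch of a predictive-probability rule, one common Bayesian interim construction. It is an illustration under assumed numbers, not a design prescribed by the guidance.

```python
from scipy.stats import beta, betabinom

def predictive_prob_success(x, n_interim, n_total, a=1, b=1, p0=0.3, c=0.975):
    """Given x responders in n_interim patients, the probability that the final
    analysis of n_total patients will satisfy Pr(p > p0 | all data) > c,
    under a Beta(a, b) prior on the response rate p."""
    n_remaining = n_total - n_interim
    a_post, b_post = a + x, b + n_interim - x
    pp = 0.0
    for y in range(n_remaining + 1):  # possible numbers of future responders
        if beta.sf(p0, a_post + y, b_post + n_remaining - y) > c:
            pp += betabinom.pmf(y, n_remaining, a_post, b_post)
    return pp

# E.g., 14/30 responders at the interim of a 60-patient single-arm trial
print(f"Predictive probability of final success: {predictive_prob_success(14, 30, 60):.3f}")
```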

 

Rare vs. Common Disease Considerations

Justification for borrowing
  • Rare diseases: often straightforward; document the infeasibility of a conventionally powered randomized trial (small populations and/or ethics) and explain how borrowing supports interpretable benefit-risk.
  • Common diseases: higher burden; efficiency gains alone may not suffice, so clearly demonstrate relevance, address potential bias, and explain why non-borrowing alternatives are not adequate.

Prior data availability
  • Rare diseases: often limited; may rely on natural history studies/registries, small prior trials, and/or structured expert elicitation.
  • Common diseases: typically richer; Phase II/earlier indications, external trials, and real-world data may be available, but heterogeneity and relevance must be managed.

Recommended approach
  • Rare diseases: dynamic discounting and robust priors; success criteria not calibrated to Type I error may be appropriate when FDA and sponsor agree; plan extensive sensitivity analyses.
  • Common diseases: Bayesian methods embedded in a Type I error calibrated framework when appropriate; borrowing (if used) is typically limited and carefully justified; pediatric extrapolation is handled via a separate extrapolation plan.

Key success factor
  • Rare diseases: prospective natural history characterization and early alignment on estimand definition and strategies to make external data relevant.
  • Common diseases: early data-sharing to enable patient-level review; alignment on estimand definition, strategies, and covariate adjustment; plus a clear relevance narrative and drift/bias mitigation plan.

The bottom line

This draft guidance provides a clearer regulatory pathway for Bayesian methods, but that pathway requires substantial upfront investment in prior construction, estimand definition, strategies for handling intercurrent events, simulations, and submission-quality documentation. The strategic question is not whether Bayesian methods are acceptable in principle — it is whether the efficiency gains justify the additional complexity and review burden for your specific program.

For rare diseases, the answer is often yes. Bayesian borrowing may be the only viable path to interpretable and approvable evidence. For common diseases, the calculus is more nuanced; borrowing typically needs a stronger relevance argument and may be most defensible when embedded in a Type I error calibrated framework. Either way, the strategic decisions about prior specification, discounting method, and operating characteristics should be made early, documented thoroughly, and aligned with FDA before pivotal trial initiation.

What’s clear is that biostatisticians must now be prepared to operate in both paradigms:

  • To calibrate Bayesian designs to Type I error when appropriate, and
  • To construct and defend fully Bayesian alternatives (including borrowing) when circumstances warrant.

The January 2026 draft guidance does not eliminate the traditional framework; it expands the toolkit. Using that expanded toolkit effectively will require new skills, new conversations, and new ways of thinking about evidence.

The statistical methodology exists. FDA expectations are clearer. The challenge is execution.

 

Interested in learning more?

Cytel invites you to an interactive Office Hours session with Melissa Spann and Savina Jaeger on Wednesday, March 4 at 9 am ET, where you will have the opportunity to ask questions about the FDA’s Draft Guidance for Industry: Use of Bayesian Methodology in Clinical Trials of Drugs and Biological Products.

ELEVATE-GenAI: A New Guideline for Reporting Generative AI in HEOR Workflows

Generative artificial intelligence (AI), particularly large language models (LLMs), is increasingly embedded in health economics and outcomes research (HEOR) workflows. Researchers are now using these tools to support activities such as systematic literature reviews, health economic modeling, and real-world evidence generation.

As adoption grows, so does a fundamental question for the HEOR community:

How should the use of generative AI be transparently and consistently reported within HEOR workflows?

To address this question, the International Society for Pharmacoeconomics and Outcomes Research (ISPOR) Working Group on AI has developed ELEVATE-GenAI — a reporting guideline specifically designed to document and communicate how generative AI is used in HEOR research.

 

Why a dedicated reporting guideline is needed

HEOR has a strong tradition of structured reporting, supported by well-established standards for systematic reviews, economic evaluations, and real-world evidence. However, the rapid integration of LLMs into HEOR workflows has outpaced the development of HEOR-specific guidance on how their use should be reported.

LLMs are now being applied to:

  • Screening and classifying abstracts in systematic literature reviews
  • Extracting data and assessing bias
  • Building or replicating health economic models
  • Transforming unstructured real-world data into analyzable formats

While these applications offer efficiency and scalability, they also introduce new challenges related to transparency, reproducibility, factual accuracy, bias, uncertainty, and data governance. Existing AI reporting guidelines do not fully address these challenges in the context of HEOR decision-making, regulatory review, or health technology assessment (HTA).

ELEVATE-GenAI was developed to fill this gap by providing clear, HEOR-specific guidance for reporting the use of generative AI within research workflows.

 

What is ELEVATE-GenAI?

ELEVATE-GenAI is a reporting framework and checklist intended for HEOR studies in which generative AI plays a substantive role in evidence generation, synthesis, or analysis. Its goal is not to evaluate the performance of specific AI tools or to prescribe how AI should be used, but rather to ensure that AI-assisted workflows are clearly described, interpretable, and reproducible.

The guideline is designed to support:

  • Authors, by clarifying what information should be reported
  • Reviewers and editors, by enabling consistent evaluation
  • HTA bodies and regulators, by improving transparency and trust

Importantly, ELEVATE-GenAI is not intended for studies that use AI only for minor tasks such as editing or formatting text. Instead, it applies when generative AI meaningfully influences HEOR outputs.

 

Reporting generative AI across HEOR workflows: The 10 ELEVATE domains

At the center of ELEVATE-GenAI is a set of 10 reporting domains that together describe how generative AI is integrated into HEOR workflows and how its outputs are assessed.

 

1. Model characteristics

This domain ensures clarity about what AI system was used. Authors are encouraged to report the model name and version, developer, access method, license type, architecture, and — where available — training and fine-tuning data sources.

 

2. Accuracy assessment

Accuracy reporting focuses on how closely AI-generated outputs align with expected or correct results, using task-appropriate benchmarks such as expert review, gold-standard datasets, or quantitative performance measures.

 

3. Comprehensiveness assessment

Comprehensiveness addresses whether AI outputs fully cover all relevant elements of a task — for example, whether all key studies were captured in a literature review or all required components were included in an economic model.

 

4. Factuality verification

This domain emphasizes verification of factual correctness, including identifying and correcting hallucinated citations, incorrect data, or unsupported claims generated by the model.

 

5. Reproducibility and generalizability

Authors are encouraged to document prompts, parameters, workflows, and model versions to support reproducibility, and to discuss whether the AI-assisted approach can be applied to similar HEOR questions or settings.
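
In practice, this can be as simple as saving a machine-readable run record alongside study outputs. The sketch below is a hypothetical structure, not a format mandated by ELEVATE-GenAI; every field name and value is an assumption for illustration.

```python
import json
from datetime import date

# Hypothetical run record for an LLM-assisted abstract-screening step
run_record = {
    "task": "title/abstract screening",
    "model": {"name": "example-llm", "version": "2026-01-15",
              "provider": "example-vendor", "access": "API"},
    "parameters": {"temperature": 0.0, "max_tokens": 512, "seed": 12345},
    "prompt_template_file": "prompts/screening_v3.txt",  # versioned with the protocol
    "inputs": {"n_abstracts": 2481, "source": "search_export_2026-01-20.ris"},
    "human_oversight": "dual human review of all machine-excluded records",
    "run_date": str(date.today()),
}

with open("llm_run_record.json", "w") as f:
    json.dump(run_record, f, indent=2)
```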

 

6. Robustness checks

Robustness reporting addresses how sensitive AI outputs are to changes in inputs, such as minor prompt variations, ambiguous wording, or typographical errors.

 

7. Fairness and bias monitoring

Where applicable, studies should assess whether AI outputs introduce or reinforce biases related to demographic or population characteristics relevant to HEOR analyses.

 

8. Deployment context and efficiency

This domain captures practical aspects of AI deployment, including hardware and software configurations, processing time, scalability, and resource requirements — factors that influence real-world feasibility.

 

9. Calibration and uncertainty

Calibration focuses on whether AI confidence aligns with actual performance and how uncertainty is handled, such as defining thresholds for human review in hybrid AI–human workflows.

 

10. Security and privacy measures

Authors should describe how sensitive data, intellectual property, and regulatory requirements (e.g., GDPR or HIPAA) are addressed when generative AI is used in HEOR workflows.

 

Each domain is accompanied by reporting guidance and an assessment of metric maturity, recognizing that some areas — such as fairness and uncertainty — are still evolving.

 

From framework to practice: The ELEVATE checklist

To facilitate adoption, ELEVATE-GenAI includes a practical checklist that translates the 10 domains into concrete reporting questions. An optional scoring system allows authors and reviewers to summarize reporting completeness, while emphasizing that this score is not a measure of methodological quality or study validity.

The authors demonstrate the applicability of the guideline by retrospectively applying it to two published HEOR studies — one focused on systematic literature review automation and another on health economic modeling. These examples show how ELEVATE-GenAI can be used to consistently describe AI-assisted workflows across different HEOR applications and to identify areas where reporting can be strengthened.

 

Why ELEVATE-GenAI matters for HEOR

As generative AI becomes more deeply integrated into HEOR workflows, transparent reporting is essential to maintain scientific credibility and stakeholder trust. ELEVATE-GenAI provides a shared structure for documenting how AI is used, how outputs are evaluated, and what limitations may affect interpretation.

By establishing common expectations for reporting generative AI in HEOR, ELEVATE-GenAI supports responsible innovation while aligning with the needs of journals, HTA bodies, and regulators.

 

Final takeaways

ELEVATE-GenAI positions itself as a foundational guideline for reporting the use of generative AI in HEOR workflows. By focusing on transparency, reproducibility, and interpretability, it helps ensure that AI-augmented research can be critically assessed and confidently used in healthcare decision-making.

As a living guideline, ELEVATE-GenAI will continue to evolve alongside advances in generative AI — providing the HEOR community with a practical framework for integrating new technologies without compromising rigor or trust.

 

Interested in learning more?

Read the full paper: “ELEVATE-GenAI: Reporting Guidelines for the Use of Large Language Models in Health Economics and Outcomes Research: An ISPOR Working Group Report.”

External Control Arms in Drug Development: Methodological and Regulatory Considerations

Drug development is growing more complex, with compressed timelines and increasingly high expectations from regulators, payers, and health systems. In this setting, external control arms (ECAs) leveraging real‑world data (RWD) are emerging as a pragmatic approach to support clinical development and downstream commercial decision‑making.

Randomized controlled trials (RCTs) remain the gold standard for evidence generation. However, in many modern development programs, traditional randomized designs are not feasible or may raise ethical concerns. Sponsors increasingly encounter situations in which:

  • Patient recruitment is slow, limited, or not achievable
  • Randomization is ethically challenging
  • Development costs escalate rapidly
  • Competitive dynamics demand accelerated evidence generation
  • Patient populations are small or rapidly progressing
  • There is a high unmet medical need

 

These challenges are particularly acute in oncology, rare diseases, post‑approval expansion studies, and advanced or cell‑based therapies.

 

What is an external control arm?

An external control arm replaces or supplements a traditional control group by leveraging data from patients treated outside the clinical trial. These patients are drawn from routine clinical practice and reflect outcomes under standard‑of‑care treatment in real‑world settings.

External controls are typically constructed using real‑world data sources such as:

  • Electronic health records (EHRs)
  • Administrative and insurance claims
  • Disease and treatment registries

Unlike trial data, real‑world data reflect patterns of diagnosis, treatment, and follow‑up in everyday clinical care. The foundation of a well‑designed external control study is the use of fit‑for‑purpose data that are sufficiently complete, clinically relevant, and reliable to support robust and defensible analyses.

 

Strategic value of external control arms

When thoughtfully designed and appropriately governed, ECAs can provide meaningful strategic benefits, including:

  • Shortened development timelines
  • Improved feasibility of clinical studies
  • Evidence generation in small or rare populations
  • Stronger value narratives for payers and health technology assessment bodies
  • Support for lifecycle management and label expansion strategies

 

Methodological considerations and risks to manage

The credibility and acceptability of an external control arm depend heavily on methodological rigor.

Key considerations include the following:

1. Study design

External control studies should be designed to closely mirror the clinical trial, including:

  • Alignment of inclusion and exclusion criteria
  • Clear definition of index date and baseline
  • Comparable follow‑up periods and outcome assessment windows
  • Consistent treatment context and line of therapy

Pre-specification of the estimand and statistical analysis plan is critical to avoid post‑hoc decision‑making.

 

2. Patient selection and alignment

Ensuring comparability between trial participants and real‑world patients is one of the most critical aspects of ECA design. Sponsors should:

  • Use transparent, reproducible cohort selection algorithms
  • Apply consistent definitions for key demographic and clinical variables
  • Assess overlap and positivity between trial and external populations
  • Explicitly evaluate differences in baseline characteristics

Sensitivity analyses should be conducted to quantify the impact of residual differences where appropriate.

 

3. Handling confounding and bias

Because external control arms lack randomization, confounding must be actively addressed. Common analytical approaches include:

  • Propensity score methods (matching, weighting, stratification)
  • Multivariable outcome regression
  • Doubly robust methods that combine weighting and modeling

Method selection should be driven by study objectives, data characteristics, sample size, and variable completeness, not by analytical convenience.
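
As a simplified sketch of one of these approaches, the example below estimates a trial-vs-external difference using inverse probability of treatment weighting from a logistic propensity model. Column names and data are placeholders; a real analysis would add overlap and balance diagnostics (e.g., standardized mean differences) and variance estimation.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

def iptw_att_estimate(df, covariates, treat_col="in_trial", outcome_col="response"):
    """ATT via IPTW: trial patients get weight 1; external controls are
    weighted by the propensity odds e(x) / (1 - e(x))."""
    model = LogisticRegression(max_iter=1000).fit(df[covariates], df[treat_col])
    ps = model.predict_proba(df[covariates])[:, 1]
    treated = df[treat_col].to_numpy() == 1
    w = np.where(treated, 1.0, ps / (1.0 - ps))
    y1 = np.average(df.loc[treated, outcome_col], weights=w[treated])
    y0 = np.average(df.loc[~treated, outcome_col], weights=w[~treated])
    return y1 - y0

# Placeholder data: age and severity drive both trial selection and outcome
rng = np.random.default_rng(0)
n = 2000
age, sev = rng.normal(60, 10, n), rng.normal(0, 1, n)
in_trial = rng.binomial(1, 1 / (1 + np.exp(0.02 * (age - 60) + 0.5 * sev)))
response = rng.binomial(1, 1 / (1 + np.exp(-(0.3 * in_trial - 0.03 * (age - 60) - 0.4 * sev))))
df = pd.DataFrame({"age": age, "severity": sev, "in_trial": in_trial, "response": response})
print(f"IPTW ATT estimate: {iptw_att_estimate(df, ['age', 'severity']):.3f}")
```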

 

4. Data quality and missingness

Real‑world data are inherently heterogeneous and incomplete. Methodological plans should address:

  • Data provenance, completeness, and validation
  • Handling of missing or partially observed variables
  • Measurement variability across providers, systems, or data sources
  • Differences in assessment timing and frequency

Imputation strategies and key assumptions should be explicitly documented and tested through sensitivity analyses.

 

5. Outcome definition and assessment

Endpoints derived from RWD must be clinically meaningful and aligned as closely as possible with trial definitions. Considerations include:

  • Use of validated real‑world endpoint definitions
  • Clear attribution and timing of outcomes
  • Consistency with regulatory‑recognized measures of clinical benefit
  • Avoidance of surrogate endpoints unless scientifically justified

Outcome misclassification remains a key risk and should be explicitly evaluated.

 

6. Sensitivity and robustness analyses

Regulators expect evidence that findings are robust under alternative assumptions. Analyses may include:

  • Variation in matching or weighting specifications
  • Alternative cohort definitions or look‑back periods
  • Use of negative control outcomes or exposures
  • Quantitative bias analyses where feasible

The objective is to demonstrate that conclusions are not driven by a single design or modeling decision.
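
As one concrete example of quantitative bias analysis, the E-value of VanderWeele and Ding asks how strong an unmeasured confounder would need to be, on the risk-ratio scale, to fully explain away an observed association. A minimal sketch:

```python
import math

def e_value(rr):
    """Minimum strength of association (risk-ratio scale) an unmeasured
    confounder would need with both treatment and outcome to fully explain
    away an observed risk ratio (VanderWeele & Ding, 2017)."""
    if rr < 1:
        rr = 1 / rr  # protective associations: use the reciprocal
    return rr + math.sqrt(rr * (rr - 1))

print(f"Observed RR = 1.8 -> E-value = {e_value(1.8):.2f}")  # 3.00
```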

 

7. Transparency and documentation

Methodological transparency is essential for regulatory and payer review. Best practices include:

  • Prespecifying analysis plans and decision rules
  • Fully documenting data sources, algorithms, and assumptions
  • Providing traceability from raw data to final outcomes
  • Enabling reproducibility of key analyses

 

Regulatory outlook and expectations

Regulatory agencies and health technology assessment bodies, including the U.S. Food and Drug Administration (FDA), the European Medicines Agency (EMA), and the Canadian Agency for Drugs and Technologies in Health (CADTH), have recognized the potential role of external control arms under conditions of methodological rigor and transparency.

Regulatory agencies have not lowered evidentiary standards. Rather, they have:

  • Provided greater clarity on scenarios in which external control arms may be acceptable
  • More explicitly articulated methodological expectations
  • Encouraged early and proactive dialogue with sponsors

 

Successful regulatory submissions that incorporate ECAs typically:

  • Provide a clear scientific and ethical rationale for why randomization is not feasible or appropriate
  • Use high‑quality, fit‑for‑purpose real‑world data sources
  • Transparently define patient selection criteria and demonstrate alignment with the trial population
  • Show that findings are robust, reproducible, and minimally biased

Early engagement with regulators remains critical to aligning expectations and maximizing the likelihood of success.

 

Join Anupama Vasudevan and James Matcham on February 3 at 10 a.m. ET for an open office hours session on “Evidence Generation with External Control Arms.”

The What, When, and Why of the Changes to NICE Methods: Is the Devil in the Details?

Following weeks of anticipation, NICE officially announced in December that the recently rumored increase of its standard cost effectiveness threshold will take effect beginning April 2026.

 

What’s changing and when?

The standard cost effectiveness threshold range that NICE committees use to judge whether a medicine is cost effective will increase from 20–30K GBP to 25–35K GBP per QALY gained, a 25% rise at the lower bound.

NICE stated in its webinar on December 3, 2025, that the Department of Health and Social Care (DHSC) will consult on powers to direct NICE to enact this change starting April 2026, in a targeted change to regulation. This consultation opened on December 9, 2025, and will close on January 13, 2026.

NICE stressed that this targeted change will not mean any broader intervention from government ministers in its methods or decisions. It also confirmed that it has proposed to the government that the new threshold apply across all NICE guidance (Digital, HealthTech, Guidelines) and that it is awaiting further details. NICE also confirmed in the webinar that it was not aware of any proposals to change the thresholds used to evaluate Highly Specialised Technologies (HSTs) for ultra-rare diseases.

However, the first proposal in the DHSC consultation document refers explicitly to all NICE guidance:

“Do you agree or disagree that the power to direct NICE about the standard cost-effectiveness threshold should apply to all NICE guidance that makes recommendations on health spending? This includes technology appraisal and highly specialised technology evaluation recommendations.”

As part of the timeline it announced (see figure), which remains subject to consultation, NICE confirmed that it will consult in early 2026 on how this change will be implemented.

 

Anticipated timeline to implement the announced changes (Source: NICE webinar on December 3, 2025)

 

In addition to raising its cost effectiveness threshold, NICE announced it will adopt a new EQ-5D-5L UK value set, developed by asking 1,200 members of the public to judge different health states and anticipated to appear in a peer-reviewed publication by March 2026. This change, however, will follow NICE’s standard approach to modular methods updates, including a public consultation on the proposed change before full implementation.

NICE’s announcement came in parallel with the UK government’s announcement that it had closed a trade deal with the US that includes this change, alongside an agreement on the tariff that UK pharmaceutical manufacturers will pay when exporting medicines to the US.

 

Why these changes?

NICE’s methods changes are anticipated to reshape the market access environment in the UK and beyond. The US-UK trade deal, of which this threshold change is part, may persuade pharma companies to maintain their presence in the UK and the UK’s position in the launch sequence, after several had threatened to pull out of the UK market under pressure from the newly announced US tariffs and policies such as the MFN external reference pricing policy.

According to the UK government’s press release announcing NICE threshold changes:

“This is supported by confirmation that — thanks to strong UK support for innovation — the UK has secured mitigations under the US’ ‘Most Favoured Nation’ drug pricing initiative so that we will continue to ensure access to the latest treatments. This will encourage pharmaceutical companies from around the world to prioritise the UK for early launches of their new medicines, meaning British patients could be among the first globally to access breakthrough treatments.”

 

The anticipated impact

These NICE methods changes will have a far-reaching impact on the assessment of cost effectiveness of medicines in the UK, with likely spillover effects on other countries’ practices as well.

The higher willingness-to-pay (WTP) threshold expands headroom for treatments near previous ICER cut-offs, improving the feasibility of charging higher prices for innovative therapies. However, the unchanged discount rate limits the full advantage of this increase. This means more flexibility on price, but continued pressure on future value. It remains to be seen whether the increased threshold will also apply to NICE guidance beyond its technology appraisal (TA) program. What has been confirmed is that the threshold change will not trigger reviews of completed appraisals.
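
To see the headroom in numbers (all figures hypothetical): the incremental cost-effectiveness ratio (ICER) is incremental cost divided by incremental QALYs, and a result just above the old upper bound now falls inside the new range.

```python
# All figures are illustrative, not from any actual appraisal.
incremental_cost = 16_000    # GBP vs. comparator
incremental_qalys = 0.5      # QALYs gained vs. comparator

# ICER = incremental cost / incremental QALYs
icer = incremental_cost / incremental_qalys
print(f"ICER: GBP {icer:,.0f} per QALY gained")  # GBP 32,000

# 32,000 sits above the old 30K upper bound but within the new 25-35K
# range, illustrating the added headroom for price negotiations.
```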

NICE’s adoption of the EQ-5D-5L UK value set will also reshape patient-reported outcome (PRO) strategy. Utilities derived from EQ-5D directly influence QALY calculations and ICERs. By reflecting more nuanced health states, EQ-5D-5L supports a more accurate calculation of QALYs. Trials that currently collect EQ-5D-3L data may need a new mapping function to align with the new value set. Future trials should prioritize EQ-5D-5L and ensure high completion rates for PRO instruments, as missing data will become an even more critical issue.
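
For context on why the value set matters so much: QALYs in a cost-effectiveness model are essentially the area under the utility-time curve, so shifts in how health states are valued flow directly into the ICER. A toy calculation with made-up utilities:

```python
import numpy as np

# Hypothetical EQ-5D utility trajectory: (years from baseline, utility).
times = np.array([0.0, 0.5, 1.0, 2.0])
utilities = np.array([0.65, 0.72, 0.78, 0.74])  # illustrative values

# QALYs = area under the utility-time curve (trapezoidal rule).
qalys = float(np.sum(np.diff(times) * (utilities[:-1] + utilities[1:]) / 2))
print(f"QALYs accrued over 2 years: {qalys:.2f}")  # ~1.48
```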

From a patient perspective, this means their lived experience is better represented in HTA decisions. For pharma companies, it means interventions that improve pain, anxiety, and functional independence can show their full value in cost-effectiveness models.

 

Regional impact

It is not yet clear how Europe will respond to these changes on both sides of the Atlantic. What is clear is that action will be needed to protect both the attractiveness of European markets as launch markets and the prices pharma companies can charge in them, each of which affects patient access to innovative medicines.

Further, we could speculate that this change could bring UK prices closer to those in France and Germany. The UK has been able to achieve low prices through the powerful negotiating position of its single centralized payer for the majority of UK healthcare (the NHS); through its deeply embedded health technology appraisal processes at NICE, which acts as the gatekeeper for drug reimbursement; and through long-standing price-control mechanisms that effectively cap the NHS’s spend on innovative medicines, the most recent iteration of which is the Voluntary Scheme for Branded Medicines Pricing, Access and Growth (VPAG), with a fallback Statutory Scheme. The current VPAG scheme requires UK manufacturers to pay an effective clawback rate of 23.5% to the UK Government on “newer medicines” (22.9% clawback plus 0.6% investment program funding, excluding new active substances), far higher than comparators such as France (5.7%), Germany (7%), and Spain (7.5%).

 

Have you considered these and other impacts, and is your team ready for these changes?

A Look Ahead for Biopharma: Embracing Complexity in 2026

Looking back, 2025 was a year of adjustment for biotech: instability shook an already fragile biopharma sector, AI gained significant momentum, the ability to merge open-source code with commercial software matured, and the use of real-world data (RWD) continued to increase. Most companies responded responsibly, with pilots, internal capability builds, and exploratory regulatory dialogue; unfortunately, many also carried out significant layoffs.

2026 is poised to be about embracing complexity — it’s likely many of the trends begun in 2025 will continue, and in addition we’ll see more emphasis on women’s health, an increase in discussions on external control arms, and a spotlight on biostats/clinical pharmacology. From a regulatory perspective, both the US Food and Drug Administration and the European Medicines Agency are signaling the same thing: innovation is welcome, but only when it produces decision-grade evidence.

This has real implications for how drug development companies operate.

Women’s health has expanded beyond a niche category. Regulators are increasingly focused on sex-specific evidence and generalizability across all therapeutic areas. By 2026, this is no longer optional — it’s foundational. In December 2025, the FDA published guidance on studying sex differences, with recommendations to increase female enrollment and strengthen sex-specific analyses and reporting.

Smarter trial designs are now the default. Adaptive approaches, Bayesian methods, external control arms, and decentralized elements are acceptable — expected, even — but only when the assumptions are explicit and defensible. In addition, regulators are clarifying and further exploring the use of various datasets in regulatory submissions; a few external control arms have even been presented to regulators.

Behind all of this is a broader shift: biostatistics and clinical pharmacology are moving from support functions to strategic capabilities. Dose justification, estimands, missing data strategy, and model-informed drug development (MIDD) are now central to regulatory credibility.

For biopharma leaders, the message is clear:

For those companies that embrace complexity as the new normal, 2026 offers a powerful opportunity: faster development, clearer regulatory paths, and greater confidence — from regulators, investors, and, ultimately, patients.

Keep an eye out for these new trends in 2026:

 

Women’s health expands beyond a single therapeutic area

Women’s health is no longer confined to reproductive medicine or niche pipelines. Across all therapeutic areas, regulators increasingly expect sex-specific evidence and demonstrable generalizability.

In 2026, sponsors should expect:

  • Greater scrutiny of female enrollment and retention
  • Fewer waivers for sex-specific analyses
  • Increased innovation in areas such as menopause, endometriosis, fertility, and autoimmune disease

 

Takeaway: Women’s health isn’t just a therapeutic area; it’s a generalizability requirement. If your statistical analysis plan can’t speak clearly about sex effects (or justify why not), expect pointed questions.

 

External control arms move from optional to strategic

External and synthetic control arms have crossed an important threshold. In 2026, they are no longer seen as novel add-ons but as intentional design choices, particularly in oncology, rare disease, and high unmet-need indications.

Regulatory acceptance is evolving and both the EMA and FDA are leaning in — carefully:

  • EMA is actively developing a reflection paper on external controls (including RWD-derived arms) to shape consistent scientific expectations.1
  • EMA is also doubling down on DARWIN EU, its federated RWE network, with plans to extend work beyond 2027 (tender activity flagged for the first half of 2026).2
  • FDA continues to expand its RWE framing across programs (and sponsors are expected to demonstrate data relevance, reliability, and bias management — not just “we found a database”).3

 

Takeaway: External controls are not a shortcut; they’re a design choice that requires 1) pre-specified causal estimands, 2) transparent matching/adjustment strategy, and 3) sensitivity analyses that are realistic.
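
As a purely illustrative sketch of point 2, one widely used adjustment strategy for external control arms is inverse-probability-of-treatment weighting (IPTW) built on propensity scores. The simulated data, covariates, and effect size below are hypothetical; this is a sketch, not a complete analysis.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Simulated stand-in data: 'treated' marks trial patients (1) vs external
# controls (0); X holds baseline covariates; y is a continuous outcome.
rng = np.random.default_rng(42)
n = 400
X = pd.DataFrame({"age": rng.normal(60, 8, n),
                  "ecog": rng.integers(0, 3, n)})
treated = rng.integers(0, 2, n)
y = rng.normal(0.0, 1.0, n) + 0.3 * treated

# 1) Propensity score: probability of being in the trial given covariates.
ps = LogisticRegression().fit(X, treated).predict_proba(X)[:, 1]

# 2) ATT-style weights: trial patients keep weight 1; external controls are
#    re-weighted to resemble the trial population. In practice, inspect and
#    trim extreme weights and check covariate balance (e.g., standardized
#    mean differences) before estimating effects.
w = np.where(treated == 1, 1.0, ps / (1.0 - ps))

# 3) Weighted outcome contrast (to be paired with sensitivity analyses).
effect = y[treated == 1].mean() - np.average(y[treated == 0],
                                             weights=w[treated == 0])
print(f"Weighted effect estimate: {effect:.2f}")
```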

 

Biostatistics becomes a strategic capability

Clinical trial design is undergoing a quiet but fundamental upgrade. In 2026, efficiency is no longer achieved by cutting corners, but by thinking better upfront. Biostatistics is no longer a downstream function; it is becoming a strategic driver of development success.

Adaptive and Bayesian designs are becoming mainstream, particularly in early and mid-stage development. Sponsors are expected to define estimands clearly and adequately to address trial objectives, to integrate biomarkers earlier, and to design trials that answer regulatory questions — not just scientific ones. Smaller trials are acceptable; ambiguous trials are not.
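
As a toy illustration of the Bayesian designs mentioned above (all numbers hypothetical), a beta-binomial interim look computes the posterior probability that a response rate clears a prespecified bar, exactly the kind of explicit, defensible assumption regulators expect to see written down.

```python
from scipy.stats import beta

# Toy Bayesian interim analysis: a flat Beta(1, 1) prior on the response
# rate, with 12 responders observed among 30 patients (hypothetical).
prior_a, prior_b = 1, 1
responders, n = 12, 30

posterior = beta(prior_a + responders, prior_b + n - responders)

# Posterior probability that the true response rate exceeds 30%; a design
# would prespecify the go/no-go threshold this is compared against.
print(f"P(rate > 0.30 | data) = {posterior.sf(0.30):.2f}")
```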

In 2026:

  • Estimands (including novel endpoints and strategies for handling intercurrent events) and missing data strategies are key to addressing primary and secondary trial objectives, and are strategic topics for assessing a drug product’s benefit-risk profile and totality of evidence.
  • Bayesian methods are increasingly used not just in design, but in regulatory dialogue.
  • Statistics, clinical pharmacology, and translational science should continue moving toward seamless integration.

 

Takeaway: Bringing biostatisticians to the table early and leveraging quantitative decision-making frameworks will be important for managing overall pipeline decisions.

 

Clinical pharmacology takes center stage

Clinical pharmacology is having a moment — and for good reason. Regulators are increasingly unwilling to forgive poor justification of dose selection, study design optimization, population enrichment, extrapolation, or benefit-risk assessment and labeling.

MIDD is becoming the norm:

  • Since 2024, the FDA has been pushing Model-Informed Drug Development via the ICH M15 draft guidance and its MIDD meeting program.4
  • EMA is sharpening expectations for mechanistic models (PBPK/PBBM/QSP) used in MIDD, including how they should be assessed and reported.5 Guidance is expected to be finalized in 2026.
  • In December 2025, the FDA released a guidance listing products for which 6-month non-human primate toxicity testing can be waived or reduced and, in April 2025, outlined a roadmap for reducing animal testing in preclinical safety studies.6,7

 

Takeaway: In 2026, there will be increased scrutiny to ensure sponsors show why the chosen dose is the best one and can justify the assumptions behind any MIDD analyses submitted, or explain why the approach wasn’t used.

 

Interested in learning more?

The FDA’s Roadmap to Reducing Animal Testing in Preclinical Safety Studies

For a number of years, the FDA and other regulatory agencies have been concerned about the number of animals used in drug research, particularly with regard to toxicology studies. This concern is based not only on animal welfare considerations but also on the increasing realization that animal toxicology data do not always predict human toxicity.

Here, we discuss these challenges and the FDA’s new roadmap for reducing animal testing in preclinical safety studies.

 

Animal toxicity often does not predict human toxicity

There have been a number of drugs that appeared safe in animals while under development but were later shown to be toxic in humans.

Some examples include:

  • Fialuridine: No significant toxicity was observed in mice, rats, or dogs, but in a Phase II clinical trial, several cases of fatal hepatic failure occurred. The reason for this was later shown to be species-specific differences in how nucleoside analogs affect mitochondrial function.1
  • Troglitazone: No significant toxicity was recorded in rats, mice, or dogs, but several cases of liver failure were reported post-approval, which ultimately led to the withdrawal of troglitazone from the U.S. market in 2000. It was later determined that toxic reactive metabolites were formed in humans, but not in animals.2
  • Rofecoxib: In non-clinical species, no safety signals were observed. Post-approval, an increased risk of myocardial infarction and stroke was observed during long-term use. The reason for this difference in response is thought to be that rofecoxib may increase the susceptibility of human low-density lipoprotein and cellular membrane lipids to oxidative damage, which then may lead to plaque instability and thrombus formation in humans.3

According to the FDA, over 90% of drugs that appear safe and effective in animals do not ultimately receive FDA approval, largely due to safety and/or efficacy concerns in humans.4 Conversely, some medications that are generally considered safe in humans might never have passed animal testing. Such physiological differences underscore why animals may not always provide adequate models of human health and disease.5

 

A new roadmap: Reducing animal testing in preclinical studies

In April 2025, the FDA published its Roadmap to Reducing Animal Testing in Preclinical Safety Studies.6 The roadmap outlines a long-term plan to reduce or possibly eliminate animal toxicology testing, starting with monoclonal antibodies, by using what are termed “New Approach Methodologies” (NAMs), which include human tissue-based systems such as organs-on-chips, in silico modeling, and other innovative approaches.

  • Organs-on-chips (OoC): Systems that contain engineered or natural miniature tissues grown inside microfluidic chips.7
  • In silico modeling: Using computational modeling to leverage existing data to predict safety, immunogenicity, and pharmacokinetics, reducing the need for new animal testing. Key tools include PBPK modeling and AI/ML approaches.8

Other approaches mentioned include ex vivo human tissues, high-throughput cell-based screening, microdosing and imaging in human volunteers, and refined in vivo methods. The Roadmap highlights that these methods each address one or more aspects of animal testing, so an integrative strategy will be essential.
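
As a deliberately simplified illustration of the in silico principle, far simpler than the physiologically based (PBPK) models the Roadmap envisions, a one-compartment pharmacokinetic model predicts drug exposure from a handful of parameters rather than measuring it in animals. All values below are hypothetical.

```python
import numpy as np

# Toy one-compartment IV-bolus model with first-order elimination.
dose_mg = 100.0       # hypothetical dose
volume_l = 40.0       # hypothetical volume of distribution
clearance_l_h = 5.0   # hypothetical clearance

k_el = clearance_l_h / volume_l        # elimination rate constant (1/h)
t = np.linspace(0.0, 24.0, 7)          # hours post-dose
conc = (dose_mg / volume_l) * np.exp(-k_el * t)  # C(t) = (D/V) * exp(-k*t)

for ti, ci in zip(t, conc):
    print(f"t = {ti:5.1f} h   C = {ci:6.3f} mg/L")
```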

 

Key questions for success

Key questions that need to be answered in order for these approaches to be successful include:

  1. How predictive are NAMs with regard to determining drug safety?
  2. How are NAMs best utilized during the early stages of development, including how studies are to be designed?
  3. How consistent are the results across various manufacturers of NAMs?

Unlike animal models, new approach methodologies (such as organs-on-chips) may differ significantly across manufacturers in cell types, genetics, and overall structural composition.

 

Final takeaways

The FDA has laid out an ambitious long-term strategy for reducing or even eliminating animal testing, initially for monoclonal antibodies, with the potential to extend this to small molecules and therapeutic proteins. The success of this strategy will depend in part on close cooperation between industry stakeholders and the FDA, as well as other regulatory bodies within ICH.

 

Interested in learning more?

Join Cytel’s Michael Fossler, Nelia Padilla, and Mammoth Preclinical’s Edwin Garner for their upcoming webinar, “FDA’s Roadmap to Reducing Animal Testing in Monoclonal Antibody Development,” on December 9 at 9 a.m. ET: