
How Agentic AI Can Transform HTA Landscaping for EU JCA

Health Technology Assessment (HTA) in the European Union (EU) is entering a new phase with the introduction of the EU Joint Clinical Assessment (JCA). The goal of the new HTA regulation is to improve the availability of innovative health technologies in the EU by ensuring efficient resource use and strengthening the scientific quality of HTA across Member States (MS).

At the heart of this process is the JCA scope, which consolidates diverse evidence requests from all MS into the PICO (Population, Intervention, Comparator, Outcome) framework. Anticipating these policy-driven PICO requests is critical for a successful JCA submission and can become a complex, time- and labor-intensive exercise. In addition to understanding the potentially diverse clinical practices across the MS, it demands an in-depth assessment of the different national HTA evidence requirements. Teams working on PICO predictions need a clear mapping of what evidence has been accepted, questioned, or rejected across the different HTA systems. Building that mapping is a multifaceted task.

 

Why HTA landscaping is challenging

HTA landscaping requires careful review of past HTA decisions to understand what evidence leads to positive HTA outcomes. This involves identifying relevant patient populations, accepted comparators, and meaningful outcomes. It also requires digging deeper into the HTA documentation to uncover why certain choices were criticized or dismissed.

Much of this information is hidden in long reports, potentially including appendices. These HTA documents are written in different languages, follow different formats, and often include subtle but important contextual details that explain the HTA critiques and the reasoning behind specific evidence requests. As a result, landscaping is still largely manual, time-consuming, and difficult to scale.

 

What makes agentic AI different

Agentic AI offers a new way to approach this problem. Instead of simply summarizing documents or answering one-off questions, agentic systems are designed to carry out structured tasks. They can follow a defined set of instructions, extract specific types of information, and organize results in a consistent way.

This makes them particularly suited for HTA landscaping, where the goal is not just to read documents, but to systematically extract comparable insights across multiple sources.

 

Our research: Using AI agents for HTA extraction

In our recent research, which will be presented at ISPOR US this May, we explored how autonomous AI agents can support HTA landscaping for EU JCA.

We developed two large language model–based agents designed to extract structured information from HTA reports using a set of 21 expert-defined questions. These questions covered standard PICO elements, such as population, comparators, and outcomes, as well as more context-specific insights, including methodological requirements, reasons for rejecting certain outcomes or comparators, and other critique points raised by HTA bodies.

The two agents differed in how they were guided. The first used a general prompt, while the second incorporated additional clarification within selected questions to improve contextual understanding.
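As an illustration, the question-driven extraction loop described above can be sketched as follows. This is a minimal sketch under stated assumptions: the question list, prompt wording, and `call_llm` placeholder are invented for illustration, not the 21 questions or prompts used in the study.

```python
# Hypothetical sketch of question-driven extraction with an LLM agent.
QUESTIONS = [
    "Which patient population was assessed?",
    "Which comparators were accepted, and which were rejected and why?",
    "Which outcomes did the HTA body consider patient-relevant?",
]

BASE_PROMPT = (
    "You are extracting structured information from an HTA report. "
    "Answer only from the report text. If the report does not contain "
    "the answer, reply exactly 'NOT REPORTED'.\n\n"
    "Report:\n{report}\n\nQuestion: {question}"
)

def call_llm(prompt):
    # Placeholder: wire this to a chat-completion API of your choice.
    return "NOT REPORTED"

def extract(report_text, questions=QUESTIONS, clarifications=None):
    """Run every question against one report. Passing per-question
    clarifications mirrors the second agent's strategy of adding
    targeted context to selected questions."""
    clarifications = clarifications or {}
    results = {}
    for q in questions:
        question = q + ((" " + clarifications[q]) if q in clarifications else "")
        results[q] = call_llm(BASE_PROMPT.format(report=report_text, question=question))
    return results
```

Swapping `call_llm` for a real model call yields one structured record per report, which is what makes results directly comparable across jurisdictions.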

 

How we evaluated performance

To test the agents, we used publicly available HTA reports for osimertinib (in locally advanced or metastatic NSCLC with EGFR T790M mutation) from Spain, the Netherlands, and France. These reports varied in length, structure, and language, providing a realistic test of performance.

Local HTA experts applied a strict scoring framework that assessed both accuracy and completeness. Importantly, any answer containing hallucinated content was automatically scored as zero. This ensured that reliability remained central to the evaluation.
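The strict rule described above, where any hallucination zeroes the score, can be expressed compactly. The equal weighting of accuracy and completeness below is an assumption for illustration, not the experts' exact rubric.

```python
def score_answer(accuracy, completeness, hallucinated):
    """Score one extracted answer (accuracy and completeness in [0, 1]).
    Hallucinated content overrides everything else: the answer scores
    zero. The 50/50 weighting is an illustrative assumption."""
    if hallucinated:
        return 0.0
    return 0.5 * accuracy + 0.5 * completeness
```

This design makes reliability non-negotiable: a fluent but fabricated answer earns nothing.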

 

What we found

Both agents were able to complete the full extraction across all HTA reports, and around 90% of responses were generated without hallucinations. The second agent performed better overall, achieving a higher number of fully correct answers and fewer partially correct responses.

The first agent, while still effective, produced some hallucinated content, particularly in the Spanish report. The second agent avoided hallucinations entirely in this evaluation. Both agents performed best on the French HTA report, suggesting that clearer structure and language can improve AI performance.

One of the most important findings was the impact of prompt design. Adding targeted clarification significantly improved the agent’s ability to interpret and extract complex HTA information.

 

What this means for EU JCA landscaping

These results suggest that agentic AI can meaningfully improve how HTA landscaping is performed. By automating structured extraction, it becomes possible to review multiple reports more quickly and consistently. This allows teams to build a more comprehensive understanding of the landscape in less time.

Importantly, this approach goes beyond standard PICO elements. It captures the context-specific insights that often drive HTA decisions, such as methodological concerns or other reasons for rejecting evidence. This is critical for developing realistic PICO scenarios in the context of JCA.

Another key advantage is the ability to work across languages. Since EU HTA involves multiple jurisdictions, multilingual capability removes a major barrier and enables a more unified analysis.

 

The role of human expertise

Despite these advances, AI alone is not enough. Some limitations remain, including occasional hallucinations and variability depending on the source material. For this reason, human oversight continues to be essential.

The most effective approach is to combine agentic AI with human HTA expertise. AI can handle large-scale extraction and structuring of information, while experts validate the outputs and ensure that interpretations are accurate and relevant.

 

Looking ahead

Agentic AI is unlikely to replace HTA professionals, but it will fundamentally reshape how they work. By reducing the burden of manual review, it frees experts to focus on higher-value activities such as interpretation, strategic planning, and decision-making.

In the context of EU JCA, this shift brings clear advantages. It enables faster, more scalable landscaping and PICO predictions, helping to identify potential evidence gaps earlier in the process. As the methodology evolves, further testing will expand the integration of HTA reports from additional MS into the agent-driven workflows. At the same time, engineering adaptations may be needed to accommodate ongoing changes in local HTA documents as they continue to evolve together with the JCA reports.

 

Interested in learning more?

Manuel Cossio and Lilia Leisle will be presenting their poster “Accelerating Dynamic HTA Landscaping in Oncology Through Autonomous Generative AI-Driven Multilingual Data Extraction” at ISPOR US on May 18 at 4 PM. We hope to see you there!

Building a New Evidence Base for Rare Diseases by Structuring Clinical Narratives with Generative AI

Rare diseases present a paradox in modern healthcare. Individually, they affect small populations, yet collectively they impact millions of patients worldwide. Despite this, progress in diagnosis, treatment, and research remains slow. The fundamental challenge is not only scientific complexity but also a persistent lack of usable data.

Traditional sources of real-world data — electronic health records, claims databases, and clinical trials — struggle to capture rare disease populations at a meaningful scale. Patients are geographically dispersed, frequently misdiagnosed, and often excluded from structured datasets. As a result, generating robust evidence in rare diseases remains difficult.

At the same time, an overlooked resource has quietly accumulated over decades: clinical case reports. These narratives contain detailed descriptions of real patients, their symptoms, diagnostic journeys, and outcomes. The challenge has never been their value, but rather their accessibility and structure.

Recent advances in large language models (LLMs) suggest that this barrier may finally be overcome.

 

Case reports as a foundation for real-world evidence

Case reports represent one of the richest forms of clinical documentation available. Unlike structured datasets, they capture the full nuance of patient care, including symptom evolution, diagnostic uncertainty, and physician reasoning. They are inherently real-world, reflecting how diseases actually present and are managed in practice.

However, their utility has historically been limited. Case reports are written in free text, scattered across millions of publications, and lack standardization. Extracting meaningful insights at scale has required significant manual effort, making systematic use impractical.

The RareArena study demonstrates a new approach. By leveraging LLMs, researchers were able to automatically collect and process hundreds of thousands of case reports from PubMed, filter them for rare diseases, and transform them into a structured dataset comprising tens of thousands of patient cases. This process effectively converts unstructured clinical narratives into analyzable real-world data.

This shift is significant. It reframes case reports not as isolated anecdotes, but as components of a scalable data asset.
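The filter-and-structure step behind this shift can be sketched as follows. The term list, record fields, and placeholder structuring function are assumptions for illustration; the RareArena pipeline itself uses LLMs for both filtering and structuring at far larger scale.

```python
# Hypothetical sketch: keep abstracts that mention a rare-disease term,
# then convert each surviving narrative into a structured record.
RARE_DISEASE_TERMS = {"fabry disease", "gaucher disease", "pompe disease"}

def mentions_rare_disease(abstract):
    text = abstract.lower()
    return any(term in text for term in RARE_DISEASE_TERMS)

def to_structured_record(case_text):
    # Placeholder for an LLM structuring call; a real pipeline would
    # prompt the model to fill these fields from the narrative.
    return {"symptoms": [], "tests": [], "final_diagnosis": None,
            "source_text": case_text}

def build_dataset(abstracts):
    return [to_structured_record(a)
            for a in abstracts if mentions_rare_disease(a)]
```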

 

From unstructured text to scalable patient populations

One of the most important implications of this approach is the ability to expand patient populations in rare disease studies. Traditional datasets are constrained by institutional boundaries and data availability. In contrast, case reports aggregate knowledge globally, capturing patients from diverse healthcare systems and settings.

By structuring these reports, LLMs enable the creation of virtual cohorts that far exceed what any single registry or database could provide. Diagnoses can be standardized using reference ontologies, symptoms can be normalized, and cases can be grouped into clinically meaningful categories.
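At its simplest, diagnosis standardization against a reference ontology is a synonym lookup, sketched below. The tiny mapping and its codes are illustrative stand-ins in the Orphanet ORPHA format; production pipelines use full ontologies (e.g., ORDO or MONDO) with fuzzy or semantic matching.

```python
# Illustrative synonym-to-code mapping (a stand-in for a full ontology;
# codes shown are for illustration only).
ONTOLOGY = {
    "fabry disease": "ORPHA:324",
    "anderson-fabry disease": "ORPHA:324",  # synonym maps to the same code
    "gaucher disease": "ORPHA:355",
}

def standardize(diagnosis):
    """Map a free-text diagnosis to an ontology code, or None if unknown."""
    return ONTOLOGY.get(diagnosis.strip().lower())
```

Mapping synonyms to a single code is what lets cases described in different words be grouped into the same clinically meaningful category.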

The RareArena dataset, for example, spans thousands of rare diseases and tens of thousands of patient cases, representing one of the broadest collections of rare disease data assembled to date. This kind of scale opens new possibilities for understanding disease heterogeneity, identifying subpopulations, and generating evidence where none previously existed.

In effect, LLMs allow researchers to move from fragmented observations to aggregated real-world populations.

 

Capturing the diagnostic journey

A particularly valuable aspect of the RareArena framework is its alignment with real clinical workflows. The dataset distinguishes between two stages of diagnosis: early suspicion based on symptoms alone, and confirmation after diagnostic testing.

This distinction mirrors how rare diseases are encountered in practice. Patients often experience long diagnostic odysseys, with years passing before a correct diagnosis is reached. By separating these stages, the dataset captures both the uncertainty of early presentation and the clarity provided by confirmatory tests.

This structure enables deeper analysis of diagnostic pathways, including where delays occur and how different signals contribute to clinical decision-making. It also provides a foundation for developing tools that support earlier recognition of rare diseases, an area where unmet need remains substantial.

 

Preserving clinical complexity in real-world data

A common limitation of many real-world datasets is the loss of clinical nuance. Structured data often simplifies patient information, omitting negative findings, confounding symptoms, and contextual details that are critical for diagnosis.

Case reports, by contrast, preserve this complexity. The RareArena study shows that most cases retain features such as negative symptoms and confounding factors, reflecting the challenges physicians face in real-world settings. This makes the resulting dataset not only large, but also clinically realistic.

Maintaining this level of detail is essential for rare diseases, where subtle distinctions can significantly alter diagnosis and treatment. LLMs play a key role here by rephrasing and structuring text while preserving the underlying clinical information.

The result is a form of real-world data that is both scalable and rich in context.

 

Implications for research and clinical development

The ability to generate structured datasets from case reports has far-reaching implications. For researchers, it enables the study of rare diseases across larger and more diverse populations than previously possible. Patterns of presentation, progression, and response to treatment can be explored with greater statistical power.

In clinical development, this approach offers new ways to identify and characterize patient populations. It can support the design of clinical trials by highlighting underrepresented groups and informing inclusion criteria. It also provides a potential source of external evidence, complementing traditional trial data.

Beyond research, there is a clear opportunity to improve clinical decision support. The RareArena study demonstrates that LLMs already show meaningful capability in diagnosing rare diseases, particularly when provided with comprehensive clinical information. While not yet sufficient for standalone use, these models can assist clinicians by surfacing relevant diagnostic possibilities.

 

Limitations and considerations

Despite its promise, this approach is not without limitations. Case reports are inherently selective, often focusing on unusual or severe presentations. This introduces potential bias in the resulting datasets. Additionally, the data is retrospective and curated, rather than continuously collected.

LLMs themselves introduce another layer of complexity. While they are effective at extracting and structuring information, they can also propagate errors or introduce subtle inaccuracies. Ensuring data quality and validation remains critical.

The RareArena study also highlights that even the most advanced models are far from perfect in diagnostic tasks, particularly in early-stage scenarios. This reinforces the need to view these tools as augmentative rather than autonomous.

 

A shift from data scarcity to data unlocking

What emerges from this work is a broader shift in how we think about data in rare diseases. The challenge is no longer solely about collecting new data, but about unlocking the value of existing information.

Case reports represent decades of accumulated clinical knowledge. With LLMs, it becomes possible to systematically extract, structure, and scale that knowledge into usable real-world data. This approach does not replace traditional data sources, but it significantly expands the available evidence base.

For rare diseases, where every patient case is valuable, this shift is particularly impactful.

 

Toward a more complete picture of rare diseases

The combination of case reports and large language models offers a compelling new pathway for advancing rare disease research. By transforming unstructured narratives into structured datasets, it enables the creation of larger, more representative patient populations and more realistic models of clinical care.

While challenges remain, the potential is clear. This approach can accelerate diagnosis, inform clinical development, and ultimately contribute to better outcomes for patients who have long been underserved.

In a field defined by scarcity, the ability to unlock hidden data may prove to be one of the most important innovations yet.

Leveraging RWE Innovations to Inform Clinical Strategy and Strengthen Healthcare Decision-Making

Real-world evidence (RWE) is no longer a supporting actor, but rather a strategic asset that should be embedded across the product lifecycle.

We now have tools that were unimaginable a decade ago:

  • Synthetic data that preserves privacy while enabling scenario modeling and early go/no-go decisions
  • External control arms (ECAs) that strengthen single-arm trials and accelerate access in high unmet need settings
  • Decentralized long-term extensions via tokenization that reduce burden while capturing 10+ years of safety and effectiveness across patients' real-world journeys

These innovations aren’t just “nice to have.” They are how we accelerate access to needed therapies, demonstrate value with confidence, and build submissions that stand up to today’s scrutiny.

Here, I discuss how these capabilities are reshaping clinical strategy and unlocking smarter, faster, more equitable evidence generation.

 

Generating synthetic data with agentic AI

Synthetic data is artificially generated data that mimics the statistical properties of real data without containing identifiable patient information. Starting with appropriate patient-level real-world data (RWD) or randomized controlled trial (RCT) data sources, sponsors can use an AI-supported pipeline to generate a synthetic dataset, then assess its similarity to the original data to gauge success.
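A first-pass fidelity check of the kind described, comparing simple statistics between the real and synthetic cohorts, might look like this. It is a minimal sketch with hypothetical variable names; real pipelines also compare correlation structure and use ML-based discriminators.

```python
import statistics

def similarity_report(real, synthetic, variables):
    """Compare per-variable mean and standard deviation between a real
    and a synthetic cohort (each a list of {variable: value} records)."""
    report = {}
    for var in variables:
        r = [row[var] for row in real]
        s = [row[var] for row in synthetic]
        report[var] = {
            "mean_diff": abs(statistics.mean(r) - statistics.mean(s)),
            "sd_diff": abs(statistics.stdev(r) - statistics.stdev(s)),
        }
    return report
```

Small differences across key variables build confidence that the synthetic cohort is a usable surrogate; large ones flag a regeneration step.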

Synthetic data can:

  • Inform early go/no-go decisions: A cost-effective approach to optimizing asset strategy before large investments by simulating expected outcomes under various scenarios in Phase I–II.
  • Inform CT design: Model alternative controls and sample sizes and stress-test treatment effects in a cost-effective manner.
  • Build privacy-preserving, cost-effective ECAs: Build an ECA partially (combined with RWD) or entirely from a fully de-identified synthetic cohort. This is not yet accepted for regulatory purposes, but it can inform provider and payer decisions.

RWD has its own limitations: it can be costly and time-consuming to collect, its use raises privacy and, in some settings, ethical concerns, and it must still closely resemble the target patient population. Synthetic data can help overcome these challenges.

 

Strengthen regulatory submission with an external control arm

External control arms use data from historical RCTs or RWD when randomization is not feasible or ethical, or to power or accelerate a study where there is high unmet need.

ECAs can:

  • Strengthen single-arm trials (SAT): Provide contextual information for SAT regulatory submissions, increasing probability of success.
  • Accelerate access to needed therapies: For RCTs in areas of high unmet need (e.g., the accelerated approval pathway) and/or with slow recruitment, RWD can augment the control arm.
  • Support a lifecycle management approach: Supports label expansions to new populations (e.g., to male breast cancer) or new lines of therapy for decisions by regulators, payers, and providers.

While RCTs are considered the “gold standard,” the FDA in 2023 wrote that “externally controlled studies may be considered” (with strong justification), while in 2025, the EMA guidance stated “in some situations, causal conclusions may be derived from a setting where the investigational medicinal product data was collected under a clinical trial protocol while the control arm was not a randomized arm in that same protocol.”

 

Assess long-term outcomes with long-term extension studies

Decentralized long-term extensions of RCTs assess long-term outcomes (safety and effectiveness), with or without continued drug provision. The extension enables follow-up of tokenized trial patients via real-world databases or direct-to-patient data collection.

Long‑term extension studies can:

  • Allow for long-term follow-up: Cost-effective data collection by reducing site and patient burden while collecting key safety and effectiveness endpoints over 10+ years.
  • Enable earlier launch: For breakthrough therapies and high unmet need, launch can occur as soon as clinical efficacy is proven if the sponsor commits to a Phase IV study to collect long-term data.
  • Improve representativeness: Loss to follow-up in long-term studies can bias results, and RCTs often under-represent certain populations. The shift to real-world endpoints makes the insights more relevant to decision-makers.

 

Key takeaways

Consider RWE a strategic asset: Integrate RWE early, anticipate post-marketing collection of long-term data, and adopt causal inference methods to support robust conclusions about safety and effectiveness.

Invest in robust RWD: Invest in RWD quality and governance to ensure credibility with regulators and payers.

Adopt a comprehensive strategy: Adopt flexible, hybrid evidence strategies that combine synthetic data, ECAs, and long-term real-world data collection approaches.

Ensure cross-functional readiness: Medical, regulatory, biostatistics, and data science teams must operate as one evidence engine.

The Delta Dossier: Why Germany Needs More Than a Reference-Based Approach

With the first Joint Clinical Assessments (JCAs) at the European level, pharmaceutical companies are by no means entering a phase of reduced national HTA requirements. Germany, in particular, is already showing that the so-called delta dossier — an informal term used in the German market access environment for the national content required in addition to the European JCA dossier — is not simply a shorter AMNOG dossier containing references to the European JCA dossier.

Instead, it is becoming the test of whether clinical trial evidence, European and German HTA requirements, and tight procedural timelines can be brought together at an early stage.

There is still only limited practical experience with real delta dossiers. All the more important, then, are the signals coming from Germany’s Federal Joint Committee (Gemeinsamer Bundesausschuss, G-BA), the country’s highest decision-making body in joint self-government and a central institution in the national HTA framework. Its spring 2025 events already made clear where the key requirements are likely to emerge and which questions pharmaceutical companies should be addressing now. The G-BA itself views the planned adjustments in the national setting, including the adaptation of the AMNOG dossier module templates, as a first step and intends to assess further developments on the basis of the first practical experience.

Here, we share five theses on the delta dossier.

 

Thesis 1: The EU JCA will not replace the German benefit assessment

A central point is often underestimated in the current debate: the JCA does not replace Germany’s early benefit assessment. The G-BA makes it clear that alignment with the European assessment does not change the assessment standards applied in the German benefit assessment. Decisions will continue to be taken at the national level. The JCA dossier is intended to inform national decision-making, but it does not itself provide a conclusion on additional clinical benefit compared with the national appropriate comparator therapy (zVT) — the foundation for the subsequent price negotiation.

This also clarifies the role of the delta dossier: the objective is not simply to pass through European content in a formal way, but to prepare it in a manner that is robust and usable for the German procedure.

 

Thesis 2: The delta dossier is about translation, not cross-referencing

The G-BA describes very specifically how the JCA dossier is to be used. References are possible, but only to clearly identified sections. General or dynamic references are not sufficient. At the same time, it remains the responsibility of the pharmaceutical company to determine whether the contents of the JCA dossier are sufficient for the German benefit assessment or whether updated or supplementary evidence is required. There will be no separate dossier template. The structure of the AMNOG modules will remain in place.

This is precisely where the quality of a good delta dossier becomes visible: it is the national translation of the European assessment process and brings the JCA dossier and the AMNOG dossier together. This is achieved not through references alone, but above all through the targeted selection of content that is truly robust and the addition of missing data needed for an evidence-based national assessment.

One point is particularly important here: the G-BA makes it clear that a full national AMNOG dossier may still be submitted. There is therefore no obligation to use the delta dossier as a lean referencing solution. What remains decisive is not the format, but the quality of the national dossier preparation.

 

Thesis 3: The real work starts well before the delta dossier

The determination of the relevant PICOs (PICO scoping) for the JCA already begins when the marketing authorization application is submitted to the EMA, and therefore well before the start of the national AMNOG procedure. The PICOs fed back by Germany are intended to reflect the relevant research questions for the later AMNOG procedure, but — just like, for example, the outcome of an early G-BA consultation on the appropriate comparator therapy — they are not legally binding. This creates a risk scenario, particularly for the national procedure, that must be anticipated and taken into account in strategic planning. Any company that only starts to structure populations, comparator therapies, endpoints, and potential subgroups when preparing the national dossier is already too late.

European scientific consultation on PICO scoping also takes place at a point when studies are still being planned. National consultations remain possible, but parallel duplicate structures are to be avoided. For manufacturers, this means that the real strategic work does not begin with the delta dossier, but with PICO scoping, study design, and early evidence planning.

 

Thesis 4: The biggest risks sit in comparator selection and endpoints

Translation into the German setting already becomes particularly demanding at the scoping stage. The first key question is which PICO, or which set of PICOs, actually reflects the requirements of the German benefit assessment. This determines which comparator therapy is relevant for Germany and whether the evidence addressed in the JCA will in fact support the national assessment. This is precisely where preparation for a strong delta dossier begins: with the early identification of the PICOs relevant for Germany, the selection of robust content, and the supplementation of evidence wherever European materials are not sufficient for the national assessment.

In addition, European JCA scoping may include endpoints that are not necessarily recognized as patient-relevant in the national procedure. The G-BA explicitly distinguishes between endpoints included at the European level and the criteria for patient relevance that apply in the German AMNOG procedure. The same applies to analytical methods: national requirements — such as the 15% relevance threshold for responder analyses — remain in place.

For this reason, the delta dossier is particularly demanding from a scientific and methodological perspective wherever European evidence must be made robust for German comparator therapies and nationally relevant endpoints.

 

Thesis 5: Timing and evidence updates will be decisive

In addition to scientific issues, procedural management is becoming more important. The G-BA continues to require that the underlying systematic literature review on relevant clinical evidence must not be more than three months old at the start of the procedure. Additional data cuts and newly completed studies may therefore become relevant in the AMNOG procedure even if they were not addressed in the JCA dossier. This means that the dataset underlying the national AMNOG dossier may differ from the dataset underlying the JCA dossier.

The timing of the publication of the JCA report is also particularly important. If it is available in time, it will be taken into account in the benefit assessment. If it becomes available later, it may still be considered during the written comments procedure or, at the latest, in the final resolution. However, if it is published only after the start of the written comments procedure, it can no longer formally be taken into account. At the same time, the G-BA points out that there is as yet no reliable practical experience in this regard — another source of uncertainty for pharmaceutical companies.

 

From JCA to delta dossier: Cytel combines global perspective with local execution

Cytel occupies the critical interface between European clinical assessment and national benefit assessment in Germany. Together with the German team at co.value, a Cytel brand, Cytel combines experience in PICO scoping, JCA dossier development, and statistical evidence generation with in-depth local AMNOG expertise. This means support does not begin only at the point of translating into the delta dossier, but much earlier: in evidence planning, the selection of robust comparator therapies, and the targeted shaping of European evidence for reliable use in the German AMNOG procedure.

 

The delta dossier as the true test

The first delta dossiers are only now beginning to emerge. But the substantive guardrails are already clearly visible, and they point in a clear direction: within the framework of European clinical assessment, the German AMNOG procedure will not become a process that can be handled through references alone.

What will matter instead is how early clinical trial evidence, European and German HTA requirements, and tight procedural timelines are brought together. The delta dossier is therefore not merely a new format. It is the clearest expression of whether this translation work has been accomplished in time.

Real-World Data Strategies and Challenges: Making Data Work for Your External Control Arm Study

External control arms (ECAs) are gaining popularity in comparative effectiveness studies, driven by a growing emphasis on robust evidence across disease areas and regulatory body acceptance. ECAs can provide a control group for single-arm studies, complement a larger portfolio of evidence, and enable research for rare or genetic conditions for which randomized controlled trials may be unethical or infeasible.

At the same time, real-world data (RWD) is becoming an essential foundation for building credible ECAs. RWD offers unique advantages: it reflects real clinical practice, captures diverse patient populations, and can provide data for robust treatment effects.

However, integrating data from multiple sources, such as historical trials, concurrent trials, patient registries, and cross-population datasets, requires careful methodological planning to ensure validity and regulatory acceptance.

To fully harness the value of external control arms, sponsors must ensure selected data is fit-for-purpose, index dates are aligned with trial eligibility, and rigorous statistical methods are applied to ensure comparable patient profiles. Here, we outline these three essential elements.

 

Choosing the right data source for your external control arm

When building ECAs, different types of external data sources have different strengths.

 

Historical or concurrent randomized trials

Historical or concurrent randomized trials contain systematically collected data and well-defined endpoints, following a detailed protocol. However, they often have small sample sizes, and evolving standards of care or diagnostic criteria can limit comparability over time.

 

Electronic health records and insurance claims

Electronic health records and insurance claims contain large, diverse cohorts and broad population coverage. But they frequently lack clinical details such as out-of-hospital care and non-prescription medications.

 

Patient registries

Patient registries provide systematic, detailed data collection, the potential for linkage, and long-term follow-up. Yet they can have high missingness and over-represent healthier patients, which can reduce overlap in characteristics with trial populations.

 

Selecting the best data sources should be guided by fit-for-purpose assessments. These assessments examine the availability of key prognostic characteristics and the extent of missingness, along with practical considerations such as data access and timelines.
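As a toy sketch of what such a fit-for-purpose assessment might check programmatically (the variable names and the 20% threshold below are hypothetical), one can tabulate the missingness of key prognostic variables in a candidate data source:

```python
# Hypothetical fit-for-purpose screen for a candidate RWD source.
# Variable names and the missingness threshold are illustrative only.
REQUIRED_VARS = ["age", "ecog_status", "line_of_therapy", "biomarker_x"]

def fit_for_purpose_report(records, max_missing=0.20):
    """records: one dict per patient; None marks a missing value.
    Returns per-variable missingness and whether it passes the threshold."""
    n = len(records)
    report = {}
    for var in REQUIRED_VARS:
        missing = sum(1 for r in records if r.get(var) is None) / n
        report[var] = {"missing": round(missing, 2), "usable": missing <= max_missing}
    return report

patients = [
    {"age": 64, "ecog_status": 1, "line_of_therapy": 2, "biomarker_x": None},
    {"age": 71, "ecog_status": None, "line_of_therapy": 1, "biomarker_x": 0.8},
    {"age": 58, "ecog_status": 0, "line_of_therapy": 1, "biomarker_x": 1.2},
    {"age": 66, "ecog_status": 1, "line_of_therapy": None, "biomarker_x": 0.5},
]
report = fit_for_purpose_report(patients)
# "age" is fully observed; "biomarker_x" is 25% missing and flagged unusable.
```

A real assessment would of course go further, comparing coding systems, follow-up duration, and capture of the outcome of interest across sources.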

 

Defining appropriate eligibility criteria and index dates

Carefully establishing index dates is critical yet challenging when incorporating an ECA. In a trial population, the index date is clearly defined as the point at which the patient meets eligibility criteria or is randomized. The same eligibility criteria must be applied to ECA patients using variables available in the external data source, and the index date should reflect the point at which those criteria are met. Misalignment of the index date leads to specific types of selection bias, including immortal time bias. This bias occurs when periods during which an outcome could not have occurred are misclassified, potentially creating a false treatment benefit.
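A minimal sketch of this logic, using hypothetical eligibility criteria, defines each external-control patient's index date as the first date on which all criteria are satisfied:

```python
from datetime import date

def index_date(criteria_met_dates):
    """Index date = first date on which ALL eligibility criteria are met.
    criteria_met_dates maps each criterion to the date it was first
    satisfied (None = never satisfied)."""
    if any(d is None for d in criteria_met_dates.values()):
        return None  # patient never fully eligible; exclude from the ECA
    return max(criteria_met_dates.values())

# A patient diagnosed in January who only meets the biomarker criterion in
# March: starting follow-up at diagnosis instead of March would credit the
# control arm with two "immortal" months during which the outcome could
# not, by construction, have been observed.
idx = index_date({
    "confirmed_diagnosis": date(2023, 1, 10),
    "adequate_organ_function": date(2023, 1, 25),
    "biomarker_positive": date(2023, 3, 2),
})
```

The criterion names are placeholders; in practice, each trial criterion must be mapped to an operational definition in the external data source before any date logic is applied.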

 

Ensuring treatment and control patients are similar

In RCTs, randomization naturally balances prognostic factors between treatment arms. ECAs, by contrast, require explicit identification and adjustment of these variables, and clinical expertise is essential for determining which characteristics matter most. Comparing the distributions of these variables between the treated and control arms helps to assess similarity. Statistical techniques such as propensity score matching and inverse probability of treatment weighting (IPTW) can improve comparability and approximate the balance achieved through randomization. Comparing the pre- and post-adjustment distributions of baseline characteristics quantifies the success of the method.
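The weighting step can be illustrated with a small simulation: a single confounder (age) is imbalanced between arms, a propensity model is fit, and stabilized inverse probability of treatment weights shrink the standardized mean difference toward zero. All data below are simulated purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Simulated confounder: treated (trial-like) patients tend to be younger
# than the external controls. Values are illustrative.
age = rng.normal(60, 10, n)
p_treat = 1 / (1 + np.exp(0.15 * (age - 60)))  # younger -> more likely treated
treated = rng.random(n) < p_treat

def smd(x, t, w=None):
    """Standardized mean difference between treated and control (weighted)."""
    w = np.ones_like(x) if w is None else w
    m1, m0 = np.average(x[t], weights=w[t]), np.average(x[~t], weights=w[~t])
    v1 = np.average((x[t] - m1) ** 2, weights=w[t])
    v0 = np.average((x[~t] - m0) ** 2, weights=w[~t])
    return (m1 - m0) / np.sqrt((v1 + v0) / 2)

# Fit a logistic propensity model P(treated | age) by Newton's method.
X = np.column_stack([np.ones(n), (age - age.mean()) / age.std()])
beta = np.zeros(2)
for _ in range(25):
    p = 1 / (1 + np.exp(-X @ beta))
    hess = (X * (p * (1 - p))[:, None]).T @ X
    beta += np.linalg.solve(hess, X.T @ (treated - p))

ps = 1 / (1 + np.exp(-X @ beta))
# Stabilized inverse probability of treatment weights.
w = np.where(treated, treated.mean() / ps, (1 - treated.mean()) / (1 - ps))

smd_before = smd(age, treated)      # substantial imbalance
smd_after = smd(age, treated, w)    # close to zero after weighting
```

In practice, many prognostic variables enter the propensity model at once, and post-weighting SMDs below roughly 0.1 are a common, though informal, benchmark for acceptable balance.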

 

Final takeaways

Overall, to fully harness the value of external control arms, three elements are essential:

  1. Selecting fit-for-purpose data
  2. Defining index dates that align with trial eligibility
  3. Applying rigorous statistical methods to ensure comparable patient profiles

When executed thoughtfully, ECAs can meaningfully strengthen evidence generation and expand the possibilities for clinical research.

 

Interested in learning more?

Watch our on-demand webinar featuring Deepa Jahagirdar and Vartika Savarna, “Driving Credibility in External Control Arms with Real-World Data,” available now.

Insights from WEPA Amsterdam: When Policy Pressure Meets AI Maturity

The World EPA Congress in Amsterdam did not feel like a conference about isolated trends. It felt like a conference about structural transition.

Across sessions and conversations, one consistent narrative emerged: market access is being reshaped simultaneously by tightening policy frameworks and by the operational maturation of artificial intelligence. These are not parallel stories unfolding independently. They are interacting forces that together are redefining how evidence is generated, how value is assessed, and how global pricing strategies are constructed.

The underlying question throughout WEPA was not whether change is coming. It was whether organizations are structurally prepared to manage both forces at once.

 

1. A policy environment under structural redesign

Joint Clinical Assessment: Harmonization meets operational reality

The first year of Joint Clinical Assessment (JCA) implementation under the EU HTA Regulation represents a historic step toward harmonization of clinical evaluations across Europe. In principle, a single European-level clinical assessment promises efficiency, reduced duplication, and greater consistency in evaluating comparative effectiveness.

Yet the operational reality is more complex. Harmonization does not automatically mean simplification.

Early experience indicates that alignment between EU-level assessments and national reimbursement processes remains incomplete. Questions persist around how Member States will operationalize JCA outputs, how quickly EU HTAR assessors can deliver assessments, and whether national HTA bodies are fully prepared to transition to reliance on joint evaluations.

Methodological challenges are also emerging. PICO multiplicity, expanded evidence requirements, and the risk of unexpected analytical requests are increasing the burden on evidence generation teams, especially for products targeting rare diseases. While duplication of assessments may decrease, the sophistication and coordination required to navigate the system are increasing.

JCA is a milestone in European collaboration. But its success will depend on tighter synchronization between EU-level clinical conclusions and national pricing and reimbursement realities.

 

Real-world evidence: From complementary input to strategic pillar

Alongside JCA, the role of real-world evidence (RWE) is evolving rapidly. Regulators, payers, and clinicians increasingly seek insight into how therapies perform in routine clinical practice across diverse populations. The European Medicines Agency has clearly signaled its ambition to place patient voice and real-world data at the center of regulatory evaluation.

RWE is no longer supplementary. It is becoming central.

However, tension remains within the EU HTAR context. JCA assessments emphasize statistical precision and internal validity, while real-world evidence reflects the inherent heterogeneity of clinical practice. Methodological expectations between regulatory and HTA frameworks are not yet fully synchronized.

Europe now faces a strategic choice: either build robust, interoperable infrastructures for high-quality real-world data sharing across Member States, or risk creating friction between regulatory innovation and HTA conservatism. The credibility of future evidence strategies will depend on resolving this gap.

 

MFN pricing: Global interdependence redefines strategy

At the global level, Most-Favored-Nation (MFN) pricing dynamics are reshaping launch and market access strategies beyond the United States. Pricing has become an interconnected global system rather than a sequence of independent national decisions.

Launch sequencing is being reassessed as companies evaluate exposure to international reference pricing and MFN-linked rules. Markets are increasingly categorized by strategic risk, and cross-market interdependence is intensifying. Decisions taken in one jurisdiction reverberate across others.

Europe, despite its strong regulatory institutions, faces pressure due to fragmented access pathways, evolving JCA processes, and uncertainty in national budget negotiations. The traditional logic of “where to launch first” has become a far more complex strategic equation.

Taken together, JCA implementation, the rise of RWE, and MFN pricing pressures are increasing analytical complexity, accelerating timelines, and demanding greater coordination across functions and geographies. This rising structural pressure forms the backdrop to the second defining theme of WEPA.

 

2. AI moves from experimentation to operating model

From hype to governance

If policy discussions reflected systemic pressure, AI discussions reflected systemic adaptation.

The tone around artificial intelligence at WEPA 2026 was notably mature. The conversation quickly moved beyond questioning whether AI is hype. The focus shifted toward responsible operationalization, governance, and measurable value creation within regulated environments.

The key issue is no longer adoption. It is integration.

Organizations are developing governance frameworks, embedding AI into regulated workflows, and ensuring traceability and auditability of outputs. The emphasis is on scale and accountability rather than isolated experimentation.

 

AI as infrastructure in market access

Across sessions, AI was framed not as a productivity enhancement tool but as part of the operating model of modern market access organizations.

Companies are redesigning processes around AI-enabled capabilities. Evidence synthesis, systematic literature reviews, indirect treatment comparisons, dossier drafting, pricing simulations, and tender strategy development are increasingly supported by automated or semi-automated systems.

This represents a structural shift. AI is moving from peripheral pilot projects to enterprise-level infrastructure embedded within core functions.

In an environment where JCA increases analytical burden and MFN pricing demands multi-country scenario modeling, such capabilities are becoming operationally essential rather than optional.

 

From assistant to strategic copilot

One of the most forward-looking discussions centered on the evolution of AI from drafting assistant to strategic copilot.

The emergence of agentic AI and orchestration systems is enabling decision support in areas such as pricing negotiation, tender simulations, and contracting strategy optimization. Rather than merely accelerating document preparation, AI is beginning to inform strategic decision-making.

However, in highly regulated settings such as HTA and pricing negotiations, transparency and explainability remain non-negotiable. The credibility of AI-driven insights depends on robust governance and clear traceability.

The opportunity is substantial — speed, standardization, and efficiency. The responsibility is equally significant.

 

3. The convergence: Complexity requires capability

The most important insight from WEPA Amsterdam lies not in policy alone, nor in AI alone, but in their convergence.

Policy reforms are increasing complexity. JCA raises expectations for comparative evidence coordination across Europe. Real-world evidence demands stronger data ecosystems. MFN pricing intensifies global interdependence and strategic sensitivity.

At the same time, AI provides the analytical and operational capabilities necessary to manage this complexity. It enables faster synthesis of comparative data, structured analysis of heterogeneous real-world evidence, and dynamic cross-market pricing simulations.

In this sense, policy pressure and AI capability are two sides of the same transformation. The former raises the bar; the latter provides the tools to reach it.

The defining question for market access organizations is whether they can redesign their operating models quickly enough to integrate policy intelligence, evidence generation, pricing foresight, and AI-enabled execution into a coherent system.

WEPA 2026 signaled that the era of treating these dynamics as separate conversations is over. Market access is entering a phase where structural policy reform and technological capability must be managed together.

Those who integrate both dimensions — responsibly, transparently, and strategically — will shape the future of evidence-based access in Europe and beyond.

The Invisibility Machine of the Women’s Health Gap

A 300-year warning

The global timeline for gender equality is not merely stalling; it is a sobering indictment of our collective priorities as a society. Current estimates from the United Nations reveal a staggering distance to parity: at our current trajectory, it will take 300 years to end child marriage, 286 years to eliminate discriminatory laws and legal protection gaps, 140 years to achieve equal representation in workplace leadership, and 47 years to reach an equal footing in national parliaments.

These are not just social milestones; they are structural barriers that define the “Gender Health Gap.” This gap represents the inequitable, systematic differences in health outcomes between women and men — differences rooted in under-researched medical needs, chronic underfunding, and a “medical model” that has historically treated male biology as the universal baseline. To close this divide, we must recognize that health equity is a strategic imperative for global stability, health capital, and economic prosperity.

 

A ledger of health inequality: The data and the reasons behind the gender gap

Sex is a fundamental genetic modifier of biology, influencing everything from disease susceptibility to treatment response. Yet we remain trapped in a “health-survival paradox”: while women generally live longer than men, they endure higher burdens of morbidity and disability throughout their lives. Some examples are:

  • Diagnostic Delays: On average, women are diagnosed nearly four years later than men for the same diseases.
  • Misdiagnosis: Women are twice as likely to die following a heart attack than men, partly because they have a 50% higher chance of receiving an incorrect initial diagnosis.
  • AI Bias: Modern digital tools often entrench these disparities; AI-powered symptom checkers have been found to flag women experiencing heart attacks as needing psychological care rather than emergency medical intervention.
  • Invisible Conditions: Many women-specific conditions are severely underdiagnosed. For example, 8 in 10 women with menopause and 6 in 10 women with endometriosis remain undiagnosed. Adenomyosis affects up to 35% of women but is often invisible in medical records due to misdiagnosis as fibroids.

 

Some of the key reasons for the gender health gap are related to systematic underinvestment in research and innovation funding and the intersection of biology with social factors that historically displaced women’s equal position in society.

A primary driver of the health gap is the systemic neglect of female biology in scientific research:

  • Underfunding: Only 5% of global research and development funding is allocated to female-related research. Of this, a mere 1% goes toward women-specific conditions like menopause and fertility.
  • Clinical Trial Underrepresentation: The inclusion of women in clinical research only became a requirement in the 1990s. Today, women make up only about 41.2% of participants in clinical trials across key disease areas. In cardiovascular drug trials, female participation averages only 34%, often failing to match the actual disease prevalence in the population.
  • Adverse Drug Reactions: Because many drugs are tested primarily on men, women have a 34% increased risk of severe adverse events. A notable example is the sleep aid Zolpidem, which stays in women’s systems longer than men’s; it took until 2013 for the FDA to require reduced dosing for women after decades of increased emergency room visits.

 

The gap is shaped not only by the complex interplay of biological sex (genetics, hormones) and social gender (norms, roles), but also by societal roadblocks such as the lack of female representation in leadership positions, which directly shapes inequalities in health policy development, not only for women but for all marginalized communities.

 

Fact vs. fiction: Debunking women’s health misconceptions

Effective strategy requires dismantling the myths that have long perpetuated gender health inequality.

  • Women’s health is not synonymous with OB/GYN: Progress has been hindered by the misconception that women’s health is limited to reproductive and sexual needs. In reality, the gap spans every disease area, including neurology, immunology, and cardiovascular health, where women present with unique symptoms and risk profiles.
  • Longevity does not equal better health: The “morbidity burden” is a critical indicator of inequity. Women spend more years in poor health, facing higher disability-adjusted life year rates for musculoskeletal, neurological, and mental health disorders.
  • Inequality is not solely about race, but intersectionality is critical: While gender is a standalone driver of health outcomes, it does not exist in a vacuum. For example, Black and Native American women face the highest rates of pregnancy-related mortality, and Black women are three times more likely to die from heart failure than White women. These data points illustrate why an intersectional lens is non-negotiable for any health equity strategist.

Progress has remained largely stagnant over the last decade because women remain “invisible” in methodological and decision-making frameworks. The ICH Guidance on Technical Requirements for Pharmaceuticals for Human Use still refers to women as a “special subgroup” to be considered “when appropriate.” This classification is mathematically and medically absurd: women represent half of the global population. This invisibility fuels a self-perpetuating cycle of data poverty. The recent FDA guidance on addressing sex differences in clinical trials is, however, a positive step toward recognizing this impact in clinical development.

The roadblocks to reforming health technologies and decision-making frameworks to address women’s health needs are not just scientific; they are structural. They include a lack of political will, the absence of gender indicators for evaluation, and entrenched gender norms and laws that leave women unprotected in health matters and beyond.

 

Conclusion

Health equity does not need to take 300 years, though the glacial structural barriers described above must be addressed for true success to be achieved.

Big data, digital technologies, and advanced analytics provide the means to overcome the challenges to achieving women’s health equity in the coming years. Gender health equity is not an act of morality — it is the foundation of a sustainable, healthy, and economically stable future for all.

Why Biomarkers Are Transforming How We Diagnose and Understand Alzheimer’s Disease

Alzheimer’s disease (AD) is the most common cause of dementia worldwide, accounting for about two thirds of cases. It is a progressive neurodegenerative condition that slowly erodes memory, thinking, and the ability to manage daily life. Globally, tens of millions of people live with Alzheimer’s today — numbers expected to grow sharply as populations age. The personal toll on individuals and families is profound, and the economic burden runs into hundreds of billions annually when considering healthcare costs, social care, and lost productivity.

Early diagnosis remains one of the biggest challenges. Historically, clinicians relied on observing cognitive decline — often when the disease is already advanced and irreversible brain damage has occurred. Emerging tools, particularly biomarkers, are beginning to change this picture.

 

The value of biomarkers in Alzheimer’s disease

Biomarkers are measurable indicators of biological processes, disease activity, or treatment response. In Alzheimer’s disease, they help detect characteristic brain changes — such as beta amyloid plaques and tau protein tangles — long before symptoms appear. These can be identified through imaging, cerebrospinal fluid tests, and, increasingly, blood-based biomarkers.

Research shows that:

  • Biomarkers can detect early pathological changes along the Alzheimer’s disease continuum, even in individuals without symptoms.1
  • Blood-based biomarkers (BBMs) such as plasma amyloid and tau provide a low-cost, accessible alternative to PET scans and lumbar punctures.2
  • Updated diagnostic frameworks now define Alzheimer’s biologically, emphasizing that biomarker-detected pathology is equivalent to diagnosing the disease.3

This shift opens opportunities for earlier intervention, improved trial recruitment, and more personalized care.

 

What recent biomarker research tells us

1. Alzheimer’s is now considered a “biological disease” detectable long before symptoms.

Updated clinical criteria emphasize that identifying Alzheimer’s related proteins — through imaging, cerebrospinal fluid, or blood — is sufficient to diagnose the disease, even in people who feel cognitively normal. This reflects decades of evidence showing pathology accumulates silently before memory loss begins.4

 

2. Blood tests are emerging as game‑changers.

Advances in ultra‑sensitive technologies now allow scientists to detect minute amounts of proteins that leak from the brain into the blood. These tests measure markers like amyloid‑β and phosphorylated tau — proteins central to Alzheimer’s disease. Because they require only a simple blood draw, they enable repeated testing over time, making disease monitoring easier and more accessible.5

 

3. Blood biomarkers may revolutionize primary care detection.

Many people with early cognitive impairment go undiagnosed, especially in community settings. New blood-based biomarker tests can be integrated into routine care to flag individuals at high risk earlier, long before they reach a specialist.6

 

4. Early diagnosis enables better care and more timely treatment.

Because Alzheimer’s pathology starts decades before symptoms, identifying the disease early can help clinicians initiate supportive measures, guide lifestyle interventions, and offer patients and families time to plan. It also allows people to access relevant clinical trials at the most impactful disease stage.7

 

5. Biomarkers deepen scientific understanding of Alzheimer’s progression.

Different biomarkers reveal different aspects of disease — amyloid reflects early accumulation, tau correlates more strongly with neurodegeneration, and neurofilament light chain indicates nerve cell damage. Using multiple tests helps clinicians and researchers understand where a person lies along the disease continuum.8

 

Focusing on Amyloid Beta (Aβ)

I, along with my co-authors, recently published a study that investigated whether changes in amyloid beta (Aβ) can reliably predict whether a treatment will help patients think and function better, i.e., whether Aβ is a surrogate marker. Many new Alzheimer’s drugs reduce Aβ levels, and some have even been approved based on this effect. But the big question remains: Does lowering Aβ actually translate into meaningful clinical benefit?

We collected data from 23 clinical trials of seven different anti-amyloid monoclonal antibody drugs.

These trials reported treatment effects on both:

  • Aβ levels (using brain scans such as PET SUVR or the Centiloid scale), and
  • Clinical outcomes, including:
    • Clinical Dementia Rating – Sum of Boxes (CDR‑SOB)
    • Mini‑Mental State Examination (MMSE)
    • Alzheimer’s Disease Assessment Scale–Cognitive Subscale (ADAS‑Cog)

We used a Bayesian meta-analysis, a statistical method that combines results across many studies to look for broad patterns.

The results showed that lowering amyloid‑beta often — but not always — leads to better clinical outcomes in Alzheimer’s disease. Aβ is a promising surrogate marker at the group level, but it is not reliable enough to predict benefit for individual drugs without additional evidence. Health technology assessment agencies such as NICE and ICER should take this into account when using Aβ to inform decisions about the value of new disease-modifying treatments.
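The trial-level association underlying such a surrogacy analysis can be sketched with a precision-weighted regression of the clinical effect on the amyloid effect. The published study used a Bayesian meta-analytic model; the stand-in below is a simple frequentist approximation, and every number in it is illustrative rather than taken from the paper:

```python
import numpy as np

# Hypothetical trial-level estimates: treatment effect on amyloid (Centiloid
# reduction vs placebo) and on CDR-SOB (negative = less decline), with
# standard errors for the clinical effect. All values are made up.
amyloid_effect = np.array([-85.0, -60.0, -25.0, -70.0, -15.0, -55.0])
cdr_sob_effect = np.array([-0.45, -0.35, -0.05, -0.30, 0.05, -0.20])
se = np.array([0.10, 0.12, 0.15, 0.11, 0.20, 0.14])

# Precision-weighted least squares: regress the clinical effect on the
# amyloid effect across trials, weighting by inverse variance.
w = 1 / se**2
X = np.column_stack([np.ones_like(amyloid_effect), amyloid_effect])
intercept, slope = np.linalg.solve(X.T @ (w[:, None] * X),
                                   X.T @ (w * cdr_sob_effect))
# A positive slope means larger amyloid reductions (more negative effects)
# are associated with larger clinical benefit (more negative CDR-SOB effects).
```

A full surrogacy evaluation would also examine the strength and precision of this association and, as the paper emphasizes, whether it holds reliably for individual drugs rather than only at the group level.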

 

The need for continued research into Alzheimer’s biomarkers

Biomarkers are reshaping the landscape of Alzheimer’s disease, offering hope for earlier, more accurate diagnosis and more tailored therapeutic strategies. But despite these advances, more work is needed.

Continued research is vital to:

  • Improve the accuracy and reliability of blood-based tests
  • Ensure tests are validated across diverse populations
  • Link biomarker changes more precisely to clinical outcomes
  • Support equitable access in primary care and low resource settings

As we enter an era of disease-modifying therapies, biomarkers will be indispensable — guiding diagnosis, monitoring response, and helping patients receive the right treatment at the right time.

The future of Alzheimer’s care will be biomarker driven, and ongoing research is the key to making that future accessible to all.

 

Interested in learning more?

Read “Evaluating amyloid-beta as a surrogate endpoint in trials of anti-amyloid-beta drugs in Alzheimer’s disease: A Bayesian meta-analysis.”

ELEVATE-GenAI: A New Guideline for Reporting Generative AI in HEOR Workflows

Generative artificial intelligence (AI), particularly large language models (LLMs), is increasingly embedded in health economics and outcomes research (HEOR) workflows. Researchers are now using these tools to support activities such as systematic literature reviews, health economic modeling, and real-world evidence generation.

As adoption grows, so does a fundamental question for the HEOR community:

How should the use of generative AI be transparently and consistently reported within HEOR workflows?

To address this question, the International Society for Pharmacoeconomics and Outcomes Research (ISPOR) Working Group on AI has developed ELEVATE-GenAI — a reporting guideline specifically designed to document and communicate how generative AI is used in HEOR research.

 

Why a dedicated reporting guideline is needed

HEOR has a strong tradition of structured reporting, supported by well-established standards for systematic reviews, economic evaluations, and real-world evidence. However, the rapid integration of LLMs into HEOR workflows has outpaced the development of HEOR-specific guidance on how their use should be reported.

LLMs are now being applied to:

  • Screening and classifying abstracts in systematic literature reviews
  • Extracting data and assessing bias
  • Building or replicating health economic models
  • Transforming unstructured real-world data into analyzable formats

While these applications offer efficiency and scalability, they also introduce new challenges related to transparency, reproducibility, factual accuracy, bias, uncertainty, and data governance. Existing AI reporting guidelines do not fully address these challenges in the context of HEOR decision-making, regulatory review, or health technology assessment (HTA).

ELEVATE-GenAI was developed to fill this gap by providing clear, HEOR-specific guidance for reporting the use of generative AI within research workflows.

 

What is ELEVATE-GenAI?

ELEVATE-GenAI is a reporting framework and checklist intended for HEOR studies in which generative AI plays a substantive role in evidence generation, synthesis, or analysis. Its goal is not to evaluate the performance of specific AI tools or to prescribe how AI should be used, but rather to ensure that AI-assisted workflows are clearly described, interpretable, and reproducible.

The guideline is designed to support:

  • Authors, by clarifying what information should be reported
  • Reviewers and editors, by enabling consistent evaluation
  • HTA bodies and regulators, by improving transparency and trust

Importantly, ELEVATE-GenAI is not intended for studies that use AI only for minor tasks such as editing or formatting text. Instead, it applies when generative AI meaningfully influences HEOR outputs.

 

Reporting generative AI across HEOR workflows: The 10 ELEVATE domains

At the center of ELEVATE-GenAI is a set of 10 reporting domains that together describe how generative AI is integrated into HEOR workflows and how its outputs are assessed.

 

1. Model characteristics

This domain ensures clarity about what AI system was used. Authors are encouraged to report the model name and version, developer, access method, license type, architecture, and — where available — training and fine-tuning data sources.

 

2. Accuracy assessment

Accuracy reporting focuses on how closely AI-generated outputs align with expected or correct results, using task-appropriate benchmarks such as expert review, gold-standard datasets, or quantitative performance measures.

 

3. Comprehensiveness assessment

Comprehensiveness addresses whether AI outputs fully cover all relevant elements of a task — for example, whether all key studies were captured in a literature review or all required components were included in an economic model.

 

4. Factuality verification

This domain emphasizes verification of factual correctness, including identifying and correcting hallucinated citations, incorrect data, or unsupported claims generated by the model.

 

5. Reproducibility and generalizability

Authors are encouraged to document prompts, parameters, workflows, and model versions to support reproducibility, and to discuss whether the AI-assisted approach can be applied to similar HEOR questions or settings.

 

6. Robustness checks

Robustness reporting addresses how sensitive AI outputs are to changes in inputs, such as minor prompt variations, ambiguous wording, or typographical errors.

 

7. Fairness and bias monitoring

Where applicable, studies should assess whether AI outputs introduce or reinforce biases related to demographic or population characteristics relevant to HEOR analyses.

 

8. Deployment context and efficiency

This domain captures practical aspects of AI deployment, including hardware and software configurations, processing time, scalability, and resource requirements — factors that influence real-world feasibility.

 

9. Calibration and uncertainty

Calibration focuses on whether AI confidence aligns with actual performance and how uncertainty is handled, such as defining thresholds for human review in hybrid AI–human workflows.
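One common pattern in such hybrid workflows, sketched below with hypothetical field names and an arbitrary threshold, is to route AI outputs below a confidence cutoff to human review:

```python
def route_output(item, threshold=0.85):
    """Hypothetical hybrid AI-human triage: accept high-confidence AI
    outputs automatically, send everything else to a human reviewer.
    The 0.85 threshold is an arbitrary illustration, not a recommendation."""
    return "auto-accept" if item["confidence"] >= threshold else "human-review"

# Example: abstract-screening decisions from a generative AI pipeline.
queue = [
    {"id": "abs-001", "label": "include", "confidence": 0.97},
    {"id": "abs-002", "label": "exclude", "confidence": 0.62},
]
decisions = {item["id"]: route_output(item) for item in queue}
```

Under the ELEVATE-GenAI calibration domain, authors would report how such a threshold was chosen and whether the model's confidence scores were actually well calibrated against observed accuracy.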

 

10. Security and privacy measures

Authors should describe how sensitive data, intellectual property, and regulatory requirements (e.g., GDPR or HIPAA) are addressed when generative AI is used in HEOR workflows.

 

Each domain is accompanied by reporting guidance and an assessment of metric maturity, recognizing that some areas — such as fairness and uncertainty — are still evolving.

 

From framework to practice: The ELEVATE checklist

To facilitate adoption, ELEVATE-GenAI includes a practical checklist that translates the 10 domains into concrete reporting questions. An optional scoring system allows authors and reviewers to summarize reporting completeness, while emphasizing that this score is not a measure of methodological quality or study validity.
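As an illustration only (the paper defines its own optional scoring system), a completeness score over the 10 domains might be computed like this:

```python
# Hypothetical completeness score over the 10 ELEVATE-GenAI domains.
# Domain names paraphrase the guideline; the scoring scheme is a sketch,
# not the paper's official system.
DOMAINS = [
    "model_characteristics", "accuracy", "comprehensiveness", "factuality",
    "reproducibility", "robustness", "fairness_bias", "deployment_efficiency",
    "calibration_uncertainty", "security_privacy",
]

def completeness(reported):
    """Fraction of domains with at least some reporting.
    This summarizes reporting completeness only, not study quality."""
    return sum(1 for d in DOMAINS if reported.get(d)) / len(DOMAINS)

study = {d: True for d in DOMAINS}
study["fairness_bias"] = False   # bias monitoring not reported
study["robustness"] = False      # no prompt-sensitivity checks reported
score = completeness(study)      # 8 of 10 domains reported -> 0.8
```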

The authors demonstrate the applicability of the guideline by retrospectively applying it to two published HEOR studies — one focused on systematic literature review automation and another on health economic modeling. These examples show how ELEVATE-GenAI can be used to consistently describe AI-assisted workflows across different HEOR applications and to identify areas where reporting can be strengthened.

 

Why ELEVATE-GenAI matters for HEOR

As generative AI becomes more deeply integrated into HEOR workflows, transparent reporting is essential to maintain scientific credibility and stakeholder trust. ELEVATE-GenAI provides a shared structure for documenting how AI is used, how outputs are evaluated, and what limitations may affect interpretation.

By establishing common expectations for reporting generative AI in HEOR, ELEVATE-GenAI supports responsible innovation while aligning with the needs of journals, HTA bodies, and regulators.

 

Final takeaways

ELEVATE-GenAI positions itself as a foundational guideline for reporting the use of generative AI in HEOR workflows. By focusing on transparency, reproducibility, and interpretability, it helps ensure that AI-augmented research can be critically assessed and confidently used in healthcare decision-making.

As a living guideline, ELEVATE-GenAI will continue to evolve alongside advances in generative AI — providing the HEOR community with a practical framework for integrating new technologies without compromising rigor or trust.

 

Interested in learning more?

Read the full paper: “ELEVATE-GenAI: Reporting Guidelines for the Use of Large Language Models in Health Economics and Outcomes Research: An ISPOR Working Group Report.”

External Control Arms in Drug Development: Methodological and Regulatory Considerations

Drug development is growing more complex, with compressed timelines and increasingly high expectations from regulators, payers, and health systems. In this setting, external control arms (ECAs) leveraging real‑world data (RWD) are emerging as a pragmatic approach to support clinical development and downstream commercial decision‑making.

Randomized controlled trials (RCTs) remain the gold standard for evidence generation. However, in many modern development programs, traditional randomized designs are not feasible or may raise ethical concerns. Sponsors increasingly encounter situations in which:

  • Patient recruitment is slow, limited, or not achievable
  • Randomization is ethically challenging
  • Development costs escalate rapidly
  • Competitive dynamics demand accelerated evidence generation
  • Patient populations are small or rapidly progressing
  • There is a high unmet medical need

 

These challenges are particularly acute in oncology, rare diseases, post‑approval expansion studies, and advanced or cell‑based therapies.

 

What is an external control arm?

An external control arm replaces or supplements a traditional control group by leveraging data from patients treated outside the clinical trial. These patients are drawn from routine clinical practice and reflect outcomes under standard‑of‑care treatment in real‑world settings.

External controls are typically constructed using real‑world data sources such as:

  • Electronic health records (EHRs)
  • Administrative and insurance claims
  • Disease and treatment registries

Unlike trial data, real‑world data reflect patterns of diagnosis, treatment, and follow‑up in everyday clinical care. The foundation of a well‑designed external control study is the use of fit‑for‑purpose data that are sufficiently complete, clinically relevant, and reliable to support robust and defensible analyses.

 

Strategic value of external control arms

When thoughtfully designed and appropriately governed, ECAs can provide meaningful strategic benefits, including:

  • Shortened development timelines
  • Improved feasibility of clinical studies
  • Evidence generation in small or rare populations
  • Stronger value narratives for payers and health technology assessment bodies
  • Support for lifecycle management and label expansion strategies

 

Methodological considerations and risks to manage

The credibility and acceptability of an external control arm depend heavily on methodological rigor.

Key considerations include the following:

1. Study design

External control studies should be designed to closely mirror the clinical trial, including:

  • Alignment of inclusion and exclusion criteria
  • Clear definition of index date and baseline
  • Comparable follow‑up periods and outcome assessment windows
  • Consistent treatment context and line of therapy

Pre-specification of the estimand and statistical analysis plan is critical to avoid post‑hoc decision‑making.

 

2. Patient selection and alignment

Ensuring comparability between trial participants and real‑world patients is one of the most critical aspects of ECA design. Sponsors should:

  • Use transparent, reproducible cohort selection algorithms
  • Apply consistent definitions for key demographic and clinical variables
  • Assess overlap and positivity between trial and external populations
  • Explicitly evaluate differences in baseline characteristics

Sensitivity analyses should be conducted to quantify the impact of residual differences where appropriate.
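One common way to "explicitly evaluate differences in baseline characteristics" is the standardized mean difference (SMD), where absolute values above roughly 0.1 are often flagged as meaningful imbalance. The sketch below is illustrative only; the baseline ages are hypothetical, and real analyses would compute SMDs across all key covariates, typically with established statistical packages.

```python
from statistics import mean, variance
from math import sqrt

def standardized_mean_difference(trial, external):
    """Standardized mean difference (SMD) for one baseline covariate.
    Absolute values above ~0.1 are commonly flagged as meaningful
    imbalance between trial and external-control cohorts."""
    pooled_sd = sqrt((variance(trial) + variance(external)) / 2)
    return (mean(trial) - mean(external)) / pooled_sd

# Hypothetical baseline ages (years), for illustration only
trial_age = [62.0, 58.0, 65.0, 70.0, 61.0, 59.0]
external_age = [64.0, 66.0, 71.0, 68.0, 63.0, 69.0]

smd = standardized_mean_difference(trial_age, external_age)
```

In this toy example the external cohort is several years older than the trial cohort, producing an SMD well beyond the 0.1 threshold, which would prompt adjustment or a closer look at cohort selection.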

 

3. Handling confounding and bias

Because external control arms lack randomization, confounding must be actively addressed. Common analytical approaches include:

  • Propensity score methods (matching, weighting, stratification)
  • Multivariable outcome regression
  • Doubly robust methods that combine weighting and modeling

Method selection should be driven by study objectives, data characteristics, sample size, and variable completeness, not by analytical convenience.
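To make the propensity score weighting idea concrete, the sketch below applies ATT-style weights, in which trial patients keep weight 1 and external-control patients are weighted by their odds of trial participation. All data are hypothetical, and the propensity scores are assumed to come from a previously fitted model (e.g., logistic regression); this is a minimal illustration, not a production analysis.

```python
def att_weights(in_trial, propensity):
    """ATT-style weights: trial patients keep weight 1; external-control
    patients are weighted by the odds of trial participation, ps / (1 - ps)."""
    return [1.0 if t == 1 else ps / (1.0 - ps)
            for t, ps in zip(in_trial, propensity)]

def weighted_group_mean(outcome, in_trial, weights, group):
    """Weighted mean outcome within one group (1 = trial, 0 = external)."""
    num = sum(y * w for y, t, w in zip(outcome, in_trial, weights) if t == group)
    den = sum(w for t, w in zip(in_trial, weights) if t == group)
    return num / den

# Hypothetical toy data: 1 = trial arm, 0 = external control;
# propensity scores assumed to come from a prior model
in_trial = [1, 1, 1, 0, 0, 0]
propensity = [0.6, 0.7, 0.5, 0.4, 0.3, 0.5]
outcome = [1.0, 0.8, 0.9, 0.5, 0.4, 0.6]

weights = att_weights(in_trial, propensity)
effect = (weighted_group_mean(outcome, in_trial, weights, 1)
          - weighted_group_mean(outcome, in_trial, weights, 0))
```

The weighted difference in mean outcomes is the estimated treatment effect after reweighting the external controls to resemble the trial population on measured covariates; unmeasured confounding, of course, remains unaddressed by any weighting scheme.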

 

4. Data quality and missingness

Real‑world data are inherently heterogeneous and incomplete. Methodological plans should address:

  • Data provenance, completeness, and validation
  • Handling of missing or partially observed variables
  • Measurement variability across providers, systems, or data sources
  • Differences in assessment timing and frequency

Imputation strategies and key assumptions should be explicitly documented and tested through sensitivity analyses.
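A simple illustration of why imputation assumptions need to be tested: naive mean imputation preserves the mean of a variable but shrinks its variability, which can make cohorts look more comparable than they are. The biomarker values below are hypothetical, and mean imputation is used only as a deliberately simple baseline; multiple imputation would usually be preferred in practice.

```python
from statistics import mean, stdev

def mean_impute(values):
    """Replace missing (None) entries with the observed mean; a deliberately
    simple strategy used here only to illustrate its side effects."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

# Hypothetical baseline biomarker with partial missingness
biomarker = [4.2, None, 3.8, 5.1, None, 4.5]

observed = [v for v in biomarker if v is not None]
complete_case_sd = stdev(observed)           # complete-case analysis
imputed_sd = stdev(mean_impute(biomarker))   # after mean imputation
```

Comparing the two standard deviations shows the imputed data are artificially less variable than the observed data, exactly the kind of distortion a sensitivity analysis should surface and document.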

 

5. Outcome definition and assessment

Endpoints derived from RWD must be clinically meaningful and aligned as closely as possible with trial definitions. Considerations include:

  • Use of validated real‑world endpoint definitions
  • Clear attribution and timing of outcomes
  • Consistency with regulatory‑recognized measures of clinical benefit
  • Avoidance of surrogate endpoints unless scientifically justified

Outcome misclassification remains a key risk and should be explicitly evaluated.

 

6. Sensitivity and robustness analyses

Regulators expect evidence that findings are robust under alternative assumptions. Analyses may include:

  • Variation in matching or weighting specifications
  • Alternative cohort definitions or look‑back periods
  • Use of negative control outcomes or exposures
  • Quantitative bias analyses where feasible

The objective is to demonstrate that conclusions are not driven by a single design or modeling decision.
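One widely reported quantitative bias analysis is the E-value (VanderWeele and Ding): the minimum strength of association, on the risk-ratio scale, that an unmeasured confounder would need with both treatment and outcome to fully explain away an observed effect. The sketch below is a minimal illustration of the point estimate formula only; a full analysis would also compute the E-value for the confidence interval limit closest to the null.

```python
import math

def e_value(rr):
    """E-value for an observed risk ratio: the minimum association strength
    an unmeasured confounder would need with both treatment and outcome
    to fully explain the observed effect."""
    if rr < 1:              # for protective effects, work with the inverse
        rr = 1.0 / rr
    return rr + math.sqrt(rr * (rr - 1.0))

# An observed risk ratio of 2.0 could only be explained away by a
# confounder associated with both treatment and outcome at RR >= ~3.41
ev = e_value(2.0)
```

A large E-value supports the robustness argument above: the bigger it is, the stronger an unmeasured confounder would have to be before the study's conclusion changes.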

 

7. Transparency and documentation

Methodological transparency is essential for regulatory and payer review. Best practices include:

  • Prespecifying analysis plans and decision rules
  • Fully documenting data sources, algorithms, and assumptions
  • Providing traceability from raw data to final outcomes
  • Enabling reproducibility of key analyses

 

Regulatory outlook and expectations

Regulatory agencies and health technology assessment bodies, including the U.S. Food and Drug Administration (FDA), the European Medicines Agency (EMA), and the Canadian Agency for Drugs and Technologies in Health (CADTH), have recognized the potential role of external control arms under conditions of methodological rigor and transparency.

Regulatory agencies have not lowered evidentiary standards. Rather, they have:

  • Provided greater clarity on scenarios in which external control arms may be acceptable
  • More explicitly articulated methodological expectations
  • Encouraged early and proactive dialogue with sponsors

 

Successful regulatory submissions that incorporate ECAs typically:

  • Provide a clear scientific and ethical rationale for why randomization is not feasible or appropriate
  • Use high‑quality, fit‑for‑purpose real‑world data sources
  • Transparently define patient selection criteria and demonstrate alignment with the trial population
  • Show that findings are robust, reproducible, and minimally biased

Early engagement with regulators remains critical to aligning expectations and maximizing the likelihood of success.

 

Join Anupama Vasudevan and James Matcham on February 3 at 10 a.m. ET for an open office-hours session on “Evidence Generation with External Control Arms”: