Inside the Black Box: Making Sense of the Individualised Treatment Comparator for JCA

The implementation of EU Joint Clinical Assessments (JCA) represents a major shift in how comparative evidence is submitted for new health technologies and scrutinized during local health technology assessment (HTA) across the Member States (MS) in Europe. Central to this new architecture is the precision required in defining comparators.1

Most PICO frameworks assume a clearly defined comparator: one treatment, or a short, fixed list of treatments combined in an “OR” or “AND” scenario, that applies to every patient in the targeted population or a subpopulation thereof (Table 1).1 Clinical reality does not always follow this logic. In some therapeutic areas, especially in rare and very rare indications, no well-defined standard of care (SoC) exists. Instead, the most appropriate treatment depends on individual patient characteristics, such as prior treatment history, disease severity, and comorbidities. When this is the case and the full population cannot be split into a “limited number of meaningful subpopulations” (e.g., when the deciding factors are overlapping, not well-defined, or apply to a small number of patients), the EU JCA PICO scope introduces an additional comparator scenario: a bundle of treatment options, the so-called “individualised treatment comparator.”1

“There might be situations in which a single treatment suitable for all patients in a given population does not exist. … Actual treatments for individual patients are chosen based on patients’ individual characteristics.” 1

However, the definition of “meaningful subpopulations” is still open to interpretation and may vary across the different MS when defining their policy-driven PICO requests, creating a grey zone for decision-making between the OR and the individualised treatment scenario. Moreover, the EU HTA Regulation (HTAR) guidance on the scoping process does not present well-defined rules for the consolidation of PICOs with individualised treatment comparators, introducing additional challenges for JCA readiness in terms of preparatory activities.

“If several individualised treatment comparators are requested by MS, a discussion between assessor, co-assessor and MS should explore the opportunity to adjust the components of the individualised treatment to consolidate the individualised treatment options. However, this process should always consider the needs of all MS. If individualised treatment options differ, different PICO(s) might have to be formulated.” 1

 

Table 1: Comparator scenarios in EU JCA PICOs1

 

 

What evidence do you need to provide to address a PICO with the individualised treatment comparator?

In rare diseases, off-label treatments may need inclusion due to limited options, whereas in areas with many therapies, licensed and guideline-recommended treatments may suffice. Overall, the comparator set (bundle of treatment options) may include2:

  • Licensed treatments for the indication
  • Guideline-recommended options
  • Commonly used treatments within the healthcare system
  • Evidence-supported treatments from RCTs or registries

A key challenge is that treatment landscapes may vary across MS, so the defined comparator set can differ depending on which local perspective “dominates” PICO scoping discussions, often without the health technology developer (HTD) being aware.

“In the interests of achieving the lowest possible number of PICOs, while still addressing MS needs, individual MS may consider whether a broader/narrower range of treatment options within the individualised treatment comparator specification would be acceptable.” 2

 

What if the HTD’s trial covers only some of the bundle treatment options?

This is where the complexity emerges. The HTA Coordination Group’s (CG) Q&A (May 2026) addresses exactly this issue, and its conclusion is deliberately nuanced. If the HTD’s trial captures only a subset of treatments within the individualised treatment comparator, this alone does not invalidate the use of such evidence, but it does materially shape how that evidence can be used in the EU JCA submission.

The key determinant is the clinical relevance of the trial’s comparator arm in the EU MS. Where the subset of treatments included in the HTD’s SoC reflects the most meaningful treatment options in EU practice, direct trial evidence may be considered. However, the JCA submission will correspondingly be considered narrower in scope, and the JCA assessors will explicitly acknowledge what is missing.

The implications of the JCA critique vary depending on the extent of the trial’s comparator arm coverage:

  • Broad but incomplete coverage: If the SoC arm includes most options — particularly those most used — the JCA assessors are likely to accept these analyses, albeit with clearly stated limitations.
  • Limited and unrepresentative coverage: If only a small number of options are included, and these do not reflect the predominant treatments across EU MS, the assessment risks losing clinical relevance for the defined PICO.
  • Single-option comparator: If the SoC reduces to just one treatment, the analysis effectively shifts away from the target population and instead describes a de facto subpopulation. In such cases, the results may no longer meaningfully address the full PICO.

In all cases, the JCA report does not resolve these drawbacks; it documents them. The ultimate judgement on whether such evidence is fit for purpose rests with national HTA bodies, which will determine its acceptability for local decision-making.

 

Where does the indirect treatment comparison (ITC) fit?

The ITC serves as a methodological “reserve” when the HTD trial comparator does not meet the individualised treatment comparator requirement. As mentioned above, this is common in heterogeneous diseases, rare conditions with limited therapies, or heavily pre-treated populations lacking a clear standard of care. It also arises in global trials designed to satisfy multiple market requirements, where no single SoC is defined.

Strategically, the ITC is a “moving target,” shaped by clinical guidelines and available evidence. This creates challenges for HTDs, who must compare against a fragmented and variable treatment landscape, increasing analytical complexity and the need for robust justification. A hierarchical approach is recommended, prioritizing relevant trial comparators and, where needed, constructing external control arms from registries or real-world data.

 

How do the ITC results fit in the JCA Dossier?

When the assessment scope mandates an ITC, HTDs must align their evidence with clinical reality while maintaining strict adherence to the PICO requirements.3,4 It is important that HTDs follow these strategic directives:

  • PICO alignment and informative value: HTDs must aim to meet PICO requirements as closely as possible based on the evidence availability and comparability with the technology’s trial base. Where data deviations occur (e.g., mismatched subgroup definitions), the developer must provide the closest possible analysis and explicitly justify why the data provided remains informative given the assessment scope.
  • Justification for deviations or non-response: Any departure from the requested scope must be supported by robust literature searches. This is not optional; a lack of literature-backed justification can jeopardize the confirmation of the dossier. Evidence from literature searches may also build up robust justifications.
  • What JCA assessors will check: Which treatment options were available in the comparator arm? How was treatment selection made for individual patients? Does the study protocol give investigators discretion to select based on patient characteristics? Were reasons for treatment choices documented?

 

Conclusion and final takeaways

Successfully navigating the individualised comparator situation in the EU JCAs requires more than just clinical data; it requires a sophisticated strategy that anticipates MS needs and addresses evidence gaps before they become “incompleteness” issues in the dossier.

 

References

1. Guidance on the Scoping Process v1.0, HTA Coordination Group, adopted on 28 November 2024.

2. Questions and Answers on general methodological and procedural issues for joint clinical assessments, HTA Coordination Group, 18 May 2026.

3. Methodological Guideline for Quantitative Evidence Synthesis: Direct and Indirect Comparisons, HTA Coordination Group, 8 March 2024.

4. Practical Guideline for Quantitative Evidence Synthesis: Direct and Indirect Comparisons, HTA Coordination Group, 8 March 2024.

 

Navigating the individualised treatment comparator1,2

Living Evidence and the Rise of AI-Enabled HEOR Infrastructure: Insights from ISPOR Philadelphia

The ISPOR US 2026 conference in Philadelphia drew together colleagues and industry partners across evidence, value, and access. Across the presentations and sessions, a major theme emerged: we are an industry moving rapidly from AI experimentation toward AI-enabled infrastructure. Here we share some of the key takeaways.

 

AI becomes core infrastructure

The strongest signal from ISPOR Philadelphia was that AI is no longer viewed as a side tool for productivity gains. Across HEOR, HTA, and RWE, organizations are beginning to embed AI directly into evidence generation and submission workflows. Discussions focused less on experimentation and more on operationalization, governance, and scalability.

AI is now being explored across the full evidence lifecycle, including systematic literature reviews, economic modeling, patient-reported outcomes, HTA submissions, payer communication, and regulatory documentation. The industry appears to be shifting toward continuously learning evidence systems rather than static, project-based workflows.

 

Agentic AI moves beyond simple automation

One of the biggest themes was the emergence of agentic AI systems. Instead of using isolated prompts, organizations are experimenting with coordinated AI agents that can generate models, review outputs, create documentation, and prepare evidence packages.

Several workshops demonstrated how AI can move from model concept to full implementation in both R and Excel while maintaining human oversight. The emphasis throughout was not full autonomy, but “human-at-the-helm” governance where AI accelerates and supports execution while experts retain accountability.

This reflects a broader transition from AI-assisted work toward AI-orchestrated workflows.

 

AI-supported SLRs reach a turning point

AI-assisted systematic literature reviews (SLRs) dominated the conference agenda. However, the conversation has evolved significantly from earlier discussions focused mainly on efficiency gains.

The field is now grappling with questions around reproducibility, transparency, benchmarking, and governance. Multiple sessions highlighted the lack of shared standards for evaluating AI-SLR performance and proposed industry-wide benchmarking frameworks and validation challenges.

ISPOR itself is increasingly positioning itself as a central body for developing good-practice guidance and methodological standards for AI-enabled evidence synthesis, with the anticipated publication of the GenAI in SLR taskforce report.

 

Regulatory readiness becomes critical

Another major theme was regulatory credibility. Panels focused heavily on FDA, EMA, NICE, and Health Canada guidance regarding AI-assisted evidence generation and real-world data curation.

The industry discussion has shifted from asking whether regulators will engage with AI-generated evidence to determining what documentation, validation, and governance standards will be required for acceptance.

Speakers repeatedly emphasized auditability, traceability, reproducibility, and version control as foundational requirements for regulatory-grade AI workflows.

 

Real-world data and AI converge

Many sessions positioned AI as the enabling layer needed to unlock the value of modern real-world data. Much of healthcare’s most clinically meaningful information remains trapped in unstructured formats such as clinician notes, pathology reports, and medical charts.

AI methods including NLP and machine learning are increasingly being used to transform this information into structured, research-ready evidence. This was especially prominent in sessions involving medical devices, exploratory evidence planning, and dynamic evidence generation strategies.

AI is increasingly being viewed not simply as an analytics tool, but as foundational infrastructure for modern RWE generation.

 

Patient voice gains new attention

Several workshops explored how large language models and conversational AI can support patient-centered research. These applications included free-text analysis, conversational patient interviews, social media analysis, and narrative symptom capture.

Interest in applying AI to qualitative research represents an important expansion beyond traditional structured analytics. Researchers are now exploring whether AI can preserve the nuance of lived patient experience while enabling scalability.

At the same time, concerns around hallucination risk, construct validity, and bias remain central to these discussions.

 

HEOR leadership roles are evolving

As AI automates more technical tasks, the role of HEOR and RWE leaders appears to be changing. Multiple sessions suggested that future leadership value will increasingly center on governance, strategic interpretation, stakeholder trust, and organizational coordination.

Rather than replacing experts, AI may elevate the importance of human judgment and scientific oversight. Organizations will need leaders who can balance innovation with credibility in payer and regulatory environments.

This suggests AI adoption is not simply a technology challenge, but an organizational transformation challenge.

 

Responsible AI emerges as the central principle

Across nearly every session, the same themes repeatedly appeared: transparency, reproducibility, validation, governance, and human oversight.

The HEOR community appears to be converging around a shared understanding that AI adoption will only succeed if scientific credibility and integrity remain intact. The conversation is no longer about replacing traditional rigor, but about scaling evidence generation responsibly.

 

ISPOR Philadelphia ultimately showed an industry moving rapidly from AI experimentation toward AI-enabled infrastructure. The next phase of HEOR will likely be defined by organizations that can operationalize AI while maintaining trust, methodological rigor, and decision relevance.

A New Frontier in Real-World Evidence: Can AI Create Reliable Synthetic Trial Data?

Synthetic data is a promising innovation for clinical studies that incorporate an external control arm. Relying solely on traditional control groups can be costly, time-consuming, or even unethical. Instead, researchers are exploring ways to generate “synthetic” patient cohorts that behave like real ones.

At ISPOR US 2026 in Philadelphia, we will be presenting a pilot study that takes an important step in this direction. Our work explores how large language models (LLMs), the same technology behind modern AI assistants, can be used to generate synthetic clinical trial datasets suitable for external control arms (ECAs).

 

Why external control arms matter

ECAs are increasingly important in clinical trials, especially in areas where recruiting patients into placebo groups is difficult or undesirable. By using existing data to simulate a control group, researchers can accelerate trials and reduce patient burden. This challenge is especially pronounced in rare diseases, oncology, gene and cell therapies, and severe or life‑threatening conditions, where patients and clinicians are understandably reluctant to accept randomization to non‑active treatment arms.

However, for ECAs to be useful, they must meet two critical requirements:

  1. They need to closely resemble real patient populations in terms of demographics and clinical characteristics.
  2. They must protect patient privacy and be reproducible for regulatory scrutiny.

This is where AI — and specifically LLMs — enters the picture.

 

Two approaches to generating synthetic data

In our study, we evaluated two different ways of using LLMs to generate synthetic clinical trial datasets.

The first approach was direct generation. Here, the LLM was given access to the original dataset along with a variable dictionary and asked to generate a new synthetic dataset in a single step. This method is fast and intuitive.

The second approach was more structured and code-driven. Instead of generating the dataset directly, the LLM created a Python-based pipeline that performed bootstrapping and anonymization. This pipeline included a noise injection mechanism, where small amounts of statistical “noise” were added to numerical variables. The noise was carefully calibrated — set to 5% of each variable’s standard deviation — and values were constrained within realistic ranges to maintain clinical plausibility.
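The code-driven approach described above can be illustrated with a minimal sketch. This is not the pipeline produced in the study; it only shows the general pattern of bootstrapping, Gaussian noise scaled to 5% of each variable's standard deviation, and clipping to plausible ranges. The function name `synthesize`, the toy records, and the bounds are all hypothetical.

```python
import random
import statistics

def synthesize(records, bounds, n_out, noise_frac=0.05, seed=0):
    """Bootstrap rows, then perturb numeric columns with bounded Gaussian noise.

    records: list of dicts, one per patient (hypothetical layout)
    bounds:  {column: (low, high)} for each numeric column to perturb
    """
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    # Standard deviation of each numeric column in the original data
    sds = {col: statistics.stdev([r[col] for r in records]) for col in bounds}
    synthetic = []
    for _ in range(n_out):
        row = dict(rng.choice(records))  # bootstrap: sample with replacement
        for col, (low, high) in bounds.items():
            noisy = row[col] + rng.gauss(0.0, noise_frac * sds[col])
            row[col] = min(max(noisy, low), high)  # keep value in a plausible range
        synthetic.append(row)
    return synthetic

# Toy data for illustration only, not taken from any trial
original = [
    {"age": 54, "bmi": 27.1, "sex": "F"},
    {"age": 61, "bmi": 31.4, "sex": "M"},
    {"age": 47, "bmi": 24.8, "sex": "F"},
    {"age": 70, "bmi": 29.0, "sex": "M"},
]
bounds = {"age": (18, 90), "bmi": (15.0, 50.0)}
synthetic = synthesize(original, bounds, n_out=100)
```

In this sketch categorical variables are carried over unchanged from the resampled row, and perturbed numeric values come back as floats, so integer variables such as age would need rounding afterwards.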

 

What we found

Both methods were able to generate synthetic cohorts of 100 patients, demonstrating that LLMs can indeed produce usable clinical datasets.

However, the differences between the two approaches were striking. The direct generation method was extremely fast, completing the task in just 23 seconds with a single prompt. In contrast, the code-based approach took longer — around 40 seconds — and required multiple iterations to refine the pipeline.

Despite the extra effort, the code-driven method delivered better results. It more accurately preserved the statistical properties of the original trial data, including key variables like age, body mass index, sex, and race. The distributions in the synthetic dataset closely matched those of the real population, suggesting that the combination of bootstrapping and calibrated noise was effective.
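One way to check how well a synthetic cohort preserves the original distributions is sketched below. The metrics used in the study are not stated here, so the standardized mean difference (SMD) for numeric variables and the gap in proportions for categorical variables are assumptions chosen for illustration, as are the toy values.

```python
import math
import statistics

def standardized_mean_difference(a, b):
    """Difference in means divided by the pooled standard deviation."""
    pooled_sd = math.sqrt((statistics.variance(a) + statistics.variance(b)) / 2)
    return (statistics.mean(a) - statistics.mean(b)) / pooled_sd

def proportion_gap(a, b, level):
    """Absolute difference in the share of `level` between two samples."""
    return abs(a.count(level) / len(a) - b.count(level) / len(b))

# Toy values for illustration only
real_age = [54, 61, 47, 70, 58, 66]
synth_age = [55, 60, 49, 69, 59, 65]
real_sex = ["F", "M", "F", "M", "F", "M"]
synth_sex = ["F", "M", "F", "M", "M", "M"]

smd = standardized_mean_difference(real_age, synth_age)
gap = proportion_gap(real_sex, synth_sex, "F")
```

An absolute SMD below roughly 0.1 is often treated as a rule of thumb for good balance between two groups, although the appropriate threshold depends on the application.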

 

Speed vs. scientific rigor

These findings highlight an important trade-off. Direct LLM generation is excellent for rapid prototyping and exploratory analysis. It allows researchers to quickly create synthetic datasets with minimal effort.

But when it comes to regulatory-grade applications — such as external control arms used in decision-making — transparency and control become essential. The code-augmented approach provides a clear, reproducible process that can be audited and validated. This level of rigor is crucial for building trust with regulators and stakeholders.

 

Balancing privacy and realism

A key challenge in synthetic data generation is protecting patient privacy without losing the statistical integrity of the dataset. Our study shows that adding carefully calibrated Gaussian noise can strike this balance.

By scaling noise to the variability of each variable and enforcing realistic bounds, we were able to anonymize the data while preserving meaningful population-level characteristics. This approach helps ensure that synthetic datasets remain useful for analysis while reducing the risk of re-identification.

 

What comes next?

While this pilot study demonstrates the potential of LLM-generated synthetic cohorts, it is only the beginning. Future research needs to explore whether these methods are robust under more challenging conditions.

One critical next step is to evaluate re-identification risk, particularly under adversarial scenarios where attackers actively attempt to reverse-engineer the data. It will also be important to compare noise-based approaches with other privacy-preserving techniques, such as differential privacy. This step would include determining how much noise the pipeline should introduce.
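To make the question of noise amount concrete, the sketch below shows the standard Laplace mechanism from differential privacy applied to a single summary statistic, a bounded mean. It is an illustration of how a privacy parameter (epsilon) determines the noise scale, not a method used in the pilot study. The function name `dp_mean`, the bounds, and the toy ages are hypothetical, and the sketch assumes the cohort size is public.

```python
import random

def dp_mean(values, low, high, epsilon, rng):
    """Epsilon-differentially-private mean of values clipped to [low, high]."""
    clipped = [min(max(v, low), high) for v in values]
    n = len(clipped)
    # Replacing one record can shift the mean by at most (high - low) / n
    sensitivity = (high - low) / n
    scale = sensitivity / epsilon  # Laplace scale: smaller epsilon means more noise
    # The difference of two exponential draws with mean `scale` is Laplace(0, scale)
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return sum(clipped) / n + noise

# Toy values for illustration only
ages = [54, 61, 47, 70, 58, 66]
rng = random.Random(0)
strict = dp_mean(ages, 18, 90, epsilon=0.5, rng=rng)   # strong privacy, noisy result
loose = dp_mean(ages, 18, 90, epsilon=50.0, rng=rng)   # weak privacy, close to true mean
```

Unlike noise fixed at a fraction of the standard deviation, the noise here is derived from a formal privacy budget, which is what makes the two approaches worth comparing.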

 

Closing thoughts

Synthetic data has the potential to transform clinical research by making trials faster, more efficient, and more ethical. Our findings suggest that LLMs can play a meaningful role in this transformation — but how they are used matters.

Fast, direct generation offers convenience, but structured, code-based approaches provide the reliability and transparency needed for real-world adoption. As the field moves forward, combining the strengths of both may unlock the full potential of AI-driven synthetic data in healthcare.

 

Interested in learning more?

Join Manuel Cossio and Deepa Jahagirdar, along with Anupama Vasudevan, at ISPOR US for their upcoming presentation, “A Pilot Assessment of LLM-Generated Synthetic Cohorts: A First Step Toward Robust Synthetic Control Arms” on May 18 at 4:00 PM.

How Agentic AI Can Transform HTA Landscaping for EU JCA

Health Technology Assessment (HTA) in the European Union (EU) is entering a new phase with the introduction of the EU Joint Clinical Assessment (JCA). The goal of the new HTA regulation is to improve the availability of innovative health technologies in the EU by ensuring efficient resource use and strengthening the scientific quality of HTA across Member States (MS).

At the heart of this process is the JCA scope, which consolidates diverse evidence requests from all MS into the PICO (Population, Intervention, Comparator, Outcome) framework. Anticipating these policy-driven PICO requests is critical for a successful JCA submission and can turn into a complex, time- and labor-intensive exercise. In addition to understanding the potentially diverse clinical practices across the MS, it demands an in-depth assessment of the different national HTA evidence requirements. Teams working on PICO predictions need a clear mapping of what evidence has been accepted, questioned, or rejected across the different HTA systems. Building that mapping is multifaceted.

 

Why HTA landscaping is challenging

HTA landscaping requires careful review of past HTA decisions to understand what evidence leads to positive HTA outcomes. This involves identifying relevant patient populations, accepted comparators, and meaningful outcomes. It also requires digging deeper into the HTA documentation to uncover why certain choices were criticized or dismissed.

Much of this information is hidden in long reports, potentially including appendices. These HTA documents are written in different languages, follow different formats, and often include subtle but important contextual details that explain the HTA critiques and the reasoning behind specific evidence requests. As a result, landscaping is still largely manual, time-consuming, and difficult to scale.

 

What makes agentic AI different

Agentic AI offers a new way to approach this problem. Instead of simply summarizing documents or answering one-off questions, agentic systems are designed to carry out structured tasks. They can follow a defined set of instructions, extract specific types of information, and organize results in a consistent way.

This makes them particularly suited for HTA landscaping, where the goal is not just to read documents, but to systematically extract comparable insights across multiple sources.

 

Our research: Using AI agents for HTA extraction

In our recent research, which will be presented at ISPOR US this May, we explored how autonomous AI agents can support HTA landscaping for EU JCA.

We developed two large language model–based agents designed to extract structured information from HTA reports using a set of 21 expert-defined questions. These questions covered standard PICO elements, such as population, comparators, and outcomes, as well as more context-specific insights. These included methodological requirements, reasons for rejecting certain outcomes or comparators, and other critique points raised by HTA bodies.

The two agents differed in how they were guided. The first used a general prompt, while the second incorporated additional clarification within selected questions to improve contextual understanding.

 

How we evaluated performance

To test the agents, we used publicly available HTA reports for osimertinib (in locally advanced or metastatic NSCLC with EGFR T790M mutation) from Spain, the Netherlands, and France. These reports varied in length, structure, and language, providing a realistic test of performance.

Local HTA experts applied a strict scoring framework that assessed both accuracy and completeness. Importantly, any answer containing hallucinated content was automatically scored as zero. This ensured that reliability remained central to the evaluation.
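The scoring rule described above can be sketched as follows. The actual scale and weighting used by the reviewers are not given here, so the 0-to-1 ratings and the simple average are assumptions made for illustration; only the rule that hallucinated answers score zero comes from the text. The names `Rating`, `score`, and `hallucination_free_rate` are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Rating:
    accuracy: float      # 0.0 to 1.0, assigned by the expert reviewer (assumed scale)
    completeness: float  # 0.0 to 1.0, assigned by the expert reviewer (assumed scale)
    hallucinated: bool   # True if the answer contains unsupported content

def score(rating):
    """Any hallucinated answer scores zero; otherwise average the two ratings."""
    if rating.hallucinated:
        return 0.0
    return (rating.accuracy + rating.completeness) / 2

def hallucination_free_rate(ratings):
    """Share of answers that contain no hallucinated content."""
    return sum(not r.hallucinated for r in ratings) / len(ratings)

# Toy ratings for illustration only
ratings = [
    Rating(1.0, 1.0, False),  # fully correct
    Rating(1.0, 0.5, False),  # accurate but incomplete
    Rating(1.0, 1.0, True),   # hallucinated, so scored zero regardless
]
scores = [score(r) for r in ratings]
```

Treating hallucination as an override rather than a penalty term means a fluent but unsupported answer can never outscore a partial but faithful one.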

 

What we found

Both agents were able to complete the full extraction across all HTA reports, and around 90% of responses were generated without hallucinations. The second agent performed better overall, achieving a higher number of fully correct answers and fewer partially correct responses.

The first agent, while still effective, produced some hallucinated content, particularly in the Spanish report. The second agent avoided hallucinations entirely in this evaluation. Both agents performed best on the French HTA report, suggesting that clearer structure and language can improve AI performance.

One of the most important findings was the impact of prompt design. Adding targeted clarification significantly improved the agent’s ability to interpret and extract complex HTA information.

 

What this means for EU JCA landscaping

These results suggest that agentic AI can meaningfully improve how HTA landscaping is performed. By automating structured extraction, it becomes possible to review multiple reports more quickly and consistently. This allows teams to build a more comprehensive understanding of the landscape in less time.

Importantly, this approach goes beyond standard PICO elements. It captures the context-specific insights that often drive HTA decisions, such as methodological concerns or other reasons for rejecting evidence. This is critical for developing realistic PICO scenarios in the context of JCA.

Another key advantage is the ability to work across languages. Since EU HTA involves multiple jurisdictions, multilingual capability removes a major barrier and enables a more unified analysis.

 

The role of human expertise

Despite these advances, AI alone is not enough. Some limitations remain, including occasional hallucinations and variability depending on the source material. For this reason, human oversight continues to be essential.

The most effective approach is to combine agentic AI with human HTA expertise. AI can handle large-scale extraction and structuring of information, while experts validate the outputs and ensure that interpretations are accurate and relevant.

 

Looking ahead

Agentic AI is unlikely to replace HTA professionals, but it will fundamentally reshape how they work. By reducing the burden of manual review, it frees experts to focus on higher-value activities such as interpretation, strategic planning, and decision-making.

In the context of EU JCA, this shift brings clear advantages. It enables faster, more scalable landscaping and PICO predictions, helping to identify potential evidence gaps earlier in the process. As the methodology evolves, further testing will expand the integration of HTA reports from additional MS into the agent-driven workflows. At the same time, engineering adaptations may be needed to accommodate ongoing changes in local HTA documents as they continue to evolve together with the JCA reports.

 

Interested in learning more?

Manuel Cossio and Lilia Leisle will be presenting their poster “Accelerating Dynamic HTA Landscaping in Oncology Through Autonomous Generative AI-Driven Multilingual Data Extraction” at ISPOR US on May 18 at 4 PM. We hope to see you there!

Building a New Evidence Base for Rare Diseases by Structuring Clinical Narratives with Generative AI

Rare diseases present a paradox in modern healthcare. Individually, they affect small populations, yet collectively they impact millions of patients worldwide. Despite this, progress in diagnosis, treatment, and research remains slow. The fundamental challenge is not only scientific complexity but also a persistent lack of usable data.

Traditional sources of real-world data — electronic health records, claims databases, and clinical trials — struggle to capture rare disease populations at a meaningful scale. Patients are geographically dispersed, frequently misdiagnosed, and often excluded from structured datasets. As a result, generating robust evidence in rare diseases remains difficult.

At the same time, an overlooked resource has quietly accumulated over decades: clinical case reports. These narratives contain detailed descriptions of real patients, their symptoms, diagnostic journeys, and outcomes. The challenge has never been their value, but rather their accessibility and structure.

Recent advances in large language models (LLMs) suggest that this barrier may finally be overcome.

 

Case reports as a foundation for real-world evidence

Case reports represent one of the richest forms of clinical documentation available. Unlike structured datasets, they capture the full nuance of patient care, including symptom evolution, diagnostic uncertainty, and physician reasoning. They are inherently real-world, reflecting how diseases actually present and are managed in practice.

However, their utility has historically been limited. Case reports are written in free text, scattered across millions of publications, and lack standardization. Extracting meaningful insights at scale has required significant manual effort, making systematic use impractical.

The RareArena study demonstrates a new approach. By leveraging LLMs, researchers were able to automatically collect and process hundreds of thousands of case reports from PubMed, filter them for rare diseases, and transform them into a structured dataset comprising tens of thousands of patient cases. This process effectively converts unstructured clinical narratives into analyzable real-world data.

This shift is significant. It reframes case reports not as isolated anecdotes, but as components of a scalable data asset.

 

From unstructured text to scalable patient populations

One of the most important implications of this approach is the ability to expand patient populations in rare disease studies. Traditional datasets are constrained by institutional boundaries and data availability. In contrast, case reports aggregate knowledge globally, capturing patients from diverse healthcare systems and settings.

By structuring these reports, LLMs enable the creation of virtual cohorts that far exceed what any single registry or database could provide. Diagnoses can be standardized using reference ontologies, symptoms can be normalized, and cases can be grouped into clinically meaningful categories.

The RareArena dataset, for example, spans thousands of rare diseases and tens of thousands of patient cases, representing one of the broadest collections of rare disease data assembled to date. This kind of scale opens new possibilities for understanding disease heterogeneity, identifying subpopulations, and generating evidence where none previously existed.

In effect, LLMs allow researchers to move from fragmented observations to aggregated real-world populations.

 

Capturing the diagnostic journey

A particularly valuable aspect of the RareArena framework is its alignment with real clinical workflows. The dataset distinguishes between two stages of diagnosis: early suspicion based on symptoms alone, and confirmation after diagnostic testing.

This distinction mirrors how rare diseases are encountered in practice. Patients often experience long diagnostic odysseys, with years passing before a correct diagnosis is reached. By separating these stages, the dataset captures both the uncertainty of early presentation and the clarity provided by confirmatory tests.

This structure enables deeper analysis of diagnostic pathways, including where delays occur and how different signals contribute to clinical decision-making. It also provides a foundation for developing tools that support earlier recognition of rare diseases, an area where unmet need remains substantial.

 

Preserving clinical complexity in real-world data

A common limitation of many real-world datasets is the loss of clinical nuance. Structured data often simplifies patient information, omitting negative findings, confounding symptoms, and contextual details that are critical for diagnosis.

Case reports, by contrast, preserve this complexity. The RareArena study shows that most cases retain features such as negative symptoms and confounding factors, reflecting the challenges physicians face in real-world settings. This makes the resulting dataset not only large, but also clinically realistic.

Maintaining this level of detail is essential for rare diseases, where subtle distinctions can significantly alter diagnosis and treatment. LLMs play a key role here by rephrasing and structuring text while preserving the underlying clinical information.

The result is a form of real-world data that is both scalable and rich in context.

 

Implications for research and clinical development

The ability to generate structured datasets from case reports has far-reaching implications. For researchers, it enables the study of rare diseases across larger and more diverse populations than previously possible. Patterns of presentation, progression, and response to treatment can be explored with greater statistical power.

In clinical development, this approach offers new ways to identify and characterize patient populations. It can support the design of clinical trials by highlighting underrepresented groups and informing inclusion criteria. It also provides a potential source of external evidence, complementing traditional trial data.

Beyond research, there is a clear opportunity to improve clinical decision support. The RareArena study demonstrates that LLMs already show meaningful capability in diagnosing rare diseases, particularly when provided with comprehensive clinical information. While not yet sufficient for standalone use, these models can assist clinicians by surfacing relevant diagnostic possibilities.

 

Limitations and considerations

Despite its promise, this approach is not without limitations. Case reports are inherently selective, often focusing on unusual or severe presentations. This introduces potential bias in the resulting datasets. Additionally, the data is retrospective and curated, rather than continuously collected.

LLMs themselves introduce another layer of complexity. While they are effective at extracting and structuring information, they can also propagate errors or introduce subtle inaccuracies. Ensuring data quality and validation remains critical.

The RareArena study also highlights that even the most advanced models are far from perfect in diagnostic tasks, particularly in early-stage scenarios. This reinforces the need to view these tools as augmentative rather than autonomous.

 

A shift from data scarcity to data unlocking

What emerges from this work is a broader shift in how we think about data in rare diseases. The challenge is no longer solely about collecting new data, but about unlocking the value of existing information.

Case reports represent decades of accumulated clinical knowledge. With LLMs, it becomes possible to systematically extract, structure, and scale that knowledge into usable real-world data. This approach does not replace traditional data sources, but it significantly expands the available evidence base.

For rare diseases, where every patient case is valuable, this shift is particularly impactful.

 

Toward a more complete picture of rare diseases

The combination of case reports and large language models offers a compelling new pathway for advancing rare disease research. By transforming unstructured narratives into structured datasets, it enables the creation of larger, more representative patient populations and more realistic models of clinical care.

While challenges remain, the potential is clear. This approach can accelerate diagnosis, inform clinical development, and ultimately contribute to better outcomes for patients who have long been underserved.

In a field defined by scarcity, the ability to unlock hidden data may prove to be one of the most important innovations yet.

Leveraging RWE Innovations to Inform Clinical Strategy and Strengthen Healthcare Decision-Making

Real-world evidence (RWE) is no longer a supporting actor, but rather a strategic asset that should be embedded across the product lifecycle.

We now have tools that were unimaginable a decade ago: synthetic data that preserves privacy while enabling scenario modeling and early go/no-go decisions, external control arms (ECAs) to strengthen single-arm trials and accelerate access in high unmet need settings, and decentralized long-term extensions via tokenization that reduce burden while capturing 10+ years of safety and effectiveness across the patient's real-world journey.

These innovations aren’t just “nice to have.” They are how we accelerate access to needed therapies, demonstrate value with confidence, and build submissions that stand up to today’s scrutiny.

Here, I discuss how these capabilities are reshaping clinical strategy and unlocking smarter, faster, more equitable evidence generation.

 

Generating synthetic data with agentic AI

Synthetic data is artificially generated data that mimics the statistical properties of real data without containing identifiable patient information. Starting from an appropriate patient-level real-world data (RWD) source or randomized controlled trial (RCT) data source, sponsors can use an AI-supported pipeline to generate a synthetic dataset, then assess how closely it resembles the original data to gauge success.
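To make the generate-then-compare loop concrete, here is a minimal sketch in Python. It is not the AI-supported pipeline described above: it swaps in a simple parametric generator (a bivariate normal fitted to two columns), and the "real" data, the variable names (age, biomarker), and the sample sizes are all hypothetical stand-ins. The point is only to show the two steps: fit and sample, then check similarity against the source data.

```python
import math
import random
import statistics

def fit_bivariate_normal(rows):
    """Estimate means, standard deviations, and correlation of two columns."""
    xs = [r[0] for r in rows]
    ys = [r[1] for r in rows]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sx, sy = statistics.stdev(xs), statistics.stdev(ys)
    cov = sum((x - mx) * (y - my) for x, y in rows) / (len(rows) - 1)
    return mx, my, sx, sy, cov / (sx * sy)

def sample_synthetic(params, n, rng):
    """Draw n synthetic rows from the fitted bivariate normal."""
    mx, my, sx, sy, rho = params
    residual = math.sqrt(max(0.0, 1 - rho ** 2))
    rows = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        rows.append((mx + sx * z1, my + sy * (rho * z1 + residual * z2)))
    return rows

# Hypothetical "real" cohort: age and a correlated biomarker (toy data only).
rng = random.Random(42)
real = []
for _ in range(2000):
    age = rng.gauss(60, 10)
    real.append((age, 0.5 * age + rng.gauss(0, 5)))

params = fit_bivariate_normal(real)
synthetic = sample_synthetic(params, 2000, rng)

# Similarity check: compare summary statistics of real vs. synthetic data.
for label, p in (("real", params), ("synthetic", fit_bivariate_normal(synthetic))):
    print(label, [round(v, 2) for v in p])
```

In practice the similarity assessment would cover many more variables and distributional checks, and would be paired with a privacy assessment; comparing means, spreads, and one correlation is only the simplest version of the idea.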

Synthetic data can:

  • Inform early go/no-go decisions: A cost-effective approach to optimizing asset strategy before large investments by simulating expected outcomes under various scenarios in Phase I–II.
  • Inform clinical trial design: Model alternative controls and sample sizes and stress-test treatment effects in a cost-effective manner.
  • Build privacy-preserving, cost-effective ECAs: Build an ECA partially (combined with RWD) or entirely from a fully de-identified synthetic cohort. This is not yet suitable for regulatory purposes, but it can inform provider and payer decisions.

RWD has its limitations: a source must closely resemble the target patient population and protect patient privacy, and collecting it can be costly, time-consuming, and in some settings ethically problematic. Synthetic data can help overcome these challenges.

 

Strengthen regulatory submission with an external control arm

External control arms use data from historical RCTs or RWD when randomization is not feasible or ethical, or to power or accelerate a study where there is high unmet need.

ECAs can:

  • Strengthen single-arm trials (SAT): Provide contextual information for SAT regulatory submissions, increasing probability of success.
  • Accelerate access to needed therapies: For RCTs in areas of high unmet need (e.g., accelerated approval pathway) and/or with slow recruitment, RWD can augment the control arm.
  • Support a lifecycle management approach: Supports label expansions to new populations (e.g., to male breast cancer) or new lines of therapy for decisions by regulators, payers, and providers.

While RCTs are considered the “gold standard,” the FDA in 2023 wrote that “externally controlled studies may be considered” (with strong justification), while in 2025, the EMA guidance stated “in some situations, causal conclusions may be derived from a setting where the investigational medicinal product data was collected under a clinical trial protocol while the control arm was not a randomized arm in that same protocol.”

 

Assess long-term outcomes with long-term extension studies

Decentralized long-term extensions of RCTs assess long-term outcomes (safety and effectiveness) with or without drug provision. The extension enables follow-up of tokenized trial patients via real-world databases or direct-to-patient data collection.
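As a rough illustration of how tokenized follow-up can work, the sketch below derives a keyed hash from normalized identifiers and uses it to look up records in a real-world source. This is an assumption-laden toy, not any vendor's tokenization scheme: the function name make_token, the choice of identifiers, the shared key, and the example records are all hypothetical.

```python
import hashlib
import hmac

def make_token(first_name, last_name, birth_date, secret_key):
    """Derive a deterministic keyed hash from normalized identifiers."""
    parts = (first_name, last_name, birth_date)
    normalized = "|".join(part.strip().lower() for part in parts)
    return hmac.new(secret_key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()

# Hypothetical key shared by the trial site and the data holder.
KEY = b"hypothetical-shared-secret"

# The trial side stores only the token, not the identifiers.
trial_token = make_token("Ana", "Silva", "1970-02-03", KEY)

# The real-world data holder tokenizes its own records the same way,
# so formatting differences in the identifiers do not prevent a match.
rwd_records = {
    make_token(" ana ", "SILVA", "1970-02-03", KEY): ["2031-05-01 hospitalization"],
    make_token("Ben", "Okoro", "1982-11-20", KEY): ["2030-09-12 outpatient visit"],
}

# Linkage happens on tokens alone.
linked_events = rwd_records.get(trial_token, [])
print(linked_events)
```

The design choice worth noting is that both sides apply the same normalization and the same key, so matching never requires exchanging names or birth dates. Production systems add consent management, key governance, and handling of imperfect matches, none of which is shown here.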

Long‑term extension studies can:

  • Allow for long-term follow-up: Cost-effective data collection by reducing site and patient burden while collecting key safety and effectiveness endpoints over 10+ years.
  • Enable earlier launch: For breakthrough therapies and high unmet need, launch can occur as soon as clinical efficacy is proven if the sponsor commits to a Phase IV study to collect long-term data.
  • Improve representativeness: Loss to follow-up in long-term studies can introduce bias, and RCTs often under-represent certain populations. The shift to real-world endpoints makes the insights more relevant to decision-makers.

 

Key takeaways

Consider RWE as a strategic asset: Integrate RWE early, anticipate post-marketing collection of long-term data, and adopt causal inference methods to support credible conclusions on safety and effectiveness.

Invest in robust RWD: Invest in RWD quality and governance to ensure credibility with regulators and payers.

Adopt a comprehensive strategy: Adopt flexible, hybrid evidence strategies that combine synthetic data, ECAs, and long-term real-world data collection approaches.

Ensure cross-functional readiness: Medical, regulatory, biostats, and data science must operate as one evidence engine.

The Delta Dossier: Why Germany Needs More Than a Reference-Based Approach

With the first Joint Clinical Assessments (JCAs) at the European level, pharmaceutical companies are by no means entering a phase of reduced national HTA requirements. Germany, in particular, is already showing that the so-called delta dossier — an informal term used in the German market access environment for the national content required in addition to the European JCA dossier — is not simply a shorter AMNOG dossier containing references to the European JCA dossier.

Instead, it is becoming the test of whether clinical trial evidence, European and German HTA requirements, and tight procedural timelines can be brought together at an early stage.

There is still only limited practical experience with real delta dossiers. All the more important, then, are the signals coming from Germany’s Federal Joint Committee (Gemeinsamer Bundesausschuss, G-BA), the country’s highest decision-making body in joint self-government and a central institution in the national HTA framework. Its spring 2025 events already made clear where the key requirements are likely to emerge and which questions pharmaceutical companies should be addressing now. The G-BA itself views the planned adjustments in the national setting, including the adaptation of the AMNOG dossier module templates, as a first step and intends to assess further developments on the basis of the first practical experience.

Here, we share five theses on the delta dossier.

 

Thesis 1: The EU JCA will not replace the German benefit assessment

A central point is often underestimated in the current debate: the JCA does not replace Germany’s early benefit assessment. The G-BA makes it clear that alignment with the European assessment does not change the assessment standards applied in the German benefit assessment. Decisions will continue to be taken at the national level. The JCA dossier is intended to inform national decision-making, but it does not itself provide a conclusion on additional clinical benefit compared with the national appropriate comparator therapy (zVT) — the foundation for the subsequent price negotiation.

This also clarifies the role of the delta dossier: the objective is not simply to pass through European content in a formal way, but to prepare it in a manner that is robust and usable for the German procedure.

 

Thesis 2: The delta dossier is about translation, not cross-referencing

The G-BA describes very specifically how the JCA dossier is to be used. References are possible, but only to clearly identified sections. General or dynamic references are not sufficient. At the same time, it remains the responsibility of the pharmaceutical company to determine whether the contents of the JCA dossier are sufficient for the German benefit assessment or whether updated or supplementary evidence is required. There will be no separate dossier template. The structure of the AMNOG modules will remain in place.

This is precisely where the quality of a good delta dossier becomes visible: it is the national translation of the European assessment process and brings the JCA dossier and the AMNOG dossier together. This is achieved not through references alone, but above all through the targeted selection of content that is truly robust and the addition of missing data needed for an evidence-based national assessment.

One point is particularly important here: the G-BA makes it clear that a full national AMNOG dossier may still be submitted. There is therefore no obligation to use the delta dossier as a lean referencing solution. What remains decisive is not the format, but the quality of the national dossier preparation.

 

Thesis 3: The real work starts well before the delta dossier

The determination of the relevant PICOs (PICO scoping) for the JCA already begins when the marketing authorization application is submitted to the EMA, and therefore well before the start of the national AMNOG procedure. The PICOs fed back by Germany are intended to reflect the relevant research questions for the later AMNOG procedure, but, like the outcome of an early G-BA consultation on the appropriate comparator therapy, they are not legally binding. This creates a risk for the national procedure in particular, one that must be anticipated and taken into account in strategic planning. Any company that only starts to structure populations, comparator therapies, endpoints, and potential subgroups when preparing the national dossier has already started too late.

European scientific consultation on PICO scoping also takes place at a point when studies are still being planned. National consultations remain possible, but parallel duplicate structures are to be avoided. For manufacturers, this means that the real strategic work does not begin with the delta dossier, but with PICO scoping, study design, and early evidence planning.

 

Thesis 4: The biggest risks sit in comparator selection and endpoints

Translation into the German setting already becomes particularly demanding at the scoping stage. The first key question is which PICO, or which set of PICOs, actually reflects the requirements of the German benefit assessment. This determines which comparator therapy is relevant for Germany and whether the evidence addressed in the JCA will in fact support the national assessment. This is precisely where preparation for a strong delta dossier begins: with the early identification of the PICOs relevant for Germany, the selection of robust content, and the supplementation of evidence wherever European materials are not sufficient for the national assessment.

In addition, European JCA scoping may include endpoints that are not necessarily recognized as patient-relevant in the national procedure. The G-BA explicitly distinguishes between endpoints included at the European level and the criteria for patient relevance that apply in the German AMNOG procedure. The same applies to analytical methods: national requirements — such as the 15% relevance threshold for responder analyses — remain in place.
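To illustrate what such a threshold means for a responder analysis, here is a small sketch. It assumes the 15% threshold is read as 15% of the instrument's scale range, which is our interpretation for illustration and should be checked against the applicable methods guidance. The function names and the 0 to 100 example scale are hypothetical.

```python
def min_response_threshold(scale_min, scale_max, fraction=0.15):
    """Smallest change counted as a response, as a fraction of the scale range."""
    return fraction * (scale_max - scale_min)

def is_responder(baseline, follow_up, scale_min, scale_max, higher_is_better=True):
    """Classify a patient as a responder if the improvement reaches the threshold."""
    change = follow_up - baseline
    if not higher_is_better:
        change = -change  # on such scales, a decrease is an improvement
    return change >= min_response_threshold(scale_min, scale_max)

# Hypothetical quality-of-life scale running from 0 to 100 (higher is better).
print(min_response_threshold(0, 100))
print(is_responder(baseline=40, follow_up=60, scale_min=0, scale_max=100))
print(is_responder(baseline=40, follow_up=50, scale_min=0, scale_max=100))
```

A responder definition pre-specified for the trial with a smaller threshold would, under this reading, need to be re-analyzed for the national dossier, which is exactly the kind of supplementary work the delta dossier can require.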

For this reason, the delta dossier is particularly demanding from a scientific and methodological perspective wherever European evidence must be made robust for German comparator therapies and nationally relevant endpoints.

 

Thesis 5: Timing and evidence updates will be decisive

In addition to scientific issues, procedural management is becoming more important. The G-BA continues to require that the underlying systematic literature review on relevant clinical evidence must not be more than three months old at the start of the procedure. Additional data cuts and newly completed studies may therefore become relevant in the AMNOG procedure even if they were not addressed in the JCA dossier. This means that the dataset underlying the national AMNOG dossier may differ from the dataset underlying the JCA dossier.
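A simple date check can flag early whether a literature search will still be current when the procedure starts. The sketch below assumes "three months" is counted in calendar months from the search date; how the G-BA counts this in practice is not specified here, and the function names and dates are hypothetical.

```python
import calendar
from datetime import date

def add_months(d, months):
    """Add calendar months, clamping the day to the end of the target month."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)

def search_is_current(search_date, procedure_start, max_age_months=3):
    """True if the search is no older than max_age_months at procedure start."""
    return search_date <= procedure_start <= add_months(search_date, max_age_months)

# Hypothetical dates: a JCA-era search reused for the national dossier.
jca_search = date(2026, 1, 15)
amnog_start = date(2026, 6, 1)
print(search_is_current(jca_search, amnog_start))
print(add_months(jca_search, 3))
```

Running this kind of check against the planned submission calendar shows when a search update must be scheduled, and therefore when new data cuts or studies may enter the national dossier.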

The timing of the publication of the JCA report is also particularly important. If it is available in time, it will be taken into account in the benefit assessment. If it becomes available later, but before the written comments procedure begins, it may still be considered during the written comments procedure or, at the latest, in the final resolution. However, if it is published only after the start of the written comments procedure, it can no longer formally be taken into account. At the same time, the G-BA points out that there is as yet no reliable practical experience in this regard, which is another source of uncertainty for pharmaceutical companies.

 

From JCA to delta dossier: Cytel combines global perspective with local execution

Cytel occupies the critical interface between European clinical assessment and national benefit assessment in Germany. Together with the German team at co.value, a Cytel brand, Cytel combines experience in PICO scoping, JCA dossier development, and statistical evidence generation with in-depth local AMNOG expertise. This means support does not begin only at the point of translating into the delta dossier, but much earlier: in evidence planning, the selection of robust comparator therapies, and the targeted shaping of European evidence for reliable use in the German AMNOG procedure.

 

The delta dossier as the true test

The first delta dossiers are only now beginning to emerge. But the substantive guardrails are already clearly visible, and they point in a clear direction: within the framework of European clinical assessment, the German AMNOG procedure will not become a process that can be handled through references alone.

What will matter instead is how early clinical trial evidence, European and German HTA requirements, and tight procedural timelines are brought together. The delta dossier is therefore not merely a new format. It is the clearest expression of whether this translation work has been accomplished in time.

Real-World Data Strategies and Challenges: Making Data Work for Your External Control Arm Study

External control arms (ECAs) are gaining popularity in comparative effectiveness studies, driven by a growing emphasis on robust evidence across disease areas and by increasing acceptance among regulatory bodies. ECAs can provide a control group for single-arm studies, complement a larger portfolio of evidence, and enable research for rare or genetic conditions for which randomized controlled trials may be unethical or infeasible.

At the same time, real-world data (RWD) is becoming an essential foundation for building credible ECAs. RWD offers unique advantages: it reflects real clinical practice, captures diverse patient populations, and can support robust estimates of treatment effects.

However, integrating data from multiple sources, such as historical trials, concurrent trials, patient registries, and cross-population datasets, requires careful methodological planning to ensure validity and regulatory acceptance.

To fully harness the value of external control arms, sponsors must ensure selected data is fit-for-purpose, index dates are aligned with trial eligibility, and rigorous statistical methods are applied to ensure comparable patient profiles. Here, we outline these three essential elements.

 

Choosing the right data source for your external control arm

When building ECAs, different types of external data sources have different strengths.

 

Historical or concurrent randomized trials

Historical or concurrent randomized trials contain systematically collected data and well-defined endpoints, following a detailed protocol. However, they often have small sample sizes, and evolving standards of care or diagnostic criteria can limit comparability over time.

 

Electronic health records and insurance claims

Electronic health records and insurance claims contain large, diverse cohorts and broad population coverage. But they frequently lack clinical details such as out-of-hospital care and non-prescription medications.

 

Patient registries

Patient registries provide systematic, detailed data collection, the potential for linkage, and long-term follow-up. Yet they can have high missingness and over-represent healthier patients, which could reduce the overlap in characteristics with trial populations.

 

Selecting the best data sources should be guided by fit-for-purpose assessments. These assessments include exploring the availability of key prognostic characteristics and missingness, along with practical considerations such as access and timelines.
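One small piece of a fit-for-purpose assessment, checking missingness of key prognostic variables, can be sketched as follows. The variable names, the example records, and the 20% cut-off are hypothetical; an actual assessment would set thresholds per variable with clinical input.

```python
def missingness_report(records, key_variables):
    """Share of records with a missing value for each key prognostic variable."""
    n = len(records)
    return {
        var: sum(1 for r in records if r.get(var) is None) / n
        for var in key_variables
    }

def passes_missingness_check(report, max_missing=0.2):
    """Flag each variable as acceptable if its missing share is within the cut-off."""
    return {var: share <= max_missing for var, share in report.items()}

# Hypothetical extract from a candidate registry.
registry = [
    {"age": 61, "ecog": 1, "prior_lines": 2},
    {"age": 58, "ecog": None, "prior_lines": 1},
    {"age": 70, "ecog": None, "prior_lines": 3},
    {"age": 66, "ecog": 0, "prior_lines": 2},
]

report = missingness_report(registry, ["age", "ecog", "prior_lines"])
print(report)
print(passes_missingness_check(report))
```

Running the same report across several candidate sources gives a like-for-like comparison before committing to data access agreements.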

 

Defining appropriate eligibility criteria and index dates

Carefully establishing index dates is critical yet challenging when incorporating an ECA. In a trial population, the index date is clearly defined as when the patient meets eligibility or is randomized. The same eligibility criteria need to be applied to ECA patients using variables in the external data source. The index date should reflect the point at which those criteria are met. Misalignment of the index date can introduce specific types of bias, including immortal time bias. This bias occurs when a period during which the outcome could not have occurred is misclassified as follow-up time, potentially creating a false treatment benefit.
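A worked example makes the index date issue tangible. The sketch below assumes each eligibility criterion has a date on which it is first met and stays met afterwards, so the index date is the latest of those dates. The patient record and criterion names are hypothetical.

```python
from datetime import date

def index_date(criteria_dates):
    """Index date = first day on which every eligibility criterion is met."""
    return max(criteria_dates.values())

def follow_up_days(start, end):
    """Length of follow-up in days between a start date and an event date."""
    return (end - start).days

# Hypothetical external-control patient: eligible only after failing first-line therapy.
patient = {
    "criteria_dates": {
        "diagnosis": date(2020, 1, 10),
        "failed_first_line": date(2020, 7, 1),
    },
    "death": date(2021, 1, 1),
}

aligned = index_date(patient["criteria_dates"])
correct_fu = follow_up_days(aligned, patient["death"])

# Misaligned choice: starting the clock at diagnosis instead of at eligibility.
misaligned_fu = follow_up_days(patient["criteria_dates"]["diagnosis"], patient["death"])

# The difference is immortal time: the patient had to survive it to be included.
immortal_days = misaligned_fu - correct_fu
print(aligned, correct_fu, misaligned_fu, immortal_days)
```

Counting the period between diagnosis and eligibility as follow-up inflates survival in whichever arm it is applied to, which is why the index date rule needs to match the trial's definition in both arms.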

 

Ensuring treatment and control patients are similar

In RCTs, randomization naturally balances prognostic factors between treatment arms. ECAs, by contrast, require explicit identification and adjustment of these variables. Clinical expertise is essential for determining which characteristics matter most. Comparing the distributions of these variables between the treated and control arms helps to assess similarity. Statistical techniques, including propensity score matching and inverse probability of treatment weighting, can improve comparability and approximate the balance achieved through randomization. Comparing the distributions of baseline characteristics before and after adjustment quantifies the success of the method.
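The before-and-after balance check can be sketched with a deliberately small example of inverse probability of treatment weighting (IPTW). To stay self-contained, it uses one hypothetical binary covariate (severe disease) and estimates the propensity score as the treated share within each covariate level; real analyses would typically fit a regression model over many covariates. Balance is summarized with the standardized mean difference (SMD).

```python
import math

def propensity_by_stratum(rows):
    """Estimate P(treated | covariate) as the treated share in each covariate level."""
    counts = {}
    for treated, severe in rows:
        n_treated, n_total = counts.get(severe, (0, 0))
        counts[severe] = (n_treated + treated, n_total + 1)
    return {level: n_t / n for level, (n_t, n) in counts.items()}

def iptw_weights(rows, ps):
    """Weight treated patients by 1/ps and controls by 1/(1 - ps)."""
    return [1 / ps[s] if t else 1 / (1 - ps[s]) for t, s in rows]

def smd_binary(rows, weights):
    """Standardized mean difference of the binary covariate between arms."""
    def weighted_share(arm):
        total = sum(w for (t, _), w in zip(rows, weights) if t == arm)
        return sum(w * s for (t, s), w in zip(rows, weights) if t == arm) / total
    p1, p0 = weighted_share(1), weighted_share(0)
    pooled_sd = math.sqrt((p1 * (1 - p1) + p0 * (1 - p0)) / 2)
    return (p1 - p0) / pooled_sd

# Hypothetical data as (treated, severe): trial arm has more severe patients.
rows = [(1, 1)] * 30 + [(1, 0)] * 20 + [(0, 1)] * 30 + [(0, 0)] * 70

ps = propensity_by_stratum(rows)
smd_before = smd_binary(rows, [1.0] * len(rows))
smd_after = smd_binary(rows, iptw_weights(rows, ps))
print(round(smd_before, 3), round(smd_after, 3))
```

The weights only work when every covariate level contains both treated and control patients, so that each estimated propensity score lies strictly between 0 and 1. Weighting also balances only the covariates that are measured and included, which is why the clinical selection of prognostic factors comes first.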

 

Final takeaways

Overall, to fully harness the value of external control arms, three elements are essential:

  1. Selecting fit-for-purpose data
  2. Defining index dates that align with trial eligibility
  3. Applying rigorous statistical methods to ensure comparable patient profiles

When executed thoughtfully, ECAs can meaningfully strengthen evidence generation and expand the possibilities for clinical research.

 

Interested in learning more?

Watch our on-demand webinar featuring Deepa Jahagirdar and Vartika Savarna, “Driving Credibility in External Control Arms with Real-World Data,” available now.

Insights from WEPA Amsterdam: When Policy Pressure Meets AI Maturity

The World EPA Congress in Amsterdam did not feel like a conference about isolated trends. It felt like a conference about structural transition.

Across sessions and conversations, one consistent narrative emerged: market access is being reshaped simultaneously by tightening policy frameworks and by the operational maturation of artificial intelligence. These are not parallel stories unfolding independently. They are interacting forces that together are redefining how evidence is generated, how value is assessed, and how global pricing strategies are constructed.

The underlying question throughout WEPA was not whether change is coming. It was whether organizations are structurally prepared to manage both forces at once.

 

1. A policy environment under structural redesign

Joint Clinical Assessment: Harmonization meets operational reality

The first year of Joint Clinical Assessment (JCA) implementation under the EU HTA Regulation represents a historic step toward harmonization of clinical evaluations across Europe. In principle, a single European-level clinical assessment promises efficiency, reduced duplication, and greater consistency in evaluating comparative effectiveness.

Yet the operational reality is more complex. Harmonization does not automatically mean simplification.

Early experience indicates that alignment between EU-level assessments and national reimbursement processes remains incomplete. Questions persist around how Member States will operationalize JCA outputs, how quickly EU HTAR assessors can deliver assessments, and whether national HTA bodies are fully prepared to transition to reliance on joint evaluations.

Methodological challenges are also emerging. PICO multiplicity, expanded evidence requirements, and the risk of unexpected analytical requests are increasing the burden on evidence generation teams, especially for products targeting rare diseases. While duplication of assessments may decrease, the sophistication and coordination required to navigate the system are increasing.

JCA is a milestone in European collaboration. But its success will depend on tighter synchronization between EU-level clinical conclusions and national pricing and reimbursement realities.

 

Real-world evidence: From complementary input to strategic pillar

Alongside JCA, the role of real-world evidence (RWE) is evolving rapidly. Regulators, payers, and clinicians increasingly seek insight into how therapies perform in routine clinical practice across diverse populations. The European Medicines Agency has clearly signaled its ambition to place patient voice and real-world data at the center of regulatory evaluation.

RWE is no longer supplementary. It is becoming central.

However, tension remains within the EU HTAR context. JCA assessments emphasize statistical precision and internal validity, while real-world evidence reflects the inherent heterogeneity of clinical practice. Methodological expectations between regulatory and HTA frameworks are not yet fully synchronized.

Europe now faces a strategic choice: either build robust, interoperable infrastructures for high-quality real-world data sharing across Member States, or risk creating friction between regulatory innovation and HTA conservatism. The credibility of future evidence strategies will depend on resolving this gap.

 

MFN pricing: Global interdependence redefines strategy

At the global level, Most-Favored-Nation (MFN) pricing dynamics are reshaping launch and market access strategies beyond the United States. Pricing has become an interconnected global system rather than a sequence of independent national decisions.

Launch sequencing is being reassessed as companies evaluate exposure to international reference pricing and MFN-linked rules. Markets are increasingly categorized by strategic risk, and cross-market interdependence is intensifying. Decisions taken in one jurisdiction reverberate across others.

Europe, despite its strong regulatory institutions, faces pressure due to fragmented access pathways, evolving JCA processes, and uncertainty in national budget negotiations. The traditional logic of “where to launch first” has become a far more complex strategic equation.

Taken together, JCA implementation, the rise of RWE, and MFN pricing pressures are increasing analytical complexity, accelerating timelines, and demanding greater coordination across functions and geographies. This rising structural pressure forms the backdrop to the second defining theme of WEPA.

 

2. AI moves from experimentation to operating model

From hype to governance

If policy discussions reflected systemic pressure, AI discussions reflected systemic adaptation.

The tone around artificial intelligence at WEPA 2026 was notably mature. The conversation quickly moved beyond questioning whether AI is hype. The focus shifted toward responsible operationalization, governance, and measurable value creation within regulated environments.

The key issue is no longer adoption. It is integration.

Organizations are developing governance frameworks, embedding AI into regulated workflows, and ensuring traceability and auditability of outputs. The emphasis is on scale and accountability rather than isolated experimentation.

 

AI as infrastructure in market access

Across sessions, AI was framed not as a productivity enhancement tool but as part of the operating model of modern market access organizations.

Companies are redesigning processes around AI-enabled capabilities. Evidence synthesis, systematic literature reviews, indirect treatment comparisons, dossier drafting, pricing simulations, and tender strategy development are increasingly supported by automated or semi-automated systems.

This represents a structural shift. AI is moving from peripheral pilot projects to enterprise-level infrastructure embedded within core functions.

In an environment where JCA increases analytical burden and MFN pricing demands multi-country scenario modeling, such capabilities are becoming operationally essential rather than optional.

 

From assistant to strategic copilot

One of the most forward-looking discussions centered on the evolution of AI from drafting assistant to strategic copilot.

The emergence of agentic AI and orchestration systems is enabling decision support in areas such as pricing negotiation, tender simulations, and contracting strategy optimization. Rather than merely accelerating document preparation, AI is beginning to inform strategic decision-making.

However, in highly regulated settings such as HTA and pricing negotiations, transparency and explainability remain non-negotiable. The credibility of AI-driven insights depends on robust governance and clear traceability.

The opportunity is substantial — speed, standardization, and efficiency. The responsibility is equally significant.

 

3. The convergence: Complexity requires capability

The most important insight from WEPA Amsterdam lies not in policy alone, nor in AI alone, but in their convergence.

Policy reforms are increasing complexity. JCA raises expectations for comparative evidence coordination across Europe. Real-world evidence demands stronger data ecosystems. MFN pricing intensifies global interdependence and strategic sensitivity.

At the same time, AI provides the analytical and operational capabilities necessary to manage this complexity. It enables faster synthesis of comparative data, structured analysis of heterogeneous real-world evidence, and dynamic cross-market pricing simulations.

In this sense, policy pressure and AI capability are two sides of the same transformation. The former raises the bar; the latter provides the tools to reach it.

The defining question for market access organizations is whether they can redesign their operating models quickly enough to integrate policy intelligence, evidence generation, pricing foresight, and AI-enabled execution into a coherent system.

WEPA 2026 signaled that the era of treating these dynamics as separate conversations is over. Market access is entering a phase where structural policy reform and technological capability must be managed together.

Those who integrate both dimensions — responsibly, transparently, and strategically — will shape the future of evidence-based access in Europe and beyond.

The Invisibility Machine of the Women’s Health Gap

A 300-year warning

The global timeline for gender equality is not merely stalling; it is a sobering indictment of our collective priorities as a society. Current estimates from the United Nations reveal a staggering distance to parity: at our current trajectory, it will take 300 years to end child marriage, 286 years to eliminate discriminatory laws and legal protection gaps, 140 years to achieve equal representation in workplace leadership, and 47 years to reach an equal footing in national parliaments.

These are not just social milestones; they are structural barriers that define the “Gender Health Gap.” This gap represents the inequitable, systematic differences in health outcomes between women and men — differences rooted in under-researched medical needs, chronic underfunding, and a “medical model” that has historically treated male biology as the universal baseline. To close this divide, we must recognize that health equity is a strategic imperative for global stability, health capital, and economic prosperity.

 

A ledger of health inequality: The data and the reasons behind the gender gap

Sex is a fundamental genetic modifier of biology, influencing everything from disease susceptibility to treatment response. Yet we remain trapped in a “health-survival paradox”: while women generally live longer than men, they endure higher burdens of morbidity and disability throughout their lives. Some examples are:

  • Diagnostic Delays: On average, women are diagnosed nearly four years later than men for the same diseases.
  • Misdiagnosis: Women are twice as likely as men to die following a heart attack, partly because they have a 50% higher chance of receiving an incorrect initial diagnosis.
  • AI Bias: Modern digital tools often entrench these disparities; AI-powered symptom checkers have been found to flag women experiencing heart attacks as needing psychological care rather than emergency medical intervention.
  • Invisible Conditions: Many women-specific conditions are severely underdiagnosed. For example, 8 in 10 women with menopause and 6 in 10 women with endometriosis remain undiagnosed. Adenomyosis affects up to 35% of women but is often invisible in medical records due to misdiagnosis as fibroids.

 

Key reasons for the gender health gap include systematic underinvestment in research and innovation, and the intersection of biology with social factors that have historically undermined women’s equal position in society.

A primary driver of the health gap is the systemic neglect of female biology in scientific research:

  • Underfunding: Only 5% of global research and development funding is allocated to female-related research. Of this, a mere 1% goes toward women-specific conditions like menopause and fertility.
  • Clinical Trial Underrepresentation: The inclusion of women in clinical research only became a requirement in the 1990s. Today, women make up only about 41.2% of participants in key disease clinical trials. In cardiovascular drug trials, female participation averages only 34%, often failing to match the actual disease prevalence in the population.
  • Adverse Drug Reactions: Because many drugs are tested primarily on men, women have a 34% increased risk of severe adverse events. A notable example is the sleep aid Zolpidem, which stays in women’s systems longer than men’s; it took until 2013 for the FDA to require reduced dosing for women after decades of increased emergency room visits.

 

The gap is shaped not only by biological sex (genetics, hormones) but also by social gender (norms, roles) and by societal roadblocks, such as the lack of female representation in leadership positions, which directly shapes inequalities in health policy development, not only for women but for all marginalized communities.

 

Fact vs. fiction: Debunking women’s health misconceptions

Effective strategy requires dismantling the myths that have long perpetuated gender health inequality.

  • Women’s health is not synonymous with OB/GYN: Progress has been hindered by the misconception that women’s health is limited to reproductive and sexual needs. In reality, the gap spans every disease area, including neurology, immunology, and cardiovascular health, where women present with unique symptoms and risk profiles.
  • Longevity does not equal better health: The “morbidity burden” is a critical indicator of inequity. Women spend more years in poor health, facing higher disability-adjusted life year rates for musculoskeletal, neurological, and mental health disorders.
  • Inequality is not solely about race, but intersectionality is critical: While gender is a standalone driver of health outcomes, it does not exist in a vacuum. For example, Black and Native American women face the highest rates of pregnancy-related mortality, and Black women are three times more likely to die from heart failure than White women. These data points illustrate why an intersectional lens is non-negotiable for any health equity strategist.

Progress has remained largely stagnant over the last decade because women remain “invisible” in methodological and decision-making frameworks. Guidance from the International Council for Harmonisation of Technical Requirements for Pharmaceuticals for Human Use (ICH) still refers to women as a “special subgroup” to be considered “when appropriate.” This classification is mathematically and medically absurd: women represent half of the global population. This invisibility fuels a self-perpetuating cycle of data poverty. The recent FDA guidance on addressing sex differences in clinical trials is nonetheless a positive step toward recognizing this impact in clinical development.

The roadblocks to reforming health technologies and decision-making frameworks so that they address women’s health needs are not just scientific; they are structural. They include a lack of political will, the absence of gender indicators for evaluation, and entrenched gender norms and laws that leave women unprotected in health matters and beyond.

 

Conclusion

Health equity does not need to take 300 years, though the slowest-moving of these barriers must be addressed for true success to be achieved.

Big data, digital technologies, and advanced analytics provide the means to overcome the challenges to achieving women’s health equity in the coming years. Gender health equity is not an act of morality — it is the foundation of a sustainable, healthy, and economically stable future for all.