A New Frontier in Real-World Evidence: Can AI Create Reliable Synthetic Trial Data?

Synthetic data is a promising innovation for clinical studies that incorporate an external control arm. Relying solely on traditional control groups can be costly, time-consuming, or even unethical. Instead, researchers are exploring ways to generate “synthetic” patient cohorts that behave like real ones.

At ISPOR US 2026 in Philadelphia, we will be presenting a pilot study that takes an important step in this direction. Our work explores how large language models (LLMs), the same technology behind modern AI assistants, can be used to generate synthetic clinical trial datasets suitable for external control arms (ECAs).

 

Why external control arms matter

ECAs are increasingly important in clinical trials, especially in areas where recruiting patients into placebo groups is difficult or undesirable. By using existing data to simulate a control group, researchers can accelerate trials and reduce patient burden. The recruitment difficulty is especially pronounced in rare diseases, oncology, gene and cell therapies, and severe or life‑threatening conditions, where patients and clinicians are understandably reluctant to accept randomization to non‑active treatment arms.

However, for ECAs to be useful, they must meet two critical requirements:

  1. They need to closely resemble real patient populations in terms of demographics and clinical characteristics.
  2. They must protect patient privacy and be reproducible for regulatory scrutiny.

This is where AI — and specifically LLMs — enters the picture.

 

Two approaches to generating synthetic data

In our study, we evaluated two different ways of using LLMs to generate synthetic clinical trial datasets.

The first approach was direct generation. Here, the LLM was given access to the original dataset along with a variable dictionary and asked to generate a new synthetic dataset in a single step. This method is fast and intuitive.

The second approach was more structured and code-driven. Instead of generating the dataset directly, the LLM created a Python-based pipeline that performed bootstrapping and anonymization. This pipeline included a noise injection mechanism, where small amounts of statistical “noise” were added to numerical variables. The noise was carefully calibrated — set to 5% of each variable’s standard deviation — and values were constrained within realistic ranges to maintain clinical plausibility.
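
To make the idea concrete, here is a minimal sketch of a bootstrap-plus-noise step, not the pipeline from the study. The function name `synthesize`, the toy records, and the bounds are hypothetical, and it assumes that bootstrapping means resampling whole patient rows with replacement and that "constrained within realistic ranges" means clipping to fixed limits. Only the 5% of standard deviation noise scale and the cohort size of 100 come from the text.

```python
import random
import statistics


def synthesize(rows, bounds, n_out=100, noise_frac=0.05, seed=0):
    """Resample patient rows with replacement, then perturb numeric fields.

    rows   -- list of dicts, one per patient (toy data here, not trial data)
    bounds -- {numeric_variable: (low, high)} plausible ranges (assumed values)
    """
    rng = random.Random(seed)
    # Noise scale per numeric variable: a fixed fraction of its sample SD.
    scale = {
        name: noise_frac * statistics.stdev([row[name] for row in rows])
        for name in bounds
    }
    synthetic = []
    for _ in range(n_out):
        record = dict(rng.choice(rows))  # bootstrap: draw one row with replacement
        for name, (low, high) in bounds.items():
            noisy = record[name] + rng.gauss(0.0, scale[name])
            record[name] = min(max(noisy, low), high)  # keep values plausible
        synthetic.append(record)
    return synthetic


# Hypothetical toy cohort and ranges, for illustration only.
original = [
    {"age": 34, "bmi": 22.1, "sex": "F"},
    {"age": 51, "bmi": 27.4, "sex": "M"},
    {"age": 47, "bmi": 31.0, "sex": "F"},
    {"age": 62, "bmi": 24.8, "sex": "M"},
    {"age": 29, "bmi": 19.9, "sex": "F"},
    {"age": 58, "bmi": 29.3, "sex": "M"},
    {"age": 45, "bmi": 26.5, "sex": "F"},
    {"age": 70, "bmi": 33.2, "sex": "M"},
]
bounds = {"age": (18, 90), "bmi": (15.0, 50.0)}
synthetic = synthesize(original, bounds, n_out=100)
```

Fixing the seed makes the output repeatable, which is the property that lets a reviewer rerun the same steps and audit them. Categorical fields such as sex are copied unchanged from the resampled row in this sketch.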

 

What we found

Both methods were able to generate synthetic cohorts of 100 patients, demonstrating that LLMs can indeed produce usable clinical datasets.

However, the differences between the two approaches were striking. The direct generation method was extremely fast, completing the task in just 23 seconds with a single prompt. In contrast, the code-based approach took longer — around 40 seconds — and required multiple iterations to refine the pipeline.

Despite the extra effort, the code-driven method delivered better results. It more accurately preserved the statistical properties of the original trial data, including key variables like age, body mass index, sex, and race. The distributions in the synthetic dataset closely matched those of the real population, suggesting that the combination of bootstrapping and calibrated noise was effective.
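
The post does not say which similarity metrics were used, so the following is only one plausible way to check how well a synthetic cohort preserves the original distributions. It assumes a standardized mean difference for numeric variables such as age and a maximum gap in category proportions for variables such as sex. All function names and values are hypothetical.

```python
import statistics


def standardized_mean_difference(real, synth):
    """Difference in means divided by the pooled standard deviation."""
    pooled_sd = ((statistics.variance(real) + statistics.variance(synth)) / 2) ** 0.5
    return (statistics.mean(synth) - statistics.mean(real)) / pooled_sd


def max_proportion_gap(real, synth):
    """Largest absolute difference in category share between the two cohorts."""
    levels = set(real) | set(synth)
    return max(
        abs(real.count(level) / len(real) - synth.count(level) / len(synth))
        for level in levels
    )


# Hypothetical values, for illustration only.
real_age = [34, 51, 47, 62, 29, 58, 45, 70]
synth_age = [35, 50, 48, 61, 30, 57, 46, 69]
real_sex = ["F", "M", "F", "M"]
synth_sex = ["F", "F", "F", "M"]

smd = standardized_mean_difference(real_age, synth_age)
gap = max_proportion_gap(real_sex, synth_sex)
```

Values close to zero on both measures indicate that the synthetic cohort tracks the original on that variable. Any threshold for "close enough" would have to be set by the analysis plan.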

 

Speed vs. scientific rigor

These findings highlight an important trade-off. Direct LLM generation is excellent for rapid prototyping and exploratory analysis. It allows researchers to quickly create synthetic datasets with minimal effort.

But when it comes to regulatory-grade applications — such as external control arms used in decision-making — transparency and control become essential. The code-augmented approach provides a clear, reproducible process that can be audited and validated. This level of rigor is crucial for building trust with regulators and stakeholders.

 

Balancing privacy and realism

A key challenge in synthetic data generation is protecting patient privacy without losing the statistical integrity of the dataset. Our study shows that adding carefully calibrated Gaussian noise can strike this balance.

By scaling noise to the variability of each variable and enforcing realistic bounds, we were able to anonymize the data while preserving meaningful population-level characteristics. This approach helps ensure that synthetic datasets remain useful for analysis while reducing the risk of re-identification.

 

What comes next?

While this pilot study demonstrates the potential of LLM-generated synthetic cohorts, it is only the beginning. Future research needs to explore whether these methods are robust under more challenging conditions.

One critical next step is to evaluate re-identification risk, particularly under adversarial scenarios where attackers actively attempt to reverse-engineer the data. It will also be important to compare noise-based approaches with other privacy-preserving techniques, such as differential privacy. That comparison should also clarify how much noise the pipeline needs to introduce.
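
For contrast with the fixed 5% noise scale, the sketch below shows how differential privacy ties the amount of noise to a privacy parameter. It applies the standard Laplace mechanism to a bounded mean. This was not part of the pilot study, and it releases a private aggregate rather than a synthetic cohort. The function name `dp_mean`, the bounds, and the epsilon value are illustrative assumptions.

```python
import random


def dp_mean(values, low, high, epsilon, seed=None):
    """Release the mean of bounded values with Laplace noise (epsilon-DP)."""
    rng = random.Random(seed)
    clipped = [min(max(v, low), high) for v in values]
    # Changing one record moves the mean by at most (high - low) / n.
    sensitivity = (high - low) / len(clipped)
    scale = sensitivity / epsilon
    # The difference of two exponential draws with mean `scale` is Laplace(0, scale).
    noise = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
    return sum(clipped) / len(clipped) + noise


# Hypothetical ages and bounds, for illustration only.
ages = [34, 51, 47, 62, 29, 58, 45, 70]
private_mean = dp_mean(ages, low=18, high=90, epsilon=1.0, seed=0)
```

Here the noise scale follows from the variable's allowed range, the cohort size, and epsilon, not from the observed standard deviation. A smaller epsilon means more noise and a stronger privacy guarantee.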

 

Closing thoughts

Synthetic data has the potential to transform clinical research by making trials faster, more efficient, and more ethical. Our findings suggest that LLMs can play a meaningful role in this transformation — but how they are used matters.

Fast, direct generation offers convenience, but structured, code-based approaches provide the reliability and transparency needed for real-world adoption. As the field moves forward, combining the strengths of both may unlock the full potential of AI-driven synthetic data in healthcare.

 

Interested in learning more?

Join Manuel Cossio and Deepa Jahagirdar, along with Anupama Vasudevan, at ISPOR US for their upcoming presentation, “A Pilot Assessment of LLM-Generated Synthetic Cohorts: A First Step Toward Robust Synthetic Control Arms” on May 18 at 4:00 PM.

How Agentic AI Can Transform HTA Landscaping for EU JCA

Health Technology Assessment (HTA) in the European Union (EU) is entering a new phase with the introduction of the EU Joint Clinical Assessment (JCA). The goal of the new HTA regulation is to improve the availability of innovative health technologies in the EU by ensuring efficient resource use and strengthening the scientific quality of HTA across Member States (MS).

At the heart of this process is the JCA scope, which consolidates diverse evidence requests from all MS into the PICO (Population, Intervention, Comparator, Outcome) framework. Anticipating these policy-driven PICO requests is critical for a successful JCA submission and can turn into a complex, time- and labor-intensive exercise. In addition to understanding the potentially diverse clinical practices across the MS, it demands an in-depth assessment of the different national HTA evidence requirements. Teams working on PICO predictions need a clear mapping of what evidence has been accepted, questioned, or rejected across the different HTA systems. Building that mapping is multifaceted.

 

Why HTA landscaping is challenging

HTA landscaping requires careful review of past HTA decisions to understand what evidence leads to positive HTA outcomes. This involves identifying relevant patient populations, accepted comparators, and meaningful outcomes. It also requires digging deeper into the HTA documentation to uncover why certain choices were criticized or dismissed.

Much of this information is buried in long reports, sometimes in appendices. These HTA documents are written in different languages, follow different formats, and often include subtle but important contextual details that explain the HTA critiques and the reasoning behind specific evidence requests. As a result, landscaping is still largely manual, time-consuming, and difficult to scale.

 

What makes agentic AI different

Agentic AI offers a new way to approach this problem. Instead of simply summarizing documents or answering one-off questions, agentic systems are designed to carry out structured tasks. They can follow a defined set of instructions, extract specific types of information, and organize results in a consistent way.

This makes them particularly suited for HTA landscaping, where the goal is not just to read documents, but to systematically extract comparable insights across multiple sources.

 

Our research: Using AI agents for HTA extraction

In our recent research, which will be presented at ISPOR US this May, we explored how autonomous AI agents can support HTA landscaping for EU JCA.

We developed two large language model–based agents designed to extract structured information from HTA reports using a set of 21 expert-defined questions. These questions covered both standard PICO elements, such as population, comparators, and outcomes, and more context-specific insights, including methodological requirements, reasons for rejecting certain outcomes or comparators, and other critique points raised by HTA bodies.

The two agents differed in how they were guided. The first used a general prompt, while the second incorporated additional clarification within selected questions to improve contextual understanding.

 

How we evaluated performance

To test the agents, we used publicly available HTA reports for osimertinib (in locally advanced or metastatic NSCLC with EGFR T790M mutation) from Spain, the Netherlands, and France. These reports varied in length, structure, and language, providing a realistic test of performance.

Local HTA experts applied a strict scoring framework that assessed both accuracy and completeness. Importantly, any answer containing hallucinated content was automatically scored as zero. This ensured that reliability remained central to the evaluation.
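
The scoring rule can be sketched as follows. The post only states that accuracy and completeness were assessed and that any hallucinated content forced a score of zero. The 0 to 1 scale, the equal-weight average, and the names `Rating` and `score` are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Rating:
    accuracy: float       # expert rating on an assumed 0-1 scale
    completeness: float   # expert rating on an assumed 0-1 scale
    hallucinated: bool    # True if the answer contains unsupported content


def score(rating):
    """Combine the two ratings; any hallucination overrides them with zero."""
    if rating.hallucinated:
        return 0.0
    # Equal weighting is an assumption; the post does not give the formula.
    return (rating.accuracy + rating.completeness) / 2


fully_correct = score(Rating(accuracy=1.0, completeness=1.0, hallucinated=False))
partially_correct = score(Rating(accuracy=1.0, completeness=0.5, hallucinated=False))
penalized = score(Rating(accuracy=1.0, completeness=1.0, hallucinated=True))
```

The override means an otherwise complete answer earns nothing if it contains invented content, which is how the framework keeps reliability ahead of coverage.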

 

What we found

Both agents were able to complete the full extraction across all HTA reports, and around 90% of responses were generated without hallucinations. The second agent performed better overall, achieving a higher number of fully correct answers and fewer partially correct responses.

The first agent, while still effective, produced some hallucinated content, particularly in the Spanish report. The second agent avoided hallucinations entirely in this evaluation. Both agents performed best on the French HTA report, suggesting that clearer structure and language can improve AI performance.

One of the most important findings was the impact of prompt design. Adding targeted clarification significantly improved the agent’s ability to interpret and extract complex HTA information.

 

What this means for EU JCA landscaping

These results suggest that agentic AI can meaningfully improve how HTA landscaping is performed. By automating structured extraction, it becomes possible to review multiple reports more quickly and consistently. This allows teams to build a more comprehensive understanding of the landscape in less time.

Importantly, this approach goes beyond standard PICO elements. It captures the context-specific insights that often drive HTA decisions, such as methodological concerns or other reasons for rejecting evidence. This is critical for developing realistic PICO scenarios in the context of JCA.

Another key advantage is the ability to work across languages. Since EU HTA involves multiple jurisdictions, multilingual capability removes a major barrier and enables a more unified analysis.

 

The role of human expertise

Despite these advances, AI alone is not enough. Some limitations remain, including occasional hallucinations and variability depending on the source material. For this reason, human oversight continues to be essential.

The most effective approach is to combine agentic AI with human HTA expertise. AI can handle large-scale extraction and structuring of information, while experts validate the outputs and ensure that interpretations are accurate and relevant.

 

Looking ahead

Agentic AI is unlikely to replace HTA professionals, but it will fundamentally reshape how they work. By reducing the burden of manual review, it frees experts to focus on higher-value activities such as interpretation, strategic planning, and decision-making.

In the context of EU JCA, this shift brings clear advantages. It enables faster, more scalable landscaping and PICO predictions, helping to identify potential evidence gaps earlier in the process. As the methodology evolves, further testing will extend the agent-driven workflows to HTA reports from additional MS. At the same time, engineering adaptations may be needed as local HTA documents continue to change alongside the JCA reports.

 

Interested in learning more?

Manuel Cossio and Lilia Leisle will be presenting their poster “Accelerating Dynamic HTA Landscaping in Oncology Through Autonomous Generative AI-Driven Multilingual Data Extraction” at ISPOR US on May 18 at 4 PM. We hope to see you there!

Building a New Evidence Base for Rare Diseases by Structuring Clinical Narratives with Generative AI

Rare diseases present a paradox in modern healthcare. Individually, they affect small populations, yet collectively they impact millions of patients worldwide. Despite this, progress in diagnosis, treatment, and research remains slow. The fundamental challenge is not only scientific complexity but also a persistent lack of usable data.

Traditional sources of real-world data — electronic health records, claims databases, and clinical trials — struggle to capture rare disease populations at a meaningful scale. Patients are geographically dispersed, frequently misdiagnosed, and often excluded from structured datasets. As a result, generating robust evidence in rare diseases remains difficult.

At the same time, an overlooked resource has quietly accumulated over decades: clinical case reports. These narratives contain detailed descriptions of real patients, their symptoms, diagnostic journeys, and outcomes. The challenge has never been their value, but rather their accessibility and structure.

Recent advances in large language models (LLMs) suggest that this barrier may finally be overcome.

 

Case reports as a foundation for real-world evidence

Case reports represent one of the richest forms of clinical documentation available. Unlike structured datasets, they capture the full nuance of patient care, including symptom evolution, diagnostic uncertainty, and physician reasoning. They are inherently real-world, reflecting how diseases actually present and are managed in practice.

However, their utility has historically been limited. Case reports are written in free text, scattered across millions of publications, and lack standardization. Extracting meaningful insights at scale has required significant manual effort, making systematic use impractical.

The RareArena study demonstrates a new approach. By leveraging LLMs, researchers were able to automatically collect and process hundreds of thousands of case reports from PubMed, filter them for rare diseases, and transform them into a structured dataset comprising tens of thousands of patient cases. This process effectively converts unstructured clinical narratives into analyzable real-world data.

This shift is significant. It reframes case reports not as isolated anecdotes, but as components of a scalable data asset.

 

From unstructured text to scalable patient populations

One of the most important implications of this approach is the ability to expand patient populations in rare disease studies. Traditional datasets are constrained by institutional boundaries and data availability. In contrast, case reports aggregate knowledge globally, capturing patients from diverse healthcare systems and settings.

By structuring these reports, LLMs enable the creation of virtual cohorts that far exceed what any single registry or database could provide. Diagnoses can be standardized using reference ontologies, symptoms can be normalized, and cases can be grouped into clinically meaningful categories.

The RareArena dataset, for example, spans thousands of rare diseases and tens of thousands of patient cases, representing one of the broadest collections of rare disease data assembled to date. This kind of scale opens new possibilities for understanding disease heterogeneity, identifying subpopulations, and generating evidence where none previously existed.

In effect, LLMs allow researchers to move from fragmented observations to aggregated real-world populations.

 

Capturing the diagnostic journey

A particularly valuable aspect of the RareArena framework is its alignment with real clinical workflows. The dataset distinguishes between two stages of diagnosis: early suspicion based on symptoms alone, and confirmation after diagnostic testing.

This distinction mirrors how rare diseases are encountered in practice. Patients often experience long diagnostic odysseys, with years passing before a correct diagnosis is reached. By separating these stages, the dataset captures both the uncertainty of early presentation and the clarity provided by confirmatory tests.

This structure enables deeper analysis of diagnostic pathways, including where delays occur and how different signals contribute to clinical decision-making. It also provides a foundation for developing tools that support earlier recognition of rare diseases, an area where unmet need remains substantial.

 

Preserving clinical complexity in real-world data

A common limitation of many real-world datasets is the loss of clinical nuance. Structured data often simplifies patient information, omitting negative findings, confounding symptoms, and contextual details that are critical for diagnosis.

Case reports, by contrast, preserve this complexity. The RareArena study shows that most cases retain features such as negative symptoms and confounding factors, reflecting the challenges physicians face in real-world settings. This makes the resulting dataset not only large, but also clinically realistic.

Maintaining this level of detail is essential for rare diseases, where subtle distinctions can significantly alter diagnosis and treatment. LLMs play a key role here by rephrasing and structuring text while preserving the underlying clinical information.

The result is a form of real-world data that is both scalable and rich in context.

 

Implications for research and clinical development

The ability to generate structured datasets from case reports has far-reaching implications. For researchers, it enables the study of rare diseases across larger and more diverse populations than previously possible. Patterns of presentation, progression, and response to treatment can be explored with greater statistical power.

In clinical development, this approach offers new ways to identify and characterize patient populations. It can support the design of clinical trials by highlighting underrepresented groups and informing inclusion criteria. It also provides a potential source of external evidence, complementing traditional trial data.

Beyond research, there is a clear opportunity to improve clinical decision support. The RareArena study demonstrates that LLMs already show meaningful capability in diagnosing rare diseases, particularly when provided with comprehensive clinical information. While not yet sufficient for standalone use, these models can assist clinicians by surfacing relevant diagnostic possibilities.

 

Limitations and considerations

Despite its promise, this approach is not without limitations. Case reports are inherently selective, often focusing on unusual or severe presentations. This introduces potential bias in the resulting datasets. Additionally, the data is retrospective and curated, rather than continuously collected.

LLMs themselves introduce another layer of complexity. While they are effective at extracting and structuring information, they can also propagate errors or introduce subtle inaccuracies. Ensuring data quality and validation remains critical.

The RareArena study also highlights that even the most advanced models are far from perfect in diagnostic tasks, particularly in early-stage scenarios. This reinforces the need to view these tools as augmentative rather than autonomous.

 

A shift from data scarcity to data unlocking

What emerges from this work is a broader shift in how we think about data in rare diseases. The challenge is no longer solely about collecting new data, but about unlocking the value of existing information.

Case reports represent decades of accumulated clinical knowledge. With LLMs, it becomes possible to systematically extract, structure, and scale that knowledge into usable real-world data. This approach does not replace traditional data sources, but it significantly expands the available evidence base.

For rare diseases, where every patient case is valuable, this shift is particularly impactful.

 

Toward a more complete picture of rare diseases

The combination of case reports and large language models offers a compelling new pathway for advancing rare disease research. By transforming unstructured narratives into structured datasets, it enables the creation of larger, more representative patient populations and more realistic models of clinical care.

While challenges remain, the potential is clear. This approach can accelerate diagnosis, inform clinical development, and ultimately contribute to better outcomes for patients who have long been underserved.

In a field defined by scarcity, the ability to unlock hidden data may prove to be one of the most important innovations yet.

Leveraging RWE Innovations to Inform Clinical Strategy and Strengthen Healthcare Decision-Making

Real-world evidence (RWE) is no longer a supporting actor, but rather a strategic asset that should be embedded across the product lifecycle.

We now have tools that were unimaginable a decade ago: synthetic data that preserves privacy while enabling scenario modeling and early go/no‑go decisions, external control arms (ECAs) to strengthen single‑arm trials and accelerate access in high unmet need settings, and decentralized long‑term extensions via tokenization that reduce burden while capturing 10+ years of safety and effectiveness across the patient’s real-world journey.

These innovations aren’t just “nice to have.” They are how we accelerate access to needed therapies, demonstrate value with confidence, and build submissions that stand up to today’s scrutiny.

Here, I discuss how these capabilities are reshaping clinical strategy and unlocking smarter, faster, more equitable evidence generation.

 

Generating synthetic data with agentic AI

Synthetic data is artificially generated data that mimics the statistical properties of real data without containing identifiable patient information. Starting with appropriate patient-level real-world data (RWD) or randomized controlled trial (RCT) data sources, sponsors can use an AI-supported pipeline to generate a synthetic dataset, then assess its similarity to the original data to gauge success.

Synthetic data can:

  • Inform early go/no-go decisions: A cost-effective approach to optimizing asset strategy before large investments by simulating expected outcomes under various scenarios in Phase I–II.
  • Inform clinical trial design: Model alternative controls and sample sizes, and stress-test treatment effects, in a cost-effective manner.
  • Build privacy-preserving, cost-effective ECAs: Build an ECA partly (in combination with RWD) or entirely from a fully de-identified synthetic cohort. This is not yet suitable for regulatory purposes, but it can inform provider and payer decisions.

RWD has its limitations: to be useful it must closely resemble real patient populations and protect patient privacy, and collecting it can be costly, time-consuming, and in some settings ethically difficult. Synthetic data can help overcome these challenges.

 

Strengthen regulatory submission with an external control arm

External control arms use data from historical RCTs or RWD when randomization is not feasible or ethical, or to power or accelerate a study where there is high unmet need.

ECAs can:

  • Strengthen single-arm trials (SATs): Provide contextual information for SAT regulatory submissions, increasing the probability of success.
  • Accelerate access to needed therapies: For RCTs in areas of high unmet need (e.g., accelerated approval pathway) and/or with slow recruitment, RWD can augment the control arm.
  • Support a lifecycle management approach: Support label expansions to new populations (e.g., to male breast cancer) or new lines of therapy for decisions by regulators, payers, and providers.

While RCTs are considered the “gold standard,” the FDA in 2023 wrote that “externally controlled studies may be considered” (with strong justification), while in 2025, the EMA guidance stated “in some situations, causal conclusions may be derived from a setting where the investigational medicinal product data was collected under a clinical trial protocol while the control arm was not a randomized arm in that same protocol.”

 

Assess long-term outcomes with long-term extension studies

Decentralized long‑term extensions of RCTs assess long-term outcomes (safety and effectiveness) with or without drug provision. The extension enables follow-up of tokenized trial patients via real-world databases or direct-to-patient data collection.

Long‑term extension studies can:

  • Allow for long-term follow-up: Cost-effective data collection by reducing site and patient burden while collecting key safety and effectiveness endpoints over 10+ years.
  • Enable earlier launch: For breakthrough therapies and high unmet need, launch can occur as soon as clinical efficacy is proven if the sponsor commits to a Phase IV study to collect long-term data.
  • Improve representativeness: Loss to follow-up in long-term studies can introduce bias, and RCTs often under-represent certain populations. The shift to real-world endpoints makes the insights more relevant to decision-makers.

 

Key takeaways

Consider RWE as a strategic asset: Integrate RWE early, anticipate post-marketing collection of long-term data, and adopt causal inference methods to uphold standards of safety and effectiveness.

Invest in robust RWD: Invest in RWD quality and governance to ensure credibility with regulators and payers.

Adopt a comprehensive strategy: Adopt flexible, hybrid evidence strategies that combine synthetic data, ECAs, and long-term real-world data collection approaches.

Ensure cross-functional readiness: Medical, regulatory, biostats, and data science must operate as one evidence engine.

Insights from WEPA Amsterdam: When Policy Pressure Meets AI Maturity

The World EPA Congress in Amsterdam did not feel like a conference about isolated trends. It felt like a conference about structural transition.

Across sessions and conversations, one consistent narrative emerged: market access is being reshaped simultaneously by tightening policy frameworks and by the operational maturation of artificial intelligence. These are not parallel stories unfolding independently. They are interacting forces that together are redefining how evidence is generated, how value is assessed, and how global pricing strategies are constructed.

The underlying question throughout WEPA was not whether change is coming. It was whether organizations are structurally prepared to manage both forces at once.

 

1. A policy environment under structural redesign

Joint Clinical Assessment: Harmonization meets operational reality

The first year of Joint Clinical Assessment (JCA) implementation under the EU HTA Regulation represents a historic step toward harmonization of clinical evaluations across Europe. In principle, a single European-level clinical assessment promises efficiency, reduced duplication, and greater consistency in evaluating comparative effectiveness.

Yet the operational reality is more complex. Harmonization does not automatically mean simplification.

Early experience indicates that alignment between EU-level assessments and national reimbursement processes remains incomplete. Questions persist around how Member States will operationalize JCA outputs, how quickly EU HTAR assessors can deliver assessments, and whether national HTA bodies are fully prepared to transition to reliance on joint evaluations.

Methodological challenges are also emerging. PICO multiplicity, expanded evidence requirements, and the risk of unexpected analytical requests are increasing the burden on evidence generation teams, especially for products targeting rare diseases. While duplication of assessments may decrease, the sophistication and coordination required to navigate the system are increasing.

JCA is a milestone in European collaboration. But its success will depend on tighter synchronization between EU-level clinical conclusions and national pricing and reimbursement realities.

 

Real-world evidence: From complementary input to strategic pillar

Alongside JCA, the role of real-world evidence (RWE) is evolving rapidly. Regulators, payers, and clinicians increasingly seek insight into how therapies perform in routine clinical practice across diverse populations. The European Medicines Agency has clearly signaled its ambition to place patient voice and real-world data at the center of regulatory evaluation.

RWE is no longer supplementary. It is becoming central.

However, tension remains within the EU HTAR context. JCA assessments emphasize statistical precision and internal validity, while real-world evidence reflects the inherent heterogeneity of clinical practice. Methodological expectations between regulatory and HTA frameworks are not yet fully synchronized.

Europe now faces a strategic choice: either build robust, interoperable infrastructures for high-quality real-world data sharing across Member States, or risk creating friction between regulatory innovation and HTA conservatism. The credibility of future evidence strategies will depend on resolving this gap.

 

MFN pricing: Global interdependence redefines strategy

At the global level, Most-Favored-Nation (MFN) pricing dynamics are reshaping launch and market access strategies beyond the United States. Pricing has become an interconnected global system rather than a sequence of independent national decisions.

Launch sequencing is being reassessed as companies evaluate exposure to international reference pricing and MFN-linked rules. Markets are increasingly categorized by strategic risk, and cross-market interdependence is intensifying. Decisions taken in one jurisdiction reverberate across others.

Europe, despite its strong regulatory institutions, faces pressure due to fragmented access pathways, evolving JCA processes, and uncertainty in national budget negotiations. The traditional logic of “where to launch first” has become a far more complex strategic equation.

Taken together, JCA implementation, the rise of RWE, and MFN pricing pressures are increasing analytical complexity, accelerating timelines, and demanding greater coordination across functions and geographies. This rising structural pressure forms the backdrop to the second defining theme of WEPA.

 

2. AI moves from experimentation to operating model

From hype to governance

If policy discussions reflected systemic pressure, AI discussions reflected systemic adaptation.

The tone around artificial intelligence at WEPA 2026 was notably mature. The conversation quickly moved beyond questioning whether AI is hype. The focus shifted toward responsible operationalization, governance, and measurable value creation within regulated environments.

The key issue is no longer adoption. It is integration.

Organizations are developing governance frameworks, embedding AI into regulated workflows, and ensuring traceability and auditability of outputs. The emphasis is on scale and accountability rather than isolated experimentation.

 

AI as infrastructure in market access

Across sessions, AI was framed not as a productivity enhancement tool but as part of the operating model of modern market access organizations.

Companies are redesigning processes around AI-enabled capabilities. Evidence synthesis, systematic literature reviews, indirect treatment comparisons, dossier drafting, pricing simulations, and tender strategy development are increasingly supported by automated or semi-automated systems.

This represents a structural shift. AI is moving from peripheral pilot projects to enterprise-level infrastructure embedded within core functions.

In an environment where JCA increases analytical burden and MFN pricing demands multi-country scenario modeling, such capabilities are becoming operationally essential rather than optional.

 

From assistant to strategic copilot

One of the most forward-looking discussions centered on the evolution of AI from drafting assistant to strategic copilot.

The emergence of agentic AI and orchestration systems is enabling decision support in areas such as pricing negotiation, tender simulations, and contracting strategy optimization. Rather than merely accelerating document preparation, AI is beginning to inform strategic decision-making.

However, in highly regulated settings such as HTA and pricing negotiations, transparency and explainability remain non-negotiable. The credibility of AI-driven insights depends on robust governance and clear traceability.

The opportunity is substantial — speed, standardization, and efficiency. The responsibility is equally significant.

 

3. The convergence: Complexity requires capability

The most important insight from WEPA Amsterdam lies not in policy alone, nor in AI alone, but in their convergence.

Policy reforms are increasing complexity. JCA raises expectations for comparative evidence coordination across Europe. Real-world evidence demands stronger data ecosystems. MFN pricing intensifies global interdependence and strategic sensitivity.

At the same time, AI provides the analytical and operational capabilities necessary to manage this complexity. It enables faster synthesis of comparative data, structured analysis of heterogeneous real-world evidence, and dynamic cross-market pricing simulations.

In this sense, policy pressure and AI capability are two sides of the same transformation. The former raises the bar; the latter provides the tools to reach it.

The defining question for market access organizations is whether they can redesign their operating models quickly enough to integrate policy intelligence, evidence generation, pricing foresight, and AI-enabled execution into a coherent system.

WEPA 2026 signaled that the era of treating these dynamics as separate conversations is over. Market access is entering a phase where structural policy reform and technological capability must be managed together.

Those who integrate both dimensions — responsibly, transparently, and strategically — will shape the future of evidence-based access in Europe and beyond.

What a New Study on AI Adoption in US Hospitals May Tell Us About the Future of Real-World Data

Artificial intelligence is becoming increasingly common in US hospitals. Nearly half of hospitals surveyed in 2023–2024 reported using AI-based predictive models — but adoption is not evenly distributed across the country. Some regions and health systems are moving quickly, while others — particularly those in healthcare shortage areas — are adopting more slowly.

These findings come from “The Landscape of AI Implementation in US Hospitals,” led by Yeon-Mi Hwang and colleagues and published in Nature Health in 2026 [1]. The study analyzes data from more than 3,500 hospitals nationwide and maps where predictive AI tools are being implemented and where they are not.

At first glance, this may seem like a technology adoption story. In reality, it is also a data story.

As healthcare increasingly relies on real-world data (RWD) for research, regulatory decisions, safety monitoring, and value-based payment models, the way hospitals adopt AI could directly influence the quality and coverage of the data being produced across the United States.

 

AI adoption signals digital maturity

Hwang and colleagues found that interoperability — the ability of hospital systems to exchange and integrate data — was the strongest predictor of AI adoption. Hospitals with better health information exchange capabilities and fewer data-sharing barriers were much more likely to implement predictive AI tools.

This matters because AI systems require structured, standardized, and well-integrated data to function effectively. When hospitals invest in AI, they often strengthen their documentation practices, data governance, and system integration in the process. Those same improvements elevate the overall quality of clinical data.

In other words, hospitals that are ready for AI are often also ready to produce higher-quality RWD.

 

Why high-adoption regions may produce richer RWD

Predictive AI systems frequently generate structured outputs such as risk scores, alerts, and time-stamped predictions. These outputs are recorded in electronic health records and become part of the clinical data landscape.

As a result, regions with higher AI adoption may generate data that is more complete, more standardized, and better linked across care settings. Their records may contain clearer severity markers, earlier detection signals, and more consistent documentation of clinical decision points.

This is why high-adoption regions may produce richer RWD. The data is not only documented — it is more granular and more measurable.

Because the study shows that AI adoption clusters geographically, these differences in data richness may also cluster by region.

 

The geography gap

One of the more striking findings in the study is that hospitals in healthcare shortage areas and medically underserved regions were less likely to adopt predictive AI. These areas often include rural and resource-constrained institutions.

If these hospitals have less advanced digital infrastructure, the data they generate may be more fragmented and less standardized. Over time, this could create meaningful differences in data coverage across the country. Regions with strong AI adoption may produce deeper, more analyzable datasets, while underserved areas may remain underrepresented in national RWD pipelines.

That imbalance could influence which populations are most visible in research and regulatory evidence.

 

AI changes the shape of the data

AI adoption does not simply improve data capture — it can also shape how care is delivered and recorded. Predictive systems may trigger alerts, influence documentation patterns, and alter clinical workflows. These changes become embedded in patient records.

As a result, RWD from high-adoption environments may reflect AI-influenced care pathways, while RWD from lower-adoption settings reflects more traditional workflows. Differences in adoption may therefore create differences not only in data volume, but also in data structure and interpretation.

 

Why this matters for real-world evidence

Real-world data increasingly underpins post-market surveillance, comparative effectiveness research, regulatory decision-making, and value-based care arrangements. If richer, more granular data clusters in digitally advanced regions, then the evidence generated from national datasets may disproportionately reflect those environments.

This is not necessarily intentional. It is a structural consequence of uneven infrastructure development. But without attention to digital equity, disparities in AI adoption could gradually translate into disparities in evidence generation.

 

The bottom line

The nationwide analysis by Yeon-Mi Hwang and colleagues offers one of the clearest early views of how AI is spreading across US hospitals. Because AI adoption is closely tied to interoperability, digital maturity, and institutional capacity, it likely influences how real-world data is captured, structured, and represented.

High-adoption regions may produce richer RWD — data that is more complete, more granular, and better connected across care settings. At the same time, uneven adoption raises important questions about representativeness and equity in national datasets.

Understanding how AI adoption is expanding — and where it remains limited — may become a key factor in strengthening the US data ecosystem. If increasing AI adoption leads to more complete and structured RWD, it could significantly enhance the power and reliability of real-world evidence. But ensuring that this digital maturity is broadly distributed will be essential. Otherwise, the strength of future RWE may reflect infrastructure patterns as much as clinical reality.

As AI becomes more embedded in healthcare, how and where it is implemented may quietly shape not only care delivery — but the evidence base that guides it.

Enhancing Pharmacovigilance: Leveraging Generative AI to Transform Patient Safety Narratives

Rethinking clinical documentation with generative AI

Generative artificial intelligence (AI) is rapidly reshaping the landscape of clinical documentation. Traditionally, writing patient safety narratives (PSNs) for Clinical Study Reports (CSRs) has required hours of manual data extraction and synthesis — a time-consuming process that slows pharmacovigilance workflows.

New advances in large language models (LLMs), such as Google Gemini, are demonstrating how AI can generate coherent, accurate narratives from structured clinical data. By doing so, these models promise to improve both speed and consistency while maintaining compliance with International Council for Harmonization (ICH) standards.

 

Study overview: Automating PSNs with a RAG framework

In our recent study, we explored how a retrieval-augmented generation (RAG) system could automate PSN drafting for semaglutide-related adverse events. The system merged structured case data with adaptive AI prompting techniques — specifically, Automatic Prompt Engineering (APE) — to optimize the factual accuracy of the generated narratives.

Using an ICH E3–aligned template, the model generated PSNs across four key sections:

  1. Patient Demographics and Study Information
  2. Relevant Medical History
  3. Adverse Event (AE) Details
  4. Laboratory and Diagnostic Findings

Thirty published case reports were analyzed to assess how well the AI performed in extracting, contextualizing, and summarizing information.
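
The four-section structure above can be sketched as a simple prompt-assembly step. This is an illustrative outline only, not the study's actual pipeline: retrieval and Automatic Prompt Engineering are omitted, and the function name, field names, and example case are hypothetical.

```python
SECTIONS = [
    "Patient Demographics and Study Information",
    "Relevant Medical History",
    "Adverse Event (AE) Details",
    "Laboratory and Diagnostic Findings",
]

def build_prompt(case, sections=SECTIONS):
    """Assemble one drafting prompt from structured case fields.

    `case` maps a section title to a dict of extracted fields.
    Sections with no data are flagged explicitly rather than skipped,
    so gaps in the source stay visible to the reviewer.
    """
    lines = ["Draft a patient safety narrative using only the facts below."]
    for title in sections:
        lines.append(f"## {title}")
        fields = case.get(title)
        if not fields:
            lines.append("NOT REPORTED IN SOURCE")
            continue
        for key, value in fields.items():
            lines.append(f"- {key}: {value}")
    return "\n".join(lines)

# Hypothetical case record; values are invented for illustration.
example_case = {
    "Patient Demographics and Study Information": {"age": "54", "sex": "F"},
    "Adverse Event (AE) Details": {"event": "nausea", "onset": "week 2"},
}
prompt = build_prompt(example_case)
```

Flagging empty sections in the prompt is one possible way to surface the kind of missing demographic or temporal detail that lowers narrative quality.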

 

Measuring quality and efficiency

Each AI-generated narrative was evaluated by clinical documentation experts on a 1–10 scale across multiple criteria — including completeness, clarity, and accuracy. Evaluation metrics focused on core demographic details, drug administration data, adverse event description, and diagnostic relevance.

The average processing time per case was approximately 10 seconds, compared to the several hours typically required for manual PSN drafting. For pharmacovigilance teams, this cuts drafting time per case from hours to seconds.

 

Key results

The AI-generated narratives achieved an average composite score of 7.5/10 for narrative quality.

  • Highest-performing areas included:
    • Accuracy of AE/SAE Identification (9.8/10)
    • Relevance of Key Findings (9.8/10)
    • Disease/Treatment Context Accuracy (9.4/10)
    • Extraction of Prior Medications (9.0/10)

These results underscore the model’s strength in synthesizing clinical information into concise, ICH-compliant summaries.

However, the patient demographics section scored lower (6.4–7.0), mainly due to missing temporal details or incomplete demographic data. These gaps reflected the model’s sensitivity to inconsistencies in source reports — a known challenge in real-world data processing.

 

Discussion: The balance between automation and oversight

Our findings reveal that integrating generative AI within a structured RAG framework can significantly accelerate PSN drafting without compromising clinical accuracy. The approach supports a hybrid workflow in which AI handles repetitive data synthesis, while human reviewers focus on interpretation, validation, and scientific review.

Still, the study also highlights that expert oversight remains essential. Variability across cases — especially when data formats or terminology differ — underscores the importance of human supervision to ensure contextual completeness and regulatory compliance.

 

The road ahead

Future research will refine prompt design through adaptive APE techniques to improve temporal and contextual accuracy. Expanding the framework across multiple therapeutic areas and languages will be key to scaling adoption in global regulatory environments.

By combining AI-driven generation with expert validation, pharmacovigilance teams can achieve the best of both worlds: faster, more accurate, and more standardized safety documentation.

 

Key takeaways

AI tools — when integrated with structured RAG systems — hold enormous promise for the future of pharmacovigilance. They can dramatically reduce drafting time, enhance consistency, and allow safety experts to focus where it matters most: interpreting data and protecting patients.

ELEVATE-GenAI: A New Guideline for Reporting Generative AI in HEOR Workflows

Generative artificial intelligence (AI), particularly large language models (LLMs), is increasingly embedded in health economics and outcomes research (HEOR) workflows. Researchers are now using these tools to support activities such as systematic literature reviews, health economic modeling, and real-world evidence generation.

As adoption grows, so does a fundamental question for the HEOR community:

How should the use of generative AI be transparently and consistently reported within HEOR workflows?

To address this question, the International Society for Pharmacoeconomics and Outcomes Research (ISPOR) Working Group on AI has developed ELEVATE-GenAI — a reporting guideline specifically designed to document and communicate how generative AI is used in HEOR research.

 

Why a dedicated reporting guideline is needed

HEOR has a strong tradition of structured reporting, supported by well-established standards for systematic reviews, economic evaluations, and real-world evidence. However, the rapid integration of LLMs into HEOR workflows has outpaced the development of HEOR-specific guidance on how their use should be reported.

LLMs are now being applied to:

  • Screening and classifying abstracts in systematic literature reviews
  • Extracting data and assessing bias
  • Building or replicating health economic models
  • Transforming unstructured real-world data into analyzable formats

While these applications offer efficiency and scalability, they also introduce new challenges related to transparency, reproducibility, factual accuracy, bias, uncertainty, and data governance. Existing AI reporting guidelines do not fully address these challenges in the context of HEOR decision-making, regulatory review, or health technology assessment (HTA).

ELEVATE-GenAI was developed to fill this gap by providing clear, HEOR-specific guidance for reporting the use of generative AI within research workflows.

 

What is ELEVATE-GenAI?

ELEVATE-GenAI is a reporting framework and checklist intended for HEOR studies in which generative AI plays a substantive role in evidence generation, synthesis, or analysis. Its goal is not to evaluate the performance of specific AI tools or to prescribe how AI should be used, but rather to ensure that AI-assisted workflows are clearly described, interpretable, and reproducible.

The guideline is designed to support:

  • Authors, by clarifying what information should be reported
  • Reviewers and editors, by enabling consistent evaluation
  • HTA bodies and regulators, by improving transparency and trust

Importantly, ELEVATE-GenAI is not intended for studies that use AI only for minor tasks such as editing or formatting text. Instead, it applies when generative AI meaningfully influences HEOR outputs.

 

Reporting generative AI across HEOR workflows: The 10 ELEVATE domains

At the center of ELEVATE-GenAI is a set of 10 reporting domains that together describe how generative AI is integrated into HEOR workflows and how its outputs are assessed.

 

1. Model characteristics

This domain ensures clarity about what AI system was used. Authors are encouraged to report the model name and version, developer, access method, license type, architecture, and — where available — training and fine-tuning data sources.

 

2. Accuracy assessment

Accuracy reporting focuses on how closely AI-generated outputs align with expected or correct results, using task-appropriate benchmarks such as expert review, gold-standard datasets, or quantitative performance measures.
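
As one example of a quantitative performance measure, the sketch below compares AI include/exclude decisions in abstract screening against a gold-standard set. The function name and the toy labels are assumptions for illustration; the guideline itself does not prescribe specific metrics.

```python
def screening_metrics(gold, predicted):
    """Compare include/exclude decisions (True = include) to a gold standard."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g and p)
    tn = sum(1 for g, p in pairs if not g and not p)
    fp = sum(1 for g, p in pairs if not g and p)
    fn = sum(1 for g, p in pairs if g and not p)
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else None,
        "specificity": tn / (tn + fp) if tn + fp else None,
        "accuracy": (tp + tn) / len(pairs) if pairs else None,
    }

# Toy labels, invented for illustration.
gold = [True, True, False, False, True, False]
pred = [True, False, False, True, True, False]
metrics = screening_metrics(gold, pred)
```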

 

3. Comprehensiveness assessment

Comprehensiveness addresses whether AI outputs fully cover all relevant elements of a task — for example, whether all key studies were captured in a literature review or all required components were included in an economic model.

 

4. Factuality verification

This domain emphasizes verification of factual correctness, including identifying and correcting hallucinated citations, incorrect data, or unsupported claims generated by the model.

 

5. Reproducibility and generalizability

Authors are encouraged to document prompts, parameters, workflows, and model versions to support reproducibility, and to discuss whether the AI-assisted approach can be applied to similar HEOR questions or settings.
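
One lightweight way to document these items is a run manifest stored alongside the outputs. The sketch below is a hypothetical record format, not something specified by ELEVATE-GenAI; the model name, version string, and parameter names are placeholders.

```python
import hashlib
import json

def run_manifest(model_name, model_version, prompt, parameters):
    """Build a record of one AI run plus a fingerprint of its settings."""
    record = {
        "model_name": model_name,
        "model_version": model_version,
        "parameters": parameters,
        "prompt": prompt,
    }
    # Sorting keys makes the fingerprint independent of dict ordering.
    canonical = json.dumps(record, sort_keys=True, ensure_ascii=False)
    record["fingerprint"] = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return record

# Placeholder values for illustration only.
params = {"temperature": 0.0, "seed": 7}
manifest = run_manifest("example-llm", "2026-01", "Classify this abstract.", params)
repeat = run_manifest("example-llm", "2026-01", "Classify this abstract.", params)
changed = run_manifest("example-llm", "2026-01", "Classify this abstract.",
                       {"temperature": 0.2, "seed": 7})
```

Identical settings yield identical fingerprints, so a reviewer can quickly check whether two reported runs used the same configuration.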

 

6. Robustness checks

Robustness reporting addresses how sensitive AI outputs are to changes in inputs, such as minor prompt variations, ambiguous wording, or typographical errors.
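
A robustness check of this kind can be sketched as an agreement rate across perturbed prompts. The classifier below is a deliberately brittle stand-in for a real LLM call, and all names and prompts are hypothetical.

```python
from collections import Counter

def toy_classifier(prompt):
    """Stand-in for an LLM call: 'include' if the text mentions 'randomized'."""
    return "include" if "randomized" in prompt.lower() else "exclude"

def agreement_rate(model, base_prompt, variants):
    """Share of perturbed prompts whose output matches the base output."""
    baseline = model(base_prompt)
    outputs = [model(v) for v in variants]
    matches = sum(1 for o in outputs if o == baseline)
    return matches / len(outputs), Counter(outputs)

base = "Is this a randomized controlled trial of drug X?"
variants = [
    "Is this a RANDOMIZED controlled trial of drug X?",    # casing change
    "Is this a randomised controlled trial of drug X?",    # spelling variant
    "is this a randomized  controlled trial of drug X ?",  # spacing noise
    "Is this a randmized controlled trial of drug X?",     # typo
]
rate, counts = agreement_rate(toy_classifier, base, variants)
```

Here the stand-in model agrees with its baseline on only half of the variants, which is the kind of sensitivity this domain asks authors to report.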

 

7. Fairness and bias monitoring

Where applicable, studies should assess whether AI outputs introduce or reinforce biases related to demographic or population characteristics relevant to HEOR analyses.

 

8. Deployment context and efficiency

This domain captures practical aspects of AI deployment, including hardware and software configurations, processing time, scalability, and resource requirements — factors that influence real-world feasibility.

 

9. Calibration and uncertainty

Calibration focuses on whether AI confidence aligns with actual performance and how uncertainty is handled, such as defining thresholds for human review in hybrid AI–human workflows.
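
The two ideas in this domain, a review threshold and a confidence-versus-performance comparison, can be sketched as follows. The threshold of 0.8, the function names, and the example values are assumptions chosen for illustration.

```python
def route_for_review(items, threshold=0.8):
    """Split (item_id, confidence) pairs into auto-accept and human-review lists."""
    auto = [item_id for item_id, conf in items if conf >= threshold]
    review = [item_id for item_id, conf in items if conf < threshold]
    return auto, review

def calibration_gap(confidences, correct):
    """Mean stated confidence minus observed accuracy.

    A positive value suggests overconfidence, a negative one underconfidence.
    """
    if not confidences or len(confidences) != len(correct):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_conf = sum(confidences) / len(confidences)
    accuracy = sum(1 for c in correct if c) / len(correct)
    return mean_conf - accuracy

# Invented example values.
items = [("a", 0.95), ("b", 0.60), ("c", 0.85), ("d", 0.40)]
auto, review = route_for_review(items)
gap = calibration_gap([0.9, 0.9, 0.9, 0.9], [True, True, True, False])
```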

 

10. Security and privacy measures

Authors should describe how sensitive data, intellectual property, and regulatory requirements (e.g., GDPR or HIPAA) are addressed when generative AI is used in HEOR workflows.

 

Each domain is accompanied by reporting guidance and an assessment of metric maturity, recognizing that some areas — such as fairness and uncertainty — are still evolving.

 

From framework to practice: The ELEVATE checklist

To facilitate adoption, ELEVATE-GenAI includes a practical checklist that translates the 10 domains into concrete reporting questions. An optional scoring system allows authors and reviewers to summarize reporting completeness, while emphasizing that this score is not a measure of methodological quality or study validity.
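
A reporting-completeness summary along these lines could be sketched as below. The scoring rule used here (the fraction of domains reported) is an assumption for illustration and may differ from the scoring system in the published checklist; as the guideline stresses, such a score describes reporting, not study quality.

```python
DOMAINS = [
    "Model characteristics",
    "Accuracy assessment",
    "Comprehensiveness assessment",
    "Factuality verification",
    "Reproducibility and generalizability",
    "Robustness checks",
    "Fairness and bias monitoring",
    "Deployment context and efficiency",
    "Calibration and uncertainty",
    "Security and privacy measures",
]

def reporting_completeness(reported):
    """Return (fraction of domains reported, list of domains not reported).

    `reported` maps a domain name to True/False; absent domains count as
    not reported. Assumed scoring rule: a simple unweighted fraction.
    """
    unknown = set(reported) - set(DOMAINS)
    if unknown:
        raise ValueError(f"unknown domains: {sorted(unknown)}")
    missing = [d for d in DOMAINS if not reported.get(d, False)]
    score = (len(DOMAINS) - len(missing)) / len(DOMAINS)
    return score, missing

# Hypothetical manuscript that reports the first seven domains only.
score, missing = reporting_completeness({d: True for d in DOMAINS[:7]})
```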

The authors demonstrate the applicability of the guideline by retrospectively applying it to two published HEOR studies — one focused on systematic literature review automation and another on health economic modeling. These examples show how ELEVATE-GenAI can be used to consistently describe AI-assisted workflows across different HEOR applications and to identify areas where reporting can be strengthened.

 

Why ELEVATE-GenAI matters for HEOR

As generative AI becomes more deeply integrated into HEOR workflows, transparent reporting is essential to maintain scientific credibility and stakeholder trust. ELEVATE-GenAI provides a shared structure for documenting how AI is used, how outputs are evaluated, and what limitations may affect interpretation.

By establishing common expectations for reporting generative AI in HEOR, ELEVATE-GenAI supports responsible innovation while aligning with the needs of journals, HTA bodies, and regulators.

 

Final takeaways

ELEVATE-GenAI positions itself as a foundational guideline for reporting the use of generative AI in HEOR workflows. By focusing on transparency, reproducibility, and interpretability, it helps ensure that AI-augmented research can be critically assessed and confidently used in healthcare decision-making.

As a living guideline, ELEVATE-GenAI will continue to evolve alongside advances in generative AI — providing the HEOR community with a practical framework for integrating new technologies without compromising rigor or trust.

 

Interested in learning more?

Read the full paper: “ELEVATE-GenAI: Reporting Guidelines for the Use of Large Language Models in Health Economics and Outcomes Research: An ISPOR Working Group Report.”

Looking Ahead to 2026 and Beyond: Views, News, and PHUSE

At the outset, a disclaimer. This piece is potentially “old hat” for you, as it comes from someone who has retired from executive/managerial roles. But wait! One can never retire from observing, admiring, and, therefore, learning. “With all thy getting, get understanding” (a biblical verse inscribed in a Cytel founder’s office) is etched in my mind, hence the insatiable quest to keep absorbing.

What’s in store in the year ahead and beyond? A few things come to my mind:

 

AI and even more AI

I know, I know. You have probably overdosed on reading about AI. Still, here are my two cents in short bullets.

  • You gotta learn to use AI seriously. Like it or not. So, you better like it.
  • You don’t need to become an AI expert, just a skilled user.
  • Examine your job description. Anything routine/mechanical is going to evaporate with AI magic. So, amplify your focus on innovating, creating, and original thinking.
  • Don’t trust AI blindly. Find smart ways to validate what it churns out.

While AI usage is still in a nascent stage, early adopters of smart prompt engineering and dependable validation will be at a great advantage for future opportunities.

Here at Cytel we have access to a first-rate suite of AI tools. Judicious and ingenious use paves excellent career growth pathways. Go get started!

 

Domain knowledge shall reign supreme

Through my 28 years at Cytel, every occasion of learning something new about drug development brought me new opportunities. Whether it was a complex therapeutic area, how adaptive designs are crafted, how drug delivery works, or how a DMC functions, a little bit of enlightenment went a long way in delivering greater value to a client. Regardless of one’s specialization (the “horizontal”), the domain “vertical” opens doors to career growth. I see that becoming even more prominent going forward. For example, real-world data (RWD) is helping accelerate and enhance drug development, and I have seen young statisticians get excellent opportunities based on their deepening understanding of RWD.

 

Jack of all trades

I have been a firm believer in broader (not just deeper) knowledge working wonders. Occasionally, when I was pushed into supporting business development (e.g., crafting RFP responses, or making a pre-sales demo and presentation), the value of knowing a little bit of everything dawned bright and clear. This year and beyond, I feel sure versatility will be a big virtue, both for value delivery to the client and, therefore, for one’s own career.

 

GCCs (Global Capability Centers) gain traction

Knowledge-focused companies like Cytel are ideally suited to become skilled competency centers serving global sponsors. The three-decade-old idea of SDFs in the Software Industry is reincarnating now through the concept of GCCs in our domain. Deep scientific knowledge, when combined with a deep understanding of a specific sponsor’s processes and specialties, is invaluable. “Outsourcing” began with simple cost saving as the core proposition. That has rapidly matured toward 1) tapping large talent pools; 2) innovation and intellectual property creation; and 3) specialized CoEs (Centers of Excellence). In 2026 and beyond, I foresee GCCs becoming knowledge powerhouses. And I foresee global biopharma continuing to welcome specialist service providers to host the GCCs, in addition to their own DIY versions.

 

PHUSE APAC Connect

From expressing the news and my views, let me now move on to PHUSE. This global Healthcare Data Science Community, over the past two decades, initially held annual conferences all across Europe. It then spread its wings to the US with the CSS (Computational Sciences Symposium), partnering with the US FDA, and then to the “US Connect” annual conferences.

It is now making a grand debut in the Asia Pacific region. The first-ever “APAC Connect” of PHUSE is scheduled for February 19–21 in Hyderabad, India. PHUSE has a large following in the APAC region, with over 10,000 members spread across India, China, Japan, Singapore, Malaysia, Australia, and several other countries.

What’s more, this event will include the India CDISC Day 2026!!!

 

This event will address a few major themes.

  • GCCs in the APAC region. This region has the unique advantage of a huge talent pool and is moving up from cost efficiency to innovation hubs and CoEs.
  • Impact of AI. How AI will reshape careers and leadership in drug development. This topic will figure across panel discussions, presentations, and the leadership stream.
  • There will also be a panel discussion on upcoming innovations in drug development that are going to be potential game-changers.

If you are attending the event, use the PHUSE app to curate your personalized agenda and schedule, choosing among the multiple parallel streams.

 

Cytel has always been a big participant at PHUSE events. Consider these snippets:

  • Several first-time Cytel presenters have won best presentation prizes
  • We have been exhibitors and sponsors at many PHUSE events
  • A few folks, like Angelo Tinazzi from our Geneva office, are celebrated contributors to a number of PHUSE initiatives. Angelo authored the much-acclaimed eBook The Good Data Submission Doctor on Data Submission and Data Integration to the FDA.
  • A Cytelian, having served as a PHUSE Board Member, and being instrumental in bringing PHUSE to Asia, has been invited to chair the Inaugural APAC Connect. Guess who that is!😊
  • Two more Cytelians, Pratibha Jalui and Sudipta Basu, are serving as Stream Co-Chairs.
  • Angelo will be the EU Connect Chair later this year (he served as the Co-Chair last year) in Glasgow, Scotland.
  • This is the first time ever that Cytelians have been chosen for this privilege.
  • At the PHUSE APAC Connect, we have lots of Cytel presenters: Corey Dunham, Pratibha Jalui, Diganta Bose, Aboli Katdare, Charles Warne, Pradip Maske, Chandan Patel Malyala, and Anoop Rawat. We will also have an exhibit booth (#4) with Mansha Sachdev representing our marketing team.

 

Personally, PHUSE has been a booster rocket for my professional career. It brought numerous opportunities to engage with three significant audiences:

  • Industry peers, exchanging ideas and co-driving initiatives
  • Prospects among big pharma and biotech, several of whom later became clients
  • A talent pool of bright young professionals, some of whom joined Cytel to enhance our ever-growing brainpower

 

The APAC Connect 2026 has a rich 2.5-day agenda that spans keynote speeches, panel discussions, presentations, hands-on workshops, software demonstrations, a poster session, and a couple of networking events.

 

The bottom line

We at Cytel have an exemplary track record of bringing rigorous data science to the service of human health outcomes. That’s our raison d’être!

Together, let’s take that forward in 2026 and beyond!

 

Meet with us!

Will you be attending PHUSE APAC Connect in Hyderabad, India, this February? Stop by Booth 4 to get to know our experts and learn how Cytel is shaping the future of data‑driven drug development, or click below to book a meeting to discuss career opportunities at Cytel:

From Regulators to Reimbursement: What the EMA-FDA AI Principles Mean for HEOR

In January 2026, the European Medicines Agency (EMA), together with the U.S. Food and Drug Administration (FDA), took an important step by publishing the “Guiding Principles of Good AI Practice in Drug Development.” This document is more than a technical checklist: it is a clear signal that regulators are getting serious about how artificial intelligence (AI) should be developed, validated, governed, and, ultimately, trusted across the medicines lifecycle.

While the principles are formally framed around drug development, their implications go well beyond non-clinical and clinical domains. For Health Economics and Outcomes Research (HEOR), this guidance offers something the field has long needed: a credible regulatory blueprint for responsible AI use that could help agencies move from cautious experimentation to structured adoption.

 

Why this matters now

AI is already being used across HEOR — whether for real-world evidence generation, economic modeling, patient segmentation, or long-term outcome prediction. Yet, despite methodological innovation, acceptance by HTA bodies and payers remains uneven. One of the key barriers is not capability, but confidence: confidence in transparency, robustness, reproducibility, and governance.

By articulating shared principles for AI use, the EMA and its partners are laying the groundwork for that confidence. Importantly, they are doing so in a way that aligns closely with the questions HTA agencies ask every day: What is this model for? What risks does it introduce? Can we trust the outputs? And how do we manage it over time?

 

A bridge to HEOR: Learning from regulatory leadership

We have already seen how regulatory clarity can accelerate adoption. The UK, for example, has actively explored how AI can be used to support evidence generation and decision-making in health systems. The EMA-FDA principles create an opportunity to extend this momentum across Europe and beyond, including into HEOR and HTA decision frameworks.

Although all ten principles are relevant, four stand out as particularly transformative for HEOR.

 

Four principles with outsized impact on HEOR

1. Human-centric by design

This principle explicitly anchors AI development in ethical and human-centric values. For HEOR, this is critical. Economic models and real-world analyses directly influence access, reimbursement, and, ultimately, patient outcomes.

A human-centric approach reinforces that AI in HEOR should support, not replace, expert judgement. It legitimizes hybrid workflows where analysts, clinicians, patients, and decision-makers remain central, while AI enhances scale, speed, and insight. This framing directly addresses common HTA concerns about “black box” decision-making.

 

2. Risk-based approach

Not all AI use cases carry the same consequences, and this principle explicitly recognizes that. This makes it particularly powerful for HEOR.

Using AI to automate literature screening does not pose the same risk as using it to inform long-term survival extrapolations or pricing decisions. A risk-based approach allows proportionate validation, governance, and oversight — making AI adoption more realistic and scalable for both developers and agencies.

This is precisely the kind of nuance HTA bodies need to move beyond binary “acceptable/not acceptable” positions on AI.

 

3. Risk-based performance assessment

Closely linked, the EMA and FDA emphasize that performance assessment should consider the complete system, including human-AI interaction, and be tailored to the intended context of use.

For HEOR, this reframes validation away from abstract accuracy metrics and toward decision relevance. The key question becomes: Is this AI fit-for-purpose for the policy or reimbursement decision it supports? This aligns naturally with HTA thinking and opens the door to more pragmatic, decision-focused validation frameworks.

 

4. Life cycle management

Perhaps the most underappreciated principle in HEOR today is life cycle management. The EMA and FDA highlight the need for ongoing monitoring, re-evaluation, and management of issues such as data drift.

HEOR models are often treated as static artefacts, yet AI-enabled models evolve as data, clinical practice, and populations change. Recognizing AI as a living system — not a one-off submission — could fundamentally change how HTA agencies think about post-submission evidence generation, managed entry agreements, and reassessment over time.
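
To make "data drift" concrete, here is a minimal sketch of one common way to monitor it: the population stability index (PSI), which compares how a variable is distributed in the data a model was built on with how it is distributed in newer data. The guidance does not prescribe this metric. The function name, the example ages, and the 0.25 alert threshold are illustrative assumptions, and the threshold is only a widely quoted rule of thumb.

```python
import math


def population_stability_index(baseline, current, n_bins=5):
    """Illustrative PSI between two samples of one numeric variable.

    Bins are equal-width over the baseline range; values in `current`
    that fall outside that range are counted in the nearest edge bin.
    """
    lo, hi = min(baseline), max(baseline)
    width = (hi - lo) / n_bins
    if width == 0:
        width = 1.0  # all baseline values identical: avoid dividing by zero

    def shares(values):
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), n_bins - 1)  # clamp into a valid bin
            counts[idx] += 1
        eps = 1e-6  # floor so empty bins do not produce log(0)
        return [max(c / len(values), eps) for c in counts]

    b, c = shares(baseline), shares(current)
    return sum((ci - bi) * math.log(ci / bi) for bi, ci in zip(b, c))


# Hypothetical example: patient ages at model build time vs. a later cohort.
ages_at_submission = list(range(100))
ages_at_reassessment = [age + 30 for age in ages_at_submission]

ALERT_THRESHOLD = 0.25  # assumed rule of thumb, not a regulatory value

psi = population_stability_index(ages_at_submission, ages_at_reassessment)
if psi > ALERT_THRESHOLD:
    print(f"PSI = {psi:.2f}: population has shifted, consider re-evaluation")
```

A check like this could run on each data refresh, with the alert threshold and the follow-up action agreed in advance as part of the model's governance plan.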

 

From drug development to HTA: An opportunity not to miss

This guidance is explicitly focused on drug development, but its principles are intentionally broad and collaborative. They invite extension, adaptation, and harmonization across jurisdictions and evidence domains.

For HEOR, this is an opportunity. By aligning AI methods with regulatory expectations early — rather than waiting for explicit HTA-specific rules — the field can help shape how agencies evaluate AI-enabled evidence. In doing so, HEOR can move from being a passive recipient of regulation to an active contributor to responsible AI adoption.

 

Looking ahead

AI will not replace HEOR expertise — but it will increasingly shape how evidence is generated, synthesized, and interpreted. These guiding principles offer a shared language to discuss trust, risk, and value. If agencies apply similar thinking to HEOR, we may finally see a path toward consistent, transparent, and confident use of AI in reimbursement and access decisions.

In that sense, this guidance is not just about AI in drug development. It is about preparing the entire evidence ecosystem — including HEOR — for a future where intelligent systems are used responsibly, transparently, and in service of better patient outcomes.

 

Interested in learning more?

Watch our recent webinar, “AI in HEOR: Case Studies on Navigating Regulatory and HTA Guidance,” on demand, featuring experts Dalia Dawoud, Manuel Cossio, Sheena Singh, and Cale Harrison: