Generative AI in Evidence Synthesis: Harnessing Potential with Responsibility
October 21, 2025
The integration of AI into the healthcare research landscape is accelerating, with evidence synthesis an obvious area of application. From early scoping reviews to comprehensive systematic literature reviews (SLRs), AI promises to reduce manual burden and save time. However, it is crucial to understand both the strengths and limitations of AI across these applications to ensure compliance, reliability, and scientific rigor.
Knowing where it works: A targeted approach
Artificial intelligence, including generative AI models, shines when used for targeted literature reviews (TLRs) or when generating summaries of scientific articles to support evidence-based decision-making at an early development stage. AI can synthesize large volumes of information quickly, offering valuable insights during exploratory or early-phase research.
However, it’s critical to distinguish these from regulatory-facing systematic literature reviews, especially those intended for payer or health technology assessment (HTA) submissions. In this context, SLR extractions have traditionally been completed by two independent human reviewers. This human oversight ensures objectivity and reproducibility, key elements of regulatory compliance.
Expertly trained models vs. generalist giants
The current landscape is filled with large generalist language models trained on diverse internet-scale data. While impressive, these models often exhibit hallucinations — the generation of plausible but incorrect or fabricated content — particularly in domain-specific applications like evidence synthesis.
This is why domain-trained expert models are preferred. These models are fine-tuned on biomedical and scientific corpora, ensuring higher reliability and reducing the risk of misinterpretation or erroneous conclusions. They understand field-specific terminology, data structures, and compliance requirements far better than their generalist counterparts.
The imperative of data traceability
In evidence synthesis, transparency is non-negotiable. Any AI-generated output must allow users to:
- Highlight the exact source (i.e., sentence or section) of the original scientific article from which a conclusion or data point was extracted.
- Compare the model’s interpretation with the source text to identify discrepancies or nuances that could affect meaning or validity.
Using structured tags to annotate key terms, qualifiers, and relationships not only makes these comparisons clearer and more systematic but also informs advanced search and retrieval. By surfacing subtle differences, tagging supports expert review, preserves contextual integrity, and strengthens the reliability and defensibility of the synthesized evidence.
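To make this concrete, below is a minimal sketch of what a traceable, tagged extraction record could look like. The field names, class structure, and example values are purely illustrative assumptions, not a description of any specific tool or schema.

```python
# Illustrative sketch of a traceable, tagged extraction record.
# All names and values are invented for demonstration purposes.
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SourceSpan:
    """Pointer back to the exact location in the source article."""
    document_id: str   # e.g., a DOI or internal reference ID (placeholder below)
    section: str       # e.g., "Results"
    sentence: str      # verbatim sentence the value was extracted from


@dataclass
class TaggedExtraction:
    """One extracted data point, annotated for review and retrieval."""
    value: str                                   # the extracted figure or statement
    tags: List[str] = field(default_factory=list)  # key terms and qualifiers
    source: Optional[SourceSpan] = None          # traceability back to the article
    model_interpretation: str = ""               # how the model paraphrased the finding


# A reviewer can compare `model_interpretation` against `source.sentence`
# and flag any discrepancy before the value enters the evidence base.
record = TaggedExtraction(
    value="ORR 42% (95% CI 35-49)",
    tags=["overall response rate", "intention-to-treat"],
    source=SourceSpan(
        document_id="doi:10.xxxx/example",  # placeholder identifier
        section="Results",
        sentence="The overall response rate was 42% (95% CI, 35-49) in the ITT population.",
    ),
    model_interpretation="Roughly 4 in 10 patients responded to treatment.",
)
```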
Measuring what matters: Precision and beyond
Traditional evaluation metrics like precision, recall, and F1 score (the harmonic mean of precision and recall) remain foundational when assessing AI model performance in literature screening and data extraction.
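For readers who want the arithmetic spelled out, here is a minimal sketch computing all three metrics from a handful of invented include/exclude screening decisions.

```python
# Minimal sketch: precision, recall, and F1 for include/exclude screening
# decisions. The reference and predicted labels below are invented examples.
reference = ["include", "exclude", "include", "include", "exclude", "include"]
predicted = ["include", "include", "include", "exclude", "exclude", "include"]

tp = sum(r == "include" and p == "include" for r, p in zip(reference, predicted))
fp = sum(r == "exclude" and p == "include" for r, p in zip(reference, predicted))
fn = sum(r == "include" and p == "exclude" for r, p in zip(reference, predicted))

precision = tp / (tp + fp)   # of the records the model included, how many were correct
recall = tp / (tp + fn)      # of the records that should be included, how many it caught
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```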
But in generative contexts — where the task may be summarization, paraphrasing, or abstract reasoning — additional measures become valuable:
- Answer correctness: Does the output convey a factual, verifiable point?
- Semantic similarity: How closely does the AI output align in meaning with the ground truth?
- BLEU, ROUGE, and BERTScore: These Natural Language Processing metrics offer quantitative insights into the quality of generated text, especially for summarization and content generation tasks.
Selecting the right mix of these metrics provides a comprehensive view of model performance and reliability.
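As an illustration of how such metrics can be computed in practice, the sketch below scores an invented model-generated summary against an invented reference using ROUGE (via the open-source rouge-score package) and BLEU (via NLTK); BERTScore or an embedding-based semantic similarity could be added in the same way.

```python
# Minimal sketch: scoring a generated summary against a reference summary.
# Assumes the `rouge-score` and `nltk` packages are installed; both summary
# strings are invented for illustration.
from rouge_score import rouge_scorer
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = "The trial met its primary endpoint, with a 42% overall response rate."
generated = "The primary endpoint was met; the overall response rate reached 42%."

# ROUGE: n-gram and longest-common-subsequence overlap with the reference.
scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)
rouge = scorer.score(reference, generated)
print("ROUGE-1 F1:", round(rouge["rouge1"].fmeasure, 3))
print("ROUGE-L F1:", round(rouge["rougeL"].fmeasure, 3))

# BLEU: n-gram precision with a brevity penalty; smoothing avoids zero
# scores on short, single-sentence comparisons.
bleu = sentence_bleu(
    [reference.split()], generated.split(),
    smoothing_function=SmoothingFunction().method1,
)
print("BLEU:", round(bleu, 3))
```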
Where AI makes a difference: Screening and beyond
One of the most promising applications of generative AI in evidence synthesis is literature screening: assessing whether a publication (abstract or full text) meets the criteria for inclusion. Studies and pilot implementations suggest that AI can reduce screening time by up to 40%, making it a powerful ally for research teams.
AI tools have been used not only to assign a probability of inclusion to a title, abstract, or full text to guide the screening process, but also to let researchers quickly see how modifying a search strategy affects the yield (a simplified sketch of this scoring step follows below). By automating this repetitive and time-consuming phase, organizations can reallocate expert human resources to higher-value tasks, such as:
- Resolving ambiguous or context-dependent data extractions
- Validating nuanced findings and offering insights into implications of these findings
- Ensuring alignment with HTA submission standards
In this way, AI doesn’t replace human reviewers but augments them, driving efficiency without compromising accuracy.
AI with guardrails
Generative AI is reshaping the landscape of evidence synthesis, but its integration must be strategic, measured, and compliant. By combining domain-trained models, robust traceability, appropriate evaluation metrics, and human oversight, organizations can unlock the true value of AI — accelerating workflows without sacrificing quality or compliance.
When used thoughtfully, generative AI becomes more than just a tool — it becomes a partner in advancing scientific research.
Meet with us at ISPOR 2025!
Manuel Cossio and Nathalie Horowicz-Mehler will be in Glasgow for ISPOR Europe 2025! Click the link below to book a meeting, or stop by Booth #1024 to connect with our experts:
Book a meeting!
Manuel Cossio
Director, Innovation and Strategic Consulting
Manuel Cossio is Director, Innovation and Strategic Consulting at Cytel. Manuel is an AI engineer with over a decade of experience in healthcare AI research and development. He currently leads the creation of generative AI solutions aimed at optimizing clinical trials, focusing on hierarchical multi-agent systems with multistage data governance and human-in-the-loop dynamic behavior control.
Manuel has an extensive research background with publications in computer vision, natural language processing, and genetic data analysis. He is a registered Key Opinion Leader at the Digital Medicine Society, a member of the ISPOR Community of Interest in AI, a Generative AI evaluator for the EU Commission, and an AI researcher at the UB-UPC-Barcelona Supercomputing Center.
He holds an M.Sc. in Translational Medicine from Universitat de Barcelona, a Master of Engineering in AI from Universitat Politècnica de Catalunya, and an M.Sc. in Neuroscience from Universitat Autònoma de Barcelona.
Read full employee bio
Nathalie Horowicz-Mehler
Global Head of Value
Nathalie brings more than 25 years of experience in health economics and real-world evidence to Cytel’s Evidence, Value and Access business. She has a proven track record of driving growth by leading teams of scientist entrepreneurs and applies her clinical expertise as an epidemiologist and experience with HEOR, RWE/RWD and advanced analytics to catalyze innovations in public health.
Nathalie has held leadership roles at IQVIA, ConcertAI, and Exponent. As Senior Vice President and General Manager at ConcertAI, she leveraged her expertise to successfully establish their health economics and epidemiology practice. Most recently, she served as Principal Scientist at Exponent, focusing on pharmaceutical sector advancements and healthcare innovation.
Nathalie holds an MS from Tufts University School of Medicine and both an MPH and a PhD from Columbia University Mailman School of Public Health.
Read full employee bio
Claim your free 30-minute strategy session
Book a free, no-obligation strategy session with a Cytel expert to get advice on how to improve your drug’s probability of success and plot a clearer route to market.