Written by Reza Jafar, Omar Irfan, and Maria Rizzo
Recent advancements in machine learning (ML) and artificial intelligence (AI) can offer tremendous potential benefits to health economics and outcomes research (HEOR), such as in cohort selection, feature selection, predictive analytics, causal inference, and economic evaluation.[1] The use of ML and AI has been previously explored in systematic literature reviews (SLRs), real-world evidence (RWE), economic modeling, and medical writing.[2-4]
In this article, we assess the evolving landscape of AI in HEOR, reflecting on recent insights and developments presented at the 2024 US conference of The Professional Society for Health Economics and Outcomes Research (ISPOR) in Atlanta.
AI in SLRs
The use of AI in SLRs is emerging as a key area of focus, as evidenced by the 19 presentations at ISPOR US 2024. These studies predominantly focused on the screening task using models such as GPT-4 and AI tools such as Rayyan and Robot Screener.
Additionally, AI tools are being used to characterize studies by specific variables of interest, including profiling adverse events and identifying relevant regulatory precedents.
However, only one presentation at the conference evaluated multiple SLR tasks, specifically screening and data extraction. This highlights the need for research on advanced tools and models capable of handling tasks across the entire SLR process.
AI performance in SLR screening and data extraction
The performance of AI tools in HEOR is typically evaluated using metrics such as accuracy, sensitivity (recall), specificity, precision, and F1-score. Studies comparing AI-assisted screening with human review reported a wide range of accuracy values for each tool and model. For example, the lowest reported accuracy was 10% with the Rayyan tool (on 951 records screened for evidence on humanistic burden), and the highest was 95% with GPT-4 on an unspecified topic.
Sensitivity also varied widely, from a minimum of 75% with DistilBERT, a natural language processing (NLP) model, to 100% with Rayyan. Specificity ranged from 8% with Rayyan to 95% with GPT-4, while precision and F1-scores ranged from 47% to 92% and from 53% to 90%, respectively, across studies.
For data extraction tasks, four studies reported AI accuracies ranging from 87% to 100%, precision from 75% to 96%, and F1-scores from 80% to 94%. However, none of the presentations reported sensitivity or specificity for AI-based data extraction, indicating scope for further research.
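All five metrics named in this section are derived from the same four confusion-matrix counts, which is why a tool can report perfect sensitivity alongside very low specificity and accuracy. The Python sketch below illustrates this under stated assumptions: human review is treated as the reference standard, and the counts are invented for illustration (they are not data from any ISPOR presentation). The names `ScreeningCounts` and `screening_metrics` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ScreeningCounts:
    """AI screening decisions cross-tabulated against human review (assumed reference)."""
    tp: int  # AI included, human included
    fp: int  # AI included, human excluded
    fn: int  # AI excluded, human included
    tn: int  # AI excluded, human excluded


def _ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def screening_metrics(c: ScreeningCounts) -> Dict[str, float]:
    """Compute the standard screening metrics from confusion-matrix counts."""
    return {
        "accuracy": _ratio(c.tp + c.tn, c.tp + c.fp + c.fn + c.tn),
        "sensitivity": _ratio(c.tp, c.tp + c.fn),  # also called recall
        "specificity": _ratio(c.tn, c.tn + c.fp),
        "precision": _ratio(c.tp, c.tp + c.fp),
        # F1 is the harmonic mean of precision and sensitivity
        "f1": _ratio(2 * c.tp, 2 * c.tp + c.fp + c.fn),
    }


# Invented example: 1,000 records, 20 truly relevant, and a tool that
# includes almost everything it sees.
over_inclusive = ScreeningCounts(tp=20, fp=900, fn=0, tn=80)
metrics = screening_metrics(over_inclusive)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

With these invented counts, the tool misses no relevant record (sensitivity 100%) yet reaches only 10% accuracy and about 8% specificity, because nearly every irrelevant record is also included. Reading any single metric in isolation can therefore be misleading.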
Time and cost savings with AI
AI’s potential to save time in HEOR processes is consistently underscored across studies. For instance, the Rayyan tool using ML demonstrated a 47% reduction in review time for screening clinical references. Compared with AI-assisted review, human reviewers spent five additional hours screening 519 publications and ten additional hours screening 985 publications. Another study found that GPT-4 could screen 100 abstracts in 10 to 30 minutes, while a fine-tuned SciFive model, leveraging powerful hardware such as the Quadro RTX 8000 GPU, completed the same task in just one to two minutes.
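To compare these figures on a common scale, the reported totals can be converted into per-record times. The short Python calculation below uses only the numbers quoted above; the per-record values are derived averages rather than reported results, and the function name `seconds_per_record` is hypothetical.

```python
def seconds_per_record(total_minutes: float, n_records: int) -> float:
    """Average time per record, in seconds."""
    return total_minutes * 60 / n_records


# Extra human time relative to AI-assisted review (5 h and 10 h, as quoted above)
extra_519 = seconds_per_record(5 * 60, 519)
extra_985 = seconds_per_record(10 * 60, 985)

# Time to screen 100 abstracts: GPT-4 took 10-30 min, fine-tuned SciFive 1-2 min
gpt4_fast, gpt4_slow = seconds_per_record(10, 100), seconds_per_record(30, 100)
scifive_fast, scifive_slow = seconds_per_record(1, 100), seconds_per_record(2, 100)

# Ratio of GPT-4 time to SciFive time, narrowest and widest case
speedup_min = 10 / 2
speedup_max = 30 / 1

print(f"Extra human time: {extra_519:.0f} s and {extra_985:.0f} s per record")
print(f"GPT-4: {gpt4_fast:.0f}-{gpt4_slow:.0f} s per abstract")
print(f"SciFive: {scifive_fast:.1f}-{scifive_slow:.1f} s per abstract")
print(f"SciFive speed-up over GPT-4: {speedup_min:.0f}x to {speedup_max:.0f}x")
```

On these figures, human-only screening added roughly 35 to 37 seconds per publication in both datasets, and the fine-tuned SciFive model was about 5 to 30 times faster than GPT-4. The two models ran in different settings, so the ratio is indicative rather than a controlled comparison.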
The evidence presented at the ISPOR conference indicated that using large language models (LLMs) in the SLR process can save substantial time while maintaining an acceptable degree of accuracy in screening and identifying the required variables, compared to human reviewers. AI-assisted screening and data extraction, followed by human checking of AI-suggested indexing, classification, and extractions, might be the most effective approach for reference screening, database indexing, and data extraction in literature reviews. This hybrid method leverages the speed and efficiency of AI while ensuring accuracy and reliability through human oversight.
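One way to picture the hybrid approach is as a small routing step between the AI suggestion and the final decision. The Python sketch below is illustrative only and does not describe any specific tool presented at the conference. The names `Suggestion` and `hybrid_screen`, the `confidence` field, and the optional `auto_accept_at` threshold are all assumptions introduced here. By default every AI suggestion is checked by a human, as described above.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional, Tuple


@dataclass
class Suggestion:
    record_id: str
    include: bool      # AI-suggested screening decision
    confidence: float  # hypothetical model confidence in [0, 1]


def hybrid_screen(
    suggestions: Iterable[Suggestion],
    human_review: Callable[[Suggestion], bool],
    auto_accept_at: Optional[float] = None,
) -> Tuple[Dict[str, bool], List[str]]:
    """Return final decisions and the ids of records a human checked.

    With auto_accept_at=None (default) a human checks every suggestion.
    Setting a threshold is an assumption of this sketch: suggestions at or
    above it are accepted without human review.
    """
    decisions: Dict[str, bool] = {}
    checked: List[str] = []
    for s in suggestions:
        if auto_accept_at is not None and s.confidence >= auto_accept_at:
            decisions[s.record_id] = s.include
        else:
            # The human sees the AI suggestion and confirms or overrides it
            decisions[s.record_id] = human_review(s)
            checked.append(s.record_id)
    return decisions, checked


# Usage with a stand-in reviewer who includes every record shown to them
batch = [Suggestion("rec-1", True, 0.99), Suggestion("rec-2", False, 0.60)]
final, checked = hybrid_screen(batch, human_review=lambda s: True)
print(final, checked)
```

In practice `human_review` would be a review interface rather than a function returning a fixed answer. Whether any suggestions can safely skip human checking is a validation question that this sketch leaves open.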
While the potential of AI in time savings was recognized, the lack of detailed reporting on cost savings remains a significant gap in current research. Addressing this gap through comprehensive and detailed studies will provide a clearer understanding of the financial benefits of AI, encourage broader adoption, and support informed decision-making for businesses, policymakers, and researchers.
AI in RWE
Another breakthrough development is the integration of AI into RWE research. Data extraction and data analysis were the most commonly assessed RWE tasks at ISPOR US 2024, with LLaMa and GPT-4 being the most frequently used models. Overall, AI improved data extraction from medical charts and notes in indications including oncology and systemic lupus erythematosus (SLE), as well as for psychiatric comorbidities. Studies also reported significant alignment between AI-generated and manually generated RWE research protocols.
AI in economic modeling
Studies presented at ISPOR US 2024 showed that LLMs achieved high accuracy in model reporting and adaptation. However, the authors emphasized the importance of expert guidance when using LLMs for HEOR model development to ensure accurate and reliable outcomes.
AI in manuscript/dossier development
Similarly, AI models such as a fine-tuned Mistral 7B model and GPT-4 demonstrated the ability to generate coherent discussions and simplify patient-facing materials in manuscript and dossier development. However, these models struggled to maintain disparate points of view.
Final takeaways
The integration of ML and AI into HEOR is gaining momentum, as demonstrated by the diverse applications presented at the ISPOR 2024 US conference. AI models and tools have shown significant potential in enhancing efficiency and accuracy across SLRs, RWE, economic modeling, and medical writing.
While the AI models demonstrated good performance, they were tested on limited data and indications. This can lead to bias and less reliable outcomes for rare diseases, emphasizing the need for continuous AI model refinement and extensive testing across a broader range of indications. Moreover, these models need to be converted into user-friendly tools for broader adoption. Current tools, such as Rayyan and Robot Screener, do not yet exhibit satisfactory accuracy, highlighting the gap between promising model performance and practical application.
Continued development, expert involvement, and cost-effectiveness evaluation are essential to maximize AI’s potential and ensure reliable results, ultimately aiming to improve healthcare decisions and patient outcomes.
Cytel is a global leader in HEOR, with a multidisciplinary team renowned for pioneering innovative methods shaping the industry. Our LiveSLR® software offers a systematic and comprehensive database of the highest quality global research that is instantly and always available. By leveraging AI and automation, LiveSLR provides a scalable and effective solution tailored to the dynamic needs of the pharmaceutical industry. It combines human expertise with technological precision to ensure the accuracy and relevance of evidence, delivering consistently high-quality and current systematic literature reviews for various health technology assessments.
References
1. Padula WV, Kreif N, Vanness DJ, Adamson B, Rueda JD, Felizzi F, Jonsson P, IJzerman MJ, Butte A, Crown W. Machine Learning Methods in Health Economics and Outcomes Research-The PALISADE Checklist: A Good Practices Report of an ISPOR Task Force. Value Health. 2022 Jul;25(7):1063-1080. doi: 10.1016/j.jval.2022.03.022. PMID: 35779937.
2. Reason T, Rawlinson W, Langham J, Gimblett A, Malcolm B, Klijn S. Artificial Intelligence to Automate Health Economic Modelling: A Case Study to Evaluate the Potential Application of Large Language Models. Pharmacoecon Open. 2024 Mar;8(2):191-203. doi: 10.1007/s41669-024-00477-8. Epub 2024 Feb 10. PMID: 38340276; PMCID: PMC10884386.
3. Adamson B, Waskom M, Blarre A, Kelly J, Krismer K, Nemeth S, Gippetti J, Ritten J, Harrison K, Ho G, Linzmayer R, Bansal T, Wilkinson S, Amster G, Estola E, Benedum CM, Fidyk E, Estévez M, Shapiro W, Cohen AB. Approach to machine learning for extraction of real-world data variables from electronic health records. Front Pharmacol. 2023 Sep 15;14:1180962. doi: 10.3389/fphar.2023.1180962. PMID: 37781703; PMCID: PMC10541019.
4. Kacena MA, Plotkin LI, Fehrenbacher JC. The Use of Artificial Intelligence in Writing Scientific Review Articles. Curr Osteoporos Rep. 2024 Feb;22(1):115-121. doi: 10.1007/s11914-023-00852-0. Epub 2024 Jan 16. PMID: 38227177; PMCID: PMC10912250.
Reza Jafar
Reza is a seasoned Data Scientist with over 10 years of experience in health technology. He specializes in artificial intelligence (AI), machine learning, deep learning, and specifically in natural language processing (NLP) and natural language understanding (NLU).
Currently, as a Principal Data Scientist at Cytel, Reza leads AI projects that leverage machine learning and NLP to drive innovations in health. One of his notable achievements includes developing an automated, patent-pending systematic literature review software using cutting-edge NLP models. This tool features an advanced NLP screener, an AI model to extract key elements from citations, and a de-duplicator feature.
Prior to his role at Cytel, Reza served as a Senior Data Scientist at MTEK Sciences, a HealthTech startup, where he developed NLP tools to automate the curation of medical literature and created models to extract and normalize bio-entities from biomedical and clinical texts. His career also includes roles as a data scientist consultant at Genentech and as a Postdoctoral Fellow at the Ottawa Heart Institute. Reza earned his PhD in Engineering from the University of Toronto in 2013.


