Choosing AI for Clinical Workflows: What the Transparency Index Tells Us About Model Quality


January 15, 2026

The life sciences industry is reaching a turning point. Large language models (LLMs) are no longer experimental tools; they are becoming embedded in the day-to-day work of protocol writing, SAP drafting, biostatistical programming, data analysis, CSR generation, and even regulatory communication pre-drafting. Organizations are beginning to ask not whether they should use AI, but which model they should trust with some of the most sensitive and scientifically consequential tasks in drug development.

This question becomes even more urgent in light of the 2025 Foundation Model Transparency Index (FMTI), which evaluates how openly model developers disclose information about data, compute, evaluation methods, and governance. The findings show a steep decline in transparency overall. As the graph below illustrates, only a handful of companies — most notably IBM, Writer, and AI21 Labs — score above 60%. Many of the most widely used frontier models fall far lower, with OpenAI at 35%, Google at 41%, and Anthropic at 46%. For clinical development teams deciding which LLM to integrate into validated workflows, these gaps are too large to ignore.

[Figure: 2025 Foundation Model Transparency Index, overall transparency scores by developer]

Why transparency must be the first filter for model selection

Clinical development is a regulated environment where every analytical step must be traceable, auditable, and scientifically defensible. That makes transparency the first criterion — not model size, not benchmark performance, not popularity. Transparency determines whether you can validate a model’s outputs, understand its failure modes, and integrate it safely into processes governed by GxP expectations and regulatory submissions.

The findings offer a stark reminder that transparency is not evenly distributed across the AI ecosystem. A large gap separates enterprise-focused developers — who tend to score high on transparency — from consumer-facing or hybrid companies, whose disclosure practices are far more limited. For pharma, the companies at the top of the transparency rankings are the ones most aligned with enterprise governance needs.


Understanding what really matters: Data, compute, and evaluation rigor

The FMTI highlights where transparency is most lacking: training data provenance and training compute. For clinical development, these blind spots matter because training data shapes a model’s understanding of medical terminology, regulatory structure, statistical concepts, and scientific nuance. Without clarity on data sources, organizations cannot evaluate whether an LLM was exposed to clinical trial–relevant content, whether copyrighted text was used, or whether biases exist that could affect outputs like safety narratives or eligibility criteria.

Compute transparency may seem less vital, but it correlates strongly with engineering discipline, model stability, and reproducibility. Models backed by clear documentation of training processes tend to produce more reliable, less erratic outputs — qualities that matter when an AI system is writing code, generating protocol text, or summarizing patient data.

The same applies to evaluation rigor. While many companies publish capability claims, very few release details sufficient for independent replication. The FMTI graph helps contextualize this: companies with the highest scores are also those more likely to provide reproducible evaluations and detailed documentation. These are crucial qualities when determining whether a model can handle clinical tasks such as explaining statistical tests, drafting SAP language, or interpreting adverse event patterns.


Choosing the right partner, not just the right model

The transparency disparities shown in the graph underscore an essential reality: selecting an LLM for clinical development is as much about choosing the developer as the model. Enterprise-focused developers consistently outperform frontier labs and consumer-oriented companies because they design their systems with compliance, governance, and documentation in mind.

When evaluating AI vendors for clinical development, organizations should look for evidence of stable disclosure practices, detailed model documentation, governance frameworks, and clarity about training processes. Companies with declining transparency — those trending downward in the FMTI rankings — introduce risk, especially as regulators begin requiring more visibility into the AI systems used in biomedical workflows.


A transparency framework for selecting LLMs

Using insights from the FMTI, organizations can apply a simple but powerful sequence when choosing an LLM:

  1. Start with transparency: eliminate models whose developers do not disclose how they were trained or evaluated.
  2. Evaluate domain relevance: determine whether the training data and tuning strategies support clinical and biomedical reasoning.
  3. Assess methodological reproducibility: ensure the model’s documented performance can be independently validated.
  4. Consider governance maturity: prioritize developers with clear update logs, risk policies, and enterprise support systems.

The companies at the top of the graph tend to check these boxes. Those at the bottom typically do not.
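The four-step sequence above can be sketched as a simple screening function. This is an illustrative sketch only: the class, field names, and the 60% threshold are assumptions for the example, and the candidate entries use placeholder names rather than real developers, since the boolean judgments would come from an organization's own vendor review.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    transparency: int        # FMTI-style transparency score, percent (step 1)
    domain_relevant: bool    # training data/tuning supports clinical reasoning (step 2)
    reproducible_evals: bool # documented performance can be independently validated (step 3)
    governance_mature: bool  # update logs, risk policies, enterprise support (step 4)

def screen(candidates, min_transparency=60):
    """Apply the transparency-first filter, then rank survivors by score."""
    passing = [
        c for c in candidates
        if c.transparency >= min_transparency  # step 1: transparency gate
        and c.domain_relevant                  # step 2: domain relevance
        and c.reproducible_evals               # step 3: reproducibility
        and c.governance_mature                # step 4: governance maturity
    ]
    return sorted(passing, key=lambda c: c.transparency, reverse=True)

# Placeholder inputs; an organization would populate these from its own review.
pool = [
    Candidate("Developer A", 72, True, True, True),
    Candidate("Developer B", 46, True, True, True),   # fails the transparency gate
    Candidate("Developer C", 65, False, True, True),  # fails domain relevance
]
print([c.name for c in screen(pool)])  # → ['Developer A']
```

The point of the ordering is that transparency acts as a hard gate before any capability comparison: a candidate that cannot be audited never reaches the ranking step, no matter how well it benchmarks.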


Transparency by design for AI agents

The 2025 FMTI suggests that as AI systems increasingly take the form of agents embedded across clinical development workflows, transparency may need to be considered early in the design process. AI agents that support activities such as protocol drafting, statistical interpretation, regulatory pre-authoring, or workflow orchestration can introduce additional complexity, particularly when the underlying models operate as opaque systems. In these contexts, limited visibility into how models behave may make validation, monitoring, and risk assessment more challenging.

For those of us in the life sciences industry, the use of AI agents in clinical development may be more sustainable when their behavior and decision pathways can be traced and reviewed. Regulatory evaluation typically extends beyond final outputs to include how decisions were formed and what controls were in place. If AI contributes to protocol language, safety summaries, or analytical reasoning, the ability to explain inputs, assumptions, and limitations could become increasingly important. This, in turn, may depend on working with model providers that offer sufficient documentation around training approaches, evaluation practices, and governance structures.

The transparency differences highlighted by the FMTI indicate that choosing an LLM may also involve selecting a partner with whom long-term governance and compliance considerations can be aligned. Models developed by providers with stronger disclosure practices may offer advantages when building AI systems that require auditability, reproducibility, and regulatory readiness. For organizations exploring the use of AI agents in clinical development, transparency may therefore be one of several factors that influence how confidently such systems can be scaled over time.


Read the Foundation Model Transparency Index 2025 report.


Manuel Cossio

Director, Innovation and Strategic Consulting

Manuel Cossio is Director, Innovation and Strategic Consulting at Cytel. Manuel is an AI engineer with over a decade of experience in healthcare AI research and development. He currently leads the creation of generative AI solutions aimed at optimizing clinical trials, focusing on hierarchical multi-agent systems with multistage data governance and human-in-the-loop dynamic behavior control.

Manuel has an extensive research background with publications in computer vision, natural language processing, and genetic data analysis. He is a registered Key Opinion Leader at the Digital Medicine Society, a member of the ISPOR Community of Interest in AI, a Generative AI evaluator for the EU Commission, and an AI researcher at the UB-UPC-Barcelona Supercomputing Center.

He holds an M.Sc. in Translational Medicine from Universitat de Barcelona, a Master of Engineering in AI from Universitat Politècnica de Catalunya, and an M.Sc. in Neuroscience from Universitat Autònoma de Barcelona.


