Streamlining Data Management and Improving Statistical Accuracy in Clinical Trials with AI

As clinical trials grow increasingly complex, demand is rising for smarter, faster, and more efficient data processing and analysis. Artificial intelligence (AI) is emerging as a powerful tool, especially in programming and data management. For clinical trial professionals, AI offers the promise of streamlining workflows, improving data quality, and reducing time to database lock.

 

The evolving role of AI in clinical data programming

AI is not replacing clinical programmers; it’s augmenting them. AI should be considered a tool to use within clinical trials, just as EDC and SAS are commonly used tools. Automation tools driven by machine learning can now handle routine, rules-based programming tasks such as edit check generation, derivation logic, and data transformation. This allows programmers to focus on more strategic activities like validating statistical code or optimizing data pipelines. AI needs the expertise of our clinical trial professionals.
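To make "rules-based" concrete, here is a minimal sketch of what an edit check looks like once it is written as code. It is plain Python rather than any EDC vendor's syntax, and the check IDs, field names, and limits are hypothetical placeholders chosen for illustration only.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable


@dataclass
class EditCheck:
    """One rules-based check; all names here are illustrative."""
    check_id: str
    field: str
    message: str
    passes: Callable[[dict], bool]  # returns True when the record is clean


def run_edit_checks(record: dict, checks: list) -> list:
    """Return one query string per failed check."""
    return [
        f"{c.check_id} [{c.field}]: {c.message}"
        for c in checks
        if not c.passes(record)
    ]


# Hypothetical checks; the limits are placeholders, not clinical guidance.
CHECKS = [
    EditCheck("EC001", "SYSBP", "Systolic BP outside 60-250 mmHg",
              lambda r: r["SYSBP"] is None or 60 <= r["SYSBP"] <= 250),
    EditCheck("EC002", "VISITDT", "Visit date precedes informed consent",
              lambda r: r["VISITDT"] >= r["CONSENTDT"]),
]

record = {
    "SUBJID": "1001",
    "SYSBP": 300,
    "CONSENTDT": date(2024, 3, 1),
    "VISITDT": date(2024, 2, 27),
}

for query in run_edit_checks(record, CHECKS):
    print(query)
```

Because each check is just data plus a small predicate, a generator (AI-driven or otherwise) only has to propose entries for the list, and a programmer can review them one by one before they go live.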

Natural Language Processing (NLP) is also making great progress. For instance, NLP can interpret free-text protocol documents to auto-generate specifications, electronic case report form (eCRF) templates, or even suggest initial SDTM mappings, significantly reducing manual effort.

 

AI in data cleaning and quality oversight

Traditionally, data cleaning has been labor-intensive, with data managers manually reviewing queries, data listings, and edit checks across multiple data sources and systems. AI tools can now proactively flag anomalies or data trends that human review might miss, such as unexpected patterns in lab values, inconsistencies across visits, or possible fraudulent data across participants and sites.
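As a rough illustration of flagging unexpected lab values, the sketch below uses a robust (median-based) modified z-score. This is a simple statistical stand-in rather than the machine learning approach any particular tool uses; the lab values are invented, and the 3.5 cutoff is a commonly quoted convention that a study team would need to tune.

```python
from statistics import median


def flag_outliers(values, threshold=3.5):
    """Return (index, value, score) for values with a large modified z-score.

    The score is 0.6745 * (x - median) / MAD, where MAD is the median
    absolute deviation. Median and MAD are used so that a single extreme
    value cannot hide itself by inflating the mean and standard deviation.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread to measure against; nothing can be scored
    flagged = []
    for i, v in enumerate(values):
        score = 0.6745 * (v - med) / mad
        if abs(score) > threshold:
            flagged.append((i, v, round(score, 2)))
    return flagged


# Invented ALT results (U/L) for one participant across visits.
alt = [22, 25, 19, 24, 21, 23, 20, 180, 26, 22]

for index, value, score in flag_outliers(alt):
    print(f"visit {index}: ALT={value} (modified z={score})")
```

A flag like this is only a prompt for review; whether the value is a data entry error, a unit mismatch, or a genuine safety signal remains a data manager's or clinician's call.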

Predictive models can help identify study participants at high risk of dropout or noncompliance, enabling earlier intervention. This not only improves data completeness but also enhances trial efficiency and participant retention. The effort and cost of replacing clinical trial participants are significant and felt across all stakeholders. Improving the patient’s experience is therefore a significant way to save time and money and to accelerate progress.

 

AI in statistical programming: From code automation to advanced insights

Statistical programming is central to clinical trial analysis, from producing tables, listings, and figures (TLFs) to preparing submission-ready datasets. Traditionally reliant on manual coding in SAS or R, this work is now gaining speed, consistency, and quality through AI augmentation.

 

Where AI adds value in statistical programming

  • Automated code generation: AI models trained on historical programming logic can produce initial SAS macros or R scripts for common TLFs and dataset derivations. These drafts can accelerate development by as much as 40–60%, freeing programmers and biostatisticians to focus on complex analyses and interpretation.
  • Code review and validation: AI-assisted tools can scan code for logic errors, inefficiencies, redundant steps, and deviations from programming standards. Acting as a “second reviewer,” they flag potential issues early and suggest optimizations.
  • Dynamic statistical modeling: AI algorithms can rapidly explore large trial datasets to detect subgroup effects, anomalies, or emerging trends. When guided by statistical oversight, these insights can refine analysis plans and support adaptive trial decisions.

The aim is not to replace human judgment, but to boost productivity, reproducibility, and the speed of insight generation, without compromising scientific rigor.

 

AI in biostatistics: Powering smarter, more adaptive clinical trials

Biostatistics remains the foundation of evidence generation in clinical trials, providing the methodological rigor to transform raw data into reliable conclusions. In the context of AI, biostatisticians play a dual role: safeguarding scientific validity while leveraging new computational tools to enhance insight generation. This requires a careful balance between deep domain knowledge and technical proficiency in emerging AI-driven methodologies. From applying knowledge graphs (KGs) to map complex biomedical relationships, to developing predictive models that anticipate trial outcomes, biostatistics is evolving into a more dynamic and interconnected discipline.

 

Where AI adds value in biostatistics

  • Balanced expertise: Integrating statistical theory with AI/ML techniques to ensure robust, interpretable results.
  • Knowledge graph applications: Using KGs to uncover hidden relationships between biomarkers, treatments, and outcomes, supporting hypothesis generation and trial design.
  • Early prediction tools: Building predictive models for recruitment success, dropout risk, and endpoint achievement.
  • Segmentation and personalization: Identifying patient subgroups most likely to benefit from a therapy, improving trial efficiency and precision medicine strategies.
  • Support for registrational trials: Leveraging AI to optimize trial design, stratify patient populations, and run simulations that ensure the study is powered and structured for regulatory success.
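The last point, running simulations to check that a study is adequately powered, can be sketched in a few lines. The example below is a deliberately simplified Monte Carlo estimate under stated assumptions: two equal arms, a normally distributed endpoint, a known standard deviation, and a two-sided z-test. The effect size, sample size, and function name are illustrative, and a real registrational design would involve far more (dropout, interim looks, multiplicity).

```python
import random
from statistics import NormalDist


def simulate_power(n_per_arm, effect, sd, alpha=0.05, n_sims=2000, seed=1):
    """Estimate power as the share of simulated trials that reject H0."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = sd * (2 / n_per_arm) ** 0.5  # SE of the difference in means
    rejections = 0
    for _ in range(n_sims):
        control = [rng.gauss(0.0, sd) for _ in range(n_per_arm)]
        treated = [rng.gauss(effect, sd) for _ in range(n_per_arm)]
        diff = sum(treated) / n_per_arm - sum(control) / n_per_arm
        if abs(diff / se) > z_crit:
            rejections += 1
    return rejections / n_sims


# Illustrative scenario: effect of 0.5 SD with 64 participants per arm.
power = simulate_power(n_per_arm=64, effect=0.5, sd=1.0)
# With no true effect, the rejection rate estimates the type I error.
type_one_error = simulate_power(n_per_arm=64, effect=0.0, sd=1.0)

print(f"estimated power: {power:.2f}")
print(f"estimated type I error: {type_one_error:.3f}")
```

For this scenario the analytical power is roughly 0.8, so the simulated estimate should land near that value. The benefit of simulating rather than using a formula is that the data-generating step can be swapped for something more realistic without changing the rest of the loop.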

 

Regulatory readiness and caution

Despite its promise, AI must be implemented thoughtfully. Regulatory agencies like the FDA are increasingly open to the use of advanced technologies but expect transparency, traceability, and validation. AI-based tools must be auditable and explainable, especially when used in clinical data workflows that feed into regulatory submissions.

 

What’s next?

As AI becomes more embedded in clinical trial ecosystems, we can expect increased integration with EDC systems, CDISC standards, and statistical programming tools. The goal isn’t to eliminate human oversight but to enhance it, allowing clinical data professionals to make faster, better-informed decisions.

 

Final takeaways

AI is reshaping programming and data management in clinical trials. For clinical trial professionals, now is the time to become familiar with these tools, understand their capabilities and limitations, and engage with cross-functional teams to ensure responsible and impactful implementation. Ultimately our goal is to shorten drug development timelines and improve patient outcomes. With AI, we can be part of the solution to provide improved treatments for patients.

 

Interested in learning more?

Join Steven Thacker, Sheree King, Kunal Sanghavi, and Juan Pablo Garcia Martinez for their upcoming webinar, “How AI Enhances Biometrics Services: Streamlining Data Management and Improving Statistical Accuracy in Clinical Trials,” on Thursday, August 28 at 10 am ET.

Offshoring Biometrics FSP Teams: Best Practices

Functional Service Provider (FSP) models are widely used to deliver biometrics services in the biopharmaceutical industry. Traditionally, these teams have been based in the United States and Western Europe, but with a globally recognized talent pool and the need to deliver more value within confined budgets, sponsors are now interested in offshore locations, such as India, South Africa, and Eastern Europe.

Here, I detail best practices for sponsors looking to incorporate offshore FSP teams.

 

Best practices for building offshore FSP teams

Best practices for building offshore teams from scratch include:

 

1. Developing a detailed recruitment plan

Creating a comprehensive recruitment plan that outlines timelines and mutually agreed-upon milestones is key to effectively launching offshore teams. The recruitment plan should be viewed as a living document that is reviewed regularly and updated as needed. The focus should be on “planned vs actual” metrics and ensuring that all roadblocks to recruitment are removed in a timely manner. Finally, this document must be based on hard data, acknowledging where the talent pool is located and the vendor’s track record in recruiting from that population.
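One simple way to track “planned vs actual” is as a cumulative variance by month. The sketch below assumes monthly hire counts keyed by month; the figures and the function name are hypothetical.

```python
def recruitment_variance(planned, actual):
    """Return (month, cumulative planned, cumulative actual, variance) rows.

    Months missing from `actual` count as zero hires.
    """
    rows = []
    cum_planned = cum_actual = 0
    for month in planned:  # dicts keep insertion order
        cum_planned += planned[month]
        cum_actual += actual.get(month, 0)
        rows.append((month, cum_planned, cum_actual, cum_actual - cum_planned))
    return rows


# Hypothetical plan and results for the first quarter of a ramp-up.
planned = {"2025-01": 3, "2025-02": 4, "2025-03": 5}
actual = {"2025-01": 3, "2025-02": 2, "2025-03": 4}

for month, plan, hired, variance in recruitment_variance(planned, actual):
    print(f"{month}: planned {plan}, actual {hired}, variance {variance:+d}")
```

Reviewing the cumulative variance rather than each month in isolation makes it easier to see whether a shortfall is being recovered or is compounding.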

 

2. Focusing on early risk identification and mitigation

Obstacles to recruitment will occur, and anticipating and planning for these challenges early on will do much to support recruitment success. Common risks to recruitment include lengthy country-specific notice periods, changing economic conditions, and competition from other vendors and sponsors. The FSP vendor should have active plans to implement mitigation strategies to minimize any impacts due to these risks.

 

3. Identifying quality resources that fit the sponsor’s culture

Detailed and complete job descriptions are central to recruitment success, but beyond the pure technical skills, FSP recruitment must incorporate an assessment of overall “fit” within the sponsor’s organization. For example, does the role require working as part of a team or is an individual performer more likely to find success? All of this should be supported by a dedicated global talent acquisition team that understands where to find talent to increase the probability of recruitment success.

 

4. Accelerating onboarding

Strong onboarding is highly correlated with employee retention; it must be timely, practical, and clear. Ideally, new FSP hires should start one week before their first day with the sponsor. This allows time to complete internal training at the FSP provider and to learn the sponsor’s expectations from other team members or the FSP Lead before starting. Finally, pairing new hires with an established “buddy” from whom they can seek day-to-day advice on the role contributes greatly to new employee satisfaction.

 

5. Prioritizing retention

Effective retention planning starts with a positive and seamless onboarding experience and progresses to gathering employee feedback and establishing a continuous feedback loop. Other strategies include employee recognition and rewards and creative professional development opportunities. Additionally, while salary and bonus are indeed important to employees, these should be supplemented with other important benefits, such as flexible work hours, to demonstrate that employees are valued.

 

Considerations for sponsors

Many sponsors with already established onshore FSP teams are interested in offshoring options, essentially replacing these resources with resources in more cost-effective countries. In these cases, business continuity is the utmost priority and transition timings must work around the needs of the business and required portfolio deliverables. This requires a fair amount of upfront planning with the sponsor, based on the following questions:

  • Assuming that timeline slippage is not permissible and that all key deliverables are of equal priority, what are the key deliverables due this calendar year, mapped out by month?
  • Which FSP personnel to be transitioned are involved in these deliverables?
  • Among the FSP personnel assigned to each key deliverable, who is most critical (transitioned later), and who is less critical (transitioned earlier)?
  • For replacement headcount, what geographies are preferred (if any)?
  • What are the notice periods for these preferred geographies?
  • Finally, how do we reconcile the time required to transition off with the time required to transition on, while minimizing any work process disruptions?

This is an iterative process that requires close collaboration with the sponsor.
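The last question, reconciling transition-off with transition-on timing, largely comes down to date arithmetic. As a rough sketch, the function below works backward from the date a handover must be complete to the latest date an offer can be accepted. The one-week onboarding period follows the guidance above; the notice period, overlap period, and dates are hypothetical.

```python
from datetime import date, timedelta


def latest_offer_date(handover_complete_by, notice_weeks,
                      onboarding_weeks, overlap_weeks):
    """Latest offer-acceptance date that still leaves room for the
    replacement's notice period, FSP onboarding, and a handover overlap
    with the outgoing team member."""
    lead_time = timedelta(
        weeks=notice_weeks + onboarding_weeks + overlap_weeks
    )
    return handover_complete_by - lead_time


# Hypothetical case: handover must finish by 1 September 2025, the
# preferred geography has a 12-week notice period, onboarding takes
# 1 week, and the two resources overlap for 4 weeks.
deadline = latest_offer_date(
    handover_complete_by=date(2025, 9, 1),
    notice_weeks=12,
    onboarding_weeks=1,
    overlap_weeks=4,
)
print(deadline.isoformat())  # 2025-05-05
```

Running this for each role and sorting by the resulting date gives a first draft of the transition sequence, which can then be adjusted with the sponsor around key deliverables.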

Insights on the New ADaM guidelines and Europe Interchange 2022

 

I am excited to see you all at the CDISC Europe Interchange on April 27 – 28, although unfortunately it will be a virtual event (hopefully for the last time). The program designed by the CDISC Europe Committee and the CDISC team looks promising as always! Silvia Faini and I (members of the committee) will lead the “Tech-Enabled Standards” and “ADaM” streams, respectively.


Maximizing Study Momentum: A Case Study in Accelerated DMC Safety Report Creation through IDMC Solutions

In the ever-evolving landscape of clinical development, the need for robust evaluation of interim clinical data through Independent Data Monitoring Committees (IDMC) has become increasingly vital. IDMCs play a pivotal role in ensuring patient safety and upholding trial integrity amid the growing complexity of clinical development programs. Independent data monitoring is especially critical in Central Nervous System (CNS) studies due to the challenges noted with proper enrollment of patients, rater variability, and placebo response rates, among others.


How can an optimized data strategy support your clinical program?

Data is the cornerstone of any clinical trial and is used to ultimately drive the decision-making process related to the drug development. The quality of your clinical evidence package is an important factor in gaining approval from key decision-makers, including regulators, payers, and health technology assessment (HTA) agencies. So, how can you generate high-quality clinical data, especially when you don’t have the appropriate in-house expertise?


How to conduct better time-to-event analysis with delayed treatment effects

The issue of delayed treatment effects in immuno-oncology was demonstrated during an FDA-industry sponsored workshop over two years ago. This demonstration made it clear that traditional log-rank tests, often used for analyses of progression-free survival (PFS) and overall survival (OS), would need to be replaced, as essential assumptions of the test no longer held.

Cytel scientists, along with colleagues at Pfizer, Merck, the Medical University of Vienna, Bath University, and Harvard University, have recently proposed a new test in a study published in Biometrical Journal. The max-combo test enables analysis of PFS and OS when handling delayed treatment effects, while also adding the option for early stopping.


Career Perspectives: Interview with Charles Warne, Associate Director of Biostatistics

In this edition of the Career Perspectives series, I interview Charles Warne, Associate Director of Biostatistics at Cytel. Charles is originally from Australia where he completed his university education and is now based in Singapore. He has been working as a biostatistician in the pharmaceutical industry since 2011, and prior to that he worked as a biostatistician/epidemiologist in an academic setting.

In this blog we talk to Charles about his journey so far, his current role, and achievements; and we also get his views on the evolution of the life sciences industry and the future of adaptive Bayesian designs.


Career Perspectives: Interview with Jessica Bhoyroo, Clinical Data Manager

In this edition of the Career Perspectives series, I interview Jessica Bhoyroo, Cytel Clinical Data Manager based in Basel, Switzerland.

As a student, Jessica was looking forward to a career where she could constantly acquire new scientific skills and feel that her work is useful to others. She graduated in Clinical Operations and Clinical Data Management.

Before assuming the role of a data manager, Jessica worked and trained as a Clinical Research Associate as well as a Study Coordinator in France (Montpellier) and in the U.S. (Tulsa, Oklahoma). These experiences helped her gain a better understanding of the journey of the people she was potentially going to work with.

“If you know and understand what other people do, then you are in a better place to help them.” – Jessica

Data management is an essential part of clinical research and requires collaboration with several teams, starting with data collection to analysis. Jessica’s love for interacting with people and her enthusiasm made this role an intuitive career choice for her.


CDISC Certification – Is It Worth Taking?

For years, I have been telling the recruiters at Cytel to be wary of candidates claiming to have a CDISC Certification:

“There is no official CDISC certification. The candidate would have probably attended a CDISC Training.”

Last year, CDISC launched a new CDISC Pilot Certification program, and in April this year I was invited to take part in the pilot; the certification will be made available to everyone this August (join the CDISC webinar on September 7, 2021 to learn more).

As of today, only the SDTM certification is available (“CDISC Tabulate Certification”) but there is a plan to develop more certification programs on other CDISC standards.

I took the certification test in April through the Prometric platform, and it was an interesting experience!


Conduct of IDMCs for Cell and Gene Therapy Trials

Independent data monitoring committees review unblinded clinical trial data and issue recommendations to designated sponsor liaisons. Characteristics of cell and gene therapy trials often require slightly different approaches to IDMC conduct in comparison to typical pivotal trials. Here I will discuss the conduct of IDMCs for cell and gene therapy trials. I will cover the differences that are unique to these therapeutic areas and how those differences impact how the IDMC operates. I will note potential pitfalls in IDMC conduct and introduce mitigation strategies. Specific areas of exploration will include IDMC member and SDAC selection, IDMC report preparation, IDMC reporting specifications, and IDMC meeting conduct.