Blending Power and Flexibility: How AI-Generated R Code Is Reshaping Clinical Trial Design


July 29, 2025

In today’s fast-evolving clinical research landscape, designing robust and efficient trials is more critical than ever. As statistical designs grow in sophistication, biostatisticians are increasingly relying on both commercial platforms and open-source tools to meet unique modeling needs. But this hybrid approach also comes with challenges, particularly for those new to advanced simulation software or lacking programming experience.

At Cytel, we’ve been exploring how artificial intelligence (AI) can help bridge this gap. At the 2025 Joint Statistical Meetings (JSM), we will present on our latest innovation: AI-powered R code generation for clinical trial design, a feature embedded in our East Horizon™ platform. This assistant, called RCACTS (R Coding Assistant for Clinical Trial Simulation), represents a significant step forward in making custom trial design faster, more accessible, and more reliable.

 

Why talk about this now? The open-source imperative

While commercial clinical trial design software offers rapid design development through validated and user-friendly workflows, it doesn’t always address the full complexity of real-world problems. Trial statisticians often face challenges in areas such as oncology, rare diseases, and adaptive designs that require tailored statistical tests, unique outcome generation models, or alternative randomization techniques.

This is where open-source tools like R become invaluable. R allows statisticians to write custom code to simulate complex trial designs, perform Bayesian analyses, or integrate evolving regulatory guidance. Over the years, a vibrant ecosystem of R packages has emerged, offering a high degree of transparency, flexibility, and academic rigor.

Yet this flexibility comes with trade-offs: code development can be time-consuming and error-prone, and it requires significant programming expertise. As a result, many biostatisticians find themselves switching between validated commercial workflows and custom R functions, leading to a process that is often fragmented and inefficient.

Recognizing this, Cytel’s East Horizon platform has introduced R integration points, enabling users to inject custom code directly into validated simulation workflows. This integration delivers the best of both worlds: the speed and structure of commercial software with the creativity and control of open-source tools.

 

Enter AI: Speed, simplicity, and smarter coding

Our next logical question was: can AI make this process even easier?

The answer, increasingly, is yes. With recent advances in generative AI, particularly large language models (LLMs), it’s now possible to assist in the generation of R code for simulation-based design tasks. At Cytel, we’ve harnessed OpenAI’s GPT-4o via API, securely deployed within Microsoft Azure, to create RCACTS, a coding assistant purpose-built for biostatisticians.

Unlike generic AI tools that produce standalone R scripts, RCACTS generates R code specifically tailored for the East Horizon simulation engine. It ensures that the generated functions:

  • Match expected input/output structures,
  • Include pre-defined parameters as shown in our internal statistical package CyneRgy,
  • Are immediately ready for integration and testing within a live trial design workflow.

With RCACTS, users can simply describe what they want in plain English and receive functioning R code that can be integrated into East Horizon.
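To make the idea of a fixed input/output structure concrete, here is a minimal sketch of a response-simulation function in R. The function name, its arguments, and the shape of the returned list are illustrative assumptions; they are not the documented East Horizon or CyneRgy interface.

```r
# Hypothetical sketch: the name, arguments, and return structure are
# assumptions for illustration, not the East Horizon or CyneRgy interface.
simulate_binary_response <- function(num_subjects, treatment_id, prob_response) {
  # treatment_id:  vector of 0 (control) or 1 (treatment), one entry per subject
  # prob_response: response probabilities, c(control, treatment)
  stopifnot(
    length(treatment_id) == num_subjects,
    length(prob_response) == 2,
    all(treatment_id %in% c(0, 1)),
    all(prob_response >= 0 & prob_response <= 1)
  )

  # Look up each subject's response probability from their arm.
  subject_prob <- prob_response[treatment_id + 1]

  # Draw one Bernoulli outcome per subject.
  response <- rbinom(num_subjects, size = 1, prob = subject_prob)

  # Return outcomes in a fixed, named structure plus a status code.
  list(Response = response, ErrorCode = 0L)
}

set.seed(2025)
arm <- rep(c(0, 1), each = 50)
out <- simulate_binary_response(100, arm, c(0.2, 0.4))
```

Validating the inputs up front and returning a named list means a calling engine could check the result mechanically, which is the kind of contract a purpose-built assistant would need to respect.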

 

Who benefits? Everyone from newcomers to experts

One of the major advantages of this AI-enhanced workflow is that it lowers the barrier to entry. For a new user unfamiliar with Cytel’s R integration or syntax requirements, writing compatible code from scratch can be daunting. RCACTS significantly reduces the learning curve by providing validated function templates, sensible defaults, and clear parameterization, all supported by generative AI.

At the same time, experienced statisticians benefit by spending less time on repetitive coding tasks, debugging, or remembering function signatures. This allows them to focus on higher-level design questions, such as: What analysis method is most robust? How sensitive is the design to different outcome distributions? What dropout patterns pose the greatest risk?

Our assistant supports a wide range of trial design elements:

  • Simulating patient responses: binary, continuous, time-to-event, and repeated-measures endpoints.
  • Analyzing simulated data: Statistical analysis for these endpoints.
  • Randomization: Flexible randomization of patients across treatment groups.
  • Enrollment and dropout modeling: Custom mechanisms for realistic patient enrollment and dropout scenarios.
  • Treatment selection: Supporting multi-arm multi-stage (MAMS) trial designs.
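As a rough illustration of how several of these elements fit together, the sketch below combines permuted-block randomization, binary response simulation, and a one-sided test of proportions to estimate power by simulation. All function names and default values are hypothetical and chosen for illustration; this is plain base R, not code produced by RCACTS or used by East Horizon.

```r
# Hypothetical helpers for illustration only; not Cytel's implementation.

# Randomization: permuted blocks with 1:1 allocation.
randomize_blocks <- function(num_subjects, block_size = 4) {
  stopifnot(block_size %% 2 == 0)
  num_blocks <- ceiling(num_subjects / block_size)
  # Each block holds equal numbers of control (0) and treatment (1), shuffled.
  blocks <- replicate(
    num_blocks,
    sample(rep(c(0L, 1L), block_size / 2)),
    simplify = FALSE
  )
  unlist(blocks)[seq_len(num_subjects)]
}

# Analysis: one-sided test that the treatment response rate exceeds control.
analyze_binary <- function(response, arm, alpha = 0.025) {
  successes <- c(sum(response[arm == 1]), sum(response[arm == 0]))
  totals <- c(sum(arm == 1), sum(arm == 0))
  test <- suppressWarnings(
    prop.test(successes, totals, alternative = "greater", correct = FALSE)
  )
  # isTRUE() treats an undefined p-value as "do not reject".
  list(p_value = test$p.value, reject = isTRUE(test$p.value < alpha))
}

# Simulation: proportion of simulated trials that reject the null.
simulate_power <- function(num_sims, num_subjects, prob_response) {
  rejections <- replicate(num_sims, {
    arm <- randomize_blocks(num_subjects)
    response <- rbinom(num_subjects, size = 1, prob = prob_response[arm + 1])
    analyze_binary(response, arm)$reject
  })
  mean(rejections)
}

set.seed(123)
power_estimate <- simulate_power(
  num_sims = 500,
  num_subjects = 200,
  prob_response = c(0.2, 0.4)  # assumed control and treatment response rates
)
```

Keeping randomization, outcome generation, and analysis in separate functions mirrors the list above: each piece can be swapped for a custom version without touching the others.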

 

Balancing innovation with responsibility

Of course, as with any AI solution, there are caveats. AI-generated code must be carefully reviewed for correctness, appropriateness, and regulatory readiness. RCACTS includes built-in testing functionality to flag structural or syntactic errors, but statistical validation remains the user’s responsibility. All data interactions also adhere to Azure OpenAI’s stringent data protection policies to ensure security and compliance.
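To show what a structural or syntactic check can and cannot catch, here is a minimal sketch in base R. It is an assumption about how such a check might look, not a description of the testing built into RCACTS; the function name and arguments are hypothetical.

```r
# Hypothetical sketch of a syntactic and structural check on generated code.
# This is not the RCACTS implementation.
check_generated_code <- function(code_text, function_name, required_args) {
  # Syntactic check: does the text parse as R at all?
  parsed <- tryCatch(parse(text = code_text), error = function(e) e)
  if (inherits(parsed, "error")) {
    return(list(ok = FALSE,
                message = paste("Syntax error:", conditionMessage(parsed))))
  }

  # Structural check: evaluate the definitions in a separate environment.
  # Note: evaluating untrusted code is only appropriate inside a sandbox.
  env <- new.env(parent = baseenv())
  eval(parsed, envir = env)

  if (!exists(function_name, envir = env, inherits = FALSE) ||
      !is.function(env[[function_name]])) {
    return(list(ok = FALSE,
                message = paste("Function not found:", function_name)))
  }

  # Compare the function's formal arguments with the expected ones.
  missing_args <- setdiff(required_args, names(formals(env[[function_name]])))
  if (length(missing_args) > 0) {
    return(list(ok = FALSE,
                message = paste("Missing arguments:",
                                paste(missing_args, collapse = ", "))))
  }

  list(ok = TRUE, message = "Passed syntactic and structural checks")
}

good_code <- "my_sim <- function(num_subjects, prob_response) {
  rbinom(num_subjects, 1, prob_response)
}"
bad_code <- "my_sim <- function(num_subjects, prob_response) {"

good_result <- check_generated_code(good_code, "my_sim",
                                    c("num_subjects", "prob_response"))
bad_result <- check_generated_code(bad_code, "my_sim",
                                   c("num_subjects", "prob_response"))
```

A check like this confirms that the code parses and exposes the expected signature. It says nothing about whether the statistics inside the function are correct, which is why that review stays with the statistician.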

There’s also a broader concern: will over-reliance on AI limit the creativity and deep statistical thinking that define our profession? At Cytel, we view AI not as a replacement for expertise, but as a tool to amplify it. Our goal is to give statisticians more time and mental space to explore, iterate, and innovate rather than reduce them to prompt engineers.

 

Looking ahead

The future of clinical trial design lies in intelligent integration: combining the strengths of validated commercial tools, flexible open-source frameworks, and AI-powered coding assistance. With East Horizon and RCACTS, we believe we’re building the blueprint for this future, with a platform that supports both scientific rigor and operational speed.

As the field continues to evolve, biostatisticians will need tools that not only keep up with complexity but also support creativity, collaboration, and efficiency. AI-generated R code, embedded within a powerful simulation engine, is one such tool, and it is already transforming how we approach design flexibility in clinical trials.

 

Catch us at JSM 2025 to learn more about how AI is transforming the future of clinical trial design within Cytel.


Subhajit Sengupta

Associate Director of Data Science, Research & Innovation

Subhajit Sengupta, Ph.D., is Associate Director of Data Science, Research & Innovation at Cytel, where he leads the development of advanced methods and cutting-edge tools in adaptive clinical trial design, Bayesian modeling, and Generative AI. As a well-trained computational research scientist, he brings diverse R&D experience spanning biostatistics, Bayesian statistics, machine learning, generative AI, image processing, and biomedical informatics. Subhajit holds a Ph.D. in Computer & Information Science & Engineering from the University of Florida and previously held a Research Scientist position at NorthShore University HealthSystem, with a dual appointment as Senior Clinical Researcher at the University of Chicago’s Pritzker School of Medicine. He has authored numerous peer-reviewed publications, developed open-source software for tumor subclone analysis, and contributed to large-scale cancer genomics consortia. Subhajit brings deep experience in research innovation, project leadership, and software development, with proficiency in R, Python, C++, Julia, and cloud platforms such as Azure and AWS. His work bridges computational rigor with clinical insight, advancing the frontiers of data science in healthcare.
