
SDTM IG 4.0 and SDTM 3.0: Celebrating the End of SUPP?

About five years after the release of SDTM IG 3.4, CDISC has just released SDTM IG 4.0 and SDTM 3.0 for public review. Comments are due April 6, with final release expected later this year.

The public review includes the Conformance Rules version 3.0 as well as three draft Knowledge Base articles exploring some of the main changes expected with IG 4.0:

  • NS– Datasets: Why they were built as they were.
  • Why change the structure of SDTMIG metadata?
  • Why does the DC domain differ from what’s described in FDA’s TCG?

For a quick overview of the impact of these changes, see the CDISC Standards timeline webpage or the revision history available in the draft version wiki for public review.

 

Celebrating or regretting the end of SUPP?

We will be moving, for example, from something called SUPPAE to something called NSAE, with a less “normalized” structure. Will this be “a small step for a man, a giant leap for mankind”? “Ai posteri l’ardua sentenza” (“To posterity the arduous verdict”).1

The change will require us to go from the familiar vertical SUPP structure (one record per qualifier value) to the new horizontal NS structure (one record per related parent record).

The structure of these new datasets is “One record per related dataset record,” meaning that many-to-one relationships will no longer be possible; for example, an NS record can no longer apply to several records in the parent domain via –GRPID. That said, there is hope that this new structure will simplify metadata handling and potentially facilitate the adoption of future data exchange formats, such as CDISC Dataset-JSON.
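To make the structural change concrete, here is a minimal sketch in Python (a hypothetical, invented example: the variable names mimic SDTM, but this is not the normative NSAE layout). The vertical SUPPAE records, one per qualifier value, collapse into one wide NSAE-style record per parent AE record:

```python
# Hypothetical SUPPAE records in the classic vertical structure:
# one row per qualifier (QNAM/QVAL), linked to its AE record via IDVAR/IDVARVAL.
suppae = [
    {"USUBJID": "S01", "IDVAR": "AESEQ", "IDVARVAL": "1", "QNAM": "AETRTEM", "QVAL": "Y"},
    {"USUBJID": "S01", "IDVAR": "AESEQ", "IDVARVAL": "1", "QNAM": "AESOSP",  "QVAL": "WORSENED"},
    {"USUBJID": "S01", "IDVAR": "AESEQ", "IDVARVAL": "2", "QNAM": "AETRTEM", "QVAL": "N"},
]

# NSAE-style horizontal structure: one record per related AE record,
# with each former QNAM becoming its own column.
nsae = {}
for rec in suppae:
    key = (rec["USUBJID"], rec["IDVARVAL"])
    row = nsae.setdefault(key, {"USUBJID": rec["USUBJID"], "AESEQ": rec["IDVARVAL"]})
    row[rec["QNAM"]] = rec["QVAL"]

nsae = list(nsae.values())
print(nsae)  # two wide records instead of three vertical ones
```

The pivot also shows why the many-to-one case disappears: each wide record is keyed to exactly one parent record.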

 

Three new domains

Three new proposed domains have been introduced:

  • DC (Demographics for Multiple Participations)
  • GI (Gastrointestinal System Findings)
  • EA (Event Adjudication)

DC has been around, unofficially, for some time, following the requirements introduced by the FDA in its Study Data Technical Conformance Guide (see my previous blog). This domain supports the representation of multiple enrollments within the same study. Along with DC, SUBJID has been added to all subject-level domains to differentiate the data generated from each of an individual subject’s participations.

Compared with FDA requirements, SDTM IG 4.0 also covers scenarios in which the same subject is enrolled multiple times, not only multiple screenings.

Identification of “Primary Enrollment,” and therefore how DM variables are populated, is left to the sponsor’s discretion. However, in cases where a subject experiences one or more screen failures before finally enrolling, the successful enrollment should clearly be considered the primary one.
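As a purely illustrative sketch (invented data and a hypothetical sponsor rule, not the normative DC structure), the idea of multiple participations distinguished by SUBJID looks like this in Python:

```python
# Invented example: one person (USUBJID) participating twice in the same
# study -- a screen failure followed by a successful enrollment. SUBJID
# distinguishes the data generated by each participation.
participations = [
    {"STUDYID": "XYZ", "USUBJID": "XYZ-001", "SUBJID": "001-A", "ARMCD": "SCRNFAIL"},
    {"STUDYID": "XYZ", "USUBJID": "XYZ-001", "SUBJID": "001-B", "ARMCD": "TRT"},
]

# One possible sponsor rule for the "primary enrollment": the successful
# (non-screen-failure) participation drives how DM is populated.
primary = next(p for p in participations if p["ARMCD"] != "SCRNFAIL")
print(primary["SUBJID"])  # prints "001-B"
```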

EA, a Findings About domain, provides a common structure for studies requiring independent, peer-reviewed endpoint adjudication. In my view, it partially solves the issue of representing study endpoints where more complex “adjudication” is required; for example, in oncology studies with efficacy based on tumor response.

 

Changes in metadata

Several new metadata elements have been introduced, along with some changes. The goal is to improve the understanding of variables and their intended use, without impacting the metadata included in a submission, e.g., define.xml.

So, when looking at the new SDTM IG, you will notice the following key differences among others:

  • The “Controlled Terms, Codelist or Format” column is now split into three separate columns
  • A “Variable Group” has been added to group variables, for example Results Unit or Results Value
  • Some information previously included in the “CDISC Notes” column is now reported in the “Examples” column

 

Other Changes

New versions of IGs are also an opportunity to fix issues (such as typos) and to clarify implementations that previously caused misunderstandings. For example, there is additional guidance on which Specimen-based Findings domain to use under specific circumstances, such as clarifying that anti-microbial antibody testing data should be mapped to the IS domain rather than MS.

Some standard variables have been deprecated, such as –BLFL (Baseline Flag) in Findings domains, and others have been added. One notable addition is the –CLASI variable, particularly useful for classifying Protocol Deviations to support the requirements of ICH E3 Q&As (R1). This variable is now officially part of the DV domain as DVCLASI, e.g., MINOR/MAJOR. More details on planned new and deprecated variables in all Observational Classes can be found in the CDISC Wiki.

Rumors about deprecating the PP domain appear to be unfounded, as PP is still there.

 

Want to know more?

You can participate in the public review and explore the details yourself. Check here.

My former colleague Varun Debbeti has also done an excellent job on his clinstandards webpage.

A more in-depth discussion of the expected changes will also be presented at the upcoming CDISC-EU Interchange in May, this time in my hometown, Milan, co-chaired by my colleague Silvia Faini.

Cytel will be present with two oral presentations and one poster:

  • “It Got Worse Than Expected: Three Years of Retrospective CBER Requests on SDTM, ADaM, and TFLs” by Mark Malayas and Angelo Tinazzi
  • “Authenticity Matters: Preserving Standards Integrity from Clinical Data Models to Tiramisù” by Angelo Tinazzi
  • “JSON and CORE Unlocking Adoption” by Silvia Faini, Sebastià Barceló, Hugo Signol, and Angelo Tinazzi

See the full draft agenda here.

We look forward to reconnecting with colleagues from around the world, meeting new peers, and exchanging ideas at the CDISC + TMF EU Interchange 2026.

See you in Milan?

A Preview of Cytel’s Contributions at PHUSE EU 2025

I can’t believe it has already been a year since we wrapped up PHUSE EU Connect 2024, and in two weeks we will be gathering for another exciting PHUSE EU Connect conference, only a few kilometers from Heidelberg, where everything started twenty years ago with the very first PHUSE event. I was one of the couple of hundred lucky attendees, and now, twenty years later, I have the great honor of supporting Jennie McGuirk and Jinesh Patel as Conference Co-chair for this year’s edition.

With a promising agenda featuring about 190 presentations, 34 posters, 9 hands-on workshops, 2 panel discussions, and 3 inspiring keynote speakers, this year we are heading to the city of Hamburg for the 21st PHUSE EU Connect. The agenda is full of topics looking toward the future, with about 40 talks and posters referring to AI in their titles, and once again open source is confirmed as the leitmotif.

Cytel will make a significant contribution this year, perhaps more than ever, with six presentations, one poster, active participation in both panel discussions, and co-chairing the “Scripts, Macros and Automation” and “People Leadership & Management” streams.

 

Monday topics: Agile code writing, extracting metadata from R OOP functions, and leadership

The week kicks off on Monday with Kamil Foltynski, who will present “Overcoming Challenges in Collaborative Spreadsheet Editing with Shiny, SpreadJS and JSON-Patch” in the Application Development stream at 11:30 am. Kamil will provide a technical deep dive into enabling real-time spreadsheet editing within Shiny applications, using tools such as SpreadJS, and sharing key lessons learned so far. Following Kamil’s presentation, Eswara Satyanarayana Gunisetti will present “Micro-Decisions, Macro Impact: The Role of Agile Thinking in Every Line of Code” in the “Coding Tips & Tricks” stream at 12 pm (see his recent blog on the topic). Eswara will share how an agile “mindset” can positively influence the way we write code.

In the same stream, a few hours later at 2 pm, another colleague, Edward Gillian, in collaboration with Sanofi, will present “Risk.assessr: Extracting OOP Function Details,” discussing strategies for extracting metadata from R Object-Oriented Programming functions. Prior to Edward’s session, at 1:30 pm, Kath Wright will moderate the interactive People Leadership & Management session “Invisible Glue: Trust, Influence and The Architecture of Teamwork.” In this live workshop, attendees will engage in practical exercises to identify barriers to trust, evaluate influence dynamics, and apply evidence-based strategies to strengthen collaboration in both physical and virtual environments.

 

Tuesday topics: Industry trends, extracting macro usage and dependency information from SAS programs, and integrating ECA data into CDISC-compliant datasets

Tuesday also brings two presentations and one poster. Right after lunch at 1:30 pm, Cedric Marchand will join other industry leaders in the panel discussion “Reimagining Statistical Programming: AI, Standards & the Talent of Tomorrow.” The panel will explore how current industry trends, such as AI, open source, and the evolution of data standards, will influence the next generation of statistical programmers.

The afternoon continues at 4 pm with my young and talented colleague Marie Poupelin, who will present “From Zero to Programming Hero: How Internships Shape Statistical Programmers in a CRO” in the “Professional Development” stream. Marie is a great example of the success of our internship program, and she will share her journey from having “zero” statistical programming experience to becoming an industry-ready programmer. Thirty minutes later, at 4:30 pm, Guido Wendland will present “Which Macros Are Used in the Study?” in the “Scripts, Macros and Automation” stream, a stream co-led this year for the first time by my colleague Sebastià Barceló. Guido will discuss techniques to extract macro usage and dependency information from SAS programs; this is particularly useful for identifying potential issues or estimating the impact of macro updates.

Later, in the traditional Tuesday evening poster session, you can join my colleague Cyril Sombrin in discussing “Our Journey in Integrating External Control Arms (ECAs) and RWD for Rare Disease Trials.” There you can discuss real-world case studies on integrating ECA data into CDISC-compliant datasets, exploring the unique challenges and solutions when aligning real-world data with CDISC standards.

 

Wednesday topics: Dataset-JSON, real-time validation, and streamlined submissions

On Wednesday at 12 pm, Hugo Signol, another young, talented Cytel statistical programmer and a product of our internship program, will present his talk “From XPT to Dataset-JSON: Enabling Real-Time Validation and Streamlined Submissions.” Building on Cytel’s experience from the CDISC Dataset-JSON Viewer Hackathon, Hugo will demonstrate a Shiny application that supports interactive exploration and real-time validation through API-based checks.

 

Meet us there!

Cytel will be at Booth 9 at the conference, where you can engage in discussions with our team or meet any of us throughout the week.

I hope I didn’t miss anyone, or anything! We look forward again to reuniting with colleagues and friends from around the world and making new acquaintances.

See you all in Hamburg!

How CDISC and CDASH (CRF Standards) Streamline Clinical Trials

In today’s global research landscape, clear and consistent communication is more than a necessity — it’s a strategic advantage. It is particularly critical in clinical trials, where data must speak a universal language across teams, geographies, and regulatory frameworks.

The CDISC (Clinical Data Interchange Standards Consortium) and CRF (Case Report Form) standards serve as the universal language of clinical trials, ensuring consistency, clarity, and collaboration across the entire study lifecycle. By implementing these essential frameworks, organizations can optimize data collection, management, and submission — driving cost efficiency and accelerating medical advancements.

Here, we discuss CDISC and CRF standards and how they support the design, execution, and analysis of clinical trials.

 

The need for standardization

Overall, ensuring consistent and reliable data across multiple clinical studies requires the standardization of processes, procedures, and data collection methods. This uniformity can improve data quality, facilitate data sharing and analysis, and ultimately enhance the efficiency and validity of clinical research.

There are many benefits to utilizing CDISC and CRF standards, such as:

  • Improved data quality and reliability
  • Enhanced data sharing and integration
  • Increased efficiency
  • Improved communication and collaboration
  • Support for regulatory compliance
  • Scalability and repeatability

Let’s take a closer look at how CDISC and CDASH standards help create a foundation for data collection, presentation, and submission in clinical trials.

 

CDISC Foundational Standards

CDISC (Clinical Data Interchange Standards Consortium), a global non-profit organization, develops and promotes standards for data exchange in clinical research. The CDISC Foundational Standards support end-to-end clinical and non-clinical research processes, focusing on the core principles for defining data standards, and include models, domains, and specifications for data representation.

 

FDA guidance on CDISC standards

In recent years, the FDA has clearly stated its preference for receiving both clinical and analysis data formatted in compliance with CDISC standards. This has been communicated through a series of guidance documents, correspondence with sponsors, and presentations at conferences. As a result, CDISC models have become the de facto standard for submitting data to the FDA.

As of today, the FDA requires the following CDISC standards:

  • Controlled terminology
  • SEND
  • SDTM
  • ADaM
  • Define-XML

 

CDASH: Maximizing data quality

CDASH (Clinical Data Acquisition Standards Harmonization), a foundational standard developed by CDISC, focuses on harmonizing data collection in clinical trials, providing guidance on how to design and populate case report forms (CRFs) to ensure consistent data collection across studies. These standards help maximize data quality in order to streamline processes across the entire spectrum of medical research, from crafting clinical research protocols to reporting and regulatory submissions.

CDASH Model v1.3 — the latest version — was released in September 2023.

 

Key features of CDASH

  • Provides guidance on designing and populating CRFs/eCRFs, covering all therapeutic areas and phases of clinical trials
  • Specifies standard field names, meanings, and how to fill them
  • Characterizes fields as highly recommended, conditional, or optional
  • Includes a CDASH Model and CDASH Implementation Guide

 

The benefits of CDASH

Instead of following bespoke standards, CDASH’s guidelines for CRFs/eCRFs help sponsors collect data consistently across studies. This, in turn, aids in producing data in SDTM format for submission purposes and allows regulators to review data submission packages more accurately and efficiently, identifying concerns or granting approvals faster. In addition, it can reduce duplication across trials and post-marketing evaluations, improving patient centricity.

CDASH standards also provide guidance for the development of data collection tools that are clear, understandable, and precise. Following CDASH standards ensures traceability of trial data from the time the data is collected at the site until the data is ready for final analysis and regulatory submission. This maintains the integrity of source data to support the trial’s outcomes and findings.

Sponsors can also save time when setting up new studies by following the CDASH standards, as most of the data collection and associated programming can be standardized across studies.

 

CRF libraries

A Case Report Form (CRF) library in clinical trials is a collection of standardized, reusable CRFs designed to streamline data collection and management. These libraries, whether electronic or physical, offer templates and guidelines for collecting data across different trials and therapeutic areas. They ensure uniformity, accuracy, and efficiency in data collection, ultimately benefiting trial conduct and analysis.

CRF libraries can reduce the cost and time budgeted for the clinical trial database preparation by:

  • Streamlining processes
  • Reducing training
  • Accelerating clinical trials
  • Using resources more efficiently
  • Improving adaptability and consistency
  • Focusing on design

 

Final Takeaways

CDISC standards, including CDASH and CRF standards, have revolutionized the way clinical data is managed, presented, and submitted, enhancing its integrity and efficiency in clinical research and drug development. Conformance to these standards is thus a critical aspect of clinical studies to ensure uniform data collection and submission processes, ultimately bringing quality treatments to patients faster.

 

Interested in learning more?

Watch our on-demand webinar, “Boosting Efficiency with CRF and CDISC Standards”:

From Data Standards to Open Source and Beyond with AI

Key takeaways from CDISC EU Interchange and PHUSE-CSS

As clinical data science evolves rapidly, the CDISC EU Interchange and PHUSE-CSS conferences offer a glimpse into the future of regulatory submissions, standardization, and the rise of open-source tools and AI in drug development. In May, I had the privilege of attending both events, in Geneva and in Utrecht. I’d like to share here some highlights from both.

 

Data submission in Europe: EMA delays

As anticipated in my previous blog, we were waiting for further announcements from the EMA regarding the outcome of their pilot raw data submission project, for which an interim report was published last year.

Those, like me, who were expecting a final announcement were likely disappointed. The requirement for data submission to the EMA in support of drug approval has been postponed to 2028. The EMA, which was well represented at PHUSE-CSS, needs to further evaluate factors such as tools and technological impacts more broadly. At PHUSE-CSS, they showed particular interest in topics such as Dataset-JSON, BIMO, the use of tools such as R-Shiny and the {teal} framework, as well as more advanced topics still under development, such as the “Analysis Concept.” The pilot continues, and the EMA is seeking more volunteers. It was guaranteed that submitting data will not negatively affect or delay your drug approval!

It was clear, while discussing with EMA representatives, that a number of stakeholders within the agency still need to be convinced of the benefits of receiving datasets in addition to PDF documents and reports. Some appeared concerned about the additional time and effort required to assess submitted datasets. As we all know, updating regulations, as well as releasing new standards, requires a great deal of “diplomacy” and consensus among multiple stakeholders.

 

Open source: {teal} and R-Shiny adoption

The “open source” revolution continues to gain momentum in our industry. At PHUSE-CSS, I attended the “{teal} Success Stories” workshop, where various sponsors, including J&J, Sanofi, Novartis, Boehringer, and Roche, shared their experiences.

I was fascinated by the solutions those sponsors have already implemented using {teal}, and how straightforward it seems to develop R-Shiny applications using the framework provided by this R package, which was originally developed by Roche.

For a deeper insight into the capabilities of this package, I recommend reading the paper presented by Roche at PHUSE US 2024.

 

Dataset-JSON pilot update

Another interesting workshop I attended at PHUSE-CSS was on Dataset-JSON, where we reviewed and contributed to a consolidated set of comments and feedback in response to the “FDA Requests for Public Comment on CDISC Dataset-JSON Standard,” which closes next week on June 9, 2025.

While the benefits of such a standard were widely acknowledged, particularly in accelerating drug approval and improving overall interoperability, the discussions also highlighted potential risks and implementation challenges. These included concerns about numeric precision when importing Dataset-JSON to and from SAS, as well as handling special characters.

We therefore emphasized the need for the FDA to provide additional guidance to support future adoption; there was also interest in possible future extensions of Dataset-JSON, such as the inclusion of more metadata and the potential to embed define.xml.
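The numeric-precision concern can be illustrated with a small, stdlib-only Python sketch (generic, and not tied to SAS or any specific Dataset-JSON tool): decimal text such as "0.1" has no exact binary-double representation, so a float-based reader/writer can silently change the rendered digits.

```python
import json
from decimal import Decimal

# A decimal value as written in a JSON file...
original_text = "0.1"
as_double = float(original_text)          # binary approximation of 0.1
assert Decimal(original_text) != Decimal(as_double)  # not the same number

# Python's json module round-trips its own doubles exactly (repr-based)...
assert json.loads(json.dumps(as_double)) == as_double

# ...but a consumer that re-renders with fixed precision can diverge:
print(f"{as_double:.20f}")  # 0.10000000000000000555...
```

This is exactly the class of detail for which additional regulatory guidance was requested.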

 

BIMO

BIMO was the focus of another PHUSE-CSS workshop. Among the various topics discussed, such as the presentation of the PHUSE BIMO template reviewer guide, it was particularly interesting to learn that PHUSE will soon develop a dedicated FAQ to support sponsors and CROs on gray areas of the BIMO requirements, such as the definition of the “major” studies currently in scope.

 

CDISC 360i reboot: Toward an end-to-end digital pipeline

The CDISC 360 initiative is back, and stronger than before, with a major shift toward a fully digitalized, standards-driven clinical development lifecycle. The goal is to break down silos through the application of standards such as the USDM and Biomedical Concepts. The mission is ambitious, but unlike when CDISC 360 was first launched, we now have more mature standards and technology to support it.

 

Use of AI to support clinical standards

AI remains a hot topic, and as in 2024, a full session was dedicated to it at the CDISC EU Interchange. The common theme across most presentations was the use of generative AI to support the implementation of data standards, such as AI acting as a subject matter expert (SME) for study teams. Although many of the showcased solutions from Argenx, SGS, and AstraZeneca are still in beta, they clearly demonstrate how proper model training can enhance search and navigation within complex data standards libraries, or help manage complex, multidimensional data (e.g., omics, wearable biosensors). Other AI use cases were also featured in several posters at PHUSE-CSS; for example, the application of AI to generate synthetic data or automate local lab ranges.

 

Other topics

For topics such as Digital Data Flow and USDM, I’ll refer you to the LinkedIn newsletter “View From The Coffee Shop,” curated by my friend Dave Iberson-Hurst. In it, he regularly shares insightful thoughts and updates on the ongoing digitalization efforts within our industry. He also summarized some key takeaways from both the CDISC EU Interchange and PHUSE-CSS.

I also had the opportunity to see good use cases of the Analysis Results Standard (ARS) at the CDISC EU Interchange, showing that this relatively new standard has been well received by sponsors as well as vendors.

On the regulatory side, aside from the news from the EMA, I found the presentation from Sanofi and GSK particularly interesting. It covered a cross-industry initiative aimed at harmonizing vaccine regulatory submissions to FDA-CBER by sharing experience with this unique division, which often has its own set of sometimes unexpected requirements (see also my previous blog on submission experience with FDA-CBER).

For other topics, see also the official CDISC posts covering other conference session content:

 

Ongoing innovation

Overall, both conferences continue to showcase ongoing innovation in our Industry. It’s clear that change is happening at a pace I have never seen before in my 30-year career, and that’s good for patients, as well as an exciting time for those of us working in biometrics.

 

Interested in learning more?

Download Angelo’s new ebook, The Good Data Submission Doctor on Data Submission and Data Integration to the FDA, a collection of Angelo’s most critical insights on clinical data standards submission to the FDA, including key updates from the new FDA Study Data Technical Conformance Guide:

A Preview of Cytel’s Contributions at the 2025 CDISC + TMF EU Interchange

This week, on May 14–15, the CDISC + TMF EU Interchange 2025 will take place, just a few steps from Cytel’s Geneva office.

This year at Cytel, we’re making the event even more special! We will be hosting a pre-conference event, open to anyone able to arrive in Geneva before the conference begins.

But that’s not all: together with my Cytel colleagues, we will have three presentations and one poster, in which my colleagues will share insights from their work with CDISC standards, including the Trial Master File Standards Model (CDISC-TMF). And it is with the contribution of my colleague Caroline Terril that I’d like to start this preview of the four Cytel contributions.

 

Key Considerations for Biometrics CROs Not Managing the TMF — The Journey So Far

Caroline Terril, Thursday, May 15, 10:00–10:30 — Session 5E: TMF Management (TMF Track)

If, after these last three years, you’ve ever asked yourself what really matters when it comes to managing the TMF with CDISC — especially for biometrics CROs that don’t directly manage TMFs — then my colleague Caroline Terril might have the answer. In her presentation, she will delve into our journey so far in adopting the CDISC-TMF standard.

 

The Curious Case of External Controlled Arms (ECA): Practical Solutions for External and RWD Integration

Gautham Selvaraj (co-authored by me), Wednesday, May 14, 09:30–10:00 — Session 5C: Real World Data

At Cytel, we’ve seen increasing use of external control arms (ECA) in sponsor projects. In this presentation, Gautham Selvaraj will walk through two real-world case studies on integrating ECA data into CDISC-compliant datasets, exploring the unique challenges and solutions in aligning real-world data with CDISC standards.

Interested in learning more about ECA clinical trial design? Explore more of Cytel’s offerings and insights:

 

Governing the Ungovernable: Can a CRO Effectively Govern Its Standards?

Angelo Tinazzi, Thursday, May 15, 14:30–15:00 — Session 7C: Applied Standard Governance

Are you a CRO struggling with differing sponsor interpretations of data standards, perhaps across multiple therapeutic areas or indications? Hard life, isn’t it?

Spoiler alert: although there’s no “magic” in my presentation, and no AI involved, I will offer practical insights into the complexities of CRO data standard governance. Sponsors are also welcome to join to see what life looks like from the other side of the barricades!

 

Managing SDTM Mapping Challenges in Multi-Study Portfolios: A Guide to Standards and Consistency

Jing Zhang and Marianne Dutfoy, Wednesday, May 14, 12:30–13:30 — Poster Session

In their poster, Jing Zhang and Marianne Dutfoy offer guidance for navigating SDTM mapping across multi‐study portfolios. They’ll address challenges such as inconsistent CRFs, variations in source data, and the hurdles of aligning historical studies with newer versions of standards.

 

Interested in learning more?

Download Angelo Tinazzi’s new ebook, “The Good Data Submission Doctor on Data Submission and Data Integration to the FDA”:

From Toplines to Triumph: Visualizing the Pathways to Regulatory Approval

Achieving positive topline results in a clinical trial marks a critical milestone in the drug development process, yet it is far from the end of the submission journey. Instead, it signals the start of a complex, fast-paced effort to prepare for regulatory submission and navigate the FDA’s multi-stage review. The final “regulatory defense” stage demands rigorous collaboration, meticulous planning, and adaptability to meet the expectations of regulatory agencies.

Here we discuss the key stages in the post-topline journey, exploring key milestones, unexpected challenges, and best practices for ensuring a strong submission and a smooth path to approval.

 

1. The Preparation: Post-topline readiness and strategic planning

The preparation phase begins immediately after topline results are available. During this critical window — often lasting several months — cross-functional teams shift their focus to assembling the final submission package. Statisticians and programmers play a central role here, finalizing the tables, listings, and figures (TLFs) that will populate the Clinical Study Report (CSR) and preparing submission-ready datasets following CDISC standards, including ADaM, SDTM, and associated documentation.

In parallel, a pre-BLA or pre-NDA meeting with the FDA is typically scheduled to align on expectations, identify potential concerns, and set the foundation for a smoother review process. This phase is not just about document generation; it’s about establishing a strategy, anticipating regulatory scrutiny, and ensuring the submission is both complete and compelling. The quality of the groundwork laid here often dictates the ease — or difficulty — of the phases that follow.

 

2. The Submission: Crossing the threshold to regulatory review

Once the submission is filed, the process transitions into a more structured phase governed by the FDA’s review protocols. The agency begins with a 60-day filing review to assess whether the BLA or NDA is complete and acceptable for full review. If so, the sponsor receives a Day 74 Letter, which provides early feedback, flags any immediate concerns, and confirms the Prescription Drug User Fee Act (PDUFA) date — typically 10 months post-filing for standard reviews or 6 months for priority reviews. Although this phase may seem procedural, its significance is high. A clean, well-organized submission can streamline the review process, limit questions, and reduce the risk of delays. This is also the point where rolling submissions, if applicable under Fast Track designation, can offer a tactical advantage by accelerating document delivery and potentially shortening review timelines.

For statistical and programming teams, this is not a time to sit back and relax — it’s an opportunity to ensure internal alignment and anticipate questions the FDA may raise based on known data complexities. Strong documentation and traceability within datasets and outputs are essential at this point, helping to support any needed follow-up. Proactive communication and readiness during this phase help lay the groundwork for the more intensive regulatory engagement that follows.

 

3. The Regulatory Defense: Responding, clarifying, and defending your data

The regulatory defense phase is where the bulk of agency interaction occurs — and where flexibility and responsiveness become essential. During this time, the FDA may issue multiple information requests (IRs), asking for clarification on statistical methodology, specific data points, or safety and efficacy outcomes. Mid-cycle communications, typically occurring around months 4–5 for standard reviews, offer a formal opportunity to assess the review’s progress and surface any significant concerns.

In some cases, the agency may convene an Advisory Committee (AdCom) meeting to gather expert input, particularly when there are outstanding safety questions or complex benefit-risk considerations. Throughout this phase, the ability to quickly respond to ad hoc requests, provide high-quality data outputs, and maintain close collaboration across functions is critical. It’s a high-stakes stage where well-prepared teams can help preserve timelines and ensure the submission stays on track.

 

4. The Unexpected: Adapting to setbacks and charting a new course

In some cases, the regulatory journey doesn’t lead directly to approval. If the FDA identifies significant deficiencies in the initial submission — whether related to clinical data, statistical interpretation, manufacturing, or safety — it may issue a Complete Response Letter (CRL). This marks a temporary halt in the process, requiring the sponsor to address the concerns before resubmission. Depending on the scope of the deficiencies, the resubmission may fall under Class I (minor issues, reviewed in 2 months) or Class II (major issues, reviewed in 6 months).

For statisticians and programmers, this could mean conducting additional analyses, integrating new data, or adjusting the structure and presentation of the submission package. While a CRL can be a setback, it’s also an opportunity to recalibrate, seek additional guidance from the FDA, and improve the likelihood of approval in the next cycle. The key is to approach this phase with transparency, strategic thinking, and a readiness to adapt and respond.

 

Final takeaways

The path from topline results to regulatory approval is rarely linear. Timelines can range from as little as 12 months in expedited reviews to over 30 months in cases involving major deficiencies and resubmissions. Success in this post-unblinding phase hinges on proactive planning, adaptable resourcing, and the ability to respond quickly and thoroughly to regulatory needs. Equally important is collaboration across functions — clinical, regulatory, biostatistics, programming, and operations must work closely and cohesively to anticipate challenges, align timelines, and respond efficiently to agency requests. Whether following a standard or accelerated route, the shared priority is a comprehensive, high-quality submission that stands up to regulatory scrutiny — and ultimately supports timely access to new therapies for patients.

 

Interested in learning more?

Watch Jasperlynn Kao and Florence Le Maulf’s recent webinar, “From Toplines to Triumph: Visualizing the Pathways to Regulatory Approval”:

Data Submission to Health Authorities: Current Practices and Future Directions

How far is 2041? Update on data submission to health authorities

Back in the summer of 2023, I was invited to present “Standards and Open-Source Hand-in-Hand: Leveraging Automation to Expedite Drug Market Request Review Process” at PharmaSUG-China. I tried to imagine the future of data submission, travelling to 2041 and envisioning how AI could support and expedite the regulatory drug submission process, and how it could enhance the preparation and review of data submission packages. I then brought the discussion back to the present, sharing some reflections on the journey ahead — a journey that will inevitably require better use of standards, open-source adoption and solutions, and collaborative industry initiatives.

About 18 months later, the topic of AI has become predominant in our industry. This is clearly reflected in the growing number of AI-related presentations at conferences, including the recent PHUSE US Connect Conference held last March in Orlando, and the upcoming CDISC-EU Interchange this May, just a few steps from our offices here in Geneva.

Here, I would like to provide a brief overview of the latest updates on data submission requirements, as well as industry initiatives aimed at improving how we create clinical data packages for submission to health authorities in support of market drug approval.

 

FDA data submission requirements update

Regulatory data submission requirements, more specifically those from the US FDA, have been refined through various updates to the agency’s guidance. Since my January 2024 summary of the latest changes, the following requirements have been added:

 

  • Submit a new dataset, LC, a copy of LB with US conventional units as the standard unit (March 2024)
  • Viral load results should be placed in the MB domain, confirming that there is still misuse of specific laboratory-related data domains, e.g., LB, IS, and MB (October 2024)
  • The requirement for US conventional units was recently extended to ADaM, with the ADLC dataset (March 2025)

 

See the latest (March 2025) version of the FDA Study Data Technical Conformance Guide here.

It’s also worthwhile to mention the FDA’s “Protocol Deviations for Clinical Investigations of Drugs, Biological Products, and Devices,” which provides various recommendations around the management of protocol deviations. This includes some specific recommendations for SDTM mapping, such as including a variable in the DV domain that provides the sponsor’s determination of whether the protocol deviation was important.

 

EMA data submission requirements

While the European Medicines Agency (EMA) has not made data submission mandatory, nor specified a required data format, it launched the “Raw Data” pilot proof-of-concept project about two years ago. In this initiative, selected applicants were invited to submit structured clinical trial data as part of their initial applications and post-authorization procedures. Clinical trial data in this context refers to individual patient-level data, including:

 

  • Clinical laboratory results
  • Images
  • Medical records

 

The aim of the pilot is to assess whether the use of structured clinical trial data can help speed up and improve the drug assessment process.

An initial outcome of the project was published in a report released last October. It summarizes lessons learned from five data submissions received between September 2022 and December 2023, out of the ten originally planned. Among the key learnings and outcomes, CDISC standards, namely SDTM and ADaM with define.xml and a data reviewer’s guide, were confirmed as suitable formats for data review. The software tools being explored included SAS and R for statistical analysis, and SAS JMP Clinical for visualization. While SAS XPT files were required, other transport formats such as XML or JSON were also accepted, upon mutual agreement between EMA and the applicant.

Although these standards and formats are not yet mandatory, additional guidance has been provided in a Q&A document (e.g., regarding maximum data package size). Since then, the EMA has decided to extend the project’s duration. Final recommendations are expected in 2025 — potentially with some early updates to be shared at the upcoming CDISC EU Interchange in May.

 

Industry initiatives update

Since my speech at PharmaSUG-China, the industry initiatives I discussed there have progressed quite rapidly:

 

•   The R Pilot Submission Experience: All four planned pilots have been completed, and in February a fifth pilot was announced. This time, the goal is to establish the new Dataset-JSON format as a CDISC standard for clinical data submissions (see here a report from a successful pilot submitting data in the new format to the FDA).

•   R Packages for SDTM and ADaM: Both the SDTM (oak) and ADaM (admiral) R packages are now widely used in our industry for submission projects.

•   The Analysis Results Standard (ARS): The first version of the ARS was released in April 2024, along with a new initiative, the eTFL Portal, which shares examples and templates for the most common TFLs.

•   The CORE Project: The project continues its mission to develop Open Conformance Rules, alongside a growing number of Open Source initiatives.

 

Interested in learning more?

Get your copy of Angelo Tinazzi’s latest ebook, “The Good Data Submission Doctor on Data Submission and Data Integration to the FDA”:

Advancing Clinical Data Standards: Guidance, Regulations, and Key Standards Developments

This last year has been marked by new standards, industry templates and initiatives, and regulatory guidance — an intense year for Clinical Data Standards Development.

And it can certainly be challenging for sponsors to keep up with the evolving landscape: Standard Development Organizations (SDOs) such as CDISC advanced clinical data standards; regulatory agencies like the FDA added specific requirements; and industry organizations such as PHUSE worked to clarify and address different stakeholders’ expectations through the development of white papers.

Here I provide a quick summary of what was released in the year we have just left behind.

 

Regulatory guidance

While the industry awaits the EMA’s long-anticipated clinical data standards submission requirements (see the current status of their pilot project here), the FDA celebrated the 10th anniversary of its Study Data Technical Conformance Guide last April. If you missed it, check out my previous post “New FDA Data Submission Requirements and Substantial Changes,” which discussed a decade of guidance improvements and the latest requirements. Since then, the FDA has released two new versions of its Clinical Data Standards Guidance. These updates do not introduce major new requirements, other than confirming that the SDTM and ADaM Medical Devices IGs, for both the CDER and CBER divisions, “align with FDA current business needs.”

It’s also worthwhile to mention the recent release of the FDA’s Protocol Deviations for Clinical Investigations of Drugs, Biological Products, and Devices. The guidance provides various recommendations around the management of protocol deviations, including some specific recommendations for SDTM mapping, such as including a variable in the DV domain that provides the sponsor’s determination of whether the protocol deviation was important. See also my friend Eanna Kiely’s LinkedIn post.  

 

CDISC standards

Throughout 2024, CDISC continued to release new standards, update existing ones, and advance cross-industry initiatives such as CDISC CORE, while also progressing initiatives aimed at full clinical trial study data digitalization — for example, with the release of the third version of the CDISC Unified Study Data Model (USDM).

Notable examples of new releases in 2024 include the first version of the CDISC Analysis Results Standard, aimed at providing a framework for linking analysis results directly to the data and metadata that support them. Several presentations and workshops were held at various events over the past two years, first to promote the idea and then to gather feedback from the audience prior to its final release last April.

Later in the year, similar to the eCRF Portal, an eTFL Portal was also released. This resource includes examples of the most common analysis table layouts, complete with table shells, input ADaM datasets, define.xml, and the JSON version of the analysis results along with metadata. While not a regulatory requirement, this standard adds an important missing piece in our industry’s “dream” end-to-end goal.

It is also worth mentioning the first version of the CDISC Tobacco Implementation Guide (TIG), developed in collaboration with the FDA’s Center for Tobacco Products (CTP). This guide aims to assist tobacco companies in implementing CDISC foundational standards, such as CDASH, SDTM, and ADaM. If you work at a tobacco company, also check out the recently launched TIG eSubmission Pilot.

Other highlights include updates to existing standards and new resources added to the CDISC Library (see the full list of resources here). Regular updates to CDISC Biomedical Concepts also became a key focus of the CDISC Library.

For a look ahead, check out the CDISC Standards Timeline for 2025.

 

Industry initiatives

PHUSE continued its work through dedicated Working Groups (WGs) focused on Optimizing the Use of Data Standards. In addition to maintaining a peer-reviewed Q&A on SDTM and ADaM implementation, these WGs have released two White Papers (WPs) summarizing current data standards challenges (and those ahead with new trial designs) and how companies implement and govern data standards.

Both WPs are based on the outcomes of two surveys conducted during the 2022 and 2023 PHUSE Computational Science Symposium (CSS), with the second in particular proposing potential governance best practices aimed at establishing implementation consistency across companies. Notably, these WPs report that about 50% of respondents affirmed they do not have a dedicated data standards team, or were unaware whether one existed in their company. Among those with data standards teams in place, there was significant variation in structure, ranging from cross-functional groups to teams focused on specific functions. Only a few centralized teams addressed topics beyond foundational CDISC standards, such as SAPs and TFL generation. The surveys also highlighted that many organizations, more than 50%, rely on Excel or internally developed tools for managing standards, as commercial metadata repositories (MDRs) are often seen as complex and require long-term investment. These challenges remain consistent across pharma, biotech, and CROs.

Applying CDISC standards to non-interventional studies and real-world data continues to be a challenge due to diverse data sources and scenarios, such as the use of external control arms. This was clear when I had the opportunity to host a CDISC ADaM training at one company in Europe specializing in Observational Studies.

The release of the CDISC Considerations for SDTM Implementation in Observational Studies and Real-World Data v1.0 in February addressed some of the SDTM mapping issues, as did the previously released PHUSE Data Standards for Non-Interventional Studies paper, which covers additional ADaM-related topics. In addition, the FDA’s Data Standards for Drug and Biological Product Submissions Containing Real-World Data guidance (December 2023) provided initial recommendations for submitting data packages that include real-world data.

With the release of the first version of CDISC Dataset-JSON in 2023, the long-awaited alternative to SAS XPT as a standard for data submission took a significant step forward. Following that release, PHUSE, in collaboration with the FDA and CDISC, completed a pilot project in 2024, with a full report made available last June. Last December, CDISC released an updated version (1.1) of the standard. Testers reported some bugs — for example, when importing to or exporting from SAS — as well as known differences between tools such as SAS and R in numeric precision and date representation, and limitations requiring updates to analytics tools used by the FDA (e.g., SAS JMP). Details of the changes implemented, and of items rejected from the public review, can be found here.

In 2024, open source continued to be a leitmotiv in our industry, with tools and initiatives aimed at regulating the use of open-source solutions in clinical data submission (see also Cytel blog “The Journey into Open Source So Far”). Also, check the progress of industry initiatives such as CDISC COSA.

Furthermore, the Using R to Submit Research to the FDA initiative completed the first part of its final Pilot 4. While the three previous pilots focused on demonstrating that SDTM and ADaM packages could be created and submitted to the FDA using R — including packages with Shiny applications for reviewers to use — this pilot explored novel technologies such as Linux containers and WebAssembly to bundle a Shiny application into a self-contained package that is easier to transfer and execute, allowing agency reviewers to run and evaluate the software without complex setup.

 

New releases and updates from 2024

See below a complete list of new releases and updates from 2024, with links to individual resources. Do not hesitate to point out any missing items!

 

February

 

March

 

April

 

June

 

September

 

October

 

December

 

Interested in learning more?

Download Angelo Tinazzi’s new ebook, “The Good Data Submission Doctor on Data Submission and Data Integration to the FDA”:

The Journey into Open Source … So Far!

Written by Sebastià Barceló, Malte Stein, and Angelo Tinazzi

Open source has been a leitmotif in our industry for many years now, but its adoption poses a number of challenges. At Cytel, our journey into open source began a couple of years ago. Since then, we have focused on building a dedicated Statistical Computing Environment (SCE), defining new processes, and developing new tools to support these processes. We have also contributed to industry initiatives such as the R {admiral} package.

This year, PHUSE-EU will feature a dedicated stream, Open-Source Technology, where presenters will share their experience with open-source technology adoption. In this spirit of collaboration, we will be contributing with two presentations, both addressing critical aspects:

  • The co-existence of R and SAS in the same SCE
  • The risk assessment of R packages

 

Integrating RStudio POSIT and SAS in the same environment

Our new SCE integrates RStudio POSIT and SAS Grid across both Windows and Linux servers. The integration was designed to create a unified and efficient environment for data analytics, leveraging both SAS and POSIT’s capabilities.

The integration was complex and presented several obstacles and surprises along the way. For instance, we encountered compatibility issues, particularly around data access and permissions. To address these, we implemented a dual-protocol drive, enabling real-time data sharing across platforms, and adopted Git as a version control system, which allows us to maintain and publish content in Connect in a more robust and secure way.

Additional challenges in managing this SCE include balancing security with usability for internal and external access to POSIT Connect and optimizing R package management.

Figure 1 illustrates the final infrastructure.

 

 

R packages risk assessment

Installing and using R packages in the SCE requires assessing the risks associated with using them. Packages are typically accessed through CRAN, the primary repository for R packages developed by various organizations and individuals. Risk assessment is especially critical in industries like pharmaceuticals, where strong compliance requirements (e.g., GxP) necessitate that packages be well maintained, documented, and, above all, reliable.

A key aspect of the risk assessment is the collection of package metadata, enabling us to classify and assess the reliability of all the packages we may want to make available in our SCE.

At Cytel, we applied a comprehensive assessment approach by extracting metadata from R packages. We began by evaluating various techniques, such as APIs and web scraping, and compared our approach with the R riskmetric package. This comparison highlighted limitations in conventional methods, which often only address the latest package version. As a result, we enhanced our metadata extraction process.
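To give a flavor of what such metadata collection involves, here is a minimal Python sketch — not the actual Cytel tooling, and with invented package entries and an illustrative scoring heuristic — that parses CRAN-style DCF metadata (the format of CRAN’s PACKAGES index) and flags packages with fewer maintenance signals:

```python
# Hypothetical sketch: parsing CRAN-style DCF metadata and scoring packages.
# The scoring rules and package entries are illustrative only.

def parse_dcf(text):
    """Parse DCF (Debian Control File) metadata, the format used by
    CRAN's PACKAGES index, into one dict of fields per package."""
    packages = []
    for block in text.strip().split("\n\n"):
        fields, key = {}, None
        for line in block.splitlines():
            if line.startswith((" ", "\t")) and key:   # continuation line
                fields[key] += " " + line.strip()
            elif ":" in line:
                key, _, value = line.partition(":")
                fields[key.strip()] = value.strip()
        packages.append(fields)
    return packages

def risk_score(fields):
    """Toy heuristic: fewer maintenance signals -> higher risk."""
    score = 0
    if "Maintainer" not in fields:
        score += 2
    if "BugReports" not in fields and "URL" not in fields:
        score += 1
    if fields.get("NeedsCompilation") == "yes":
        score += 1                                     # harder to validate
    return score

sample = """Package: admiral
Version: 1.1.1
Maintainer: Example Maintainer <maintainer@example.org>
BugReports: https://github.com/pharmaverse/admiral/issues
NeedsCompilation: no

Package: legacytool
Version: 0.1.0
NeedsCompilation: yes"""

pkgs = parse_dcf(sample)
scores = {p["Package"]: risk_score(p) for p in pkgs}
```

A real assessment would combine many more signals (download counts, test coverage, release cadence, version history), which is where approaches such as the R riskmetric package come in.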

 

Interested in learning more?

If you are attending the PHUSE-EU in Strasbourg from November 10–13, do not miss Sebastià and Malte’s poster and presentation, where the co-existence of R and SAS and our approach to extracting metadata from R packages will be discussed in more detail:

 

“Bridging Platforms: Integrating RStudio POSIT and SAS Grid in the Same Environment”

Cytel presenters: Sebastià Barceló and Malte Stein

Tuesday, November 12, at 5:30 p.m. (Poster Session – PP28)

 

“Unveiling R Package Risk Assessment: A Comparative Analysis of Metadata Extraction”

Cytel presenters: Malte Stein and Sebastià Barceló

Wednesday, November 13, at 1:30 p.m. (Open-Source Technology Stream – OS14)

 

Angelo Tinazzi will moderate the Scripts, Macros and Automation stream, which will also cover some open-source experiences from other organizations.

 

Cytel will be at Booth #6! We hope to see you there!

Visualizing ADaM: A Practical Guide Through Examples

Written by Silvia Faini, Principal Statistical Programmer, in collaboration with Angelo Tinazzi

 

An important component of clinical trial data submission to the FDA is CDISC’s Analysis Data Model, or ADaM, which defines dataset and metadata standards. However, implementation of ADaM is not always straightforward, leading to inaccuracies and inconsistencies.

At this year’s CDISC EU+TMF Interchange in Berlin, I had the opportunity to present “Visualizing ADaM: A Practical Guide Through Examples,” co-authored with Angelo Tinazzi, where I shared how visualizing ADaM provides a guided approach that can address these issues and streamline the process.

Here, we share some of the key takeaways.

 

 

Why a visual approach to ADaM?

Over the years, CDISC and the CDISC ADaM team have released additional documents supplementing the Implementation Guide (IG) with handy ADaM use cases, demonstrating how ADaM can be used to support the most common statistical methods or specific settings such as medical device studies, while maintaining good traceability. Additionally, several CDISC Therapeutic Area User Guides (TAUGs) provide specific analysis examples addressing various ADaM requirements in those particular settings.

Typically, the ADaM dataset development process begins with study documents such as the Statistical Analysis Plan (SAP), accompanied by applicable CDISC ADaM guidance. By leveraging gathered knowledge and support from the company’s data governance structure, which includes tailored templates, guidance, and subject matter experts (SMEs), the statistical programmer initiates the design of the ADaM datasets needed to support the analytical outputs outlined in the SAP. Despite continuous efforts to support team members with regular updates, we frequently encounter incorrect or inconsistent implementation of the standard across multiple clients and therapeutic areas.

To enhance the implementation of ADaM by statistical programmers at Cytel, we have developed visual shells based on our standard SAP table shells. These visual shells incorporate annotations to illustrate the ADaM dataset and variables to be utilized, variables to group or categorize, as well as any filters and additional conditions. These visual shells are accompanied by sample ADaM datasets and corresponding standard specifications tailored for Cytel automation tools.

Furthermore, our development efforts extend to slide sets designed to train programmers through practical examples, ensuring a comprehensive understanding of ADaM’s application.

 

A simple example: A demographics table

The following example makes use of the standard ADSL dataset, with its standard variables either copied from SDTM or derived in ADaM. The table shell is completed with details of the variables to be selected and the rationale. For example, this demographics table filters for the safety population; as such, we expect the ADSL dataset to contain a variable with the actual treatment received (here, TRT01AN) and a variable to filter for the applicable subjects (SAFFL). Furthermore, because our standard tools work best with numeric variables, in addition to the character versions of variables such as SEX and AGEGR1, we plan to include their numeric counterparts, SEXN and AGEGR1N.

Figure 1: ADSL Dataset

 

ADaM Class: ADSL

Because the table is on the Safety Population subset (SAFFL=Y), the Actual Treatment variable should be used as the column group (e.g., TRT01AN).

Although the numeric versions of variables such as SEX and RACE are only permissible, it is good practice to add them (e.g., SEXN and RACEN) to facilitate tool automation and the desired non-alphabetic sorting in the outputs.

In addition to AGE, analyzed with continuous descriptive statistics (and copied from SDTM), age is also required to be analyzed in categories, so the standard ADSL variable AGEGR1 and its numeric version AGEGR1N are added to ADSL.
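The selection logic described above can be sketched as follows. This is a minimal pure-Python illustration with invented records — not our production code, which would typically run in SAS or R:

```python
# Illustrative sketch of the demographics-table logic: filter on SAFFL,
# group by TRT01AN, and sort categories by SEXN (not alphabetically).
# Records are invented for the example.
adsl = [
    {"USUBJID": "01-001", "TRT01AN": 1, "SAFFL": "Y", "SEX": "M", "SEXN": 1},
    {"USUBJID": "01-002", "TRT01AN": 1, "SAFFL": "Y", "SEX": "F", "SEXN": 2},
    {"USUBJID": "01-003", "TRT01AN": 2, "SAFFL": "Y", "SEX": "F", "SEXN": 2},
    {"USUBJID": "01-004", "TRT01AN": 2, "SAFFL": "N", "SEX": "M", "SEXN": 1},
]

safety = [r for r in adsl if r["SAFFL"] == "Y"]   # safety population only

counts = {}                                       # (TRT01AN, SEXN, SEX) -> n
for r in safety:
    key = (r["TRT01AN"], r["SEXN"], r["SEX"])
    counts[key] = counts.get(key, 0) + 1

# Sorting on the keys uses SEXN, so "M" (SEXN=1) precedes "F" (SEXN=2)
rows = sorted(counts.items())
```

Including SEXN in the sort key is exactly what the numeric counterpart variables buy you: the output order follows the shell, not the alphabet.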

 

Treatment-emergent adverse events table

For the analysis of treatment-emergent adverse events, we discussed two types of outputs.

In the first output, the requirement is to analyze the occurrence of the treatment-emergent adverse events and their incidence using a hierarchical medical dictionary, MedDRA, through which we summarize the occurrences by system organ class and preferred term.

 

Figure 2: ADAE Dataset

ADaM Class: OCCDS / Sub-Class: ADVERSE EVENT

As with the demographics table, the Safety Population subset (SAFFL=Y) is used, and the Actual Treatment variable (e.g., TRT01AN) should be used as the column group.

ADVERSE EVENT is a sub-class of the OCCDS class; as such, the variables AEBODSYS and AEDECOD become “Required.”

For this type of analysis, only treatment-emergent adverse events (TRTEMFL=Y) should be used.
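The incidence counting for this first output can be sketched as follows (pure Python with invented records, for illustration only). A subject is counted once per SOC/PT combination, however many matching events they had:

```python
# Illustrative sketch: subject incidence by SOC and PT for treatment-emergent
# AEs only (TRTEMFL = "Y"). Records are invented for the example.
adae = [
    {"USUBJID": "01-001", "TRTEMFL": "Y",
     "AEBODSYS": "Nervous system disorders", "AEDECOD": "Headache"},
    {"USUBJID": "01-001", "TRTEMFL": "Y",           # repeat event, same subject
     "AEBODSYS": "Nervous system disorders", "AEDECOD": "Headache"},
    {"USUBJID": "01-002", "TRTEMFL": "Y",
     "AEBODSYS": "Nervous system disorders", "AEDECOD": "Headache"},
    {"USUBJID": "01-002", "TRTEMFL": "N",           # not treatment-emergent
     "AEBODSYS": "Gastrointestinal disorders", "AEDECOD": "Nausea"},
]

incidence = {}                                      # (SOC, PT) -> set of subjects
for r in adae:
    if r["TRTEMFL"] != "Y":                         # keep treatment-emergent only
        continue
    incidence.setdefault((r["AEBODSYS"], r["AEDECOD"]), set()).add(r["USUBJID"])

n_subjects = {k: len(v) for k, v in incidence.items()}
```

Using a set per SOC/PT pair is what de-duplicates repeat events, so the counts reflect subject incidence rather than event counts.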

In the second output, our objective is to provide an overview of the types of adverse events that occurred. This includes determining the number of subjects who experienced at least one adverse event, the number who experienced at least one serious adverse event, identifying the most severe adverse event, and quantifying the subjects who experienced adverse events leading to either treatment or study discontinuation, among other metrics.

We could have used the same ADAE dataset created for the previous output and applied the selection and calculation in the analytical output program. However, to improve traceability and reproducibility (quality control), and to make the ADaM dataset as analysis-ready as possible, we also have the option to create another ADaM dataset, ADAESUM, derived from ADAE and applying a BDS structure. The annotated output shows both versions, with ADAESUM (BDS) and with ADAE (OCCDS).

Again, our example provides detailed explanations and an extract of the ADAESUM dataset.

 

Figure 3: ADAESUM Dataset

Option 1 – ADaM Class: OCCDS / Sub-Class: ADVERSE EVENT

Filters on specific variables, e.g., TRTEMFL, AESER, AEACN.

 

Option 2 – ADaM Class: BDS

As with the first output, the Safety Population subset (SAFFL=Y) is used, and the Actual Treatment variable (e.g., TRT01AN) serves as the column group.

Each condition needed for the summary output can be represented by a specific PARAMN / PARAMCD / PARAM, with AVAL (or AVALC) indicating whether the condition was satisfied for the subject; in the case of severity, AVAL contains the maximum observed severity among all the subject’s AEs. PARAMN is also used to display each condition in the order defined in the planned table shell.
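A minimal sketch of this BDS derivation (pure Python, with invented records, parameter codes, and severity coding) might look like this:

```python
# Illustrative sketch of the BDS option: deriving per-subject ADAESUM
# parameters ("any TEAE", "any serious TEAE", "maximum severity") from ADAE.
# Parameter codes and severity mapping are invented for the example.
SEV_ORDER = {"MILD": 1, "MODERATE": 2, "SEVERE": 3}

adae = [
    {"USUBJID": "01-001", "TRTEMFL": "Y", "AESER": "N", "AESEV": "MILD"},
    {"USUBJID": "01-001", "TRTEMFL": "Y", "AESER": "Y", "AESEV": "SEVERE"},
    {"USUBJID": "01-002", "TRTEMFL": "N", "AESER": "N", "AESEV": "MODERATE"},
]

adaesum = []
for subj in sorted({r["USUBJID"] for r in adae}):
    # one row (parameter) per condition, derived from the subject's TEAEs
    teae = [r for r in adae if r["USUBJID"] == subj and r["TRTEMFL"] == "Y"]
    adaesum.append({"USUBJID": subj, "PARAMN": 1, "PARAMCD": "ANYTEAE",
                    "AVALC": "Y" if teae else "N"})
    adaesum.append({"USUBJID": subj, "PARAMN": 2, "PARAMCD": "ANYSER",
                    "AVALC": "Y" if any(r["AESER"] == "Y" for r in teae) else "N"})
    # maximum observed severity among the subject's TEAEs, as a numeric AVAL
    max_sev = max((SEV_ORDER[r["AESEV"]] for r in teae), default=None)
    adaesum.append({"USUBJID": subj, "PARAMN": 3, "PARAMCD": "MAXSEV",
                    "AVAL": max_sev})
```

The table program then only needs to count AVALC="Y" (or summarize AVAL) per parameter, which is what makes the dataset analysis-ready.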

 

Change from baseline with phantom baseline visit

In this last example, I presented a table summarizing the change from baseline at each visit for vital sign parameters. For each visit, we present summary statistics for the observed value and the change from baseline. The peculiarity of this example is the baseline definition: the SAP defines baseline as the average of the observed values at the screening and day-0 visits.

 

Figure 4: ADVS Dataset

ADaM Class: BDS

Safety Population is used (SAFFL=Y), but results are presented without any split/group by treatment (we do recommend keeping TRT01AN in ADVS).

The Analysis Visit is derived in ADVS (AVISITN / AVISIT). These might be derived from SDTM VISITNUM / VISIT by adjusting the wording to fulfill SAP requirements or by applying visit windowing. In the example dataset below, the baseline visit (AVISITN=0 / AVISIT=Baseline) is a derived record / visit.

AVAL and CHG are used for observed and change from baseline respectively. CHG is calculated from AVISIT=Baseline.

Not all records/visits will be used in the analysis (ANL01FL=Null), but they are kept in ADVS for traceability.

 

From the above ADVS dataset:

  • Lines 3 and 9 show the derived baseline visit, with the DTYPE variable identifying it and the method used for the derivation (AVERAGE). ABLFL is set to “Y.”
  • Lines 1, 2, 7, and 8 will not be used in the analysis (ANL01FL=Null) because they occurred before the phantom derived “baseline” visit, but they are kept in the ADaM dataset for traceability, e.g., to show from which records the baseline visit was derived.
  • Lines 5 and 11 will not be used in the analysis (ANL01FL=Null) because they were unscheduled visits. However, the records are kept in the ADaM dataset for traceability, so the reviewer is aware of which information was not used in the analysis. This is particularly relevant if AVISIT/AVISITN are derived using visit windowing.
  • All records post-baseline have the change from baseline (CHG) derived, including the unscheduled visits.
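The phantom-baseline derivation described above can be sketched as follows (pure Python with invented values, for illustration only):

```python
# Illustrative sketch: baseline is a derived record (DTYPE="AVERAGE")
# averaging the screening and day-0 values, and CHG is computed against it
# for post-baseline visits. Values and visit names are invented.
advs = [
    {"AVISITN": -2, "AVISIT": "Screening", "AVAL": 120.0},
    {"AVISITN": -1, "AVISIT": "Day 0",     "AVAL": 124.0},
    {"AVISITN":  1, "AVISIT": "Week 4",    "AVAL": 118.0},
    {"AVISITN":  2, "AVISIT": "Week 8",    "AVAL": 116.0},
]

# derive the phantom baseline record as the average of screening and day 0
pre = [r["AVAL"] for r in advs if r["AVISIT"] in ("Screening", "Day 0")]
baseline = {"AVISITN": 0, "AVISIT": "Baseline",
            "AVAL": sum(pre) / len(pre),
            "DTYPE": "AVERAGE", "ABLFL": "Y"}
advs.append(baseline)

# CHG is derived for post-baseline records only; the contributing
# pre-baseline records stay in the dataset for traceability
for r in advs:
    if r["AVISITN"] > 0:
        r["CHG"] = r["AVAL"] - baseline["AVAL"]
```

Keeping the screening and day-0 records alongside the derived baseline is what preserves the traceability the bullets above describe.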

 

Key takeaways

By visualizing ADaM, the choice of the ADaM structure is guided: the programmer can select the proper dataset structure, check the need for specific variables (e.g., population flags, treatment variables), and check the presence of the data required in the analysis (e.g., collected/derived parameters, hierarchical variables).

The visual approach is highly beneficial to ADaM newcomers, streamlining ADaM specification, programming, and standard output production, and it frees up time to focus on non-standard outputs, which are usually more challenging than standard ones.

Internally, this is another step to improve the Cytel automation tools suite (Lighthouse, ALPS, PRISM) and to move toward a more efficient process.

“Sharing is caring” — I feel this motto well captures my feeling when presenting at conferences. It is always a great experience: sharing what we implemented or how we overcame common challenges allows a good discussion with the attendees.

 

Interested in learning more?

Download Angelo Tinazzi’s new ebook, “The Good Data Submission Doctor on Data Submission and Data Integration to the FDA”: