Dataset-JSON and CORE: Unlocking Adoption Through Practical Implementation


June 18, 2026

Regulatory data standards continue to evolve so that clinical trial data can be exchanged, reviewed, and validated more efficiently. Within the CDISC 360i vision, two important components of that future are Dataset-JSON and CORE, the CDISC Conformance Open Rule Engine. While each has value on its own, their combined use creates a particularly compelling opportunity: a modern, machine-readable exchange format that can directly support automated conformance checks on submission-ready data.

At Cytel, this connection was the starting point for a pragmatic adoption strategy that we presented as a poster at the CDISC Europe Interchange 2026.

Rather than treating Dataset-JSON and CORE as separate initiatives, we explored them in parallel, with the objective of understanding their synergy, identifying potential process improvements, and preparing internal teams for future regulatory use.

Why Dataset-JSON and CORE belong together

Dataset-JSON is designed as an alternative exchange format for clinical datasets, complementing the long-established SAS XPT format. Its relevance has increased following pilot work conducted by CDISC and PHUSE with the U.S. FDA, including CDER and CBER. Dataset-JSON v1.1, released in December 2024, incorporated improvements identified during pilot testing. In April 2025, the FDA issued a public notice requesting comments on the adoption of Dataset-JSON v1.1 as an exchange standard for regulatory submissions, and Dataset-JSON is now included in the CBER-CDER Data Standards Program Action Plan.

CORE is the other side of the equation. It provides an open framework for executing conformance rules, helping organizations move toward more transparent, repeatable, and automated validation. When paired with Dataset-JSON, CORE can support continuous validation throughout the study lifecycle, not only at final submission.

Building from available open-source tools

The work began with the assessment of tools developed through the CDISC Open-Source Alliance, including outcomes from the COSA Dataset-JSON hackathons. Cytel evaluated both SAS and R implementations for generating Dataset-JSON.

On the SAS side, the team assessed macros developed by Lex Jansen for converting SAS datasets to Dataset-JSON.

On the R side, the team evaluated the Atorus {datasetjson} package, which supports JSON read/write operations and metadata assignment through setter functions.

The process included metadata preparation, Dataset-JSON writing, reading JSON back into SAS or R, dataset comparison, and Dataset-JSON validation.
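
As an illustration only, and not the SAS macros or R package evaluated above, the write, read-back, and compare steps can be sketched in Python with the standard library. The dataset content is hypothetical, and the attribute names (`columns`, `rows`, `itemGroupOID`, and so on) follow our reading of Dataset-JSON v1.1; they should be checked against the published specification before use.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical three-column dataset. Attribute names are assumed from
# Dataset-JSON v1.1 and should be verified against the specification.
dataset = {
    "datasetJSONCreationDateTime": "2026-06-18T00:00:00",
    "datasetJSONVersion": "1.1.0",
    "itemGroupOID": "IG.DM",
    "name": "DM",
    "label": "Demographics",
    "records": 2,
    "columns": [
        {"itemOID": "IT.DM.USUBJID", "name": "USUBJID",
         "label": "Unique Subject Identifier", "dataType": "string", "length": 5},
        {"itemOID": "IT.DM.AGE", "name": "AGE", "label": "Age", "dataType": "integer"},
        {"itemOID": "IT.DM.WEIGHT", "name": "WEIGHT", "label": "Weight", "dataType": "float"},
    ],
    "rows": [["S-001", 34, 70.1], ["S-002", 51, 82.35]],
}


def write_dataset(path: Path, ds: dict) -> None:
    """Serialize the dataset as UTF-8 JSON."""
    path.write_text(json.dumps(ds, ensure_ascii=False, indent=2), encoding="utf-8")


def read_dataset(path: Path) -> dict:
    """Read a dataset back from a UTF-8 JSON file."""
    return json.loads(path.read_text(encoding="utf-8"))


def compare(a: dict, b: dict) -> list[str]:
    """Return human-readable differences in column names, row count, and values."""
    diffs = []
    names_a = [c["name"] for c in a["columns"]]
    names_b = [c["name"] for c in b["columns"]]
    if names_a != names_b:
        diffs.append(f"columns differ: {names_a} vs {names_b}")
    if len(a["rows"]) != len(b["rows"]):
        diffs.append(f"row count differs: {len(a['rows'])} vs {len(b['rows'])}")
    for i, (row_a, row_b) in enumerate(zip(a["rows"], b["rows"])):
        for name, va, vb in zip(names_a, row_a, row_b):
            if va != vb:
                diffs.append(f"row {i}, {name}: {va!r} vs {vb!r}")
    return diffs


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "dm.json"
    write_dataset(path, dataset)
    round_tripped = read_dataset(path)

differences = compare(dataset, round_tripped)
print("differences:", differences)
```

An empty list of differences indicates that the round trip preserved column names, row count, and values for this small example. A production comparison would also cover column metadata such as labels, lengths, and data types.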

What the testing demonstrated

The testing phase showed that a low-risk transition to Dataset-JSON is feasible: there was no data loss during conversion, and performance scaled across multiple packages.

Some practical differences between the SAS and R implementations were identified. For example, R offers no direct conversion back to sas7bdat, and encoding was handled at the session level in SAS but at the function level in R. These types of findings are valuable because they help organizations understand where process controls, documentation, or implementation guidance may be needed.

The broader conclusion was clear: Dataset-JSON adoption does not need to be disruptive. A parallel production approach allows organizations to maintain regulatory flexibility, continue existing submission processes, and develop expertise incrementally.

Extending the work to visualization and CORE

During a second COSA virtual hackathon in 2024, focused on a Dataset-JSON Viewer, Cytel contributed an R Shiny application for interactive exploration and visualization of JSON datasets.

That viewer experience then became the foundation for extending capabilities to support CORE validation execution. The result was CORE-TO-ACT, an internal solution designed to integrate CORE execution into a more interactive and user-friendly environment.

This is an important step. Validation reports can be difficult to interpret when they are separated from the data context. By combining Dataset-JSON, a viewer, and CORE execution, we created a workflow that supports issue overview, affected record and variable highlighting, reruns on updated SDTM data, and tracking previous comments. This brings validation closer to the day-to-day review process and makes findings easier to investigate.

Lessons from CORE validation testing

We tested CORE reports using CORE v0.16.0 and compared outputs across formats and tools. In one test case, the team compared CORE reports generated from SAS XPT versus Dataset-JSON datasets. The expected result was a complete match; the observed result was approximately 90% alignment. Root-cause analysis showed that Dataset-JSON improved floating-point handling, reducing certain issues compared with SAS XPT. Other differences were linked to incorrectly encoded characters in SAS XPT. In this case, differences between outputs were not simply problems to be fixed; they also highlighted where Dataset-JSON may offer technical advantages.
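
A comparison of this kind can be sketched as a set operation over findings. The sketch below is illustrative and is not the procedure used in the test case: the `Finding` fields, the rule identifiers, and the definition of alignment (shared findings divided by all distinct findings) are assumptions, and real CORE reports would first need to be parsed into this shape.

```python
from typing import NamedTuple


class Finding(NamedTuple):
    """One reported issue; the field choice is hypothetical."""
    rule_id: str
    dataset: str
    row: int
    variable: str


def compare_findings(xpt_findings, json_findings):
    """Compare two validation runs and list the findings unique to each."""
    xpt_set, json_set = set(xpt_findings), set(json_findings)
    shared = xpt_set & json_set
    union = xpt_set | json_set
    return {
        # Assumed metric: shared findings over all distinct findings.
        "alignment": len(shared) / len(union) if union else 1.0,
        "only_in_xpt": sorted(xpt_set - json_set),
        "only_in_json": sorted(json_set - xpt_set),
    }


# Hypothetical findings from two runs on the same study data.
xpt_run = [
    Finding("RULE-A", "LB", 12, "LBSTRESN"),
    Finding("RULE-B", "AE", 3, "AETERM"),
    Finding("RULE-C", "DM", 1, "AGE"),
]
json_run = [
    Finding("RULE-B", "AE", 3, "AETERM"),
    Finding("RULE-C", "DM", 1, "AGE"),
]

result = compare_findings(xpt_run, json_run)
print(f"alignment: {result['alignment']:.0%}")
for finding in result["only_in_xpt"]:
    print("only in XPT run:", finding)
for finding in result["only_in_json"]:
    print("only in Dataset-JSON run:", finding)
```

Listing the findings unique to each run gives root-cause analysis a concrete starting point, for instance checking whether an XPT-only finding involves a numeric result or a non-ASCII character.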

A second test case compared rules available in both Pinnacle 21 Community and CORE. Overall, the tools showed generally consistent results, but differences appeared at the “rule” interpretation level. For example, a single FDA Business Rule can map to multiple CORE rules, and similar complexities may exist in Pinnacle 21 Community. This makes one-to-one comparison challenging and reinforces the need for careful interpretation when comparing validation engines.
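
One way to handle the one-to-many relationship is to roll findings up to a shared level before comparing engines. The following is a minimal sketch under stated assumptions: the mapping, the rule identifiers, and the `(rule, dataset, row)` finding shape are all hypothetical and do not reflect the actual CORE or Pinnacle 21 Community rule catalogs or report formats.

```python
from collections import defaultdict

# Hypothetical mapping: one business rule implemented by several engine rules.
BUSINESS_TO_CORE = {
    "BR-1": ["CORE-X1", "CORE-X2"],
    "BR-2": ["CORE-X3"],
}


def invert(mapping):
    """Build an engine-rule to business-rule lookup from the mapping."""
    core_to_business = defaultdict(set)
    for business_id, core_ids in mapping.items():
        for core_id in core_ids:
            core_to_business[core_id].add(business_id)
    return core_to_business


def roll_up(core_findings, core_to_business):
    """Re-express engine-level findings at business-rule level."""
    rolled = set()
    for core_id, dataset, row in core_findings:
        # Unmapped rules are kept visible rather than silently dropped.
        business_ids = core_to_business.get(core_id, {f"UNMAPPED:{core_id}"})
        for business_id in business_ids:
            rolled.add((business_id, dataset, row))
    return rolled


# Two engine rules flag the same record; both belong to business rule BR-1.
core_findings = [("CORE-X1", "AE", 3), ("CORE-X2", "AE", 3), ("CORE-X3", "DM", 1)]
# Hypothetical findings from a second engine, already at business-rule level.
other_engine = {("BR-1", "AE", 3), ("BR-2", "DM", 1)}

rolled = roll_up(core_findings, invert(BUSINESS_TO_CORE))
print("match at business-rule level:", rolled == other_engine)
```

After the roll-up, two engine rules flagging the same record count as a single business-rule finding, so the engines can be compared like for like even though their rule counts differ.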

These findings are especially important for adoption planning. Organizations should not expect immediate perfect alignment between tools, formats, and rule frameworks. Instead, they should design testing approaches that identify differences, classify root causes, and feed lessons learned back into internal guidance.

A pragmatic implementation strategy

Our approach focused on three practical goals:

  • ensuring business continuity
  • maintaining regulatory flexibility
  • building internal expertise

Business continuity is preserved by running Dataset-JSON alongside established SAS XPT workflows rather than replacing them prematurely. Regulatory flexibility is maintained by preparing for Dataset-JSON adoption while continuing to support current expectations. Internal expertise is built through hands-on testing, tool evaluation, discrepancy investigation, and dialogue with standards developers.

This strategy also supports a more mature validation model. Instead of waiting until the end of a study to perform conformance checks, Dataset-JSON and CORE together can support more continuous, machine-readable validation throughout the study lifecycle. That shift has the potential to improve quality, reduce late-cycle surprises, and make validation findings more actionable for study teams.

Looking ahead

Dataset-JSON and CORE adoption can begin now, even while standards and tools continue to mature. The most effective approach is not to wait for a single future mandate, but to build experience through controlled pilots, parallel workflows, and structured comparison against existing processes.

We plan to continue testing with selected pilot projects, maintain active dialogue with the CDISC CORE team, and use test results to support feedback, alignment, and continuous improvement. We are positioning ourselves to adopt CORE when version 1.0 becomes available, while also advancing readiness for Dataset-JSON as a regulatory exchange standard.

Together, Dataset-JSON, interactive data viewing, and CORE validation represent more than a technical change. They point toward a new way of working: one where clinical data is easier to exchange, validation is more transparent, and quality checks are integrated throughout the study lifecycle. For organizations preparing for the next generation of regulatory data standards, this combination offers a practical and future-ready path forward.


Silvia Faini

Principal Statistical Programmer

Silvia Faini is a Principal Statistical Programmer and CDISC Standards SME at Cytel, with 18 years of experience in clinical trials. She began as a Trial Statistician in an Italian CRO, later becoming a lead figure in statistical programming and CDISC implementation. She has extensive experience in CDISC submissions and medical devices. Silvia is an active member of the CDISC community, contributing to several European initiatives and the CORE Conformance Rules team since 2025.

Hugo Signol

Full Stack Developer II

Hugo Signol is Full Stack Developer II at Cytel. He contributed to the development of dataset-json-viewer, a tool for exploring and reviewing Dataset-JSON structures, and coreToAct, a Shiny application that leverages the CDISC CORE engine to validate compliance rules against study data. He has also developed and maintained several internal R packages and Shiny applications for PBS.

Sebastià Barceló

Associate Director, Statistical Programming

Sebastià Barceló is Associate Director, Statistical Programming, at Cytel in Geneva. He has more than 10 years of experience in the field of clinical research in the areas of data management, biostatistics, and statistical programming with different roles in CROs in Spain and Switzerland. Sebastià currently manages a team working on automation initiatives and tool development using multiple programming languages.

Angelo Tinazzi

Senior Director, Statistical Programming, Clinical Data Standard & Submission

Angelo Tinazzi is Senior Director, Statistical Programming, Clinical Data Standard & Submission, at Cytel. Angelo is a well-published and recognized expert in statistical programming, with over 25 years’ experience in clinical research. In particular, his core expertise lies in the application of CDISC standards across different therapeutic areas, including data submission to health authorities such as the FDA and PMDA.

As well as being an authorized CDISC instructor, Angelo is a former member of the CDISC European Committee and co-lead of the Italian-speaking CDISC User Network. Angelo is also conference co-chair for PHUSE EU Connect 2026 and conference chair for PHUSE EU Connect 2027.

Prior to joining Cytel, Angelo worked at Merck Serono, SENDO Foundation, Pharmacia & Upjohn, Simbologica SAS Quality Partner, the UK Medical Research Council, and the Institute for Pharmacological Research “Mario Negri.”
