Accelerating Database Lock Timelines Without Sacrificing Data Quality

Database Lock (DBL) is a critical milestone in the clinical trial lifecycle. A final step of clinical data management, database lock indicates the completion of data collection, cleaning, and validation, readying the data for statistical analysis. This milestone typically occurs 4–6 weeks after Last Patient Last Visit (LPLV). However, if a more challenging timeline (like 1-3 weeks) for DBL is required — due perhaps to expedited regulatory submissions and pressing business or scientific requirements — it creates a high-pressure scenario for all stakeholders.

As the gatekeeper of data integrity and quality in clinical trials, the Clinical Data Manager (CDM) plays an important role in ensuring DBL is achieved on time without sacrificing data quality.

Here, I share best practices for achieving accelerated database lock timelines.

 

Successful database lock depends on early planning

To ensure the database lock is successful, meticulous planning and key stakeholder involvement are vital from the start. Key stakeholders may include Clinical Data Managers (CDM), Clinical Research Associates (CRA), Medical Monitors, Site Staff (Investigators, Coordinators), and Biostatisticians.

 

Stakeholder involvement

Different domains have different perspectives when looking at data, although we share the same goal. Since the biostatisticians are the ones who process and analyze the data, it is important to involve them early on so that our perspectives are aligned.

 

Stakeholder expectations

It’s important to align expectations, responsibilities, and timelines with all key stakeholders early in the planning process, ensuring all parties are on the same page. This will help to identify potential risks, evaluate the likelihood and impact of risks to determine their severity, and allow for contingency planning.

 

Accelerate database lock with continuous data cleaning

Adopting a strategy of continuous data cleaning throughout the trial significantly accelerates DBL. This involves performing regular, structured data review and correction of accumulated trial data.

 

Locking data in groups

Locking data periodically throughout the trial reduces the volume of data that needs to be finalized at the end. Data must first be grouped for locking, and verification and cleaning of each group must be completed before it can be locked.

 

Timelines for each group of data locking

Collaborate with stakeholders on how data could be grouped together for locking and agree on realistic timelines for each group, having specific needs of each stakeholder in mind as some tasks are dependent on one another. These timelines could include last data entry, last query sent, last query resolved, investigator sign-off, and lock date.

For example, a grouping strategy includes:

  1. Looking at the participant recruitment plan
  2. Identifying the number of participants expected to complete the last visit or specific visit in a certain number of months
  3. Grouping these data together to form a batch
  4. Defining timelines for data cleaning activities before performing a lock on these data

When pre-defining timelines, it is important to take into account source data verification (SDV) intervals and what is realistically feasible, so that data rarely needs to be unlocked after the lock. For example, if the locking group contains a substantial volume of data, then the timeline for each activity typically needs to be longer. Aiming for smaller volumes of data closer to LPLV is essential to shorten data cleaning turnaround time.
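The grouping steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the calendar-based batching rule, and the milestone offsets (in days after the batch's latest expected last visit) are hypothetical and would need to be agreed with stakeholders for a real study.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class LockBatch:
    name: str
    participants: list
    last_visit: date  # latest expected last visit within the batch


# Hypothetical offsets, in days after the batch's latest expected last visit.
MILESTONE_OFFSETS = {
    "last data entry": 5,
    "last query sent": 10,
    "last query resolved": 17,
    "investigator sign-off": 21,
    "lock date": 24,
}


def group_into_batches(expected_last_visits, months_per_batch=3):
    """Group participants whose last visit falls in the same calendar block."""
    buckets = {}
    for pid, visit in expected_last_visits.items():
        key = (visit.year, (visit.month - 1) // months_per_batch)
        buckets.setdefault(key, []).append(pid)
    batches = []
    for i, key in enumerate(sorted(buckets), start=1):
        pids = sorted(buckets[key])
        latest = max(expected_last_visits[p] for p in pids)
        batches.append(LockBatch(f"Batch {i}", pids, latest))
    return batches


def batch_timeline(batch, offsets=MILESTONE_OFFSETS):
    """Derive milestone dates for one batch from its latest expected visit."""
    return {name: batch.last_visit + timedelta(days=d) for name, d in offsets.items()}


if __name__ == "__main__":
    visits = {
        "P001": date(2025, 1, 15),
        "P002": date(2025, 2, 20),
        "P003": date(2025, 5, 3),
    }
    for batch in group_into_batches(visits):
        print(batch.name, batch.participants, batch_timeline(batch)["lock date"])
```

Larger batches would normally get longer offsets, and batches close to LPLV shorter ones, in line with the considerations above.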

 

Identify issues early with clinical data manager oversight

The CDM should closely monitor data and perform trend analyses to detect common data entry discrepancies, lagging query resolutions, unexpectedly high open queries, or pending SDVs, and alert stakeholders to address the issues promptly. This helps to identify issues early and optimize data quality, which minimizes costly delays. The CDM should also monitor the unlocking rates of previously locked data. If the unlocking rate is high, consider revising the data locking plan with more realistic timelines.
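As a rough illustration of this kind of oversight, the sketch below computes an unlocking rate and flags sites with unusually many open queries. The function names and the "twice the median" flagging rule are assumptions made for the example, not an established standard; real thresholds belong in the data locking plan.

```python
from statistics import median


def unlock_rate(locked_forms, unlocked_forms):
    """Share of previously locked forms that later had to be unlocked."""
    if locked_forms == 0:
        return 0.0
    return unlocked_forms / locked_forms


def sites_with_high_open_queries(open_queries, factor=2.0):
    """Flag sites whose open query count exceeds `factor` times the median.

    `open_queries` maps a site ID to its current number of open queries.
    The factor is a hypothetical rule of thumb.
    """
    cutoff = factor * median(open_queries.values())
    return sorted(site for site, n in open_queries.items() if n > cutoff)


if __name__ == "__main__":
    print(unlock_rate(locked_forms=200, unlocked_forms=30))
    print(sites_with_high_open_queries({"S01": 4, "S02": 5, "S03": 6, "S04": 20}))
```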

 

Delay in Investigator sign-off

Investigator sign-off of Electronic Case Report Forms (eCRFs) is a foundational regulatory requirement serving as documented evidence for the accuracy, completeness, and integrity of the data submitted. It is frequently delayed due to a combination of high investigator workloads, technical complexities, or cumbersome processes. Early discussion on this critical milestone and including the timeline in the Data Locking Plan contribute significantly to expedited DBL.

 

Avoid bottlenecks caused by data outside electronic data capture

External data that are not part of electronic data capture (EDC) often become the bottleneck in DBL due to complexity and time-intensive processes. Early proactive discussions with vendors regarding timelines for data delivery are critical to avoid jeopardizing an accelerated DBL timeline.

 

Final takeaways

Adopting a continuous data cleaning approach is essential for organizations aiming to shorten the timeline between LPLV and DBL. With strong attention to planning, timelines, and ongoing stakeholder engagement, DBL can be achieved on an accelerated timeline.

Insights From Our Work with CDISC Standards: A Preview of Cytel’s Contributions to the 2026 CDISC + TMF EU Interchange

A year ago, I stepped down from the CDISC EU Committee and, guess what? Just a few weeks later, CDISC chose Milan, my hometown, as the next destination for the CDISC + TMF EU Interchange.

May 20–21 is fast approaching, so be sure to check the agenda if you haven’t already registered. You may immediately notice a different style compared to most conferences, and that’s the influence of a city known for beauty and design (and I promise I’m not biased). But have you ever seen conference tracks with such artistic titles? “AI Espresso Shot,” “TMF Standards and Governance — Allegoria del Buon Governo,” “Uno Standard Da Mangiare (USDM),” “Protocolli alla Moda: The M11-USDM Collection,” just to name a few.

My Cytel colleagues and I will give two presentations and one poster, sharing insights from our work with CDISC standards, including Dataset-JSON and CORE.

 

JSON and CORE Unlocking Adoption

Silvia Faini (co-authors: Hugo Signol, Sebastia Barcelo, Angelo Tinazzi)

Wednesday, May 20, 12:30-13:30 – Poster Session

In this poster, we share our experience working with both CDISC Dataset-JSON and CORE. Using several anonymized studies, we assessed available tools, in both SAS and R, for creating and importing Dataset-JSON files. We highlight key challenges and risks, and compare CDISC CORE outputs with those of tools such as Pinnacle21 (for SDTM only).

 

Authenticity Matters: Preserving Standards Integrity from Clinical Data Models to Tiramisù

Angelo Tinazzi

Thursday, May 21, 12:00-12:30 – Session 6C: L’Architettura degli Standard

What started as a joke, trying to “cheat” my Belgian friends who always complain about the original alcohol-free Tiramisu recipe, evolved into a serious internal project. We analyzed metadata from more than 300 anonymized SDTM packages (with ADaM to follow), spanning multiple versions, therapeutic areas, and trial phases. Using these metadata, we explored how SDTM implementation and adherence to regulatory expectations, particularly FDA requirements, have evolved over time, assessing quality and consistency through quantitative metrics.


It Got Worse Than Expected: Three Years of Retrospective CBER Requests on SDTM, ADaM, and TFLs

Mark Malayas (co-author: Angelo Tinazzi)

Thursday, May 21, 14:30-15:00 – Session 7: Regulatory Eccellenza

In 2023, we presented at PHUSE-EU Connect our initial experience with an FDA CBER vaccine submission, following some initial interactions with the FDA. We shared our concerns about requests that, in many cases, required retrospective changes to already concluded studies. But that was not all! Three years later, the situation has evolved further, with increasing and often unexpected requests from the agency. Curious to learn more? Join Mark on Thursday.

 

Silvia Faini, CDISC E3C Vice-Chair, will also be moderating Session 6C: L’Architettura degli Standard, Thursday, May 21 from 11:00–12:30.

 

Meet us there

Cytel will also have a booth at the conference! Stop by to meet our presenters as well as our Business Development colleagues.

We look forward to reconnecting with colleagues from around the world, meeting new peers, and exchanging ideas at CDISC + TMF EU Interchange 2026.

We hope to see you in Milan!

Embedding R into GxP-Compliant Statistical Computing Environments

Biotech and mid-sized pharmaceutical companies are increasingly modernizing their statistical computing environments (SCEs) to keep pace with growing data complexity, advanced analytics, and evolving regulatory expectations. Open-source languages such as R offer clear advantages in flexibility and innovation. However, in GxP-compliant settings, adoption introduces challenges that go far beyond technology itself.

Much of the discussion around R focuses on its capabilities. In practice, the real challenge lies in operationalizing it within a compliant ecosystem — where validation, governance, and reproducibility become critical.

This article explores these challenges from a practical perspective and outlines how organizations are addressing them.

 

The real barrier: GxP complexity

Adopting R is not the primary hurdle; embedding it into a GxP-compliant environment is. This requires:

  • Validation of open-source packages
  • Governance and auditability
  • Reproducibility and traceability
  • Ongoing lifecycle management

For organizations without established frameworks, these requirements can introduce significant overhead, often slowing innovation rather than accelerating it.

 

Why mid-sized organizations are disproportionately impacted

Mid-sized biotech and pharmaceutical companies face a structural challenge. While regulatory expectations are the same as for large pharma, available resources are not.

Smaller teams must manage validation, infrastructure, and delivery simultaneously, often without dedicated support functions. As a result, system complexity scales faster than internal capacity, directly impacting timelines and limiting the ability to innovate.

 

Different starting points, different challenges

In practice, organizations face different realities depending on their level of SCE maturity:

  • Some lack the infrastructure to support GxP-compliant open-source environments
  • Others have established systems but face integration challenges with external partners
  • A third group is transitioning toward R and multi-language workflows but lacks maturity in governance and tooling

These scenarios require flexible approaches tailored to each organization’s context.

 

Moving toward integrated, multi-language environments

To address fragmentation, many organizations are adopting polyglot SCEs, where SAS and R coexist within unified workflows.

This approach enables greater flexibility while maintaining compliance, ensuring traceability, reproducibility, and smoother collaboration across internal teams and external partners.

 

A practical path forward

Rather than building and maintaining complex infrastructure internally, many organizations are exploring CRO-based service models.

By leveraging GxP-validated environments, sponsors can access production-ready R ecosystems without the burden of developing validation frameworks or managing platform engineering. This approach supports both full outsourcing and hybrid collaboration models, while ensuring alignment with client-specific systems.

 

Final takeaways

The challenge is not adopting R — it is managing the complexity of making it compliant.

Organizations that successfully unlock its value do so by:

  • Addressing GxP requirements early and systematically
  • Adapting approaches to their level of SCE maturity
  • Leveraging integrated, multi-language workflows
  • Exploring service-based models to accelerate adoption

With the right strategy, R becomes not a source of complexity, but a powerful enabler of innovation in clinical development.

 

Interested in learning more?

Join our upcoming webinar, “Navigating GxP Complexity: Unlocking the Value of R,” where we will share practical experience from Cytel’s polyglot SCE, including validation approaches, governance models, and operational best practices.

Register now to learn how to modernize your statistical computing environment — without adding unnecessary complexity.

Why “More Data” Isn’t Helping You Run Better Trials

Clinical Operations teams are being asked to let go of traditional approaches and do more than ever before:

Deliver more complex trials, faster — with fewer resources — and higher confidence in outcomes.

And how has the industry responded?

With a proliferation of data access, tools, and dashboards.  But does a dashboard really help navigate complexity with speed and well-managed risk?  No.

Let’s discuss the methods and tools that help turn this complexity into clarity.

 

The problem isn’t just complexity — It’s information overload

Clinical trials have changed dramatically:

  • 7x increase in data points
  • 4x increase in data sources
  • Increasing reliance on external data, RWE, and predictive modeling

Yet often you’re still expected to manage across multiple systems, in spreadsheets and trackers: CTMS, EDC, RBQM dashboards, query reports, enrollment trackers, deviation logs, and monitoring reports.

None of these disparate sources of information tell the whole story, and every critical study execution decision you make is plagued with data gaps, inconsistencies or discrepancies, and latency issues.

How then can we consolidate and automate our use of the data to make timely decisions that we trust? There are certainly technology stacks that large organizations license and deploy. But what happens when you can’t afford them? You partner with a data management and biometrics specialty provider who understands what you are up against, knows the data and what it takes to deliver a study successfully, and offers solutions that help heads of clinical operations gain control at a price they can afford.

Tools that actually make a difference offer:

  • Actionable insights, not static reports
  • Continuous visibility, not retrospective analysis
  • Aligned teams, not handoffs

 

Central statistical monitoring: Detecting emerging risks early

Early intervention is key to managing trial risks and ensuring reliable results. As clinical trials grow in complexity, data quality and patient safety can no longer be ensured through system reports alone. And with evolving regulatory expectations, trial budget pressures, and the need for earlier, more objective insights into emerging risks, central statistical monitoring (CSM) has become a critical component of modern trial oversight.

Tools, such as Cytel’s Cytelytics, can leverage statistics to identify trends, detect risks, and optimize source data verification efforts.

Regulatory agencies now treat audit trail data with the same level of scrutiny as clinical data and expect proactive, ongoing audit trail reviews. Manual or outdated approaches are time sinks and introduce risk you can’t afford. Tools like Cytel’s Audit Detective enhance compliance and data integrity by identifying inconsistencies, unauthorized access, and unusual activity patterns in audit trails.

 

Better data visualization: Driving decisions, not just reporting

Traditional reports tell you what happened. Modern visualization:

  • Links operational metrics to clinical outcomes
  • Allows drill-down from summary to patient level
  • Highlights where intervention changes the outcome

Tools like Cytel’s ClinCytesDV provide interactive graphs, tables, and listings, layering data together to tell a richer story.

 

Data management: Operating environments that drive speed and quality

Data ingestion, cleaning, reconciliation, and reporting should not operate in lock step. A modern approach:

  • Automates data ingestion across sources (EDC, RWD, wearables)
  • Standardizes data structures (CDISC, OMOP, FHIR)
  • Enables real-time cleaning and review

The result is better data processing, reduced site burden, faster lock — and less firefighting. This is the difference between oversight and control.

 

Final takeaways

The answer isn’t more dashboards, systems, or data, but rather the methods and tools that result in fewer reconciliations across systems, earlier visibility into risks, faster decisions with higher confidence, and ultimately, that allow you to spend less time managing the process — and more time managing the study.

Building External Control Arms in Rare Disease Clinical Trials: A Programmer’s Perspective

External Control Arms (ECAs) are gaining a lot of attention in clinical research, particularly in rare diseases, where traditional randomized trials are often difficult to execute. Much of the discussion focuses on the statistical methodology and study design required to identify appropriate populations and data sources. But in practice, one of the biggest challenges lies in the programming effort, which is equally critical, but often more complex than anticipated.

Given that ECAs are still an evolving area, formal regulatory and industry guidance remains relatively limited. However, available publications are beginning to address key considerations. For example, the FDA’s Data Standards for Drug and Biological Product Submissions Containing Real-World Data (2024) provides recommendations on preparing and submitting RWD-derived datasets, while highlighting challenges in standardization and traceability. In parallel, industry initiatives such as the PHUSE white paper on Data Standards for Non-Interventional Studies outline common data standardisation challenges and practical approaches to address them. In addition, dedicated working groups within PHUSE are actively contributing to the development of best practices for ECAs.

This article focuses on the practical challenges from a programming perspective, drawing on recent case study experience.

 

Working with real-world and heterogeneous data

From a programming perspective, ECAs differ significantly from traditional clinical trials. Instead of working with well-structured datasets collected under controlled protocols, programmers are required to integrate data from multiple sources, including Real-World Data (RWD), historical trials, observational studies, and natural history cohorts. Each source brings its own structure, conventions, and limitations, often with poor documentation.

In one case study, external control data was derived from two independent natural history cohorts across different regions. While both sources represented similar patient populations, differences in baseline definitions, visit schedules, and outcome assessments required careful reconciliation.

The programming team aligned key covariates, including baseline age, genetic subtype, and functional scores, to support comparability with the treated trial population. This went far beyond standard data mapping and required informed decisions to standardize variables that were not originally designed for cross-study integration.

 

Harmonization and data standardization

Once data sources are understood, harmonization becomes a critical step. The validity of an ECA depends on ensuring consistent definitions across baseline variables, endpoints, covariates, and visit timing.

In practice, this involves standardizing baseline windows, assessment schedules, coding dictionaries (such as MedDRA, across multiple versions), laboratory standard units, endpoint derivations, and covariates used for matching. Across the case studies, this proved to be one of the most time-intensive phases.

Even small differences required careful reconciliation. For example, the same functional score was recorded on different scales across studies, requiring re-derivation into a common format.
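To make the re-derivation concrete, here is a minimal sketch that maps a score from its source scale onto a common target scale. It assumes a simple linear rescaling is clinically acceptable, which is itself a harmonization decision that must be justified and documented; the function name and the example ranges are hypothetical.

```python
def rescale_score(score, source_range, target_range):
    """Linearly map `score` from `source_range` onto `target_range`.

    Both ranges are (minimum, maximum) tuples. Out-of-range inputs raise an
    error so that data issues surface instead of being silently rescaled.
    """
    src_lo, src_hi = source_range
    tgt_lo, tgt_hi = target_range
    if src_hi == src_lo:
        raise ValueError("source range must have non-zero width")
    if not src_lo <= score <= src_hi:
        raise ValueError(f"score {score} outside source range {source_range}")
    fraction = (score - src_lo) / (src_hi - src_lo)
    return tgt_lo + fraction * (tgt_hi - tgt_lo)


if __name__ == "__main__":
    # Hypothetical: one study records the score on 0-34, another on 1-5.
    print(rescale_score(17, (0, 34), (0, 100)))
    print(rescale_score(3, (1, 5), (0, 100)))
```

Where the scales differ in more than range (for example, different item sets or scoring direction), a linear map is not appropriate and the derivation needs clinical input.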

If not addressed early, these inconsistencies can significantly impact downstream analyses, including propensity score modelling and bias estimation. Early and systematic harmonization is therefore essential to ensure consistency and minimize rework.

 

CDISC alignment, missing data, and analytical complexity

For studies intended for regulatory submission, alignment with CDISC standards (SDTM and ADaM) is essential. However, external datasets are rarely structured with these standards in mind, requiring substantial programming effort during transformation.

In another case study, SDTM datasets pooled from multiple studies were used as the source. However, inconsistencies in specifications and differences in SDTM Implementation Guide versions across studies created challenges in standardization and traceability during ADaM specifications development. Key variables including demographics and baseline characteristics such as age, sex, education, genotype, and clinical scores had to be consistently derived and validated across studies. Maintaining traceability was critical, with define.xml playing a key role in documenting transformations and assumptions.

At the same time, missing and inconsistent data remain inherent challenges. In the natural history cohort example, gaps in timepoints and patient coverage limited direct comparability with the treated trial arm. Programmers addressed this by defining analysis windows and deriving aligned time variables, enabling more meaningful longitudinal comparisons. However, such adjustments introduce assumptions that must be clearly justified and documented in the specifications and the Reviewer’s Guide.
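The windowing idea can be sketched as follows. The window labels and boundaries are hypothetical, and time is expressed as simple days since baseline (day 0 = baseline) rather than a CDISC study day; a real derivation would follow the windows defined in the analysis specifications.

```python
from datetime import date

# Hypothetical windows: label -> (target day, lower bound, upper bound),
# all in days since baseline.
WINDOWS = {
    "Baseline": (0, -30, 14),
    "Month 6": (183, 92, 274),
    "Month 12": (365, 275, 456),
}


def days_from_baseline(assessment, baseline):
    """Aligned time variable: days elapsed since the baseline date."""
    return (assessment - baseline).days


def assign_window(day, windows=WINDOWS):
    """Return the label of the window containing `day`, or None."""
    for label, (_target, lower, upper) in windows.items():
        if lower <= day <= upper:
            return label
    return None


def closest_per_window(records, windows=WINDOWS):
    """Keep, per window, the (day, value) record closest to the target day."""
    best = {}
    for day, value in records:
        label = assign_window(day, windows)
        if label is None:
            continue  # assessment falls outside every window
        distance = abs(day - windows[label][0])
        if label not in best or distance < best[label][0]:
            best[label] = (distance, day, value)
    return {label: (day, value) for label, (_dist, day, value) in best.items()}


if __name__ == "__main__":
    day = days_from_baseline(date(2020, 7, 10), date(2020, 1, 1))
    print(day, assign_window(day))
    print(closest_per_window([(170, 40.0), (191, 42.0), (380, 45.0), (500, 50.0)]))
```

The tie-breaking rule (closest to target) is one common choice among several; whichever rule is used is exactly the kind of assumption that needs to be documented.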

ECA analyses also rely heavily on advanced statistical techniques, including propensity score matching, weighting, and longitudinal modelling. These methods can be computationally intensive, particularly when working with multiple heterogeneous datasets. In one case study, certain models required several hours to run for a single output, directly impacting timelines for quality control and iterative revisions.
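For readers less familiar with propensity score matching, the sketch below shows one simple variant: greedy 1:1 nearest-neighbour matching without replacement, within a caliper. It assumes the propensity scores have already been estimated elsewhere; the function name and the caliper value are illustrative, and this is not presented as the method used in the case studies.

```python
def greedy_match(treated, controls, caliper=0.05):
    """Greedy 1:1 nearest-neighbour matching on propensity scores.

    `treated` and `controls` map subject IDs to estimated propensity scores.
    Each treated subject is paired with the closest unused external control,
    provided the score difference is within `caliper`.
    """
    available = dict(controls)
    pairs = []
    for t_id, t_score in sorted(treated.items(), key=lambda item: item[1]):
        if not available:
            break
        c_id = min(available, key=lambda c: abs(available[c] - t_score))
        if abs(available[c_id] - t_score) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]  # matching without replacement
    return pairs


if __name__ == "__main__":
    treated = {"T1": 0.30, "T2": 0.62, "T3": 0.90}
    controls = {"C1": 0.28, "C2": 0.33, "C3": 0.60, "C4": 0.50}
    print(greedy_match(treated, controls))
```

In this toy example T3 stays unmatched because no control falls within the caliper, which illustrates why matched sample sizes in rare disease settings can shrink quickly.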

As a result, programmers must optimize code for long-running processes, manage runtime constraints, and ensure reproducibility across environments. For example, when generating figures based on many simulations (e.g., 500,000 iterations), a single output could require several hours of execution time. To improve efficiency, figure generation was separated into independent programs rather than being combined within a single workflow, which significantly reduced total runtime. Similarly, validation procedures for computationally intensive simulations were performed in a staged manner, starting with smaller sample sizes and progressively increasing to the full scale, allowing for earlier detection of discrepancies, while minimizing unnecessary computational cost. In addition, parallel execution strategies were employed, with multiple programmers running processes concurrently, further reducing overall turnaround time.
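The staged validation approach can be illustrated with a toy example. Here two independently written versions of the same simulation are compared at increasing iteration counts using a shared seed, so a discrepancy shows up at the cheapest stage. The simulation itself (estimating a 30% event probability), the stage sizes, and all function names are placeholders invented for the sketch.

```python
import random


def production_estimate(n_iter, seed):
    """'Production' version: loop-based simulation of an event proportion."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_iter):
        if rng.random() < 0.3:
            hits += 1
    return hits / n_iter


def qc_estimate(n_iter, seed):
    """Independent 'QC' version of the same simulation."""
    rng = random.Random(seed)
    draws = [rng.random() for _ in range(n_iter)]
    return sum(d < 0.3 for d in draws) / n_iter


def staged_validation(prod_fn, qc_fn, stages=(1_000, 10_000, 100_000),
                      seed=2024, tol=1e-12):
    """Compare both versions stage by stage.

    Returns the first stage size at which results diverge, or None if every
    stage agrees within `tol`.
    """
    for n_iter in stages:
        if abs(prod_fn(n_iter, seed) - qc_fn(n_iter, seed)) > tol:
            return n_iter
    return None


if __name__ == "__main__":
    print(staged_validation(production_estimate, qc_estimate))
```

Only once the small stages agree is it worth spending hours of compute on the full-scale run.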

Furthermore, the inherent uncertainty in external data typically necessitates multiple sensitivity analyses, requiring flexible and efficient programming workflows.

 

Operational constraints and regulatory expectations

Beyond technical challenges, ECAs introduce operational complexities. External datasets are often subject to strict privacy and governance requirements, with analyses conducted in secure or third-party environments. These constraints can limit direct data access, slow iteration cycles, and introduce additional layers of review and approval.

Programmers must therefore adapt to restricted computing environments, limited data visibility, and evolving access rules, all of which require careful planning to maintain timelines.

At the same time, regulatory expectations remain high. While agencies are increasingly open to ECAs, they require strong evidence of data quality, bias mitigation, and endpoint consistency. From a programming perspective, this places significant emphasis on transparency and documentation.

All transformations and analytical decisions must be fully traceable and clearly justified, including mapping approaches, imputation methods, endpoint derivations, harmonization decisions, and sensitivity analyses. Well-structured documentation is therefore as critical as the datasets themselves in supporting reproducibility and regulatory review.

 

Final takeaways

The development of ECAs extends far beyond data integration. It requires a structured and methodical programming approach to ensure consistency, traceability, and regulatory readiness.

The case studies highlight that successful ECA implementation depends not only on methodological rigor but also on the quality of data preparation and standardization. Early harmonization, robust documentation, and flexible programming frameworks are essential to delivering reliable and submission-ready results.

As ECAs continue to gain traction, programming plays a central role in bridging diverse data sources and generating credible evidence for regulatory decision-making. Despite the availability of industry white papers and broader guidance on observational data standardization, dedicated standards and detailed guidance specific to ECAs remain limited, highlighting the need for continued collaboration and development in this area.

 

Interested in learning more?

Join Gautham Selvaraj, Ralf Koelbach, and Steven Ting for their upcoming webinar, “Implementing External Control Arms in a Rare Disease Case Study,” on April 30 at 10 am ET, where they will offer practical insights and experience-based strategies for implementing ECAs with real-world data.

Central Statistical Monitoring: Transforming Clinical Trial Oversight Through Data Intelligence

As clinical trials grow in complexity — spanning more geographies, more data streams, and more endpoints — the traditional model of on-site monitoring alone is no longer sufficient to ensure data quality and patient safety. Regulatory expectations have evolved, trial budgets are under pressure, and sponsors need earlier, more objective insights into emerging risks.

Central Statistical Monitoring (CSM) sits at the intersection of these demands.

At Cytel, we see first-hand how sponsors are rethinking monitoring strategies to be more risk-based, data-driven, and efficient. Here, we introduce the foundations of CSM, how it supports Risk-Based Quality Management (RBQM), and why it has become a critical component of modern trial oversight.

 

What is Central Statistical Monitoring?

Central Statistical Monitoring can be defined as the statistical detection of anomalies in accumulating clinical trial data to identify sites, patients, or countries that are performing differently from the rest. These differences may signal issues related to data quality, site conduct, or even patient safety.

The origins of CSM can be traced to early work on fraud detection in clinical trials. However, fraud is rare and represents only a small part of the picture. In practice, most CSM findings relate to more common and impactful issues such as errors, sloppiness, or data-handling inconsistencies.

The key principle is straightforward: when most sites are performing consistently, statistically unusual patterns may indicate that something warrants a closer look.

Rather than relying solely on Source Data Verification (SDV) or manual review, CSM uses statistical techniques to evaluate patterns within and across sites — often detecting issues that traditional monitoring approaches would miss.

 

Beyond KRIs and QTLs: What makes CSM different?

Central Monitoring typically includes three types of analyses:

• Key Risk Indicators (KRIs): site-level metrics such as adverse event rates or protocol deviations
• Quality Tolerance Limits (QTLs): study-level thresholds for critical KRIs
• Central Statistical Monitoring (CSM): advanced anomaly detection across high-volume data
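
To make the first two concrete, the sketch below computes a site-level KRI (adverse events per patient-year) and checks a study-level QTL. The flagging rule (a site reporting at less than half the pooled study rate), the 5% limit, and the function names are illustrative assumptions only; actual indicators and limits are defined per study.

```python
def ae_rate(ae_count, patient_years):
    """KRI: adverse events per patient-year at one site."""
    return ae_count / patient_years if patient_years > 0 else None


def flag_low_ae_reporters(site_data, fraction=0.5):
    """Flag sites whose AE rate is below `fraction` of the pooled study rate.

    `site_data` maps a site ID to (ae_count, patient_years).
    """
    total_ae = sum(ae for ae, _ in site_data.values())
    total_py = sum(py for _, py in site_data.values())
    study_rate = total_ae / total_py
    return sorted(
        site
        for site, (ae, py) in site_data.items()
        if py > 0 and ae / py < fraction * study_rate
    )


def qtl_breached(n_with_event, n_total, limit):
    """QTL: does the study-level proportion exceed the predefined limit?"""
    return n_with_event / n_total > limit


if __name__ == "__main__":
    sites = {"S01": (30, 10.0), "S02": (28, 10.0), "S03": (4, 10.0)}
    print(flag_low_ae_reporters(sites))
    # Hypothetical QTL: at most 5% of participants with a major deviation.
    print(qtl_breached(n_with_event=12, n_total=200, limit=0.05))
```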

While KRIs and QTLs focus on predefined metrics, CSM goes further by applying broad statistical tests across many variables — often using unsupervised approaches that are now considered the industry gold standard.

These methods may involve single-variable comparisons (such as means, variability, proportions, rates, digit distributions) as well as multivariate techniques that evaluate patterns across multiple variables simultaneously. The result is a structured framework for identifying outliers in a reproducible, objective way.
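Two of the single-variable comparisons mentioned above can be sketched with the Python standard library. The first flags sites whose mean differs from the other sites using a robust (median/MAD-based) modified z-score with the conventional 3.5 cutoff; the second computes a chi-square statistic for terminal digit preference in integer-valued readings. These are generic textbook checks chosen for illustration, not a description of any particular CSM product.

```python
from collections import Counter
from statistics import mean, median


def flag_outlying_sites(site_values, cutoff=3.5):
    """Flag sites whose mean is unusual relative to the other sites.

    `site_values` maps a site ID to a list of measurements. The modified
    z-score is 0.6745 * (site mean - median) / MAD across site means.
    """
    site_means = {site: mean(values) for site, values in site_values.items()}
    centre = median(site_means.values())
    mad = median(abs(m - centre) for m in site_means.values())
    if mad == 0:
        raise ValueError("MAD is zero; site means are too similar to score")
    scores = {s: 0.6745 * (m - centre) / mad for s, m in site_means.items()}
    return sorted(s for s, z in scores.items() if abs(z) > cutoff)


def terminal_digit_chi2(values):
    """Chi-square statistic for uniformity of the last digit (9 df).

    Values above roughly 16.92 are significant at the 5% level.
    """
    counts = Counter(int(v) % 10 for v in values)
    expected = len(values) / 10
    return sum((counts.get(d, 0) - expected) ** 2 / expected for d in range(10))


if __name__ == "__main__":
    systolic = {
        "S01": [120, 122, 118],
        "S02": [121, 119, 123],
        "S03": [119, 121, 120],
        "S04": [140, 142, 138],
        "S05": [122, 120, 121],
    }
    print(flag_outlying_sites(systolic))
    print(terminal_digit_chi2([120, 130, 140, 150, 160, 170, 180, 190, 110, 100]))
```

A flagged site is a prompt for review, not a finding: a legitimately different patient population can produce the same signal as a data quality problem.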

 

Why does CSM matter now?

Over the past two decades, regulatory authorities have progressively endorsed risk-based and centralized monitoring approaches. FDA, EMA, and MHRA guidance has emphasized the importance of risk-based monitoring, culminating in ICH E6(R2) and, most recently, ICH E6(R3), which reinforce the role of centralized monitoring in identifying systemic and site-specific issues.

This regulatory evolution reflects a broader shift toward:

• Quality by Design (QbD)
• Identification of critical-to-quality factors
• Ongoing risk assessment
• Adaptive monitoring strategies

Within a Risk-Based Monitoring (RBM) framework, CSM complements KRIs and QTLs to provide a comprehensive view of trial risk. Insights from CSM can guide targeted on-site or remote monitoring, ensuring that resources are focused where they will have the greatest impact.

This approach aligns closely with the Clinical Trials Transformation Initiative’s definition of quality in clinical trials as the “absence of errors that matter to decision making — that is, errors which have a meaningful impact on the safety of trial participants or the credibility of the results.” By identifying anomalies early — before they escalate into systemic issues — CSM helps safeguard critical-to-quality factors.

For sponsors, the benefits are multifaceted:

• More efficient allocation of monitoring resources
• Potential reduction in unnecessary SDV
• Earlier detection of emerging risks
• Increased confidence in data integrity prior to regulatory submission

In short, CSM transforms monitoring from a predominantly reactive activity into a proactive, data-driven strategy.

 

Putting CSM into practice: Operational considerations for successful implementation

Understanding the statistical foundations of CSM is important — but translating that understanding into a well-functioning program requires deliberate operational planning. The following considerations provide a practical framework for teams preparing to implement CSM within a clinical trial.

 

Upfront preparation and governance

A formal CSM kickoff meeting — convened before any analyses begin — is one of the most valuable investments a team can make. This meeting should bring together representatives from biostatistics, data management, clinical operations, medical monitoring, and quality. The goal is to establish shared alignment on the objectives and scope of the CSM program, agree on which critical-to-quality (CtQ) factors will anchor the monitoring strategy, define escalation pathways for signals requiring action, and confirm documentation standards. Equally important is reaching consensus on how CSM integrates within the broader RBQM framework — clarifying how statistical signals will interact with KRI outputs, SDV decisions, and site risk classifications. Without this governance foundation, even technically sound CSM outputs can struggle to gain traction in day-to-day operations.

 

Determining frequency of analyses

The frequency with which CSM analyses are generated should be proportionate to study risk and dynamics. Key factors to consider include the rate of enrollment, total subject count, number of active sites, and overall study duration.  Trials with rapid, multi-site enrollment may benefit from more frequent reviews — bi-monthly — to catch emerging patterns before they compound. Slower-enrolling or smaller studies may reasonably support longer intervals between analyses without compromising oversight. Critically, frequency should not be treated as fixed. As study conditions evolve — sites activate or go on hold, enrollment accelerates, or a new safety signal emerges — the CSM schedule should be revisited. Building in flexibility from the outset ensures the program remains responsive rather than formulaic.

 

Communication and cross-functional review

CSM outputs are most actionable when they are presented in a structured, interpretable format — combining risk scores or site rankings with narrative interpretation that contextualizes what the statistics show and why it may matter. Findings should be reviewed collaboratively with the wider cross-functional team including Clinical Operations and Clinical Science, whose site-level and medical knowledge is indispensable for determining whether a statistical outlier reflects a genuine quality concern or a legitimate difference. A statistical signal is a prompt for investigation, not a conclusion. The review process should follow a clear feedback loop: identify the signal, evaluate it in context, decide on a response (monitor, query, or escalate), and document the rationale. This structured approach ensures accountability and creates an audit trail that supports both ongoing oversight and regulatory inspection readiness.

Ultimately, CSM delivers the greatest value when it is embedded operationally — treated not as a standalone statistical exercise, but as a living input to risk-based decision-making by the clinical team. When governance, data prioritization, analysis cadence, and cross-functional communication are aligned from the outset, CSM becomes what it is designed to be: an early warning system that enables smarter, more targeted oversight in service of patient safety and data integrity.

 

Interested in learning more?

Join Charles Warne and William Baker for their upcoming webinar, “Advancing Trial Oversight with Central Statistical Monitoring” on April 8 at 9AM ET / 3PM CET.

Central Statistical Monitoring is a practical, regulatory-aligned tool that can materially strengthen trial oversight and quality management.

In our upcoming webinar, we will explore:

• What CSM entails
• When and how CSM adds value to clinical trials
• Operational considerations for implementing CSM services
• Case study examples of CSM in action

Whether you work in biometrics, clinical operations, quality, or regulatory affairs, this session will provide actionable insights into building a smarter, more adaptive monitoring strategy.

SDTM IG 4.0 and SDTM 3.0: Celebrating the End of SUPP?

About five years after the release of SDTM IG 3.4, CDISC has just released SDTM IG 4.0 and SDTM 3.0 for public review. Comments are due April 6, with the final release expected later this year.

The public review includes the Conformance Rules version 3.0 as well as three draft Knowledge Base articles exploring some of the main changes expected with IG 4.0:

  • NS-- Datasets: Why they were built as they were.
  • Why change the structure of SDTMIG metadata?
  • Why does the DC domain differ from what’s described in FDA’s TCG?

For a quick overview of the impact of these changes, see the CDISC Standards timeline webpage or the revision history available in the draft version wiki for public review.

 

Celebrating or regretting the end of SUPP?

We will be moving, for example, from something called SUPPAE to something called NSAE, with a less “normalized” structure. Will this be “a small step for a man, a giant leap for mankind”? “Ai posteri l’ardua sentenza.”1

The change will require us to go from this:

to this:

The structure of these new datasets is “One record per related dataset record,” meaning that a many-to-one relationship will no longer be possible (for example, an NS record that applies to several records in the parent domain via --GRPID). That said, there is hope that this new structure will simplify metadata handling and potentially facilitate the adoption of future data exchange formats, such as CDISC Dataset-JSON.
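
To make the structural change concrete, here is a minimal sketch of collapsing normalized SUPPAE-style rows (one record per qualifier) into one wide record per parent AE record. The SUPP-- column names follow the familiar SUPPQUAL layout; the sample values, the function name `pivot_supp`, and the shape of the wide output are assumptions for illustration only, and the draft IG remains the authority on the actual NS-- variables.

```python
# Illustrative SUPPAE rows: one record per qualifier (QNAM) per parent AE record.
suppae = [
    {"USUBJID": "S1-001", "IDVAR": "AESEQ", "IDVARVAL": "1", "QNAM": "AETRTEM", "QVAL": "Y"},
    {"USUBJID": "S1-001", "IDVAR": "AESEQ", "IDVARVAL": "1", "QNAM": "AEDICTVR", "QVAL": "MedDRA 27.0"},
    {"USUBJID": "S1-001", "IDVAR": "AESEQ", "IDVARVAL": "2", "QNAM": "AETRTEM", "QVAL": "N"},
]


def pivot_supp(rows):
    """Collapse SUPP-- rows into one wide record per parent record (hypothetical layout)."""
    wide = {}
    for row in rows:
        # A qualifier linked via --GRPID applies to several parent records,
        # which a one-record-per-parent structure cannot express directly.
        if row["IDVAR"].endswith("GRPID"):
            raise ValueError("many-to-one link via --GRPID needs manual handling")
        key = (row["USUBJID"], row["IDVAR"], row["IDVARVAL"])
        record = wide.setdefault(
            key, {"USUBJID": row["USUBJID"], row["IDVAR"]: row["IDVARVAL"]}
        )
        # Each QNAM becomes its own column on the parent-level record.
        record[row["QNAM"]] = row["QVAL"]
    return list(wide.values())


nsae = pivot_supp(suppae)
```

In this sketch the --GRPID case raises an error rather than silently duplicating the qualifier across parent records; a real conversion would need a documented rule for that situation.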

 

Three new domains

Three new proposed domains have been introduced:

  • DC (Demographics for Multiple Participations)
  • GI (Gastrointestinal System Findings)
  • EA (Event Adjudication)

DC has been around, unofficially, for some time, following the requirements introduced by the FDA in its Study Data Technical Conformance Guide (see my previous blog). This domain supports the representation of multiple enrollments within the same study. Along with DC, SUBJID has been added to all subject-level domains to differentiate the data “generated” by each of a subject’s participations.

Compared with FDA requirements, SDTM IG 4.0 also covers scenarios in which the same subject is enrolled multiple times, not only multiple screenings.

Identification of “Primary Enrollment,” and therefore how DM variables are populated, is left to the sponsor’s discretion. However, in cases where a subject experiences one or more screen failures before finally enrolling, the successful enrollment should clearly be considered the primary one.

EA, a Findings About domain, provides a common structure for studies requiring independent, peer-reviewed endpoint adjudication. In my view, it partially solves the issue of representing study endpoints where more complex “adjudication” is required, for example in oncology studies with efficacy based on tumor response.

 

Changes in metadata

Several new metadata items have been introduced, along with some changes. The goal is to improve understanding of variables and their intended use, without impacting metadata included in a submission, e.g., define.xml.

So, when looking at the new SDTM IG, you will notice the following key differences among others:

  • Controlled Terms, Codelist or Format is now split into three separate columns
  • Variable Group has been added to group variables, for example Results Unit, or Results Value
  • Some information previously included in the “CDISC Notes” column is now reported in the “Examples” column

 

Other Changes

New versions of IGs are also an opportunity to fix issues (such as typos) and to clarify implementation that previously caused misunderstandings. For example, there is additional guidance on which Specimen-based Findings domain to use under specific circumstances, such as a clarification that anti-microbial antibody testing data should be mapped to the IS domain rather than MS.

Some standard variables have been deprecated, such as --BLFL (Baseline Flag) for Findings domains, and others have been added. One notable addition is the --CLASI variable, which is particularly useful for classifying Protocol Deviations to support the requirements of “ICH E3 Q&As (R1).” This variable is now officially part of the DV domain as DVCLASI, e.g., MINOR/MAJOR. More details on planned new and deprecated variables in all Observation Classes can be found in the CDISC Wiki.

Rumors about deprecating the PP domain appear to be unfounded, as PP is still there.

 

Want to know more?

You can participate in the public review and explore the details yourself. Check here.

My former colleague Varun Debbeti has also done an excellent job on his clinstandards webpage.

A more in-depth discussion of the expected changes will also be presented at the upcoming CDISC-EU Interchange in May, this time in my hometown, Milan, co-chaired by my colleague Silvia Faini.

Cytel will be present with two oral presentations and one poster:

  • “It Got Worse Than Expected: Three Years of Retrospective CBER Requests on SDTM, ADaM, and TFLs” by Mark Malayas and Angelo Tinazzi
  • “Authenticity Matters: Preserving Standards Integrity from Clinical Data Models to Tiramisù” by Angelo Tinazzi
  • “JSON and CORE Unlocking Adoption” by Silvia Faini, Sebastià Barceló, Hugo Signol, and Angelo Tinazzi

See the full draft agenda here.

We look forward to reconnecting with colleagues from around the world, meeting new peers, and exchanging ideas at the 2026 CDISC + TMF EU Interchange.

See you in Milan?

Parkinson’s Disease Through a Statistical Lens

Parkinson’s disease — a progressive movement disorder of the nervous system — affects more than 1.1 million people in the US (and over 11 million globally), with an estimated 90,000 new diagnoses each year, making it the second-most common neurodegenerative disease after Alzheimer’s disease.1,2  The prevalence and rise of Parkinson’s disease has led to robust investment in understanding and treating this disorder.3

Here, we provide a brief overview of Parkinson’s disease and discuss common endpoints used in clinical trials with an illustrative case study on how those endpoints may be analyzed.

 

An introduction to Parkinson’s disease

Parkinson’s disease is a progressive movement disorder of the nervous system.4 It causes nerve cells (neurons) in parts of the brain to weaken, become damaged, and die, leading to symptoms that include problems with movement, tremor, stiffness, and impaired balance. As symptoms progress, people with Parkinson’s disease (PD) may have difficulty walking, talking, or completing other simple tasks.

The rate of PD progression and the particular symptoms differ among individuals. The four primary/hallmark symptoms of PD are tremor, rigidity, bradykinesia, and postural instability.

 

 

Other problems related to PD may include mental and emotional health problems, speech changes, dementia or other cognitive problems, pain, and fatigue.

 

On and Off states/periods

The On state is when PD medications are effective and motor and non-motor symptoms are controlled. The Off state is when PD symptoms return between medication doses or in the morning before the first dose.

 

Measuring Parkinson’s disease severity: Two evaluation methods

MDS-UPDRS: Evaluating motor and non-motor symptoms

The MDS-UPDRS (Movement Disorder Society–Unified Parkinson’s Disease Rating Scale) was developed to evaluate various aspects of PD, including daily non-motor and motor experiences and motor complications.5, 6

It is the most frequently used outcome in clinical trials, though it can also be employed in the clinical setting. It consists of four parts with 50 items in total, with each item rating the impairment with scores from 0 (normal) to 4 (severe). A patient’s global impairment is calculated as the total sum of these scores, with a higher score indicating greater impairment. Missing values might be imputed by the worst-case value of 4 (severe) if sufficient items are scored, otherwise the total score is set to missing. Each part can be analyzed separately as well.
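
The scoring rule described above can be sketched as follows. This is a minimal illustration, not the official scoring algorithm: the function name `total_score` and the `min_scored` threshold are hypothetical, since the text only says that imputation applies "if sufficient items are scored" without giving the cutoff.

```python
WORST_ITEM_SCORE = 4  # each item is rated from 0 (normal) to 4 (severe)


def total_score(item_scores, min_scored):
    """Sum item scores, imputing missing items (None) with the worst-case value.

    min_scored is a hypothetical threshold: if fewer items than this are
    scored, the total is set to missing (None) instead of being imputed.
    """
    scored = [s for s in item_scores if s is not None]
    if len(scored) < min_scored:
        return None
    n_missing = len(item_scores) - len(scored)
    return sum(scored) + WORST_ITEM_SCORE * n_missing


# Five illustrative items, one of them missing.
items = [2, 1, None, 3, 0]
imputed_total = total_score(items, min_scored=4)  # 6 observed + 4 imputed
missing_total = total_score(items, min_scored=5)  # too few items scored
```

The same function can be applied to a single part (for example, Part III only) by passing only that part's items.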

 

MDS-UPDRS:

Parts of the MDS-UPDRS can be assessed during the ON and OFF state to evaluate the differences between those two states.

 

PDQ-39: A patient-reported health status questionnaire

The PDQ-39 (Parkinson’s Disease Questionnaire) is a 39-item patient-reported measure that assesses Parkinson’s disease–specific health-related quality of life.7, 8

It requires the patient to grade how often he/she experienced difficulties over the past month. Each item is scored on a scale from 0 (never) to 4 (always or cannot do at all, if applicable), with lower scores indicating better status. Items are grouped into eight dimension subscales.

 

PDQ-39:

PDQ-39 subscale scores range from 0 to 100, with 0 representing perfect health for the dimension and 100 representing worst health for the dimension. A PDQ-39 total score — the PDQ-39 Summary Index (PDSI) — can be computed as the mean of the eight PDQ-39 subscale scores providing an overall score reflecting the impact of Parkinson’s on quality of life.

In case of missing values, a possible approach is to impute missing values with the mean of the available subscale items, if the number of missing values is smaller than 50% within the subscale.
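
A minimal sketch of this scoring follows, assuming the usual percentage-of-maximum scaling for a subscale (sum of item scores divided by the maximum possible sum, times 100). The function names and the example responses are hypothetical, and returning a missing summary index when any subscale is missing is an assumption of this sketch rather than a rule stated above.

```python
MAX_ITEM_SCORE = 4  # items are scored 0 (never) to 4 (always / cannot do at all)


def subscale_score(items):
    """Return a 0-100 subscale score; None marks a missing item.

    Missing items are imputed with the mean of the available items when
    fewer than 50% of the subscale items are missing; otherwise None.
    """
    answered = [x for x in items if x is not None]
    n_missing = len(items) - len(answered)
    if 2 * n_missing >= len(items):
        return None
    # Replacing each missing item by the mean of the answered items gives
    # the same result as scaling the mean of the answered items directly.
    mean_item = sum(answered) / len(answered)
    return 100 * mean_item / MAX_ITEM_SCORE


def summary_index(subscale_scores):
    """PDSI as the mean of the subscale scores (None if any is missing)."""
    if any(score is None for score in subscale_scores):
        return None
    return sum(subscale_scores) / len(subscale_scores)


# Hypothetical responses for the eight dimensions (39 items in total).
responses = {
    "mobility": [2] * 10,
    "activities_of_daily_living": [1] * 6,
    "emotional_well_being": [0] * 6,
    "stigma": [4, 4, None, 4],  # one of four items missing, so imputed
    "social_support": [0, 0, 0],
    "cognition": [2, 2, 2, 2],
    "communication": [1, 1, 1],
    "bodily_discomfort": [2, 2, 2],
}
scores = {name: subscale_score(items) for name, items in responses.items()}
pdsi = summary_index(list(scores.values()))
```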

 

LED (Levodopa Equivalent Dose)

The dose of antiparkinsonian medication is standardized to the LED in mg based on predefined conversion rates.
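
As a rough sketch of that standardization, the total daily LED can be computed by multiplying each drug's daily dose by its conversion factor and summing. The factor values, drug keys, and function name below are placeholders for illustration only, not a validated conversion table; a study would use the predefined factors agreed with its medical experts.

```python
# PLACEHOLDER conversion factors (mg LED per mg of drug), for illustration only.
LED_FACTORS = {
    "levodopa_ir": 1.0,   # immediate-release levodopa is the reference
    "levodopa_cr": 0.75,  # assumed value for a controlled-release formulation
    "pramipexole": 100.0, # assumed value for a dopamine agonist
}


def total_daily_led(regimen, factors=LED_FACTORS):
    """Sum dose * factor over (drug, daily_dose_mg) pairs.

    Raises KeyError for a drug without a predefined factor, so that
    unmapped medications are reviewed rather than silently ignored.
    """
    total = 0.0
    for drug, daily_dose_mg in regimen:
        if drug not in factors:
            raise KeyError(f"no conversion factor defined for {drug!r}")
        total += daily_dose_mg * factors[drug]
    return total


regimen = [("levodopa_ir", 300), ("levodopa_cr", 200), ("pramipexole", 1.5)]
led_mg = total_daily_led(regimen)
```

This simple sum does not cover special cases of medication combinations, which need their own rules.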

 

A confirmatory Parkinson’s study: Statistical analysis and adaptive design

Our team partnered with a large biotech and biomedical engineering company to conduct the statistical analysis of a multi-center, open-label (one-arm) adaptive confirmatory study that used a device providing deep brain stimulation for Parkinson’s patients. The efficacy and futility boundaries of the adaptive design were computed using Cytel’s East Horizon™ platform.

The study had the following endpoints:

  • Primary endpoint: MDS-UPDRS (part III)
  • Secondary and exploratory endpoints: Other parts of MDS-UPDRS, PDQ-39, Clinical Global Impression of Change (CGI), Schwab and England ADL (Activities of Daily Living), antiparkinsonian medication use

 

Statistical analysis and its challenges

MDS-UPDRS (part III) score, PDQ-39, and antiparkinsonian medication use were analyzed using the paired t-test and CGI was analyzed using the non-parametric Wilcoxon signed-rank test. The Schwab and England ADL scale was analyzed with an ANOVA.
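
For readers less familiar with the paired t-test, the statistic is the mean of the within-participant changes divided by its standard error. The sketch below uses only the Python standard library and made-up scores; it returns the statistic and degrees of freedom but no p-value, and it is not the study's analysis code.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(baseline, follow_up):
    """Return (t statistic, degrees of freedom) for paired observations."""
    if len(baseline) != len(follow_up):
        raise ValueError("paired samples must have the same length")
    diffs = [f - b for b, f in zip(baseline, follow_up)]
    n = len(diffs)
    # Standard error of the mean change uses the sample standard deviation.
    t_stat = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t_stat, n - 1


# Made-up scores for four participants (lower follow-up = improvement).
baseline_scores = [40, 35, 50, 45]
follow_up_scores = [30, 30, 38, 39]
t_stat, df = paired_t(baseline_scores, follow_up_scores)
```

In practice a statistics package would be used to obtain the p-value as well, for example `scipy.stats.ttest_rel` in Python.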

The first challenge was to understand the differences between the Off and On states. We also had to deal with missing data. It was decided that missing values at the visit level would be imputed with the worst response observed among all participants (primary analysis), with sensitivity analyses employing baseline observation carried forward (BOCF) and multiple imputation (MI) using Markov chain Monte Carlo (MCMC) methods.

Another, more challenging aspect was understanding and programming the antiparkinsonian medication use (analyzed as a secondary endpoint), which is calculated in LED. For this task, close collaboration with the sponsor’s medical experts was needed to define the conversion factors and correctly handle special cases of medication combinations.
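
The two single-imputation rules mentioned above can be sketched as follows. The function names and data are hypothetical, and two assumptions are made that the text does not spell out: a higher score means a worse response, and the worst response is taken among participants at the same visit.

```python
def impute_worst_observed(visit_values):
    """Replace missing values (None) with the worst observed value at the visit.

    Assumes higher scores are worse. If nothing was observed at the visit,
    the values are returned unchanged.
    """
    observed = [v for v in visit_values if v is not None]
    if not observed:
        return list(visit_values)
    worst = max(observed)
    return [worst if v is None else v for v in visit_values]


def impute_bocf(baseline_values, visit_values):
    """Baseline observation carried forward: fill a missing visit value
    with the same participant's baseline value."""
    return [b if v is None else v for b, v in zip(baseline_values, visit_values)]


# Made-up data for three participants; the second misses the visit.
baseline = [40, 35, 50]
visit = [30, None, 38]
primary = impute_worst_observed(visit)
sensitivity = impute_bocf(baseline, visit)
```

Multiple imputation is not shown here, since it requires a fitted imputation model and pooling across imputed datasets.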

 

An adaptive design with four interim analyses

The study was designed to include four interim analyses and one final analysis, using the Lan-DeMets group sequential method with the O’Brien-Fleming α-spending function and Pocock β-spending function. The O’Brien-Fleming boundaries preserve a nominal significance level at the final analysis that is close to that of a single test procedure, so they are very conservative for the earlier interim analyses.9 The Pocock β-spending function uses approximately equal cutoffs for each analysis.

The efficacy and futility boundaries were computed via Cytel’s EAST software, which is integrated into the East Horizon™ platform. For the interim analyses, the efficacy and futility boundaries had to be recalculated based on the actual sample sizes.
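
To illustrate why the boundaries depend on the actual sample sizes, the sketch below evaluates the standard Lan-DeMets O’Brien-Fleming-type and Pocock-type spending functions at the information fraction of each look. The error rates (0.05 and 0.20), the sample sizes, and all names are assumptions for illustration, not the study's values. Converting cumulative spend into efficacy and futility boundaries requires numerical integration over the joint distribution of the test statistics, which is not attempted here.

```python
from math import e, log, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def obf_type_spend(t, alpha):
    """Cumulative alpha spent at information fraction t (0 < t <= 1),
    O'Brien-Fleming-type spending function."""
    z = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return 2 * (1 - STD_NORMAL.cdf(z / sqrt(t)))


def pocock_type_spend(t, total_error):
    """Cumulative error spent at information fraction t,
    Pocock-type spending function."""
    return total_error * log(1 + (e - 1) * t)


# Hypothetical planned maximum and actual sample sizes at the five looks.
planned_max_n = 100
actual_n = [22, 41, 58, 80, 100]
fractions = [n / planned_max_n for n in actual_n]

alpha_spent = [obf_type_spend(t, 0.05) for t in fractions]
beta_spent = [pocock_type_spend(t, 0.20) for t in fractions]
```

With these functions, very little alpha is spent at the early looks, while beta is spent much more evenly, and a look taken at a different sample size than planned simply shifts its information fraction.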

 

Final takeaways

Parkinson’s disease is a lifelong, progressive, degenerative, multi-symptom disease that affects millions worldwide. The treatment is highly individualized and depends on the disease stage and severity of motor and non-motor symptoms. When symptoms become bothersome, current therapies primarily focus on symptom management, with pharmacological options such as levodopa and dopamine agonists forming the cornerstone of care. For those whose symptoms don’t respond well to medication in later stages, advanced options like deep brain stimulation (DBS) offer hope and can provide relief from tremors and reduce dyskinesias.

The adaptive design of the case study offered a flexible, efficient, and ethical approach without compromising the validity and integrity of the study. It was implemented in the East Horizon™ platform, which offers a comprehensive tool for trial design during all stages of development.

Clinical Data Management’s Next Evolution: From Data Stewardship to Data Intelligence

Clinical Data Management (CDM) is undergoing a fundamental transformation. What was once primarily a function focused on data collection, validation, and cleaning is now emerging as a strategic, technology-driven discipline at the heart of modern clinical research.

Today’s trials generate unprecedented volumes of complex data. A recent Tufts Center for the Study of Drug Development survey found a 7x increase in data points and 4x increase in data sources. Here at Cytel we have seen studies with over 20 data sources. Beyond traditional electronic data capture (EDC), clinical studies increasingly incorporate electronic health records (EHRs), wearable devices, mobile applications, genomics, imaging, and real-world evidence (RWE). While these data sources create enormous potential for deeper insight, they also introduce new challenges that conventional CDM approaches were never designed to handle.

To unlock the value of this expanding data universe, clinical organizations must rethink not only their tools, but also their talent, workflows, and mindset.

 

The rise of new roles in clinical data management

This evolution has created demand for new, specialized roles that bridge clinical knowledge, data science, and technology:

 

Clinical Data Scientist (CDS)

Clinical Data Scientists focus on extracting insight from complex medical data. They apply advanced analytics, visualization, and domain expertise to uncover trends, assess data quality risks, and support clinical and operational decision-making.

 

Clinical Data Engineer (CDE)

Clinical Data Engineers design and maintain the data infrastructure that makes modern analytics possible. They build robust, compliant data pipelines, integrate diverse data sources, and ensure data is reliable, traceable, and analysis-ready across the clinical trial ecosystem.

 

Together, these roles move CDM beyond data stewardship toward true data enablement.

 

The expanding complexity of clinical data

Modern clinical trials are no longer linear or siloed. Data flows continuously from multiple sources, often in near real time, and in formats that vary widely in structure, granularity, and reliability. Managing this complexity requires more than rule-based checks and manual reviews. Organizations need scalable data architecture, advanced analytics, and intelligent monitoring approaches that can adapt as data volume, velocity, and variety increase. This shift marks a move away from reactive data cleaning toward proactive data intelligence.

 

Why data visualization matters more than ever

As data points multiply, traditional listings and static reports quickly become unmanageable. Data visualization is no longer a “nice to have”; it is essential. Advanced visual analytics enable clinical teams to identify patterns, compare data across sites, and detect emerging issues early, before they compromise data quality or timelines. By transforming complex datasets into intuitive visual insights, teams can move faster, ask better questions, and focus attention where it matters most.

 

Figure 1: Early Detection of Data Quality Risks through Data Visualization Use Case

Systemic audit trail analysis and regulatory expectations

Regulatory expectations are also evolving alongside data complexity. The 2023 EMA guidance places increased emphasis on audit trail review, signaling a shift from point-in-time checks to systemic analysis. Manual audit trail reviews are no longer sufficient at scale. Instead, sponsors and CROs must adopt analytical approaches that continuously monitor audit trail activity while identifying unusual patterns. This will support site fraud detection, risk-based quality management, and inspection readiness. Analytics-driven audit trail review not only improves compliance but also strengthens overall data integrity and operational oversight. In short, audit trail data needs to be treated similarly to clinical data. In 2025, Cytel was made aware of multiple sponsors being asked by regulatory authorities to provide evidence of a systematic review of the audit trail data.

 

Figure 2: Systemic Audit Trail Analysis Use Case

From comprehensive reviews to trend and outlier detection

In a world of big data, reviewing everything is neither practical nor effective. The future of data cleaning lies in intelligent prioritization. By leveraging statistical methods and trend analysis, CDMs can shift from exhaustive data review to targeted investigation focusing on outliers, inconsistencies, and meaningful deviations. This will reduce manual effort while improving data quality outcomes, aligning with risk-based monitoring principles, and enabling faster, more confident decision-making throughout the trial lifecycle. This is accomplished by statistically analyzing the variability of the data, much as statistics are used to evaluate safety and efficacy, and assigning a risk level to each check that is performed. An overall risk level is also derived, and targeted data checks are then performed based on the analysis.
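
One simple way to picture this kind of prioritization is to score each site on a data-quality metric, compare it with the other sites using a robust z-score (median and median absolute deviation), and map the result to a risk level. Everything in the sketch below is an assumption for illustration: the metric, the site values, the thresholds, the rule that the overall risk is the worst level across checks, and the names. It is not a description of any specific production method.

```python
from statistics import median

RISK_ORDER = ["low", "medium", "high"]


def risk_levels(metric_by_site, medium=2.0, high=3.5):
    """Assign a risk level per site from a robust (median/MAD) z-score.

    The thresholds are hypothetical and would be tuned per study.
    """
    values = list(metric_by_site.values())
    center = median(values)
    mad = median(abs(v - center) for v in values)
    levels = {}
    for site, value in metric_by_site.items():
        # With zero spread there is no basis for flagging any site.
        z = 0.0 if mad == 0 else 0.6745 * abs(value - center) / mad
        levels[site] = "high" if z >= high else "medium" if z >= medium else "low"
    return levels


def overall_risk(levels_per_check):
    """Overall risk for a site: the worst level across all checks."""
    return max(levels_per_check, key=RISK_ORDER.index)


# Made-up metric: open queries per 100 CRF pages at each site.
open_queries = {"S01": 4, "S02": 6, "S03": 5, "S04": 6, "S05": 4, "S06": 24}
query_risk = risk_levels(open_queries)
site_s06_overall = overall_risk([query_risk["S06"], "low", "medium"])
```

Sites flagged as high risk would then receive targeted data checks first, while low-risk sites are reviewed less intensively.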

 

Figure 3: Risk-Based Data Cleaning Use Case

Building insight-ready clinical data ecosystems

The future of clinical data management is not defined by a single tool or technology, but by an ecosystem: one that combines modern platforms, advanced analytics, and specialized talent.

Organizations that invest in insight-ready data architectures and deploy the right expertise will be better positioned to improve data quality, accelerate timelines, and generate deeper insights from increasingly complex datasets. As clinical research continues to evolve, CDM’s role is expanding from managing data to unlocking its full strategic value.

 

Interested in learning more?

William Baker and Jenn Sustin will be hosting the webinar “Enabling the Shift to Clinical Data Science and Engineering for Modern Trials” on February 18 at 10 am ET.

Looking Ahead to 2026 and Beyond: Views, News, and PHUSE

At the outset, a disclaimer. This piece is potentially “old hat” for you, as it comes from someone who has retired from executive/managerial roles. But wait! One cannot ever retire from observing, admiring, and, therefore, learning. “With all thy getting, get understanding” — a biblical verse inscribed in a Cytel founder’s office — is etched in my mind, so the insatiable quest for absorbing.

What’s in store in the year ahead and beyond? A few things come to my mind:

 

AI and even more AI

I know, I know. You have probably had an overdose on readings about AI. Still, my two cents in short bullets.

  • You gotta learn to use AI seriously. Like it or not. So, you better like it.
  • You don’t need to become an AI expert, just a skilled user.
  • Examine your job description. Anything routine/mechanical is going to evaporate with AI magic. So, amplify your focus on innovating, creating, and original thinking.
  • Don’t trust AI blindly. Find smart ways to validate what it churns out.

While AI usage is still in a nascent stage, early adopters of smart prompt engineering and dependable validation will be at a great advantage for future opportunities.

Here at Cytel we have access to a first-rate suite of AI tools. Judicious and ingenious use paves excellent career growth pathways. Go get started!

 

Domain knowledge shall reign supreme

Through my 28 years at Cytel, every occasion of learning something new about drug development brought me new opportunities. Whether it be a complex therapeutic area, or how adaptive designs are crafted, or how drug delivery works, or how a DMC functions, a little bit of enlightenment went a long way in delivering greater value to a client. Regardless of one’s specialization (the “horizontal”), the domain “vertical” opens doors to career growth. I see that becoming even more prominent going forward. For example, real-world data (RWD) is helping accelerate and enhance drug development, and I have seen young statisticians get excellent opportunities based on their deepening understanding of RWD.

 

Jack of all trades

I have been a firm believer in broader knowledge (not just deeper) working wonders. Occasionally, when I was pushed into supporting business development (e.g., crafting RFP responses, or making a pre-sales demo and presentation), the value of knowing a little bit of everything dawned bright and clear. This year and beyond, I feel sure versatility will be a big virtue, both for value delivery to the client and, therefore, for one’s own career.

 

GCCs (Global Capability Centers) gain traction

Knowledge-focused companies like Cytel are ideally suited to become skilled competency centers serving global sponsors. The three-decade-old idea of SDFs in the Software Industry is reincarnating now through the concept of GCCs in our domain. Deep scientific knowledge, when combined with deep understanding of a specific sponsor’s processes and specialties, is invaluable. “Outsourcing” began with simple cost saving as the core proposition. That has rapidly matured toward 1) tapping large talent pools; 2) innovation and intellectual property creation; and 3) specialized CoEs (Centers of Excellence). In 2026 and beyond, I foresee GCCs becoming knowledge powerhouses. And I foresee global biopharma continuing to welcome specialist service providers to host the GCCs, in addition to their own DIY versions.

 

PHUSE APAC Connect

From expressing the news and my views, let me now move on to PHUSE. This global Healthcare Data Science Community, over the past two decades, initially held annual conferences all across Europe. It then spread its wings to the US with the CSS (Computational Sciences Symposium), partnering with the US FDA, and then to the “US Connect” annual conferences.

It is now making a grand debut in the Asia Pacific Region. The first ever “APAC Connect” of PHUSE is scheduled for February 19–21 in Hyderabad, India. PHUSE has a large following in the APAC region with over 10,000 members spread across India, China, Japan, Singapore, Malaysia, Australia, and several other countries.

What’s more, this event will include the India CDISC Day 2026!!!

 

This event will address a few major themes.  

  • GCCs in the APAC region. This region has the unique advantage of a huge talent pool and is moving up from cost efficiency to innovation hubs and CoEs.
  • Impact of AI. How AI will reshape careers and leadership in drug development. This topic will figure across panel discussions, presentations, and the leadership stream.
  • There will also be a panel discussion on upcoming innovations in drug development that are going to be potential game-changers.

If you are attending the event, use the PHUSE app to curate your personalized agenda and schedule, choosing among the multiple parallel streams.

 

Cytel has always been a big participant at PHUSE events. Consider these snippets:

  • Several first-time Cytel presenters have won best presentation prizes
  • We have been exhibitors and sponsors at many PHUSE events
  • A few folks, like Angelo Tinazzi from our Geneva office, are celebrated contributors to a number of PHUSE initiatives. Angelo authored the much-acclaimed eBook The Good Data Submission Doctor on Data Submission and Data Integration to the FDA.
  • A Cytelian, having served as a PHUSE Board Member, and being instrumental in bringing PHUSE to Asia, has been invited to chair the Inaugural APAC Connect. Guess who that is!😊
  • Two more Cytelians, Pratibha Jalui and Sudipta Basu, are serving as Stream Co-Chairs.
  • Angelo will be the EU Connect Chair later this year (he served as the Co-Chair last year) in Glasgow, Scotland.
  • This is the first time ever that Cytelians have been chosen for this privilege.
  • At the PHUSE APAC Connect, we have lots of Cytel presenters: Corey Dunham, Pratibha Jalui, Diganta Bose, Aboli Katdare, Charles Warne, Pradip Maske, Chandan Patel Malyala, and Anoop Rawat. We will also have an exhibit booth (#4) with Mansha Sachdev representing our marketing team.

 

Personally, PHUSE has been a booster rocket for my professional career. It brought numerous opportunities of engaging with three significant audiences:

  • Industry peers, exchanging ideas and co-driving initiatives
  • Prospects among big pharma and biotech, several of whom later became clients
  • A talent pool of bright young professionals, some of whom joined Cytel to enhance our ever-growing brainpower

 

The APAC Connect 2026 has a rich 2.5-day agenda that spans keynote speeches, panel discussions, presentations, hands-on workshops, software demonstrations, a poster session, and a couple of networking events.

 

The bottom line

We at Cytel have an exemplary track record of bringing rigorous data science to the service of human health outcomes. That’s our raison d’être!

Together, let’s take that forward in 2026 and beyond!

 

Meet with us!

Will you be attending PHUSE APAC Connect in Hyderabad, India, this February? Stop by Booth 4 to get to know our experts and learn how Cytel is shaping the future of data‑driven drug development, or click below to book a meeting to discuss career opportunities at Cytel: