Embedding R into GxP-Compliant Statistical Computing Environments

Biotech and mid-sized pharmaceutical companies are increasingly modernizing their statistical computing environments (SCEs) to keep pace with growing data complexity, advanced analytics, and evolving regulatory expectations. Open-source languages such as R offer clear advantages in flexibility and innovation. However, in GxP-compliant settings, adoption introduces challenges that go far beyond technology itself.

Much of the discussion around R focuses on its capabilities. In practice, the real challenge lies in operationalizing it within a compliant ecosystem — where validation, governance, and reproducibility become critical.

This article explores these challenges from a practical perspective and outlines how organizations are addressing them.

 

The real barrier: GxP complexity

Adopting R is not the primary hurdle; embedding it into a GxP-compliant environment is. This requires:

  • Validation of open-source packages
  • Governance and auditability
  • Reproducibility and traceability
  • Ongoing lifecycle management

For organizations without established frameworks, these requirements can introduce significant overhead, often slowing innovation rather than accelerating it.
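
Reproducibility, in particular, lends itself to concrete tooling. As a minimal sketch (assuming the widely used open-source renv package, which is only one option and is not mandated by any guidance), package versions can be pinned per project and later restored in a validated environment:

```r
# Minimal sketch: pinning R package versions for reproducibility with renv.
# Package names and the workflow shown are illustrative, not a validation framework.
install.packages("renv")

renv::init()                                 # create a project-local library and lockfile
install.packages(c("dplyr", "survival"))     # qualified packages from an approved repository
renv::snapshot()                             # record exact package versions in renv.lock

# Later, in the validated environment, the same versions can be re-installed:
renv::restore()                              # restore the versions recorded in renv.lock
```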

 

Why mid-sized organizations are disproportionately impacted

Mid-sized biotech and pharmaceutical companies face a structural challenge. While regulatory expectations are the same as for large pharma, available resources are not.

Smaller teams must manage validation, infrastructure, and delivery simultaneously, often without dedicated support functions. As a result, system complexity scales faster than internal capacity, directly impacting timelines and limiting the ability to innovate.

 

Different starting points, different challenges

In practice, organizations face different realities depending on their level of SCE maturity:

  • Some lack the infrastructure to support GxP-compliant open-source environments
  • Others have established systems but face integration challenges with external partners
  • A third group is transitioning toward R and multi-language workflows but lacks maturity in governance and tooling

These scenarios require flexible approaches tailored to each organization’s context.

 

Moving toward integrated, multi-language environments

To address fragmentation, many organizations are adopting polyglot SCEs, where SAS and R coexist within unified workflows.

This approach enables greater flexibility while maintaining compliance, ensuring traceability, reproducibility, and smoother collaboration across internal teams and external partners.
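
As a minimal illustration of such a hand-off (file and variable names are hypothetical, and haven is only one of several open-source options), a SAS dataset produced upstream can be consumed directly in R and the results written to a neutral format:

```r
# Hedged sketch of a polyglot hand-off between SAS and R workflows.
library(haven)   # reads sas7bdat and transport files
library(dplyr)

adsl <- read_sas("adsl.sas7bdat")            # dataset produced by the SAS workflow

summary_by_arm <- adsl |>
  group_by(TRT01P) |>
  summarise(n = n(), mean_age = mean(AGE, na.rm = TRUE))

# Write results to CSV so downstream SAS or R steps can consume them
write.csv(summary_by_arm, "adsl_summary.csv", row.names = FALSE)
```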

 

A practical path forward

Rather than building and maintaining complex infrastructure internally, many organizations are exploring CRO-based service models.

By leveraging GxP-validated environments, sponsors can access production-ready R ecosystems without the burden of developing validation frameworks or managing platform engineering. This approach supports both full outsourcing and hybrid collaboration models, while ensuring alignment with client-specific systems.

 

Final takeaways

The challenge is not adopting R — it is managing the complexity of making it compliant.

Organizations that successfully unlock its value do so by:

  • Addressing GxP requirements early and systematically
  • Adapting approaches to their level of SCE maturity
  • Leveraging integrated, multi-language workflows
  • Exploring service-based models to accelerate adoption

With the right strategy, R becomes not a source of complexity, but a powerful enabler of innovation in clinical development.

 

Interested in learning more?

Join our upcoming webinar, “Navigating GxP Complexity: Unlocking the Value of R,” where we will share practical experience from Cytel’s polyglot SCE, including validation approaches, governance models, and operational best practices.

Register now to learn how to modernize your statistical computing environment — without adding unnecessary complexity.

Why “More Data” Isn’t Helping You Run Better Trials

Clinical Operations teams are being asked to let go of traditional approaches and do more than ever before:

Deliver more complex trials, faster — with fewer resources — and higher confidence in outcomes.

And how has the industry responded?

With a proliferation of data access, tools, and dashboards.  But does a dashboard really help navigate complexity with speed and well-managed risk?  No.

Let’s discuss the methods and tools that help turn this complexity into clarity.

 

The problem isn’t just complexity — it’s information overload

Clinical trials have changed dramatically:

  • 7x increase in data points
  • 4x increase in data sources
  • Increasing reliance on external data, RWE, and predictive modeling

Yet often you’re still expected to manage across multiple systems, in spreadsheets and trackers: CTMS, EDC, RBQM dashboards, query reports, enrollment trackers, deviation logs, and monitoring reports.

None of these disparate sources of information tell the whole story, and every critical study execution decision you make is plagued with data gaps, inconsistencies or discrepancies, and latency issues.

How, then, can we consolidate and automate our use of the data to make timely decisions we trust? There are certainly technology stacks that large organizations license and deploy. But what happens when you can’t afford them? You partner with a data management and biometrics specialty provider that understands what you are up against and what it takes to deliver a study, knows the data, and offers solutions that help heads of clinical operations gain control at a price they can afford.

Tools that actually make a difference offer:

  • Actionable insights, not static reports
  • Continuous visibility, not retrospective analysis
  • Aligned teams, not handoffs

 

Central statistical monitoring: Detecting emerging risks early

Early intervention is key to managing trial risks and ensuring reliable results. As clinical trials grow in complexity, data quality and patient safety can no longer be ensured through standard system reports alone. And with evolving regulatory expectations, trial budget pressures, and the need for earlier, more objective insights into emerging risks, central statistical monitoring (CSM) has become a critical component of modern trial oversight.

Tools such as Cytel’s Cytelytics can leverage statistics to identify trends, detect risks, and optimize source data verification efforts.
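
For illustration only, and not a description of Cytelytics’ methodology, a simple central check of this kind might compare each site’s adverse event rate against the study-wide rate; input and column names are hypothetical:

```r
# Illustrative sketch: flagging sites whose adverse event rates deviate from the
# study-wide rate. ae_counts is a hypothetical data frame with one row per site
# containing n_subjects and n_ae.
library(dplyr)

flags <- ae_counts |>
  mutate(overall_rate = sum(n_ae) / sum(n_subjects)) |>
  rowwise() |>
  mutate(p_value = poisson.test(n_ae, T = n_subjects, r = overall_rate)$p.value) |>
  ungroup() |>
  mutate(flagged = p_value < 0.01)   # small p-values suggest the site warrants a closer look
```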

Regulatory agencies now treat audit trail data with the same level of scrutiny as clinical data and expect proactive, ongoing audit trail reviews. Relying on outdated or manual approaches is a risk you can’t afford: manual reviews are time sinks and can introduce unnecessary risk. Tools like Cytel’s Audit Detective enhance compliance and data integrity by identifying inconsistencies, unauthorized access, and unusual activity patterns in audit trails.

 

Better data visualization: Driving decisions, not just reporting

Traditional reports tell you what happened. Modern visualization:

  • Links operational metrics to clinical outcomes
  • Allows drill-down from summary to patient level
  • Highlights where intervention changes the outcome

Tools like Cytel’s ClinCytesDV provide interactive graphs, tables, and listings, layering data together to tell a richer story.

 

Data management: Operating environments that drive speed and quality

Data ingestion, cleaning, reconciliation, and reporting should not operate as separate, sequential steps. A modern approach:

  • Automates data ingestion across sources (EDC, RWD, wearables)
  • Standardizes data structures (CDISC, OMOP, FHIR)
  • Enables real-time cleaning and review

The result is better data processing, reduced site burden, faster lock — and less firefighting. This is the difference between oversight and control.

 

Final takeaways

The answer isn’t more dashboards, systems, or data, but rather the methods and tools that result in fewer reconciliations across systems, earlier visibility into risks, faster decisions with higher confidence, and ultimately, that allow you to spend less time managing the process — and more time managing the study.

Building External Control Arms in Rare Disease Clinical Trials: A Programmer’s Perspective

External Control Arms (ECAs) are gaining a lot of attention in clinical research, particularly in rare diseases, where traditional randomized trials are often difficult to execute. Much of the discussion focuses on the statistical methodology and study design required to identify appropriate populations and data sources. But in practice, one of the biggest challenges lies in the programming effort, which is equally critical, but often more complex than anticipated.

Given that ECAs are still an evolving area, formal regulatory and industry guidance remains relatively limited. However, available publications are beginning to address key considerations. For example, the FDA’s Data Standards for Drug and Biological Product Submissions Containing Real-World Data (2024) provides recommendations on preparing and submitting RWD-derived datasets, while highlighting challenges in standardization and traceability. In parallel, industry initiatives such as the PHUSE white paper on Data Standards for Non-Interventional Studies outline common data standardization challenges and practical approaches to address them. In addition, dedicated working groups within PHUSE are actively contributing to the development of best practices for ECAs.

This article focuses on the practical challenges from a programming perspective, drawing on recent case study experience.

 

Working with real-world and heterogeneous data

From a programming perspective, ECAs differ significantly from traditional clinical trials. Instead of working with well-structured datasets collected under controlled protocols, programmers are required to integrate data from multiple sources, including Real-World Data (RWD), historical trials, observational studies, and natural history cohorts. Each source brings its own structure, conventions, and limitations, often with poor documentation.

In one case study, external control data was derived from two independent natural history cohorts across different regions. While both sources represented similar patient populations, differences in baseline definitions, visit schedules, and outcome assessments required careful reconciliation.

The programming team aligned key covariates, including baseline age, genetic subtype, and functional scores to support comparability with the treated trial population. This went far beyond standard data mapping and required informed decisions to standardize variables that were not originally designed for cross-study integration.

 

Harmonization and data standardization

Once data sources are understood, harmonization becomes a critical step. The validity of an ECA depends on ensuring consistent definitions across baseline variables, endpoints, covariates, and visit timing.

In practice, this involves standardizing baseline windows, assessment schedules, coding dictionaries (such as MedDRA, often across multiple versions), laboratory standard units, endpoint derivations, and covariates used for matching. Across the case studies, this proved to be one of the most time-intensive phases.

Even small differences required careful reconciliation. For example, the same functional score was recorded on different scales across studies, requiring re-derivation into a common format.
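
A hedged sketch of that kind of re-derivation, with purely hypothetical scales, cohort labels, and variable names, might rescale both sources to a common percent-of-maximum format:

```r
# Hypothetical sketch: the same functional score captured on a 0-4 scale in one
# cohort and a 0-100 scale in another, re-derived onto a common 0-100 scale.
library(dplyr)

harmonized <- bind_rows(cohort_a, cohort_b) |>
  mutate(score_pct = case_when(
    source == "COHORT_A" ~ score / 4 * 100,   # 0-4 instrument rescaled
    source == "COHORT_B" ~ score              # already on a 0-100 scale
  ))
```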

If not addressed early, these inconsistencies can significantly impact downstream analyses, including propensity score modelling and bias estimation. Early and systematic harmonization is therefore essential to ensure consistency and minimize rework.

 

CDISC alignment, missing data, and analytical complexity

For studies intended for regulatory submission, alignment with CDISC standards (SDTM and ADaM) is essential. However, external datasets are rarely structured with these standards in mind, requiring substantial programming effort during transformation.

In another case study, SDTM datasets pooled from multiple studies were used as the source. However, inconsistencies in specifications and differences in SDTM Implementation Guide versions across studies created challenges in standardization and traceability during ADaM specifications development. Key variables, including demographics and baseline characteristics such as age, sex, education, genotype, and clinical scores, had to be consistently derived and validated across studies. Maintaining traceability was critical, with define.xml playing a key role in documenting transformations and assumptions.

At the same time, missing and inconsistent data remain inherent challenges. In the natural history cohort example, gaps in timepoints and patient coverage limited direct comparability with the treated trial arm. Programmers addressed this by defining analysis windows and deriving aligned time variables, enabling more meaningful longitudinal comparisons. However, such adjustments introduce assumptions that must be clearly justified and documented in the specifications and the Reviewer’s Guide.
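
The windowing step itself can be sketched simply; the example below is illustrative only, with hypothetical window boundaries, target days, and variable names:

```r
# Hedged sketch: assigning irregular natural-history visits to analysis windows
# based on study day, keeping the record closest to each window's target day.
library(dplyr)

windowed <- external_long |>
  mutate(
    AVISIT = case_when(
      ADY >= 1   & ADY <= 60  ~ "Month 1-2",
      ADY >= 61  & ADY <= 180 ~ "Month 3-6",
      ADY >= 181 & ADY <= 365 ~ "Month 7-12"
    ),
    target_day = case_when(
      AVISIT == "Month 1-2"  ~ 30,
      AVISIT == "Month 3-6"  ~ 120,
      AVISIT == "Month 7-12" ~ 270
    )
  ) |>
  filter(!is.na(AVISIT)) |>
  group_by(USUBJID, AVISIT) |>
  slice_min(abs(ADY - target_day), n = 1, with_ties = FALSE) |>
  ungroup()
```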

ECA analyses also rely heavily on advanced statistical techniques, including propensity score matching, weighting, and longitudinal modelling. These methods can be computationally intensive, particularly when working with multiple heterogeneous datasets. In one case study, certain models required several hours to run for a single output, directly impacting timelines for quality control and iterative revisions.
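
As one hedged example of the matching step, using the open-source MatchIt package (one common option, not necessarily the tool used in the case study) and hypothetical covariate names:

```r
# Hedged sketch: 1:1 nearest-neighbor propensity score matching of external
# controls to the treated trial arm. Variable names are hypothetical.
library(MatchIt)

m <- matchit(
  treated ~ baseline_age + genetic_subtype + baseline_score,
  data     = pooled,          # treated arm stacked with external controls
  method   = "nearest",       # 1:1 nearest-neighbor matching
  distance = "glm"            # logistic-regression propensity score
)

summary(m)                     # covariate balance before and after matching
matched <- match.data(m)       # matched analysis set with weights and subclass
```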

As a result, programmers must optimize code for long-running processes, manage runtime constraints, and ensure reproducibility across environments. For example, when generating figures based on many simulations (e.g., 500,000 iterations), a single output could require several hours of execution time. To improve efficiency, figure generation was separated into independent programs rather than being combined within a single workflow, which significantly reduced total runtime. Similarly, validation procedures for computationally intensive simulations were performed in a staged manner, starting with smaller sample sizes and progressively increasing to the full scale, allowing for earlier detection of discrepancies, while minimizing unnecessary computational cost. In addition, parallel execution strategies were employed, with multiple programmers running processes concurrently, further reducing overall turnaround time.
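
A minimal sketch of the parallel-execution idea, using base R's parallel package with a placeholder simulation function (the real simulation body and iteration count would differ):

```r
# Hedged sketch: distributing simulation iterations across cores with base R.
library(parallel)

run_one_sim <- function(seed) {
  set.seed(seed)
  mean(rnorm(1000))            # placeholder for the actual simulation body
}

cl <- makeCluster(detectCores() - 1)
results <- parLapply(cl, seq_len(10000), run_one_sim)
stopCluster(cl)

sim_estimates <- unlist(results)
```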

Furthermore, the inherent uncertainty in external data typically necessitates multiple sensitivity analyses, requiring flexible and efficient programming workflows.

 

Operational constraints and regulatory expectations

Beyond technical challenges, ECAs introduce operational complexities. External datasets are often subject to strict privacy and governance requirements, with analyses conducted in secure or third-party environments. These constraints can limit direct data access, slow iteration cycles, and introduce additional layers of review and approval.

Programmers must therefore adapt to restricted computing environments, limited data visibility, and evolving access rules, all of which require careful planning to maintain timelines.

At the same time, regulatory expectations remain high. While agencies are increasingly open to ECAs, they require strong evidence of data quality, bias mitigation, and endpoint consistency. From a programming perspective, this places significant emphasis on transparency and documentation.

All transformations and analytical decisions must be fully traceable and clearly justified, including mapping approaches, imputation methods, endpoint derivations, harmonization decisions, and sensitivity analyses. Well-structured documentation is therefore as critical as the datasets themselves in supporting reproducibility and regulatory review.

 

Final takeaways

The development of ECAs extends far beyond data integration. It requires a structured and methodical programming approach to ensure consistency, traceability, and regulatory readiness.

The case studies highlight that successful ECA implementation depends not only on methodological rigor but also on the quality of data preparation and standardization. Early harmonization, robust documentation, and flexible programming frameworks are essential to delivering reliable and submission-ready results.

As ECAs continue to gain traction, programming plays a central role in bridging diverse data sources and generating credible evidence for regulatory decision-making. Despite the availability of industry white papers and broader guidance on observational data standardization, dedicated standards and detailed guidance specific to ECAs remain limited, highlighting the need for continued collaboration and development in this area.

 

Interested in learning more?

Join Gautham Selvaraj, Ralf Koelbach, and Steven Ting for their upcoming webinar, “Implementing External Control Arms in a Rare Disease Case Study” on April 30 at 10 am ET, where they will offer practical insights and experience-based strategies for implementing ECAs with real-world data:

Central Statistical Monitoring: Transforming Clinical Trial Oversight Through Data Intelligence

As clinical trials grow in complexity — spanning more geographies, more data streams, and more endpoints — the traditional model of on-site monitoring alone is no longer sufficient to ensure data quality and patient safety. Regulatory expectations have evolved, trial budgets are under pressure, and sponsors need earlier, more objective insights into emerging risks.

Central Statistical Monitoring (CSM) sits at the intersection of these demands.

At Cytel, we see first-hand how sponsors are rethinking monitoring strategies to be more risk-based, data-driven, and efficient. Here, we introduce the foundations of CSM, how it supports Risk-Based Quality Management (RBQM), and why it has become a critical component of modern trial oversight.

 

What is Central Statistical Monitoring?

Central Statistical Monitoring can be defined as the statistical detection of anomalies in accumulating clinical trial data to identify sites, patients, or countries that are performing differently from the rest. These differences may signal issues related to data quality, site conduct, or even patient safety.

The origins of CSM can be traced to early work on fraud detection in clinical trials. However, while fraud is rare, it represents only a small part of the picture. In practice, most CSM findings relate to more common and impactful issues such as errors, sloppiness, or data-handling inconsistencies.

The key principle is straightforward: when most sites are performing consistently, statistically unusual patterns may indicate that something warrants a closer look.

Rather than relying solely on Source Data Verification (SDV) or manual review, CSM uses statistical techniques to evaluate patterns within and across sites — often detecting issues that traditional monitoring approaches would miss.

 

Beyond KRIs and QTLs: What makes CSM different?

Central Monitoring typically includes three types of analyses:

• Key Risk Indicators (KRIs): site-level metrics such as adverse event rates or protocol deviations
• Quality Tolerance Limits (QTLs): study-level thresholds for critical KRIs
• Central Statistical Monitoring (CSM): advanced anomaly detection across high-volume data

While KRIs and QTLs focus on predefined metrics, CSM goes further by applying broad statistical tests across many variables — often using unsupervised approaches that are now considered the industry gold standard.

These methods may involve single-variable comparisons (such as means, variability, proportions, rates, digit distributions) as well as multivariate techniques that evaluate patterns across multiple variables simultaneously. The result is a structured framework for identifying outliers in a reproducible, objective way.
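
As a small illustration of one such single-variable check (generic, not any specific vendor implementation; column names and the flagging threshold are hypothetical), a terminal-digit preference test compares each site's last-digit distribution against a uniform expectation:

```r
# Illustrative sketch: digit-preference screening of a recorded measurement by site.
library(dplyr)

digit_check <- vitals |>
  mutate(last_digit = abs(round(SYSBP)) %% 10) |>
  group_by(SITEID) |>
  summarise(
    n       = n(),
    p_value = chisq.test(table(factor(last_digit, levels = 0:9)))$p.value
  ) |>
  mutate(flagged = p_value < 0.01)   # small p-values suggest digit preference
```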

 

Why does CSM matter now?

Over the past two decades, regulatory authorities have progressively endorsed risk-based and centralized monitoring approaches. FDA, EMA, and MHRA guidance have emphasized the importance of risk-based monitoring, culminating in ICH E6(R2) and most recently ICH E6(R3), which reinforce the role of centralized monitoring in identifying systemic and site-specific issues.

This regulatory evolution reflects a broader shift toward:

• Quality by Design (QbD)
• Identification of critical-to-quality factors
• Ongoing risk assessment
• Adaptive monitoring strategies

Within a Risk-Based Monitoring (RBM) framework, CSM complements KRIs and QTLs to provide a comprehensive view of trial risk. Insights from CSM can guide targeted on-site or remote monitoring, ensuring that resources are focused where they will have the greatest impact.

This approach aligns closely with the Clinical Trials Transformation Initiative’s definition of quality in clinical trials as the “absence of errors that matter to decision making — that is, errors which have a meaningful impact on the safety of trial participants or the credibility of the results.” By identifying anomalies early — before they escalate into systemic issues — CSM helps safeguard critical-to-quality factors.

For sponsors, the benefits are multifaceted:

• More efficient allocation of monitoring resources
• Potential reduction in unnecessary SDV
• Earlier detection of emerging risks
• Increased confidence in data integrity prior to regulatory submission

In short, CSM transforms monitoring from a predominantly reactive activity into a proactive, data-driven strategy.

 

Putting CSM into practice: Operational considerations for successful implementation

Understanding the statistical foundations of CSM is important — but translating that understanding into a well-functioning program requires deliberate operational planning. The following considerations provide a practical framework for teams preparing to implement CSM within a clinical trial.

 

Upfront preparation and governance

A formal CSM kickoff meeting — convened before any analyses begin — is one of the most valuable investments a team can make. This meeting should bring together representatives from biostatistics, data management, clinical operations, medical monitoring, and quality. The goal is to establish shared alignment on the objectives and scope of the CSM program, agree on which critical-to-quality (CtQ) factors will anchor the monitoring strategy, define escalation pathways for signals requiring action, and confirm documentation standards. Equally important is reaching consensus on how CSM integrates within the broader RBQM framework — clarifying how statistical signals will interact with KRI outputs, SDV decisions, and site risk classifications. Without this governance foundation, even technically sound CSM outputs can struggle to gain traction in day-to-day operations.

 

Determining frequency of analyses

The frequency with which CSM analyses are generated should be proportionate to study risk and dynamics. Key factors to consider include the rate of enrollment, total subject count, number of active sites, and overall study duration.  Trials with rapid, multi-site enrollment may benefit from more frequent reviews — bi-monthly — to catch emerging patterns before they compound. Slower-enrolling or smaller studies may reasonably support longer intervals between analyses without compromising oversight. Critically, frequency should not be treated as fixed. As study conditions evolve — sites activate or go on hold, enrollment accelerates, or a new safety signal emerges — the CSM schedule should be revisited. Building in flexibility from the outset ensures the program remains responsive rather than formulaic.

 

Communication and cross-functional review

CSM outputs are most actionable when they are presented in a structured, interpretable format — combining risk scores or site rankings with narrative interpretation that contextualizes what the statistics show and why it may matter. Findings should be reviewed collaboratively with the wider cross-functional team including Clinical Operations and Clinical Science, whose site-level and medical knowledge is indispensable for determining whether a statistical outlier reflects a genuine quality concern or a legitimate difference. A statistical signal is a prompt for investigation, not a conclusion. The review process should follow a clear feedback loop: identify the signal, evaluate it in context, decide on a response (monitor, query, or escalate), and document the rationale. This structured approach ensures accountability and creates an audit trail that supports both ongoing oversight and regulatory inspection readiness.

Ultimately, CSM delivers the greatest value when it is embedded operationally — treated not as a standalone statistical exercise, but as a living input to risk-based decision-making by the clinical team. When governance, data prioritization, analysis cadence, and cross-functional communication are aligned from the outset, CSM becomes what it is designed to be: an early warning system that enables smarter, more targeted oversight in service of patient safety and data integrity.

 

Interested in learning more?

Join Charles Warne and William Baker for their upcoming webinar, “Advancing Trial Oversight with Central Statistical Monitoring” on April 8 at 9AM ET / 3PM CET.

Central Statistical Monitoring is a practical, regulatory-aligned tool that can materially strengthen trial oversight and quality management.

In our upcoming webinar, we will explore:

• What CSM entails
• When and how CSM adds value to clinical trials
• Operational considerations for implementing CSM services
• Case study examples of CSM in action

Whether you work in biometrics, clinical operations, quality, or regulatory affairs, this session will provide actionable insights into building a smarter, more adaptive monitoring strategy.

SDTM IG 4.0 and SDTM 3.0: Celebrating the End of SUPP?

About five years after the release of SDTM IG 3.4, CDISC has just released SDTM IG 4.0 and SDTM 3.0 for public review. Comments are due April 6, with the final release expected later this year.

The public review includes the Conformance Rules version 3.0 as well as three draft Knowledge Base articles exploring some of the main changes expected with IG 4.0:

  • NS-- Datasets: Why they were built as they were.
  • Why change the structure of SDTMIG metadata?
  • Why does the DC domain differ from what’s described in FDA’s TCG?

For a quick overview of the impact of these changes, see the CDISC Standards timeline webpage or the revision history available in the draft version wiki for public review.

 

Celebrating or regretting the end of SUPP?

We will be moving, for example, from something called SUPPAE to something called NSAE, with a less “normalized” structure. Will this be “a small step for a man, a giant leap for mankind”? “Ai posteri l’ardua sentenza” (posterity will deliver the hard verdict).1

The change will require us to move from the current vertical SUPPAE structure, with one record per supplemental qualifier (QNAM/QVAL pair), to the flatter NSAE structure, in which qualifiers become columns on records linked directly to the parent AE records.

The structure of these new datasets is “One record per related dataset record,” meaning that the many-to-one relationship will no longer be possible, for example, an NS record that applies to several records in the parent domain via --GRPID. That said, there is hope that this new structure will simplify metadata handling and potentially facilitate the adoption of future data exchange formats, such as CDISC Dataset-JSON.
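
For context, here is a hedged sketch of the de-normalization step that programmers typically perform today, and which the NS-- structure would largely make unnecessary; it assumes qualifiers are linked via IDVAR = "AESEQ" and uses standard SDTM variable names:

```r
# Hedged sketch: pivoting vertical SUPPAE (QNAM/QVAL) records to columns and
# merging them back onto AE, the step the proposed NSAE structure would remove.
library(dplyr)
library(tidyr)

suppae_wide <- suppae |>
  filter(IDVAR == "AESEQ") |>                    # assumes qualifiers link via AESEQ
  select(STUDYID, USUBJID, IDVARVAL, QNAM, QVAL) |>
  pivot_wider(names_from = QNAM, values_from = QVAL)

ae_plus_supp <- ae |>
  mutate(IDVARVAL = as.character(AESEQ)) |>
  left_join(suppae_wide, by = c("STUDYID", "USUBJID", "IDVARVAL"))
```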

 

Three new domains

Three new proposed domains have been introduced:

  • DC (Demographics for Multiple Participations)
  • GI (Gastrointestinal System Findings)
  • EA (Event Adjudication)

DC has been around, unofficially, for some time, following the requirements introduced by the FDA in its Study Data Technical Conformance Guide (see my previous blog). This domain supports the representation of multiple enrollments within the same study. Along with DC, SUBJID has been added to all subject-level domains to differentiate the data generated by each of an individual subject’s participations.

Compared with FDA requirements, SDTM IG 4.0 also covers scenarios in which the same subject is enrolled multiple times, not only multiple screenings.

Identification of “Primary Enrollment,” and therefore how DM variables are populated, is left to the sponsor’s discretion. However, in cases where a subject experiences one or more screen failures before finally enrolling, the successful enrollment should clearly be considered the primary one.

EA, a Findings About domain, provides a common structure for studies requiring independent, peer-reviewed endpoint adjudication. In my view, it partially solves the issue of representing study endpoints where more complex “adjudication” is required; for example, in an oncology study with efficacy based on tumor response.

 

Changes in metadata

Several new metadata elements have been introduced, along with some changes. The goal is to improve understanding of variables and their intended use, without impacting the metadata included in a submission, e.g., define.xml.

So, when looking at the new SDTM IG, you will notice the following key differences among others:

  • Controlled Terms, Codelist or Format is now split into three separate columns
  • Variable Group has been added to group variables, for example, Results Unit or Results Value
  • Some information previously included in the “CDISC Notes” column is now reported in the “Examples” column

 

Other Changes

New versions of IGs are also an opportunity to fix issues (such as typos) and to clarify implementation points that previously caused misunderstandings. One example is additional guidance on which Specimen-based Findings domain to use under specific circumstances, such as clarifying that anti-microbial antibody testing data should be mapped to the IS domain rather than MS.

Some standard variables have been deprecated, such as --BLFL (Baseline Flag) for Findings domains, and others have been added. One notable addition is the --CLASI variable, particularly useful for classifying Protocol Deviations (e.g., MINOR/MAJOR) to support the requirements of ICH E3 Q&As (R1). This variable is now officially part of the DV domain as DVCLASI. More details on planned new and deprecated variables in all Observational Classes can be found in the CDISC Wiki.

Rumors about deprecating the PP domain appear to be unfounded, as PP is still there.

 

Want to know more?

You can participate in the public review and explore the details yourself. Check here.

My former colleague Varun Debbeti has also done an excellent job on his clinstandards webpage.

A more in-depth discussion of the expected changes will also be presented at the upcoming CDISC EU Interchange in May, this time in my hometown of Milan and co-chaired by my colleague Silvia Faini.

Cytel will be present with two oral presentations and one poster:

  • “It Got Worse Than Expected: Three Years of Retrospective CBER Requests on SDTM, ADaM, and TFLs” by Mark Malayas and Angelo Tinazzi
  • “Authenticity Matters: Preserving Standards Integrity from Clinical Data Models to Tiramisù” by Angelo Tinazzi
  • “JSON and CORE Unlocking Adoption” by Silvia Faini, Sebastià Barceló, Hugo Signol, and Angelo Tinazzi

See the full draft agenda here.

We look forward to reconnecting with colleagues from around the world, meeting new peers, and exchanging ideas at the 2026 CDISC + TMF EU Interchange.

See you in Milan?

Parkinson’s Disease Through a Statistical Lens

Parkinson’s disease — a progressive movement disorder of the nervous system — affects more than 1.1 million people in the US (and over 11 million globally), with an estimated 90,000 new diagnoses each year, making it the second-most common neurodegenerative disease after Alzheimer’s disease.1,2 The prevalence and rise of Parkinson’s disease have led to robust investment in understanding and treating this disorder.3

Here, we provide a brief overview of Parkinson’s disease and discuss common endpoints used in clinical trials with an illustrative case study on how those endpoints may be analyzed.

 

An introduction to Parkinson’s disease

Parkinson’s disease is a progressive movement disorder of the nervous system.4 It causes nerve cells (neurons) in parts of the brain to weaken, become damaged, and die, leading to symptoms that include problems with movement, tremor, stiffness, and impaired balance. As symptoms progress, people with Parkinson’s disease (PD) may have difficulty walking, talking, or completing other simple tasks.

The rate of PD progression and the particular symptoms differ among individuals. The four primary/hallmark symptoms of PD are tremor, rigidity, bradykinesia, and postural instability.

 

 

Other problems related to PD may include mental and emotional health problems, speech changes, dementia or other cognitive problems, pain, and fatigue.

 

On and Off states/periods

The On state is when PD medications are effective and motor and non-motor symptoms are controlled. The Off state is when PD symptoms return between medication doses or in the morning before the first dose.

 

Measuring Parkinson’s disease severity: Two evaluation methods

MDS-UPDRS: Evaluating motor and non-motor symptoms

The MDS-UPDRS (Movement Disorder Society–Unified Parkinson’s Disease Rating Scale) was developed to evaluate various aspects of PD, including daily non-motor and motor experiences and motor complications.5, 6

It is the most frequently used outcome in clinical trials, though it can also be employed in the clinical setting. It consists of four parts with 50 items in total, with each item rating the impairment with scores from 0 (normal) to 4 (severe). A patient’s global impairment is calculated as the total sum of these scores, with a higher score indicating greater impairment. Missing values might be imputed by the worst-case value of 4 (severe) if sufficient items are scored, otherwise the total score is set to missing. Each part can be analyzed separately as well.
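
A hedged sketch of that scoring rule follows; the minimum number of completed items required before imputing is study-specific, so the threshold shown is purely illustrative:

```r
# Hedged sketch of the MDS-UPDRS total-score rule described above.
# The 45-of-50 completeness threshold is illustrative, not a defined standard.
score_mds_updrs <- function(items, min_complete = 45) {
  n_missing <- sum(is.na(items))
  if (length(items) - n_missing < min_complete) return(NA_real_)
  items[is.na(items)] <- 4            # impute missing items with the worst-case score
  sum(items)
}

# Example: one subject's 50 item scores with two missing items
set.seed(1)
items <- sample(0:4, 50, replace = TRUE)
items[c(3, 17)] <- NA
score_mds_updrs(items)
```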

 

MDS-UPDRS:

Parts of the MDS-UPDRS can be assessed during the ON and OFF state to evaluate the differences between those two states.

 

PDQ-39: A patient-reported health status questionnaire

The PDQ-39 (Parkinson’s Disease Questionnaire) is a 39-item patient-reported measure that assesses Parkinson’s disease–specific health-related quality of life.7, 8

It requires the patient to grade how often he/she experienced difficulties over the past month. Each item is scored on a scale from 0 (never) to 4 (always or cannot do at all, if applicable), with lower scores indicating better status. Items are grouped into eight dimension subscales.

 

PDQ-39:

PDQ-39 subscale scores range from 0 to 100, with 0 representing perfect health for the dimension and 100 representing worst health for the dimension. A PDQ-39 total score — the PDQ-39 Summary Index (PDSI) — can be computed as the mean of the eight PDQ-39 subscale scores providing an overall score reflecting the impact of Parkinson’s on quality of life.

In case of missing values, a possible approach is to impute them with the mean of the available subscale items, provided that fewer than 50% of the items within the subscale are missing.
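
A minimal sketch of that subscale rule, assuming the standard 0-100 rescaling formula and hypothetical item values:

```r
# Hedged sketch of the PDQ-39 subscale rule described above: impute missing items
# with the mean of available items when fewer than 50% are missing, then express
# the subscale as a 0 (best) to 100 (worst) score.
score_pdq39_subscale <- function(items) {
  if (mean(is.na(items)) >= 0.5) return(NA_real_)
  items[is.na(items)] <- mean(items, na.rm = TRUE)   # mean imputation within subscale
  100 * sum(items) / (4 * length(items))             # rescale to 0-100
}

# Example: a subscale with one missing item
score_pdq39_subscale(c(2, 3, NA, 1, 0, 2))
```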

 

LED (Levodopa Equivalent Dose)

The dose of antiparkinsonian medication is standardized to the LED in mg based on predefined conversion rates.

 

A confirmatory Parkinson’s study: Statistical analysis and adaptive design

Our team partnered with a large biotech and biomedical engineering company to conduct the statistical analysis of a multi-center, open-label (one-arm) adaptive confirmatory study that used a device providing deep brain stimulation for Parkinson’s patients. The efficacy and futility boundaries of the adaptive design were computed using Cytel’s East Horizon™ platform.

The study had the following endpoints:

  • Primary endpoint: MDS-UPDRS (part III)
  • Secondary and exploratory endpoints: Other parts of MDS-UPDRS, PDQ-39, Clinical Global Impression of Change (CGI), Schwab and England ADL (Activities of Daily Living), antiparkinsonian medication use

 

Statistical analysis and its challenges

MDS-UPDRS (part III) score, PDQ-39, and antiparkinsonian medication use were analyzed using the paired t-test, and CGI was analyzed using the non-parametric Wilcoxon signed-rank test. The Schwab and England ADL scale was analyzed with an ANOVA.

The first challenge was to understand the differences between the Off and On states. We also had to deal with missing data. It was decided that missing values at the visit level would be imputed by the worst response observed among all participants (primary analysis), with sensitivity analyses employing baseline observation carried forward (BOCF) and multiple imputation (MI) using Markov chain Monte Carlo (MCMC) methods.
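
As a minimal sketch of the main comparisons (hypothetical variable names, omitting the imputation steps described above):

```r
# Hedged sketch: paired t-test for MDS-UPDRS part III change from baseline and
# Wilcoxon signed-rank test for CGI. df is a hypothetical analysis data frame.
t.test(df$updrs3_followup, df$updrs3_baseline, paired = TRUE)

wilcox.test(df$cgi_followup, df$cgi_baseline, paired = TRUE,
            exact = FALSE)   # normal approximation when ties are present
```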

Another more challenging aspect was understanding and programming the antiparkinsonian medication use (analyzed as secondary endpoint), which is calculated in LED. For this task, a close collaboration with the sponsor’s medical experts was needed to define the conversion factors and handle correctly special cases of medication combinations.

 

An adaptive design with four interim analyses

The study was designed to include four interim analyses and one final analysis, using the Lan-DeMets group sequential method with the O’Brien-Fleming α-spending function and the Pocock β-spending function. The O’Brien-Fleming boundaries preserve a nominal significance level at the final analysis that is close to that of a single-test procedure, so they are very conservative at the earlier interim analyses.9 The Pocock β-spending function uses approximately equal cutoffs for each analysis.

The efficacy and futility boundaries were computed via Cytel’s EAST software, which is integrated into the East Horizon™ platform. For the interim analyses, the efficacy and futility boundaries had to be recalculated based on the actual sample sizes.
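
For illustration, the same spending-function choices can be reproduced with the open-source gsDesign package (the study itself used Cytel’s EAST within East Horizon™; the error rates below are illustrative, not the study’s actual boundaries):

```r
# Hedged sketch: a five-look group sequential design with Lan-DeMets
# O'Brien-Fleming alpha-spending and Pocock-type beta-spending.
library(gsDesign)

design <- gsDesign(
  k         = 5,            # four interim analyses plus the final analysis
  test.type = 4,            # efficacy bound plus non-binding futility bound
  alpha     = 0.025,        # illustrative one-sided type I error
  beta      = 0.10,         # illustrative type II error
  sfu       = sfLDOF,       # Lan-DeMets O'Brien-Fleming alpha-spending
  sfl       = sfLDPocock    # Lan-DeMets Pocock beta-spending
)

design                      # prints efficacy/futility boundaries by look
```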

 

Final takeaways

Parkinson’s disease is a lifelong, progressive, degenerative, multi-symptom disease that affects millions worldwide. Treatment is highly individualized and depends on the disease stage and the severity of motor and non-motor symptoms. When symptoms become bothersome, current therapies primarily focus on symptom management, with pharmacological options such as levodopa and dopamine agonists forming the cornerstone of care. For those whose symptoms no longer respond well to medication in later stages, advanced options like deep brain stimulation (DBS), which can provide relief from tremors and reduce dyskinesias, offer hope.

The adaptive design of the case study offered a flexible, efficient, and ethical approach without compromising the validity and integrity of the study. It was implemented in the East Horizon™ platform, which offers a comprehensive tool for trial design across all stages of development.

Clinical Data Management’s Next Evolution: From Data Stewardship to Data Intelligence

Clinical Data Management (CDM) is undergoing a fundamental transformation. What was once primarily a function focused on data collection, validation, and cleaning is now emerging as a strategic, technology-driven discipline at the heart of modern clinical research.

Today’s trials generate unprecedented volumes of complex data. A recent Tufts Center for the Study of Drug Development survey found a 7x increase in data points and 4x increase in data sources. Here at Cytel we have seen studies with over 20 data sources. Beyond traditional electronic data capture (EDC), clinical studies increasingly incorporate electronic health records (EHRs), wearable devices, mobile applications, genomics, imaging, and real-world evidence (RWE). While these data sources create enormous potential for deeper insight, they also introduce new challenges that conventional CDM approaches were never designed to handle.

To unlock the value of this expanding data universe, clinical organizations must rethink not only their tools, but also their talent, workflows, and mindset.

 

The rise of new roles in clinical data management

This evolution has created demand for new, specialized roles that bridge clinical knowledge, data science, and technology:

 

Clinical Data Scientist (CDS)

Clinical Data Scientists focus on extracting insight from complex medical data. They apply advanced analytics, visualization, and domain expertise to uncover trends, assess data quality risks, and support clinical and operational decision-making.

 

Clinical Data Engineer (CDE)

Clinical Data Engineers design and maintain the data infrastructure that makes modern analytics possible. They build robust, compliant data pipelines, integrate diverse data sources, and ensure data is reliable, traceable, and analysis-ready across the clinical trial ecosystem.

 

Together, these roles move CDM beyond data stewardship toward true data enablement.

 

The expanding complexity of clinical data

Modern clinical trials are no longer linear or siloed. Data flows continuously from multiple sources, often in near real time, and in formats that vary widely in structure, granularity, and reliability. Managing this complexity requires more than rule-based checks and manual reviews. Organizations need scalable data architecture, advanced analytics, and intelligent monitoring approaches that can adapt as data volume, velocity, and variety increase. This shift marks a move away from reactive data cleaning toward proactive data intelligence.

 

Why data visualization matters more than ever

As data points multiply, traditional listings and static reports quickly become unmanageable. Data visualization is no longer a “nice to have”; it is essential. Advanced visual analytics enable clinical teams to identify patterns, compare data across sites, and detect emerging issues early, before they compromise data quality or timelines. By transforming complex datasets into intuitive visual insights, teams can move faster, ask better questions, and focus attention where it matters most.

 

Figure 1: Early Detection of Data Quality Risks through Data Visualization Use Case

Systemic audit trail analysis and regulatory expectations

Regulatory expectations are also evolving alongside data complexity. The 2023 EMA guidance places increased emphasis on audit trail review, signaling a shift from point-in-time checks to systemic analysis. Manual audit trail reviews are no longer sufficient at scale. Instead, sponsors and CROs must adopt analytical approaches that continuously monitor audit trail activity and identify unusual patterns. This supports site fraud detection, risk-based quality management, and inspection readiness. Analytics-driven audit trail review not only improves compliance, it also strengthens overall data integrity and operational oversight. In short, audit trail data needs to be treated similarly to clinical data. In 2025, Cytel was made aware of multiple sponsors being asked by regulatory authorities to provide evidence of a systematic review of their audit trail data.
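
As a generic illustration of such pattern analysis (not a description of any specific product; column names, the working-hours definition, and the flagging threshold are hypothetical), one simple check flags sites with an unusually high share of data entry outside local working hours:

```r
# Hedged sketch: screening audit-trail entries for off-hours data entry by site.
# Assumes entry_datetime_local is already a POSIXct timestamp in local time.
library(dplyr)
library(lubridate)

off_hours_by_site <- audit_trail |>                   # one row per audit-trail entry
  mutate(
    entry_hour = hour(entry_datetime_local),
    off_hours  = entry_hour < 7 | entry_hour > 19
  ) |>
  group_by(SITEID) |>
  summarise(n_entries = n(), pct_off_hours = mean(off_hours)) |>
  mutate(flagged = pct_off_hours > 0.20)              # illustrative threshold
```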

 

Figure 2: Systemic Audit Trail Analysis Use Case

From comprehensive reviews to trend and outlier detection

In a world of big data, reviewing everything is neither practical nor effective. The future of data cleaning lies in intelligent prioritization. By leveraging statistical methods and trend analysis, CDMs can shift from exhaustive data review to targeted investigation, focusing on outliers, inconsistencies, and meaningful deviations. This reduces manual effort while improving data quality outcomes, aligning with risk-based monitoring principles, and enabling faster, more confident decision-making throughout the trial lifecycle. It is accomplished by statistically analyzing data variability, much as statistics are used to evaluate safety and efficacy, and by assigning risk levels to the various checks that are performed. An overall risk level is also created, and targeted data checks are performed based on that analysis.

 

Figure 3: Risk-Based Data Cleaning Use Case

Building insight-ready clinical data ecosystems

The future of clinical data management is not defined by a single tool or technology, but by an ecosystem: one that combines modern platforms, advanced analytics, and specialized talent.

Organizations that invest in insight-ready data architectures and deploy the right expertise will be better positioned to improve data quality, accelerate timelines, and generate deeper insights from increasingly complex datasets. As clinical research continues to evolve, CDM’s role is expanding from managing data to unlocking its full strategic value.

 

Interested in learning more?

William Baker and Jenn Sustin will be hosting the webinar “Enabling the Shift to Clinical Data Science and Engineering for Modern Trials” on February 18 at 10 am ET:

Looking Ahead to 2026 and Beyond: Views, News, and PHUSE

At the outset, a disclaimer. This piece is potentially “old hat” for you, as it comes from someone who has retired from executive/managerial roles. But wait! One cannot ever retire from observing, admiring, and, therefore, learning. “With all thy getting, get understanding” — a biblical verse inscribed in a Cytel founder’s office — is etched in my mind; hence the insatiable quest to keep absorbing.

What’s in store in the year ahead and beyond? A few things come to my mind:

 

AI and even more AI

I know, I know. You have probably had an overdose on readings about AI. Still, my two cents in short bullets.

  • You gotta learn to use AI seriously. Like it or not. So, you better like it.
  • You don’t need to become an AI expert, just a skilled user.
  • Examine your job description. Anything routine/mechanical is going to evaporate with AI magic. So, amplify your focus on innovating, creating, and original thinking.
  • Don’t trust AI blindly. Find smart ways to validate what it churns out.

While AI usage is still in a nascent stage, early adopters of smart prompt engineering and dependable validation will be at a great advantage for future opportunities.

Here at Cytel we have access to a first-rate suite of AI tools. Judicious and ingenious use paves excellent career growth pathways. Go get started!

 

Domain knowledge shall reign supreme

Through my 28 years at Cytel, every occasion of learning something new about drug development brought me new opportunities. Whether it be a complex therapeutic area, or how adaptive designs are crafted, or how drug delivery works, or how DMC functions — a little bit of enlightenment went a long way in delivering greater value to a client. Regardless of one’s specialization (the “horizontal”), the domain “vertical” opens doors to career growth. I see that becoming even more prominent going forward.  For example, real-world data (RWD) is helping accelerate and enhance drug development, and I have seen young statisticians get excellent opportunities based on their deepening understanding of RWD.

 

Jack of all trades

I have been a firm believer in broader knowledge (not just deeper) working wonders. Occasionally, when I was pushed into supporting business development (e.g., crafting RFP responses, or making a pre-sales demo and presentation), the value of knowing a little bit of everything dawned bright and clear. This year and beyond, I feel sure versatility will be a big virtue — for value delivery to the client and, therefore, to one’s own career.

 

GCCs (Global Capability Centers) gain traction

Knowledge-focused companies like Cytel are ideally suited to become skilled competency centers serving global sponsors. The three-decade-old idea of SDFs in the Software Industry is reincarnating now through the concept of GCCs in our domain. Deep scientific knowledge, when combined with deep understanding of a specific sponsor’s processes and specialties, is invaluable. “Outsourcing” began with simple cost saving as the core proposition. That has rapidly matured toward 1) tapping large talent pools; 2) innovation and intellectual property creation; and 3) specialized CoEs (Centers of Excellence). In 2026 and beyond, I foresee GCCs becoming knowledge powerhouses. And I foresee global biopharma continuing to welcome specialist service providers to host the GCCs, in addition to their own DIY versions.

 

PHUSE APAC Connect

From expressing the news and my views, let me now move on to PHUSE. This global Healthcare Data Science Community, over the past two decades, initially held annual conferences all across Europe. It then spread its wings to the US with the CSS (Computational Sciences Symposium), partnering with the US FDA, and then to the “US Connect” annual conferences.

It is now making a grand debut in the Asia Pacific Region. The first ever “APAC Connect” of PHUSE is scheduled from February 19–21 in Hyderabad, India. PHUSE has a large following in the APAC region with over 10,000 members spread across India, China, Japan, Singapore, Malaysia, Australia, and several other countries.

What’s more, this event will include the India CDISC Day 2026!!!

 

This event will address a few major themes.  

  • GCCs in the APAC region. This region has the unique advantage of a huge talent pool and is moving up from cost efficiency to innovation hubs and CoEs.
  • Impact of AI. How AI will reshape careers and leadership in drug development. This topic will figure across panel discussions, presentations, and the leadership stream.
  • There will also be a panel discussion on upcoming innovations in drug development that are going to be potential game-changers.

If you are attending the event, use the PHUSE app to curate your personalized agenda and schedule, choosing among the multiple parallel streams.

 

Cytel has always been a big participant at PHUSE events. Consider these snippets:

  • Several first-time Cytel presenters have won best presentation prizes
  • We have been exhibitors and sponsors at many PHUSE events
  • A few folks, like Angelo Tinazzi from our Geneva office, are celebrated contributors to a number of PHUSE initiatives. Angelo authored the much-acclaimed eBook The Good Data Submission Doctor on Data Submission and Data Integration to the FDA.
  • A Cytelian, having served as a PHUSE Board Member, and being instrumental in bringing PHUSE to Asia, has been invited to chair the Inaugural APAC Connect. Guess who that is!😊
  • Two more Cytelians, Pratibha Jalui and Sudipta Basu, are serving as Stream Co-Chairs.
  • Angelo will be the EU Connect Chair later this year (he served as the Co-Chair last year) in Glasgow, Scotland.
  • This is the first time ever that Cytelians have been chosen for this privilege.
  • At the PHUSE APAC Connect, we have lots of Cytel presenters: Corey Dunham, Pratibha Jalui, Diganta Bose, Aboli Katdare, Charles Warne, Pradip Maske, Chandan Patel Malyala, and Anoop Rawat. We will also have an exhibit booth (#4) with Mansha Sachdev representing our marketing team.

 

Personally, PHUSE has been a booster rocket for my professional career. It brought numerous opportunities of engaging with three significant audiences:

  • Industry peers, exchanging ideas and co-driving initiatives
  • Prospects among big pharma and biotech, several later became clients
  • A talent pool of bright young professionals, some of whom joined Cytel to enhance our ever-growing brainpower

 

The APAC Connect 2026 has a rich 2.5-day agenda that spans across keynote speeches, panel discussions, presentations, hands-on workshops, software demonstrations, a poster session, and a couple of networking events.

 

The bottom line

We at Cytel have an exemplary track record of bringing rigorous data science to the service of human health outcomes. That’s our raison d’être!

Together, let’s take that forward in 2026 and beyond!

 

Meet with us!

Will you be attending PHUSE APAC Connect in Hyderabad, India, this February? Stop by Booth 4 to get to know our experts and learn how Cytel is shaping the future of data‑driven drug development, or click below to book a meeting to discuss career opportunities at Cytel:

Evaluating Safety and Efficacy in Phase III Alzheimer’s Disease Trial: Endpoints and Statistical Analysis Methods

In clinical trials studying Alzheimer’s disease — a complex neurodegenerative condition that gradually impairs cognitive functions — cognitive performance and functional abilities are often assessed together. Understanding these dimensions and how they’re measured in clinical trials is essential in shaping Cytel’s statistical analyses.

Here, we discuss our experience working with a sponsor on a Phase III clinical trial evaluating the safety and efficacy of monotherapy in patients with Alzheimer’s disease and the statistical model we used to analyze the repeated measurements on two co-primary endpoints.

 

Alzheimer’s disease

Alzheimer’s disease is a complex neurodegenerative condition that gradually impairs cognitive functions. Its onset and progression are influenced by a range of risk factors, and some of the most well-established include age, gender, family history, genetic predisposition, and underlying health conditions.

The disease unfolds in distinct stages, each reflecting a different level of cognitive and functional decline. These stages range from mild cognitive impairment to severe dementia, with symptoms worsening as the disease advances.

 

Evaluation of Alzheimer’s disease in clinical trials

In clinical trials, the severity of impairment is evaluated using various scales, each addressing distinct aspects of cognitive and functional decline. The most effective approach combines both cognitive and functional assessments, as functional abilities are closely tied to cognitive performance.

Understanding these dimensions and how they’re measured in clinical trials is essential in shaping the statistical analyses used. Multiple discussions between stakeholders and the sponsor need to take place to reach a consensus on the appropriate endpoints and statistical methods to be used for the analyses.

 

Investigating safety and efficacy of monotherapy in patients with Alzheimer’s disease

We recently collaborated with a small biotech company specializing in Alzheimer’s research on a Phase III clinical trial investigating the safety and efficacy of monotherapy in participants with Alzheimer’s disease, followed by a 12-month open-label treatment. This study has been the subject of complementary analyses exploring biomarkers (p-tau181 and p-tau217) and additional comparative effectiveness analyses with external control arms.

 

Two primary endpoints: ADAS-Cog11 and ADCS-ADL23

To evaluate treatment efficacy in the Phase III trial, we focused on two co-primary endpoints: the ADAS-Cog11 and the ADCS-ADL23, measured at multiple timepoints throughout the study.

 

ADAS-Cog11: The cognitive assessment

The ADAS-Cog11 is a cognitive subscale that assesses key domains such as memory, praxis, orientation, and language. Scores range from 0 to 70, with higher scores indicating greater cognitive impairment. A more refined version of the ADAS-Cog11, known as the ADAS-Cog13, includes two additional items that assess memory and attention. This new version provides additional sensitivity to change in cognition at earlier stages of AD.

For the primary analysis, ADAS-Cog11 was retained as the primary endpoint. This decision was guided by its use in previous studies evaluating the same investigational product, ensuring consistency and comparability across trials. The added value of the ADAS-Cog13 was also analyzed as an explorative efficacy variable to provide deeper insights into cognitive outcomes.

 

ADCS-ADL23: The functional perspective

The ADCS-ADL23 scale complements the ADAS-Cog11 by providing a functional perspective that reflects the impact of cognitive decline. It evaluates the ability to perform daily living activities, with scores ranging from 0 to 78, where higher scores reflect better functional ability and thus less impairment.

 

Cytel’s approach: Analysis with Mixed Models for Repeated Measures (MMRM)

To analyze the repeated measurements on the co-primary endpoints, we employed Mixed Models for Repeated Measures (MMRM). This approach allows the comparison of cognitive and functional changes over time across treatment arms in a robust and flexible way.

Our models include several key covariates as fixed effects to ensure a well-adjusted analysis: baseline disease severity, as measured by the Mini-Mental State Examination (MMSE), prior use of standard AD treatments, and geographic region. We also adjust for the baseline value of the ADAS-Cog11 or ADCS-ADL23 score to account for between-subject differences at baseline, which improves the precision of the treatment effect estimates and corrects for any imbalances between treatment groups. Finally, we include the treatment group indicator together with its interaction with visit to capture if and how treatment effects evolve over time.
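As an illustration, the sketch below shows how such a model could be specified with the open-source mmrm R package. This is a minimal, hypothetical example rather than the study’s actual code: the dataset (adas_long) and variable names (CHG, BASE, MMSEBL, PRIORTRT, REGION, TRT, AVISIT, USUBJID) are placeholders assumed for this sketch.

```r
# Minimal MMRM sketch using the open-source {mmrm} package.
# All dataset and variable names are hypothetical placeholders.
library(mmrm)

fit_adas <- mmrm(
  CHG ~ BASE + MMSEBL + PRIORTRT + REGION +  # baseline score and key covariates as fixed effects
    TRT * AVISIT +                           # treatment, visit, and their interaction
    us(AVISIT | USUBJID),                    # unstructured covariance across visits within subject
  data = adas_long                           # one record per subject and post-baseline visit
)

summary(fit_adas)  # fixed-effect estimates and covariance parameters
```

An analogous model is fitted for the ADCS-ADL23 change from baseline.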

This method is particularly valuable for several reasons. First, it controls for variables that could influence the observed outcomes, such as known risk factors, allowing the treatment effect to be estimated more accurately. Additionally, mixed-effects models account for both between- and within-subject variability over time, which is especially important in a heterogeneous condition like Alzheimer’s. Finally, one of the key strengths of MMRM is its ability to handle incomplete data: under the assumption that data are missing at random, it accounts for missing values without requiring imputation.

The MMRM framework also supports the generation of individual and group profile graphs over time. These visualizations offer a clear and intuitive way to observe the evolution of the treatment effect. They make it easier to compare trends across groups or subjects and to communicate findings in a straightforward manner, both to scientific audiences and to stakeholders who may not be familiar with the statistical details.
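Continuing the hypothetical example above, group profile curves can be derived from the fitted model by extracting least-squares means per visit and treatment arm with the emmeans package and plotting them with ggplot2. Again, the object and variable names are assumptions carried over from the sketch, not study code.

```r
# Group profile plot: least-squares means by visit and arm with 95% confidence intervals.
# Continues the hypothetical fit_adas object from the sketch above.
library(emmeans)
library(ggplot2)

lsm <- as.data.frame(emmeans(fit_adas, ~ TRT | AVISIT))

ggplot(lsm, aes(x = AVISIT, y = emmean, group = TRT, colour = TRT)) +
  geom_point() +
  geom_line() +
  geom_errorbar(aes(ymin = lower.CL, ymax = upper.CL), width = 0.1) +
  labs(
    x = "Visit",
    y = "LS mean change from baseline",
    colour = "Treatment arm"
  )
```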

 

Final takeaways

Alzheimer’s disease is the most prevalent neurodegenerative disease and remains one of the most complex challenges in clinical research, requiring robust methodologies to capture both cognitive and functional decline over time. Complementary, well-adapted clinical scales are essential tools for assessing disease progression, and advanced statistical methods provide a robust and flexible framework for estimating and interpreting the treatment effect.

By leveraging adaptive models, mixed-effects approaches, and sensitivity analyses, we help sponsors generate reliable insights that drive decision-making in neurodegenerative drug research.

A Preview of Cytel’s Contributions at PHUSE EU 2025

I can’t believe it has already been a year since we wrapped up PHUSE EU Connect 2024, and in two weeks we will be gathering for another exciting PHUSE EU Connect conference, only a few hundred kilometers from Heidelberg, where everything started twenty years ago with the very first PHUSE event. I was one of the couple of hundred lucky attendees back then, and now, twenty years later, I have the great honor of supporting Jennie McGuirk and Jinesh Patel as Conference Co-chair for this year’s edition.

With a promising agenda featuring about 190 presentations, 34 posters, 9 hands-on workshops, 2 panel discussions, and 3 inspiring keynote speakers, this year we are going to the city of Hamburg for the 21st PHUSE EU Connect. The agenda is full of topics looking toward the future, with about 40 talks and posters referring to AI in their titles, and once again open source will be the confirmed leitmotif.

Cytel will make a significant contribution this year, perhaps more than ever, with six presentations, one poster, active participation in both panel discussions, and co-chairing the “Scripts, Macros and Automation” and “People Leadership & Management” streams.

 

Monday topics: Agile code writing, extracting metadata from R OOP functions, and leadership

The week kicks off on Monday with Kamil Foltynski, who will present “Overcoming Challenges in Collaborative Spreadsheet Editing with Shiny, SpreadJS and JSON-Patch” in the Application Development stream at 11:30 am. Kamil will provide a technical deep dive into enabling real-time spreadsheet editing within Shiny applications, using tools such as SpreadJS, and share key lessons learned so far. Following Kamil’s presentation, Eswara Satyanarayana Gunisetti will present “Micro-Decisions, Macro Impact: The Role of Agile Thinking in Every Line of Code” in the “Coding Tips & Tricks” stream at 12 pm (see his recent blog on the topic). Eswara will share how an agile mindset can positively influence the way we write code.

In the same stream, a few hours later at 2 pm, another colleague, Edward Gillian, in collaboration with Sanofi, will present “Risk.assessr: Extracting OOP Function Details,” discussing strategies for extracting metadata from R object-oriented programming (OOP) functions. Prior to Eswara and Edward’s sessions, at 1:30 pm, Kath Wright will moderate the interactive People Leadership & Management session “Invisible Glue: Trust, Influence and The Architecture of Teamwork.” In this live workshop, attendees will engage in practical exercises to learn how to identify barriers to trust, evaluate influence dynamics, and apply evidence-based strategies to strengthen collaboration in both physical and virtual environments.

 

Tuesday topics: Industry trends, extracting macro usage and dependency information from SAS programs, and integrating ECA data into CDISC-compliant datasets

Tuesday brings a panel discussion, two presentations, and one poster. Right after lunch at 1:30 pm, Cedric Marchand will join other industry leaders in the panel discussion “Reimagining Statistical Programming: AI, Standards & the Talent of Tomorrow.” The panel will explore how current industry trends, such as AI, open source, and the evolution of data standards, will influence the next generation of statistical programmers.

The afternoon continues at 4 pm with my young and talented colleague Marie Poupelin, who will present “From Zero to Programming Hero: How Internships Shape Statistical Programmers in a CRO” in the “Professional Development” stream. Marie is a great example of the success of our internship program, and she will share her journey from having “zero” statistical programming experience to becoming an industry-ready programmer. Thirty minutes later, at 4:30 pm, Guido Wendland will present “Which Macros Are Used in the Study?” in the “Scripts, Macros and Automation” stream, a stream co-led this year for the first time by my colleague Sebastià Barceló. Guido will discuss techniques to extract macro usage and dependency information from SAS programs; this is particularly useful for identifying potential issues or estimating the impact of macro updates.

Later, in the traditional Tuesday evening poster session, you can join my colleague Cyril Sombrin to discuss “Our Journey in Integrating External Control Arms (ECAs) and RWD for Rare Disease Trials.” The poster presents real-world case studies on integrating ECA data into CDISC-compliant datasets, exploring the unique challenges and solutions involved in aligning real-world data with CDISC standards.

 

Wednesday topics: From XPT to Dataset-JSON, real-time validation, and streamlined submissions

On Wednesday at 12 pm, Hugo Signol, another young and talented Cytel statistical programmer and a product of our internship program, will present his talk “From XPT to Dataset-JSON: Enabling Real-Time Validation and Streamlined Submissions.” Building on Cytel’s experience from the CDISC Dataset-JSON-Viewer Hackathon, Hugo will demonstrate a Shiny application that supports interactive exploration and real-time validation through API-based checks.

 

Meet us there!

Cytel will be at Booth 9 at the conference, where you can engage in discussions with our team or meet any of us throughout the week.

I hope I didn’t miss anyone, or anything! We look forward to once again reuniting with colleagues and friends from around the world and making new acquaintances.

See you all in Hamburg!