Metadata Repositories: Overcoming Challenges with Automation
February 23, 2024
Written by Angelo Tinazzi, Nicolas Rouillé, and Sebastià Barceló
In the realm of standards management, companies of all sizes are increasingly exploring the potential of metadata repositories (MDRs). From protocol development to eCRF, SDTM, and ADaM to Analysis Results, these repositories are being used to speed up study setup and delivery. Sponsors leverage metadata using a framework that involves putting together a data governance team that establishes and pilots company standards, defines roles, establishes workflows, develops standard operating procedures, and provides necessary training. This structured framework is supported by selecting or building a metadata repository that aligns with the established infrastructure.
However, unlike sponsors, CROs encounter specific challenges when implementing standards management or metadata use, given that each sponsor has unique processes and not all aspects of a clinical trial are always managed by a single CRO.
Here, we share a new approach to automation to address the challenges inherent in metadata repositories.
Use of metadata: Sponsors vs. CROs
The key distinction between sponsors and CROs is that sponsors have the “autonomy” to deploy their standards framework across their portfolio, on a limited number of indications, potentially integrating studies within a metadata repository while coexisting with off-MDR studies.
On the other hand, CROs that choose to implement MDRs operate within diverse client contexts where they must adapt to each client’s standards and requirements (or lack thereof), managing variability in data governance practices and technical choices. The scope of CRO contracts further adds nuance, as the benefit derived from MDR usage may vary based on the contracted services, ranging from full-service biometrics to limited reporting services. Lastly, in addition to compliance with health authorities’ submission requirements, CROs are required to adapt to the evolving sponsor strategies and technical choices.
Challenges in implementing a metadata repository
Both sponsor organizations and CROs share a common objective of reducing the duration of clinical trials and expediting the time to market for new products. Meeting the increasingly complex requirements of health authorities necessitates the development of technical strategies that align with the expected level of standardization, supported by meticulous documentation highlighting traceability, linkage of data elements, and detailed descriptions.
We also need to consider the adoption of multiple programming languages like R and Python, and the impact this has on data workflows, on process and resource configurations within a company, and on outsourcing engagements between companies.
A new approach to addressing metadata usage challenges
There are numerous MDR solutions available, yet no single dominant leader has emerged. The proliferation of MDR solutions adds complexity to centralizing and managing metadata, with variations in data governance team assembly and standards implementation.
Contemporary MDR solutions tend to emphasize standardization in upstream artifacts like protocol and case report form development, leaving significant limitations in consuming metadata for statistical analysis deliverables such as ADaM; tables, listings, and figures (TLFs); statistical analysis plans (SAPs); and study mock shell documents. Given these constraints, while still looking at the market options and evolving open-source initiatives such as “The Open Study Builder” from the CDISC COSA Initiative, we’ve pursued an alternative approach to harnessing metadata.
Our automation team’s approach involves a strategic division of the data workflow based on the type of deliverables, such as eCRF (CDASH), SDTM, SAP, study mock shells, ADaM, and TLFs. We identify and enrich metadata from each artifact to fully automate the production of a given deliverable. This user-centered strategy breaks down the challenge into manageable components, attributing each artifact to a specific function (e.g., data management, biostatistics, statistical programming). This approach provides subject-matter expertise and directly supports the conceptualization and testing of automation tools. It also accelerates tool development, allowing parallel development of multiple tools and asynchronous release. This helps to significantly speed up the release cadence and accommodates potential failures in tool development without hindering the use of tools created for other upstream or downstream steps of the workflow.
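To make the idea of independent, per-deliverable tools concrete, here is a minimal Python sketch. It is not Cytel's implementation: the class AutomationTool, the function run_workflow, the placeholder generators, and the mapping of deliverables to owning functions are all hypothetical, chosen only to illustrate how one tool that is not yet ready need not block the tools built for other steps.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class AutomationTool:
    """One independently released tool covering a single deliverable (hypothetical)."""
    deliverable: str                            # e.g. "SDTM", "ADaM", "TLF"
    owner: str                                  # owning function; this mapping is an assumption
    generate: Callable[[Dict[str, str]], str]   # turns study metadata into an output


def sdtm_tool(metadata: Dict[str, str]) -> str:
    # Placeholder generator: a real tool would produce mapping specs or programs.
    return f"SDTM mapping spec for study {metadata['study_id']}"


def adam_tool(metadata: Dict[str, str]) -> str:
    # Simulates a tool that is still in development.
    raise NotImplementedError("ADaM tool not released yet")


def tlf_tool(metadata: Dict[str, str]) -> str:
    return f"TLF programs for study {metadata['study_id']}"


TOOLS: List[AutomationTool] = [
    AutomationTool("SDTM", "data management", sdtm_tool),
    AutomationTool("ADaM", "statistical programming", adam_tool),
    AutomationTool("TLF", "statistical programming", tlf_tool),
]


def run_workflow(
    tools: List[AutomationTool], metadata: Dict[str, str]
) -> Dict[str, Tuple[str, str]]:
    """Run every tool on its own; an unavailable tool falls back to manual work."""
    results: Dict[str, Tuple[str, str]] = {}
    for tool in tools:
        try:
            results[tool.deliverable] = ("automated", tool.generate(metadata))
        except NotImplementedError as exc:
            # The failure is recorded for this deliverable only; the loop continues.
            results[tool.deliverable] = ("manual", str(exc))
    return results
```

Calling run_workflow(TOOLS, {"study_id": "ABC-001"}) in this sketch marks SDTM and TLF as automated and ADaM as manual, which mirrors the asynchronous release pattern described above.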
The Cytel PRISM application (presented at PHUSE-EU 2022 and 2023) is one example of how we broke down the challenge into manageable components.1,2 With PRISM, we capture tagged metadata directly from study mock shells developed at Cytel with Lighthouse, another tool that supports Cytel biostatisticians in developing SAP mock shells from a standard library, and then automatically generate TLF programs in either SAS or R. At this stage, PRISM lets us automate about 60% of the outputs needed for a study on average (see applied workflow in Figure 1).
Figure 1: Cytel analysis results standard ARS-driven generation of TLFs
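As a rough illustration of metadata-driven program generation, the Python sketch below renders a small R or SAS program from metadata tagged on a mock shell. It does not reflect PRISM's or Lighthouse's internals: the ShellMetadata fields, the two templates, and the function generate_program are assumptions made for this example, and the generated code is deliberately simplistic.

```python
from dataclasses import asdict, dataclass
from string import Template


@dataclass(frozen=True)
class ShellMetadata:
    """Metadata tagged on one mock shell (hypothetical field names)."""
    output_id: str
    title: str
    dataset: str           # ADaM dataset name, e.g. "adsl"
    population_flag: str   # e.g. "SAFFL"
    row_variable: str
    column_variable: str


# Illustrative templates only; real TLF programs are far richer.
R_TEMPLATE = Template(
    "# $output_id: $title\n"
    "library(dplyr)\n"
    'adam <- haven::read_sas("${dataset}.sas7bdat")\n'
    "result <- adam %>%\n"
    '  filter($population_flag == "Y") %>%\n'
    "  count($column_variable, $row_variable)\n"
)

SAS_TEMPLATE = Template(
    "/* $output_id: $title */\n"
    "proc freq data=adam.${dataset};\n"
    '  where $population_flag = "Y";\n'
    "  tables $row_variable * $column_variable / missing;\n"
    "run;\n"
)

TEMPLATES = {"R": R_TEMPLATE, "SAS": SAS_TEMPLATE}


def generate_program(meta: ShellMetadata, language: str) -> str:
    """Fill the template for the requested language with the shell's metadata."""
    try:
        template = TEMPLATES[language]
    except KeyError:
        raise ValueError(f"Unsupported language: {language!r}") from None
    # substitute() raises KeyError if a placeholder has no matching field.
    return template.substitute(asdict(meta))
```

For example, a shell tagged with dataset "adsl", population flag "SAFFL", row variable "AGEGR1", and column variable "TRT01A" yields an R program that filters on SAFFL and counts AGEGR1 within TRT01A. Shells whose metadata does not fit any template would remain manual work in a setup like this.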

The value of metadata can be maximized in clinical trial delivery by starting with the metadata inherent in study artifacts. This divergent approach accommodates a multitude of sponsor standards and delivery requirements without sacrificing the benefits of automation within an ecosystem of interdependencies between regulatory authorities, industry consortia, sponsors, CROs, and third-party technology vendors. Using this approach, we have streamlined the automation of TLF production.
Angelo Tinazzi
Senior Director, Statistical Programming, Clinical Data Standard & Submission
Angelo Tinazzi is Senior Director, Statistical Programming, Clinical Data Standard & Submission, at Cytel. Angelo is a well-published and recognized expert in statistical programming, with over 25 years’ experience in clinical research. In particular, his core expertise lies in the application of CDISC standards across different therapeutic areas, such as data submission to health authorities like the FDA and PMDA.
As well as being an authorized CDISC instructor, Angelo is a member of the CDISC European Committee, where he also manages the Italian-speaking CDISC User Network. Angelo is also stream chair of PHUSE-EU “Scripts, Macros and Automation.”
Prior to joining Cytel, Angelo worked at Merck Serono, SENDO Foundation, Pharmacia & Upjohn, Simbologica SAS Quality Partner, the UK Medical Research Council, and the Institute for Pharmacological Research “Mario Negri.”
Sebastià Barceló
Associate Director, Statistical Programming
Sebastià Barceló is Associate Director, Statistical Programming, at Cytel in Geneva. He has more than 10 years of experience in the field of clinical research in the areas of data management, biostatistics, and statistical programming with different roles in CROs in Spain and Switzerland. Sebastià currently manages a team working on automation initiatives and tool development using multiple programming languages.