Reimagining Clinical Data Standards
In clinical research, data is our most valuable asset, yet the standards we rely on to manage it—CDISC’s SDTM, ADaM, and CDASH—are outdated and overcomplicated. It’s time to step back and rethink how we structure, store, and utilize clinical trial data while embracing the advantages of newer technologies.
The Burden of Complexity and Redundancy
Current standards are weighed down by redundant and overlapping guidance across clinical domains, therapeutic area guides, and implementation guides. Developed in silos over decades, they form a patchwork of inconsistent rules and processes. Consider Lab, ECG, and Vitals: three datasets that share the same basic normalized structure, yet are treated differently within and across the SDTM IG, SENDIG, ADaM, and various therapeutic area guides. The result is unnecessary nuance layered on inconsistency, creating significant challenges for teams trying to standardize and manage the data efficiently.
Furthermore, constructs such as Supplemental Qualifier datasets, Subject Elements (SE), and EPOCH offer little value downstream, yet teams are forced to invest significant resources into their creation and maintenance. They add complexity without contributing meaningfully to the overall analysis.
The future demands a unified standard—one that spans CDASH, SDTM, and ADaM with consistent rules and variables, eliminating unnecessary complexities and allowing data to flow seamlessly from collection to analysis. Such a standard would make it easier for teams to standardize, analyze, and extract value from the data, while fostering clarity and consistency across clinical trials.
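As a sketch of what such a unified structure could look like, the hypothetical record layout below expresses Lab, ECG, and Vitals findings in one normalized form, so a single rule set can validate all three. The field names and values are illustrative, not a proposed standard.

```python
# Sketch: Lab, ECG, and Vitals records in one normalized structure,
# so one set of rules covers every findings domain. Fields are illustrative.
from dataclasses import dataclass

@dataclass
class Finding:
    subject_id: str
    domain: str        # "LB", "EG", or "VS"
    test: str
    result: float
    unit: str

records = [
    Finding("01-001", "LB", "Hemoglobin", 13.2, "g/dL"),
    Finding("01-001", "EG", "QT Interval", 402.0, "ms"),
    Finding("01-001", "VS", "Systolic Blood Pressure", 121.0, "mmHg"),
]

# One rule applies across all domains: results must be non-negative.
assert all(r.result >= 0 for r in records)
```

With a single structure, a check written once runs identically on lab, ECG, and vitals data, instead of being re-implemented per domain.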
Outdated Technology
The foundation of our current data standards, built on the SAS Transport (XPT) v5 file format and XML, was developed in a time when data processing capabilities were far more limited. Today we work with large, complex datasets, yet we remain tethered to outdated formats that restrict flexibility and scalability. Constraints such as 8-character variable names, 200-character text values, and SAS date formats not recognized by other platforms create unnecessary obstacles to interoperability with today’s technologies.
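To illustrate the burden these legacy limits impose, here is a minimal, hypothetical check of the kind teams must run today before export; the function name is made up, and the thresholds simply mirror the XPT v5 constraints mentioned above.

```python
# Sketch: pre-export checks forced by XPT v5 limits.
# Function name and messages are hypothetical.
def xpt_v5_violations(columns, rows):
    """Flag variable names over 8 characters and text values over 200."""
    issues = []
    for name in columns:
        if len(name) > 8:
            issues.append(f"variable name too long: {name}")
    for row in rows:
        for value in row:
            if isinstance(value, str) and len(value) > 200:
                issues.append("text value exceeds 200 characters")
    return issues

# A descriptive 14-character name is illegal in XPT v5, as is a long comment.
print(xpt_v5_violations(["COLLECTION_DTC"], [("x" * 201,)]))
```

None of this effort serves the science; it exists only to satisfy the container format.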
It’s time to embrace formats like Parquet, a compressed, columnar storage format built for high-volume analysis. This isn’t just about improving efficiency; it’s about equipping the industry to handle the vast scale and complexity of modern clinical trials. Parquet stores massive datasets compactly and serves fast analytical queries, allowing clinical researchers to get far more out of their data.
In parallel, it's time to replace XML-based metadata representations in define.xml with more flexible approaches such as HTML or JSON. These modern formats are better suited to current web standards and ensure compatibility with the tools and platforms that will define the future of clinical data management.
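As a sketch of the idea, the snippet below expresses a slice of dataset metadata as plain JSON; the key names are invented for illustration and do not follow any published schema.

```python
# Sketch: dataset metadata as JSON rather than define.xml.
# Key names are illustrative, not a published metadata schema.
import json

metadata = {
    "dataset": "LB",
    "label": "Laboratory Test Results",
    "variables": [
        {"name": "USUBJID", "label": "Unique Subject Identifier", "type": "string"},
        {"name": "RESULT", "label": "Result in Standard Units", "type": "float"},
    ],
}

doc = json.dumps(metadata, indent=2)
parsed = json.loads(doc)  # any JSON-aware tool or web app can consume this
```

Unlike XML with a custom schema, this is directly usable by web front ends, APIs, and scripting languages without specialized parsers.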
Unlocking the Benefits of Automation and AI
Simplified standards remove inconsistencies and unnecessary complexity, enabling automated processes to operate more efficiently. With uniform rules and streamlined data structures, automation tools can handle data mapping, validation, and transformation seamlessly, reducing manual effort. Additionally, AI algorithms will thrive on clean, well-structured data standards, allowing faster insights, anomaly detection, and predictive analysis. By simplifying standards, we unlock the full potential of automation and AI, driving productivity and accelerating the clinical trial process.
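As a toy example of the kind of mapping step that becomes trivially automatable under uniform rules, the sketch below standardizes collected test names with a single lookup; the mapping values and record fields are hypothetical.

```python
# Sketch: automated mapping from collected test names to standard names.
# The lookup values and record fields are hypothetical.
COLLECTED_TO_STANDARD = {
    "Hgb": "Hemoglobin",
    "SysBP": "Systolic Blood Pressure",
}

def standardize(records):
    """Replace collected test names with standard names where a mapping exists."""
    return [
        {**rec, "test": COLLECTED_TO_STANDARD.get(rec["test"], rec["test"])}
        for rec in records
    ]

raw = [{"subject": "01-001", "test": "Hgb", "result": 13.2}]
standardized = standardize(raw)
```

When every domain follows the same rules, mapping is configuration rather than code, and the same lookup table serves collection, tabulation, and analysis.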
Imagining a Better Future
Change won’t come easily. Organizations and regulatory bodies like CDISC and the FDA are cautious about adopting new standards, even as technology has evolved dramatically. This reluctance stifles innovation and limits the industry’s ability to realize the productivity gains that automation and AI offer. Yet the need for modernization is undeniable. Without it, we remain locked into an outdated framework that cannot support the advanced tools and technologies the future of clinical research requires.
Transformation may not happen overnight, but it begins with vision. It’s time to reimagine clinical data standards as streamlined, unified systems that enhance interoperability and adaptability. Redundant complexities should give way to concise guidance that empowers teams to innovate, automate, and focus on what matters most—improving patient outcomes.