Transforming Pharma Data: A Practical Guide to Creating a FAIR Organization

How often have you found yourself faced with a file containing experimental results that you might have even created yourself? At the time, it made perfect sense—you analyzed it, used it, and then “stored it somewhere… because, you know, you might need it one day.” But months or even years later, when you open it up for an important paper or report, you find yourself staring at something like this:

Table -A- "Dummy data" illustrating an assay result without any useful header, metadata, or structure
“What the hell is this thing?” you wonder.

Without any description, this file is essentially a table full of numbers, which, without context or explanation, make no sense at all. Raise your hand if this has ever happened to you! As an experimental biologist, I can tell you that early in my career, it happened more often than not. And while, in the past, this was simply frustrating because you couldn’t reuse something you’d worked hard to produce, today, the stakes are even higher.

We’re in the age of AI, where data fuels innovation, and it’s not just your data—AI feeds on data from countless sources. For AI to work effectively, we need a system that lets us look at data and immediately understand what it’s about, how it was produced, who produced it, and whether it meets quality standards.

This article is about the importance of making data reusable and the steps you can take to ensure that your data, and that of your organization, can support the future of AI-driven discovery.

Introducing the FAIR Data Principles in Pharma ...

In today’s world, pharmaceutical companies are producing several exabytes of data annually. From biological assays to chemical structures and patient studies, the amount of information being generated is staggering. But there’s a critical problem: much of this data is inaccessible, disconnected, and impossible to reuse. If data remains hidden in unstructured formats or isolated in silos, the opportunity to leverage it for innovation is lost. In a world where AI and machine learning are revolutionizing drug discovery, having well-organized, reusable data is essential. This is where FAIR data principles—Findable, Accessible, Interoperable, and Reusable—come in.

Yet, many pharmaceutical organizations find themselves with datasets that are anything but FAIR. In this article, I’ll illustrate the challenge with a real-world example, break down the current state of data management, and provide a practical roadmap to make your organization FAIR-compliant.

The Current Challenge: An All-Too-Familiar Story

Let’s imagine a typical scenario in Pharma R&D. You’re exploring data from a recent enzyme inhibition assay to understand a new compound’s potency, but all you have is an Excel file with rows and columns of numbers, devoid of headers or context (see Table-A- above).

No headers, no units, no context. Without metadata (data that describes other data), it’s impossible to know if these numbers represent IC50 values, temperatures, compound IDs, or assay results. Each row is an isolated data point, disconnected from the bigger picture, leaving users to guess or retrace experimental steps.

While the data is still fresh, you might remember what the different rows and columns correspond to. But look at the same file a few months later, or hand it to a colleague ... Different story!

The Solution: Making Data FAIR

Published in Scientific Data (a Nature Portfolio journal) in March 2016, "The FAIR Guiding Principles for scientific data management and stewardship" explained what FAIR is and why it matters, but it was more a philosophy than a cookbook.

The FAIR data principles, as defined by Wilkinson et al., address a growing challenge in scientific data management: the need for data that is easily accessible, understandable, and reusable—not just for researchers but also for machines. Traditionally, datasets are often stored without adequate descriptions or standardized formats, making them difficult to locate and interpret. This is particularly problematic in today’s data-rich environment, where maximizing the impact of scientific research depends on effective data reuse.

The FAIR principles provide a framework to overcome these challenges by ensuring that data is:

  • Findable: Data should have a unique identifier and be well-documented, so it can be easily located in searchable resources.
  • Accessible: Data and its metadata should be retrievable using standard communication protocols, ensuring availability over time.
  • Interoperable: Data should use standardized vocabularies and formats to enable integration with other datasets.
  • Reusable: Data should be richly annotated with clear provenance and usage licenses, supporting future research.

These principles serve as a foundation for sustainable data stewardship, making data not just a byproduct of research but a reusable asset that can drive innovation in the age of AI and machine learning.

FAIR data example: in this example, it doesn't matter if you or a colleague produced the data, there is enough context to understand what's going on today ... and also if you're looking at this file several years from now.
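
To make this concrete, here is a minimal sketch of how the same kind of assay table could carry its own context, assuming a Python workflow. Every field name, identifier, and value below is a hypothetical placeholder chosen for illustration; it is not a standard schema and not real data.

```python
import json

# Hypothetical example: all identifiers, names, and values are placeholders.
fair_record = {
    "dataset_id": "assay-2024-000123",   # unique identifier (Findable)
    "title": "Enzyme inhibition assay, compound series X",
    "created_by": "j.doe",               # provenance (Reusable)
    "created_on": "2024-03-15",
    "instrument_id": "plate-reader-07",
    "license": "internal-use-only",      # usage terms (Reusable)
    "columns": [
        {"name": "compound_id", "description": "Internal compound identifier", "unit": None},
        {"name": "ic50", "description": "Half-maximal inhibitory concentration", "unit": "nM"},
        {"name": "temperature", "description": "Assay temperature", "unit": "degC"},
    ],
    "rows": [
        ["CMP-0001", 12.5, 37.0],
        ["CMP-0002", 340.0, 37.0],
    ],
}


def describe(record: dict) -> list[str]:
    """Return one human-readable line per column, including its unit if any."""
    lines = []
    for col in record["columns"]:
        unit = f" [{col['unit']}]" if col["unit"] else ""
        lines.append(f"{col['name']}{unit}: {col['description']}")
    return lines


if __name__ == "__main__":
    print(json.dumps(fair_record, indent=2))  # machine-readable form
    print("\n".join(describe(fair_record)))   # human-readable summary
```

The numbers are the same kind of numbers as in Table -A-, but each column now states what it measures and in which unit, and the record says who produced it, when, and on which instrument.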

Practical Steps for Implementing FAIR Data in Your Organization: A Cookbook, not a Philosophy Manual 🤔

Implementing FAIR data principles in your organization doesn’t have to be an abstract, philosophical undertaking. Think of it like following a cookbook: each step should be practical, clear, and achievable to guide your team through the process of making data FAIR—Findable, Accessible, Interoperable, and Reusable. By focusing on concrete actions, you can turn FAIR data from an intimidating concept into a straightforward, step-by-step plan that integrates seamlessly with daily workflows. Here’s how:

  1. Automate Data Capture and Metadata Entry: Integrate automated data capture into your instruments, LIMS (Laboratory Information Management Systems), or ELNs (Electronic Lab Notebooks). This automation can record essential metadata—such as date, time, and instrument ID—without relying on manual entry, reducing human error and ensuring consistency.
  2. Pre-defined Templates and Standard Fields: Leverage pre-configured templates within your LIMS or ELNs to standardize data entry. Set up fields with auto-complete options for common values, like temperature units, to ensure consistency while minimizing extra effort. This simple step can enhance the uniformity and reliability of your data.
  3. Centralized Repository and Unique Identifiers: Create a centralized database that assigns unique identifiers for each dataset. Link metadata fields to external standards (e.g., PubChem, UniProt IDs) to make your data more findable and interoperable, facilitating connections across different datasets and resources.
  4. FAIR Compliance Dashboard: Add a dashboard within your LIMS to track and provide instant feedback on data completeness and FAIR compliance. This feature can alert users to any missing metadata, allowing them to correct issues in real time and ensure that all datasets meet FAIR standards.
  5. Regular Training and Recognition: Offer regular training sessions to keep your team informed about FAIR principles and their importance. Recognize and reward teams that excel in producing FAIR-compliant data to foster a culture of data quality and accountability. Making the “why” behind FAIR clear helps motivate staff and integrate FAIR principles into everyday practices.
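
Steps 1 to 3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the metadata fields, and the list of allowed units are hypothetical, and a real setup would hook into your instrument software, LIMS, or ELN rather than a standalone script.

```python
import uuid
from datetime import datetime, timezone

# Hypothetical controlled vocabulary, standing in for a template's auto-complete list.
ALLOWED_TEMPERATURE_UNITS = {"degC", "K"}


def capture_metadata(instrument_id: str, operator: str, temperature_unit: str) -> dict:
    """Build a metadata record, generating the ID and timestamp automatically."""
    # Step 2: reject values that are not part of the pre-defined template.
    if temperature_unit not in ALLOWED_TEMPERATURE_UNITS:
        raise ValueError(f"Unknown temperature unit: {temperature_unit!r}")
    return {
        # Step 3: a unique identifier assigned at creation time.
        "dataset_id": str(uuid.uuid4()),
        # Step 1: the timestamp is recorded by the system, not typed by hand.
        "captured_at": datetime.now(timezone.utc).isoformat(),
        "instrument_id": instrument_id,
        "operator": operator,
        "temperature_unit": temperature_unit,
    }


if __name__ == "__main__":
    print(capture_metadata("plate-reader-07", "j.doe", "degC"))
```

A random UUID is used here only because it needs no infrastructure. An organization with a central repository would more likely have the repository issue the identifier.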

These steps transform the FAIR principles from theory into practice, making FAIR data achievable in any organization.
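
The dashboard idea from step 4 boils down to a completeness check. The sketch below assumes a hypothetical list of required metadata fields. Metadata completeness is only a rough proxy, not a full FAIR assessment, but it is enough to flag a file like Table -A- before it gets archived.

```python
# Hypothetical list of fields every dataset must carry; adapt it to your own template.
REQUIRED_METADATA = (
    "dataset_id",
    "created_by",
    "created_on",
    "instrument_id",
    "units",
    "license",
)


def missing_fields(metadata: dict) -> list[str]:
    """Return the required fields that are absent or empty."""
    return [field for field in REQUIRED_METADATA if not metadata.get(field)]


def completeness(metadata: dict) -> float:
    """Fraction of required fields that are filled in, between 0.0 and 1.0."""
    return 1 - len(missing_fields(metadata)) / len(REQUIRED_METADATA)


if __name__ == "__main__":
    record = {"dataset_id": "assay-2024-000123", "created_by": "j.doe", "units": "nM"}
    print("Missing:", missing_fields(record))
    print(f"Completeness: {completeness(record):.0%}")
```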

Why Data Deserves the Spotlight: The Hidden ROI of a FAIR Organization

Imagine an executive enamored by AI’s transformative potential, captivated by the promise of cutting-edge models and automation. Yet, when it comes to data, they’re less enthusiastic—data isn’t “sexy,” and it’s tough to secure funding for foundational data initiatives. This disconnect is common but problematic. AI “eats data for breakfast,” and without a robust data foundation, even the most sophisticated AI projects are likely to underperform. Organizations that recognize data as a strategic asset, especially when managed under FAIR principles, gain a distinct advantage in the race toward successful AI implementation. The ROI of a FAIR organization is clear and measurable, particularly in the pharmaceutical sector where data-driven insights fuel discovery and innovation.

I’ve worked on projects where, due to the lack of FAIR data, we spent months just preparing data before we could even start using it in meaningful ways. It’s no secret that data scientists and data engineers often spend up to 80% of their time on data preparation instead of actual analysis—just think about that!

  1. Faster AI/ML Integration: FAIR data is clean, standardized, and ready for AI and machine learning models. This dramatically reduces pre-processing time, enabling AI systems to deliver insights faster. In pharma, this translates to accelerated drug discovery and quicker time to market, directly impacting the bottom line.
  2. Increased Efficiency and Reduced Rework: FAIR data allows researchers to quickly locate and reuse information, minimizing redundant work and freeing scientists to focus on innovation instead of sifting through data. This boost in efficiency translates to lower costs and higher productivity.
  3. Better Collaboration and Decision-Making: With accessible and interoperable data, cross-functional teams can more readily collaborate, combining datasets from different departments or sites to gain a comprehensive view. This improved data flow leads to better-informed decisions, enhancing strategic alignment across the organization.
  4. Future-Proofing the Organization: As data needs evolve, FAIR principles offer a foundation that can scale and adapt, keeping your organization agile and ready for technological advancements.

By transforming into a FAIR organization, pharmaceutical companies build a foundation for sustainable growth, unlocking the full value of their data and accelerating discovery. In a data-driven world, FAIR principles empower organizations to harness AI’s potential, improve patient outcomes, and drive competitive advantage.

Calculating the ROI of FAIR ...

The European Commission estimates that not having FAIR research data costs the European economy at least €10.2 billion annually. This underscores the economic impact of poor data management and the potential savings achievable through FAIR data implementation.

To better understand the financial impact of implementing FAIR data principles in a large pharmaceutical organization, we conducted a thought experiment focusing on key areas where FAIR data can drive value. The table below summarizes the estimated ROI of adopting FAIR principles for a large pharmaceutical company over a five-year period, considering factors such as reduced time to market, increased productivity, minimized redundancy, and enhanced collaboration. By breaking down each benefit and associated cost, we aim to quantify the tangible advantages of FAIR data and its potential to transform pharmaceutical R&D and business operations.

Quantifying the ROI of FAIR Data Implementation in Pharmaceutical R&D

Further Notes:

  • Additional Revenue from Reduced Time to Market: By reducing the drug development timeline by 6 months, the company gains an extra 6 months of market exclusivity, leading to $500 million in additional revenue.
  • Increased Productivity: Implementing FAIR data reduces time spent on data preparation from 80% to 20%, freeing up 60% of the data team's capacity, valued at $9 million per year.
  • Reduced Redundancy and Rework: Improved data practices cut redundant research efforts by 50%, saving $5 million annually.
  • Improved Collaboration and Decision-Making: Better data accessibility enhances cross-functional collaboration, conservatively estimated to add $30 million in value over 5 years.
  • Costs: The total cost of implementing FAIR data principles over 5 years is $10 million, including initial and ongoing expenses.
  • ROI: The return on investment over 5 years is calculated to be 5,900%, highlighting the substantial financial benefits compared to the costs.
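
The arithmetic behind these notes can be reproduced in a few lines. One assumption is needed: the two annual figures (productivity and reduced rework) are simply multiplied by five years, with no discounting. Under that assumption the numbers add up to the 5,900% quoted above.

```python
# Figures taken from the notes above, in millions of USD.
YEARS = 5
additional_revenue = 500      # one-off gain from 6 extra months of market exclusivity
productivity_per_year = 9     # annual value of freed-up data team capacity
rework_savings_per_year = 5   # annual savings from reduced redundancy
collaboration_value = 30      # stated as a 5-year total
total_cost = 10               # stated as a 5-year total

# Assumption: annual figures are multiplied by YEARS, without discounting.
total_benefit = (
    additional_revenue
    + productivity_per_year * YEARS
    + rework_savings_per_year * YEARS
    + collaboration_value
)

# ROI = (benefit - cost) / cost, expressed as a percentage.
roi_percent = (total_benefit - total_cost) / total_cost * 100

print(f"Total benefit over {YEARS} years: ${total_benefit}M")
print(f"ROI: {roi_percent:,.0f}%")
```

Note how heavily the result depends on the $500 million time-to-market term: it accounts for most of the $600 million total benefit, so the ROI is very sensitive to that single estimate.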

Implementing FAIR data principles in the pharmaceutical industry has been shown to offer substantial financial benefits. A study published in Data Intelligence highlights that while the initial costs of FAIRification are significant, the long-term advantages, such as enhanced data reusability and potential cost savings, are widely acknowledged within the industry.

Making Data Great Again ... sorry 🙏

In the end, embracing FAIR data principles is more than a commitment to better data management—it’s a strategic decision that positions organizations for success in a data-driven, AI-powered world. By making data findable, accessible, interoperable, and reusable, organizations not only maximize the value of their own work but also contribute to the larger scientific ecosystem, creating a ripple effect of innovation. In the fast-evolving pharmaceutical landscape, where AI and machine learning are reshaping drug discovery, FAIR data is a key differentiator that accelerates R&D, enhances collaboration, and ensures that valuable insights are never lost in translation. Ultimately, a FAIR approach isn’t just good practice; it’s a pathway to sustainable growth and transformative impact in healthcare.

