To unleash the power of AI, you need a strong data foundation

While generative AI is taking the world by storm, a more fundamental aspect of data science excites Dr. Joe Mullen even more.

“AI technologies will come and go, but foundational data management is forever,” he says. “Having your data in order buys you the agility to quickly jump on and reap the benefits of the latest innovations — whether it’s around machine learning, LLMs or beyond.” 

(Watch the latest webinar in our Foundations for Effective AI series to learn more about data challenges and opportunities – including breaking down data silos and layering in AI on top of reusable and interoperable data.)

Joe is Director of Data Science & Professional Services at SciBite, a semantic analytics software company acquired by Elsevier in 2020.

“We’re strong believers that data fuels discovery, and we’re always out to apply the latest technology to help accelerate scientific breakthroughs,” he says.

“Of course, it can’t be any old data,” he adds. “It needs to have provenance and hence be well-managed. Only then can you make evidence-based decisions to generate a hypothesis, the bedrock of scientific progress. And the data must be FAIR: Findable, Accessible, Interoperable and Reusable. Then you really have something.”

As an example, Joe points out that SciBite supports R&D in the life sciences in areas such as target prioritization, market surveillance, adverse event detection and drug repositioning:

“Basically, our team helps customers solve their problems by getting the most out of their data. And that’s not only about expediting insight extraction, but also about lowering the barriers to entry so customers can get the most out of what we offer. And while we use the latest machine learning technologies to help make this happen, it’s all based on an understanding that the best digital strategies are built on strong data foundations, and that there’s a lot of data out there waiting to be structured and mined for value.”

Joe says he was always solutions-driven: “I always look at problems and try to work out how best to resolve them. Initially, I was very enthused by biology: understanding how the body works. But a deep appreciation for data analytics was sparked by a small module I took during my biology degree.”

With a PhD in semantic data integration, for which he developed knowledge graphs to drive the identification of new uses for existing drugs, Joe was a perfect candidate for the startup SciBite: “I was hired as number 13,” he recalls. “Now, six years later, we have around 80 people. It’s been very hectic and incredibly rewarding being part of this data science team, a team I am now lucky enough to lead.”

“We’ve always been a software company that allows customers to get the most value out of their data,” Joe says. “And since we were acquired by Elsevier, which has the gold standard in data and data platforms, it’s a pure pleasure to see how our combined efforts provide even better solutions to the problems customers come to us with.

“Elsevier doesn’t just have data, they also have human expertise,” he notes. “And human expertise is not going to reach any sell-by date. I very much align with the expression: ‘AI is not going to replace humans, but humans with AI are going to replace humans without AI.’”

“Obviously, everybody has a lot of data,” Joe says. “Now, in order to understand that data, it takes Subject Matter Experts (SMEs) to sort it out: to build the definitions and standards (the ontologies) so we can recognize different entities within the data, be it a drug, a disease, a protein or a phenotype. We’ve always had a lot of SMEs in the life sciences. And now Elsevier is opening things up for us by also having SMEs in other verticals such as chemistry and engineering. They’re famous for having a lot of these SMEs.

“These are people who understand the importance of building public identifiers based on the FAIR data principles. Yes, technologies can expedite a lot of these tasks, but you need a human in the loop to validate the information.”

The fact that SciBite retains its startup mentality dovetails nicely with the idea of strong foundational data management. “It comes down to the fact that technologies may come and go, but your data is what remains consistent throughout. Good-quality, foundational data management allows you to nimbly pivot and make use of the next state-of-the-art technology when it becomes available.”

Large language models (LLMs) are a case in point. Their most publicized example, ChatGPT, certainly put data science on the map for the general public as an exciting field. However, such generalized solutions simply do not cut it in an industry built on specialized knowledge. And while Joe admits that much of SciBite’s work around organizing data may seem dry to some, it remains fundamental. In fact, once you have your data house in order, things can get exciting fast.

“There are certain areas we can now do rather quickly, and we are even developing more cookie-cutter approaches to them. But where I get very excited is the deeper dives into scientific questions that we may not have touched on before: microbiome target identification, linking post-translational modifications to disease, and any other type of relationship inference,” says Joe. “Often, we are now dealing with deeper scientific questions that require many different lines of evidence. And we’re in an exciting phase where we have the foundational components in place, so we can better connect the dots between multiple data sources, be it Elsevier’s extensive databases, customers’ internal databases or the many open data sources.

“But at the same time, at every point in our customers’ R&D process, they have to submit things to regulatory bodies. So you need to know exactly where you’re getting these hypotheses from: where you’re actually identifying this information.”

In other words, it comes down to the touchstones of science: provenance, reproducibility and transparency.

To read more from Dr. Joe Mullen about transparency, the drawbacks and potential of LLMs, and more, check out the full article “Today’s innovations are built on organized data” on Elsevier Connect.

Learn more from Dr. Mullen in this webinar on “The perils and pitfalls of generative AI for R&D”.

#Elsevier #GenerativeAI #Data #Research