Jan Beger’s Post

Healthcare needs AI … because it needs the human touch.

This study evaluates the performance of multimodal AI models in medical diagnostics using the NEJM Image Challenge dataset, comparing their accuracy to human collective intelligence.

1️⃣ Anthropic's Claude 3 models showed the highest accuracy, surpassing average human performance by about 10%.
2️⃣ Human collective intelligence achieved a 90.8% accuracy rate, outperforming all AI models.
3️⃣ GPT-4 Vision Preview was selective, often responding to easier questions with smaller images and longer texts.
4️⃣ OpenAI's GPT-4 Vision Preview answered only 76% of the cases, while the other models responded to all queries.
5️⃣ The study highlights the potential and current limitations of multimodal AI in clinical diagnostics.
6️⃣ Ethical and reliability concerns arise from the integration of multimodal AI in medical diagnostics.
7️⃣ The EU AI Act emphasizes the need for transparency, robustness, and human oversight in high-risk AI systems, including medical AI.

✍🏻 Robert Kaczmarczyk, Theresa Isabelle Wilhelm, Ron Martin, Jonas Roos. Evaluating multimodal AI in medical diagnostics. npj Digital Medicine, 2024. DOI: 10.1038/s41746-024-01208-3
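As a concrete illustration of what such a benchmark run involves (this is not the authors' code), here is a minimal sketch of scoring one NEJM Image Challenge-style multiple-choice question against a vision model via the OpenAI Python SDK. The model name, prompt wording, and question record are assumptions for illustration only.

```python
# Minimal sketch (not the authors' protocol) of scoring one image-based
# multiple-choice question with a vision model via the OpenAI Python SDK.
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

question = {  # hypothetical record in the dataset's multiple-choice format
    "image_path": "case_001.jpg",
    "text": "A 54-year-old presents with the lesion shown. Diagnosis?",
    "options": {"A": "Melanoma", "B": "Basal cell carcinoma",
                "C": "Seborrheic keratosis", "D": "Dermatofibroma",
                "E": "Squamous cell carcinoma"},
    "answer": "A",
}

# Encode the case image for inline transmission.
with open(question["image_path"], "rb") as f:
    b64 = base64.b64encode(f.read()).decode()

options = "\n".join(f"{k}. {v}" for k, v in question["options"].items())
prompt = (f"{question['text']}\n{options}\n"
          "Reply with the single letter of the best answer.")

resp = client.chat.completions.create(
    model="gpt-4o",  # stand-in; the paper evaluated GPT-4 Vision Preview et al.
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
        ],
    }],
)

pred = resp.choices[0].message.content.strip()[:1].upper()
print("predicted:", pred, "| correct:", pred == question["answer"])
```

Looping this over the full dataset, and tracking which questions a model declines to answer, yields both the accuracy and the response-rate figures summarized above.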

Shrikant Pandya Ph.D.

Generative AI Engineer and Consultant | Machine Learning Engineer | Ph.D. Biomedical Engineering

4mo

This is an exciting coincidence: I was designing this study in my head yesterday while working with multimodal models for a different application! The sobering observation I had while reading it is that the study is already out of date. Claude 3.5 is out; Gemini 1.5 Pro and Flash are multimodal by design and were not evaluated here; and OpenAI has already released GPT-4 Omni (GPT-4o), which is natively multimodal. And that's just the major players; smaller labs have released many other open- and closed-source models, such as LLaVA. This is not to disparage the authors' work, but to remind everyone that the field moves very quickly: take every metric you read with a large handful of salt. By the time you read it, it's probably already incorrect.

Niamh S.

Medical Device Regulatory Affairs, Software, AI and Risk Management expert. TC contributing member for Ireland on IEC 62304, IEC 63450 and AI Advisory Group SNAIG

4mo

#Accuracy is the primary variable reported for such systems, but accuracy can be #inflated when training and testing on the same or similar data sets, or even by generating similar images from which the results are then taken. There are ways to inflate accuracy, whether intentionally or not. What are the metrics we need to look at, and the underlying #assumptions we need to understand, before accepting the reported #performance of these #AI models?
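To make that question concrete, here is a minimal sketch, on made-up predictions, of metrics worth reporting alongside raw accuracy on a multiple-choice benchmark: coverage (how many questions a model actually answered), accuracy on the answered subset versus overall, balanced accuracy, and a bootstrap confidence interval. All names and data below are illustrative assumptions.

```python
# Minimal sketch of metrics beyond raw accuracy for a multiple-choice
# benchmark; all data here is made up for illustration.
import numpy as np
from sklearn.metrics import balanced_accuracy_score

rng = np.random.default_rng(0)

# Hypothetical results: y_true holds the correct options, y_pred the model's
# answers; None marks a question the model declined to answer.
y_true = np.array(["A", "C", "B", "E", "D", "A", "C", "B", "E", "D"])
y_pred = np.array(["A", "C", "D", None, "D", "A", "B", "B", None, "D"],
                  dtype=object)

answered = np.array([p is not None for p in y_pred])
coverage = answered.mean()                       # share of questions answered
yt, yp = y_true[answered], y_pred[answered].astype(str)

acc_answered = (yp == yt).mean()                 # flatters selective models
acc_overall = (y_pred == y_true).mean()          # abstentions count as wrong

# Balanced accuracy guards against a skewed distribution of correct options.
bal_acc = balanced_accuracy_score(yt, yp)

# Bootstrap a 95% confidence interval for accuracy on the answered subset;
# small test sets give wide intervals, which point estimates alone hide.
boot = []
for _ in range(2000):
    s = rng.choice(len(yt), len(yt), replace=True)
    boot.append((yp[s] == yt[s]).mean())
lo, hi = np.percentile(boot, [2.5, 97.5])

print(f"coverage={coverage:.0%}  acc(answered)={acc_answered:.0%}  "
      f"acc(overall)={acc_overall:.0%}  balanced={bal_acc:.0%}  "
      f"95% CI=[{lo:.0%}, {hi:.0%}]")
```

The gap between accuracy on answered questions and overall accuracy is exactly why a model that responds to only 76% of cases cannot be compared to one that answers everything on a single headline number.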

Sameer P.

HealthTech Product & Venture Builder

4mo

Human collective intelligence will be a vital benchmark as we pursue the automation of key tasks. What remains to be seen is how policymakers view risk in clinical settings, and how malpractice insurance evolves in this pursuit.

Dr. med. Ron Martin, B.Sc.

Resident Physician in Plastic, Reconstructive and Aesthetic Surgery, Bachelor of Science - B.Sc., Geography, ATLS® Provider

4mo

Thanks for sharing :)

Javed Haris

Senior Data Scientist at Boston Scientific | Gen AI | Machine Learning

4mo

Thank you for sharing!

🌐 John Hall, Ph.D.

Industry Mentor - Psychologist

4mo

Interesting AI calibration!

Trivikram Tanguturi

Senior Product Manager AI | Management Consulting

4mo

Very informative

Shefali Sanekar

SEO Analyst at Edvak Health

4mo

Impressive study! It’s fascinating to see multimodal AI models pushing boundaries in medical diagnostics.
