Knowledge and AI...
📜 New study: Human doctors outperform ChatGPT when questions aren't multiple choice, in the context of Swedish primary care.

This is the first study evaluating GPT-4 and GPT-4o free-text responses to complex cases where doctors need to consider caveats due to multimorbidity, social problems, compliance, and legal aspects - core skills for a specialist doctor in general practice.

🏅🥼 Top doctors scored 7.2 of 10.
🥈🥼 Average doctors scored 6.0 of 10.
🥈🤖 GPT-4 scored 4.5 of 10.
🏅🤖 GPT-4o scored 0.7 points higher than GPT-4.

Other studies have typically evaluated narrower use cases, such as AI suggesting a diagnosis.

🩺 As doctors, we know that diagnostics isn't a major pain point in clinical reality. Our time is spent on many other actions and considerations.

In our study, ChatGPT fell short in suggesting:
🦠 Diagnoses
🧪 Tests
🩻 Examinations
📄 Referrals
⚖️ Legal considerations

Answers were graded against pre-specified criteria by three blinded doctors, with excellent reliability.

⁉️ Does this mean we should give up on AI in health care? Not at all. Three caveats may have made the AI underperform in this study:
1️⃣ We used zero-shot prompting without enhancements such as RAG, chain-of-thought, or Reflexion.
2️⃣ We didn't evaluate the latest o1 or o3 models.
3️⃣ The models were neither fine-tuned nor trained for medical use (unlike, for example, AMIE or Med-Gemini), nor for a Swedish medico-legal context.

💡 That said, here are three practical implications:
1️⃣ GPT-4o should not be used "as is" for clinical decision support by doctors in Sweden.
2️⃣ Expect AI to keep progressing, but the day of superhuman generalist medical AI is not yet here.
3️⃣ Keep humans in charge of medical decisions and use current AI to reduce the non-medical burden on doctors - as we are doing at Tandem Health.
🇸🇪🏥🙏 Big kudos to the entire PETRA research team at Sahlgrenska Academy, University of Gothenburg, including specialist doctor Rasmus Arvidsson, David Sundemo PhD, Professor Ronny Gunnarsson, and Carl Wikberg PhD, for making this study a reality. Link to full study below.