OpenAI took a giant victory lap with o1 and its advanced reasoning abilities.
One of their biggest claims was o1's supposedly superior diagnostic capabilities. However, after some research, I have reached the following conclusions:
1) OpenAI has been extremely negligent in their testing of the preview model and has not adequately communicated its limitations in their publications. They should do so immediately.
2) o1's estimation of the probability of having a disease given a phenotype profile is broken and inconsistent. For the same profile, it returns different top-3 likely diseases across runs. Another concerning observation: it gave a 70-20-10 probability split in 4 of 5 runs, with a different top 3 each time. This points to a severe limitation in how the model computes these probabilities.
3) o1 also severely overestimated the probability of an extremely rare medical outcome, which suggests faulty handling of prior and posterior probabilities.
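To see why overestimating rare outcomes matters, here is a minimal sketch of the Bayes' rule calculation involved. The prevalence and test-accuracy numbers below are illustrative placeholders, not figures from our study: even a highly accurate test for a rare condition yields a surprisingly low posterior probability, because the prior (prevalence) dominates.

```python
def posterior_prob(prior, sensitivity, specificity):
    """P(disease | positive test) via Bayes' rule.

    prior:       P(disease), i.e. prevalence in the population
    sensitivity: P(positive | disease)
    specificity: P(negative | no disease)
    """
    # Total probability of a positive test: true positives + false positives
    p_positive = sensitivity * prior + (1 - specificity) * (1 - prior)
    return (sensitivity * prior) / p_positive

# Illustrative numbers: 1-in-10,000 condition, 99% sensitive and 99% specific test.
p = posterior_prob(prior=0.0001, sensitivity=0.99, specificity=0.99)
print(f"{p:.4f}")  # posterior is under 1%, despite a 99%-accurate test
```

A model that reports a high probability for such an outcome is effectively ignoring the prior, which is exactly the failure mode the observations above suggest.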
All of these lead me to conclude the following:
1. o1 is not ready for medical diagnosis.
2. To quote the brilliant Sergei (can't tag him, or anyone else, for some reason): "OpenAI was overly cavalier in suggesting that its new o1 Strawberry model could be used for medical diagnostics. It’s not ready. OpenAI should apologize—they haven’t yet."
3. We need more transparent testing and evaluation in mission-critical fields like medicine and law.
To learn more about our research into the problems and possible solutions, read the following article: https://lnkd.in/dnZVq9cv