NLP in healthcare: Standing on the shoulders of giants
"Don Swanson (1924 – 2012) was well appreciated during his lifetime as Dean of the Graduate Library School at the University of Chicago, as winner of the American Society for Information Science Award of Merit for 2000, and as author of many seminal articles. Don became Emeritus in 1996, but did not truly retire until around 2007, when he suffered a series of strokes. Don Swanson was probably the most important pioneer in applying text mining (now better known as "natural language processing") to biomedicine.
Perhaps the most influential and enduring contribution that Don made to information science is the concept of “undiscovered public knowledge” (UPK), which he approached from a very broad, philosophical standpoint. Knowledge can be public (e.g., it is published) and at the same time inaccessible or imperfectly known for one reason or another.
The most novel and fruitful type of undiscovered public knowledge discussed by Don occurs when information is not explicitly discussed in any single article at all. Rather, different assertions and findings need to be assembled across documents to create a new coherent assertion, much as different pieces of a puzzle are assembled to create a single picture.
But how to find these pieces residing in scattered places across the literature, and how to assemble them?
Don focused his analyses on first identifying two sets of articles, or literatures, which appear to be complementary yet are not directly connected to each other. Such literatures are unconnected if they do not have any articles in common, do not have authors in common, and articles in one literature do not cite any articles in the other literature.
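The three criteria for unconnected literatures can be written down as a simple check. The following is only a minimal sketch and not Don's actual procedure: the `Article` record, its field names, and the toy identifiers are hypothetical, and the citation test is applied in both directions, which is one reading of the criterion.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Article:
    """Hypothetical minimal record for one article (illustration only)."""
    article_id: str
    authors: frozenset
    cited_ids: frozenset = frozenset()


def are_unconnected(literature_x, literature_y):
    """Return True if the two literatures share no articles, share no
    authors, and neither literature cites an article in the other."""
    ids_x = {a.article_id for a in literature_x}
    ids_y = {a.article_id for a in literature_y}
    if ids_x & ids_y:  # an article belongs to both literatures
        return False

    authors_x = set().union(*(a.authors for a in literature_x))
    authors_y = set().union(*(a.authors for a in literature_y))
    if authors_x & authors_y:  # an author has published in both
        return False

    cited_by_x = set().union(*(a.cited_ids for a in literature_x))
    cited_by_y = set().union(*(a.cited_ids for a in literature_y))
    # Assumption: a citation in either direction counts as a connection.
    return not (cited_by_x & ids_y) and not (cited_by_y & ids_x)


# Made-up toy data, not real bibliographic records.
lit_x = [Article("x1", frozenset({"Author P"}), frozenset({"x0"}))]
lit_y = [Article("y1", frozenset({"Author Q"}), frozenset({"y0"}))]
lit_z = [Article("z1", frozenset({"Author R"}), frozenset({"x1"}))]

print(are_unconnected(lit_x, lit_y))  # True
print(are_unconnected(lit_x, lit_z))  # False, because z1 cites x1
```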
In a series of articles in the 1980s, Don analyzed two classic examples of medical literatures that were not (or only slightly) connected, yet contained multiple links of the form “A affects B” in one literature and “B affects C” in the other. When these links were brought together and assembled, they created a persuasive, novel hypothesis. These examples have become widely analyzed benchmarks for nearly all subsequent studies of literature-based discovery.
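The "A affects B" and "B affects C" pattern can be illustrated with a small sketch. It assumes that each literature has already been reduced to (subject, object) pairs meaning "subject affects object"; extracting such pairs from text is the hard part and is not shown. The function name and the placeholder terms are hypothetical.

```python
def bridging_terms(literature_1, literature_2, a_term, c_term):
    """Return the intermediate B terms such that "a_term affects B" is
    asserted in literature_1 and "B affects c_term" in literature_2."""
    affected_by_a = {obj for subj, obj in literature_1 if subj == a_term}
    affects_c = {subj for subj, obj in literature_2 if obj == c_term}
    return sorted(affected_by_a & affects_c)


# Placeholder assertions, not taken from any real article.
literature_1 = [("A", "B1"), ("A", "B2"), ("A", "B3")]
literature_2 = [("B1", "C"), ("B3", "C"), ("B4", "C")]

print(bridging_terms(literature_1, literature_2, "A", "C"))  # ['B1', 'B3']
```

Each shared B term is only a candidate link between A and C. As in the examples that follow, the direction of each effect still has to be read and judged before the chain amounts to a hypothesis.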
The first case was the set of articles on Raynaud disease vs. the set of articles on fish oil (Swanson, 1986b). Don noticed that several of the pathological alterations that occur in Raynaud disease corresponded to physiological alterations that are produced by ingesting fish oil, only in opposite directions. That suggests that ingesting fish oil should counteract some of the signs and symptoms of Raynaud disease. Subsequent clinical studies supported this hypothesis (Swanson, 1993).
The second case was the set of articles on dietary magnesium vs. the set of articles on migraine headaches (Swanson, 1988). Again, Don noticed that magnesium deprivation has multiple effects in the body that are similar to alterations known to worsen migraine headaches, and that magnesium itself has effects which should be expected to prevent or treat migraines. For example, magnesium is a calcium channel blocker, and it reduces neuronal excitability by blocking NMDA glutamate receptors. Thus, he proposed that supplementation with dietary magnesium may prevent or alleviate migraines. Again, subsequent clinical studies supported this hypothesis (Swanson, 1993).
Late in his career, Don proposed a link between atrial fibrillation and running (Swanson, 2006). Exercise is known to be a risk factor for atrial fibrillation, and he proposed that this may be mediated by gastroesophageal reflux, which in turn may be alleviated by taking proton pump inhibitors.
Besides being another masterful, insightful example of putting together separate pieces of evidence to form a new whole, it is worth mentioning that these analyses were all based on conditions he experienced himself. He had Raynaud syndrome, and he had migraine headaches. His chronic atrial fibrillation eventually caused his strokes and led to his withdrawal from active life.
AI Consultant @ Joseph Pareti's AI Consulting Services | AI in CAE, HPC, Health Science
Apologies if I am on the wrong track, but isn't this discussion a quest for causation beyond correlation? If this is true, there are current results in #ml that move the needle, such as:
- Diagnostic Robotics: a triage #ml system that was successfully employed for #coronaviruspandemic in 2020, and is now amplified to cover further use cases. They strive to move from correlation to causality analysis, which looks like the holy grail in advanced diagnosis.