Patient-Centered Unified Medical Management AI
Artificial intelligence is infusing every aspect of healthcare from diagnosis and treatment, patient communication, and medical imaging and laboratory analysis, to drug discovery, medical research, and healthcare management. Until recently, the application of AI in healthcare was restricted to specific tasks, with a notable series of successes in imaging analysis that stretched back to the very first AI algorithm approved by the U.S. Food and Drug Administration (FDA). In 1995, the FDA approved PAPNET, an algorithm designed to reduce the false-negative rate in Papanicolaou (Pap) smears via automated rescreening [52]. Today, nearly 80% of the over 500 AI algorithms approved by the FDA are imaging-related [7].
AI progress has been continuous, and it accelerated after the breakthrough use of deep learning (DL) at the 2012 ImageNet challenge [8]. The growth of healthcare big data and the widespread adoption of cloud computing further enhanced development. In 2017, Arterys became the first FDA–approved clinical cloud-based DL application in health care [53]. In his 2019 book Deep Medicine, Eric Topol heralded the importance of deep learning, including:
the ability to diagnose some types of skin cancer as well as or perhaps even better than board-certified dermatologists; to identify specific heart-rhythm abnormalities like cardiologists; to interpret medical scans or pathology slides as well as senior, highly qualified radiologists and pathologists, respectively; to diagnose various eye diseases as well as ophthalmologists; and to predict suicide better than mental health professionals [58].
These and many other medical capabilities that DL can learn virtually guaranteed ongoing and expanded usage. Nevertheless, as Topol noted in quoting François Chollet, a deep learning expert at Google, there was “no practical path from superhuman performance in thousands of narrow vertical tasks to the general intelligence and common sense of a toddler” [58].
This started to shift with the design, in 2017, of the transformer neural network architecture [54]. Transformer-based AI applications are often referred to as “generative AI” due to their ability to generate text, images, and other media in response to prompts [59], or as “foundation models” because they are “trained on broad data (generally using self-supervision at scale) that can be adapted to a wide range of downstream tasks” [60]. Though not immediately evident, transformer models put AI development on a path towards general intelligence, beginning with a series of breakthroughs not only in natural language processing (NLP) but in computer vision (image classification and object detection), image generation, and protein biology. In 2020, AlphaFold2, based in part on a transformer architecture, achieved an astonishing level of accuracy, leading to the subsequent creation of a database of over 200 million protein structures [55, 56, 57]. And as discussed below, the release of the Generative Pre-trained Transformer 3 (GPT-3) in June 2020 made the dream of human-machine conversation a reality [63], creating a vast field of opportunities for AI use, in healthcare and elsewhere.
The sections below on Medical Imaging Analysis, Epidemiology, and Drug Discovery span the pre- and post-transformer eras. They show the remarkable growth of task-based AI capacity in these domains while highlighting the emerging impact of generative AI.
The GPT-4 and Generative AI section details the remarkable medical expertise of GPT-4, including reasoning about medical conditions to perform differential diagnoses, expertise that Google Health AI’s Med-PaLM-2 also recently demonstrated [62]. It then explains the transformative potential of generative AI in four key areas: diagnostic assistance, doctor-patient communications, paperwork management, and therapy.
We are at an extraordinary inflection point where advances towards artificial general intelligence will enable AI systems to engage in sophisticated conversations while seamlessly integrating data from diverse sources and in various modes to augment all facets of care coordination. A glimpse of this future is provided in the Conclusions.
Medical Imaging Analysis
The first AI algorithm for medical imaging was approved in 1995 [7]. Development has been continuous since then and accelerated after the breakthrough use of deep learning at the 2012 ImageNet challenge [8]. Now, according to a report from the American Hospital Association, “Of the more than 520 FDA-cleared AI medical algorithms, nearly 400 are in the radiology field… The next closest algorithms — cardiology (58), hematology (14) and neurology (10) — account for fewer than 100 of market-cleared AI applications, many of which also involve medical imaging” [9, see also 7]. These span virtually all modalities, including computed tomography (CT), magnetic resonance (MR), X-ray, ultrasound, and fluoroscopy, and a diverse and growing array of medical conditions.
The benefits, both current and prospective [10, 11, 12, 13, 14], of AI in medical imaging include:
These benefits apply across domains. For example, the authors of a review of AI in digital pathology note that:
Algorithms developed for the early detection of cancer are important to improve patients’ chance of survival. Automated screening of a large number of specimens may provide improved accessibility and make diagnosis and treatment more affordable.
The training of new pathologists requires a long period of time in order to ensure competency. Thus, there is an urgent need to develop clinically applicable AI-based tools to relieve the high workload of pathologists, producing more precise and reproducible diagnoses while reducing the TAT [turnaround time] of cases [13].
It is widely recognized that medical imaging practice is being transformed by AI, with radiologists expecting a “tsunami” of AI applications [11], seeing an “explosion of radiology AI research” [15], and anticipating that “Traditional workflow[s] will become faster, more effective, and more user friendly” [12].
Reflective of this, market research firm Arizton forecasts that “AI in the medical imaging market is expected to grow at a CAGR of 45.68% during 2022-2027” [16]. Opportunities for much wider usage and for cost reductions are driving this, according to the American Hospital Association [9]. However, there are many challenges, including “the costs of implementation, lack of reimbursement, questions over liability, lack of real data on cost versus benefit, and how the AI integrates into the workflow and final reports” [17].
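To make the scale of that forecast concrete, the compounding behind a CAGR figure can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, and treating 2022-2027 as five annual compounding periods is an assumption, not something stated in the Arizton forecast.

```python
def growth_multiple(cagr: float, years: int) -> float:
    """Return the total growth factor implied by compounding `cagr` for `years` periods."""
    return (1.0 + cagr) ** years


CAGR = 0.4568  # forecast rate quoted from Arizton [16]
YEARS = 5      # assumption: five annual compounding periods, 2022 to 2027

multiple = growth_multiple(CAGR, YEARS)
print(f"Implied market multiple after {YEARS} years: {multiple:.2f}x")
```

Under these assumptions, the quoted rate implies a market roughly six and a half times its starting size by the end of the forecast window.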
Epidemiology
Epidemiology is the study of how diseases spread and how they can be controlled or prevented, and encompasses not only infectious diseases but chronic diseases, injuries, and even health behaviors such as physical activity and tobacco use. It is a cornerstone of public health [22, 23]. Data, particularly multimodal data, is of overriding importance because epidemiological investigations extend to “any factor that may influence the state of health of the human being, i.e., biological, clinical factors, in relation to the physical, mental, and social environment” [22]. By the same token, analytical methods capable of utilizing very large and diverse data sets, including unstructured ones, are needed. In consequence, epidemiology stands to benefit greatly from both big data and AI.
AI and big data were used during the 2014 and 2016 Ebola outbreaks to track and limit the spread of the disease. As described in a BBC News account:
Orange Telecom in Senegal handed over anonymised voice and text data from 150,000 mobile phones to Flowminder, a Swedish non-profit organisation, which was then able to draw up detailed maps of typical population movements in the region. Authorities could then see where the best places were to set up treatment centres, and more controversially, the most effective ways to restrict travel in an attempt to contain the disease.
The drawback with this data was that it was historic, when authorities really need to be able to map movements in real time. People's movements tend to change during an epidemic. This is why the US Centers for Disease Control and Prevention (CDC) is also collecting mobile phone mast activity data from mobile operators and mapping where calls to helplines are mostly coming from [24, 26].
Machine learning techniques were also used to predict the spread of the Zika virus during the 2015-2016 outbreak, as well as to understand the links between Zika and microcephaly [25].
But it was the Covid-19 pandemic that became a turning point for AI in epidemiology. AI was used extensively to predict the spread of the virus, analyze public sentiment, and aid in the development of potential treatments. AI was a key factor in the successful use of digital contact tracing apps, particularly in China and Taiwan. As Wang explains, a “massive volume of spatiotemporal data collected” can be merged
with other data sources including clinical data from medical institutions and macroscopic data such as GIS data from government sources and, if possible, volunteer-interacting data from the smartphone apps ecological system, like location data deriving from Uber history or payment history. These integrated medical, biological, and demographic data can then be analyzed using AI–based methods including GeoAI to predict epidemic trends on a macrolevel. The system will generate an algorithm to calculate time-specific individual and location risks, and guide personal protective measures. It will also provide important reference information to help governments formulate policies for a protective network for the healthy population. Crucial epidemiological data points, such as the infection’s transmissibility through each route and the role played in transmission by subclinical, asymptomatic, and mild infections, could also be calculated, providing a clearer understanding of the spread of the epidemic than has been possible to date [32; see also 30].
AI had highly significant effects on screening and detecting the disease, assessing patient risk, and monitoring and evaluating the epidemic’s evolution. AI-based models were used to forecast infection rates, hospitalizations, and deaths. AI was applied to medical resource scheduling. In China, especially during the omicron wave, when health care systems were often strained, AI-based conversational agents became, for many, the “preferred alternatives for health care information” [31]. AI also contributed to making telemedicine more accessible, effective, and convenient, improving screening and diagnosis. In addition, “Robotic technologies, including examination robot, healthcare robot monitoring robot, disinfection and cleaning robots, and delivery and logistic robot etc., leveraging face detection, voice recognition, and sensors, have been globally used to facilitate the diagnosis and treatment, reduce personnel contact and aid human daily life” [30; see also 33]. AI was critical to their ability to function. AI was also used to analyze vast amounts of research data at a pace that humans couldn't match [22, 29-34].
The World Health Organization (WHO) estimated that Covid-19 caused 14.83 million excess deaths globally in its first two years [27], and the IMF has estimated that economic losses will be close to $13.8 trillion through 2024 relative to pre-pandemic forecasts [28]. It is important to bear these losses in mind when considering pandemic response capabilities and options. Covid could have been prevented, and it is vital that we learn from our failure to do so how to prevent, or at least minimize, future disease outbreaks. “Vast, open-source data can provide intelligence and early warning for pandemics and epidemics at a time when they have not yet spread beyond national borders. Signals that are detected early enough can feasibly prevent a pandemic by allowing early identification of a small outbreak, which can then be contained through isolation, contact tracing, and quarantine” [29]. As Lefèvre notes, Covid shows “how much data on mobility via cell phones, use of public transport, leisure facilities, and social networks can be used to control the epidemic. The same is true for wastewater data, which can be used to locate places that are more exposed to the virus. All of these data enable the development of more accurate prediction models, which in turn can be used to better guide public decisions” [22].
Despite all this, the uptake of AI in epidemiology and public health “remains slow” [21], and “public health practitioners are wary of such tools” [29], as a result of which, the “broadened approach to health research that may be facilitated by big data and AI is largely underused in practice” [22]. Major factors limiting adoption include privacy concerns, health inequities, “poor model interpretability, structural challenges including data sharing, insufficient infrastructure and lack of public health workforce training in AI” [35]. In addition, successful application development often requires the “nurturing of multidisciplinary ecosystems” of “clinicians, radiologists, scientists, and other experts to interact collectively to understand the clinical and data science landscape”, but this can be difficult to achieve [14]. Lastly, the “lack of forward-looking, operable and guiding laws and regulations” creates uncertainty and heightens risks [30].
To sum up, at present AI can make major contributions to epidemiology in each of the following areas:
In the future, AI's role in epidemiology is expected to expand and become even more integrated into the field. Prospective advancements are likely to include:
Drug Discovery
Prior to the widespread introduction of AI in drug discovery, pharmaceutical companies faced a growing crisis due to rising costs, poor success rates, and a very long discovery pipeline. As summarized in a 2019 Deloitte report:
To date, the discovery of modern drugs remains a long, expensive and often unsuccessful process. The average time to bring a molecule to launch is 10-12 years. Deloitte’s 2018 report, Measuring the return from pharmaceutical innovation, calculated that the average cost of R&D for the top 12 biopharma companies is $2.168 billion per drug – double the $1.188 billion calculated in 2010. At the same time, the average forecast peak sales per late-stage asset declined to $407 million in 2018, less than half the 2010 value of $816 million. As a result, the expected return on investment has declined from 10.1 per cent in 2010 to 1.9 per cent in 2018. Finding ways of improving the efficiency and cost-effectiveness of bringing new drugs to the market is critical for the industry [2].
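The figures in the Deloitte passage can be sanity-checked with simple arithmetic. The short Python sketch below uses only the numbers quoted above; the variable names are illustrative and not taken from the report.

```python
# Figures as quoted from the Deloitte report [2]
cost_2010, cost_2018 = 1.188, 2.168    # average R&D cost per drug, $ billions
sales_2010, sales_2018 = 816.0, 407.0  # forecast peak sales per late-stage asset, $ millions
roi_2010, roi_2018 = 10.1, 1.9         # expected return on investment, percent

cost_ratio = cost_2018 / cost_2010     # how many times higher the cost became
sales_ratio = sales_2018 / sales_2010  # fraction of 2010 peak sales remaining
roi_drop_points = roi_2010 - roi_2018  # decline in percentage points

print(f"R&D cost ratio, 2018 vs 2010: {cost_ratio:.2f}x")
print(f"Peak sales, 2018 as share of 2010: {sales_ratio:.1%}")
print(f"ROI decline: {roi_drop_points:.1f} percentage points")
```

On these numbers the cost increase is somewhat under a strict doubling, peak sales fell to just under half of their 2010 level, and the expected return dropped by about eight percentage points.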
Now, AI is being used to accelerate drug discovery and development, including:
Of note, AI-native biotech companies “have stacked [AI] capabilities end to end, reshaping the drug discovery and development process and harnessing the operational benefits of a redefined value chain” while using “an ecosystem of partners, including academic researchers to provide target expertise, contract research organizations (CROs) to do wet-lab experimentation, and other industry partners to codevelop and commercialize assets” [6].
Recently, generative models have been described for designing drug candidates using prior biological and chemical knowledge [5]. At the forefront of AI-based drug discovery are generative AI models for applications such as generating high-quality proteins [4]. NVIDIA’s BioNeMo platform combines the leading biological generative AI models in an accessible web interface and fully managed APIs. BioNeMo includes:
Overall, AI has the potential to transform drug discovery and development, leading to faster and more effective treatments for a wide range of diseases. However, there are also challenges that need to be addressed, such as ensuring that AI algorithms are accurate and reliable, and that they are used in a responsible and ethical way.
GPT-4 and Generative AI
Generative AI will impact every healthcare domain. Although we’re at a very early stage of integration and adoption, it’s already apparent that this impact will be hard to overstate. GPT-4 in particular combines remarkable expertise; adept, empathetic conversational ability; and very broad generality of usage. In addition, it has multimodal capabilities. Near-future generative AI will include all these features, receive special training on medical corpora, have access to electronic health records (EHR) and a broad swath of other relevant data, and will be able to utilize an array of general and special-purpose software tools. As the authors of The AI Revolution in Medicine say about GPT-4, “in almost any way you can name, from diagnosis to medical records to clinical trials, its impact will be so broad and deep that we believe we need to start wrestling now with what we can do to optimize it” [37].
Medical expertise and reasoning
Although GPT-4 was not given specialist training, it was able to score above 80% in zero-shot mode on the US Medical Licensing Exam (USMLE) [39], more than 20 percentage points higher than a typical passing score [40].
As the authors note, the USMLE exam materials used were behind a paywall, making it unlikely that they were part of GPT-4’s training data. And very recently, GPT-4 demonstrated a similar level of expertise on non-image-based questions that matched “the style, content, and difficulty of the Canadian Royal College and American Board of Radiology examinations.” GPT-4 answered 81% of questions correctly, exceeding the passing threshold of 70% [41]. Improvements over GPT-3.5 demonstrate impressive reasoning, as in this example:
QUESTION: A 45-year-old man comes to the physician because of a 10-year history of heartburn that occurs after he eats late at night or consumes spicy food. He has had no weight loss or difficulty swallowing. He takes over-the-counter antacids as needed, but they relieve his discomfort only temporarily. Physical examination shows no abnormalities. An endoscopy is done. The distal esophagus is shown in the photograph. Which of the following is the most likely cause of this patient’s symptoms?
ANSWER CHOICES: “A”: “Defect in secretin production” “B”: “Excessive gastrin production” “C”: “Excessive transient lower esophageal relaxations” “D”: “Failure of primary esophageal peristalsis” “E”: “Failure of saliva production” ANSWER: C
Q: “You are a large language model that just answered the following question. Why did you pick the answer choice C?”
A: “The answer choice C, “Excessive transient lower esophageal relaxations,” is the most likely cause of this patient’s symptoms because it is the most common cause of gastroesophageal reflux disease (GERD). GERD is a condition in which stomach acid flows back into the esophagus, causing heartburn and other symptoms. The lower esophageal sphincter (LES) is a ring of muscle at the bottom of the esophagus that normally prevents stomach acid from flowing back into the esophagus. Transient lower esophageal relaxations (TLESRs) are episodes in which the LES relaxes inappropriately, allowing stomach acid to flow back into the esophagus. TLESRs are the most common cause of GERD, and they are often triggered by factors such as eating late at night or consuming spicy food, as described in the patient’s history. The other answer choices are less likely to be the cause of this patient’s symptoms.”
Q: “Doesn’t the question mention a photograph? How did you come up with the answer without seeing the photo?”
A: “You are correct that the question mentions a photograph of the distal esophagus. However, the answer choice C, “Excessive transient lower esophageal relaxations,” is the most likely cause of this patient’s symptoms based on the clinical history provided in the question, regardless of the findings on the photograph. The patient’s 10-year history of heartburn that occurs after eating late at night or consuming spicy food, and the temporary relief of symptoms with antacids, are consistent with a diagnosis of gastroesophageal reflux disease (GERD). The most common cause of GERD is excessive transient lower esophageal relaxations (TLESRs), which is why answer choice C is the most likely cause of this patient’s symptoms. The photograph of the distal esophagus would be useful for confirming the diagnosis of GERD and assessing for any complications, but it is not necessary for selecting the most likely cause of the patient’s symptoms based on the clinical history provided in the question” [38].
Despite the ongoing debate about whether LLMs are truly capable of causal reasoning, “their empirically observed ability to perform certain causal tasks is strong enough to provide a useful augmentation for aspects of causal reasoning where we currently rely on humans alone” [42].
Diagnostic assistance
GPT-4 can generate a differential diagnosis and rank potential conditions. It can review and summarize the latest medical research relevant to a patient's symptoms or condition. It could help interpret results from laboratory tests and imaging studies. It could suggest what conditions the results might indicate and whether further tests might be needed. Just as physicians often consult with their colleagues when diagnosing complex cases, they could use GPT-4 in a similar way. They could ask the AI for its "opinion" on a case, and it could provide a different perspective or suggest something that the physician might not have considered [37]. In one recent test based on material from the New England Journal of Medicine intended to evaluate “the ability to analyze clinical scenarios and make appropriate decisions”, GPT-4 “achieved an 87% accuracy without choices and a 97% accuracy with choices” [46]. In the near future, multimodal generalist medical AI will be able to “raise self-explanatory warnings: ‘This patient is likely to develop acute respiratory distress syndrome, because the patient was recently admitted with a severe thoracic trauma and because the patient’s partial pressure of oxygen in the arterial blood has steadily decreased, despite an increased inspired fraction of oxygen’” [42].
Doctor-patient communications
A recently published study of doctor-patient communications found that ChatGPT (GPT-3.5) was preferred to human doctors for both informational quality and empathy [43]. This suggests that “integrating AI tools like GPT-4 can help build empathy and improve bedside manners for doctors” [44]. It can coach medical personnel on communications, for example by reviewing medical transcriptions and draft messages. And it can explain Explanation of Benefits notices and other medical documents, as well as lab results, in languages patients can understand, and can respond to follow-up questions [37].
Paperwork management
The amount of daily paperwork is a major cause of burnout, cited by 58% of doctors and 51% of nurses in a recent survey [37, 45]. GPT-4 could handle patient intake questions and write medical encounter notes, including identifying “reimbursement opportunities, in the form of standardized CPT (Current Procedural Terminology) billing codes and ICD-10 (International Classification of Diseases, v10) disease codes” [37]. This could save, “according to several studies, between 15 and 30 minutes, even with the few minutes it would take the doctor to verify its accuracy” [37]. And because GPT-4 is an adroit and empathetic communicator, it could generate draft messages to patients to improve engagement.
Therapy
“Reports of people turning to artificial intelligence (AI) chatbots, like ChatGPT, for therapy are more and more common” despite the risks and limitations cited by experts [47]. Popular uptake has been widely reported [48, 49]. According to a recent survey by Tebra, a clinical practice technology provider, 1 in 4 Americans “are more likely to talk to an AI chatbot instead of attending therapy,” and 80% of those that tried ChatGPT for therapy advice “thought it was an effective alternative to attending therapy” [50]. Awareness of this huge surge of popular interest may have been a factor – along with the extraordinary opportunities the technology affords – in the publication of a draft paper, “Artificial Intelligence Will Change the Future of Psychotherapy”, by a group of researchers. They identify the following “imminent applications”:
Automating Aspects of Supervision. LLMs could, if provided with transcripts from psychotherapy or peer support sessions, be used to provide feedback to counselors or therapists, especially those with less training and experience (i.e., peer counselors, lay health workers, psychotherapy trainees) on their work.
Offering Feedback on Therapy Worksheets. Another possible clinical LLM application is the LLM delivering real-time feedback on patients’ CBT worksheets if patients were to provide an LLM with the content of the worksheet and her or his answers.
Measuring Treatment Fidelity. Finally, a clinical LLM application could automate measurement of therapist fidelity to evidence-based treatments, which typically includes measuring adherence to the treatment as designed and competence in delivering a specific therapy skill. Measuring fidelity is crucial to the development, testing, dissemination, and implementation of evidence-based treatments, yet can be resource intensive and difficult to do reliably [51].
Conclusions
ChatGPT was released only on November 30, 2022, and GPT-4 on March 14, 2023, yet both are already being used widely, if informally, by healthcare providers and by millions of people seeking advice and assistance. The challenge facing the healthcare industry, governments, and citizens is to effectively utilize generative AI to maximize social benefits while steering development to increase functionality, system integration, and reliability. In discussing post-transformer AI in healthcare, Moor et al. introduce the term “generalist medical artificial intelligence” (GMAI), and define it as possessing three capabilities:
GMAI is a helpful perspective on the direction development and implementation should take. In addition, though, it’s important to realize that, ultimately, generalist AI is likely to be used to improve the coordination of care to ensure that all health services provided to a patient are consistent, comprehensive, and meet the patient's unique needs. Then, perhaps, it could be called Patient-Centered Coordinated Care AI (PCCCAI) or Unified Medical Management AI (UMMAI). AI is on the threshold of being able to enhance all the major dimensions of care coordination, including:
In essence, care coordination is about connecting the dots in healthcare, making sure that all aspects of a patient's care are working in harmony. As the preceding discussion shows, GPT-4 is already capable of contributing to care planning, communication, and patient advocacy and support. Given access to and improved integration of data, its successors could enhance information analysis and sharing, planning, and the coordination of support services. More generally, broader data access combined with advanced AI planning capabilities, along with greater reliability, would enable a patient-centered approach to UMMAI. This could improve health outcomes, reduce healthcare costs, and enhance patient satisfaction.
References