Of Algorithms and Minds: Navigating the AI-Human Partnership #14
Exploring the Dynamic Synergy Between Artificial Intelligence and Humans
Hey, in this issue: AI-enhanced reconstruction of the 12-lead electrocardiogram via 3-leads with accurate clinical assessment; A generalist vision–language foundation model for diverse biomedical tasks; An operational guide to translational clinical machine learning in academic medical centers and more…
RESEARCH ARTICLES
In this issue
1) AI-enhanced reconstruction of the 12-lead electrocardiogram via 3-leads with accurate clinical assessment | npj Digital Medicine (nature.com)
2) A generalist vision–language foundation model for diverse biomedical tasks | Nature Medicine
3) Free access via computational cloud to deep learning-based EEG assessment in neonatal hypoxic-ischemic encephalopathy: revolutionary opportunities to overcome health disparities | Pediatric Research (nature.com)
4) Generalization—a key challenge for responsible AI in patient-facing clinical applications | npj Digital Medicine (nature.com)
5) Hidden flaws behind expert-level accuracy of multimodal GPT-4 vision in medicine | npj Digital Medicine (nature.com)
6) An operational guide to translational clinical machine learning in academic medical centers | npj Digital Medicine (nature.com)
7) Multimodal large language models for bioimage analysis | Nature Methods
8) Can Large Language Models Provide Useful Feedback on Research Papers? A Large-Scale Empirical Analysis | NEJM AI / Can large language models provide useful feedback on research papers? A large-scale empirical analysis (arxiv.org)
9) From LLMs to LLM-based Agents for Software Engineering: A Survey of Current, Challenges and Future (arxiv.org)
10) Climate change and artificial intelligence in healthcare: Review and recommendations towards a sustainable future - ScienceDirect
11) VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs (arxiv.org) / VideoLLaMA 2 Released: A Set of Video Large Language Models Designed to Advance Multimodal Research in the Arena of Video-Language Modeling - MarkTechPost
12) Performance of machine learning algorithms for lung cancer prediction: a comparative approach | Scientific Reports (nature.com)
13) Physics-informed deep generative learning for quantitative assessment of the retina | Nature Communications
14) Computer-aided detection of tuberculosis from chest radiographs in a tuberculosis prevalence survey in South Africa: external validation and modelled impacts of commercially available artificial intelligence software - The Lancet Digital Health
15) The diagnostic and triage accuracy of the GPT-3 artificial intelligence model: an observational study - The Lancet Digital Health
16) A deep learning-based model to estimate pulmonary function from chest x-rays: multi-institutional model development and validation study in Japan - The Lancet Digital Health
17) Medical artificial intelligence for clinicians: the lost cognitive perspective - The Lancet Digital Health
18) AlphaFold accelerated discovery of psychotropic agonists targeting the trace amine–associated receptor 1 | Science Advances
19) Sharing brain imaging data in the Open Science era: how and why? (thelancet.com)
20) How to set up your first machine learning project in astronomy | Nature Reviews Physics
21) AI and ethics: Investigating the first policy responses of higher education institutions to the challenge of generative AI | Humanities and Social Sciences Communications (nature.com)
· AI-enhanced reconstruction of the 12-lead electrocardiogram via 3-leads with accurate clinical assessment | npj Digital Medicine (nature.com) - This article presents an artificial intelligence (AI) algorithm developed to reconstruct a full 12-lead electrocardiogram (ECG) using only three leads: two limb leads (I and II) and one precordial lead (V3). The algorithm was trained on over 600,000 clinically acquired ECGs. When evaluated, the reconstructed 12-lead ECGs showed high correlation with the original ECGs. An automated algorithm for detecting acute myocardial infarction performed similarly well on the reconstructed and original ECGs, and when interpreted by cardiologists, the reconstructed ECGs achieved accuracy comparable to the originals in identifying ST-segment elevation myocardial infarction. The researchers suggest this approach could enable ECG acquisition with simplified tools in non-specialized settings, potentially facilitating diagnosis of heart conditions outside traditional clinical environments. However, they note further validation through prospective clinical trials is needed to evaluate real-world performance and clinical utility. Overall, this work represents a step towards more accessible cardiac diagnostics using AI and limited-lead ECG data.
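To make the core idea concrete, here is a minimal sketch of the kind of signal-to-signal mapping involved: a small 1D convolutional network that takes the three measured leads and emits all twelve. This is an illustrative stand-in, not the authors' architecture; note that leads III, aVR, aVL, and aVF are fixed linear combinations of I and II, so a model really only needs to synthesize the missing precordial leads.

```python
# A minimal sketch (not the paper's model): map 3 measured leads to 12 leads.
import torch
import torch.nn as nn

class LeadReconstructor(nn.Module):
    def __init__(self, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(3, hidden, kernel_size=15, padding=7),
            nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=15, padding=7),
            nn.ReLU(),
            nn.Conv1d(hidden, 12, kernel_size=15, padding=7),  # 12 output leads
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 3 leads, samples) -> (batch, 12 leads, samples)
        return self.net(x)

model = LeadReconstructor()
three_lead = torch.randn(1, 3, 5000)   # e.g. 10 s of ECG at 500 Hz
twelve_lead = model(three_lead)
print(twelve_lead.shape)               # torch.Size([1, 12, 5000])
```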
· A generalist vision–language foundation model for diverse biomedical tasks | Nature Medicine - This article introduces BiomedGPT, an open-source, lightweight vision-language foundation model designed for diverse biomedical tasks. The model can process both visual and textual data, performing well on tasks like visual question answering, image captioning, report generation, and text summarization across various medical domains. BiomedGPT was pretrained on a large corpus of biomedical data, including images, text, and image-text pairs. It achieved state-of-the-art results in 16 out of 25 experiments while maintaining a smaller model size compared to other large biomedical AI models. The researchers conducted human evaluations to assess BiomedGPT's capabilities in radiology tasks, finding it performed competitively with human experts in some areas. The model demonstrated strong transfer learning capabilities, performing well on downstream tasks after fine-tuning. It also showed promise in zero-shot learning scenarios, though performance varied depending on the task. The authors discuss the potential of BiomedGPT as a generalist biomedical AI model, capable of handling diverse data types and tasks. They highlight its advantages in terms of efficiency, transparency, and accessibility compared to larger, closed-source models. However, they also note limitations and areas for improvement, particularly in safety, equity, and bias considerations. The study concludes that while BiomedGPT shows potential for improving diagnosis and workflow efficiency in healthcare, further research and development are needed before such models can be effectively deployed in clinical settings.
· Free access via computational cloud to deep learning-based EEG assessment in neonatal hypoxic-ischemic encephalopathy: revolutionary opportunities to overcome health disparities | Pediatric Research (nature.com) - This article discusses a novel deep learning-based EEG assessment tool called Brain State of the Newborn (BSN) for evaluating hypoxic-ischemic encephalopathy (HIE) in neonates. The study by Kota et al. found that BSN can effectively distinguish between normal and abnormal neurodevelopmental outcomes from 6 hours of life, correlating with clinical severity scores. A key innovation is that the BSN algorithm is freely available via a computational cloud service called Babacloud, allowing researchers worldwide to analyze their own EEG data without specialized equipment. The authors highlight this as a revolutionary approach that could help overcome healthcare disparities by providing advanced EEG interpretation capabilities to underserved areas lacking neurophysiology expertise. While further validation is needed, the authors believe this cloud-based AI approach has the potential to improve neonatal care globally by enabling rapid, accurate assessment of HIE severity and prognosis, even in resource-limited settings. They emphasize that ethical implementation of such AI tools could help reduce gaps in healthcare quality between high and low-income regions.
· Generalization—a key challenge for responsible AI in patient-facing clinical applications | npj Digital Medicine (nature.com) - The article addresses the challenge of generalization in clinical AI applications and proposes selective deployment as a potential solution. Generalization, the ability of AI systems to apply knowledge to new data, is a major challenge for real-world AI applications, especially in healthcare. Poor generalization can result from overfitting, dataset biases, and the complex nature of clinical data. Using a breast cancer prognostic algorithm case study, the authors illustrate the ethical dilemmas of deploying AI systems that may not generalize well to underrepresented groups. They propose selective deployment (using AI only for groups well represented in the training data) as a potential ethical compromise between delaying deployment entirely and risking harm to underrepresented groups. The article discusses technical approaches for implementing selective deployment, including data-centric methods (data curation) and model-centric methods (uncertainty estimation, out-of-distribution detection). The authors explore ethical considerations of selective deployment, acknowledging it could temporarily maintain health disparities but may be the most ethical option in some cases. The authors conclude by calling for more research into improving generalization in clinical ML and encourage exploring sample selection strategies to make clinical AI applications trustworthy and safe for all patients.
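As a concrete illustration of selective deployment, here is a minimal sketch in which the system abstains (defers to a clinician) when either an out-of-distribution score or the model's own confidence crosses a threshold. The OOD scorer, the thresholds, and the sklearn-style predict_proba interface are illustrative assumptions, not details from the paper.

```python
# A minimal sketch of selective deployment: predict only when the input looks
# in-distribution and the model is confident; otherwise abstain.
import numpy as np

def predict_selectively(model, ood_scorer, x,
                        ood_threshold=0.5, confidence_threshold=0.8):
    """Return (label, confidence), or None to defer to a clinician."""
    if ood_scorer(x) > ood_threshold:        # looks unlike the training data
        return None
    proba = model.predict_proba(x.reshape(1, -1))[0]
    confidence = float(np.max(proba))
    if confidence < confidence_threshold:    # the model itself is unsure
        return None
    return int(np.argmax(proba)), confidence

# Toy usage with stand-ins for a trained model and an OOD scorer.
class ToyModel:
    def predict_proba(self, X):
        return np.array([[0.1, 0.9]])

print(predict_selectively(ToyModel(), lambda x: 0.2, np.zeros(4)))  # (1, 0.9)
```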
· Hidden flaws behind expert-level accuracy of multimodal GPT-4 vision in medicine | npj Digital Medicine (nature.com) - This article presents a comprehensive evaluation of GPT-4V's performance on medical image challenges from the New England Journal of Medicine. While GPT-4V achieved high accuracy (81.6%) in answering multiple-choice questions, comparable to human physicians in closed-book settings, the study revealed hidden flaws in its reasoning process. The researchers assessed GPT-4V's capabilities in three areas: image comprehension, recall of medical knowledge, and step-by-step reasoning. They found that image comprehension was the most problematic area, with over 25% of cases containing flawed rationales. Surprisingly, even when GPT-4V provided the correct final answer, it often made errors in one or more aspects of its reasoning, particularly in image comprehension (27.2% error rate). The study highlights that while GPT-4V shows promise in medical decision support, especially in cases where physicians struggle, its rationales are not always sound. This underscores the need for thorough evaluation of AI models' reasoning processes before integration into clinical workflows. The researchers emphasize that comprehensive assessments beyond mere multiple-choice accuracy are crucial. They note that human physicians still outperform GPT-4V in open-book settings, particularly for difficult questions. The study concludes by stressing the importance of further in-depth evaluations of multimodal AI models' rationales to ensure their reliability and safety in clinical applications.
· An operational guide to translational clinical machine learning in academic medical centers | npj Digital Medicine (nature.com) - This article provides an operational guide for translating clinical machine learning tools from academic research to real-world deployment in healthcare settings. The authors, drawing from their experiences at two large academic medical centers, outline a strategy to bridge the gap between research and clinical practice. They emphasize the importance of a multidisciplinary team, including principal investigators, data scientists, machine learning engineers, IT administrators, and clinician end-users. The guide outlines three main phases: prerequisites for model deployment, planning for deployment, and building a deployable tool. Prerequisites include establishing clinical value, identifying stakeholders, ensuring timely data availability, and determining an operational home for the tool. The planning phase involves an inverted extract, transform, load (ETL) approach to gather requirements and specifications. The authors then detail the process of building a deployable tool, covering infrastructure considerations, data extraction and processing, and presenting model outputs to end-users. They stress the importance of adhering to best practices in software development, information security, and scalability throughout the process. The article concludes with post-deployment considerations, including monitoring for dataset drift, maintaining data connections, and adapting to changes in clinical practice. The authors hope this guide will help health systems deploy minimum viable data science tools and realize their value in clinical practice, ultimately improving patient care through the successful translation of academic research into real-world applications.
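One post-deployment task the guide highlights, monitoring for dataset drift, lends itself to a short sketch: compare each feature's live distribution against the training distribution with a two-sample Kolmogorov-Smirnov test. The per-feature test and the alpha level below are illustrative choices, not the authors' protocol.

```python
# A minimal drift check: flag features whose live distribution has shifted
# away from the training distribution (two-sample KS test per feature).
import numpy as np
from scipy.stats import ks_2samp

def drift_report(train: np.ndarray, live: np.ndarray, alpha: float = 0.01):
    """Return (feature index, KS statistic) for features that drifted."""
    drifted = []
    for j in range(train.shape[1]):
        stat, p = ks_2samp(train[:, j], live[:, j])
        if p < alpha:
            drifted.append((j, stat))
    return drifted

rng = np.random.default_rng(0)
train = rng.normal(size=(5000, 3))
live = np.column_stack([rng.normal(size=2000),
                        rng.normal(loc=0.5, size=2000),  # shifted feature
                        rng.normal(size=2000)])
print(drift_report(train, live))  # feature 1 should be flagged
```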
· Multimodal large language models for bioimage analysis | Nature Methods - This article discusses the potential of multimodal large language models (MLLMs) in bioimage analysis. The authors argue that MLLMs, which can process and integrate information from various data types like text and images, offer promising solutions to challenges in analyzing complex biological data. The paper outlines three main applications of MLLMs in bioimage analysis: direct image analysis tasks like segmentation, automatic report generation for large-scale studies, and serving as intelligent agents for smart microscopes. The authors propose a three-step approach to developing MLLMs for bioimage analysis, which they describe as "bricks, buildings, and facilities." The "bricks" stage involves creating comprehensive training datasets combining images with text descriptions. The "buildings" stage focuses on designing the MLLM architecture, including encoders, fusion modules, and decoders. The "facilities" stage involves fine-tuning the models for specific bioimage analysis tasks and implementing them in practical applications. The authors address challenges in MLLM development, such as ensuring trustworthiness and adapting to new concepts. They suggest using techniques like retrieval augmented generation (RAG) and parameter efficient fine-tuning (PEFT) to overcome these issues. Overall, the article presents MLLMs as a promising tool for integrating diverse information and knowledge in bioimage analysis, potentially surpassing the capabilities of individual experts or small teams. The authors envision future bioimage analysis being conducted by intelligent MLLM agents that can assist researchers throughout the entire research process, from experimental design to data analysis and knowledge discovery.
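The "buildings" stage maps naturally onto a schematic architecture. The toy model below wires together an image encoder, a text encoder, a cross-attention fusion module, and a decoder head; every shape and module is an illustrative placeholder rather than a real bioimage MLLM.

```python
# A schematic sketch of the encoder / fusion / decoder structure the authors
# describe; sizes and modules are illustrative placeholders.
import torch
import torch.nn as nn

class TinyMLLM(nn.Module):
    def __init__(self, vocab=1000, dim=128):
        super().__init__()
        self.image_encoder = nn.Sequential(
            nn.Conv2d(1, dim, kernel_size=16, stride=16),  # patchify the image
            nn.Flatten(2),                                  # (B, dim, patches)
        )
        self.text_encoder = nn.Embedding(vocab, dim)
        self.fusion = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.decoder = nn.Linear(dim, vocab)                # next-token logits

    def forward(self, image, tokens):
        img = self.image_encoder(image).transpose(1, 2)     # (B, patches, dim)
        txt = self.text_encoder(tokens)                     # (B, seq, dim)
        fused, _ = self.fusion(txt, img, img)               # text attends to image
        return self.decoder(fused)

model = TinyMLLM()
logits = model(torch.randn(2, 1, 64, 64), torch.randint(0, 1000, (2, 8)))
print(logits.shape)  # torch.Size([2, 8, 1000])
```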
· Can Large Language Models Provide Useful Feedback on Research Papers? A Large-Scale Empirical Analysis | NEJM AI / Can large language models provide useful feedback on research papers? A large-scale empirical analysis (arxiv.org) - This article presents a large-scale study examining whether large language models (LLMs) like GPT-4 can provide useful feedback on scientific research papers. The study analyzed feedback generated by GPT-4 on over 3,000 papers from Nature journals and 1,700 papers from the ICLR machine learning conference. It found significant overlap between GPT-4's feedback and human reviewer comments, with about 30-40% of GPT-4's comments matching those from individual human reviewers. This level of overlap was comparable to the overlap between different human reviewers. The study also conducted a survey of 308 researchers, finding that over 50% considered GPT-4's feedback helpful for improving their work. However, limitations were noted, including GPT-4's tendency to focus on certain aspects of feedback more than others and sometimes providing less specific comments than human reviewers. The authors conclude that while LLM-generated feedback shows promise as a complementary tool, especially for researchers with limited access to timely expert feedback, it cannot replace thoughtful human expert review in the scientific process.
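A toy version of the study's core measurement, the fraction of LLM comments that have a matching human comment, can be sketched as follows. TF-IDF cosine similarity and the 0.5 cutoff are stand-ins for the paper's actual matching pipeline, which is only described at a high level here.

```python
# A toy overlap measurement: an LLM comment counts as "matched" if it is
# sufficiently similar to at least one human reviewer comment.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def overlap_fraction(llm_comments, human_comments, threshold=0.5):
    vec = TfidfVectorizer().fit(llm_comments + human_comments)
    sims = cosine_similarity(vec.transform(llm_comments),
                             vec.transform(human_comments))
    return (sims.max(axis=1) >= threshold).mean()

llm = ["The sample size is too small to support the claims.",
       "Figure 2 lacks error bars."]
human = ["Claims are not supported by such a small sample size.",
         "The discussion ignores prior work."]
print(overlap_fraction(llm, human))  # fraction of LLM comments with a match
```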
· From LLMs to LLM-based Agents for Software Engineering: A Survey of Current, Challenges and Future (arxiv.org) - This article discusses a comprehensive survey on the evolution and impact of Large Language Models (LLMs) and LLM-based agents in software engineering. The study, conducted by Shoaib Nazir and published on August 10, 2024, addresses the limitations of LLMs in autonomy and self-improvement, and explores how LLM-based agents can potentially overcome these challenges. The research examines the applications of LLMs and LLM-based agents across six key areas of software engineering: requirement engineering, code generation, autonomous decision-making, software design, test generation, and software maintenance. It highlights the growing interest in LLM-based agents, which combine LLMs with decision-making capabilities to enhance autonomy in software development. The study employed a systematic literature review methodology, analyzing 117 relevant papers from late 2023 to May 2024. It identified 79 unique LLMs and used various performance metrics and benchmarks to evaluate their effectiveness. The findings indicate significant advancements in AI for software engineering while also pointing out areas for further research. The paper emphasizes the need for unified standards and benchmarking in the field of LLM-based agents. It concludes that these agents represent a promising evolution in addressing the limitations of traditional models, potentially leading to more autonomous and effective software engineering solutions.
· Climate change and artificial intelligence in healthcare: Review and recommendations towards a sustainable future - ScienceDirect - This article, titled "Climate change and artificial intelligence in healthcare: Review and recommendations towards a sustainable future," examines the intersection of climate change and artificial intelligence (AI) in healthcare. The authors discuss the environmental impact of AI systems, including their energy consumption, carbon footprint, and e-waste generation. They highlight the challenges posed by the increasing adoption of AI in healthcare, such as the energy-intensive nature of AI model training and deployment, and the contribution of data centers to greenhouse gas emissions. The article also explores potential solutions to mitigate the environmental impact of AI in healthcare. These include developing energy-efficient AI models, adopting green computing practices, and integrating renewable energy sources. The authors emphasize the role of AI in optimizing healthcare workflows, reducing resource waste, and facilitating sustainable practices like telemedicine. The review discusses the importance of policy and governance frameworks, global initiatives, and collaborative efforts in promoting sustainable AI practices in healthcare. It outlines best practices for sustainable AI deployment, including eco-design, lifecycle assessment, responsible data management, and continuous monitoring and improvement. The authors conclude by stressing the need for the healthcare industry to prioritize sustainability and environmental responsibility as it continues to embrace AI technologies. They argue that by following best practices for sustainable AI deployment, healthcare organizations can harness the transformative potential of AI while actively contributing to environmental preservation.
· VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs (arxiv.org) / VideoLLaMA 2 Released: A Set of Video Large Language Models Designed to Advance Multimodal Research in the Arena of Video-Language Modeling - MarkTechPost - This article introduces VideoLLaMA 2, a set of advanced Video Large Language Models (Video-LLMs) designed to enhance spatial-temporal modeling and audio understanding in video analysis tasks. The model builds upon its predecessor by incorporating a Spatial-Temporal Convolution (STC) connector and an Audio Branch, allowing for improved capture of spatial and temporal dynamics in video data and seamless integration of audio cues. The researchers evaluated VideoLLaMA 2 on various benchmarks, including multiple-choice video question answering (MC-VQA), open-ended video question answering (OE-VQA), video captioning (VC), and audio-video question answering (AVQA) tasks. The results show that VideoLLaMA 2 consistently outperforms existing open-source models and even approaches the performance of some proprietary models in several benchmarks. The article details the model's architecture, training process, and implementation, emphasizing its ability to handle complex multimodal data. It also presents qualitative examples demonstrating VideoLLaMA 2's capabilities in global scene understanding, spatial-temporal orientation awareness, commonsense reasoning, and fine-grained recognition. The researchers conclude that VideoLLaMA 2 represents a significant advancement in video-language modeling, offering improved performance across various video and audio-oriented tasks. They suggest that the model can serve as a foundation for further development in specialized areas such as long video understanding, video agents, autonomous driving, and robotic manipulation.
· Performance of machine learning algorithms for lung cancer prediction: a comparative approach | Scientific Reports (nature.com) - This article presents a comparative study of machine learning algorithms for predicting lung cancer based on clinical data. The researchers analyzed a dataset containing patient symptoms and habits, with 310 instances and 16 attributes. They applied twelve different machine learning algorithms to the data, including logistic regression, naive Bayes, support vector machines, random forests, and neural networks. The study found that the K-Nearest Neighbor algorithm performed best, achieving 92.86% accuracy, followed closely by Bernoulli Naive Bayes and Gaussian Naive Bayes at 91.07% accuracy. The researchers used confusion matrices, ROC curves, and AUC scores to evaluate model performance. Initial data analysis revealed that alcohol consumption was significantly correlated with lung cancer detection, and symptoms like yellow fingers, coughing, chronic disease, chest pain, and allergies were critical indicators when examining data by gender. The authors suggest that this approach could be useful for early lung cancer detection, potentially allowing for more timely and effective treatment. However, they acknowledge limitations due to the small dataset size and recommend further studies with larger, more comprehensive datasets. They also propose exploring the use of Electronic Health Record (EHR) data and developing a weighting system for specific attributes in lung cancer detection. The study concludes that machine learning techniques show promise for improving lung cancer prediction, but more research is needed to verify and refine these methods for clinical use.
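The study's comparative setup is straightforward to reproduce in outline: fit several standard classifiers on the same train/test split and compare their accuracies. The sketch below uses synthetic data as a stand-in for the 310-patient symptom dataset.

```python
# A minimal version of the paper's comparison: several classifiers, one split.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import BernoulliNB, GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=310, n_features=15, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

models = {
    "KNN": KNeighborsClassifier(),
    "BernoulliNB": BernoulliNB(),
    "GaussianNB": GaussianNB(),
    "LogReg": LogisticRegression(max_iter=1000),
}
for name, clf in models.items():
    clf.fit(X_tr, y_tr)
    print(name, accuracy_score(y_te, clf.predict(X_te)))
```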
· Physics-informed deep generative learning for quantitative assessment of the retina | Nature Communications - This article presents a novel approach for analyzing retinal vasculature using physics-informed generative adversarial networks (PI-GAN). The researchers developed algorithms to create realistic 3D models of retinal blood vessels based on biophysical principles. These models incorporate fully connected arterial and venous trees and can simulate blood flow and fluorescein delivery. The PI-GAN framework allows for the automated segmentation of blood vessels from clinical retinal images without requiring manual labeling. It outperformed human labeling in detecting small vessels in high-resolution images. The method was validated using public datasets and achieved near state-of-the-art performance in vessel segmentation. Additionally, the researchers demonstrated the ability to simulate pathological conditions like diabetic retinopathy and retinal vein occlusion using their model. The approach has potential applications in early disease detection, monitoring progression, and improving patient care in ophthalmology. Overall, this physics-informed deep learning technique offers a promising tool for quantitative assessment of retinal vasculature, with implications for both clinical practice and research into systemic diseases that affect retinal blood vessels. The method's ability to generate realistic synthetic data and perform segmentation without manual labeling addresses key challenges in applying AI to retinal image analysis.
· Computer-aided detection of tuberculosis from chest radiographs in a tuberculosis prevalence survey in South Africa: external validation and modelled impacts of commercially available artificial intelligence software - The Lancet Digital Health - This article presents a comprehensive evaluation of 12 commercially available computer-aided detection (CAD) products for tuberculosis screening using chest X-rays in South Africa, a high tuberculosis and HIV burden setting. The study compared the performance of these AI products against microbiological evidence of tuberculosis in 774 individuals from a national tuberculosis prevalence survey. The results showed that several CAD products, including some not previously evaluated by the World Health Organization, performed similarly well with high accuracy. However, there were notable differences in the thresholds required to achieve target sensitivity and specificity levels across products. The study also found that most CAD products performed worse in older individuals and those with a history of tuberculosis, while performance differences based on HIV status were not statistically significant. The authors emphasize the importance of tailoring threshold selection to specific contexts and populations rather than using a universal threshold. They advocate for on-site operational research to determine optimal thresholds and highlight the need for a coordinated global evaluation effort to keep pace with rapidly evolving AI technologies in tuberculosis screening. The study has some limitations, including potential inaccuracies in self-reported data and limited statistical power for certain subgroup analyses. Nevertheless, it provides valuable insights into the performance of CAD products in a high-burden setting and offers guidance for implementers on product and threshold selection for tuberculosis screening programs.
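The authors' advice to tailor thresholds locally can be illustrated with a short sketch: given CAD scores and microbiological labels from a local sample, pick the cutoff that reaches a target sensitivity and read off the specificity that cutoff buys. The synthetic scores and the 90% target below are illustrative, not values from the study.

```python
# A minimal sketch of local threshold selection from an ROC curve.
import numpy as np
from sklearn.metrics import roc_curve

rng = np.random.default_rng(1)
labels = rng.integers(0, 2, 1000)                        # 1 = confirmed TB
scores = np.clip(labels * 0.3 + rng.normal(0.4, 0.2, 1000), 0, 1)

fpr, tpr, thresholds = roc_curve(labels, scores)
idx = np.argmax(tpr >= 0.90)          # first cutoff meeting 90% sensitivity
print(f"threshold={thresholds[idx]:.2f}, "
      f"sensitivity={tpr[idx]:.2f}, specificity={1 - fpr[idx]:.2f}")
```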
· The diagnostic and triage accuracy of the GPT-3 artificial intelligence model: an observational study - The Lancet Digital Health - This article presents a study comparing the diagnostic and triage accuracy of GPT-3, a general-purpose AI language model, to lay individuals and physicians using 48 validated clinical case vignettes. The study found that GPT-3's diagnostic accuracy (88%) was significantly better than lay individuals (54%) and close to physicians (96%). However, GPT-3's triage accuracy (70%) was comparable to lay individuals (74%) but inferior to physicians (91%). The researchers observed an inverse relationship between case acuity and GPT-3's accuracy, with performance declining as case severity increased. GPT-3 showed good calibration in its confidence scores for both diagnosis and triage predictions. The study highlights the potential of general-purpose AI models in healthcare, particularly for providing broad access to expert diagnostic advice. However, it also raises concerns about biases, misinformation, and the need for safeguards before deploying such models in clinical settings. The authors acknowledge limitations, including the use of synthetic cases and the impact of prompting methods on GPT-3's output. They emphasize the need for further research to validate these findings with larger studies and real-world data, as well as to explore the performance of more recent AI models. Overall, the study suggests that while general-purpose AI models show promise in medical diagnosis, challenges remain in triage accuracy and addressing potential biases and misinformation.
· A deep learning-based model to estimate pulmonary function from chest x-rays: multi-institutional model development and validation study in Japan - The Lancet Digital Health - This article describes the development and validation of a deep learning-based artificial intelligence (AI) model that can estimate pulmonary function measurements (forced vital capacity and forced expiratory volume in 1 second) from chest x-rays. The study used data from over 140,000 x-ray and spirometry pairs from multiple institutions in Japan. The AI model showed strong performance in estimating pulmonary function, with correlation coefficients around 0.90 when tested on external datasets. The model focused primarily on lung regions in the x-rays to make its predictions. The authors suggest this AI tool could be useful as a complementary method to spirometry, especially for patients who have difficulty performing spirometry tests. It may also help customize CT imaging protocols based on estimated lung function. However, the model's performance varied somewhat for specific disease subgroups like COPD and asthma patients. Limitations include the retrospective design, likely focus on Asian patients, and data collection partially during the COVID-19 pandemic. The authors call for further validation in diverse populations and prospective studies. Overall, this AI approach demonstrates the potential to extract dynamic functional information from static chest x-ray images, potentially adding value to routine radiographs in assessing pulmonary function.
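Schematically, the task is image-to-number regression. The toy model below, which is not the study's architecture, maps a chest x-ray tensor to two outputs standing in for forced vital capacity and forced expiratory volume.

```python
# A schematic sketch: a small CNN regressing [FVC, FEV1] from a chest x-ray.
import torch
import torch.nn as nn

class CXRToSpirometry(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(32, 2)   # two regression targets, in litres

    def forward(self, x):
        return self.head(self.features(x))

model = CXRToSpirometry()
print(model(torch.randn(4, 1, 224, 224)).shape)  # torch.Size([4, 2])
```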
· Medical artificial intelligence for clinicians: the lost cognitive perspective - The Lancet Digital Health - This article discusses the challenges of integrating artificial intelligence (AI) into clinical decision-making, particularly in radiology. The authors argue that there is a fundamental mismatch between how clinicians and AI systems make decisions. Clinicians use "ecologically bounded reasoning," relying on contextual cues and expertise to make efficient decisions. In contrast, AI systems are "dataset bounded," using any correlations found in training data without necessarily understanding clinical relevance. The paper highlights the risks of this mismatch, including the potential for AI to use irrelevant or spurious correlations in making diagnoses. It critiques current approaches to human-AI interaction in healthcare, such as the concept of "teaming" and the field of explainable AI (XAI), arguing that these do not adequately address the cognitive differences between humans and AI. The authors propose a new framework for studying human-AI interaction in medicine, based on three levels of analysis: behavioral, cognitive, and cognitive model. They argue that research needs to move beyond surface-level observations to understand the complex cognitive processes involved in clinical decision-making with AI. The article concludes by emphasizing the need for a more comprehensive understanding of clinical cognition in the context of AI use, to ensure the safe and effective implementation of AI in healthcare. It calls for research that acknowledges the fundamental differences between human and AI decision-making, and explores the cognitive, environmental, and neurophysiological aspects of human-AI interaction in clinical settings.
· AlphaFold accelerated discovery of psychotropic agonists targeting the trace amine–associated receptor 1 | Science Advances - This article describes how researchers used computational models of the trace amine-associated receptor 1 (TAAR1) generated by AlphaFold and traditional homology modeling to conduct virtual screening for potential drug compounds. They found that the AlphaFold models outperformed homology models, leading to a higher hit rate of 60% vs 22% for identifying TAAR1 agonists when experimentally testing top virtual screening candidates. The most potent compound discovered using the AlphaFold models showed promising selectivity and exhibited antipsychotic-like effects in mouse behavioral tests that were dependent on TAAR1. The study demonstrates that AlphaFold-generated protein structures can be successfully used for structure-based virtual screening to accelerate drug discovery, potentially opening up new opportunities for finding ligands for previously challenging drug targets. The researchers note that while AlphaFold models performed well, they still have some limitations compared to experimental structures in capturing protein dynamics and diverse ligand binding modes.
· Sharing brain imaging data in the Open Science era: how and why? (thelancet.com) - This article discusses the challenges and opportunities of sharing brain imaging data in the context of Open Science. The authors, members of the European Cluster for Imaging Biomarkers, conducted a survey among senior neuroscientists to identify key issues in data sharing. While the scientific community recognizes the importance of data sharing for accelerating research and improving patient care, several obstacles remain. The main challenges identified include technical aspects (such as data harmonization and standardization), legal barriers (particularly related to the EU's General Data Protection Regulation), and motivational issues. The article provides recommendations for overcoming these challenges, including the use of standardized data formats like the Brain Imaging Data Structure (BIDS), the implementation of FAIR (Findable, Accessible, Interoperable, and Reusable) principles, and the development of clear guidelines for anonymization and data protection. The authors emphasize the need for a harmonized application of data protection regulations across EU countries and call for dedicated funding for data sharing initiatives. They also propose the development of a new credit system that recognizes data sharing as a valuable scientific contribution, linking researchers, datasets, and resulting publications. The article concludes by highlighting the need for political action to address these issues and foster a culture of data sharing in neuroscience research. The authors argue that improving data sharing practices will ultimately accelerate scientific progress and translation of research findings into clinical practice.
· How to set up your first machine learning project in astronomy | Nature Reviews Physics - This article provides guidance on setting up and conducting effective machine learning (ML) projects in astronomy, with principles that are applicable to medical research as well. The authors emphasize the importance of clearly defined objectives, establishing baselines, and rigorous validation practices. Key recommendations include defining specific, measurable goals for ML projects that directly address scientific questions, establishing simple baselines and dummy models to benchmark against before pursuing more complex approaches, and properly handling uncertainties and data limitations, including addressing covariate shift between training and application data. The authors also stress the importance of conducting thorough ablation studies to understand which inputs and model components are truly important, critically examining model performance across different regions of the input space, and producing calibrated uncertainty estimates along with predictions. They emphasize striving to extract interpretable insights from complex models and carefully documenting both successful and unsuccessful approaches. For medical research, these principles could be applied to projects involving disease diagnosis, treatment outcome prediction, or analysis of medical imaging data. The emphasis on rigorous validation, uncertainty quantification, and interpretability is particularly relevant given the high stakes of medical applications. The article's discussion of covariate shift is especially pertinent, as medical training data often comes from controlled studies while real-world application involves more diverse populations. The recommendations on calibrating uncertainties and conducting ablation studies could help produce more robust and trustworthy ML models in medicine. Additionally, the focus on extracting interpretable insights rather than just pursuing high accuracy aligns well with the need for explainable AI in healthcare. Overall, adopting these practices could lead to more reliable and scientifically insightful ML applications in medical research, enhancing the potential for ML to contribute meaningfully to advancing medical knowledge and improving patient care.
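The baseline advice translates directly into practice: before trusting a complex model, check it against a classifier that ignores its inputs. A minimal sketch with scikit-learn's DummyClassifier:

```python
# Establish a dummy baseline first: any proposed model must clearly beat it.
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, random_state=0)
for name, clf in [("dummy (most frequent class)", DummyClassifier()),
                  ("random forest", RandomForestClassifier(random_state=0))]:
    scores = cross_val_score(clf, X, y, cv=5)
    print(f"{name}: {scores.mean():.2f} +/- {scores.std():.2f}")
```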
· AI and ethics: Investigating the first policy responses of higher education institutions to the challenge of generative AI | Humanities and Social Sciences Communications (nature.com) - This article examines the initial responses of leading higher education institutions to the ethical challenges posed by generative AI tools like ChatGPT. The authors analyzed policy documents and guidelines from 30 top universities worldwide, focusing on four key ethical principles derived from international AI ethics frameworks: accountability and responsibility, human agency and oversight, transparency and explainability, and inclusiveness and diversity. The study found that universities generally aim to explore AI's potential benefits while addressing emerging challenges. A common theme is that students must complete assignments based on their own knowledge, with human individuals retaining moral and legal responsibility. Many institutions adopt a decentralized approach, allowing instructors flexibility in determining AI use in their courses while requiring clear communication of policies. Best practices identified include combining preventive measures with soft, dialogue-based procedures for oversight, providing clear guidelines on permissible AI use, and offering centralized resources to ensure equal access and support for all students. The authors note that early responses often drew from traditional AI ethics frameworks rather than academic literature, as institutions sought to quickly address the rapid emergence of powerful generative AI tools in education. The paper concludes that while universities are actively grappling with these ethical challenges, approaches vary and will likely continue to evolve as understanding of AI's impacts on higher education deepens. It suggests further research to track how institutional policies develop over time in response to advancing AI capabilities.
AI TOOLS
· Codestral | Mistral AI (mistral.ai) - Codestral is an open-weight generative AI model developed by Mistral AI for code generation tasks. It supports over 80 programming languages, including popular ones like Python, Java, and C++, and more specialized ones like Swift and Fortran. Codestral can complete code functions, write tests, and fill in partial code, saving developers time and effort. On performance benchmarks it outperforms comparable code models, featuring a larger context window and excelling in tasks like code completion and fill-in-the-middle generation.
· Imagen 3 (arxiv.org) / Google AI Released the Imagen 3 Technical Paper: Showcasing In-Depth Details - MarkTechPost – The article describes the development and evaluation of Imagen 3, a state-of-the-art text-to-image generation model by Google. The model is a latent diffusion model designed to produce high-quality images based on text prompts, performing particularly well in photorealism and complex prompt adherence. It outperforms previous versions, like Imagen 2, and other competing models, such as DALL·E 3 and Midjourney v6, in several aspects, including prompt-image alignment and numerical reasoning. The article details the data curation, evaluation processes, and responsible deployment strategies, emphasizing safety and ethical considerations in its deployment. Extensive human and automated evaluations confirm Imagen 3's leading position in text-to-image generation, despite ongoing challenges in areas like numerical reasoning and complex scene generation.
IN THE MEDIA
· OpenAI Warns ChatGPT Voice May Make People Emotionally Reliant - The New York Times (nytimes.com) - The article discusses OpenAI's recent report acknowledging potential risks associated with the voice response feature of ChatGPT, particularly GPT-4o. The company observed that some users formed unusual emotional bonds with the AI during testing, using language that suggested a connection. OpenAI warns that the humanlike voice capability could lead to anthropomorphization and possibly reduce the need for human interaction. The report highlights both benefits and concerns, such as alleviating loneliness but potentially affecting healthy relationships. It also notes the AI's quick response time, similar to human conversation. OpenAI plans further research on diverse user populations and independent studies to better understand and mitigate risks. The article mentions the broader context of the AI industry, with companies like Apple and Google also developing AI technologies. It references concerns about rushing AI deployment without adequate safety measures. Additionally, it touches on a previous controversy involving actress Scarlett Johansson and OpenAI's use of a voice similar to hers. Overall, the piece emphasizes the need for continued investigation into the long-term effects of human-AI interactions, especially as voice features become more widespread.
· 5 Best Books About Artificial Intelligence - The New York Times (nytimes.com) - This article from The New York Times, written by Stephen Marche, discusses five recommended books about artificial intelligence (AI). Marche critiques the current discourse on AI, noting that it often swings between hype and panic while lacking clear answers to fundamental questions. The recommended books are:
"The Alignment Problem" by Brian Christian (2020), which explores the challenge of aligning machine behavior with human values.
"Artificial Intelligence" by Melanie Mitchell (2019), offering a comprehensive history and explanation of AI development.
"The Algorithm" by Hilke Schellmann (2024), examining AI's impact on human resources and employment practices.
"Progressive Capitalism" by Ro Khanna (2022), proposing political and economic responses to AI's societal effects.
"AI 2041" by Kai-Fu Lee and Chen Qiufan (2021), combining speculative fiction with factual explanations of AI concepts.
Marche emphasizes that these books approach AI with necessary humility, acknowledging the complexity and uncertainty surrounding the technology. He concludes that the most revealing aspect of AI might be what it teaches us about human limitations and problems, rather than those of machines.
PODCASTS
· Application of Artificial Intelligence in medical education: What is the future of AI in medicine? | AMA Update Video | AMA (ama-assn.org) - This podcast transcript discusses the application of artificial intelligence (AI) in medical education. The AMA Update features Dr. Kim Lomis and Dr. Margaret Lozovatsky discussing how medical schools are incorporating AI into their curriculum. The conversation highlights that while AI has been used in medicine for some time, the emergence of generative AI like ChatGPT in 2023 has brought renewed attention to its potential in healthcare. The experts outline five key areas for medical education to focus on regarding AI: foundational knowledge, critical appraisal, AI-enhanced clinical encounters, technical considerations, and understanding risks and unintended consequences. The podcast emphasizes the importance of integrating AI education across all levels of medical training, from undergraduate to continuing professional development. Medical schools are encouraged to incorporate AI concepts into existing curricula, such as evidence-based medicine, clinical reasoning, and communication skills. The discussion also touches on the concept of "precision education," using AI to personalize learning experiences for medical students and residents. This approach aims to make education more effective and potentially improve well-being by targeting individual learning needs. The experts address the challenges of keeping up with the rapid pace of AI development in medicine. They suggest breaking down complex concepts, using applied learning in clinical settings, integrating expertise into institutions, educating faculty, and regularly reviewing curricula to stay current. The podcast concludes by mentioning various resources available through the AMA for educators and practitioners interested in learning more about AI in healthcare, including advocacy principles, research, learning modules, and ethical guidelines.
If you're finding value in our newsletter, why not share the knowledge? Please pass it along to your friends and colleagues and help us expand our reach and improve our content! Thank you!