Voxel51 Filtered Views Newsletter – July 26, 2024
Welcome to Voxel51’s weekly digest of the latest trending AI, machine learning and computer vision news, events and resources! Subscribe to the email version.
📰 The Industry Pulse
👁️ Could the secret to unmasking deepfakes be hiding in plain sight, right in the eyes of the beholder?
Researchers at the University of Hull have developed a technique to identify AI-generated fake images by examining the reflections in people’s eyes. The method compares the consistency of light reflections between the left and right eyeballs: in real images these reflections are typically consistent, while in deepfakes they often differ. To quantify this, the researchers used the Gini coefficient, which measures how evenly light is distributed across an image’s pixels, and compared its values for the left and right eyeballs.
This method, inspired by techniques used in astronomy to study galaxies, could provide a new weapon in the ongoing battle against deepfakes.
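To make the comparison concrete, here is a minimal NumPy sketch of the core arithmetic. It is not the researchers’ code: it assumes the two eye-reflection patches are already cropped grayscale arrays, uses the Gini formula common in galaxy morphology, and the `reflection_asymmetry` score and toy patches are hypothetical.

```python
import numpy as np

def gini(pixels) -> float:
    """Gini coefficient of pixel intensities, in the form used in galaxy morphology
    (Lotz, Primack & Madau 2004): 0 means light is spread evenly over the pixels,
    1 means all the light sits in a single pixel."""
    x = np.sort(np.abs(np.asarray(pixels, dtype=float).ravel()))
    n = x.size
    if n < 2 or x.sum() == 0:
        return 0.0
    i = np.arange(1, n + 1)
    return float(np.sum((2 * i - n - 1) * x) / (x.mean() * n * (n - 1)))

def reflection_asymmetry(left_eye, right_eye) -> float:
    """Hypothetical score: absolute difference between the Gini coefficients of the
    two (already cropped) eye-reflection patches. Larger means less consistent."""
    return abs(gini(left_eye) - gini(right_eye))

# Toy 16x16 patches: dim background plus a bright specular highlight.
base = np.full((16, 16), 0.1)
left = base.copy()
left[4:6, 4:6] = 1.0            # small highlight
right = base.copy()
right[4:6, 5:7] = 1.0           # same highlight, shifted by one pixel
fake_right = base.copy()
fake_right[2:8, 2:8] = 1.0      # much larger, mismatched highlight

print(reflection_asymmetry(left, right))       # 0.0: same light distribution
print(reflection_asymmetry(left, fake_right))  # clearly larger
```

The toy patches only illustrate the arithmetic; a real pipeline would also need eye localization and a decision threshold, neither of which is modeled here.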
Key takeaways:
This innovative approach demonstrates how techniques from one scientific field (astronomy) can be creatively applied to solve problems in another area (image authentication), showcasing the potential for interdisciplinary research in addressing modern technological challenges.
As AI-generated images become increasingly sophisticated, how might this astronomical approach to deepfake detection evolve to stay ahead of the curve? The answers may lie in the stars – or in this case, the eyes – but you’ll have to dive deeper into the article to uncover the full scope of this intriguing research.
🧠 Could the next big leap in AI come from a model within a model?
The AI landscape, long dominated by transformer architectures, is now seeing a surge of interest in alternative model architectures.
Transformers, which power notable models like OpenAI’s Sora and GPT-4, are hitting computational efficiency roadblocks: the cost of self-attention grows quadratically with the length of the input. Among the alternatives, test-time training (TTT) models are emerging as a promising contender. In a TTT layer, the hidden state is itself a small machine learning model whose weights are updated by gradient steps as the input is processed, hence the “model within a model.” These models, developed by a team from Stanford, UC San Diego, UC Berkeley, and Meta, could potentially process vast amounts of data more efficiently than current transformer models.
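To illustrate the “model within a model” idea, here is a minimal NumPy sketch in the spirit of the paper’s TTT-Linear layer: the hidden state is the weight matrix of a linear model, updated by one gradient step of a self-supervised reconstruction loss per token. The fixed projections, fixed learning rate, and the `ttt_linear` name are simplifying assumptions; the paper learns the projections in an outer training loop and adds further refinements, so treat this as an illustration rather than the authors’ implementation.

```python
import numpy as np

def ttt_linear(xs, theta_k, theta_v, theta_q, lr=0.1):
    """Minimal test-time-training layer in the spirit of TTT-Linear (sketch).

    The hidden state W is the weight matrix of a linear inner model. For each token,
    W takes one gradient step on the self-supervised loss ||W k - v||^2 (reconstruct a
    'value' view of the token from a 'key' view), then produces the output from a
    'query' view using the updated W. Projections are fixed here (assumption)."""
    W = np.zeros((theta_v.shape[0], theta_k.shape[0]))  # hidden state = inner model weights
    outputs = []
    for x in xs:
        k, v, q = theta_k @ x, theta_v @ x, theta_q @ x
        err = W @ k - v                   # inner model's reconstruction error
        grad = 2.0 * np.outer(err, k)     # gradient of ||W k - v||^2 with respect to W
        W = W - lr * grad                 # "training" step = hidden-state update
        outputs.append(W @ q)             # output uses the freshly updated inner model
    return np.stack(outputs), W

# Toy usage: the same unit-norm token repeated 20 times, identity projections.
d = 4
x = np.eye(d)[0]
xs = np.stack([x] * 20)
identity = np.eye(d)
outputs, W = ttt_linear(xs, identity, identity, identity, lr=0.1)
print(outputs[0])                       # about 0.2 * x after a single update
print(np.linalg.norm(outputs[-1] - x))  # small: the inner model has fit the token
```

Because the layer’s state is a fixed-size model trained on the sequence itself, the work per token stays constant as the sequence grows, whereas self-attention compares each token with all the tokens before it.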
Key takeaways:
As the AI landscape evolves, will TTT models revolutionize the field by overcoming the limitations of transformers? While it’s too early to say for certain, the race for more efficient AI architectures is heating up, and the implications for the future of generative AI are fascinating to consider.
🤖 Could spatial intelligence be the key to unlocking the next level of AI reasoning?
Fei-Fei Li, a renowned computer scientist known as the “godmother of AI,” is reportedly building a startup focused on enhancing AI’s spatial intelligence, a subfield of Visual AI that involves developing algorithms capable of realistically extrapolating images into three-dimensional reconstructions.
World Labs, which has reached a valuation of over $1 billion in just four months, aims to enhance AI’s reasoning capabilities by developing human-like visual data analysis. The company is developing a framework for understanding the three-dimensional physical world, including object dimensions, spatial location, and functionalities.
Key takeaways:
💎 GitHub Gems
LivePortrait is a project for efficient portrait animation.
The main goal of this framework is to synthesize lifelike videos from a single source image, using it as an appearance reference, while deriving motion (facial expressions and head pose) from a driving video, audio, text, or generation.
Key aspects of LivePortrait:
The inference code and models for LivePortrait are publicly available on GitHub. The authors have made it super easy to get up and running. The documentation is pretty good, and they seem quite responsive to GitHub issues. Alternatively, you can give this a try on Hugging Face Spaces.
📙 Good Reads
How deeply can AI usage alter our cognitive processes and work habits?
Nick Potkalitsky explores the cognitive impact of extended AI interaction in his newsletter “Educating AI.”
Drawing on personal experience and theoretical insights, he discusses how immersive engagement with AI tools like ChatGPT and Claude can lead to subtle yet significant changes in our thinking patterns and self-perception, particularly in writing and content creation. Potkalitsky shares insights from an intensive six-day period of AI-assisted work, during which he noticed several changes in his mental processes and work patterns.
Here’s a breakdown of the effects Potkalitsky discusses:
These effects raise important questions about AI’s long-term impact on cognition, creativity, and sense of self. How can we harness AI’s benefits while mitigating these potential cognitive side effects? Potkalitsky’s article offers valuable insights into this complex issue, encouraging readers to reflect on their AI interactions and their implications for education and cognition.
🎙️ Good Listens: AI Consciousness and the Space of Possible Minds
This week’s recommendation is from the ML Street Talk podcast, hosted by Tim Scarfe.
Tim’s guest this week is Murray Shanahan, a principal research scientist at Google DeepMind and professor of cognitive robotics at Imperial College London. The episode explores the intersection of artificial intelligence, consciousness, and philosophy, with a healthy dose of Ludwig Wittgenstein’s ideas and philosophies mixed in.
This episode offers a unique blend of cutting-edge AI research and classical philosophy. Shanahan’s application of Wittgensteinian concepts to modern AI challenges provides fresh insights into both fields. I felt like I came away with a deeper understanding of the philosophical questions surrounding AI consciousness and the limitations of our current language in describing AI phenomena.
Key points from the episode:
Whether you’re an AI enthusiast, a philosophy buff, or simply curious about the intersection of technology and human understanding, this episode provides thought-provoking content that will challenge your perspectives on artificial intelligence and consciousness.
👩🏽‍🔬 Interesting Research
As the scale of data and computation in machine learning continues to grow exponentially, the pursuit of efficiency becomes increasingly important.
The paper “Data curation via joint example selection further accelerates multimodal learning” presents an approach that could revolutionize how we train large-scale multimodal models. By introducing a method that intelligently selects batches of data rather than individual examples, the authors demonstrate remarkable improvements in training speed and computational efficiency.
This work challenges our current understanding of data curation and opens up new possibilities for scaling machine learning models more effectively. The authors achieve state-of-the-art performance with up to 13 times fewer iterations and 10 times less computation. This method, called JEST (multimodal contrastive learning with joint example selection), reveals new insights into the importance of batch composition in machine learning.
To analyze this groundbreaking research, we’ll use the PACES method, which breaks down the paper into its key components: Problem, Approach, Claim, Evaluation, and Substantiation.
Problem
Current data curation methods for large-scale multimodal pretraining are inefficient: they select individual data points and ignore the importance of batch composition. This matters in multimodal contrastive learning because each example’s loss depends on the other examples in its batch, which serve as its negatives. The authors therefore explore whether jointly selecting batches of data is more effective for learning than selecting examples independently, with the goal of speeding up multimodal learning through a novel data curation method.
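To see why batch composition matters, consider that a contrastive loss scores every image against every caption in its batch, so the same image-caption pair can be cheap or expensive to learn from depending on its batch-mates. The toy NumPy sketch below shows this with a SigLIP-style sigmoid contrastive loss; the loss choice, the temperature `t` and bias `b`, and the hand-picked 3-D embeddings are illustrative assumptions, not the paper’s setup.

```python
import numpy as np

def per_example_sigmoid_loss(img, txt, t=10.0, b=-5.0):
    """Per-image SigLIP-style sigmoid contrastive loss for a batch of paired embeddings.
    Row i sums the loss on its matching caption and on every other caption in the batch."""
    img = img / np.linalg.norm(img, axis=1, keepdims=True)
    txt = txt / np.linalg.norm(txt, axis=1, keepdims=True)
    logits = t * (img @ txt.T) + b
    labels = 2.0 * np.eye(len(img)) - 1.0  # +1 on the diagonal (true pairs), -1 elsewhere
    return np.logaddexp(0.0, -labels * logits).sum(axis=1)  # -log(sigmoid(label * logit))

anchor_img, anchor_txt = np.array([1.0, 0.0, 0.0]), np.array([1.0, 0.0, 0.0])
easy_img, easy_txt = np.array([0.0, 1.0, 0.0]), np.array([0.0, 1.0, 0.0])  # unrelated pair
hard_img, hard_txt = np.array([0.0, 0.0, 1.0]), np.array([1.0, 0.0, 0.5])  # caption close to the anchor's

# Same anchor pair, two different batch-mates.
loss_easy = per_example_sigmoid_loss(np.stack([anchor_img, easy_img]), np.stack([anchor_txt, easy_txt]))
loss_hard = per_example_sigmoid_loss(np.stack([anchor_img, hard_img]), np.stack([anchor_txt, hard_txt]))
print(loss_easy[0], loss_hard[0])  # the anchor's loss is far higher next to the hard negative
```

Scoring examples one at a time cannot capture this interaction, which is the gap joint batch selection targets.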
Approach
The researchers developed a method called JEST (multimodal contrastive learning with joint example selection), which:
The main contributions of this paper include:
Claim
JEST significantly accelerates multimodal learning, achieving state-of-the-art performance with up to 13 times fewer iterations and 10 times less computation than current methods. The significance of this work lies in its potential to:
Evaluation
The authors evaluated their approach through several experiments:
Substantiation
The evaluation strongly supports the paper’s claim. The results demonstrate that JEST and Flexi-JEST consistently outperform baseline methods and achieve comparable or better performance with significantly fewer iterations and less computation. The authors provide extensive ablation studies and analyses that further substantiate their claims about the effectiveness of joint example selection in accelerating multimodal learning.
In summary, this paper presents a novel approach to data curation in multimodal learning that shows promise in significantly accelerating training while maintaining or improving performance on downstream tasks. The method’s ability to bootstrap from smaller, well-curated datasets to improve learning on larger datasets could have broad implications for efficient large-scale model training.
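As we understand it, JEST’s selection rests on “learnability”: a batch is worth training on when its loss is high under the model being trained but low under a reference model pretrained on a smaller, well-curated dataset. The NumPy sketch below illustrates that score and a joint selection loop under stated assumptions. It uses a SigLIP-style sigmoid loss with illustrative constants, swaps the paper’s sampling-based sub-batch selection for a simpler greedy search, represents each model only by its embeddings, and all function names are hypothetical.

```python
import numpy as np

def sigmoid_batch_loss(img, txt, t=10.0, b=-5.0):
    """SigLIP-style sigmoid contrastive loss, averaged over the images in the batch.
    t (temperature) and b (bias) are illustrative constants, not values from the paper."""
    img = img / np.linalg.norm(img, axis=1, keepdims=True)
    txt = txt / np.linalg.norm(txt, axis=1, keepdims=True)
    logits = t * (img @ txt.T) + b
    labels = 2.0 * np.eye(len(img)) - 1.0  # +1 for matching pairs, -1 for all other pairs
    # -log(sigmoid(z)) == log(1 + exp(-z)), computed stably with logaddexp
    return float(np.logaddexp(0.0, -labels * logits).sum() / len(img))

def learnability(idx, learner, reference):
    """Learnability of a sub-batch: loss under the learner minus loss under the reference model."""
    idx = list(idx)
    return (sigmoid_batch_loss(learner[0][idx], learner[1][idx])
            - sigmoid_batch_loss(reference[0][idx], reference[1][idx]))

def greedy_joint_select(learner, reference, batch_size):
    """Grow a sub-batch from the super-batch one example at a time. Each candidate is scored
    together with the examples already chosen, so the score depends on batch composition.
    (Greedy stand-in for the paper's sampling-based selection.)"""
    remaining = list(range(len(learner[0])))
    selected = []
    while len(selected) < batch_size and remaining:
        best = max(remaining, key=lambda j: learnability(selected + [j], learner, reference))
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy super-batch of 6 image-caption pairs, given as (image, text) embedding arrays.
e = np.eye(10)
img = e[:6]                             # image embeddings e0..e5, shared by both models here
learner_txt = e[[0, 1, 6, 7, 8, 9]]     # learner already matches pairs 0-1
reference_txt = e[[0, 1, 2, 3, 8, 9]]   # reference matches 0-3; pairs 4-5 look like noise
learner, reference = (img, learner_txt), (img, reference_txt)
print(sorted(greedy_joint_select(learner, reference, batch_size=2)))  # [2, 3]
```

The selection skips pairs the learner has already mastered (0 and 1) and pairs even the reference model cannot match (4 and 5), keeping the ones that are learnable but not yet learned.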
🗓️ Upcoming Events
Check out these upcoming AI, machine learning and computer vision events! View the full calendar and register for an event.