Racing toward the horizon
I'm blown away. Speechless. Breathless.
Artificial Intelligence continues to evolve at an unprecedented pace. Researchers and engineers at organisations of every size, with all levels of funding and resources, are building and tuning novel products on what seems like a weekly basis.
Of course, I have to acknowledge the last couple of days: Google’s Project Astra and OpenAI’s GPT-4o are both pushing the boundaries of what AI can achieve.
Project Astra
Unveiled at Google I/O 2024, Project Astra is a new AI initiative that leverages your phone’s camera and voice recognition to answer questions about what you’re looking at. It’s a Gemini-based multimodal AI tool that lets users point their phone’s camera at real-life objects and get a spoken description of them.
One of the impressive features is its ability to continuously scan camera feeds to provide contextual understanding of the world around you. For example, in a press demo, a user pointed a smartphone camera and smart glasses at things and asked Astra to explain what they were. When the person pointed the device out the window and asked “What neighbourhood do you think I’m in?” the AI system was able to identify King’s Cross, London, headquarters of Google DeepMind.
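Astra itself isn’t publicly available, but the basic camera-frame-plus-question pattern can be sketched against the public Gemini API. This is a minimal, illustrative sketch only; the model name, file name, and environment variable are my assumptions, not details from the demo:

```python
# A minimal sketch, assuming the public google-generativeai SDK and a
# GOOGLE_API_KEY in the environment. This approximates the "point a
# camera at something and ask" flow, not Astra's actual internals.
import os
import google.generativeai as genai
from PIL import Image

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])

# Any multimodal Gemini model would do here; flash is a cheap default.
model = genai.GenerativeModel("gemini-1.5-flash")

# Stand in for a live camera frame with a saved snapshot (hypothetical file).
frame = Image.open("street_view.jpg")

# One request mixes an image and a question; the reply comes back as text.
response = model.generate_content(
    [frame, "What neighbourhood do you think I'm in?"]
)
print(response.text)
```

A real assistant would stream frames continuously and keep conversational state, but the core interaction is just this: image plus question in, text out.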
Obviously this is part of Google’s vision for the future of AI assistants: an agent that interacts with the world around it by taking in information, remembering what it sees, processing it, and understanding contextual detail. It has to be noted that it also speaks far more naturally than the current form of Google Assistant, with little lag or delay.
Very impressive.
GPT-4o
OpenAI stole some of Google’s thunder by announcing GPT-4o the previous day but, in terms of what was on show, it was more of an evolution. GPT-4 was already a significant improvement over its predecessors, with a larger model, the ability to ingest and generate multiple data modalities beyond just text, and improved reasoning, logic, and common-sense capabilities.
Here are the key enhancements:
- Natively multimodal: a single model handles text, audio, and vision end to end, rather than chaining separate models together.
- Real-time voice: audio responses arrive in a few hundred milliseconds, close to human conversational latency.
- Noticeably better performance in non-English languages, aided by a more efficient tokeniser.
- Faster and significantly cheaper than GPT-4 Turbo in the API.
- GPT-4-level capability made available to free ChatGPT users.
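For a flavour of what the multimodal side looks like in practice, here is a minimal sketch of a GPT-4o request using the official openai Python SDK; the image URL and prompt are placeholders I’ve assumed for illustration, not anything from the announcement:

```python
# A minimal sketch, assuming the official openai Python SDK (v1+) and an
# OPENAI_API_KEY set in the environment. The image URL is a placeholder.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            # A single message can mix text and image parts.
            "content": [
                {"type": "text", "text": "Describe what's happening in this image."},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/demo.jpg"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
```

The notable design point is that one chat message carries both text and image parts, so the model reasons over them together rather than in separate passes.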
The demos were powerful, despite some of them having a definite whiff of cheese. Also, a somewhat flirtatious tone for your AI may well not be well received by some!
The Future of AI in iOS
If the rumours are to be believed, one of these may well underpin a rebooted Siri (and not before time, eh? You're with me, right?), which would become more conversational and context-aware, providing more accurate and detailed responses. It would also mean users could have more natural, dynamic conversations within apps and get help with task automation.
Looking Ahead
Based on these recent advancements, the next breakthrough in AI is likely to come in the realm of even more advanced multimodal capabilities.
Forthcoming AI systems will not only understand and generate text, but also interpret and generate images, video, and other types of data, providing a more holistic understanding of the world.
The importance of advanced multimodal capabilities in AI systems lies in their potential to provide a more comprehensive and nuanced understanding of the world, similar to how humans interact with their environment. Humans naturally process information from multiple sources (visual, auditory, textual, and more), and an AI that can do the same gains a far more robust, holistic understanding of context.
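To make that concrete, here is one hedged sketch of stitching modalities together with today’s public OpenAI endpoints: transcribe a spoken question with Whisper, then answer it in the context of an image with GPT-4o. The file name and URL are placeholders:

```python
# A minimal sketch of combining modalities with current public APIs:
# speech in, image context, text answer out. Assumes the openai SDK (v1+)
# and OPENAI_API_KEY; "question.m4a" and the URL are hypothetical.
from openai import OpenAI

client = OpenAI()

# 1. Auditory input: speech-to-text via the Whisper endpoint.
with open("question.m4a", "rb") as audio:
    transcript = client.audio.transcriptions.create(
        model="whisper-1", file=audio
    )

# 2. Visual + textual input: pass the transcript alongside an image,
# so the model answers the spoken question in its visual context.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": transcript.text},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/scene.jpg"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
```

Today that stitching happens in application code; the promise of models like GPT-4o and Astra is that it increasingly happens inside a single model.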
In conclusion, these advancements are a significant step towards creating AI that understands and interacts with the world in ways closer to human cognition and experience. This not only broadens the potential applications of AI but also enhances the effectiveness and user-friendliness of AI systems.