Racing toward the horizon
I'm blown away. Speechless. Breathless.
Artificial Intelligence continues to evolve at an unprecedented pace. Researchers and engineers at organisations of every size, with all levels of funding and resources, are building and tuning novel products on what seems like a weekly basis.
Of course, I have to acknowledge the last couple of days: Google’s Project Astra and OpenAI’s GPT-4o are both pushing the boundaries of what AI can achieve.
Project Astra
Unveiled at Google I/O 2024, Project Astra is a new AI initiative that leverages your phone’s camera and voice recognition to answer questions about what you’re looking at. It’s a Gemini-based multimodal AI tool that lets users point their phone’s camera at real-life objects and get a spoken description of them.
One of the impressive features is its ability to continuously scan camera feeds to provide contextual understanding of the world around you. For example, in a press demo, a user pointed a smartphone camera and smart glasses at things and asked Astra to explain what they were. When the person pointed the device out the window and asked “What neighbourhood do you think I’m in?” the AI system was able to identify King’s Cross, London, headquarters of Google DeepMind.
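Astra itself isn’t publicly available, but the basic camera-frame-plus-question pattern can be sketched against the public Gemini API. This is a minimal, illustrative sketch only; the model name, file name, and environment variable are my assumptions, not details from the demo:

```python
# A minimal sketch, assuming the public google-generativeai SDK and a
# GOOGLE_API_KEY in the environment. This approximates the "point a
# camera at something and ask" flow, not Astra's actual internals.
import os
import google.generativeai as genai
from PIL import Image

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])

# Any multimodal Gemini model would do here; flash is a cheap default.
model = genai.GenerativeModel("gemini-1.5-flash")

# Stand in for a live camera frame with a saved snapshot (hypothetical file).
frame = Image.open("street_view.jpg")

# One request mixes an image and a question; the reply comes back as text.
response = model.generate_content(
    [frame, "What neighbourhood do you think I'm in?"]
)
print(response.text)
```

A real assistant would stream frames continuously and keep conversational state, but the core interaction is just this: image plus question in, text out.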
Obviously this is part of Google’s vision for the future of AI assistants: an agent that interacts with the world around it by taking in information, remembering what it sees, processing it, and understanding contextual detail. It has to be noted that it also speaks far more naturally than the current form of Google Assistant, with little lag or delay.
Very impressive.
GPT-4o
OpenAI stole some of Google’s thunder by announcing GPT-4o the previous day but, in terms of what was on show, it was more of an evolution. GPT-4 was already a significant improvement over its predecessors, with a larger model, the ability to ingest and generate multiple data modalities beyond just text, and improved reasoning, logic, and common-sense capabilities.
Here are the key enhancements:
- Natively multimodal: a single model handles text, audio, and vision end to end, rather than chaining separate models together.
- Real-time voice: audio responses arrive in a few hundred milliseconds, close to human conversational latency.
- Noticeably better performance in non-English languages, aided by a more efficient tokeniser.
- Faster and significantly cheaper than GPT-4 Turbo in the API.
- GPT-4-level capability made available to free ChatGPT users.
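For a flavour of what the multimodal side looks like in practice, here is a minimal sketch of a GPT-4o request using the official openai Python SDK; the image URL and prompt are placeholders I’ve assumed for illustration, not anything from the announcement:

```python
# A minimal sketch, assuming the official openai Python SDK (v1+) and an
# OPENAI_API_KEY set in the environment. The image URL is a placeholder.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            # A single message can mix text and image parts.
            "content": [
                {"type": "text", "text": "Describe what's happening in this image."},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/demo.jpg"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
```

The notable design point is that one chat message carries both text and image parts, so the model reasons over them together rather than in separate passes.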
The demos were powerful, despite some of them having a definite whiff of cheese. Also, a somewhat flirtatious tone for your AI may well not be well received by some!
The Future of AI in iOS
If the rumours are to be believed, one of these may well underpin a rebooted Siri (and not before time, eh? You're with me, right?), which would become more conversational and context-aware, providing more accurate and detailed responses. It would also mean users could have more natural, dynamic conversations within apps and get help with task automation.
Looking Ahead
Based on these recent advancements, the next breakthrough in AI is likely to come in the realm of even more advanced multimodal capabilities.
Forthcoming AI systems will not only understand and generate text, but also interpret and generate images, video, and other types of data, providing a more holistic understanding of the world.
The importance of advanced multimodal capabilities in AI systems lies in their potential to provide a more comprehensive and nuanced understanding of the world, similar to how humans interact with their environment. Humans naturally process information from multiple sources (visual, auditory, textual, and more), and an AI that can do the same gains a far more robust, holistic understanding of context.
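To make that concrete, here is one hedged sketch of stitching modalities together with today’s public OpenAI endpoints: transcribe a spoken question with Whisper, then answer it in the context of an image with GPT-4o. The file name and URL are placeholders:

```python
# A minimal sketch of combining modalities with current public APIs:
# speech in, image context, text answer out. Assumes the openai SDK (v1+)
# and OPENAI_API_KEY; "question.m4a" and the URL are hypothetical.
from openai import OpenAI

client = OpenAI()

# 1. Auditory input: speech-to-text via the Whisper endpoint.
with open("question.m4a", "rb") as audio:
    transcript = client.audio.transcriptions.create(
        model="whisper-1", file=audio
    )

# 2. Visual + textual input: pass the transcript alongside an image,
# so the model answers the spoken question in its visual context.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": transcript.text},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/scene.jpg"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
```

Today that stitching happens in application code; the promise of models like GPT-4o and Astra is that it increasingly happens inside a single model.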
In conclusion, these advancements are a significant step towards creating AI that understands and interacts with the world in ways closer to human cognition and experience. This not only broadens the potential applications of AI but also enhances the effectiveness and user-friendliness of AI systems.