GPT-4o: a step-up?
Are the best things in life free?


OpenAI recently launched GPT-4o, where "o" stands for "omni." In the coming weeks, the model will gradually become accessible to all ChatGPT users, including those who prefer not to subscribe.

Despite AI proving immensely useful, sales of such technologies have yet to reach the anticipated heights. Other AI products, such as Microsoft Copilot, have met with mixed responses from users.

Multimodal capabilities

Sam Altman, CEO of OpenAI, describes the new model as "natively multimodal": it can create or interpret content across voice, text, and images. Its ability to engage through various media significantly reduces time and costs, particularly for applications requiring instant interaction, such as customer support bots, virtual assistants, and real-time language translation services.
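
For developers, that same multimodality is exposed through OpenAI's API. A minimal sketch using the official openai Python SDK (v1.x) gives a flavour of a single request mixing text and an image; the image URL here is a placeholder, and the snippet assumes an API key in the environment:

```python
# A minimal sketch of a multimodal request with the official `openai`
# Python SDK (v1.x). Assumes OPENAI_API_KEY is set in the environment;
# the image URL is a placeholder, not a real asset.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe what is happening in this image."},
                {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
            ],
        }
    ],
)
print(response.choices[0].message.content)
```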

Higher rate limits, lower costs

Performance-wise, GPT-4o costs half as much as GPT-4 Turbo and runs twice as fast. Paying users get early access to the newest features and higher rate limits, enabling them to process vast amounts of data rapidly. The model can respond to audio inputs in as little as 232 milliseconds, approximating human response times in conversation.

(Source: Chatbot Arena)

Real-time sound and vision

Previously, Voice Mode responses lagged, with average latencies of 2.8 seconds for GPT-3.5 and 5.4 seconds for GPT-4. GPT-4o is a significant improvement. During its unveiling, OpenAI demonstrated its advanced capability to understand and express a range of emotional tones in a storytelling session.



Translation and enhanced memory

GPT-4o's ability to translate languages in real-time fosters seamless cross-cultural communication. Demonstrated during a bilingual exchange between English and Italian at the launch, this feature illustrates the model's potential as a sophisticated tool for international business.
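
The launch demo was spoken, but the underlying translation behaviour can be approximated in text through the API today. The sketch below is illustrative, not OpenAI's own demo code; voice input was not yet available to developers at launch:

```python
# A text-only sketch of the interpreter use case via the API; the live
# demo used voice, which was not yet exposed to developers at launch.
# Assumes the official `openai` SDK and OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system",
         "content": "You are an interpreter. Translate English input to Italian "
                    "and Italian input to English."},
        {"role": "user", "content": "Where can I find the nearest railway station?"},
    ],
)
print(response.choices[0].message.content)
```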

Additionally, its memory feature helps maintain context and conversational coherence, acting like a knowledgeable personal assistant who remembers your preferences and history.

Recognition and conversational depth

The model's new capability to analyse visual expressions and adjust responses based on detected emotions introduces a deeper interaction layer, making exchanges more personalised and emotionally attuned.

Reasoning abilities. GPT-4o set a new benchmark of 88.7% on zero-shot CoT MMLU, which assesses general knowledge, and scored 87.2% on the traditional five-shot no-CoT MMLU, demonstrating robust reasoning capabilities.
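
For readers unfamiliar with the jargon: "zero-shot CoT" asks the model to reason step by step with no worked examples, while "five-shot no-CoT" prepends five solved examples and expects a bare answer. A paraphrased sketch of the two prompt styles (not the official evaluation harness):

```python
# Illustration of the two MMLU prompt styles mentioned above (paraphrased,
# not the official evaluation harness).
zero_shot_cot = (
    "Question: Which planet has the strongest magnetic field?\n"
    "Choices: (A) Earth (B) Jupiter (C) Mars (D) Venus\n"
    "Let's think step by step, then give the final answer."
)

five_shot_no_cot = (
    # Five solved example questions would be inserted here, then:
    "Question: Which planet has the strongest magnetic field?\n"
    "Choices: (A) Earth (B) Jupiter (C) Mars (D) Venus\n"
    "Answer:"
)
print(zero_shot_cot)
```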

Hear for yourself

  • Live language translation (link).
  • Realtime conversational speech (link).
  • Lullabies and whispers (link).
  • Sarcasm (link).
  • Singing (link).

Safety and ethics

OpenAI continues to collaborate with diverse sectors, including government, media, and civil society, to ensure the responsible deployment of its technologies. However, questions remain about how the model addresses privacy and security in facial recognition and audio generation.

Desktop service and future enhancements

The launch also featured a new desktop app and further developments in voice assistant capabilities. Despite these advancements, the model still has rough edges: during one demo it misidentified a smiling man as a wooden surface, underlining the need for ongoing refinement.

Potential industry impact and partnerships

Speculation about a potential partnership with Apple, which could integrate ChatGPT features into the forthcoming iOS 18, is making the rounds.

If the rumours prove true, this move could significantly boost Apple's AI offerings and influence the next iPhone buying cycle. Such a partnership could position OpenAI as a leader in generative AI, potentially reshaping competitive dynamics with rivals such as Google's Gemini.

Which model is right for you?

Choosing between ChatGPT models reminds me of navigating the complexities of the UK's off-peak railway ticket system: it is far from straightforward.

However, for those requiring real-time engagement and cost-effective solutions, GPT-4o appears to stand out.

With its superior rate limits, real-time audio and vision capabilities, and advanced memory features, it seems best suited for marketers looking for dynamic interactions and global reach.

Next up...

... Gemini counterattacks...


What you need to know in a nutshell 

Free. Access is rolling out to all ChatGPT users (free-tier users included, with usage limits), and the API is available to developers.

Text and image input are rolling out now in the API and ChatGPT, with voice and video to follow in the coming weeks.

Native audio understanding: You can chat with the model. Conversation latency has decreased tenfold compared to earlier voice modes.

It sings. Yes, the model can actually sing. Honestly!

Real-time video understanding.

New macOS app.

It is twice as fast as, and half the price of, GPT-4 Turbo.

New multilingual tokenizer: some languages now require up to 4.4x fewer tokens. Russian, for instance, works out roughly 3.5 times cheaper (see the token-count sketch after this list).

Conversational mode will be available to Plus subscribers in the coming weeks.

Advanced audio and video capabilities are now available to limited user groups.
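
On the tokenizer point above: token counts can be checked locally with OpenAI's tiktoken library, which ships GPT-4o's new o200k_base encoding alongside the older cl100k_base used by GPT-4 Turbo. A minimal sketch; the Russian sample sentence is illustrative:

```python
# Compare token counts between GPT-4 Turbo's tokenizer (cl100k_base)
# and GPT-4o's new one (o200k_base) using OpenAI's tiktoken library.
# Requires a tiktoken version that includes the o200k_base encoding.
import tiktoken

old_enc = tiktoken.get_encoding("cl100k_base")  # GPT-4 / GPT-4 Turbo
new_enc = tiktoken.get_encoding("o200k_base")   # GPT-4o

sample = "Добрый день! Подскажите, где ближайшая станция метро?"
print("cl100k_base:", len(old_enc.encode(sample)), "tokens")
print("o200k_base: ", len(new_enc.encode(sample)), "tokens")
```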
