GPT-4o: a step-up?
Are the best things in life free?


OpenAI recently launched GPT-4o, where "o" stands for "omni." In the coming weeks, the model will gradually become accessible to all ChatGPT users, including those who prefer not to subscribe.

Despite AI proving immensely useful, sales of such technologies have yet to reach the anticipated heights. Other AI products, such as Microsoft Copilot, have met with mixed responses from users.

Multimodal capabilities

Sam Altman, CEO of OpenAI, describes the new model as "natively multimodal": it can create or interpret content across voice, text, and images. Its ability to engage through various media significantly reduces time and costs, particularly for applications requiring instant interaction, such as customer support bots, virtual assistants, and real-time language translation services.
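
For developers, that same multimodality is exposed through OpenAI's API. A minimal sketch using the official openai Python SDK (v1.x) gives a flavour of a single request mixing text and an image; the image URL here is a placeholder, and the snippet assumes an API key in the environment:

```python
# A minimal sketch of a multimodal request with the official `openai`
# Python SDK (v1.x). Assumes OPENAI_API_KEY is set in the environment;
# the image URL is a placeholder, not a real asset.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe what is happening in this image."},
                {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
            ],
        }
    ],
)
print(response.choices[0].message.content)
```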

Higher rate limits, lower costs

Performance-wise, GPT-4o costs half as much as GPT-4 Turbo and runs twice as fast. Paying users get early access to the newest features and higher rate limits, enabling them to process vast amounts of data rapidly. The model can respond to audio inputs in as little as 232 milliseconds, approximating human response times in conversation.

(Source: Chatbot Arena)

Real-time sound and vision

Previously, Voice Mode responses lagged, with average latencies of 2.8 seconds for GPT-3.5 and 5.4 seconds for GPT-4. GPT-4o is a significant improvement. During its unveiling, OpenAI demonstrated its advanced capability to understand and express a range of emotional tones in a storytelling session.



Translation and enhanced memory

GPT-4o's ability to translate languages in real-time fosters seamless cross-cultural communication. Demonstrated during a bilingual exchange between English and Italian at the launch, this feature illustrates the model's potential as a sophisticated tool for international business.
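
The launch demo was spoken, but the underlying translation behaviour can be approximated in text through the API today. The sketch below is illustrative, not OpenAI's own demo code; voice input was not yet available to developers at launch:

```python
# A text-only sketch of the interpreter use case via the API; the live
# demo used voice, which was not yet exposed to developers at launch.
# Assumes the official `openai` SDK and OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system",
         "content": "You are an interpreter. Translate English input to Italian "
                    "and Italian input to English."},
        {"role": "user", "content": "Where can I find the nearest railway station?"},
    ],
)
print(response.choices[0].message.content)
```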

Additionally, its memory feature helps maintain context and conversational coherence, acting like a knowledgeable personal assistant who remembers your preferences and history.

Recognition and conversational depth

The model's new capability to analyse visual expressions and adjust responses based on detected emotions introduces a deeper interaction layer, making exchanges more personalised and emotionally attuned.

Reasoning abilities. GPT-4o set a new benchmark of 88.7% on zero-shot CoT MMLU, which assesses general knowledge, and scored 87.2% on the traditional five-shot no-CoT MMLU, demonstrating robust reasoning capabilities.
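
For readers unfamiliar with the jargon: "zero-shot CoT" asks the model to reason step by step with no worked examples, while "five-shot no-CoT" prepends five solved examples and expects a bare answer. A paraphrased sketch of the two prompt styles (not the official evaluation harness):

```python
# Illustration of the two MMLU prompt styles mentioned above (paraphrased,
# not the official evaluation harness).
zero_shot_cot = (
    "Question: Which planet has the strongest magnetic field?\n"
    "Choices: (A) Earth (B) Jupiter (C) Mars (D) Venus\n"
    "Let's think step by step, then give the final answer."
)

five_shot_no_cot = (
    # Five solved example questions would be inserted here, then:
    "Question: Which planet has the strongest magnetic field?\n"
    "Choices: (A) Earth (B) Jupiter (C) Mars (D) Venus\n"
    "Answer:"
)
print(zero_shot_cot)
```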

Hear for yourself

  • Live language translation (link).
  • Realtime conversational speech (link).
  • Lullabies and whispers (link).
  • Sarcasm (link).
  • Singing (link).

Safety and ethics

OpenAI continues to collaborate with diverse sectors, including government, media, and civil society, to ensure the responsible deployment of its technologies. However, questions remain about how the model addresses privacy and security in facial recognition and audio generation.

Desktop service and future enhancements

The launch also featured a new desktop app and further developments in voice assistant capabilities. Despite these advancements, the model still has rough edges: during one demo it misidentified a smiling man as a wooden surface, underlining the need for ongoing refinement.

Potential industry impact and partnerships

Speculation about a potential partnership with Apple, which could integrate ChatGPT features into the forthcoming iOS 18, is making the rounds.

If the rumours prove true, this move could significantly boost Apple's AI offerings and influence the next iPhone buying cycle. Such a partnership could position OpenAI as a leader in generative AI, potentially reshaping competitive dynamics with rivals such as Google's Gemini.

Which model is right for you?

Choosing between ChatGPT models reminds me of navigating the complexities of the UK's off-peak railway ticket system: it is far from straightforward.

However, for those requiring real-time engagement and cost-effective solutions, GPT-4o appears to stand out.

With its superior rate limits, real-time audio and vision capabilities, and advanced memory features, it seems best suited for marketers looking for dynamic interactions and global reach.

Next up...

... Gemini counterattacks...


What you need to know in a nutshell 

Free. Access is rolling out to all ChatGPT users (free-tier users included, with usage limits), and the API is available to developers.

Text and image input are rolling out now in the API and ChatGPT, with voice and video to follow in the coming weeks.

Native audio understanding: You can chat with the model. Conversation latency has decreased tenfold compared to earlier voice modes.

It sings. Yes, the model can actually sing. Honestly!

Real-time video understanding.

New macOS app.

It is twice as fast as, and half the price of, GPT-4 Turbo.

New multilingual tokenizer: some languages now require up to 4.4x fewer tokens. Russian, for instance, works out roughly 3.5 times cheaper (see the token-count sketch after this list).

Conversational mode will be available to Plus subscribers in the coming weeks.

Advanced audio and video capabilities are now available to limited user groups.
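
On the tokenizer point above: token counts can be checked locally with OpenAI's tiktoken library, which ships GPT-4o's new o200k_base encoding alongside the older cl100k_base used by GPT-4 Turbo. A minimal sketch; the Russian sample sentence is illustrative:

```python
# Compare token counts between GPT-4 Turbo's tokenizer (cl100k_base)
# and GPT-4o's new one (o200k_base) using OpenAI's tiktoken library.
# Requires a tiktoken version that includes the o200k_base encoding.
import tiktoken

old_enc = tiktoken.get_encoding("cl100k_base")  # GPT-4 / GPT-4 Turbo
new_enc = tiktoken.get_encoding("o200k_base")   # GPT-4o

sample = "Добрый день! Подскажите, где ближайшая станция метро?"
print("cl100k_base:", len(old_enc.encode(sample)), "tokens")
print("o200k_base: ", len(new_enc.encode(sample)), "tokens")
```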
