9 Areas Where Humans Still Outperform AI

Martin Musiol

GenAI Since 2016 | Keynote Speaker | Author | 43k+ Newsletter

Published Nov 20, 2024

.. and big update from Mistral

AI hasn’t fully taken over yet. Humans still excel in certain areas - of course.

These 9 benchmarks cover vital skills and show how AI measures up against humans.

(Don’t miss out on the quick AI highlights at the bottom.)

✅ Before we start: we launched a premium version of the newsletter. Subscribing gives you full access to all content, exclusive demos, and an ad-free experience. I plan to host AMAs and develop more subscriber-requested demos.

Try Premium for up to 14 days

9 areas where humans still have an edge compared to AI

It might feel like humans are losing their edge against ever-improving AI systems. What value can humans still capture versus AI systems?

Turns out there are several fields.

In my research, I came across these 9 datasets/evaluations showing that humans still have a clear edge over AI systems, at least for some time.

What is WorkArena++?

These are 682 tasks that simulate workflows typical for knowledge workers, testing planning, problem-solving, reasoning, info retrieval, and context understanding. Humans outperform AI thanks to more robust reasoning and contextual grasp.

For AI agents to be truly powerful, they need to excel on this benchmark. I expect great progress here in 2025, to the point where the average human may no longer have a competitive edge.

What is Simple-bench?

Multiple-choice tasks (200+ questions) test spatio-temporal reasoning and social intelligence. High schoolers currently outperform state-of-the-art models.

Humans will maintain a competitive edge for the foreseeable future.

What is ARC-AGI?

Assesses AI's ability to learn new skills and solve open-ended problems via patterns and abstract reasoning. Humans excel due to better generalization and abstract thinking.

A simple concept—covered in a past episode. Humans will likely outperform computers for another 2–3 years.

What is MiniWob?

Web-based tasks test reinforcement learning agents in navigation and interaction. Humans currently lead due to better understanding and adaptability. However, with AI gaining access to web pages via visual, textual, and API channels, the margin is narrowing quickly. By 2025, AI will match or surpass humans in these tasks, and it is already taking over here.

What is WebArena?

Evaluates complex web tasks like info retrieval and form filling. The gap between average human capabilities and AI is shrinking rapidly, similar to MiniWob. While opportunities remain for now, AI will soon close the gap entirely.

What is Putnam Bench?

Tests theorem-proving algorithms with problems from the Putnam Mathematical Competition. While the average human doesn’t have an edge, human experts (PhDs) excel. Interestingly, the AI-human baseline is often mislabeled. Starting next year, AI collaborators will reach parity with human PhDs, significantly accelerating scientific progress.

What is NOCHA?

Evaluates object classification and hierarchical annotation. Humans still outperform AI due to sharper visual perception and contextual understanding. Visual AI has evolved gradually over decades—from early convolutional neural networks to current LLM integrations. For at least the next year, AI won’t surpass the average human in these tasks.

What is GAIA?

Tests generalization across tasks and environments, especially for Internet research. Humans currently excel with natural adaptability. However, AI agents are likely to surpass the average human within 2–3 years. Progress depends not only on smarter AI but also on larger context windows, better comprehension, and improvements in model architecture.

What is Lab-Bench?

Focuses on biology-related lab tasks like experimental design and data analysis. Humans excel with expertise and intuition, but the role is shifting. In the coming years, scientists—biologists, chemists, and physicists—will evolve into research project managers, supported by teams of AI agents handling routine tasks.

Updates from Mistral - Pixtral Large (open source) & Le Chat

Pixtral Large: 124B Parameters of Power

This model is crushing benchmarks.

Top scores on MathVista, DocVQA, and VQAv2
Maintains the strong text skills of Mistral Large 2
Built with a 123B decoder + 1B vision encoder
128K token limit for long documents

Want it? It’s free to download on Hugging Face.

Le Chat Also Just Leveled Up

It now does:

Web search with sources cited for fact-checking
Canvas for brainstorming: Edit, export, create seamlessly
Vision upgrades: Reads images & documents
Flux Pro for stunning image generation
Speculative editing: Predicts & refines text faster than you

And yes, it’s still free. → Le Chat

NVIDIA ALCHEMI Accelerates Sustainable Materials Discovery for EV Batteries and Solar Panels

→ Read the rest here

That’s a wrap! I hope you enjoyed it.

Martin

Follow me on X.com.
Do you write newsletters? I use Beehiiv and highly recommend it.
AI for your org: We build custom AI solutions at half the market price and in half the time (building w/ AI agents). Contact us to learn more.
Would you like to sponsor a post?
My book - Generative AI: Navigating the Course to AGI.
Generativeai.net
