Where LLMs Fail the Most, and How to Fix it
My failed conversations with enterprise GPT support bots

Here I describe my two most recent interactions with AI-powered GPT support bots. Both were awful failures, far worse than the pre-GenAI experience. Indeed, I had to fall back on old-fashioned Google search to get help. This is typical of what hundreds of millions of users now experience every day.

First example.

I receive payments through Stripe. A contact asked me to pay him via Stripe, so I asked the AI support bot how I can pay someone, as opposed to getting paid. After 30 minutes of prompting, I got nowhere and could not extract a meaningful answer (see featured image). In the end, I paid my contact through a different platform.

Second example.

A VC I had started interacting with sent me a few messages, but I never received any of them. I tried to contact my email provider, only to face a GenAI bot. My question was precise: his email address is xyz, mine is abc, his messages do not even show up in my spam box, and I did not block his domain name; how do I fix this? After receiving irrelevant answers, I asked point-blank: can I chat with a real human? Again, irrelevant answers, no matter how I phrased the question. In the end, I told my contact to send messages to an alternate email address.

It is not as if old-fashioned chat or email support from humans is gone. In the end, after spending a lot of time, I was able to locate it, file tickets, and get a resolution; too late in both cases. Had I simply asked Google how to contact support or submit a ticket on Stripe or Titan, I would have saved a lot of time. So that information must exist somewhere on their respective websites, but the underlying LLMs are unaware of it or unable to retrieve it.
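
To make the bottleneck concrete, here is a minimal sketch in Python of what a well-integrated support bot could do: check whether any indexed document is actually relevant to the question, and escalate when none is, instead of improvising an answer. The indexed documents, query, and threshold below are hypothetical illustrations, not any vendor's actual pipeline.

    # Sketch: detect a retrieval gap instead of generating an irrelevant answer.
    # The indexed documents and the relevance threshold are hypothetical.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    # What the bot's index actually contains: the "contact support" page
    # exists on the public website but was never fed to the index.
    indexed_docs = [
        "How to accept payments and receive payouts with your account.",
        "Managing invoices, refunds, and disputes from the dashboard.",
        "Setting up webhooks and API keys for your integration.",
    ]

    query = "How do I contact human support or submit a ticket?"

    vectorizer = TfidfVectorizer(stop_words="english").fit(indexed_docs)
    scores = cosine_similarity(
        vectorizer.transform([query]), vectorizer.transform(indexed_docs)
    )[0]

    if scores.max() < 0.2:  # hypothetical relevance threshold
        print("No indexed document answers this; escalate to a human agent.")
    else:
        print(f"Best matching document: #{scores.argmax()}")

If the relevant page is missing from the index, no amount of clever prompting will surface it; the honest behavior is to admit the gap and hand off.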

Here are my thoughts:

  • Some parts of the corpus are never fed to the underlying LLMs. There is not enough user testing, and the LLMs are not properly configured.
  • Perhaps companies cut human support to pay for the AI replacement, deliberately making human agents difficult to reach.
  • User prompts are not analyzed. If millions of users ask the same question and are angry about the responses they get, it should be easy to detect and address automatically, for instance by augmenting the corpus with the missing material (see the sketch after this list).
  • Offer two levels of AI service: one for advanced users (I do not consider myself an advanced user, but let's pretend I am) and one for people who need very basic help. Advanced users are the most loyal; you do not want to lose them.
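
As a sketch of the prompt-analysis idea in the third bullet, one could cluster logged prompts and flag clusters dominated by negative feedback. In production this would run over millions of prompts; the toy log, cluster count, and alert rule below are purely illustrative assumptions.

    # Sketch: mine the prompt log for clusters of recurring, badly answered
    # questions. The log, cluster count, and alert rule are hypothetical.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.cluster import KMeans

    prompt_log = [  # (prompt, 1 = user satisfied, 0 = thumbs-down)
        ("how do I pay someone from my account", 0),
        ("how can I send money to a contractor", 0),
        ("pay a vendor instead of getting paid", 0),
        ("how do I issue a refund", 1),
        ("can I talk to a real human", 0),
        ("connect me with a live support agent", 0),
    ]

    prompts = [p for p, _ in prompt_log]
    X = TfidfVectorizer().fit_transform(prompts)
    labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

    # Flag clusters where most users were unhappy: each one is a candidate
    # gap in the corpus, worth fixing with new support material.
    for cluster in sorted(set(labels)):
        members = [i for i, lab in enumerate(labels) if lab == cluster]
        unhappy = sum(1 - prompt_log[i][1] for i in members) / len(members)
        if len(members) >= 2 and unhappy > 0.8:  # hypothetical alert rule
            print(f"Corpus gap candidate ({len(members)} prompts, "
                  f"{unhappy:.0%} unhappy), e.g. {prompts[members[0]]!r}")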

I am not criticizing standard LLMs here (I use them a lot, with success) but the way they are integrated into enterprise applications. There is a bottleneck somewhere, and it lies in how they get integrated. True, there could be genuine LLM flaws too, but I suspect that is not the main issue. Or perhaps these companies build their own LLMs, either of poor quality or poorly brought to production.

Anyway, this is the kind of problem we want to solve with LLM 2.0, our home-made AI technology for enterprises: not only higher accuracy at much lower cost, but also making sure it gets properly integrated, eliminating the scenario where (to use an analogy) an LLM vendor speaks Portuguese to business clients who speak Mandarin.

See where LLM 2.0 stands now, here.

ALBERT KWAME YANKEY

ESL/EFL/ESP TEACHER/CORPORATE TRAINER/SPEAKER/HEALTHCARE/AWS CLOUD/IoT/NATURAL LANGUAGE PROCESSING/LEAN SIX SIGMA/SCRUM/DEVOPS/MACHINE LEARNING/DEEP LEARNING/DATA SCIENCE ENGINEER//SOFTWARE DEV/ENGINEER/TECHNICAL WRITER

2mo

Great insight. Thanks a lot for sharing.

Libor Ballaty

Leader | General Manager | Director | Startups | Growth | Innovation | People | Process | Delivery | Customer Success | Professional Services | Partner Enablement | Vendor Mgmt | Technology | Conservation | Sustainability

2mo

This doesn't surprise me at all. I run into this all the time, and it appears that there can't be an LLM behind many of these, because the questions I ask absolutely can't be the first time they're asked. Someone has answered them before, yet nothing gets into the system to answer the same question from other customers. Lots wrong there ... lots of help needed, and lots of opportunities.

Prabhakar Kamble

"In God we trust, all others must bring data"-W.Edwards Deming | Practicing Leadership | Lead Auditor | FMEA expert | ISO | IATF | VDA | AS | ISO TS 22163

2mo

Since it learns from public code repositories, there is a risk of introducing biased or insecure code patterns, potentially leading to vulnerabilities or legal concerns around copyright infringement. So human intelligence is still irreplaceable ;).

Thore-Bjørn Haugen

Founder of Codepage | "Knowledge isn't free. You have to pay attention." (Richard Feynman)

2mo

I believe hybrid support models are indeed essential. However, a seamless escalation process could be enhanced if LLMs provide structured summaries when handing off to human agents. This would not only save time but also improve user satisfaction.

Another key area is leveraging multi-modality. LLMs could dynamically analyze user input and pull in relevant support documents, creating a richer experience.

On the fine-tuning side, while feedback loops are critical, enterprises could explore federated learning or on-device fine-tuning to continuously adapt models in a privacy-preserving and scalable way. This ensures domain-specific improvements without compromising user trust. Additionally, automated analysis of escalation patterns could provide deep insights into recurring failure points, enabling businesses to address gaps proactively and systematically.

Lastly, building trust in AI-powered systems requires transparency. Clearly indicating when users are interacting with an AI versus a human can reduce frustration and increase confidence in the system. I'm curious to see how your LLM 2.0 approach integrates these elements to push these boundaries. Looking forward to following the journey!
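
A minimal sketch of the structured handoff summary this comment describes, assuming a hypothetical ticket schema; every field name below is illustrative, not an existing API.

    # Sketch: structured context passed from the bot to a human agent.
    # The schema and all field names are hypothetical.
    from dataclasses import dataclass, asdict
    import json

    @dataclass
    class HandoffSummary:
        user_question: str         # the user's question, restated verbatim
        bot_attempts: list[str]    # answers the bot already gave
        retrieved_docs: list[str]  # document IDs the bot consulted
        escalation_reason: str     # why the bot is handing off
        sentiment: str             # coarse signal for agent triage

    summary = HandoffSummary(
        user_question="How do I pay a contact, as opposed to getting paid?",
        bot_attempts=["Explained how to receive payouts (irrelevant)."],
        retrieved_docs=["payouts-overview", "invoicing-basics"],
        escalation_reason="No indexed document covers outbound payments.",
        sentiment="frustrated",
    )

    # The human agent starts with full context instead of from zero.
    print(json.dumps(asdict(summary), indent=2))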

Stefana Janicijevic, PhD

Mathematician, Full Stack Data Scientist

2mo

You are probably right about user testing. However, I think Apple's prompts may be more useful than others'. I had success with Apple's user call center and got all the necessary info. But I believe other companies have restricted QA testing.
