Where LLMs Fail the Most, and How to Fix it
My failed conversations with enterprise GPT support bots

Here I describe my two most recent interactions with AI-powered GPT support bots. Both were awful failures, far worse than the pre-GenAI experience. Indeed, I had to fall back on old-fashioned Google search to get help. This is typical of what hundreds of millions of users now experience every day.

First example.

I receive payments through Stripe. A contact asked me to pay him via Stripe, so I asked the AI support bot how I can pay someone, as opposed to getting paid. After 30 minutes of prompting, I got nowhere and could not extract a meaningful answer (see featured image). In the end, I paid my contact through a different platform.

Second example.

A VC I had started interacting with sent me a few messages, but I never received any of them. I tried to contact my email provider, only to face a GenAI bot. My question was precise: his email address is xyz, mine is abc, his messages do not even show up in my spam box, and I did not block his domain name; how do I fix this? After receiving irrelevant answers, I asked point-blank: can I chat with a real human? Again, irrelevant answers, no matter how I phrased the question. In the end, I told my contact to send messages to an alternate email address.

It is not as if old-fashioned chat or email support from humans is gone. In the end, after spending a lot of time, I was able to locate it, file tickets, and get a resolution; too late in both cases. Had I simply asked Google how to contact support or submit a ticket on Stripe or Titan, I would have saved a lot of time. So that information must exist somewhere on their respective websites, but the underlying LLMs are unaware of it or unable to retrieve it.
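
To make the bottleneck concrete, here is a minimal sketch in Python of what a well-integrated support bot could do: check whether any indexed document is actually relevant to the question, and escalate when none is, instead of improvising an answer. The indexed documents, query, and threshold below are hypothetical illustrations, not any vendor's actual pipeline.

    # Sketch: detect a retrieval gap instead of generating an irrelevant answer.
    # The indexed documents and the relevance threshold are hypothetical.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    # What the bot's index actually contains: the "contact support" page
    # exists on the public website but was never fed to the index.
    indexed_docs = [
        "How to accept payments and receive payouts with your account.",
        "Managing invoices, refunds, and disputes from the dashboard.",
        "Setting up webhooks and API keys for your integration.",
    ]

    query = "How do I contact human support or submit a ticket?"

    vectorizer = TfidfVectorizer(stop_words="english").fit(indexed_docs)
    scores = cosine_similarity(
        vectorizer.transform([query]), vectorizer.transform(indexed_docs)
    )[0]

    if scores.max() < 0.2:  # hypothetical relevance threshold
        print("No indexed document answers this; escalate to a human agent.")
    else:
        print(f"Best matching document: #{scores.argmax()}")

If the relevant page is missing from the index, no amount of clever prompting will surface it; the honest behavior is to admit the gap and hand off.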

Here are my thoughts:

  • Some parts of the corpus are never fed to the underlying LLMs. There is not enough user testing, and the LLMs are not properly configured.
  • Perhaps companies cut human support to pay for the AI replacement, deliberately making human agents difficult to reach.
  • User prompts are not analyzed. If millions of users ask the same question and are angry about the responses they get, it should be easy to detect and address automatically, for instance by augmenting the corpus with the missing material (see the sketch after this list).
  • Offer two levels of AI service: one for advanced users (I do not consider myself an advanced user, but let's pretend I am) and one for people who need very basic help. Advanced users are the most loyal; you do not want to lose them.
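
As a sketch of the prompt-analysis idea in the third bullet, one could cluster logged prompts and flag clusters dominated by negative feedback. In production this would run over millions of prompts; the toy log, cluster count, and alert rule below are purely illustrative assumptions.

    # Sketch: mine the prompt log for clusters of recurring, badly answered
    # questions. The log, cluster count, and alert rule are hypothetical.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.cluster import KMeans

    prompt_log = [  # (prompt, 1 = user satisfied, 0 = thumbs-down)
        ("how do I pay someone from my account", 0),
        ("how can I send money to a contractor", 0),
        ("pay a vendor instead of getting paid", 0),
        ("how do I issue a refund", 1),
        ("can I talk to a real human", 0),
        ("connect me with a live support agent", 0),
    ]

    prompts = [p for p, _ in prompt_log]
    X = TfidfVectorizer().fit_transform(prompts)
    labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

    # Flag clusters where most users were unhappy: each one is a candidate
    # gap in the corpus, worth fixing with new support material.
    for cluster in sorted(set(labels)):
        members = [i for i, lab in enumerate(labels) if lab == cluster]
        unhappy = sum(1 - prompt_log[i][1] for i in members) / len(members)
        if len(members) >= 2 and unhappy > 0.8:  # hypothetical alert rule
            print(f"Corpus gap candidate ({len(members)} prompts, "
                  f"{unhappy:.0%} unhappy), e.g. {prompts[members[0]]!r}")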

I am not criticizing standard LLMs here (I use them a lot, with success) but the way they are integrated into enterprise applications. There is a bottleneck somewhere, and it lies in how they get integrated. True, there could be genuine LLM flaws too, but I suspect that is not the main issue. Or perhaps these companies build their own LLMs, either of poor quality or poorly brought to production.

Anyway, this is the kind of problem we want to solve with LLM 2.0, our home-made AI technology for enterprises: not only higher accuracy at much lower cost, but also making sure it gets properly integrated, eliminating the scenario where (to use an analogy) an LLM vendor speaks Portuguese to business clients who speak Mandarin.

See where LLM 2.0 stands now, here.

ALBERT KWAME YANKEY

ESL/EFL/ESP TEACHER/CORPORATE TRAINER/SPEAKER/HEALTHCARE/AWS CLOUD/IoT/NATURAL LANGUAGE PROCESSING/LEAN SIX SIGMA/SCRUM/DEVOPS/MACHINE LEARNING/DEEP LEARNING/DATA SCIENCE ENGINEER//SOFTWARE DEV/ENGINEER/TECHNICAL WRITER

2mo

Great insight. Thanks a lot for sharing.

Libor Ballaty

Leader | General Manager | Director | Startups | Growth | Innovation | People | Process | Delivery | Customer Success | Professional Services | Partner Enablement | Vendor Mgmt | Technology | Conservation | Sustainability

2mo

This doesn't surprise me at all. I run into this all the time, and it appears that there can't be an LLM behind many of these, because the questions I ask absolutely can't be the first time they're asked. Someone has answered them before, yet nothing gets into the system to answer the same question from other customers. Lots wrong there ... lots of help needed, and lots of opportunities.

Prabhakar Kamble

"In God we trust, all others must bring data"-W.Edwards Deming | Practicing Leadership | Lead Auditor | FMEA expert | ISO | IATF | VDA | AS | ISO TS 22163

2mo

Since it learns from public code repositories, there is a risk of introducing biased or insecure code patterns, potentially leading to vulnerabilities or legal concerns around copyright infringement. So human intelligence is still irreplaceable ;).

Thore-Bjørn Haugen

Founder of Codepage | "Knowledge isn't free. You have to pay attention." (Richard Feynman)

2mo

I believe hybrid support models are indeed essential. However, a seamless escalation process could be enhanced if LLMs provide structured summaries when handing off to human agents. This would not only save time but also improve user satisfaction.

Another key area is leveraging multi-modality. LLMs could dynamically analyze user input and pull in relevant support documents, creating a richer experience.

On the fine-tuning side, while feedback loops are critical, enterprises could explore federated learning or on-device fine-tuning to continuously adapt models in a privacy-preserving and scalable way. This ensures domain-specific improvements without compromising user trust. Additionally, automated analysis of escalation patterns could provide deep insights into recurring failure points, enabling businesses to address gaps proactively and systematically.

Lastly, building trust in AI-powered systems requires transparency. Clearly indicating when users are interacting with an AI versus a human can reduce frustration and increase confidence in the system. I'm curious to see how your LLM 2.0 approach integrates these elements to push these boundaries. Looking forward to following the journey!
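
A minimal sketch of the structured handoff summary this comment describes, assuming a hypothetical ticket schema; every field name below is illustrative, not an existing API.

    # Sketch: structured context passed from the bot to a human agent.
    # The schema and all field names are hypothetical.
    from dataclasses import dataclass, asdict
    import json

    @dataclass
    class HandoffSummary:
        user_question: str         # the user's question, restated verbatim
        bot_attempts: list[str]    # answers the bot already gave
        retrieved_docs: list[str]  # document IDs the bot consulted
        escalation_reason: str     # why the bot is handing off
        sentiment: str             # coarse signal for agent triage

    summary = HandoffSummary(
        user_question="How do I pay a contact, as opposed to getting paid?",
        bot_attempts=["Explained how to receive payouts (irrelevant)."],
        retrieved_docs=["payouts-overview", "invoicing-basics"],
        escalation_reason="No indexed document covers outbound payments.",
        sentiment="frustrated",
    )

    # The human agent starts with full context instead of from zero.
    print(json.dumps(asdict(summary), indent=2))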

Stefana Janicijevic, PhD

Mathematician, Full Stack Data Scientist

2mo

You are probably right about user testing. However, I think Apple's prompts may be more useful than others'. I had success with Apple's user call center and got all the necessary info. But I believe other companies have restricted QA testing.
