Where LLMs Fail the Most, and How to Fix It
Here I describe my two most recent interactions with AI-powered support chatbots. Both were awful failures, far worse than anything I experienced before GenAI. Indeed, I had to revert to an old-fashioned Google search to get help. This is typical of what hundreds of millions of users now experience every day.
First example.
I receive payments through Stripe. A contact asked me to pay him via Stripe, so I asked the AI support how I can pay someone, as opposed to getting paid. After 30 minutes of prompting, I got nowhere, and in the end I paid my contact through a different platform. I could not figure out how to get a meaningful answer: see the featured image.
Second example.
A VC I recently started interacting with sent me a few messages, but I never received any of them. I tried to contact my email provider, but was faced with a GenAI bot, to which I posed the following precise question: his email address is xyz, mine is abc, his messages do not even show up in my spam folder, and I have not blocked his domain name; how do I fix this? After receiving irrelevant answers, I asked point blank: can I chat with a real human? Again, irrelevant answers, no matter how I phrased the question. In the end, I told my contact to send his messages to an alternate email address.
It is not as if old-fashioned chat or email support from humans is gone. In the end, after spending a lot of time, I was able to locate it, file tickets, and get a resolution, though too late in both cases. Had I simply asked Google how to contact support or submit a ticket on Stripe or Titan, it would have saved me a lot of time. So that information must be somewhere on their respective websites, but the underlying LLMs are unaware of it or unable to retrieve it.
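For illustration, here is a minimal, hypothetical sketch of the missing piece: a support bot that retrieves from the vendor's own help pages before answering. The HELP_PAGES snippets and the word-overlap retriever are toy stand-ins of my own invention (a production system would index the real help center and use embedding-based search), but even something this crude would answer "how do I submit a ticket" correctly.

```python
# Hypothetical sketch: ground the support bot in the vendor's own help pages.
# HELP_PAGES and the word-overlap scorer are toy stand-ins; a real deployment
# would index the actual help center and use embedding-based retrieval.

HELP_PAGES = {
    "contact-support": "To reach a human agent, open Help in the dashboard "
                       "and click Contact support to submit a ticket.",
    "receive-payments": "To receive payments, share a payment link or an "
                        "invoice with your customer.",
    "send-payments": "To pay a vendor or contractor, you need a payouts "
                     "product; a standard account only collects money.",
}

def retrieve(query: str, pages: dict[str, str]) -> tuple[str, str]:
    """Return the (page_id, text) pair sharing the most words with the query."""
    query_words = set(query.lower().split())
    def overlap(item: tuple[str, str]) -> int:
        return len(query_words & set(item[1].lower().split()))
    return max(pages.items(), key=overlap)

def answer(query: str) -> str:
    page_id, text = retrieve(query, HELP_PAGES)
    # A grounded bot would feed `text` to the LLM as context; here we simply
    # surface the retrieved passage, with its source, instead of guessing.
    return f"[source: {page_id}] {text}"

if __name__ == "__main__":
    print(answer("how can I talk to a human and submit a ticket"))
    print(answer("how do I pay someone, as opposed to getting paid"))
```

The point is not the retrieval algorithm but the grounding: the answer to both of my questions already sits on the vendor's website, so a bot wired to those pages cannot help but find it.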
Here are my thoughts:
I am not criticizing standard LLMs here (I use them a lot, with success) but the way they are integrated into enterprise applications. There is a bottleneck somewhere, and it lies in how they get integrated. True, there could be some real LLM flaws too, but I suspect they are not the main issue. Or perhaps these companies design their own LLMs, which are either of poor quality or poorly brought to production.
Anyway, this is the kind of problem we want to solve with LLM 2.0, our home-made AI technology for enterprises: not only higher accuracy at much lower cost, but also making sure the technology gets properly integrated, eliminating the scenario where (by analogy) an LLM vendor speaks Portuguese to a client whose businesspeople speak Mandarin.
See where LLM 2.0 stands now, here.
ESL/EFL/ESP TEACHER/CORPORATE TRAINER/SPEAKER/HEALTHCARE/AWS CLOUD/IoT/NATURAL LANGUAGE PROCESSING/LEAN SIX SIGMA/SCRUM/DEVOPS/MACHINE LEARNING/DEEP LEARNING/DATA SCIENCE ENGINEER/SOFTWARE DEV/ENGINEER/TECHNICAL WRITER
2mo
Great insight. Thanks a lot for sharing.
Leader | General Manager | Director | Startups | Growth | Innovation | People | Process | Delivery | Customer Success | Professional Services | Partner Enablement | Vendor Mgmt | Technology | Conservation | Sustainability
2mo
This doesn't surprise me at all. I have run into this all the time, and it appears there can't be an LLM behind many of these systems, because the questions I'm asking absolutely can't be the first time they have been asked; someone has answered them before, yet nothing is getting into the system to answer the same question from other customers. Lots wrong there, lots of help needed, and lots of opportunities.
"In God we trust, all others must bring data"-W.Edwards Deming | Practicing Leadership | Lead Auditor | FMEA expert | ISO | IATF | VDA | AS | ISO TS 22163
2mo
Since these models learn from public code repositories, there is a risk of introducing biased or insecure code patterns, potentially leading to vulnerabilities or to legal concerns regarding copyright infringement. So human intelligence is still irreplaceable ;).
Founder of Codepage | "Knowledge isn't free. You have to pay attention." (Richard Feynman)
2mo
I believe hybrid support models are indeed essential. However, a seamless escalation process could be enhanced if LLMs provided structured summaries when handing off to human agents. This would not only save time but also improve user satisfaction.

Another key area is leveraging multi-modality: LLMs could dynamically analyze user input and pull in relevant support documents, creating a richer experience.

On the fine-tuning side, while feedback loops are critical, enterprises could explore federated learning or on-device fine-tuning to continuously adapt models in a privacy-preserving and scalable way. This ensures domain-specific improvements without compromising user trust. Additionally, automated analysis of escalation patterns could provide deep insights into recurring failure points, enabling businesses to address gaps proactively and systematically.

Lastly, building trust in AI-powered systems requires transparency: clearly indicating when users are interacting with an AI versus a human can reduce frustration and increase confidence in the system.

I'm curious to see how your LLM 2.0 approach integrates these elements to push these boundaries. Looking forward to following the journey!
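The structured handoff this commenter describes could be as simple as a fixed schema the bot must fill in before escalating. Below is a minimal, hypothetical Python sketch; the HandoffSummary fields, the build_handoff helper, and the toy extraction logic are all illustrative assumptions, not any vendor's actual API.

```python
# Hypothetical sketch of a structured bot-to-human handoff. The schema, the
# build_handoff helper, and the toy extraction logic are illustrative
# assumptions, not any vendor's actual API.

from dataclasses import dataclass, asdict
import json

@dataclass
class HandoffSummary:
    customer_goal: str            # what the user is actually trying to do
    facts_collected: list[str]    # details the bot already gathered
    attempted_answers: list[str]  # what the bot tried, so it is not repeated
    escalation_reason: str        # why the bot gave up
    transcript_ref: str = ""      # pointer to the full chat log, if stored

def build_handoff(conversation: list[dict]) -> HandoffSummary:
    """Toy extraction; a real system would have the LLM fill these fields."""
    user_turns = [t["text"] for t in conversation if t["role"] == "user"]
    bot_turns = [t["text"] for t in conversation if t["role"] == "bot"]
    return HandoffSummary(
        customer_goal=user_turns[0] if user_turns else "unknown",
        facts_collected=user_turns[1:],
        attempted_answers=bot_turns,
        escalation_reason="user asked for a human after irrelevant answers",
    )

chat = [
    {"role": "user", "text": "Emails from my contact never arrive."},
    {"role": "user", "text": "Not in spam, and the domain is not blocked."},
    {"role": "bot", "text": "Have you checked your spam folder?"},
    {"role": "user", "text": "Can I chat with a real human?"},
]
print(json.dumps(asdict(build_handoff(chat)), indent=2))
```

With a fixed schema like this, the human agent sees the goal, the facts, and what was already tried at a glance, and the customer never has to repeat themselves.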
Mathematician, Full Stack Data Scientist
2mo
You are probably right about user testing. However, I think Apple's prompts may be more heavily used than others'. I had success with the Apple user call center and got all the necessary info. But I believe other companies have restricted QA testing.