The full stop problem: RAG’s biggest limitation

RAG (Retrieval-Augmented Generation) systems are quickly becoming the go-to solution for those looking to leverage AI in business. But RAG systems have some inherent limitations that negatively impact user experience and prevent scale. One limitation, perhaps the biggest, is what I call ‘the full stop problem’.

In this article, I’ll explain what that is, and give you some ideas for how to incorporate RAG into a more holistic solution that provides a great customer experience and enables you to scale without limitations.

What are RAG systems good for?

RAG systems are good for information searching. That’s the thing that makes RAG a key revelation for chatbots. Before, you’d have to build an NLU model trained on all of the different ways users could ask all of their various questions. Now, semantic search based on vectorised data cuts out the need for these NLU models and does a much better job at matching a user input with relevant content.
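
To put that in concrete terms, here’s a minimal sketch of semantic search over vectorised content. This is illustrative only: the embed() function is a hypothetical stand-in for whatever embedding model you use, and the ranking is plain cosine similarity.

```python
# Minimal sketch of semantic search over vectorised content (illustrative only).
# embed() is a placeholder for whatever embedding model or API you use.

import numpy as np


def embed(text: str) -> np.ndarray:
    """Placeholder for a call to an embedding model or API."""
    raise NotImplementedError


def top_matches(query: str, documents: list[str], k: int = 3) -> list[str]:
    """Rank documents by cosine similarity to the query embedding."""
    q = embed(query)
    scored = []
    for doc in documents:
        d = embed(doc)
        similarity = float(np.dot(q, d) / (np.linalg.norm(q) * np.linalg.norm(d)))
        scored.append((similarity, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:k]]
```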

Those deploying conversational interfaces for the first time, though, tend to think RAG is all they need. For them, there are unknown unknowns related to RAG’s limitations. Here is where the problem begins.


Why RAG can’t solve all user needs

See, many organisations that are deploying RAG chatbots believe that these things will answer all of their users’ questions. The problem is that users don’t just have questions. They don’t always want to know something. Sometimes, they want to do something.

It’s this doing of something that is the domain of AI agents and automation. The reality is that businesses have, for decades, been automating this ‘doing of something’ with chatbot platforms based on deterministic business logic. Apparently, according to Google, these deterministic automations now qualify as ‘AI agents’ too.

Whatever your technology, and whether it’s rule-based or agentic, the fact remains that finding answers to questions isn’t the only need your users have. Purely RAG-based systems therefore have a grey area they can’t deal with.

The grey area between RAG and AI agents/process automation

So, if RAG systems know things and AI agents do things, how can the two work together? The reality is that if you’re deploying a RAG chatbot today, you’re not going to have any agents or automations that are doing things on day 1. Many businesses don’t even realise they need them.

Even if you do realise you need them, and you’ve built them, how can you understand where the RAG should end and the agent or automation should take over? This grey area between the two is where the full stop problem lies.

The full stop problem

The full stop problem is the tendency of RAG systems to end every response with a full stop (or a ‘period’, if you’re on the other side of the pond). And they end with a full stop no matter what, whether the user is asking a question or trying to complete a task.

A digression on turn-taking

In conversation design, there’s the concept of turn-taking. Each party in the conversation takes turns throughout to achieve an end result. The key to turn-taking is inviting the next turn. RAG systems almost always fail at turn-taking.

This is because they don’t know the type of query, the context and subtext behind it, or what stage of the journey or lifecycle the customer is in. Instead, all user inputs are responded to based on the content they have access to. For RAG, everything is a question that needs an answer.

Inability to recognise query type

I’ll give you an example. Here’s a little test. Let’s imagine I have a telco RAG chatbot that has the following exchange with a user:

User: “I’m moving house and need to update my address”

Chatbot: “That’s not a problem! To update your address, simply log into your account and head to My Details. You can update your address there.”

Do you see any issue with that exchange?

The user is telling the chatbot that they want to do something, but the chatbot is acting like they want to know something. It’s treating the query like a question. A question with an answer that ends in a full stop.

But this is what RAG systems do. They’re search solutions that retrieve and summarise content based on the user input. So when all you have is a hammer, everything looks like a nail.

Solving the query type problem

When a user wants to do something, this need for action should impact turn-taking and turn full stops into invitations. For example, a better response to the above exchange would be:

User: “I’m moving house and need to update my address”

Chatbot: “That’s not a problem! I can do that for you. You just need to be logged in first. Let’s start with that, shall we?”

This seems fairly straightforward, but it isn’t just about turn-taking, or about making the chatbot give a better response. It’s about understanding the difference between knowledge and processes and having the capabilities to deliver both.

You have to be able to understand what’s an action and what’s a question. You then need to understand the best way to enable users to take the action once it’s understood. Is it something the chatbot can or should do? Is it something that needs to be done on a webpage, in the app, while logged in, or over the phone? For the above example query to result in that response, you need to be able to deliver against that use case, and that requires more than RAG.

Architecturally, what needs to happen here is a separation between search and action. Typically, this will involve a classification model sitting before the RAG solution that determines whether the query is a question or an action. If it’s an action, route to the appropriate automation or solution; for anything else, search.

Here’s an example of what I mean.

How an AI chatbot should differentiate between a query and an action.
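
In code, that separation might look something like the sketch below. This is purely illustrative: the intent labels and the classify_intent, run_automation and search_knowledge_base helpers are hypothetical placeholders for whatever classifier, automations and RAG pipeline you actually run.

```python
# Hypothetical sketch: classify the query first, then route to an automation
# (action) or to RAG search (question). All helpers are placeholders.

ACTION_INTENTS = {"update_address", "cancel_contract", "book_engineer"}


def classify_intent(user_input: str) -> str:
    """Placeholder for an LLM- or classifier-based intent model.

    Returns a label such as 'update_address' or 'general_question'."""
    raise NotImplementedError


def run_automation(intent: str, user_input: str) -> str:
    """Placeholder: hand the conversation to the flow or agent that can
    actually complete the task."""
    raise NotImplementedError


def search_knowledge_base(user_input: str) -> str:
    """Placeholder: standard RAG retrieval and summarisation."""
    raise NotImplementedError


def respond(user_input: str) -> str:
    intent = classify_intent(user_input)
    if intent in ACTION_INTENTS:
        # The user wants to DO something: route to an automation or agent.
        return run_automation(intent, user_input)
    # Otherwise treat it as an information need and fall back to RAG search.
    return search_knowledge_base(user_input)
```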

Even if you don’t have the capability to fulfil the action right away, you should at least understand the difference so that you can point your user in the right direction.

So this is the first symptom of the full stop problem. The second is the inability to recognise subtext.

Inability to recognise subtext

Let’s have another example to illustrate the point:

User: “How do I upgrade my phone contract?”

Chatbot: “To upgrade your contract, it’s simple. All you need to do is pay off your remaining contract, select a new pricing plan based on the data and minutes you need and then take out another contract.”

Did you get what’s wrong here?

You might think that this is the exact same issue as above, but it isn’t. There’s a difference. This user query was definitely a question, and the user is expecting an answer. But their query has subtext.

The user wasn’t explicitly asking to carry out the action of upgrading their contract, but that’s the underlying need they have. That’s the subtext. That’s what’s next. They wouldn't be asking the question if they weren’t at least interested in upgrading.

For a human, it’s blindingly obvious that this conversation is a contract upgrade conversation. An action. For a RAG system, it’s just another search, summary and full stop.

Solving the subtext problem

To solve this problem, you need to fully understand your world and how each piece of content in your RAG solution relates to which product, service or journey you offer.

You need to have a customer intent taxonomy that lists all of the needs your customers have. Each of these intents needs to have a preferred channel of resolution documented and all associated content needs to be tagged to reflect this.

For example, in the above telco example, maybe you don’t have an AI agent that can handle upgrades. Maybe that’s best done by speaking with a live human agent. Here, your intent taxonomy will have ‘Live Chat’ as your preferred channel. This means that anytime a conversation happens in the chatbot about upgrades, the chatbot is going to suggest that the user should speak to someone and offer to connect them.

To get to this, you need to understand that the query about contract upgrades is related to the action of upgrading contracts.

Tagging content to reflect related actions

This is a challenge, and I haven’t seen it done successfully in production yet. One way you might experiment with it is to tag content with the customer intent it relates to, so that whenever your RAG system serves content related to a specific intent, you can suggest the preferred next step to the user, based on your intent taxonomy.
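
As a rough illustration of that idea, the sketch below pairs an intent taxonomy (with a preferred resolution channel per intent) with content tagged by intent. The intent names, channels, document IDs and suggested prompts are all invented for the example.

```python
# Illustrative only: intents, channels, document IDs and prompts are invented.

INTENT_TAXONOMY = {
    "upgrade_contract": {
        "preferred_channel": "live_chat",
        "suggested_next_step": "Would you like me to connect you with someone "
                               "who can handle your upgrade?",
    },
    "update_address": {
        "preferred_channel": "self_serve_flow",
        "suggested_next_step": "I can update that for you now. Shall we start?",
    },
}

# Each piece of RAG content is tagged with the intent(s) it relates to.
CONTENT_TAGS = {
    "doc_how_to_upgrade": ["upgrade_contract"],
    "doc_change_of_address": ["update_address"],
}


def next_step_for_content(doc_id: str) -> str | None:
    """If the retrieved document is tagged with an intent, return the preferred
    next step from the taxonomy so it can be appended to the RAG answer."""
    for intent in CONTENT_TAGS.get(doc_id, []):
        entry = INTENT_TAXONOMY.get(intent)
        if entry:
            return entry["suggested_next_step"]
    return None
```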

Again, to do this, you’re going to need more than RAG. In this example, the whole agent escalation capability (or whatever your preferred channel of resolution is), and the handling of the dialogue surrounding that intervention, will need to be managed in some kind of state machine.

The need to manage state

A state machine is something that manages what’s happening in the conversation, along with the context of everything that has happened previously. Think of it as short-term memory for chatbots.

For example, if you’ve asked the user whether they want to speak with someone, and they say ‘no’, you don’t want to ask them again after their next query. That would get very annoying very quickly. Instead, your state machine will tag the conversation and log that live chat has been offered, and will prevent offering it again for X number of turns.
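
A toy sketch of that short-term memory is below. The field names and the five-turn cooldown are assumptions made for the sake of the example, not a recommendation.

```python
# Toy sketch of conversation state ("short-term memory"). Field names and the
# cooldown window are illustrative assumptions.

from dataclasses import dataclass


@dataclass
class ConversationState:
    turn: int = 0
    live_chat_offered_at: int | None = None
    live_chat_declined: bool = False

    def record_offer(self) -> None:
        self.live_chat_offered_at = self.turn

    def record_decline(self) -> None:
        self.live_chat_declined = True

    def should_offer_live_chat(self, cooldown_turns: int = 5) -> bool:
        """Only offer escalation if the user hasn't declined it and it hasn't
        already been offered within the last `cooldown_turns` turns."""
        if self.live_chat_declined:
            return False
        if self.live_chat_offered_at is None:
            return True
        return self.turn - self.live_chat_offered_at >= cooldown_turns
```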

In summary, for RAG to be useful, you have to be able to recognise when it’s not required, and you have to be able to move the conversation forwards towards outcomes for users. To do that, your content needs to be organised in a way that reflects the needs of your users, tagged with its relationship to a product, service or journey, and reflective of your intent taxonomy. It also needs to be running underneath a state machine of some kind so that you can manage the context of the conversation, which will enable you to actually turn a question into an action.

How content should maintain relationships with actions in AI chatbots

Lack of journey awareness

The final symptom of the full stop problem is a lack of awareness about the customer journey and lifecycle. This means that, often, conversations wind up going nowhere and opportunities are missed. To illustrate this, you need to understand the nature of user-led vs agent-led conversations.

User-led vs AI-led conversations

RAG systems are typically user-led. That is to say, the user is in control of the conversation, initiating and guiding the exchange. For example:

User: “Can I add another person to my insurance policy?”

AI: “Sure, you can add another person. All you need to do is amend your policy in the Settings section in your account.”

User: “How long will it take to add them on?”

AI: “It’s usually pretty quick. No longer than a few minutes.”

You can see by this exchange that the conversation is being pushed forwards by the user. They’re in control. The RAG responses don’t invite the next turn. They end in full stops.

Most rule-based automations, on the other hand, are system-led. That is to say, the system is guiding the conversation and prompting the user for the next step. For example:

AI: “To change your address, I’ll need to take some information from you. What address are you moving from?”

User: “It’s 23 Winston Walk”

AI: “What’s the post code?”

User: “LS17 4TK”

AI: “And where are you moving to?”

And so on.

Exploration vs action

The difference between these two types of conversations is that one is exploratory and the other is action-based.

Exploratory conversations are typically information-based and take the form of user-initiated questions. These explorations are typically user-led because they’re guided by the user’s curiosity. This is where RAG is useful.

Action-based conversations are task and results-oriented and take the form of system-initiated prompts. Action-based conversations are system-led and are guided by a process.

Let’s say you’re a utilities company and you’re receiving enquiries about what heat pumps are and how they work; that’s exploratory. Booking an appointment to have one fitted is an action. Yet both types of conversations are part of the same customer journey. They might happen over multiple conversations, or they might happen in the same conversation. The customer initially wants to learn as they’re in the awareness and consideration phase of the lifecycle. But that same customer wants to act when they’re in the purchasing phase.

The limitations of RAG in journey and lifecycle awareness

The key stumbling block of RAG systems today is that they fail to realise when a user-led conversation ought to turn into an agent-led conversation. When exploration should turn to action. This is because they lack awareness about the customer journey as it relates to your products and services, and they lack awareness about the customer lifecycle.

Again, you might think that this is the same as not understanding subtext, but it isn’t. Subtext is predicated on the user knowing what they want, but not directly asking for it. In this instance, the user isn’t aware of what they want or need. They’re exploring a topic and learning. It’s up to the system to figure out what they need and to suggest it to them.

Initially, users will be asking top-of-funnel questions, such as ‘what are heat pumps?’ and ‘how do they work?’. But eventually, they’ll be asking bottom-of-funnel questions like ‘what sized heat pump will I need to generate X amount of heat in a house of Y size?’.

Even at this point, the user doesn’t know the specific service they need from you, but a human would know when to interject and say ‘would you like to book an appointment and get some advice?’

A human would know when to switch from user-led to system-led. A human can investigate user needs and recommend the thing that users should do next. RAG systems can’t.

If you can take control of the conversation, it means that you understand the user need sufficiently to match it with an appropriate next step. Your goal is then to have the user take that next step.

RAG systems don’t do next steps. They don’t do investigation. They don’t uncover needs. They don’t help users understand what they need. They don’t have goals. They simply search and summarise.

Solving the lack of journey awareness problem

Solving for this, quite frankly, is a challenge that I’m not convinced many (if any) businesses have even thought of. I certainly haven’t seen it.

It requires total synergy between user needs and your customer intent taxonomy, the customer lifecycle, the specifics of each customer journey, the content that supports each of these, the products and services you sell, and the capabilities available on each of your customer touchpoints. That’s much more than search, and much more than a state machine.

The best idea I have for this right now is some kind of knowledge graph or ontology that maps the relationships between each of those things, with a ‘master’ agent of some kind traversing the space and serving the relevant content or services as and when required, based on the user’s intent and context, and the agent’s understanding of timing.

Visualising the need for relationships between content, processes, customer journey and lifecycle stage.
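
To make that slightly more tangible, here’s a very rough sketch of what such a mapping could look like in code. Every node, edge and label here is invented for illustration; it’s a thought experiment rather than a design.

```python
# Very rough sketch of an ontology linking content, intents, lifecycle stages
# and next steps. All names here are invented for illustration.

GRAPH = {
    # content -> the customer intent it relates to
    ("doc_what_is_a_heat_pump", "relates_to"): "heat_pump_purchase",
    ("doc_heat_pump_sizing", "relates_to"): "heat_pump_purchase",
    # (intent, lifecycle stage) -> suggested next step
    ("heat_pump_purchase", "awareness"): "offer_more_reading",
    ("heat_pump_purchase", "consideration"): "offer_survey_booking",
}


def suggest_next_step(doc_id: str, lifecycle_stage: str) -> str | None:
    """A 'master' agent might traverse from the content just served, through
    its related intent, to a next step appropriate for the user's stage."""
    intent = GRAPH.get((doc_id, "relates_to"))
    if intent is None:
        return None
    return GRAPH.get((intent, lifecycle_stage))


# A user asking sizing questions is probably in 'consideration', so:
# suggest_next_step("doc_heat_pump_sizing", "consideration")
#   -> "offer_survey_booking"
```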

In summary

In summary, while RAG systems have revolutionised conversational AI with their ability to retrieve and summarise information, they face critical limitations that impact their ability to provide holistic customer experiences and enable true scalability. These limitations are encapsulated in the “full stop problem,” which manifests in three main ways: inability to distinguish between questions and actions, failure to recognise subtext, and lack of awareness of customer journeys.

There are a few companies out there that I imagine have been thinking about this, so I’m curious to see how the likes of Rasa, Parlant, Quiq, Voiceflow, LangGraph or others are solving this problem. Feel free to reach out or comment with any thoughts you might have.


About Kane Simms

Kane Simms is the front door to the world of AI-powered customer experience, helping business leaders and teams understand why AI technologies are revolutionising the way businesses operate.

He's a Harvard Business Review-published thought leader and a LinkedIn 'Top Voice' for both Artificial Intelligence and Customer Experience, who helps executives formulate the future of customer experience and business automation strategies.

His consultancy, VUX World, helps businesses formulate business improvement strategies through designing, building and implementing revolutionary products and services built on AI technologies.

  1. Subscribe to VUX WORLD newsletter
  2. Listen to the VUX World podcast on Apple, Spotify or wherever you get your podcasts
  3. Take our free conversational AI maturity assessment


Mike Myer

CEO & Founder at Quiq

3w

Thought-provoking stuff Kane Simms! The "Full Stop Problem" is unquestionably a problem with "simple RAG". However, I wouldn't throw the RAG pattern completely under the bus. The RAG concept works with a better implementation. As you've called out, before blindly querying a knowledge base on every turn, the current state and intent need to be determined to inform the subsequent RAG process. In some cases the source for the retrieval might need to be changed (e.g. account lookup instead of knowledge lookup), and in other cases RAG might be bypassed completely (e.g. more information is needed or an action is to be performed). For me, RAG means using the LLM for understanding and response generation, but the middle answering part is not using an LLM's training data. Maybe this more capable RAG pattern is "Agentic RAG"?? (as if we need another agentic usage atm!)

Mark Jones

Innovation Lead at EBM

3w

Interesting insights, Kane Simms! At ebm, we’ve been enhancing our chatbots by combining an LLM-based classifier with fallback RAG responses. This approach allows us to deliver a balanced mix of leveraging conversational context, steered dialogues (ideal for deterministic responses, like your contract example), automation, and, when suitable, RAG-driven informational replies. I completely agree that relying solely on RAG creates a bot that’s purely informational. What we’re seeing is that blending task fulfilment flows with RAG offers a compelling solution that resonates strongly with both our customers and prospects.

Favour Ned

Business Development Executive | Growth Advisor | Expanding Horizons

3w

The Full Stop Problem raises some intriguing challenges in AI interaction! 🤔 How do you see advancements in natural language processing addressing these limitations? On a different note, I’d be happy to connect and exchange insights!

Jim Martin

Founder & Managing Director at Align Digital

3w

Excellent article!! 👌

