The Limitations of Large Language Models for Complex Reasoning

This is the first piece in a series examining the most effective uses of LLMs and why a more holistic approach to AI that combines formal, efficient reasoning systems with language systems like LLMs is needed to help us make better decisions.

AI’s tipping point has arrived. After decades of research, testing, and innovation, the use of artificial intelligence has spilled over from siloed advancements to mainstream adoption. It seems that everyone is tinkering with AI in some form these days, from transforming business processes to creating fun social media posts. But are we using it in the right way and for the right purposes?

From industry leaders to AI-curious individuals, many hold up Large Language Models (LLMs) as the answer for solving complex problems and making better decisions. LLMs are powerful tools, but it's important to understand their limitations and use them effectively. Think of LLMs as skilled writers and storytellers, not as reasoning experts. They are amazing at understanding and generating human language, but they don't necessarily understand the logic behind it. LLMs have shattered the language barrier, enabling us to interact with machines in unprecedented ways, yet given their limitations we must not become overly reliant on them.

By using neuro-symbolic AI, which combines LLMs with formal reasoning engines designed for reliable and precise reasoning, we can deduce logical consequences from a set of rules and deliver AI that is transparent, accountable, and capable of accurately solving our most challenging problems.

The differences between natural and formal language

At their core, LLMs employ natural language rather than formal language. In the hierarchy of reasoning, natural language plays a secondary role. The primary element of reasoning is formal mathematical logic or formal language: the ability to rigorously and explicitly apply clear, reliable rules of logical inference to draw conclusions and compute answers.

Humans created formal languages and formal inference systems like logic, mathematics, and computer programming languages because Natural Language (NL) is ambiguous, imprecise, and opaque. Natural Language is insufficient for reliably performing precise computation and reasoning.

The difference between a formal language and a natural language is that a sequence of statements in a formal language has a precise meaning and consistent answer, while natural language has no clear, set meaning. Formal reasoning requires transparent rules of inference for reliable conclusions at each step toward a final answer. Put plainly, it's akin to the straightforward agreement that 2+2 equals 4. There is no ambiguity with that final answer of 4.
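To make the contrast concrete, here is a minimal sketch in Python, offered as an illustration rather than anything from a production system. The spoken phrase "two plus three times four" could reasonably be read as 14 or as 20. Each formal expression below has exactly one parse tree and therefore exactly one value. The function name evaluate and the small set of supported operators are assumptions made for this example.

```python
import ast
import operator

# Map each supported AST operator node to the arithmetic it denotes.
# (The choice of operators is an assumption made for this illustration.)
OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}


def evaluate(expr):
    """Evaluate an integer arithmetic expression by walking its parse tree."""

    def walk(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, int):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](walk(node.left), walk(node.right))
        raise ValueError("unsupported expression")

    # The grammar fixes precedence, so every string has a single parse tree.
    return walk(ast.parse(expr, mode="eval").body)


print(evaluate("2 + 2"))        # 4
print(evaluate("2 + 3 * 4"))    # 14
print(evaluate("(2 + 3) * 4"))  # 20
```

The same input always yields the same answer, and anything outside the grammar is rejected with an error instead of being guessed at.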

Natural language expressions, absent a formal system, produce meanings that are subjective and lack a solid foundation. We have all experienced conversations where the people involved don’t share a precise understanding of the same words and phrases. Because each person assumes their own interpretation, the conversation goes around in circles and descends into a confusing mess of ambiguity and misunderstanding. Worse yet, we can walk away without realizing that we completely misunderstood each other. Often the fundamental disconnect isn’t discovered until the stakes are high enough that the ideas need to be formally implemented and used.

The stakes might be low for a misunderstanding among friends, but when the stakes are high the lack of precise meaning can have disastrous consequences.

For complex reasoning problems where you cannot afford to be wrong, natural language is not the right medium. Without an underlying formalism, natural language’s ambiguity and subjectivity are fine for casually conveying ideas from one person's mind to another, but they are poorly suited to ensuring shared meaning and precise, reliable outcomes. That is why we invented formal languages and reasoning systems, and those inventions enabled science, mathematics, and the technology revolution.

Reliable decision-making processes, like any formal reasoning, require precise semantics. These entail an explicit procedure for unambiguously determining the correct and singular interpretation of any set of expressions. Formal reasoning also requires transparent rules of inference for reliably drawing and interrogating conclusions at each step toward a final answer.
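As a rough sketch of what transparent rules of inference can look like in practice, the Python example below applies if-then rules by forward chaining and records which rule produced each conclusion. The function forward_chain and the loan-related fact names are hypothetical, chosen only for illustration, and real reasoning engines are far richer than this.

```python
def forward_chain(facts, rules):
    """Apply if-then rules until no new fact can be derived.

    Returns the final set of facts and a trace that records, for each
    derived fact, the premises that justified it.
    """
    known = set(facts)
    trace = []
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and set(premises) <= known:
                known.add(conclusion)
                trace.append((tuple(premises), conclusion))
                changed = True
    return known, trace


# Hypothetical rules and facts, invented for this example.
rules = [
    (("income_verified", "credit_check_passed"), "low_risk"),
    (("low_risk", "identity_confirmed"), "loan_approved"),
]
facts = {"income_verified", "credit_check_passed", "identity_confirmed"}

known, trace = forward_chain(facts, rules)
for premises, conclusion in trace:
    print(" and ".join(premises), "=>", conclusion)
```

Every conclusion can be interrogated by reading the trace, and removing a required fact, such as identity_confirmed, predictably withholds the conclusion that depends on it.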

Unraveling the finite nature of LLMs

LLMs are often characterized as stochastic parrots. This is not to say that aspects of human language intelligence don't fit the same characterization; it's more a matter of what this type of processing is good for and when to trust it. LLMs are very effective for improving search and generating summaries or derivative content. They can also creatively generate poems, emails, or a work of art by decomposing and reconstituting statistical variations from human-authored examples. That can be inspiring, creative, and fun, kind of like using the computer to help solve complex jumbles.

It is also fair to expect an LLM to mimic the logical patterns present in linguistic training data. But this does not mean they reliably reason independently of how the language happens to occur. For LLMs, statistical patterns in language are indeed primary, but formal reasoning is a deeply understood complex logical process that is not precisely revealed in our common usage of natural language.

Let’s remember that LLMs are prediction engines, using their training data to select the most likely next word or phrase based on how words occur in the given text. They live in the world of natural language – ambiguities, generalizations, and other nuances of human language will always be part of their operational system. Given the power, reliability, and transparency of formal reasoning systems and their implementations, is there any advantage to training an LLM to mimic the well-defined behavior of these systems using probabilistic methods over word distributions? I think not.

I like to joke that this is analogous to brushing your teeth through your ears. It's inefficient; it might get you there in some circumstances, but can never be as good as the more direct route and may undermine your objective along the way. 
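The idea of selecting the most likely next word can be illustrated with a deliberately tiny toy: a bigram model that predicts the next word purely from how often it followed the previous one. This is not how a real LLM is built. Real models condition on far longer contexts and are vastly more capable. The corpus and the function names train_bigrams and predict_next are assumptions made for this sketch.

```python
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count how often each word follows each other word in the text."""
    counts = defaultdict(Counter)
    words = text.lower().split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts


def predict_next(counts, word):
    """Return the word that most frequently followed `word`, or None if unseen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# A made-up training corpus in which "equals four" is the most common pattern.
corpus = (
    "two plus two equals four . "
    "two plus two equals four . "
    "two plus three equals five"
)
model = train_bigrams(corpus)

# The toy model only sees the previous word, so after "equals" it answers
# "four" whatever the operands were, including "two plus three equals".
print(predict_next(model, "equals"))  # four
```

The toy gets "two plus two equals" right and "two plus three equals" wrong for the same reason: it reproduces frequency, it does not compute.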

Of course, no finite data set can guarantee 100% reliability. There may always be some input that a model trained on it fails on, and it would be very hard to know when or why. Diagnosing the failure would be nearly impossible because the decision procedure is not transparent. Therefore, using LLMs to produce reliable computation puts us in a strange place where we might rationally say, “Well, my prompts are probably fine, but this time it just didn’t do what I intended.” In programming terms, this is like saying, “Not my fault, my program is correct. The answer is just wrong this time because the compiler randomly misinterpreted my code; sometimes it just doesn’t understand what I was trying to do.”

If we continue to over-rely on LLMs, the future of life will be in the hands of unpredictable and inaccurate machines in crucial sectors like healthcare, banking, and transportation. In those sectors, we need more reliable AI that goes beyond the use of natural language. We must acknowledge and combine different forms of reasoning to help us make better, more personal, more transparent, and more caring decisions.

AI beyond mimicry

LLMs alone are unreliable in solving complex problems when you can’t afford to be wrong, but they can be a very important part of the process of creating reliable answers. LLMs are a brilliant and incredibly powerful tool for translating natural language into formal languages. This superpower makes them the perfect tool to bridge the gap between human intuition and the formal reasoning engine at the heart of reliable AI. We need more holistic AI that is transparent, accountable, and accurate.
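One way to picture this division of labor is sketched below in Python. No LLM is called. The list llm_output is a hypothetical stand-in for statements an LLM might produce when translating a user's question into a tiny formal language of the form "all X are Y". The formal side then validates each statement against the grammar and decides entailment by chaining. The grammar and the names parse and entails are assumptions invented for this illustration.

```python
import re

# The whole formal language for this sketch: "all X are Y".
STATEMENT = re.compile(r"^all (\w+) are (\w+)$")


def parse(statement):
    """Parse 'all X are Y' into (X, Y); reject anything outside the grammar."""
    match = STATEMENT.match(statement.strip().lower())
    if match is None:
        raise ValueError(f"not a well-formed statement: {statement!r}")
    return match.group(1), match.group(2)


def entails(premises, query):
    """Return True if the query is derivable by chaining 'all X are Y' premises."""
    edges = {}
    for premise in premises:
        sub, sup = parse(premise)
        edges.setdefault(sub, set()).add(sup)
    start, goal = parse(query)
    seen, frontier = {start}, [start]
    while frontier:
        current = frontier.pop()
        if current == goal:
            return True
        for nxt in edges.get(current, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return False


# Hypothetical translation output; in a real system an LLM would produce it.
llm_output = ["All vertebrates are mortal", "All foobazzes are mortal"]
print(entails(llm_output, "All foobazzes are vertebrates"))  # False

chained = ["All foobazzes are vertebrates", "All vertebrates are mortal"]
print(entails(chained, "All foobazzes are mortal"))  # True
```

In this arrangement the language model handles phrasing, while the answer itself comes from a procedure that can be inspected, and a malformed translation is rejected instead of being silently accepted.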

In the next pieces in the series, I will dive deeper into the intricacies of language, examine the overuse of LLMs, and explore how we leverage LLMs alongside formal reasoning engines to develop reliable AI.

Dinakar R.

CloudIDSS for Value Based Transformations

10mo

LLMs are not reasoning experts and don't necessarily understand the logic behind language. Relying on them for reasoning is similar to expecting linguistic experts with tons of science and math manuals to solve problems using logic, math, and domain expertise. It is a major challenge to deliver at the intersection of truth, transparency, and subjectivity for businesses and humanity.

That is a fantastic article. When they choose the term “natural language”: it feels like “organically driven” vs logic driven.

Robert Grossman

Professor at the University of Chicago and Entrepreneur

11mo

Great article! Thanks for reminding us of the importance of thinking about the differences between natural languages (the domain of LLM) and formal languages (the domain of formal systems that reason and perform other tasks). 

Praveen Paritosh

Strategic Advisor @ MLCommons | Benchmarking and Data @ Google AI | Knowledge Graph @ Google | Integrating Knowledge into Large Language Models

11mo

Great article, looking forward to the rest. Here's something overlapping we wrote, also related to Tom Dietterich's proposal. https://system2.ai/p/system-2-is-what-we-need

Robert Schrag

Full-stack probabilistic/logical knowledge representation and reasoning research scientist

11mo

Any thoughts about how best to exploit the logical capabilities LLMs do exhibit? I've been consistently impressed as I've poked around ChatGPT 3.5. Consider the following (e.g.).

Me: Assuming "vertebrates are mortal" and "foobazzes are mortal", does it follow that "foobazzes are vertebrates"?

ChatGPT: No, it does not necessarily follow that "foobazzes are vertebrates" based on the statements "vertebrates are mortal" and "foobazzes are mortal." The mortality of vertebrates and foobazzes is a common characteristic, but it does not provide information about the specific classification or taxonomy of foobazzes. The mortality statement only indicates that both vertebrates and foobazzes share the trait of being mortal (subject to death). It doesn't imply a direct relationship in terms of classification or biological characteristics such as having a vertebrate structure. In logical terms, the given statements do not establish a direct relationship between foobazzes and vertebrates based on mortality alone. The conclusion that "foobazzes are vertebrates" would be an unwarranted assumption without additional information about the characteristics of foobazzes.
