Knowledge Graphs are Essential for Safe AI

AIs will only be safe for general use when they have and use goals and values that are identical to those of humans. In theory, the particular goals and values – very much like Asimov's original Laws of Robotics – could be legislated and enforced, so that we would all be safe from harm from AI.

In theory.

This is the AI alignment problem, and it captures the common assumption that safe AIs are those that can understand and reliably use only our goals and values in planning and executing behaviors.

But in practice, current approaches to AI make this assumption seem much more naïve than feasible. Let's see why.

As AI continues to advance at an astonishing pace, we need much more than naïve assumptions: solutions for AI safety are becoming dramatically more urgent.

AI Alignment

The discussion about AI alignment – as shown by Ji et al. (2024) in an up-to-date, comprehensive, 100-page survey of the field – focuses on the RICE principles (emphasis added):

The RICE principles define four key characteristics that an aligned system should possess, in no particular order: 

(1) Robustness states that the system’s stability needs to be guaranteed across various environments;

(2) Interpretability states that the operation and decision-making process of the system should be clear and understandable;

(3) Controllability states that the system should be under the guidance and control of humans;

(4) Ethicality states that the system should adhere to society’s norms and values.

Why these principles specifically?

  • The Controllability Principle is the core of implementing alignment: it is included to ensure that if human operators detect unwanted decisions or actions, they will have the tools to “drive” or even stop the system before damage (or further damage) is done. 
  • The Robustness Principle is included to require predictability of AI systems. We need to be able to reliably predict outcomes to provide the guidance and control of the Controllability Principle.  
  • The Interpretability Principle is included to ensure that human operators can reliably and accurately understand the steps that a system is taking to make decisions. Again, this is to support the Controllability Principle:  if humans can’t grok what's happening internally, they cannot guide or control the system.
  • The Ethicality Principle is included in the hopes that we can construct or guide systems to behave ethically even when human operators are not monitoring or controlling AIs directly.

There are many issues hidden in this formulation of what a solution should look like. The challenges of actually implementing the RICE principles shouldn't be underestimated.

Here I want to focus on the most fundamental and most problematic issue: the dramatically naïve assumption that we can control current AIs’ ethicality through language.

Three of the four RICE principles rest on this same requirement. Implementing these principles requires us to communicate our norms, values, and instructions to AI systems in ways that are clear and understandable for both human and machine. 

But exchanging words is simply not enough to ensure a shared point of view. 

And then we have to verify that both parties have understood those words in the same way. And we need to ensure that AIs cannot modify, distort, substitute, or ignore that understanding across a wide range of decisions and tasks (one aspect of the Robustness Principle). Otherwise, we won't be able to monitor, guide, or control these systems in any meaningful way to guarantee our safety. 

Communicating with AIs is like Communicating with Teens

Communication and verification – especially of values and norms – depend directly on shared knowledge and some willingness to reach alignment, as anyone with a teenager knows all too well. But both large language model-based AIs and our own lovable, unruly teenage offspring display a series of behaviors that undermine communication – many more than just hallucinations:

  • They make things up to tell you what they think you want to hear.
  • They provide incomplete answers even when they have more information.
  • They don't seem to understand what appear to us to be obvious questions.
  • They're easily influenced by the media and hearsay.
  • They get most of their knowledge from the web.
  • There are a lot of things they still don't know.
  • They provide contradictory information when you ask things piece by piece.
  • They give you very different answers if you ask the same thing in different ways.
  • They give you different answers if you ask the same thing in different languages.
  • We really can't understand how they're thinking; they can't either.
  • It's very hard to change their beliefs and factual knowledge.
  • They clam up when you ask something they don't want to answer.
  • They lecture you when you do or ask something you shouldn't.
  • They're very expensive to create and to maintain.

For teens and presumably for AIs, the norms and concepts they associate with strings like d-a-n-g-e-r are very different from the norms, concepts, and decisions of the adults around them. What could possibly go wrong if we entrusted the safety of our friends and loved ones to the assumption that a machine "understood" the word danger in the same way that we do? 

When we "order" an AI to follow a law like "A robot may not injure a human being or, through inaction, allow a human being to come to harm" (Asimov's First Law), we assume it will understand the law as we do and comply as we ordered.

In essence, though, we are telling the AI: Read my mind to see what I'm thinking of and if I forgot anything, add that to the list, too.

But AIs still aren't very good at mind reading.

In the context of AIs with the behaviors listed above, the RICE Principles seem incurably naïve and the goal of AI alignment seems doomed. The Sturm und Drang around end-of-species threats from AI starts to seem understandable.  

Knowledge Graphs for AI Alignment

Knowledge Graphs are already gaining traction in solving a range of related problems: eliminating hallucinations, imposing guardrails, improving relevance, and ensuring the accuracy of responses from AIs. And Knowledge Graphs are likely to provide the only viable solution to the AI alignment problem.

Text-based LLMs model huge collections of strings to predict, in very robust and useful ways, how to generate continuations for arbitrary input strings: a continuation from a prompt to an email, from a question to an answer, from an article to a summary, and so on. They do all this without storing or accessing anything like meanings or concepts that we might be able to identify or monitor. This creates an explainability problem that makes an AI's inner workings difficult or impossible to understand: we simply do not know in which sense LLMs use a particular string. The outputs, however, are so natural that we essentially hallucinate semantic processes in the background.
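
To make that opacity concrete, here is a minimal sketch of the interface a text-based LLM actually exposes, assuming the open-source Hugging Face transformers library and the small public gpt2 checkpoint purely for illustration:

# A minimal sketch: a string goes in, a continuation string comes out.
# There is no explicit, inspectable layer of concepts or facts in between.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prompt = "A robot may not injure a human being because"
result = generator(prompt, max_new_tokens=30, num_return_sequences=1)

# All we can observe from the outside is the continuation itself;
# whatever "understanding" produced it is buried in opaque weights.
print(result[0]["generated_text"])

Nothing in this exchange exposes a concept of injury or harm that we could inspect or verify; only the continuation string is observable.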

Researchers were quick to notice that this lack of explainability, or scrutability, blocks implementation of the Interpretability Principle, which is essential for actually controlling, guiding, and stopping these systems. So in fact we have no safety mechanisms in place.

Rich Knowledge Graphs complement this string-based approach by focusing on capturing explicit, monitorable representations of concepts and facts. Knowledge Graphs already contribute to controllability by being used to guide training and evaluation of LLMs, by improving explainability, and by imposing explicit guardrails on LLM outputs. In fact, Knowledge Graphs help at every step of the development, evaluation, and deployment of LLMs.  
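
To make the guardrail idea concrete, here is a toy sketch in plain Python; the triples and the naive claim extractor are invented for illustration and stand in for a real curated graph and a real relation extractor:

# A toy sketch of a Knowledge Graph used as an explicit guardrail: claims
# extracted from a generated answer are accepted only if a human-curated
# triple backs them up. The graph and the extractor are illustrative only.

KNOWLEDGE_GRAPH = {
    ("aspirin", "treats", "headache"),
    ("ibuprofen", "treats", "headache"),
    ("aspirin", "unsafe_for", "children_with_fever"),
}

def extract_claims(answer: str) -> set:
    """Stand-in for a real relation extractor: matches the pattern 'X treats Y'."""
    claims = set()
    words = answer.lower().replace(".", "").split()
    for i, word in enumerate(words):
        if word == "treats" and 0 < i < len(words) - 1:
            claims.add((words[i - 1], "treats", words[i + 1]))
    return claims

def guardrail(answer: str) -> bool:
    """Accept the answer only if every extracted claim is backed by the graph."""
    claims = extract_claims(answer)
    return bool(claims) and claims <= KNOWLEDGE_GRAPH

print(guardrail("Aspirin treats headache."))   # True:  backed by a curated triple
print(guardrail("Aspirin treats influenza."))  # False: no supporting triple, so the output is blocked

Because every triple in the graph was put there by a person and can be inspected by a person, the reason an answer is blocked is always explainable in human terms.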

More to the point of the present discussion, with rich Knowledge Graphs it is possible for humans to track and visualize which facts and concepts are being activated as an AI processes inputs and decisions. We can refine this knowledge as issues occur. Since humans create and curate these concepts, we can architect AIs to use them, and only them, as the basis for processing and thereby recapture some control over the decision-making process – or at the very least enable monitoring it in a human-accessible way. The quickly growing research on graph neural networks already shows that merging neural networks with graph-structured concepts and facts is not only feasible but leads to significant improvements across the board.
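
A minimal sketch of that monitoring idea, using an invented toy concept graph and a stand-in "reasoner" (neither is a real system), shows how logging activations against a curated graph yields a human-readable trace of exactly which concepts fed a decision:

# Every time the system touches a concept, the activation is logged against
# the curated graph, so a human can later inspect which facts fed the decision.
from collections import defaultdict

# Hypothetical curated concept graph: concept -> related concepts.
CONCEPT_GRAPH = {
    "danger": ["injury", "harm"],
    "injury": ["first_aid"],
    "harm": ["prevention"],
}

activation_log = defaultdict(int)

def activate(concept: str) -> list:
    """Record that a concept was used and return its curated neighbours."""
    activation_log[concept] += 1
    return CONCEPT_GRAPH.get(concept, [])

def toy_reasoner(input_concepts: list) -> None:
    """Walk outward from the input concepts, logging every node that gets used."""
    frontier = list(input_concepts)
    while frontier:
        frontier = [n for c in frontier for n in activate(c)]

toy_reasoner(["danger"])
# The log is the human-readable trace of the "thought process":
print(dict(activation_log))  # {'danger': 1, 'injury': 1, 'harm': 1, 'first_aid': 1, 'prevention': 1}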

Knowledge graph-centric AI will be a big step forward in ensuring AI safety.
Imran Chaudhri

Chief Architect genAI, AI, HC & LS @ Progress | M.Eng.

We have been using #SemanticRAG for combining knowledge graphs with vectors and other indexes to (1) analyze the question with semantics, (2) analyze, categorize, and extract facts from the content, (3) retrieve the most relevant data, (4) link concepts back to the graph, and (5) check the output of the graph with respect to allowable/safe concepts.

Timi Stoop-Alcala

Principal Content Strategist, IKEA | Contextual Content, Knowledge Domain Modelling, Systems Thinking, Game Design Thinking, AI Content Ops

Aside from ensuring responsible, explainable AI, structured knowledge AND structured content are essential ingredients for responsible, explainable #personalisation. Investing in structured knowledge-content integration and transformation is a rising tide that lifts all ships!

Hank Ratzesberger

DevOps Engineer @ i/o Werx

I have seen several articles describing how some problem was solved or something new invented by AI and ... "researchers don't know how." It really should be disallowed, at least for any system doing reasoning or decision making, because it might as well be a hallucination.

Kara Cordelia Warburton

Microcontent champion, Terminologist, Ontologist, Professor of Terminology, Translation and Localization

"humans create and curate these concepts" - yes, and those humans are called "terminologists"

Johann U. Zimmermann, Dr.

i-inf.net - get to know what you know. IT Consultant. Information-Oriented Software Architectures, ISAQB-Cert., IREB-Cert.Prof.f. Requirements Engineering, UXQB-Cert.Prof.f.Usability&UX, Data Protection

What kind of solutions do we actually call AI? I tend to believe that neural networks in particular are called intelligent because of their use of "neurons", and even more because humans do not really understand everything that's happening inside of them. We show lots of training material to them and in the end they can solve a set of problems without the need for us to really bother about how they do it. We watch and wonder. Maybe that's why people call them intelligent. So, trying to tame NNs by controlling their behavior, and by making them consider data structures we as humans can understand, seems to take away part of the mystique from the NNs and thus seems to convert them into conventional IT solutions, where we have to take care of our problems by ourselves. This might conflict with many AI business models and might reduce ROI significantly. So, I wonder whether we should strive to change the way AI solutions work or whether we should rather put them into contexts of non-AI algorithms to reach our final goals while having them do what they do best: pattern recognition. I'm not sure, and maybe there's not just one answer to this. I guess both alternatives are called symbolic AI, but there might be better terms for it.
