The Great Pretender

AI doesn’t know the answer, and it hasn’t learned how to care

There is a good reason not to trust what today’s AI constructs tell you, and it has nothing to do with the fundamental nature of intelligence or humanity, with Wittgensteinian concepts of language representation, or even disinfo in the dataset. All that matters is that these systems do not distinguish between something that is correct and something that looks correct. Once you understand that the AI considers these things more or less interchangeable, everything makes a lot more sense.

Now, I don’t mean to short-circuit any of the fascinating and wide-ranging discussions about this happening continually across every form of media and conversation. We have everyone from philosophers and linguists to engineers and hackers to bartenders and firefighters questioning and debating what “intelligence” and “language” truly are, and whether something like ChatGPT possesses them.

This is amazing! And I’ve learned a lot already as some of the smartest people in this space enjoy their moment in the sun, while from the mouths of comparative babes come fresh new perspectives.

But at the same time, it’s a lot to sort through over a beer or coffee when someone asks, “What about all this GPT stuff? Kind of scary how smart AI is getting, right?” Where do you start — with Aristotle, the Mechanical Turk, the perceptron or “Attention Is All You Need”?

During one of these chats I hit on a simple approach that I’ve found helps people get why these systems can be both really cool and also totally untrustworthy, while subtracting not at all from their usefulness in some domains and the amazing conversations being had around them. I thought I’d share it in case you find the perspective useful when talking about this with other curious, skeptical people who nevertheless don’t want to hear about vectors or matrices.

There are only three things to understand, which lead to a natural conclusion:

  1. These models are created by having them observe the relationships between words and sentences and so on in an enormous dataset of text, then build their own internal statistical map of how all these millions and millions of words and concepts are associated and correlated. No one has said, this is a noun, this is a verb, this is a recipe, this is a rhetorical device; but these are things that show up naturally in patterns of usage.
  2. These models are not specifically taught how to answer questions, in contrast to the familiar assistant software that companies like Google and Apple have been calling AI for the last decade. Those are basically Mad Libs with the blanks leading to APIs: Every question is either accounted for or produces a generic response. With large language models, the question is just a series of words like any other.
  3. These models have a fundamental expressive quality of “confidence” in their responses. In a simple example of a cat-recognition AI, it would go from 0, meaning completely sure that’s not a cat, to 100, meaning absolutely sure that’s a cat. You can tell it to say “yes, it’s a cat” if it’s at a confidence of 85, or 90, or whatever threshold produces the behavior you prefer (a minimal sketch of this step follows the list).
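
To make that last point concrete, here is a minimal sketch in Python of the thresholding step. It is purely illustrative: the 0–100 score and the 85 cutoff come from the example above, and the function name is made up.

```python
def label(score: float, threshold: float = 85.0) -> str:
    """Turn a raw confidence score (0-100) into a yes/no answer.

    The score is all the model produces; the threshold is a product
    decision layered on top. Nothing here checks whether the input
    actually *is* a cat -- only how cat-like it *looks* to the model.
    """
    return "yes, it's a cat" if score >= threshold else "no, that's not a cat"

print(label(92.0))  # confident enough: "yes, it's a cat"
print(label(60.0))  # below the cutoff: "no, that's not a cat"
```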

So given what we know about how the model works, here’s the crucial question: What is it confident about? It doesn’t know what a cat or a question is, only statistical relationships found between data nodes in a training set. A minor tweak would have the cat detector equally confident the picture showed a cow, or the sky, or a still life painting. The model can’t be confident in its own “knowledge” because it has no way of actually evaluating the content of the data it has been trained on.

The AI is expressing how sure it is that its answer appears correct to the user.

This is true of the cat detector, and it is true of GPT-4 — the difference is a matter of the length and complexity of the output. The AI cannot distinguish between a right and wrong answer — it can only make a prediction of how likely a series of words is to be accepted as correct. That is why it must be considered the world’s most comprehensively informed bullshitter rather than an authority on any subject. It doesn’t even know it’s bullshitting you — it has been trained to produce a response that statistically resembles a correct answer, and it will say anything to improve that resemblance.

The AI doesn’t know the answer to any question, because it doesn’t understand the question. It doesn’t know what questions are. It doesn’t “know” anything! The answer follows the question because, extrapolating from its statistical analysis, that series of words is the most likely to follow the previous series of words. Whether those words refer to real places, people, events and so on is not material — only that they are like real ones.
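
Of course, GPT-4 is not a bigram counter, but a toy bigram model is the simplest way to see what “the most likely series of words” means. The sketch below — everything in it, corpus included, is invented for illustration — counts which word follows which, then always continues with the most common successor:

```python
from collections import Counter, defaultdict

# A tiny, made-up corpus; a real model trains on a meaningful
# fraction of the internet.
corpus = (
    "the cat sat on the mat . "
    "the cat ate the fish . "
    "the dog sat on the rug ."
).split()

# Count which word follows which.
follows = defaultdict(Counter)  # word -> Counter of next words
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def continue_from(word: str, steps: int = 4) -> str:
    """Extend a prompt with the statistically most common successors."""
    out = [word]
    for _ in range(steps):
        successors = follows[out[-1]].most_common(1)
        if not successors:
            break
        # The pick is whatever looks most plausible; truth never enters into it.
        out.append(successors[0][0])
    return " ".join(out)

print(continue_from("the"))  # "the cat sat on the"
```

Swap the single previous word for a very long, richly weighted context and the counts for learned statistics, and you have the intuition: a plausible-continuation machine, with no notion of truth anywhere in the loop.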

It’s the same reason AI can produce a Monet-like painting that isn’t a Monet — all that matters is it has all the characteristics that cause people to identify a piece of artwork as his. Today’s AI approximates factual responses the way it would approximate “Water Lilies.”

Now, I hasten to add that this isn’t an original or groundbreaking concept — it’s basically another way to explain the stochastic parrot, or the undersea octopus. Those problems were identified very early by very smart people and represent a great reason to read commentary on tech matters widely.

But in the context of today’s chatbot systems, I’ve just found that people intuitively get this approach: The models don’t understand facts or concepts, only relationships between words, and their responses are an “artist’s impression” of an answer. Their goal, when you get down to it, is to fill in the blank convincingly, not correctly. This is why their responses fundamentally cannot be trusted.

Of course sometimes, even a lot of the time, its answer is correct! And that isn’t an accident: For many questions, the answer that looks the most correct is the correct answer. That is what makes these models so powerful — and dangerous. There is so, so much you can extract from a systematic study of millions of words and documents. And unlike recreating “Water Lilies” exactly, there’s a flexibility to language that lets an approximation of a factual response also be factual — but also make a totally or partially invented response appear equally or more so. The only thing the AI cares about is that the answer scans right.

This leaves the door open to discussions around whether this is truly knowledge, what if anything the models “understand,” if they have achieved some form of intelligence, what intelligence even is and so on. Bring on the Wittgenstein!

Furthermore, it also leaves open the possibility of using these tools in situations where truth isn’t really a concern. If you want to generate five variants of an opening paragraph to get around writer’s block, an AI might be indispensable. If you want to make up a story about two endangered animals, or write a sonnet about Pokémon, go for it. As long as it is not crucial that the response reflects reality, a large language model is a willing and able partner — and not coincidentally, that’s where people seem to be having the most fun with it.

Where and when AI gets it wrong is very, very difficult to predict because the models are too large and opaque. Imagine a card catalog the size of a continent, organized and updated over a period of a hundred years by robots, from first principles that they came up with on the fly. You think you can just walk in and understand the system? It gives a right answer to a difficult question and a wrong answer to an easy one. Why? Right now that is one question that neither AI nor its creators can answer.

This may well change in the future, perhaps even the near future. Everything is moving so quickly and unpredictably that nothing is certain. But for the present this is a useful mental model to keep in mind: The AI wants you to believe it and will say anything to improve its chances.
