Can AI Be Superhuman? Flaws in Top Gaming Bot Cast Doubt
By learning exploits from adversarial AI, people could defeat a superhuman Go-playing system
BY MATTHEW HUTSON & NATURE MAGAZINE
The board game Go is a high-profile test of machine-learning capabilities.
Talk of superhuman artificial intelligence (AI) is heating up. But research has revealed weaknesses in one of the most successful AI systems — a bot that plays the board game Go and can beat the world’s best human players — showing that such superiority can be fragile. The study raises questions about whether more general AI systems will suffer from vulnerabilities that could compromise their safety and reliability, and even their claim to be ‘superhuman’.
“The paper leaves a significant question mark on how to achieve the ambitious goal of building robust real-world AI agents that people can trust,” says Huan Zhang, a computer scientist at the University of Illinois Urbana-Champaign.
The analysis, which was posted online as a preprint in June and has not been peer reviewed, makes use of what are called adversarial attacks — feeding AI systems inputs that are designed to prompt the systems to make mistakes, either for research or for nefarious purposes. For example, certain prompts can ‘jailbreak’ chatbots, making them give out harmful information that they were trained to suppress.
In Go, two players take turns placing black and white stones on a grid to surround and capture the other player’s stones. In 2022, researchers reported training adversarial AI bots to defeat KataGo, the best open-source Go-playing AI system, which typically beats the best humans handily (and handlessly). Their bots found exploits that regularly beat KataGo, even though the bots were otherwise not very good — human amateurs could beat them. What’s more, humans could understand the bots’ tricks and adopt them to beat KataGo.
Was this a one-off, or did that work point to a fundamental weakness in KataGo and, by extension, in other AI systems with seemingly superhuman capabilities? To investigate, the researchers, led by Adam Gleave, chief executive of FAR AI, a non-profit research organization in Berkeley, California, and co-author of the 2022 paper, used adversarial bots to test three ways of defending Go AIs against such attacks.
The first defence was one that the KataGo developers had already deployed after the 2022 attacks: giving KataGo examples of board positions involved in the attacks, and having it play itself to learn how to play against those positions. That is similar to how it taught itself to play Go more generally. But the authors of the latest paper found that an adversarial bot could learn to beat even this updated version of KataGo, winning 91% of the time.