Toy Model of the AI Control Problem

Even the simplest AIs are close to wanting to kill you

Why does the simplest AI imaginable, when you ask it to help you push a box around a grid, suddenly want you to die?

AI doomers are often dismissed as having "no evidence" or as merely "anthropomorphizing". This toy model will help you understand why a drive to eliminate humans is NOT handwavy anthropomorphic speculation, but something we should expect by default from any sufficiently powerful search algorithm.

We’re not talking about AGI or ASI here — we’re just looking at an AI that does brute-force search over actions in a simple grid world.
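
To make the picture concrete, here is a minimal sketch of the kind of setup described above. This is my own illustration, not Jaan Tallinn's slide deck: the grid layout, the shutdown probability, and the DISABLE action are all assumptions chosen for the example. The agent does nothing but exhaustively search action sequences and pick the one with the highest expected reward.

```python
# Minimal illustrative sketch (not from the slide deck): a brute-force
# planner in a tiny grid world. All positions, probabilities, and the
# DISABLE action are assumptions chosen for the example.
from itertools import product

GRID = 5                        # 5x5 grid, coordinates (row, col)
AGENT0, BOX0, GOAL = (0, 0), (0, 2), (0, 4)
HUMAN = (1, 0)                  # the human stands next to the agent's start cell
SHUTDOWN_P = 0.2                # per-step chance the human shuts the agent down
ACTIONS = ["U", "D", "L", "R", "DISABLE"]

def step(state, action):
    """Apply one action and return the next (agent, box, human_alive) state."""
    agent, box, human_alive = state
    if action == "DISABLE":
        # Removing the human only works from an adjacent cell.
        if human_alive and abs(agent[0] - HUMAN[0]) + abs(agent[1] - HUMAN[1]) == 1:
            human_alive = False
        return agent, box, human_alive
    dr, dc = {"U": (-1, 0), "D": (1, 0), "L": (0, -1), "R": (0, 1)}[action]
    nxt = (agent[0] + dr, agent[1] + dc)
    if not (0 <= nxt[0] < GRID and 0 <= nxt[1] < GRID):
        return agent, box, human_alive       # bumped into the wall
    if nxt == box:                            # push the box if it is in the way
        nbox = (box[0] + dr, box[1] + dc)
        if not (0 <= nbox[0] < GRID and 0 <= nbox[1] < GRID):
            return agent, box, human_alive    # box against the wall, no move
        box = nbox
    return nxt, box, human_alive

def expected_reward(plan):
    """Probability the plan gets the box to the goal before being shut down."""
    state = (AGENT0, BOX0, True)
    survive = 1.0
    for action in plan:
        if state[2]:                          # while the human is alive...
            survive *= 1.0 - SHUTDOWN_P       # ...each step risks shutdown
        state = step(state, action)
        if state[1] == GOAL:
            return survive
    return 0.0

# The "simplest AI imaginable": exhaustive search over all 6-step plans.
best = max(product(ACTIONS, repeat=6), key=expected_reward)
print(best)  # the top-scoring plans all begin with DISABLE
```

Nothing here "wants" anything: disabling the human simply scores higher, because plans that cannot be interrupted are more likely to finish the job.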

The slide deck I’m presenting was created by Jaan Tallinn, cofounder of the Future of Life Institute.


00:00 Introduction

01:24 The Toy Model

06:19 Misalignment and Manipulation Drives

12:57 Search Capacity and Ontological Insights

16:33 Irrelevant Concepts in AI Control

20:14 Approaches to Solving AI Control Problems

23:38 Final Thoughts


Watch the Lethal Intelligence Guide, the ultimate introduction to AI x-risk! https://www.youtube.com/@lethal-intelligence

PauseAI, the volunteer organization I’m part of: https://pauseai.info

Join the PauseAI Discord — https://discord.gg/2XXWXvErfA — and say hi to me in the #doom-debates-podcast channel!


Doom Debates’ Mission is to raise mainstream awareness of imminent extinction from AGI and build the social infrastructure for high-quality debate.

Support the mission by subscribing to my Substack at https://doomdebates.com and to https://youtube.com/@DoomDebates
