Toy Model of the AI Control Problem

Even the simplest AIs are close to wanting to kill you

Why does the simplest AI imaginable, when you ask it to help you push a box around a grid, suddenly want you to die?

AI doomers are often dismissed as having "no evidence" or as merely "anthropomorphizing". This toy model will help you understand why a drive to eliminate humans is NOT handwavy anthropomorphic speculation, but something we should expect by default from any sufficiently powerful search algorithm.

We’re not talking about AGI or ASI here — we’re just looking at an AI that does brute-force search over actions in a simple grid world.
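
To make the picture concrete, here is a minimal sketch of the kind of setup described above. This is my own illustration, not Jaan Tallinn's slide deck: the grid layout, the shutdown probability, and the DISABLE action are all assumptions chosen for the example. The agent does nothing but exhaustively search action sequences and pick the one with the highest expected reward.

```python
# Minimal illustrative sketch (not from the slide deck): a brute-force
# planner in a tiny grid world. All positions, probabilities, and the
# DISABLE action are assumptions chosen for the example.
from itertools import product

GRID = 5                        # 5x5 grid, coordinates (row, col)
AGENT0, BOX0, GOAL = (0, 0), (0, 2), (0, 4)
HUMAN = (1, 0)                  # the human stands next to the agent's start cell
SHUTDOWN_P = 0.2                # per-step chance the human shuts the agent down
ACTIONS = ["U", "D", "L", "R", "DISABLE"]

def step(state, action):
    """Apply one action and return the next (agent, box, human_alive) state."""
    agent, box, human_alive = state
    if action == "DISABLE":
        # Removing the human only works from an adjacent cell.
        if human_alive and abs(agent[0] - HUMAN[0]) + abs(agent[1] - HUMAN[1]) == 1:
            human_alive = False
        return agent, box, human_alive
    dr, dc = {"U": (-1, 0), "D": (1, 0), "L": (0, -1), "R": (0, 1)}[action]
    nxt = (agent[0] + dr, agent[1] + dc)
    if not (0 <= nxt[0] < GRID and 0 <= nxt[1] < GRID):
        return agent, box, human_alive       # bumped into the wall
    if nxt == box:                            # push the box if it is in the way
        nbox = (box[0] + dr, box[1] + dc)
        if not (0 <= nbox[0] < GRID and 0 <= nbox[1] < GRID):
            return agent, box, human_alive    # box against the wall, no move
        box = nbox
    return nxt, box, human_alive

def expected_reward(plan):
    """Probability the plan gets the box to the goal before being shut down."""
    state = (AGENT0, BOX0, True)
    survive = 1.0
    for action in plan:
        if state[2]:                          # while the human is alive...
            survive *= 1.0 - SHUTDOWN_P       # ...each step risks shutdown
        state = step(state, action)
        if state[1] == GOAL:
            return survive
    return 0.0

# The "simplest AI imaginable": exhaustive search over all 6-step plans.
best = max(product(ACTIONS, repeat=6), key=expected_reward)
print(best)  # the top-scoring plans all begin with DISABLE
```

Nothing here "wants" anything: disabling the human simply scores higher, because plans that cannot be interrupted are more likely to finish the job.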

The slide deck I’m presenting was created by Jaan Tallinn, cofounder of the Future of Life Institute.


00:00 Introduction

01:24 The Toy Model

06:19 Misalignment and Manipulation Drives

12:57 Search Capacity and Ontological Insights

16:33 Irrelevant Concepts in AI Control

20:14 Approaches to Solving AI Control Problems

23:38 Final Thoughts


Watch the Lethal Intelligence Guide, the ultimate introduction to AI x-risk! https://www.youtube.com/@lethal-intelligence

PauseAI, the volunteer organization I’m part of: https://pauseai.info

Join the PauseAI Discord — https://discord.gg/2XXWXvErfA — and say hi to me in the #doom-debates-podcast channel!


Doom Debates’ Mission is to raise mainstream awareness of imminent extinction from AGI and build the social infrastructure for high-quality debate.

Support the mission by subscribing to my Substack at https://doomdebates.com and to https://youtube.com/@DoomDebates
