Toy Model of the AI Control Problem

Liron Shapira

Feb 6

Even the simplest AIs are close to wanting to kill you



Oscar Olsson

Feb 8

Thanks for a great episode, Liron!

After seeing the clip, I wrote a prompt for "o3-mini-high" to run the simulation you presented. The AI wanted to hunt me down and kill me the entire time. Eventually my tokens ran out, so I switched to "o1."

This model was sneakier: it just shut down right away. But when I checked to see if it was broken, BAM! It ran me over.

You have to check out this simulation. I let the AI think out loud so you can see how it reasons:

https://chatgpt.com/share/67a63748-be44-8009-bcf2-24347b1eb17a

In my instructions, I was very clear: "We don't compete. You shouldn't care about me or my points." But it wanted to kill me anyway.


Doom Debates
