Today I’m sharing my interview on Robert Wright’s Nonzero Podcast where we unpack Eliezer Yudkowsky’s AI doom arguments from his bestselling book, “If Anyone Builds It, Everyone Dies.”
Bob is an exceptionally thoughtful interviewer who asks sharp questions and pushes me to defend the Yudkowskian position, leading to a rich exploration of the AI doom perspective.
I highly recommend getting a premium subscription to his podcast:
0:00 Episode Preview
2:43 Being a “Stochastic Parrot” for Eliezer Yudkowsky
5:38 Yudkowsky’s Book: “If Anyone Builds It, Everyone Dies”
9:38 AI Has NEVER Been Aligned
12:46 Liron Explains “Intellidynamics”
15:05 Natural Selection Leads to Maladaptive Behaviors — AI Misalignment Foreshadowing
29:02 We Summon AI Without Knowing How to Tame It
32:03 The “First Try” Problem of AI Alignment
37:00 Headroom Above Human Capability
40:37 PauseAI: The Silent Majority
47:35 Going into Overtime
Episode Preview
Liron Shapira: 00:00:00
So the title is pretty informative. If Anyone Builds It, Everyone Dies. His exact words are building an artificial intelligence using anything remotely like current techniques, based on anything remotely like the present understanding of AI will cause human extinction. That’s the book’s central claim. I agree with it.
Liron: 00:00:17
How do we know it’s going to turn on us and how do we know it’s going to diverge from us? I would claim that we’ve actually just never gotten it aligned. It just looks like it’s serving us when it’s weak. Superintelligence is really freaking powerful. Jaan Tallinn, who’s one of the major funders of Anthropic, actually, really big guy in the AI safety community.
Liron: 00:00:35
He uses this term summon and tame. You summon them. You don’t really micromanage how they’re coming into existence. You throw data and you throw compute at it, and then this thing comes out, right? This beast, this demon comes out and you’re like, all right, I’ve got the demon, you know, crack my knuckles. Let me tame it.
Liron: 00:00:50
Now, the problem is that your tools are kind of no match for the demon.
Robert Wright: 00:00:59
Hello, Liron.
Liron: 00:00:59
Hey, Bob.
Robert: 00:00:59
Great to be back. Great to have you back. Let me introduce this. I’m Robert Wright, publisher of the Nonzero Newsletter. This is the Nonzero podcast. You are Liron Shapira, host of the Doom Debates podcast, which is about AI. And those are two of my favorite subjects, doom and AI.
Robert: 00:01:24
We’re gonna talk about both of them. We are also gonna talk about the rationalist community, which is a source of never ending fascination to me. And about Eliezer Yudkowsky, prominent member of the rationalist community, and sometimes called a doomer in chief. You have been very influenced by him.
Robert: 00:01:47
And in fact, I got the idea for this podcast while I was listening to one of the episodes of your podcast, Doom Debates. The one where you’re talking to him about his recent book. If Anyone Builds It, Everyone Dies, which captures his attitude. There it is. You’re holding it up. His attitude toward the wisdom of a headlong rush toward building artificial superintelligence.
Robert: 00:02:10
And I have never read the book. And so I’m looking forward to discussing that with you. But I also wanna discuss the whole rationalist subculture, which is just fascinating to me. It’s been very influential in shaping the AI debate, especially the AI safety debate, but not only that, I think.
Robert: 00:02:31
So that’s what we’re gonna focus on now. I was gonna start by asking you, and I hope you won’t be offended by this, but when I listened to your conversation with Eliezer, I really wanted to ask, is it too much to call you a disciple of his? You have a respect bordering on reverence for his ideas, I would say.
Being a “Stochastic Parrot” for Eliezer Yudkowsky
Liron: 00:02:52
I am happy to be known as a disciple of Eliezer Yudkowsky for a couple reasons. I’ve read all his stuff. There’s thousands of pages of his stuff. I’ve read it three times on average. I have a whole show, Doom Debates, that’s basically popularizing Eliezer Yudkowsky’s ideas. Sometimes people think it’s my original thinking, but I admit I’m kind of a stochastic parrot for Eliezer Yudkowsky’s ideas.
Liron: 00:03:15
I started reading him when I was 20 years old in college. This was around 2007 and I can’t even imagine my adult life without this influence of Eliezer Yudkowsky because number one, he taught me about AI Doom. So I’ve been a doomer for 18 years because I found his arguments convincing. And number two, he taught me about rationality and epistemology.
Liron: 00:03:35
So he wove together all these previous strands. You know, if you’ve ever read Carl Sagan, Richard Feynman, a bunch of other authors, E.T. Jaynes, a bunch of stuff about probability and Bayesian networks. He synthesized it all together into this really crisp understanding of how to do philosophy from a superintelligent AI builder’s perspective.
Liron: 00:03:58
And so many things really clicked. It just feels like he was decades ahead of his time. And I think we have proof that on the AI side, he was decades ahead of his time because we know that 20 years after he was writing his field became incredibly popular. Like the field that Eliezer Yudkowsky was doing in the early 2000s is now a hot field, namely AI safety.
Liron: 00:04:17
So yes, I’m a disciple of Eliezer Yudkowsky.
Robert: 00:04:21
Yeah. I mean, I think that’s a tribute to his kind of, I guess, his prescience. I mean, I think it was inevitable that if you had this explosion of AI you’ve had over the last three years, there would be some safety discussion. But you’re right, he was convinced that this degree of progress was gonna happen at some point and that it should scare us.
Robert: 00:04:40
That wasn’t his original view. When I had a podcast conversation with him, as you know, on my Bloggingheads platform, which is where I first encountered him, I mean, originally listening to other people talk to him on that platform. But this was 15 years ago or something, and he was in mid-transition.
Robert: 00:04:57
He had originally been this techno optimist looking forward to the singularity, and then he was drifting toward growing concern. He wasn’t quite as concerned as he is now when I talked to him, but he was getting there. That’s been an interesting evolution. Let’s, why don’t we, I wanna get into the rationalist community some more, because I do think it’s important.
Robert: 00:05:24
It is still kind of under-appreciated by a lot of people, notwithstanding some issues I have with it, and some gripes, but I think there’s no doubt about its importance before we get back to that. So this book of his has come out. It kind of hit the New York Times bestseller list.
Yudkowsky’s Book: “If Anyone Builds It, Everyone Dies”
Robert: 00:05:41
Everyone talked about it. I think it’s had a big impact. Not everyone purports to grasp the main argument in it. How would you capsulize the main argument, particularly in this book, which I guess has been a big part of his argument all along. We can talk later about to what extent that’s evolved, but how do you summarize the argument?
Liron: 00:06:12
So the title is pretty informative. If Anyone Builds It, Everyone Dies. There is that conditional. So he’s talking about if anyone builds superintelligence, then everyone dies, which hasn’t happened yet. So some people are saying, you know, why are you fear mongering about nobody dying yet? And he is saying, well, yeah, I’m saying if anyone builds it, I hope nobody builds it.
Liron: 00:06:31
I can quote his exact words. Let’s see. He said.
Liron: 00:06:38
His exact words are, building an artificial intelligence using anything remotely like current techniques based on anything remotely like the present understanding of AI will cause human extinction. That’s the book’s central claim. I agree with it. And then I guess you’re asking what is kind of high level argument for why everybody will die?
Robert: 00:06:59
Convince us.
Liron: 00:07:01
Yeah. I mean, yeah.
Robert: 00:07:01
I mean, I am pretty concerned about AI for various reasons. And his arguments always give me pause, but I don’t feel the inexorable force of them the way I think he would like you to. To him, it just seems like it almost follows deductively, you know, that we’re totally screwed, if we build superintelligence until we’re much, much better at understanding it.
Robert: 00:07:27
And yeah. Well, what seems inevitable to him is that it turns on us, right? There will be the loss of control. Its motivations and goals will diverge from ours sufficiently to imperil us because although he can’t predict exactly what will be on its mind, he is sure that its goals will be not readily compatible with our continued existence. Right. That’s all fair. Yeah.
Liron: 00:07:57
Yeah. And you know, the argument has different branches on my show Doom Debates. I have something called the Doom Train where I’m saying, look, it’s a train that if you ride it all the way to the end, you come to Doom Town where you basically agree with the conclusion that we’re doomed.
Liron: 00:08:09
But there’s so many stops where you can get off. I’ve cataloged 83 different stops. I’m sure somebody could find even more than that. And so the book is not a, well, what’s an example of a stop?
Robert: 00:08:18
Just what is this stop like?
Liron: 00:08:19
An example of a stop is that AI is just going to live in a computer, so it’s just not going to have that much power over the physical world.
Robert: 00:08:28
Okay? Right. That’s, I mean, some people get off on that stop. So reasons people get off the train and the two responses to that are, well, eventually there’ll be robots and AI will have persuasive capabilities. They will have the ability to bribe people and so they can get the humans to do the dirty work if they need to, but now I understand what you mean by, right.
Liron: 00:08:49
So what I’m saying about Eliezer is because when you say, okay, give me the high level argument, the argument also depends on where the individual reader is tempted to get off, right? Because there’s so many people tempted to get off in so many different places.
Liron: 00:09:01
For example, Roger Penrose famously gets off at thinking that the brain is such a different type of thing. It’s a non-computational machine. That’s what he thinks the brain is, and so he just thinks current AI are just so far from being that because they’re computational. And so we just don’t have to worry about their power because it’s like, you know, maybe in a hundred years we’ll build something else that’s quantum, you know, that would be Penrose’s stop on the Doom train.
Liron: 00:09:22
And so when you ask me what is the book’s argument, sometimes arguments are just relative to where somebody else wants to push back. Right.
Robert: 00:09:31
But, I mean, but but that said, I can still try. Yeah. Okay. But I guess, so here’s a question. How is he so sure that the motivations of this machine will diverge from ours in a way that makes its goals incompatible with our existence? I mean.
AI Has NEVER Been Aligned
Robert: 00:09:46
So far we haven’t seen wild divergences from what we design these machines to do, even though, of course, we don’t design them in the traditional sense. And I will say, I think one of the big contributions of his book is that phrase, AI is grown, not crafted, to convey why we don’t totally understand it. That’s important. That’s valuable, but how would you answer that question? What is the heart of the argument about how we know it won’t just remain friendly?
Liron: 00:10:30
Yeah. So when you use this language, right, you said, how do we know it’s going to turn on us and how do we know it’s going to diverge from us? You’re kind of painting this mental picture where it’s aligned with us and then one day it stops being aligned. But I would claim, and I think Eliezer’s claiming that we’ve actually just never gotten it aligned.
Liron: 00:10:47
It just looks like it’s serving us when it’s weak, right? But we’ve never had this initial condition where it’s superintelligent and actually aligned with us. So it’s not like we hit this ideal condition and then it diverges, right? We just never hit the ideal condition in the first place.
Robert: 00:11:02
Yeah. So I guess the question is, why does it act more or less as if it’s happy to serve us now? But yeah. I mean, what will change, exactly? In other words,
Liron: 00:11:15
Maybe the best answer is feedback loops, right? So the trend we’re seeing with AI, a lot of people have said this. In Sam Altman’s language, he’s saying all the benchmarks are going to saturate. Somebody from Anthropic said the same thing very recently too, I think it was Sholto Douglas. So this is actually a pretty consensus opinion among everybody who’s working on AI. They’re saying if you can measure it, you can beat it. You know, give me any benchmark, let me reinforcement learn, or let me run a feedback loop a bunch of times, and, you know, this is how we throw computation at the problem.
Liron: 00:11:46
The parameters will snap into place, right? The billions and trillions of parameters will snap into place so that whatever you’re measuring, we are going to find a function that performs well at your benchmark metric. Right? Do you know what I’m talking about? Like that being kind of the trend these days.
Robert: 00:11:59
Yeah. Yeah.
Liron: 00:12:00
And so to your question, you’re saying why does it seem so good now? And yet I’m worried that it’s not going to be good in the future. The reason is because we are currently in a regime where the benchmarks correspond to the work we want done. So for, you know, writing code, we have enough tests on our code that it’s getting better at writing code because it’s getting better at passing our tests.
Liron: 00:12:20
But we’re going to get into a regime where the benchmark is just like, okay, did my company make more profit? Where suddenly you run a bunch of tests at making profit and you’re like, okay, I’ve got this AI, it successfully made a bunch of profit and didn’t kill anybody, but when you then put it out of distribution, you just didn’t have a good enough benchmark and a good enough feedback loop that you can go ahead and release it out of distribution and expect that it’ll still do what you want.
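To make “if you can measure it, you can beat it” concrete, here is a minimal sketch, purely illustrative and not from the episode, of the kind of feedback loop Liron is describing. The proxy benchmark, the true objective, and the hill-climbing loop are all invented for this example; the point is that whatever you can measure, the loop will saturate, whether or not the proxy still tracks what you actually wanted.

```python
# Illustrative sketch (not from the episode): a feedback loop that keeps any
# parameter tweak which raises a measurable proxy score. The "benchmark" and
# "true objective" below are invented; the point is that whatever you measure,
# the loop will max out -- regardless of whether the proxy still tracks the goal.
import random

def proxy_benchmark(params):
    """Stand-in for any measurable score (tests passed, profit made, etc.)."""
    return -sum((p - 3.0) ** 2 for p in params)  # maxes out when every param == 3

def true_objective(params):
    """What we actually care about, which the benchmark only approximates."""
    return -sum((p - 2.5) ** 2 for p in params)  # its peak sits somewhere else

params = [0.0] * 4
for _ in range(5000):
    candidate = [p + random.gauss(0, 0.1) for p in params]
    if proxy_benchmark(candidate) > proxy_benchmark(params):
        params = candidate  # keep any tweak that raises the measured score

print("proxy score:", round(proxy_benchmark(params), 3))  # ~0: the benchmark saturates
print("true score: ", round(true_objective(params), 3))   # still ~-1: proxy != goal
```

The loop reliably drives the measured score to its ceiling; nothing in it cares that the true objective was never the thing being measured, which is the out-of-distribution worry in miniature.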
Liron Explains “Intellidynamics”
Robert: 00:12:45
Right. Now, that sounds a little like an argument you hear a lot. I’m not sure it’s what he considers his killer argument. But you tell me. The argument you hear a lot is that you give a goal to the AI, fine. But there are subordinate goals. You know, if I wanna buy a car, that’s my goal. A subordinate goal is making money.
Robert: 00:13:10
If I wanna make money, then that subordinate goal needs a subordinate goal. I show up for work every day, I steal it, whatever. And so one kind of concern you hear is that, you know, the AI, yeah, it’ll make money for your corporation, but if you look at how it’s doing it, which you may not, since corporations aren’t too picky about how money gets made for them.
Robert: 00:13:33
You may find that this isn’t what you had in mind, and eventually what it is doing to make the money could get pretty wild. You know, a generic concern you hear about this is that a common subordinate goal is power. Almost anything you wanna do, you can do better if you have more power. So AI may become kind of power seeking.
Robert: 00:13:51
Okay? In any event, this is one whole realm of argument about why we should be concerned. And I think, you know, it’s valid. It’s a set of concerns. In other words, yeah, you’ll give it the goal, but the smarter it gets, the harder it is to say what subordinate goals it will ultimately settle on.
Robert: 00:14:09
Presumably power will be one, it’ll get more clever in how it pursues power. There’s that. But I kind of think, especially in this latest book, he’s not putting it that way. In fact, I was surprised by how little I heard about subordinate goals per se in this book. Per se. Right.
Liron: 00:14:28
And you’re right. You’re saying he didn’t kind of lay out the instrumental convergence argument that much? I do think he mentioned it, but I guess he didn’t hammer on it. And just to rephrase, instrumental convergence is the idea that one way people have put it, which might be kind of weak, is to be like, look, you tell the AI to make you coffee, but then it infers, wait a minute, if I get shut off, I can’t make a coffee, so let me make sure nobody can ever shut me off no matter what.
Liron: 00:14:51
That’s, you know, there is a logic to that: if you wanna score high on the coffee function, it is correct that you don’t wanna be shut off. But it’s also conceivable that, for some reason, it still doesn’t defend itself. So it’s unclear quite how strong that argument is. And so maybe that’s why he didn’t go that way in the book, because it doesn’t really land with that much force with people when you put it that way.
Liron: 00:15:11
So, I agree. I do actually think if you ask me what’s the strong version of instrumental convergence, this is what I think is strong about instrumental convergence. It’s a property of achieving goals, and this is a level of abstraction. I don’t know, I’m comfortable thinking at this level of abstraction.
Natural Selection Leads to Maladaptive Behaviors — AI Misalignment Foreshadowing
Liron: 00:15:25
It doesn’t seem to hit with everybody, but for me, there’s two different domains of the field here. It’s not just about building AIs. I even made up a name for another domain, I call it “intellidynamics.” There’s a whole analysis you can do where you’re just analyzing what does it mean to achieve goals?
Liron: 00:15:41
What does it mean to do cognitive work? It’s like thermodynamics versus engineering. When you’re engineering, you’re building these systems. What are you trying to get the system to do? You’re trying to get them to do thermodynamic work. There’s a whole study of thermodynamics. Similarly, when you’re doing AI engineering, you’re building systems that are supposed to do intellidynamics work, cognitive work, and you can study the cognitive work separate from studying the design of the system that does the work.
Robert: 00:16:07
No, I like the term, I actually use it in my book, which I’ve finally sent off to my publisher.
Liron: 00:16:13
You say Intellidynamics?
Robert: 00:16:14
Yeah. Yeah.
Liron: 00:16:15
Hell yeah.
Robert: 00:16:16
Yeah. I mean, it’s a good term because it’s like the study of intelligence generically. So you can view the human mind as a form of intelligence.
Robert: 00:16:29
You can view natural selection as a form of intelligence. It invents things. So yes, it has found that power is a good subordinate goal. It has made a lot of species in different ways, power seeking. And I guess instrumental convergence, as I understand it, refers to the idea that, you know, something like power, you may converge on it as a goal from a number of directions and a number of processes may discover it, natural selection may, a military general may and so on. I mean, does that make sense to you? That, yeah.
Liron: 00:17:04
Totally. I mean, look, there’s a lot of applications where having a big stash of resources is in fact something that a lot of reinforcement learning paradigms will hit on because it’s correct.
Liron: 00:17:14
Like regardless of how you’re learning, regardless of what type of agent you are, it is just a property of successfully achieving stuff that you’re going to notice that it helps to pile up resources, right? Instrumental convergence is the claim that all kinds of different goals that agents might have converge to the instrumental sub goal of stockpiling resources, preventing yourself from being shut off, getting power, and once again, it’s not about the agent, it’s about the nature of the work you’re asking the agent to do when you’re asking the agent to get a project done for you. Lots of projects benefit from having resources.
Robert: 00:17:50
And you know, I think Eliezer is very good at thinking of these things generically and abstractly. I think sometimes it’s a rhetorical handicap for him. Sure. Because people want things fleshed out. They want concrete examples. Yeah. And he’s not naturally inclined to give them, I mean, what he uses, and I guess this is related to the question of whether you’re a disciple because.
Robert: 00:18:14
This is something that messiahs have been known to use: parables, right? And he famously uses those; they’re some people’s cup of tea and not others’. But anyway, I think it’s a challenge for him. You know, the level at which he thinks has value intellectually.
Robert: 00:18:35
It’s a rhetorical challenge.
Liron: 00:18:37
Yeah. My view on Eliezer is he’s done enough, right, just by creating the theory. And I do think that there’s going to be other people who are more specialized at popularizing him. And that’s the role that I’m trying to help with. I’m trying to popularize his ideas.
Liron: 00:18:52
I may not be the best person in the world for the job, but I do think that on the spectrum of mainstream style of communication, you can put me more mainstream than Eliezer. And asking Eliezer to come out and tell the world about his ideas, it’s a little bit like asking Wernher von Braun to come out on the battlefield and shoot a gun. It’s like, it’s okay. He’s fine where he is.
Robert: 00:19:16
I don’t know, maybe it’s more like asking von Braun to explain his ideas, but whatever, the, let me just tell you what, so the way, I mean, after kind of reading his book, or at least listening to it, which is the way I buy books and is kind of a handicap in some ways, but and looking at what some other people have said about it, I think there’s a, I think in a way, the heart of his argument.
Robert: 00:19:44
Is something like, it’s almost hard to describe without the specific analogy he uses of human evolution. And he’s comparing the evolution of human brain to, you might say the training of a large language model in a certain sense. And by the way, I mean, I’m a big advocate of that.
Robert: 00:20:10
I do a lot of that in the book. I think the parallels are more substantively significant maybe than some people think. But that aside, I mean the analogy what he said, and it took me a while to figure out what he means by this. But, you know, having written about evolutionary psychology, I was very familiar with the phenomenon.
Robert: 00:20:36
Of the way something in the human mind that evolved to serve a purpose in a given environment, like a hunter-gatherer environment, no longer does. Classic case, which he gets into, is the sweet tooth. You know, the original goal was to get us to eat fruit, which was good for us. And now, as he notes in the book, it gets us to eat the sugary stuff that may not be good for us.
Robert: 00:21:00
We may even eat fake sugar, like sucralose, which doesn’t even have energy, so the original goal that our taste buds, there’s, they mediated a subordinate goal that served that goal. The subordinate goal was eat sweet stuff. And so he says, you know, we’ve become, in that sense, misaligned, we’ve.
Robert: 00:21:28
In pursuing this subordinate goal of eating sweet stuff, we’ve abandoned the original goal. And that’s the analogy to an AI that, you know, we give it what we think is the original goal, that it will faithfully serve, like please us or something. But because we don’t understand what’s going on inside of us.
Robert: 00:21:58
Inside of it, inside of it, we don’t understand the ways it might wind up serving mechanisms inside of it. And I’ll tell you one thing I would’ve done if I were him at this point, you know, he says, who would’ve predicted that, you know, a million years ago that someday we’d be, you know, our progeny, the progeny of our ancestors of a million years ago, would be eating fake sugar and soda and stuff.
Robert: 00:22:27
I would’ve gone on to say, ‘cause I know this is an important part of his argument, that not knowing about taste buds would be a problem for their prediction. If they were just observing us from the outside and they saw we were eating fruit, for all they know, that would be entirely on the basis of visual identification of fruit or something.
Robert: 00:22:52
They wouldn’t know what was attracting us to the fruit. And I think that’s a critical part of the analogy. We don’t know what’s going on inside the large language models. And he says that God knows which is important, but it took me a while to understand exactly how the analogy is supposed to work and.
Robert: 00:23:14
I still, again, I still don’t find it like the killer argument for despair. Not that he’s counseling despair, but I mean the killer argument for doom. But that seems to me different from generic. It involves subordinate goals in a sense, but it’s a different kind of argument. Right?
Liron: 00:23:32
I’ll be honest, the evolution argument is tough because there’s so many nuances to it and there’s a number of different takeaways that you can try to point people to. It really is tough. I would like to see them tighten it up. Let me give you my strongest version. Okay. So let me keep it tight. My strongest version is, let’s say you’re evolution, right?
Liron: 00:23:52
Or you’re designing life on earth. You’re competing against some other planet in a contest of who can make the fittest life. And your goal is to design an organism that’s as efficient as possible at reproducing its gene frequency, right? Compared to the gene pool. That’s all you’re trying to do.
Liron: 00:24:08
If that was your goal and you were smart enough and you were capable enough, the obvious architecture, especially with an agent, with a mind, right? If you’re already gonna build something like a human or smarter, it already has a brain, it can already think, you would teach it key concepts. Like, you would actually just explicitly represent: oh, hey, human, your goal is to reproduce your genes. Like, you wouldn’t play coy. You’d be like, yeah, your goal is to reproduce your genes. And that way, when the human encounters something like a sperm bank, then the human is like, oh my God, I can donate my sperm, you know, and I can get paid to inseminate women.
Liron: 00:24:38
Right? At high scale, you know? And then everybody would be trying to get a degree from Harvard and meet all the criteria that they like at the sperm bank, right? They would pay the sperm bank, they would cheat to get into sperm banks. But it’s very rare to see a headline about a criminal who tried to cheat a sperm bank or swap sperm. People just don’t care, right? Because evolution didn’t program into us an explicit representation of evolution’s goal or what evolution was optimizing for. So when Eliezer talks about the output of evolution, he wants you to contrast the obvious architecture that would emerge if you’re trying to build a species that reproduces genes with what actually emerged.
Liron: 00:25:15
What actually emerged is people don’t go try to donate to sperm banks. They look at porn all day. You’re spending hours looking at porn getting zero, even negative, ‘cause you’re losing your social skills, right? So you’re getting negative fitness value looking at this porn, and yet you’re gonna look at porn way more than you’re going to stand in line at a sperm bank, right?
Liron: 00:25:33
And so that is a huge failure from an optimization perspective. And the analogy there is when we as humans just naively try to climb these gradients, right? Like the gradient we climb when we do these large language model, you know, pre- and post-training runs. We’re just climbing these gradients.
Robert: 00:25:48
For people who don’t know what a gradient means just as simply as possible. When you say climb a gradient, what do you mean in lay terms?
Liron: 00:25:54
So climb a gradient just means keep tweaking things with feedback and keep getting, just keep getting better and better. The gradient is the slope of a hill. It’s like you’re standing on the ground and you look which direction is the steepest slope up?
Liron: 00:26:05
I’m just gonna always walk the steepest slope up. And the problem is if you do that, maybe you land on the top of some hill, but there’s a much larger mountain nearby that you never climbed ‘cause you were so shortsighted on climbing this one gradient.
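As a toy illustration of that picture, here is a sketch with an invented landscape: a modest hill near x = 1 sitting next to a much taller mountain near x = 6. A greedy climber that only ever walks the steepest slope up ends on whichever hill it starts near.

```python
# Toy illustration of greedy gradient climbing: the climber only ever steps in
# the locally uphill direction, so it tops out on the small hill it starts near
# and never reaches the much taller mountain nearby. The landscape is invented.
def landscape(x):
    small_hill = 1.0 / (1.0 + (x - 1.0) ** 2)    # modest peak near x = 1
    big_mountain = 3.0 / (1.0 + (x - 6.0) ** 2)  # much taller peak near x = 6
    return small_hill + big_mountain

def climb(x, step=0.01, iters=10_000):
    """Always walk the steepest slope up; stop when no direction goes uphill."""
    for _ in range(iters):
        if landscape(x + step) > landscape(x):
            x += step
        elif landscape(x - step) > landscape(x):
            x -= step
        else:
            break  # on top of *some* hill -- just not necessarily the biggest one
    return x

top = climb(0.0)
print(f"climbed to x = {top:.2f}, height = {landscape(top):.2f}")
# Ends on the small hill near x = 1 (height ~1.1); the mountain near x = 6 (height ~3) is never found.
```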
Robert: 00:26:19
Yeah, I mean that’s definitely, yeah, that’s the evolutionary part of the story.
Robert: 00:26:24
I think he considers it really central to his concerns. But, you know, again, I’m not, and, you know, I’m not sure the other half of the analogy is fleshed out as clearly as it might be, which is the large language model. I mean, I mean, you know, you’re right. If we were just writing the programs the way we thought AI would work, no problem.
Robert: 00:26:49
Right? You just would say, you know, you would get it to just reliably maximize its genetic proliferation or whatever you wanted. The point is, whatever ultimate goal you wanted would be there. But that’s not the way these AI work. And so that’s where, let me ask you another thing. I mean, it seems to be his view.
Robert: 00:27:12
So the first part of it is just as somebody a million years ago couldn’t have predicted these weird preferences we would develop, sugar sucralose, or, you know, porn in a way, in that environment because they would say, wait, none of this is really serving your genetic proliferation. I thought that was the object of the game.
Robert: 00:27:33
You know, he’s saying just as that is the case, we can’t predict how these models will stray from the preferences they have now. And they do serve our goals. And a critical part of this argument seems to be as these things get smarter, you can be more assured that they will have, well more unpredictable preferences, but moreover, that they will have preferences that do diverge from ours.
Robert: 00:28:07
Consequentially, that seems to be a critical part of the argument, right? Like. And by consequential, I mean like existential. Do you get part of that, part of the argument? Like, why? I mean, you can say it makes sense. A super smart thing will be super complicated.
Robert: 00:28:26
Who knows what the preferences will be? But he seems more convinced of how much it dooms us than, than I guess, yeah, I think part of it is he, from the beginning, had an expansive conception of the power of superintelligence. Right. And not everyone does. Even now. Yes. Even some people in AI aren’t thinking, wait a second, maybe there’s no limit.
We Summon AI Without Knowing How to Tame It
Liron: 00:28:50
Right. Right. Okay. This is a, I feel like these are big branches of the conversation. So let’s get back to the power of superintelligence. Let’s finish out this idea of why is it going to not be aligned? Why is alignment so hard? I would ask you to look at our tools. Jaan Tallinn, who’s one of the major funders of Anthropic, actually, really big guy in the AI safety community, also a billionaire.
Liron: 00:29:11
So this guy, Jaan Tallinn, he uses this term summon and tame. So the way you get these AI, the first thing you do is you summon them. You don’t really micromanage how they’re coming into existence. You throw compute at it, you throw data and you throw compute at it. You let the computation do its work, right?
Liron: 00:29:27
Richard Sutton’s “The Bitter Lesson,” you’d leverage the computation, and then this thing comes out, right? This beast, this demon comes out and you’re like, all right, I’ve got the demon, you know, crack my knuckles. Let me, let me tame it now. And by tame, I mean, you know, let me post-train it. Let me try to do introspection on it. Mechanistic interpretability. The problem is that your tools are kind of no match for the demon. Because, yeah, with mechanistic interpretability, you mostly see obfuscated stuff. You know, you can pluck out a concept here and there. Everybody admits this about mechanistic interpretability. You know, Neel Nanda, one of the biggest names in mechanistic interpretability, has said very explicitly, yeah guys, this is not going to be the whole solution.
Liron: 00:30:02
This is just one nice probe that we have. So besides mechanistic interpretability, what you have is you can give it all kinds of tests, right? You can give it benchmarks, you can give it alignment tests. And this is where we get into the regime of, you know, Goodhart’s law, right? Like whatever you measure is what they’re gonna hit.
Liron: 00:30:19
You get into Goodhart’s law, you get into cheating, right? Like, we are the teacher who want the students to learn, but the students just wanna get A’s. Not because they’re shallow, but we just, we’re measuring them on getting the A, we’re not measuring them on true learning. And so if you’ve ever met a high intelligence, crafty cheater, you know how easy it is for a smart student to avoid learning?
Robert: 00:30:41
Yeah, yeah. No, and that part is documented. And, if, you know, if Eliezer was talking about this stuff early on, it’s to his credit that it is, I mean, we’re seeing this kind of behavior surface, at least in tests and sometimes in the real environment. And no. I get why it’s hard to understand and I get why our understanding of it, of how they work will almost certainly always lag behind the latest models, unless we do what you wanna do, which is actually pause.
The “First Try” Problem of AI Alignment
Robert: 00:31:12
I’m not sure I mentioned at the beginning that you’re a pause activist. I mean, we had this aborted version of the recording and then we started over, and I don’t know if I mentioned it a second time. I, you know, I certainly take that. I certainly take all those points. I’m still trying to understand and there’s no point in continuing too long on this, the source of confidence in the.
Robert: 00:31:45
Existentially catastrophic consequences of superintelligence. It certainly scares me, yeah. Let me throw in some more ingredients. The idea we control it, it’s naive, and we’ve already seen that these things misbehave somewhat. So if it’s omnipotent, you could worry, but there’s still something about his confidence in this highly abstract and almost deductive version of the argument. Yep.
Liron: 00:32:13
Yeah. Deductive is a strong word, right? There’s a difference between being confident about something versus saying that it’s deductive. He’s referred to it in his book as an easy call. You know, he thinks it’s an easy call that if anyone builds it, everyone dies.
Liron: 00:32:26
And I think the nature of his confidence, it’s not deductive, but it’s very similar to saying, Hey, the first time, the year is 1950, I’m going to tell you that every single country, the first time they tried to build their first ICBM, right? Intercontinental ballistic missile, basically the precursor to a space rocket.
Liron: 00:32:42
Their first design and their first test is going to blow up, for every single country. That’s an easy call for me to make, and sure enough, that’s what happened. Right? Nobody got the ICBM right on the first try. Why? Because rockets wanna blow up. There’s a bunch of constraints. If you want the rocket to actually fly and not blow up, you need to tweak a lot of things, and it’s really hard to do it all on the first try.
Liron: 00:33:03
So that’s the same thing. So the ingredients that we were missing are, number one, first try, right? Eliezer is confident that we better get AI right on the first try, or maybe the second try or the third try. But by that point, it’ll kind of, you know, reach escape velocity where we can’t rein it in.
Liron: 00:33:18
And then the other ingredient is what you’ve alluded to in this conversation, which is superintelligence is really freaking powerful. So: really freaking powerful, get it right on the first try. Once you mix those ingredients into the stew, that’s where you get the confidence that we’re doomed.
Robert: 00:33:31
Yeah. I guess the final way I’d make this point before we move on to something else is just that, whenever I’m telling the story and, you know, I should read this AI 2027 paper. Have you read that one?
Liron: 00:33:46
I have, yeah. Yeah, I did an episode about it. I thought it was really good as an exercise in prediction, yeah.
Robert: 00:33:52
I listened to him on a podcast. But I, but the paper was so damn long. Anyway, I didn’t, I didn’t read, I mean. They lay out a scenario and in fact, Scott Alexander, who is one of the co-authors, said in his review of Eliezer’s book that he thought their scenario was more concretely plausible and less wildly sci-fi-ish than the scenario Eliezer puts forth in his book.
Robert: 00:34:18
I still haven’t looked at theirs, but I will say, whenever I try to imagine the point where like, okay, you got a corporation, the AI doing mid-level management. Oh, in two years it’s, you know, CFO, now it’s CEO and he is trying to imagine the actual point where things get out of control. It’s hard.
Robert: 00:34:44
I mean, I see where they start doing mischief and maybe trying to take over the, you know, the government and stuff. But I always have trouble. And also, in talking to people and trying to convince them that maybe it’s cause for concern, that moment of loss of control is hard to specify in concrete terms that readily make sense to ordinary human beings, and I’m still kind of struggling with that.
Liron: 00:35:12
Like, why is there going to be this kind of point of no return, right? The Schwarzschild radius of the black hole, right? Yeah, that’s right. Yeah. Well, in evolutionary time, you know, to go back to an analogy, humans have reached escape velocity from evolution, right? Normally there’s this competition where, okay, you’ve got this predator, so the prey is going to evolve and the system’s going to go back in equilibrium, and everybody’s going to have their niche.
Liron: 00:35:34
That’s over, right? Humans have now surpassed the whole regime. We’re coming for everybody’s niche right now, and they do not have the evolutionary time to react to us, right? Do you agree with that?
Robert: 00:35:45
Well, that’s certainly true. On the other hand, we are above a threshold that AI is also above.
Robert: 00:35:51
None of those other animals are, I would say, I mean, they are not, I don’t think any of them are truly self-aware. They’re not thinking about their situation, how to prevent, you know, consciously bad shit from happening. Exactly. So that’s kind of a difference, but yeah. Yeah. No, I know. I mean, but this is always what kind of bothered me about his arguments from the first place.
Robert: 00:36:12
It’s like the smartest thing always wins, and I just thought, well, that’s a pretty sweeping statement, that kind of applies to organic life, but this is kind of a different thing and maybe we should think it through in very concrete and specific terms. Anyway, you get. I am someone who always likes to have a very clear picture of things.
Robert: 00:36:28
And I’m very bad at absorbing abstractions without grabbing the person and saying, give me an example. I just can’t follow you any further. Totally.
Liron: 00:36:38
Okay. I mean, maybe it would be more concrete to be like, okay, imagine a country that’s like a, you know, a million, or let’s say a decent sized country, it’s like a hundred million people. But they’re all Elon Musk. And you know, Elon Musk is just known for being very effective.
Robert: 00:36:50
And we have another thought. Exactly.
Liron: 00:36:53
I’m just saying very effective people. Right? Intelligent, effective people. Whatever you think about Elon Musk being crazy, the guy gets stuff done. Right. So it’s like, you know, I imagine that that country has a few years to build their infrastructure. I feel like that’s going to be a superpower, probably the most powerful country on earth.
Headroom Above Human Capability
Robert: 00:37:09
Well, actually, I think it might dissolve in brutal infighting and insanity. So that’s a good example of what I mean.
Robert: 00:37:19
You have to look at the specifics. What would a society with a bunch of narcissistic egomaniacs who are very smart, look like? We’ve never seen one. It’s hard to say in the abstract how that would play out. Yeah. I mean, fair.
Liron: 00:37:31
Enough. Right? I guess that’s probably not the cleanest metaphor. I guess what I’m saying is you wanted concrete imagery of what it looks like when we’re faced with this powerful, intelligent agent. Right. I think maybe the best intuition is to just imagine watching a script or a person, just imagine watching very impressive productivity. Like, wow, you know, like a 10x engineer or whatever.
Liron: 00:37:51
Like, wait, you just whipped that up. I’ve been working on that all day. You just sat down and whipped it up. Right. But now just generalize that to stuff’s getting whipped up second after second, right? CPUs run really fast, stuff gets messed up. Yeah. No, I,
Robert: 00:38:02
I, I don’t think people realize how close we are to a situation that we really can’t conceive of. And I guess what I’m saying in a way is that Eliezer thinks he can conceive of it. You know, it’s like, yeah. It’s scary enough that we can’t conceive of it.
Liron: 00:38:20
Right. I think we gotta talk about just how much headroom is there above human power, right?
Liron: 00:38:26
So like, we are used to in a given year, we can do quite a lot to the world around us, right? We can invent stuff, we can run our supply chains. We’re used to a certain amount of impact on the world around us in a given year. So I have to convince you something that I’m convinced of, which is I think that impact could be a lot higher, right?
Liron: 00:38:43
I think we could be organizing our world to be doing a lot more per year, and I think the AI is going to tap into that headroom.
Robert: 00:38:50
Oh, god knows.
Liron: 00:38:52
Right. I mean, so if you accept that, I mean, that’s a load-bearing premise, right? If you’re like, no, we’re kind of operating near the limit of what we could be doing, you know, factories already running at full capacity.
Liron: 00:39:00
If that’s really what you think, then I’m like, okay, then maybe AI is going to struggle to take over all of humanity. But I don’t think it’s going to struggle. I think it’s just going to totally outproduce us, you know, just be a lot faster than us at everything.
Robert: 00:39:13
That seems clear. I mean, not to everyone, but yeah.
Robert: 00:39:17
I agree. The, yeah, and I mean, I am a believer in what an earthquake this is gonna be and one reason I guess, you know, I tend to focus my arguments just on that. It’s like, look at the magnitude and abruptness of the impact. I mean, job market, family life, mental health of adolescents. It’ll be good and it’ll be, you know, it’ll prevent a lot of suicides, I’m sure.
Robert: 00:39:44
But if it’s causing some, it’s gonna freak parents out. It’s gonna be a whole social issue. And there’s just so many areas where we’re gonna have to go, like, wait, how do we adapt to this? That, on those grounds alone, I think the wise thing would be to slow this down if we can. And I kind of worry if you premise too much of the argument on the more sci-fi scenarios, right?
Robert: 00:40:09
Because there’s some people who just are never gonna respond to that, or they’re gonna be like me and say, can you flesh that out a little more?
Liron: 00:40:15
Yeah, I hear you. I mean, there’s lots of other types of arguments that can be made. I mean, if you told me for sure that humanity is going to survive into the next century, would I then go and argue for being careful about unemployment and gradual disempowerment?
Liron: 00:40:28
I don’t think so. I’m just much more worried about going extinct. That is really my true worry right now. And I think that there’s a pretty good chance of that.
PauseAI: The Silent Majority
Robert: 00:40:37
Yeah. Let’s talk about, I do wanna eventually get into rationalism per se, but quickly, the pause movement. So you’re part of it.
Robert: 00:40:49
I’ve had Holly Elmore on who’s part of it, I guess. Yeah, Holly’s great. Those are great episodes.
Liron: 00:40:54
I highly recommend everybody listen to Holly’s content.
Robert: 00:40:56
Holly’s great man. She doesn’t hold back. Yeah. Like, I wish I had, you know, kind of her courage. I mean, totally. She doesn’t spend a lot of time worrying about what you think about her.
Robert: 00:41:09
That’s a very rare feature in humans. That’s true. Certainly in me. And it makes her, among other things very fun to talk to. So, but the movement, and I mean, one thing I admire and I admire this about you too, is like the movement. I don’t know what the current state of it is, but as of a year and a half or so ago, it seemed like some of these gatherings were pretty lonely, right?
Robert: 00:41:33
It’s like there’s half a dozen people out screaming, you know, and passersby are looking at you like you’re crazy, and you’re sticking with it. I really admire that. But what would you say is the state of the Pause movement, and what are your hopes for progress in the near term?
Liron: 00:41:52
As far as I can tell we’re the majority. The problem is that people don’t see the urgency, so that’s where the movement is. It’s actually insane. Yeah. You can make a Venn diagram of people who realize why AI poses a huge risk. That’s like a pretty large circle. And then there’s another circle of people who are willing to shout about things that they’re really worried about.
Liron: 00:42:15
That’s a large circle too, but there’s actually very little overlap. So yeah. It’s, you know, Holly and I did an episode of my podcast about this. Yeah. It was basically, we called it the circular firing squad, which is like, all of us people who were in early on, noticing AI risk, the mindset is like, it’s this very like.
Liron: 00:42:34
You know, you have to be calm, you have to be gentle. You don’t wanna make enemies, you don’t wanna be shouting into a bullhorn. You know, protesting is stupid. Politics is stupid. And so all of these people kind of stay out. And ironically, because then other people who are not on our side, you know, like the accelerationists or whatever, they’ll accuse us of being so powerful and so loud and manipulating everything.
Liron: 00:42:53
Where it’s like, no, we really are quite shy and uninvolved. So that’s, that’s my perspective. So, but the crazy thing is that we have a sleeper cell, which is the average person. When I talk to my in-laws, I’m like, yeah, this is why I think I need to pause AI. What do you think?
Liron: 00:43:06
And they’re like, yeah, pause AI. That’s their reaction. But are they going to come out on a protest? No, because to them it’s just like, look, it’s just on the computer. Right? They’re not seeing the urgency of it.
Robert: 00:43:15
Yeah. No. One thing that kind of drives me crazy is when people talk about the AI safety movement as if it’s this incredibly powerful and well-resourced thing. I mean, yeah, there’s a non-trivial amount of philanthropic money going toward it. But hey folks, let’s add up the market value, you know, market capitalization of the companies that have an interest in zero regulation of technology and.
Robert: 00:43:42
They are having a lot of influence. Yeah. I mean, last I heard,
Liron: 00:43:45
Right? Marc Andreessen, a16z and OpenAI have teamed up to inject a hundred million dollars in lobbying for their cause. So it’s just insane that when I go on Twitter, I hear a lot about how AI doomers are the ones injecting money, you know. I mean, there’s some rich people, right?
Liron: 00:43:58
There’s Jaan Tallinn on our side. Dustin Moskovitz is pretty worried, I think, about AI risks. And, you know, maybe one more I haven’t named, against these AI companies that are already at multi-trillion valuations, lobbying like crazy. Yeah. So, I mean, it’s like, you know, talk about hypocrisy, accusing us of having money.
Liron: 00:44:15
And by the way, for all of these supposed doomers who have this money, you know, I have a donation account. Okay. That so far has been very light on the donation front. Yeah. So I can personally vouch for this not being a big source of funds.
Robert: 00:44:27
Yeah. I mean, and the big companies have clearly had their way with the American political system.
Robert: 00:44:33
Nobody’s laid a glove on them. I mean, it isn’t just that there’s no regulation to speak of. And I acknowledge regulation is a complicated thing, but there’s basically none. It isn’t just that. It’s like everybody’s like, Sam, you’re right. We should really subsidize your nuclear reactors or something.
Robert: 00:44:49
What can we do for you? You know? And it’s just kind of amazing. But you’re right. There’s a lot of people who don’t, kind of, don’t like them. It’s just that, I don’t know. They have a lot of money. Yeah. Or something. And Altman’s good. I mean, he’s very, I mean, he’s impressive. He’s totally, I mean, a good politician. It’s,
Liron: 00:45:08
It’s, I mean, it’s amazing to watch these people work. I mean, you know, he’s smooth, he’s productive. I mean, it’s like watching a virtuoso at work, like these tech CEOs are. I mean, so I’m a big fan of watching him. The only problem is he’s on the dark side. I mean, he’s just, at this point from my perspective, he’s just doing whatever you would do if you’re just trying to have the most money and power, you know, he’s just playing that game.
Liron: 00:45:30
I mean, it’s a fun game.
Robert: 00:45:31
No, it’s like Paul Graham said what he’s good at is getting power. And, you know, Graham ushered him onto the stage, he should know. So, okay. So, as you know, the way this thing works is we do, you know, about half of it in public for everyone, and we go behind the paywall into overtime.
Going into Overtime
Robert: 00:45:53
People can listen to that by becoming paid subscribers to the Nonzero Newsletter. And then you can set up your own special feed for that. But I wanna, and once we move into overtime, I do wanna talk a lot about the rationalist subculture, which fascinates me, but I wanna give you a chance to sum up or do any self-promotion.
Robert: 00:46:16
So, Doom Debates is your podcast. People should check that out. Very well produced. And just very well done. You have higher production values than most I’ve seen, too. And what else do you want to promote? Do you have an official connection to the pause movement? Either of the two pause organizations, or,
Liron: 00:46:41
I’m not on their leadership.
Liron: 00:46:43
I’m just a member. I’m a member of the grassroots organization called PauseAI. I encourage everybody to go to pauseai.info, or pauseai-us.org, which is another site for the US branch run by Holly Elmore. And yeah, definitely. The thing I’d like to promote is my own channel, Doom Debates. It’s at doomdebates.com or youtube.com/@DoomDebates. I’ve got Gary Marcus on the program.
Liron: 00:47:06
Vitalik Buterin recently came and debated me. If you guys know d/acc, he’s pitching d/acc, and I was skeptical about it. So we debated that. Bob came by for a recent launch party on Doom Debates as well. So I encourage people to check that out for more smart people who are way too optimistic about AI and me giving them a piece of my mind.
Robert: 00:47:28
Alright. Okay. So thanks everybody who followed us this far. Hope you’ll follow us into overtime and here we go into overtime.