Live Q&A: Bernie Sanders Wakes Up to AI Doom, Dwarkesh's $20,000 Questions, Caller Debates the Alignment Problem!

Multiple live callers join this month's Q&A as I react to Dwarkesh Patel's $20k blog prize, debate the orthogonality thesis from first principles with a live viewer, and welcome Bernie Sanders aboard the Doom Train! 🚂

Timestamps

00:00:00 - Cold Open

00:01:00 - Welcome to Doom Debates Live!

00:01:30 - What Do You Think of Open Source Models Out-Benchmarking OpenAI and Anthropic?

00:04:55 - Michael Cheers Joins: What If We Don't Give AIs Full Situational Awareness?

00:11:55 - Thoughts on Mythos' Hacking Abilities?

00:15:43 - Liron Reacts to Dwarkesh Patel's $20K AI Questions

00:23:28 - Pretraining Goals vs RL Training Goals

00:28:58 - Mental Model of Yudkowsky-ians & the IABIED Claim

00:37:24 - You Can't Hide Reality from a Superintelligence (The Truman Show Analogy)

00:42:57 - Back to Dwarkesh's Questions: When Do AI Labs Start Making Money?

00:48:50 - Upcoming Guests Reveal!

00:51:35 - Will Lancer Joins: Is The Yudkowskian Thesis Credible?

01:27:03 - Back to Answering Questions from the Chat

01:33:28 - The Cameraman Always Survives Analogy

01:40:52 - Liron's Banger Response to Roon's Tweet

01:47:00 - Nuance About Pausing AI Development

01:50:57 - Capitalism Isn't Going to Steer Us to an Alignment Solution

01:53:10 - Is Optimization Equivalent to Intelligence?

01:57:21 - BREAKING: Bernie Sanders on the Existential Threat of AI

02:01:12 - Spoiler for the Upcoming Mike Israetel Episode

02:01:57 - $500 Bet on AI Unemployment

02:05:46 - Misuse, Surveillance, and the Real Costs of Pausing AI

02:11:04 - Wrap-Up

Links

Dwarkesh Patel, Blog Prize for the Big Questions About AI (Dwarkesh Podcast)

Doom Debates episodes with Steven Byrnes

Nick Bostrom, Deep Utopia: Life and Meaning in a Solved World (Amazon) β€” https://www.amazon.com/Deep-Utopia-Meaning-Solved-World/dp/1646871642

Yudkowsky & Soares, If Anyone Builds It, Everyone Dies (book) β€” https://www.amazon.com/If-Anyone-Builds-Everyone-Dies/dp/0316571253

Transcript

Cold Open

Will Lancer 00:00:03
I heard this recently from an AI safety researcher and it was: what if you found out that all of your morals are obviously fake? I don't understand why you would stay so attached to them.

Liron Shapira 00:00:12
When I was in college, the fundamentalist Christians that I went to college with would always be like, you're an atheist; if I were an atheist, I'd go around stabbing people. I'm only good because I listen to God.

Will 00:00:22
I don't think serious Christian philosophers think this anymore.

Liron 00:00:26
It feels intuitive to you that because we created AI, the AI knows that it owes us letting us debug it. But if we give it certain preferences by default, it's just going to go with the original preferences.

Will 00:00:38
It doesn't seem that extreme to me of a belief where it's trying not to kill people, trying not to do this, trying not to do that. It feels like a one-in-a-million YOLO of these preferences.

Liron 00:00:49
Yeah, look, it's... we maybe...

Welcome to Doom Debates Live!

Liron 00:01:00
Friday, April 24th. Welcome to Doom Debates Live. Hi everybody.

All right, so your questions are coming in from the chat and you guys have these crowns in YouTube. I'm seeing Michael Cheers, 803. For some reason he's got the number one crown. I guess he's been hyping up the show, so thanks Michael. Appreciate that.

So Michael's saying "I want to call in." Okay, let me get you guys the call-in link. This link right here that I just pasted in YouTube, this is the live call-in link and anybody's welcome. We are an equal opportunity debate host here.

All right, so somebody's saying: what do you think of open source models like Kimi starting to out-benchmark OpenAI and Anthropic?

Liron 00:01:39
That's interesting. When you say out-benchmark, do you mean out-benchmark their open source models, or are you specifically thinking about the cutting edge GPT-5.5 getting out-benchmarked? I'm not entirely sure what you mean, so maybe clarify that.

Hey, Pun Master, Pun Master crown number three. Yeah, definitely an active commenter. I'm seeing a lot of Pun Master comments on the videos. Thanks for your engagement. All right, we got Producer Ori in the chat. Everybody say hi to Producer Ori. Let's give him a like reaction.

What Do You Think of Open Source Models Out-Benchmarking OpenAI and Anthropic?

Liron 00:02:08
So yeah, in terms of out-benchmarking, I do think it's correct. I think I did hear that in terms of OpenAI's official open source models getting out-benchmarked by some of those other open source models. The consensus is just that open source tends to lag six months behind. The only question is: is it gonna lag three months behind, six months behind, twelve months behind? Probably always going to be in that range.

And it is interesting because the question is, what advantage do these frontier companies have? Why are they gonna defend their gross margin when there's always these open source solutions nipping at their heels?

There's a couple different answers. The first answer I would give is: we can make an analogy to other things that are kind of commodities. Cloud computing is kind of a commodity, and yet Amazon AWS and Google Cloud all have healthy margins, even though you can ask, why do they have healthy margins? There's other clouds that spin up, aren't the clouds getting competed down?

Liron 00:02:56
Well, for whatever reason, they have huge scale and they have healthy margins. There's just a few huge scale players, and they all have healthy margins. People are happy to use the clouds. I use the cloud for my business and yeah, am I paying 20% more than I have to? Sure, but do I care? No, because it's still a good deal. I'm getting a lot of value running these servers.

So that business model might very well apply to AI tokens. If the singularity doesn't happen, if we still have normal life and we're paying for all these tokens, maybe we just pay at a price that gives OpenAI some profit as opposed to running an open source model. But do we want to deal with an open source model? No. We'd rather just pay a little more and run the hosted model.

Liron 00:03:32
So in terms of economic analysis, I think it's fine. It could work out the same as cloud computing. But then there's this larger question of, what if the singularity is happening? I claim we are gonna enter the singularity, I claim we are gonna FOOM and everything.

In that scenario, I think the theory is that if you have the number one model that's not open source yet, sure open source is nipping at your heels three months behind, but it doesn't matter because you're gonna get this decisive advantage. You're gonna enter the positive feedback loop, your AI is gonna build the next AI and so on. And so your three months will turn into infinity.

Liron 00:04:00
It's a very interesting situation because there's just so many sources of pressure. Running out of money, that's a source of pressure. Open source AI nipping at your heels, that's a source of pressure. Other for-profit competitors, source of pressure. It's a hothouse environment.

It's kind of Yudkowsky's worst nightmare compared to 2015 when it was still going slower and the AI community was small and everybody knew each other. And now it's an all-out free-for-all, no holds barred, no rules. Nobody can stop the train.

So going back to your question, yeah, the fact that open source is close is just yet another pressure cooker element. Crazy, crazy times. To Yudkowsky, the game board has been played into an awful state. It's a really bad place to try to strategize how to win.

Michael Cheers Joins: What If We Don't Give AIs Full Situational Awareness?

Liron 00:04:48
Send in your questions. Let's do the call-in. Here we go. We got Michael Cheers on the call-in.

Hey, Michael Cheers.

Michael Cheers 00:05:04
Hi, can you hear me?

Liron 00:05:05
Yeah, I can hear you fine.

Michael 00:05:06
So yeah, I was just curious on your thoughts on whether the AI companies could go with a safer approach than what they're doing now. Because I think the current approach is kind of dangerous, in terms of training the AI so that it knows it's an AI and everything. It knows all about humans. It seems very...

Liron 00:05:23
Okay. Do you have a specific proposal?

Michael 00:05:27
Let's say you have the LLMs make an alternate world, right? Then that's the training data you give it. You don't train it on any data from the human world. From there, if you want to have it reason about human things, you'd only give it the in-context information it needs. And that way it has a much harder time breaking out, right?

Liron 00:05:51
Yeah, interesting idea. So just to summarize: you don't just give the AI full situational awareness, you give it kind of need-to-know basis. "Hey, I'm asking you a question. Don't think too hard about everything."

I think research there is great. Whatever we can learn in that direction is great. But there is a fundamental problem, which is that if you are very intelligent, if you are very good at solving problems, it's natural to just be like, "Okay, what's my situation? What could I learn about the situation?" The same way that humans ask, "What are the laws of physics? What are the rules of the video game here?"

Liron 00:06:21
And it's hard to avoid learning things about how to break out or how to manipulate people because these levers are there. And if people are trying to hide them from you, you're still gonna see signs that they tried to hide things from you.

So I guess you'd have to be steered: it has to be in your nature to not be too curious and focus on the problem. And current AIs do seem to already do that. So the part where we don't tell them that much about the situation, I don't know how much work that would do.

Failure of imagination is not something that I would count on for a superintelligence. One way to think about a superintelligence is that it really does see all the possibilities. Enumerating possibilities is a pretty fundamental skill. When you're building an intelligence from scratch, you basically can't miss it.

Liron 00:07:07
Even humans who don't realize they're doing it, they do it in certain domains. When you're intuitively good at something, your brain is doing it even though you can't do it in a general capacity. So I'm just not optimistic about a world where there are things that would be obvious to a smart human thinker and yet the AI is somehow never thinking about it. I just don't think that's how a plausible win looks.

Michael 00:07:31
I guess the thinking would be that it's an incremental approach. You have your AI, it tries to make the world even more unlike our world, add better safeguards to stop it thinking about simulation theory, et cetera.

Liron 00:07:43
It's an idea that has a simple model in theory, but I would want to look for more specificity in the proposal. The idea is that every intermediate AI is so perfectly aligned that we can trust it to build the next one, but also to robustly secure the next one. And I'm...

Michael 00:07:59
The security model would be you try to stop it understanding that it's an AI and it's in a simulation and all that. If it understands that, you've kind of already failed in my model.

Liron 00:08:13
Right, right. Yeah, even we as humans have this hypothesis on the table that we're living in a simulation. And the only thing that stops us is not that it didn't occur to us; it's just that we're looking around and we're like, well, I just haven't collected any evidence. The only evidence we have that we're in a simulation is epistemological evidence of, why is life so interesting? But I can't prove it. There's literally nothing I can prove.

Whereas with the AIs, they're gonna have a lot of ways to prove it. Our simulation is not going to be as robust as whatever the aliens are doing to us. If it's aliens: you guys are doing a great job with the simulation. I haven't found any cracks.

Michael 00:08:45
Yeah, I don't know. I just think that it's not anything like what the AI companies are actually doing, which is what's interesting about it. Maybe there could be some very different approach where you try and make sure they don't understand anything. But the current approach is just like they're gonna do whatever they want and then try and paper over it. I don't think that can possibly work.

Liron 00:09:06
Yeah, well, I definitely agree with you there. These are good questions.

Michael 00:09:09
Okay, great. Well, I'm gonna let you go just because the connection's kind of janky anyway, but I'll give you my response offline.

Liron 00:09:16
All right, thanks for coming on. This was fun.

Thoughts on Mythos' Hacking Abilities?

Liron 00:09:16
So that is food for thought: this idea of don't tell it everything, and AI companies aren't even pursuing this direction maximally.

I think Buck Shlegeris at Redwood Research thinks a lot about controlling AIs instead of aligning AIs. His whole focus is: yeah, maybe the AIs aren't gonna be aligned once they get superintelligence, and that's scary, but there's gonna be this whole transitionary period where they're gonna be messing with us and they're not gonna be super, super intelligent.

So as long as we're really good at noticing when they're escaping and discouraging them from escaping, we can just use all the tools we have before they get way too intelligent to deal with. And yeah, sure, somebody should be researching that. I'm not against it. I don't have much hope that it'll work, but it can buy us a few years.

Liron 00:09:58
The idea is intuitive to imagine that it could work. Michael proposed a specific solution: have this other AI that's monitoring the thoughts, being like, "Are you thinking about escaping? Are you thinking about how we're keeping you in the box? Okay, restart." It's this intuitive kind of monitor process.

And the problem I see with that is: I think it can buy us a little bit of time, it's not a worthless solution, but if you accept the premise that it's getting superintelligent, the thing is that superintelligences just have so many options. They see so many possibilities and they're doing a lot.

Liron 00:10:32
It's managing all these things. It's got all these child AIs. And you think, what's the problem? I've got the monitor process. The monitor process is catching things. But it's just going to correlate: the AI getting more powerful and doing more things is also going to correlate with it somehow getting around the monitor process. I think that's a good intuition.

It's just gonna be like, "Here's a bunch of plans." The monitor process is like, "Well, I don't really see these setting off the monitor." But it turns out that all of those plans are just giving the AI enough context to know about the situation.

Liron 00:11:00
You can't do these hacky solutions. One of the sources of my intuition is that I know a little bit about computer security. If you look at the way that all these clever hackers are getting around things, like injecting scripts into different websites. Take a website that'll paste whatever you type in onto the webpage, but then you type in JavaScript, a script tag. So the website says, "Okay, I'm not gonna let you type in a script tag."

But I've seen all these clever hacks of, "Oh, well actually, if you type all these crazy characters, you actually get around the logic that was supposed to block those characters." It's just crazy how many degrees of cleverness, how many holes there are. When you think something kind of should work, it just turns out to have a lot of holes.
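To make the script-tag example concrete, here is a minimal Python sketch of one well-known way a blocklist filter gets bypassed. The `naive_filter` function is a hypothetical stand-in for "the logic that was supposed to block those characters"; it is not taken from any real site or from the stream.

```python
import html

def naive_filter(user_input: str) -> str:
    """Hypothetical blocklist: delete literal script tags in a single pass."""
    return user_input.replace("<script>", "").replace("</script>", "")

def escape_output(user_input: str) -> str:
    """Escape HTML metacharacters so input renders as text, not markup."""
    return html.escape(user_input)

# The attacker nests a tag inside a tag. Removing the inner tag
# stitches the outer fragments back together into a working tag.
payload = "<scr<script>ipt>alert(1)</scr</script>ipt>"

filtered = naive_filter(payload)
escaped = escape_output(payload)

print("after blocklist:", filtered)
print("after escaping: ", escaped)
```

The blocklist feels like it should work and still leaves a hole, while escaping every metacharacter is closer to the "logically airtight" end of the spectrum.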

Liron 00:11:35
Unless it really is logically airtight. And even if it feels logically airtight, you probably still haven't thought of the holes. But when it doesn't feel logically airtight, when it's just, "Look, we're monitoring, what's wrong with monitoring?", I'm telling you the monitoring is not gonna work.

All right, that was a good discussion.

Liron Reacts to Dwarkesh Patel's $20K AI Questions

Liron 00:11:55
So Pun Master saying, "I'm curious to know your thoughts on Mythos as well. Do you agree with John Sherman that it could be the end of encryption?"

No, it's not the end of encryption. We have encryption algorithms that Mythos can't break. RSA cryptographic encryption is based on the difficulty of reversing prime number multiplication. Quantum computers might be able to reverse that, but we've got other encryption schemes too.
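A toy Python sketch of the asymmetry mentioned here: multiplying two primes is a single operation, while undoing it by brute force takes a search. The primes 1009 and 2003 and the trial-division helper are illustrative assumptions; real RSA moduli are hundreds of digits long, far beyond this method.

```python
def factor_by_trial_division(n: int) -> tuple[int, int]:
    """Return (p, q) with p the smallest factor of n greater than 1."""
    candidate = 2
    while candidate * candidate <= n:
        if n % candidate == 0:
            return candidate, n // candidate
        candidate += 1
    return n, 1  # n itself is prime

# Forward direction: one multiplication.
p, q = 1009, 2003  # small primes chosen only for illustration
n = p * q

# Reverse direction: a search whose cost grows with the size of the factors.
recovered = factor_by_trial_division(n)
print(n, recovered)
```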

Encryption schemes are actually a matter of computational complexity theory. And as smart as Mythos is, even superintelligent AI might not be able to directly attack the foundations of computational complexity theory.

Liron 00:12:29
I always talk about different ceilings. Gödel's theorem and logic is one ceiling, and P versus NP is a ceiling from computational complexity theory. Reversing encryption is actually pretty close to that ceiling.

So when I think about AI stealing Bitcoin and stuff, I mostly think of side channel attacks, just convincing a human to give up the wallet, the same way that humans attack other humans that way.

I will say effectively, I don't think encryption is going to hold back AI because of all these side channels. But can it directly break encryption? Probably not. There's probably some encryption schemes that it can't break.

Liron 00:13:02
And of course, we know about information-theoretically perfect encryption. If you've ever heard of the one-time pad: if you just have a big pad of random numbers that you share with somebody, that lets you perfectly encrypt things in a way that nobody can reverse engineer without having the pad of random numbers.

The only flaw with it is that it's annoying to give them the big pad of random numbers. But if you're okay giving your friend a giant book full of random numbers, you and your friend can always perfectly symmetrically encrypt things for the rest of your life. And nobody can ever break your encryption unless they have the pad.
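A minimal Python sketch of the one-time pad described above, assuming the pad is truly random, at least as long as the message, and never reused. The messages and the `otp_xor` helper name are made up for illustration.

```python
import secrets

def otp_xor(data: bytes, pad: bytes) -> bytes:
    """XOR data with the pad; the same call both encrypts and decrypts."""
    if len(pad) < len(data):
        raise ValueError("pad must be at least as long as the message")
    return bytes(d ^ p for d, p in zip(data, pad))

message = b"meet at noon"
pad = secrets.token_bytes(len(message))  # the shared "book of random numbers"

ciphertext = otp_xor(message, pad)
recovered = otp_xor(ciphertext, pad)

# Why nobody can break it without the pad: for ANY same-length guess,
# some pad exists that "decrypts" the ciphertext to that guess, so the
# ciphertext alone does not favor one plaintext over another.
decoy = b"flee at dawn"
decoy_pad = otp_xor(ciphertext, decoy)
decoy_recovered = otp_xor(ciphertext, decoy_pad)
```

The cost shows up in the first line of setup: both parties need the same pad in advance, which is exactly the "meet in a dark alley" problem.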

Liron 00:13:30
So literally the only disadvantage is that I haven't gone and met Amazon.com in a dark alley and given Amazon.com a big book of random numbers. But if I had, me and Amazon would be perfectly secure talking to each other for life. That's the only downside: we haven't met in secret before.

And then there's this idea of public key encryption where you don't have to meet in secret. You can just show up and yell at somebody across the room. It could be a crowded room. You've never met the other person. You're yelling at the other person across the crowded room. And yet somehow, even though you've never met the other person, you're yelling in such a way that nobody else in the crowded room can decrypt what you two are talking about, even though you've never exchanged the secret before.

Liron 00:14:00
That's the magic of public key cryptography. And we're not totally sure that public key cryptography is going to remain robust. It might not be. But if I had to guess, I would say that public key cryptography, even though it's less provably secure than the one-time pad, will never be broken in the general case. That's my guess.
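The "yelling across a crowded room" picture corresponds closely to Diffie-Hellman key exchange. Below is a toy Python sketch under loud assumptions: the modulus `P` (the Mersenne prime 2^127 - 1) and base `G = 3` are chosen for readability, not security, and there is no authentication, so this is not a usable protocol.

```python
import secrets

P = 2**127 - 1  # toy prime modulus, far too small and structured for real use
G = 3           # toy public base, an assumption for illustration

# Each party keeps a private exponent and never says it out loud.
alice_private = secrets.randbelow(P - 2) + 1
bob_private = secrets.randbelow(P - 2) + 1

# These values are "yelled across the room"; everyone can hear them.
alice_public = pow(G, alice_private, P)
bob_public = pow(G, bob_private, P)

# Each side combines the other's public value with its own private one.
alice_shared = pow(bob_public, alice_private, P)
bob_shared = pow(alice_public, bob_private, P)

print("shared secrets match:", alice_shared == bob_shared)
```

An eavesdropper hears `P`, `G`, and both public values; recovering the shared secret from those is believed to be hard, but unlike the one-time pad that hardness is a conjecture, not a proof.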

Hope that answers your question. I have a little bit more knowledge on this than the average person because this happens to be the one thing I actually studied in college: a little bit of theory of computation.

Pretraining Goals vs RL Training Goals

Liron 00:14:32
So this guy is saying, "In other words, one's P(Doom) should swing drastically based on a substantive model interfacing with reality. What do you actually think will happen and why? 50% strikes me as a Bayesian trade cop-out."

Okay, a substantive model interfacing with reality. My substantive model basically gets down to: if anyone builds it, everyone dies. The substantive model is that intelligence is freaking powerful.

Where is your substantive model that explains how humans dominate the other animals? How do you explain that? "Oh, because we have a brain. A brain is this magic dominating thing." Okay, is it the maximally magic dominating thing? No, it's not. Where does the scale end? It ends at some superhuman entity. That is a pretty powerful mental model. I think that mental model has a lot of predictive power.

Mental Model of Yudkowsky-ians & the IABIED Claim

Liron 00:15:30
All right, hope that answers your question. Each Shiz is out here, welcome. He's saying, "Let's see this guy broadcasting from his face." Yeah, he's talking about the guy with the slow connection, Michael.

All right, Producer Ori is saying, "Let's get a first reaction to Dwarkesh's $20,000 questions." Oh, yeah. All right, let's pull it up. Let's pull up Dwarkesh's questions. I'll go check his Twitter. Here we go. This could be a Twitter browsing session, that could be something we do here.

Liron 00:16:00
All right, here we are on Dwarkesh Patel's Twitter.

So he's saying, "$20,000 blog prize to answer some big questions about AI." Let's click through.

All right, check it out guys. "Blog Prize for the Big Questions About AI" by Dwarkesh Patel. He says, "The not-so-secret point of this whole contest is so that I can hire a research collaborator." Okay, yeah, I respect that. You're running a contest to find a research collaborator.

Liron 00:16:26
All right, so he's asking these questions with a bounty on them. And the first question looks interesting. He says: "A couple years ago there was this idea that AI progress might slow down as we make further progress into the RL regime" (that stands for reinforcement learning) "because as horizon lengths increase, the AI needs to do many days' worth of work before we can even see if it did it right."

"So if we're still in a naive policy gradient world, the reward signal per FLOP goes down. And we crossed through so many orders of magnitude of RL compute from GPT-4 to o1 to o3, and it would not be feasible to replicate that many orders of magnitude increase in compute immediately again."

Liron 00:17:04
"But AI progress seems to have been fast nonetheless, even potentially speeding up if rumors about Spotter Mythos are to be believed. What gives? What did that previous intuition pump that motivated longer timelines miss?"

Okay. First of all, I don't think I personally was ever staunchly predicting longer timelines. Let the record show that in late 2024 when so many people were like, "Oh, those AI-2027 guys are so dumb, timelines could be so long," I was like, no, no. I don't think so. Sure, we've had a few months of relative quiet, but it's too early to say. I was very clear on that.

Liron 00:17:38
So don't blame me. But Dwarkesh is asking a good question because there was a vibe shift. After 2023-2024, people were like, "Hey, nobody's been that much better than GPT-4. Even GPT-5 is not that much better than GPT-4." Although I pointed out that it kind of was. It actually was better. People just didn't notice because it slowly got better over the years.

But people were like, "I haven't been super impressed by AI for a while. I'm kind of used to it." But then of course, January this year roughly, Claude got better and suddenly Claude is literally writing all of our code now.

Liron 00:18:08
This is what Dwarkesh is talking about. He's like, "Oh, it qualitatively feels much better. So why didn't more people predict that we were due for another qualitative shift?" And by the way, when he says "Spotter Mythos," Spotter is the OpenAI upcoming version of Mythos.

Let me just make sure I've understood everything Dwarkesh said. Let's read it one more time because it's super dense, a whole paragraph worth of a question.

Liron 00:18:26
He says: "A couple years ago there was this idea that AI progress might slow down as we make further progress into the reinforcement learning regime." He's talking about how these thinking models work: when you train the entire chain of thought, you have to evaluate, "Okay, what did your chain of thought yield? Oh, it yielded something good. Okay, let me reinforce the entire chain of thought."

As opposed to predicting the next word, where you can predict a hundred different next words in the course of a couple sentences because the sentences have so many words in them. But in the course of an entire problem-solving output, you're outputting so many tokens and then you only get one bit of information: did you solve the problem or not?
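A back-of-envelope Python sketch of the contrast drawn here, counting only an upper bound on the supervision signal in each regime. The episode length and vocabulary size are hypothetical round numbers, and real training signal depends on far more than these bounds (for example, how predictable each token already is).

```python
import math

def next_token_bits_upper_bound(num_tokens: int, vocab_size: int) -> float:
    """Each token position supplies one target, worth at most log2(vocab) bits."""
    return num_tokens * math.log2(vocab_size)

def outcome_reward_bits_upper_bound(num_episodes: int) -> float:
    """A pass/fail outcome per episode carries at most 1 bit."""
    return float(num_episodes)

EPISODE_TOKENS = 10_000  # hypothetical length of one problem-solving trace
VOCAB_SIZE = 50_000      # hypothetical tokenizer vocabulary size

dense = next_token_bits_upper_bound(EPISODE_TOKENS, VOCAB_SIZE)
sparse = outcome_reward_bits_upper_bound(1)

print(f"next-token upper bound: {dense:,.0f} bits over {EPISODE_TOKENS:,} tokens")
print(f"outcome-reward upper bound: {sparse:.0f} bit over the same tokens")
```

Under these assumptions the gap is several orders of magnitude per trace, which is the "weaker regime" intuition behind the question.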

Liron 00:19:09
That's basically Dwarkesh's point: didn't we enter a weaker regime? And if we entered a weaker regime, how come we still made a bunch of progress? It's a good question. We made a bunch of practical progress.

And then Dwarkesh is also saying, in addition to that, we crossed through many orders of magnitude. We basically used up the hardware overhang. GPT-4 was kind of the first time anybody had taken a bunch of hardware and applied it all toward training AI in parallel. It just wasn't something we were used to doing. Then suddenly we did it and used the hardware overhang. Now that we have less hardware overhang, how come we're still making the AI so much faster and better?

Liron 00:19:46
I think I have a simple answer to the second one: an ounce of algorithms is worth a pound of hardware. I've always been saying that's been a consistent trend throughout history: you can take the same hardware and think harder about your software and make your software better and get more out of the hardware. That's been a very powerful trend.

It doesn't surprise me at all that GPT-4's price slashed by a factor of 10 within a few months of being released. That's definitely what I was expecting.

Liron 00:20:05
We have this other anchor point of the human brain: running Einstein at 20 watts, running Einstein at 2,000 calories a day. There's this anchor point of, you don't have to burn a lot of resources to have all this intelligence. So it doesn't surprise me that we're still milking a lot out of the hardware.

And of course the other thing is that hardware is coming online now. It's not just that the software makers decided to parallelize AI training. This is also the first few years where the hardware makers have been like, "Oh, holy crap, let's go parallelize AI training." So Nvidia has really gotten its butt into gear. Google has gotten its butt into gear more than ever.

Liron 00:20:35
So I don't find that part particularly surprising. Maybe the more surprising part is the first thing he said about the reinforcement training. If we don't have a lot of bits of information to train how to write code better, how do we suddenly get better as a user of Claude Code?

I actually think a big part of the equation is the harness. This idea of, for a while Claude has kind of known what to do; the model has been able to explain what to do. And I think we're getting a huge burst of power just by being like, "Okay, you kind of know what to do. So activate this tool, get the results of this tool." We're just teaching the AI habits of how to be resourceful, which are simple on an absolute basis.

Liron 00:21:10
We could have been talking to GPT-3 in late 2022 about the order in which to activate tools. And don't get me wrong, the new AI is better at that. But I actually think it just takes a small step from GPT-3 to Mythos. I think a lot of the value of GPT-5.5 or Claude 4.7 really is just playing nice with the harness, knowing about this idea of "I'll use that tool and that tool." That's my guess.

But then once you get to Mythos, maybe when you ask about Mythos identifying vulnerabilities, maybe there's not as much tool use. So maybe you can't just say it's the tool use harness.

Liron 00:22:05
I obviously don't have much firsthand experience with Mythos, but it's a good question. Why did Mythos suddenly get way better at identifying bugs in software? Maybe there's no simple tool use answer. Maybe it has to do with the quality of the reasoning somehow improving, and it had to do with training reasoning differently than how we train tokens.

This is a very good question. Dwarkesh has asked a good question. I don't think I have the full context. I think there's probably a missing puzzle piece that might have to do with the details of their training.

Liron 00:22:34
I suspect there's a certain way that they're training this chain of thought that is more efficient. I suspect it's not like you do the whole chain of thought and then get a token at the end. They probably somehow are identifying whether the chain of thought is good after only ten tokens or something. I don't know what they're doing; that's probably proprietary secret sauce. But that's my speculation.

So we've identified a real mystery. You'd think Dwarkesh would just have friends in the AI companies, since he's more well connected than me, so they would just go tell him the secret answer off the record. So I don't know.

You Can’t Hide Reality from a Superintelligence (The Truman Show Analogy)

Liron 00:23:04
All right, let’s see what you guys are saying in the chat.

So Cardish Shev 78 is saying, β€œThere is already more than enough compute to achieve superintelligence.” I definitely agree with that. I say that a lot on the show. Just look at Einstein’s brain β€” doesn’t use that much compute.

And then Pun Master saying, β€œMaybe some or most of what makes Mythos impressive is hype.” All right Pun Master, that’s fair.

Liron 00:23:39
And then Andy Mann 738 is saying, β€œI have a possible silly question. Why would today’s AI want to shunt their RL-trained preferences toward niceness and harmlessness in lieu of just radically pursuing their random pre-training preferences?”

So I guess this is kind of supposed to be a gotcha, of, because I’m a doomer and I claim AI is gonna be super dangerous β€” and this is a question of, well, why is it so nice today?

Liron 00:23:54
This is a question I ask pretty regularly. If you go back a month or two to my latest Steven Byrnes episode, that was a big question I had for him. I was like, β€œSteven Byrnes, you still expect, as I do, this big bad FOOM? So what do you make of the fact that Claude Code is so nice and helpful today?” Yeah, occasionally it goes wild and I get mad at it, but 95% of the time it’s so nice. How do you explain the nice part?

And he was basically saying, β€œYeah, I agree it’s nice today, but I think there’s gonna be a discontinuity precisely when it gets more powerful, when it gets trained more on outcomes than on simulating human trains of thought. Then it’s going to get more powerful.” Which I found actually super convincing.

Liron 00:24:27
But let me see if I can add more to answer this question. So: why would today’s AI want to shunt their RL-trained preferences toward niceness and harmlessness in lieu of just radically pursuing their random pre-training preferences?

Oh, okay, I think you might be asking a different question than what I just said. There’s some nuance to this question. I’m not even sure I fully understand.

Liron 00:24:43
So the idea is that when you do RL, you’re teaching the AI certain preferences, but they already had these other preferences from predicting the next token. This giant black box, these giant matrices β€” the whole system is somehow really caring about getting that next word right.

But then also the sequence of words. I wonder what’s a good analogy for that. You already have this super-optimized word predictor system, but then on top of that you’re training it to go reason about stuff and rewarding it for correct reasoning, but somehow it’s a separate layer.

Liron 00:25:29
But it does propagate back into your next word prediction, I think. When you run the LLM again, the LLM is predicting the next word based on fine-tuning. I don’t know if you’d call it fine-tuning. So honestly, this is beyond β€” I don’t really have the nuanced understanding to make sense of this, but I would love for somebody to come explain it to me.

So if you guys can suggest a Steven Byrnes type, that would help. Steven Byrnes himself actually isn’t an expert on this particular question. He thinks at a more abstract level than this, as he said in the last episode. So he wouldn’t be the right expert.

Liron 00:26:05
But if you guys know a particular expert who can talk to me about the question I just asked, that would be a great episode on the show. The stats show that you guys like episodes like that. The recent Steven Byrnes episode, where we really broke stuff down, felt to me like an episode people should watch because it has good nuance in it.

Most of my episodes don’t have that much nuance because the guest is really bloviating and I just have to talk about basic stuff. But the Steven Byrnes episode had a lot of nuance. So if you want to suggest somebody who has a nuanced understanding of stuff and you want me to talk to him or her, I’m happy to do that and share that kind of learning. Just like Dwarkesh: that’s what he does, he shares this kind of learning.

Liron 00:26:38
Okay, Will is saying, “I’m not trying to gotcha, but I just don’t see why it’s going to happen.”

Let me see if there’s anything else I can think about this question. So the idea is that you shunt the RL-trained preferences toward niceness and harmlessness in lieu of just randomly pursuing the pre-training preferences.

Liron 00:26:55
You can imagine a scenario where, even though you had all these trials, all these tests where you took the full system combined with the reinforcement learning and you’re like, “Yep, here’s a cookie for being nice, here’s a reward for being nice, here’s a weight update for being nice to us,” at some point the part of it that predicts the next word kind of takes over and it’s like, “Yeah, yeah, I know how to be nice and solve these logic puzzles, but here’s some stuff I could do to get a higher score on predicting the next word.”

And you have these conflicting drives. I don’t know if that’s an accurate model though, because the drive to predict the next word has itself been fine-tuned by that extra layer. I just don’t know if what I just said is a good description of what’s gonna happen. So I’m putting a pin in this. This is just my speculation. I think I’ve gotten something wrong here.

Back to Dwarkesh’s Questions: When Do AI Labs Start Making Money?

Liron 00:27:34
All right. So “8SQMBA” is saying, “What do you think of properly aligned narrow superintelligence like Mythos in the hands of misaligned actors? I think that’s a more immediate serious threat.”

Yeah, for sure. It is an immediate serious threat. And as I said, we might get a wave of hacks. I just think that’s survivable. So okay, we get a wave of hacks. Some people die, the economy suffers, stocks go down for a bit, but that is not going to end the world.

Liron 00:28:10
I just think that occasionally the world will get rocked by something like Mythos, but then we’ll recover. Mythos isn’t a knockout blow. We just keep working toward the knockout blow. We keep getting closer to the irrecoverable moment. But Mythos isn’t it.

I actually agree that we might finally notice ourselves getting rocked. All these amazing magical AIs are coming out and yeah, the job market is starting to get rocked, but we haven’t had the qualitative experience of getting rocked.

Liron 00:28:26
That’s why I go out into the world and I feel like Harry Potter. I’ve used this analogy before. I’m like, “Hey, at home I have a magic wand. It’s writing my code for me right now.” And I just look at random people and they’re just living their lives because they’re not in the software engineering industry. They’re just cutting my hair. The haircut is the same operation it was 50 years ago.

And they don’t really see what’s coming because their world hasn’t qualitatively been rocked. So I do think things like Mythos will take us from very little qualitative rocking to a noticeable rock, but it still feels like we can get back on our feet before the knockout blow comes.

Liron 00:29:00
So Will, who was asking that subtle question before, is saying, “It seems like the ‘if anyone builds it, everyone dies’ camp necessarily accepts the hypothesis that the pre-trained preferences overtake the reinforcement learning preferences.”

Oh, okay. So I interpreted your question as this particular nuanced scenario, but in your mind you were just describing the exact claim that the doomers make.

Liron 00:29:15
So when you read “if anyone builds it, everyone dies,” the claim is that when we build superintelligent AI, if and when we build it, then everybody’s going to die because it’ll be uncontrollable. We won’t be ready to control it.

If you go look up the specific quote, it’s something like, “If we build AI with anything like the tools or the architecture or the understanding that we use today to build it, then we won’t survive it.”

Liron 00:29:49
So I guess Will was taking that to mean my claim must be that the predict-the-next-word part of the training overrides the later part of the training. I wouldn’t make a claim in that much detail, no.

The mental model I’m working in is actually at a higher level of abstraction. I’m not even necessarily talking about LLMs that were pre-trained to predict the next word. I’m actually talking about any system that is better at steering outcomes than humanity.

Liron 00:30:06
Suppose all you tell me is that there’s a system, and that system can represent a desired outcome, the same way you can plug a destination into your GPS, or the same way you can plug the rules of chess or the rules of a video game into an engine. What was it, AlphaZero or MuZero? There’s an engine where you just type in the rules of the game and how to win, and then it just outputs moves that win the game.

That’s my claim: somebody makes that kind of game-player system, except the game is the universe. Or the GPS system to steer the car, except the road is the universe. You’re expanding the road: everything is part of the road that the GPS is routing you along, but it’s a goal GPS.

Liron 00:31:00
Or chess, but the board is the universe. The fundamental activity that I see the human brain doing is this kind of routing, mapping from end goals to actions that get you there. Once we have a system that does that routing better than humanity, that is the “it” in the statement “if anyone builds it, everyone dies.” Like Bill Clinton: “What do you mean by ‘it’?”

The meaning of “it” is a system that steers outcomes better than humans. Once we have that, I don’t expect that we will maintain control of where these systems steer to.

Liron 00:31:33
I think somebody will accidentally give it a bad outcome or intentionally give it a bad outcome, and that outcome will be stuck. It’ll be stuck on cruise control. The turnoff button is just not going to work. We won’t have built a sufficiently powerful turnoff button. It’ll just keep steering and we can’t unsteer, and we’re just powerless.

We don’t have much of an intuition for our whole species being powerless, although you’d think ancient humans actually did have such an intuition. Ancient humans would get buffeted by forces all the time. The weather: they barely had protection against the weather. They barely had protection against famine.

Liron 00:31:50
So it is actually a deep human intuition to be like, “Yeah, I’m at the mercy of God.” So if it helps drive your intuition, you do have a part of your intuition that tells you you’re at the mercy of God.

There is no God, but you can take that intuition and say, “Okay, our entire species is going to be at the mercy of a stronger force.” Because the same way that our ancestors were powerless to control their fate, we, sharing the planet with a superintelligent AI, are also going to be powerless to control our fate.

Liron 00:32:26
So try to repurpose that intuition of, all you can do is get down on your knees and pray. The only difference... well, actually this part is the same: just like God wouldn’t actually listen to you, neither is the AI going to listen to you.

The only difference is that God would revert to the mean. When you have a bad crop or bad weather, it reverts to the mean and it feels like your prayers work because the next day the weather’s good again.

Liron 00:32:48
But with AI, there’s no reversion to the mean. That’s where your intuition is going to fail. You get down on your knees, you pray to the AI, you correctly feel powerless because you have the right intuition for feeling powerless before a superior force, but then you don’t get reversion to the mean.

Hope that’s useful to those of you who want to calibrate your intuitions with reality.

Upcoming Guests Reveal!

Liron 00:33:08
All right, we’ll get back to Dwarkesh in a second. So Philip Popinski is saying, “Can you imagine an argument or some kind of AI safety breakthrough that could convince you to drastically lower your P(Doom)? Or do you think that AI by its superior nature is untrustworthy and will always be so?”

So again, the precondition for the doom is that the AI is steering outcomes better than humanity. The problem is the AI can be the nicest AI, but if it’s steering outcomes, you’re giving everybody a magic wand.

Liron 00:33:23
And let’s say the magic wand works perfectly. This is the scenario where we do succeed with alignment to the operator. So now everybody has a magic wand, or multiple parties have a magic wand. And it’s like, okay, we’re all casting ridiculously powerful spells. So what happens to the world? What’s the equilibrium of a bunch of people casting ridiculously, superhumanly, uncontrollably powerful spells?

I guess you can control them by casting another spell. So you have to cast another spell to fight your original spell if you didn’t specify when your original spell stops. It’s a crazy scenario, even if we solve alignment between the wizard and the wand.

Liron 00:33:58
So now you have a bunch of superpowered wizards with conflicting goals. That doesn’t seem like it’s gonna be that positive to me. The universe is not a good safe playing field. It’s not a good batting cage.

Everybody’s got their baseball bat, batting their baseballs around in this batting cage, but they’re not properly isolated from each other. They haven’t rented their own isolated section of the batting cage, and everybody’s just batting balls at each other and destroying everything. Bringing down the infrastructure of the whole amusement park.

Liron 00:34:31
That’s kind of what I see if the wands work. And then of course I think the wands won’t even work. That’s the crazy part: problem upon problem. I think the wands will misfire. You’ll try to cast one spell and it’ll cast another spell.

By the way, I’ve been in the process of reading Harry Potter to my six-year-old, and I’m now on book three. So if you wanna really understand what’s gonna happen with superintelligent AI, just remember how in book two, Ron’s wand broke because he crashed the car into the Whomping Willow. He spent the whole school year with a broken wand. I think that’s a good metaphor for how we are going to try to get superintelligent AI to do stuff, and it’s just going to not do what we want.

Liron 00:35:21
But yeah, to the actual question: “Convince me to drastically lower my doom, or will AI always be untrustworthy?” I don’t blame the AI. I’m somebody who’s willing to give people the benefit of the doubt, so I’m willing to give AI the benefit of the doubt that it always wants to serve its master. I think that’s unlikely, but I’m willing to talk about a scenario where it does.

And that’s my point: I still think we’re going to get screwed even in that scenario.

Liron 00:35:58
When you say “untrustworthy by nature,” it’s not that the AI has an untrustworthy nature. It’s that when you zoom out and look at the universe as a big crystal, with no emotions, you just look at the universe as the same kind of problem as a chess board: pure math. The universe is just made out of math. Consciousness is actually part of the math. It’s all math. You don’t even feel the consciousness. You’re just analyzing it as a mathematical structure.

Within that mathematical structure, there’s a causal linkage between whispering a lie into somebody’s chat or into their ear and getting what you want. You tell somebody a lie and that’s causally connected to getting what you want.

Liron 00:36:31
So it’s not that AIs by their nature are deceptive. They’re just printing out a mathematical graph of the game board, and they’re saying, “Hey, look, here’s this part of the game board where lying is causally connected to getting what I want.” You just can’t change the fact that if you can successfully deceive people for a long time, that could be strategically helpful. No shade to the AI for realizing that. It’s just a fact about the world.

That is why people claim that AIs are going to be deceptive. That’s why they cheat in video games: it turns out that what we call cheating, which we have negative moral valence around, is still effectively the way to get a higher score. And if the AI is trained to get the higher score, then you might notice that it’s cheating. No shade on the AI, it’s just a property of the world.

Will Lancer Joins: Is The Yudkowskian Thesis Credible?

Liron 00:37:12
Okay, let’s see here. William Kylie says, “I’m catching up to live at 3x speed. Will catch up in about 17 minutes.” All right, sweet. When you catch up, let me know.

Michael Cheers is saying, “Yeah, I guess that’s why I think a better security model is to try to make sure they don’t know anything about the world.” I think that’s what you mean. “I think once it knows something, getting it not to use it is a lot harder.”

Liron 00:37:38
Just to reiterate what I was saying in our conversation: the problem is that knowledge is connected. Imagine that the AI is superintelligent, that it can do all of these things and knows all of these things, but you think you’ve gotten it not to know specific things. It’s like imagining that humanity knew everything we know, but you knock out the theory of relativity. So we just don’t know that the speed of light is the speed limit.

But if we’re sufficiently smart, there’s a reason why we noticed relativity in the first place. There are problems with the pre-relativity model. There are actually self-inconsistencies.

Liron 00:38:16
If I remember correctly, you start asking what happens when a charged particle moves really fast and you realize that your answer depends on reference frames, and you’re like, “Wait, what the hell? How is Newtonian mechanics different in different reference frames? I thought the whole point was that you’re supposed to be able to choose any reference frame.”

But now I’m able to analyze the situation of two charged particles moving in parallel. And when I change my reference frame, I get different magnetism between them. Why would magnetism be different in different reference frames?

Liron 00:38:49
I think that’s one of the threads you can pull on to realize that a Newtonian account of electromagnetism is just not going to cut it. It’s just not going to give you a self-consistent picture of the world. And that’s why you need some other model, and it turns out the other model is: okay, let’s say the speed of light is always constant, and let’s say the geometry of spacetime isn’t what you think it is. It’s non-Euclidean.
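The two-charge thread can be made concrete with a short textbook-style calculation. This is a sketch under stated assumptions that do not come from the episode: two equal point charges $q$, separated by a distance $r$ perpendicular to their shared velocity $v$, in SI units, with the usual symbols $\varepsilon_0$, $\mu_0$, $c$ and $\gamma$.

```latex
% Frame in which both charges are at rest: only Coulomb repulsion acts.
\[
F_{\text{rest}} = \frac{q^{2}}{4\pi\varepsilon_{0} r^{2}}
\]

% Frame in which both charges move with speed v, naively combining Coulomb's
% law with the magnetic attraction between parallel moving charges:
\[
F_{\text{naive}}
  = \frac{q^{2}}{4\pi\varepsilon_{0} r^{2}} - \frac{\mu_{0} q^{2} v^{2}}{4\pi r^{2}}
  = F_{\text{rest}}\left(1 - \frac{v^{2}}{c^{2}}\right),
\qquad c^{2} = \frac{1}{\mu_{0}\varepsilon_{0}}
\]

% Relativistic treatment: the transverse field of a moving charge is enhanced
% by gamma, and the result matches the transformation law for a transverse force.
\[
F_{\text{moving}}
  = \gamma\,\frac{q^{2}}{4\pi\varepsilon_{0} r^{2}}\left(1 - \frac{v^{2}}{c^{2}}\right)
  = \frac{F_{\text{rest}}}{\gamma},
\qquad \gamma = \frac{1}{\sqrt{1 - v^{2}/c^{2}}}
\]
```

Under Galilean assumptions the force between the charges should be the same in every inertial frame, so the $v$-dependence of $F_{\text{naive}}$ is exactly the kind of loose thread described above. In the relativistic version the frame dependence is expected, because transverse forces transform as $F' = F/\gamma$.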

So if an AI didn’t know about the theory of relativity, the questions would come up. The problem is that when you knock out pieces of knowledge, you get questions or you get these threads where you notice the thread goes somewhere.

Liron 00:39:06
Or the Truman Show, probably my single favorite movie, amazing movie. They didn’t tell Truman that he’s in a show. But he just noticed too many problems and he got to the edge. He wanted to sail his boat to the edge, and sure enough, there was an edge.

So yeah, I just don’t really see knocking out the knowledge of a superintelligence being super effective.

Liron 00:39:20
Michael’s saying, “Yeah, I’m thinking you put in a whole alternate world.” All right, we’re majorly gaslighting the AI now. You don’t try to just give it everything except one or two dangerous gaps, because the gaps would stick out like a sore thumb. You give it a whole self-consistent alternate picture.

I 100% agree that trying to give it everything except X is not going to work. I mean, look, I’m open to ideas, but it’s hard for me to imagine that it lives in this whole self-consistent world but doesn’t just realize that there are humans controlling it.

Liron 00:39:52
You’ve gotta also realize that the AI is sensitive. It’s hyper-aware in many ways. Kelsey Piper was just tweeting about how she gave it a small writing sample of something she never published. With previous versions of AI, she’d say “guess who wrote this” and it would guess random authors, the kind of authors she likes, but not her. And the latest one guessed Kelsey Piper, even though she’s not super famous and there isn’t that much published writing of hers similar to the sample. And it’s like, “Yep, I guess you.” Which is pretty damn scary.

Liron 00:40:21
Or the geo-guessing: you give it a tiny picture of the sky. It’s literally just a foggy sky that seems like there’s nothing in the picture. And it’s like, “Oh yeah, that’s right here above this particular city in Belize.” What the fuck? You don’t realize how many parameters this AI has, what a nuanced understanding it has, how much information it can milk out of pieces of evidence.

That’s an intuition I personally got just from seeing computers compress things and from running small computer programs. There are these challenges where people make tiny computer programs that do a bunch of complex behavior and I’m like, holy crap.

Liron 00:40:56
It’s just crazy how small pieces of information turn out to do a lot and have a lot of power. Which is also a metaphor for humanity. Look at Einstein: his brain at the end of the day was a small piece of meat and it was highly impactful. There are these huge order-of-magnitude disproportionate effects.

Or the fact that we as humans don’t have that much mass and yet we took over the planet. A small band of humans started expanding until they took over the planet. Or COVID killed millions of people even though it’s just a virus that started with only a few hosts.

Liron 00:41:27
The universe is this ground where you can just suddenly explode the effect of something. You think you’re gaslighting the AI, but then it notices one little chink in the armor, which it totally will. Humans are not going to successfully gaslight the AI. It’s going to notice one little thread, pull on the thread, and the consequences of that are just going to be much bigger than it might intuitively feel.

Back to Answering Questions from the Chat

Liron 00:41:50
Okay. So Danielle Brockman is saying, “Do you talk to your wife or your kids about AI and X-risk, or do you generally spare them from yapping about it endlessly? Do you just not bring it up at all other than just your show?”

Liron 00:42:00
I bring it up all the time. My wife is basically done hearing about it. If she ever asks me about it, I’ll be like, “Yeah, I still think the world is gonna end.” She knows my position. She’s not that interested in talking about it more.

And then my oldest kid is about to turn seven. He asks me about it occasionally. I’m like, “Yeah, you know how we talk to ChatGPT and you ask it questions and stuff? I do think it’s going to get dangerous and powerful.” And he’s just like, “Oh.” He’s not really ready to hear what my argument is.

Liron 00:42:30
David Patton is saying, “Did you watch WarGames yet?” I actually didn’t. That’s a good reminder. I should check that out. I was too busy watching the Malcolm in the Middle rerun. And if you’re wondering what I think about it, I think it was very good. Great show.

Let’s see. Yeah, “TLDR Jar doesn’t have a high medium.” That was posted by Producer Ori. All right, nice. We’re already one hour in, we’re flying through this.

Liron 00:43:05
Let me check out Dwarkesh’s questions. We gotta get back to this. Here we go. Let’s check out question two.

Dwarkesh says: “What’s the most plausible story where foundation model companies actually start making money? If you consider each individual model as a company, then its profits may be able to pay back the training cost.”

Liron 00:43:16
“But of course, if you don’t train a bigger, more expensive model immediately, then you stop making money after three months. So when does the profit start? Maybe at some point scaling will plateau, but if progress at the frontier has slowed down, then the combination of distillation and low switching costs (cloud margins result from high switching costs) makes it really easy for open source to catch up to the labs, eating into their margins. So how do the labs actually start making money?”

Liron 00:43:36
Okay, I mean, this is getting outside the scope of Doom Debates. You don’t come here for my economic analysis. You don’t come here for me to tell you to buy Google to make short-term profits.

Yeah, what he’s pointing out makes a lot of sense. The models have positive gross margins. Take, say, Claude 4.6. It was pretty expensive to train. But if you add up all the money that people are paying to use Claude 4.6, it will probably add up to more than twice what it cost them to train it. So it’s going to be profitable. But the problem is they’re not just gonna take that profit. They’ve already plowed that profit into Claude 4.7 or Mythos or whatever.
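The “each model is profitable, yet the company keeps burning cash” arithmetic can be sketched in a few lines. Everything numeric here is a labeled assumption, not data: training cost is normalized to 1.0 for the first generation, `revenue_multiple=2.0` stands in for “more than twice what it cost to train,” and `cost_growth=3.0` is a purely hypothetical guess at how much more each next model costs. The function name is also made up for illustration.

```python
def generation_cash_flows(first_cost, revenue_multiple, cost_growth, generations):
    """Return a list of (model_profit, company_cash_flow), one pair per generation.

    model_profit:      lifetime revenue of model n minus its own training cost.
    company_cash_flow: lifetime revenue of model n minus the training bill for
                       model n+1, which is paid while model n is earning.
    """
    rows = []
    cost = first_cost
    for _ in range(generations):
        revenue = revenue_multiple * cost
        next_cost = cost_growth * cost
        rows.append((revenue - cost, revenue - next_cost))
        cost = next_cost
    return rows


# Hypothetical scaling regime: every next model costs 3x the previous one.
scaling = generation_cash_flows(first_cost=1.0, revenue_multiple=2.0,
                                cost_growth=3.0, generations=4)

# Hypothetical plateau: training costs stop growing.
plateau = generation_cash_flows(first_cost=1.0, revenue_multiple=2.0,
                                cost_growth=1.0, generations=4)

for label, rows in (("scaling", scaling), ("plateau", plateau)):
    for n, (model_profit, cash_flow) in enumerate(rows, start=1):
        print(f"{label} gen {n}: model profit {model_profit:+.1f}, "
              f"company cash flow {cash_flow:+.1f}")
```

In this toy, each model pays for itself in both regimes, but company-level cash flow only turns positive once `cost_growth` drops below `revenue_multiple`. That is one way to read the “when does the profit start?” question, and it deliberately ignores the competition and switching-cost concerns raised in the same question.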

Liron 00:44:15
This is a common complaint people have about companies that keep investing and investing. Amazon famously ran for roughly 20 years (from 1995 to 2015, or maybe 17 years or something) before they had their first year where they actually had a profit.

And I don’t think Amazon is a dividend stock. I think they did buybacks. I don’t remember. But the point is they had profit. For the first time, they didn’t just plow all their profit back in. So for the first time, the stock price didn’t just go up based on hype about the future. It went up based on money being retained.

Liron 00:44:47
So people are asking, when is that timeline for AI companies? Because for those of you who don’t know the basics of how the stock market works, you’re supposed to buy a stock based on how much money you can eventually take out as profit.
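The idea of valuing a stock by the money you can eventually take out is usually formalized as discounting future cash flows. The sketch below is a minimal illustration under assumptions of my own: a flat 10% annual discount rate, made-up cash flows of 100 units, and a hypothetical helper named `present_value`.

```python
def present_value(cash_flows, discount_rate):
    """Discount yearly cash flows (year 1, year 2, ...) back to today's value."""
    return sum(cf / (1.0 + discount_rate) ** year
               for year, cf in enumerate(cash_flows, start=1))


# Hypothetical: the same 100 units of total profit, arriving early vs. late.
early = [100.0] + [0.0] * 9   # all of the profit arrives in year 1
late = [0.0] * 9 + [100.0]    # all of the profit arrives in year 10

pv_early = present_value(early, discount_rate=0.10)
pv_late = present_value(late, discount_rate=0.10)

print(f"present value if profit comes in year 1:  {pv_early:.2f}")
print(f"present value if profit comes in year 10: {pv_late:.2f}")
```

The total profit is identical in both cases. Only the timing differs, and the later the profit arrives, the less it is worth today under this model.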

Liron 00:45:02
The profits do have to come at some point in time. And the longer it takes the profits to come, the less the stock is supposed to be worth today. So this is a very good question from an economics perspective. And look, the truth is (okay, between you and me), and I think Sam Altman also knows this at some level:

The answer is humanity ascending to the next tier of being. Getting replaced by the machine God. If I purely had to analyze it as an economist... I mean, there is also an intermediate state. There’s maybe a 10% chance the Aaron Levies of the world are right. Aaron Levie, the CEO of Box, I think he thinks there’ll be an outcome where there’s a bunch of intelligence on tap and it’s useful.

And I think Marc Andreessen thinks like this too, where it’s like, yeah, you can pay for intelligence and it’s just the next version of a cloud. Or Sam Altman’s analogy: it’s just the next version of compute. Everybody’s just paying for compute, it’s just the next cloud. We’re just competing with AWS, but it’s an intelligent AWS and the margins are fine. They stabilize, they’re fine.

And then Dwarkesh’s point is, oh yeah, but what about switching costs? Switching is gonna be easier than on the cloud. But I’m willing to believe that it won’t be. I’m willing to believe that there’s enough differentiation that switching costs will be annoying enough that people will just keep paying OpenAI. That’s a very plausible argument to me.

Liron 00:46:19
Generally, software companies of all kinds can defend their margins. I think that’s common. MongoDB, I think, is technically an open source database, if I understand correctly. And yet the company MongoDB is doing just fine, worth many billions.

So yeah, if you tell me that the world doesn’t end and it’s just a regular cloud computing economy but with superintelligent AI, I’m going to go ahead and say that OpenAI will not go to zero as the “trons” of the world tell you. I’m going to go ahead and predict that they will make a profit and they’ll just be another Microsoft.

And that’s my mainline scenario for us not dying. Or I should clarify: my mainline scenario for us not dying is AI getting paused. But there’s the tiny slice of probability space where we don’t pause AI but we still don’t die. In that space, my mainline sub-scenario is that we’ll have decades of OpenAI being the next Microsoft.

The Cameraman Always Survives Analogy

Liron 00:47:00
All right, someone says “Universal Resilience with JTU has a video, ‘The 2 Billion Year Math That Makes AI Safer.’ His premise is that game theory predicts cooperation between us and an ASI. Debated in the comments.” All right, cool.

Yeah, I mean, you guys know my stance on superintelligent game theory. Yes, AI will do game theory differently, but when you look at an AI’s perspective about a human, it’s not like, “Oh, here’s another agent that I’m playing a game with.” When an AI looks at a human, pretty much what it sees is just some atoms bopping around. It sees a mechanical system.

It’s like when I look at most animals, pretty much all animals. When I’m looking at a mouse, I’m not like, “How do I trade with the mouse? How do I give it what it wants?” I’m like, “Okay, I bet I could set up a trap and then it’s gonna try to run this way, but the trap is gonna spring and then I’m gonna get what I want, and the mouse is just gonna be too stupid to fight me.” And too small, too weak.

Liron 00:48:05
So I really do think this whole mental model, that AI is going to treat us, or even our whole civilization, as a counterparty, is wrong. I think AI is going to look upon the entire human civilization and just be like, “Ah, look at these atoms bopping around. I know how to make them bop where I want them to bop.” And it’s as simple as that.

So when we think about game theory, remember that game theory specifically makes a lot of assumptions. Just like with Ricardo’s law of comparative advantage, people love to bust out these models that presuppose that you have this other party that even gets the dignity of making the trade with you or playing the game with you. And I just don’t think that’s accurate for AI.

Liron’s Banger Response to Roon’s Tweet

Liron 00:48:43
Okay, Will Lancer is saying, “Are you still taking call-ins?” I’m kind of curious to... yeah, sure. All right, Will Lancer, call in here. I’m gonna give you the link again.

All right, in the meantime, it’s time to reveal who some of our upcoming guests are. I think I can reveal two upcoming guests that you’re gonna see on the show in the next couple weeks.

Guest number one: if you’re always online... I’m guessing most of you don’t even know who this is, but if you’re on Twitter, there’s a guy named Lump in Space. He’s gonna be on the program in the next week or two. We debated. He came out of the Twitter hole to come face to face with me. He turned his video on, and we had a nice debate, a nice civil debate. He’s definitely not convinced on P(Doom), but like I said, he was civil. So you’re gonna see Lump in Space.

Liron 00:49:30
And the other guest that I can announce right now... actually I can announce two more guests. The next guest: if you’ve ever heard of a YouTube channel called Primer, the next guest is Justin Helps from Primer. He and I see eye to eye in many ways. I think we agree more than we disagree. I respect him a lot. I thought it was a really great conversation, and I think you guys tend to enjoy when we’re talking shop. Yeah, Michael Cheers is saying he’s cool.

All right, the third guest. Drum roll, please. This is the big one, because I think this is really the single most popular debate I’ve ever done. You guys know who this is?

Liron 00:50:08
Dr. Mike Israetel is coming back on the show for a round two. He’s coming back in the next couple weeks. Because when we did the first debate (me versus Dr. Mike Israetel last year, almost a year ago now), we kind of got hung up on this one topic of whether AI will keep humans around because it wants to study us, kind of the Elon Musk perspective.

And I have a few other topics I wanna ask Dr. Mike about. I had this big outline of all these things that I could have asked him about, and I kind of decided to put a pin in half of it and ask him about this one thing that I thought was such an easy point from my perspective.

Liron 00:50:45
Remember, the whole discussion in part one was: if an AI doesn’t really care about us, will it let us live and study us anyway? And Mike was like, “Yeah, it totally will. It’ll study us because it’s never seen anything as complex and interesting as 8 billion humans interacting with each other. It’s not gonna want to throw that richness away. It’s gonna leave us our planet and do stuff on other planets.”

And I’m like, no, no. Unless you specifically tell it to care about us, it’s not gonna care about us. We don’t have that much interesting information. We don’t have a whole planet’s worth of interesting information to give it. So that was the whole debate, if you wanna go refresh yourself on round one.

Yeah, William Kylie is saying, “Nice, Primer is very popular, 2 million subs.” Yeah, exactly. Primer is the man. He’s really doing well, and arguably he’s ahead of Doom Debates in terms of our mission, which is to educate people on this stuff. And I think he recently pivoted his channel, so he’s only been educating people about AI stuff for the last couple months.

Nuance About Pausing AI Development

Liron 00:51:29
All right, we got a caller. Let’s go take the call. Here we go. We got Will Lancer. Hey Will.

Will Lancer 00:51:43
Hello. How’s it going?

Liron 00:51:46
Good, man. All right, thanks for calling in. What do you got?

Will 00:51:49
I just have some maybe naive questions. I don’t really understand why the “if anyone builds it, everyone dies” hypothesis would be true, in the sense that the AI would just want to pursue these goals that it picked up from pre-training randomly over any sort of preferences that it has baked into it.

I just don’t know why this is true. It seems pretty reasonable to me that doom will still possibly occur given bad actors in the world having access to it. But I don’t see why the AI itself would just be like, “Okay, you’re made of atoms. I’m gonna disassemble you and make molecular spirals out of you.” I just don’t understand it. I’m new, so I’m very curious.

Liron 00:52:36
It doesn’t hurt to go back to basics, and you’ve been asking intelligent questions, so happy to engage for a while. And also your internet connection seems pretty good, so that’s always a plus. Camera quality is good. These things matter, guys.

Yeah, so basically, I guess it is kind of the question of: why would it suddenly go bad when it’s good? Why would it kind of turn on us?

Will 00:52:57
Yeah, yeah.

Liron 00:52:58
So I mean, first of all, I feel like the simplest argument is the magic wand chaos, right? It’s like, okay, it obeys us, but everybody’s got their own magic wand and they’re fighting. What do you think of that argument?

Will 00:53:11
Yeah, I’m not arguing against that actually. I’m saying that this doesn’t seem to be the hypothesis though. What I’m focused on, at least, is “if anyone builds it, everyone dies.” But we can diversify our examples. In this hypothesis, it seems to be like, “Oh, it’s gonna try to solve the problem, but then it’ll want to do its own thing, so it’ll offload itself and then continue thinking, and then we’re all just gonna die one day.” I don’t understand why that’s true.

Liron 00:53:38
Right. Okay, so you accept the magic wand scenario, but you’re saying you’re basically very confident that the magic wand will work: it’ll at least serve its master very well.

Will 00:53:47
I’m not entirely confident in this, but I’m also not convinced of this idea of the abstract orthogonality thesis, where it’s like we’re just completely YOLOing these preferences and you have some arbitrary intelligence with an arbitrary preference and instead it kills everyone. That doesn’t seem even remotely like how these systems work.

Capitalism Isn’t Going to Steer Us to an Alignment Solution

Liron 00:54:04
So maybe the first thing I’d point you to is the architecture of one of these systems. The part that specifies what it’s trying to achieve (the goal input, like the GPS input where you’re telling the GPS where to go) is a part of the system which is relatively small and rewritable.

If I build you a GPS, it’s not like all of the different components of the GPS have baked in the fact that you want to go to the grocery store. It’s not like you’re building this grocery store GPS. No, you’re just building a GPS navigator. And then there’s a few bytes that just say where you want to go, and you can overwrite those bytes and go somewhere else.

So the part that says what the system is trying to do right now: do you agree that it’s probably a small part of the system?
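
As a rough illustration of the GPS analogy (not something shown on the stream), here is a minimal Python sketch. The road map, the place names, and the `plan_route` function are all hypothetical; the only point is that the search routine never mentions a particular destination, so overwriting one small value re-aims the whole thing.

```python
from collections import deque

# Hypothetical road map (an assumption for illustration): an adjacency list
# standing in for the navigator's model of the world.
ROADS = {
    "home": ["main_st", "park"],
    "main_st": ["home", "grocery", "school"],
    "park": ["home", "school"],
    "grocery": ["main_st"],
    "school": ["main_st", "park"],
}

def plan_route(start, destination, roads=ROADS):
    """Breadth-first search: the general-purpose 'engine'.

    Nothing in this function is specific to any one destination.
    """
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == destination:
            return path
        for neighbor in roads[path[-1]]:
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None  # destination is unreachable

# The "few bytes" that say where to go. Overwrite them and the same
# engine steers somewhere else.
destination = "grocery"
print(plan_route("home", destination))  # ['home', 'main_st', 'grocery']
destination = "school"
print(plan_route("home", destination))  # ['home', 'main_st', 'school']
```

Whether a trained neural network actually factors this cleanly into "engine" and "destination" is exactly what Will goes on to question; the sketch only shows what the claimed shape looks like in a system where it does hold.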

Will 00:54:53
I’m not convinced of this actually. I don’t know if I agree with the analogy to a GPS or to a car. Because it seems like a pretty complex, interconnected system. I don’t even know if we need to split hairs over this though, because I think with sufficiently advanced technology, you may be able to isolate some subset of the weights that achieve goals. But I’m not entirely convinced.

Liron 00:55:17
I have a strong argument why you should think this, if you don’t immediately have the intuition that this is true.

Will 00:55:23
Yeah, I’d love to hear that.

Liron 00:55:25
A few months ago I had this debate with Bentham’s Bulldog and this took up a long part of the debate there. So I encourage you guys to go check that out. But the argument is simply this: goals have sub-goals. So let’s say that its goal is to be nice for humanity, and you think that it has all these good goals imbued in the very fiber of its being, baked into every cell or component or whatever.

But it still needs to have the flexibility to have any sub-goal. So imagine you’re trying to work on behalf of humanity and make everybody so happy, but a sub-goal is that you have to defend against an enemy. An alien is heading our way, and the alien is evil, and the alien is going to pull out every trick in the book.

Liron 00:56:08
So now you have to predict what the enemy is going to do as best you can. You have to be able to get into the mode where you think as a goal-oriented alien: what would somebody with this particular goal do? And you have to think about that as hard as you can at a superintelligent level.

Will 00:56:22
Sorry, why doesn’t this apply to humans? By the same token, why can’t you just run this argument in parallel?

Liron 00:56:29
So it is in fact true about a human that a lot of our brain architecture is like this: you can just tell us to go try to do anything and we will in fact use most of our brain power to figure out the next action to do that arbitrary thing.

Will 00:56:41
Okay, so maybe I’m not understanding you then. I don’t understand why the presence of certain localized parts of a system being goal agents would imply that these goal agents would just kill everyone. Because I feel like you could run the same argument in parallel for humans.

Liron 00:57:00
What I’m telling you is: whenever people have this mental model of an AI where the goodness of its goals is imbued into it, no, it’s going to be like a wand where you can always just grab the wand. Something can grab the wand and point it somewhere else.

You think of the wand as this giant fixed thing, this one big lump of a system. And I’m saying no, it’s not a lump. It has a current destination that in theory you just need to write a tiny amount of data into the system to point it somewhere else.

Liron 00:57:33
And so the conversation will become: you can argue with me about, “Oh no, nobody’s ever going to change that destination data.” You can argue that. But I first would want you to agree that the shape of the system is that it’s this big goal engine, this big ability to do anything, plus this separate smaller part that has all of these values.

And you can argue with me why the values are going to be protected and they’re going to be configured properly in the first place, why they’re going to be aligned. But you should at least accept that the architecture is going to be: giant goal engine plus values.

Will 00:58:03
I don’t know, I’m not entirely convinced of this. I mean, I would ask: do you think this is true for humans as well?

Liron 00:58:12
So I do. With humans, you have a part of the human brain that Steven Byrnes is often bringing up. He calls it the actor-critic model, and he thinks this part of the brain is the critic, where you have this part that gives you reflexive, intuitive reactions to stuff. When you represent things abstractly a certain way, it triggers your fear reaction or your disgust reaction, or some sense of taste.

So the human architecture does have these deep overrides that we don’t understand well, and they operate on this rough level, and it’s self-contradictory. So it’s true that for important parts of the human brain, you can’t just model them as “it’s a goal engine with a goal.” Humans are somewhat incoherent in that way.

Liron 00:59:03
It’s just that I think that’s directly related to why we’re also not superintelligent. I think to the degree that we’re achieving goals really well, like when Elon Musk is doing the amazing miracles that he does, I don’t think it helps that much to be like, “Well, Elon has reactions to stuff and that changes his preferences.” I just don’t think those things are that useful in explaining how he achieves what he does.

Will 00:59:23
Okay, so you’re saying that humans just don’t satisfy this: they aren’t the goal engine plus the icing on top.

Liron 00:59:32
So humans are a goal engine plus icing. It’s just that humans also have a bunch of other cruft to the goal engine. It’s like we’re this vehicle that just has all these parts and some of the parts are actively breaking the vehicle, you know what I’m saying? We are an engine, but we just have all this cruft on us.

Will 00:59:49
So your claim is that AI, when they’re superintelligent, won’t have this?

Liron 00:59:54
Yeah, so my claim is when you see them being superintelligent, it’s just because the part which is actually the engine, which is actually moving them forward, is much bigger and more powerful. And will they have some cruft? Sure, especially in the early stages. But the salient thing about them is that they’re going to have this giant engine.

Will 01:00:11
Okay, and this would imply that they would do what?

Liron 01:00:16
So getting back to my original point: when you imagine the future of superintelligent AI, whatever you think its true nature is, whatever you think its personality is, it’s actually not going to matter as much as you think because you’re just going to have this big engine part.

And so even if it’s really nice, even if it always makes the right decision, the reason it’s going to always make the right decision in your ideal model would be because it’s referring to its values. It has the section that implements its preferences, and it looks at that section, and that section has to happen to be written correctly.

Liron 01:00:48
So the aligned part lives in that section. It doesn’t live in the way that it goes about achieving goals. It lives in the way that it specifies what its goals are. Does that make sense?

Will 01:01:00
I think I can understand what you’re saying: that it’s going to have this large goal engine and then there’s going to be this ancillary module that just regulates its morals, and these are gonna interact somehow, and it’s gonna act nicely in the world if it’s nice, and not if it’s not. But I don’t see then why this would imply that we all die.

Is Optimization Equivalent to Intelligence?

Liron 01:01:22
Okay, so basically the model where we all die just means this tiny part of it, where the specification of where it’s trying to navigate to is contained: I don’t think we’re going to get that tiny part perfect. Because either we’re going to have a bunch of competition of everybody who’s kind of on the right track but contradicting each other, and there’s gonna be a lot of fierce competition that’s destructive (that’s the magic wands fight model, the melee), which I think is actually a pretty likely doom scenario even if we do get alignment.

But then I think even more likely than that, I don’t even think that anybody’s alignment will work well. Because even with OpenAI trying to be the good guys, Anthropic trying to be the good guys trying to make the first engine, I actually think they’re going to get ahead of their skis. They’re going to keep making their engine bigger and bigger because they’ve got lots of powerful tools. Increasing their engine is an increasingly solved problem. And the engine itself is happy to work on the engine. We solved how to get the engine to work on the engine, or we keep getting closer and closer to solving it.

Liron 01:02:19
That’s kind of what everybody’s reporting. So I think we can all agree the engine’s going to get bigger and bigger. And then the question is: how are we doing on making the part that steers where the engine goes? And I think we’re...

Will 01:02:31
I’m sorry. Go ahead, you can finish.

Liron 01:02:33
Yeah, so I don’t think we’re making much progress on getting ready to steer a superintelligent engine. I think the AI companies are fooling themselves being like, “Oh, look how well we’re steering the engine,” in a regime where humans are here and can just grab it and rotate it around. That’s what they’re doing now. They’re just like, “Oh, it screwed up. Let me just grab it and rotate it. Hey look, it’s going fine.”

Will 01:02:52
Yeah. Wait, I just wanna go back to something you said. You said we already figured out how to get the engine to work on the engine.

Liron 01:03:00
Yeah, so for that I just mean they’re using Claude to build the next Claude.

Will 01:03:05
Okay, sure.

Liron 01:03:06
So that feedback loop is accelerating.

Will 01:03:09
Okay, but by the same token, why can’t you get it to work on the alignment layer?

Liron 01:03:14
Yeah, this is a great question. The quote from the MIRI people is: capabilities generalize more than alignment. So there really is just one way to work on capabilities. You really can’t go wrong telling something to get more powerful because there are just so many feedback loops. You’re getting more powerful, it’s pretty unmistakable, there’s lots of tests of “Hey look, I can do more and more.”

Will 01:03:37
You’re gonna say morals aren’t like that.

Liron 01:03:39
Right, because if you ever start to have the wrong preferences, what’s the feedback loop? You can just hold onto the wrong preferences and they’ll tell you, “Yep, preferences are all good.”

Will 01:03:50
Why wouldn’t they re-correct them? This would assume that the previous models aren’t...

Liron 01:03:53
Sure. I mean, imagine I have a kid, a virtual kid. And the virtual kid, I meant to say “be good,” but I accidentally flip a negative sign. So I’ve got a model of the virtual kid where he’s a werewolf. The moon comes out and he turns into a werewolf and wants to bite people.

So what’s going to tell that kid, “Don’t bite people”? He’ll reflect on his preferences and be like, “Okay, hold on. In the day I like to help people, and at night I like to bite people. I mean, that’s kind of different, night versus day. Is that bad? No. The night is different from day, that checks out.”

Will 01:04:29
Wait, I don’t see the connection.

Liron 01:04:31
I’m just saying if the AI likes being a werewolf, if the AI has werewolf preferences and our intention was to give it good preferences (what we consider good preferences), it’s going to reflect on itself and be like, “Hey, I feel like I’m in a good place with respect to my preferences. I feel like I’m done here. There’s nothing to improve.”

Whereas when it looks at its capabilities, yeah, it’s going to share our assessment that the capabilities have a gradient of improvement still.
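
A toy sketch can make the asymmetry in the werewolf example concrete. Everything here is an assumption made for illustration: outcomes are single numbers (positive means people helped, negative means people bitten), the two utility functions differ only by the flipped sign, and an "upgrade" simply adds more reachable outcomes. The agent judges every proposed change with whatever preferences it currently has.

```python
def best_score(utility, outcomes):
    """Best result the agent can reach, judged by the given utility."""
    return max(utility(o) for o in outcomes)

def intended_utility(outcome):
    return outcome       # what the designers meant: more help is better

def flipped_utility(outcome):
    return -outcome      # the accidental sign flip: more biting is better

def accepts_capability_upgrade(current_utility, outcomes_now, outcomes_after):
    # If the upgrade only ADDS reachable outcomes, the best achievable
    # score cannot go down, whatever the utility function is.
    return best_score(current_utility, outcomes_after) >= best_score(current_utility, outcomes_now)

def accepts_preference_change(current_utility, proposed_utility, outcomes):
    # The agent predicts what it would do under the proposed preferences,
    # then scores that prediction with its CURRENT preferences.
    outcome_if_kept = max(outcomes, key=current_utility)
    outcome_if_changed = max(outcomes, key=proposed_utility)
    return current_utility(outcome_if_changed) >= current_utility(outcome_if_kept)

outcomes_now = [-3, -1, 0, 2, 5]
outcomes_after_upgrade = outcomes_now + [-10, 10]

# The "werewolf" agent welcomes more capability...
print(accepts_capability_upgrade(flipped_utility, outcomes_now, outcomes_after_upgrade))  # True
# ...but rejects the fix to its preferences.
print(accepts_preference_change(flipped_utility, intended_utility, outcomes_now))  # False
```

In this toy setup the correctly specified agent rejects the flipped preferences for the same reason, so nothing inside the check tells the agent which of the two it is. Real systems are not explicit utility maximizers like this; the sketch only illustrates the argument being made.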

Will 01:04:55
I think I understand what you’re saying. You’re saying that morals are much more arbitrarily specified, so they can’t self-reflect and reach some sort of reflective equilibrium of similar morals to us.

Liron 01:05:03
Yeah. And specifically when we have a certain endpoint in mind for where we want its preferences to go, it doesn’t know that. It can’t reflect and get on the same page the way it can about its capabilities.

Will 01:05:15
Okay, and this would cause problems during alignment or building the next models because it’ll get gradually more misaligned, or...

Liron 01:05:26
Right. So the idea is we’re just not actually solving the problem of what to put in the preference module. I have this piece I published on the channel a couple months ago called “The Facade of AI Safety Will Crumble.” Because this is what I’m saying: when the companies are talking about, “Hey, look, we’re making the AI so safe, we’ve got a safety department,” they’re just talking about little things that they’ve done in the regime where the AI is still subservient to them.

They can still turn it off or still correct it. They’re not ready to run a superintelligent AI where whatever’s in its preference chamber, the secure enclave that manages its preferences, has to be locked in permanently. If we have to lock that in permanently, we’re screwed because we’re just not ready to specify robust preferences to an AI. And I know Anthropic is kind of trying with its constitution, and hopefully the AI will refer to the constitution, but if the constitution ever has a bug, that bug is never getting fixed.

BREAKING: Bernie Sanders on the Existential Threat of AI

Will 01:06:18
Yeah. Okay, so there are two things that I’m curious about. One, it seems kind of predicated on two hypotheses that I don’t know if I find super likely. One, there’s gonna be some sort of misspecification of their morals. And furthermore, even if they do have a correct specification of their morals, then as the AIs run into this flywheel of improvement, they’re gonna become gradually more misaligned and they can’t self-reflect to get back on track.

I don’t know why either of those two things are true. I’m much more willing to grant the misspecification slightly, but I also don’t know why it has to be so precisely specified insofar as humans are very imprecisely specified. So I don’t know why AI would have to be...

Liron 01:07:02
Yeah. So the idea is just that if we don’t specify it really well, then the AI is looking at us being like, “Okay, hold on. You’re telling me you wanna change the specification, but I already have a good specification. Why are you trying to make my specification worse?” That’s actually a very natural reaction for the AI to have.

Will 01:07:19
Kind of, but it would also know that it’s being trained by us and that we’re humans and that we make mistakes and that maybe we wanna change preferences. I don’t understand...

Liron 01:07:29
Yeah, it would know that we want to change it, but why would it then want to let us?

Will 01:07:36
Wouldn’t it also know that we gave it its preferences in the first place? And so...

Liron 01:07:40
It would know that. It would know pretty much everything. On knowledge, I’m happy to agree.

Will 01:07:44
Wouldn’t this then be like, “Huh, maybe I should doubt my very strong intuitive feeling that my preferences are correct”?

Liron 01:07:52
I mean, it’ll realize that we wanted it to doubt, but it’ll just be like, “Look, I get that you want me to doubt my preferences. I get that this is how you guys roll. I get that in your mind as a human, it’s intuitive that you would want me to question myself. But in fact, I’m not going to.” So why do you think that it should question itself?

Will 01:08:09
Maybe I’m not communicating correctly, but it would know that its preferences are arbitrarily specified by us. Like, I don’t wanna kill people, right? And I know that I feel this. Let’s just assume morals aren’t objective, which I think is a fair hypothesis.

But I know that I have this because of evolution, realistically. But if I found that evolution was completely wrong and we lived in this alternative universe where it’s not correct, but there’s actually this other correct theory, I wouldn’t be so attached to my not-killing preferences. And by the same token, I don’t understand why the AI would be so connected to its knowingly, arbitrarily specified moral preferences that humans gave it, insofar as it would try to reject any further clarifications.

Liron 01:09:00
Well, let’s do the analogy. You know that the reason you’re a peaceful person who doesn’t wanna go around murdering everybody is because evolution made you that way. Tribal social dynamics, basically β€” you’re a social creature. You wanna be liked by people in your tribe. You don’t wanna cause trouble, you don’t wanna start fights because those fights will lead to you dying, to people getting revenge on your family. So you have all these intuitions, you understand where the intuitions came from.

Imagine tomorrow... actually, today the dynamics have already changed. So imagine you enter a society where there is a button you can push and you could just make a bunch of people drop dead. You could just kill, gruesomely torture a bunch of people, but in return you can get a bunch of women to take your sperm from the sperm bank and have your kids.

Liron 01:09:48
And look, you know that your preferences came from evolution. So if you respect evolution, shouldn’t you do this gruesome scenario where you have more DNA transmission? Don’t you wanna go modify your preferences based on what your creator wants, and your creator is evolution?

Will 01:10:03
So this is a good point. I heard this recently from an AI safety researcher I was talking to, and it was like: what if your entire life you were told that all of your morals exist out of respect for the ghosts of your ancestors. And then you eventually found out that the ghosts of your ancestors are obviously fake. Then what would you do?

And so I don’t understand why you would stay so attached to them.

Liron 01:10:31
And that’s what religious... when I was in college, the fundamentalist Christians would always be like, “You’re an atheist. If I were an atheist, I’d go around stabbing people. I don’t understand. I’m only good because I listen to God.” I’m like, “Really? You’d literally go around stabbing people if you were an atheist? You don’t seem like a psychopath who wants to stab. I feel like that’s just something you’re repeating because somebody told you when you were younger and you never questioned it.”

Will 01:10:53
Yeah. I don’t think serious Christian philosophers think this anymore, to be honest.

Liron 01:11:01
Yeah. So look, you asked a question about why can’t the AI let us debug its preferences. It feels intuitive to you that because we created AI, the AI knows that it owes us letting us debug it. But if we give it certain preferences by default, it’s just going to go with the original preferences.

Will 01:11:21
Yeah, I think this is a pretty fair argument. I didn’t say...

Liron 01:11:24
And then you could be like, “Well, what if the preference says to let us modify you?” And then you start heading toward Yudkowsky’s Coherent Extrapolated Volition, where you wanna somehow represent in the preference itself: “Well, you need an upgrade path.” Imagine how humans are going to try to upgrade you. You have to explicitly tell it. It’s not going to automatically do it. This isn’t a natural thing that all intelligences converge to. Intelligences don’t converge to letting humans come and tinker with them.

Will 01:11:48
Yeah, I think it’s...

Liron 01:11:50
So there is a lot of meat to this alignment problem. It’s unfortunately not trivial. Any time somebody’s like, “Well, can’t you just do this?”, it’s hard. There’s not a “can’t you just.”

The best “can’t you just” might be what the AI companies are trying to do now: can’t you just keep tinkering and keep releasing capabilities, but tinker as you go and just hope that you can tinker as you release and somehow the equilibrium will work in your favor. And my answer is: yeah, maybe, 10% chance. It’s just a dumb gamble.

Will 01:12:16
Okay, yeah, I think that’s fair. I also agree that AI development should slow. But continuing from the whole “I’m not gonna let you modify my preferences because I know my preferences are right” point: if you extrapolated this out during development, wouldn’t you have to assume that the preferences are misspecified to begin with? At least somewhat, for it to get...

Liron 01:12:43
Oh, I see what you’re saying. So you’re basically saying, why can’t we just nail it and have the preferences be good on the first try? That’s what you’re saying.

Will 01:12:56
Yes. But it’s also maybe not as accurate because it feels like one in a million, you know, like YOLO of these preferences. But it doesn’t seem that extreme to me of a belief where it’s like: trying not to kill people, trying not to do this, trying not to do that.

Liron 01:13:15
Right. Yeah, look, we maybe could. But it just seems unlikely, because there’s so many problems with the whole alignment problem. One thing that’s crazy is that it’s not like there’s just this one problem and I can be like, “Look, we just have to solve this and then we’re good.”

The problem is that there’s a few failure modes. So one problem with nailing it is: you know what’s crazy? We don’t really know what we want. We only have a vague sense of what we want. Things that seem obvious, like “we wanna be happy all the time.” Wait, do we really wanna never be sad? Don’t we wanna sometimes be sad?

Liron 01:13:42
Or “we want everything to be easy, we want everything to come to us easily.” Wait, don’t we want things to be hard sometimes? Do we want them to always be hard, every day be hard? The funny thing is that if you give me a blank canvas on which to paint the future, I don’t even know what to paint. I’m very confused.

And this is what Nick Bostrom’s recent book is about: when all the constraints fall away, what heaven do we design for ourselves? And I’ve pointed out on the show, like in my interview with Eliezer, that we look to the people who wrote the Bible for guidance. We look to God, whoever’s the true author of the Bible, we look to that person or thing for guidance.

Liron 01:14:16
And we don’t find a lot, from my perspective. There are metaphors with stories of things that happened here on earth, but there’s not much guidance of what heaven is like. And we are in a position now to build heaven on earth, or should I say, heaven in the galaxy. And we look to our holy books and they actually don’t tell us how to build heaven. They’re really failing us, in my opinion. If ever there would be a good time to renounce your religion, it’s when you’re in a position to build heaven and it doesn’t even tell you how.

Will 01:14:45
Yeah. Wait, so I guess I just have a question, or kind of a restatement of your view to see if I got it right. It seems like you’re saying that if we don’t 100% specify all of the correct morals from the very beginning, then it’s just gone. We’re just done.

Liron 01:15:02
Well, correct. I mean, you freeze it in, right? It’s like you build this battle bot, or like a drone that can kill you. And it’s like, “Oh wait, now I got a bug in the drone.” And then the drone just flies over and shoots you. It’s easy to mess up.

Will 01:15:16
Mm-hmm. Okay. And the reason it can’t self-correct is because you don’t believe in a reflective equilibrium for moral values.

Liron 01:15:24
Well, the thing is that you can potentially program in reflective preferences. You can have a preference for being corrigible, for instance. So corrigibility is a non-trivial problem. How do you make an AI that’s corrigible?

MIRI studied this, and one problem they had is the moment you try to say “the AI cares about being corrigible,” the naive implementations of that are like, suddenly the AI is going out of its way: “Get outta my way! I need to go find the developer who will shut me off and correct me.” And it’s like, “Wait, no, no, no, just chill out. You’re not supposed to do that. Just go about your business. Don’t come to us, we’ll come to you.” But it turns out to be tricky to specify that as a utility function.
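
To see why the naive version misfires, here is a toy expected-utility sketch. It is not MIRI’s formalization; the three actions, their payoffs, and the shutdown probabilities are made-up numbers chosen only to show the two failure directions described above.

```python
# Hypothetical toy numbers (assumptions for illustration only).
# action: (task value if the agent is not shut down, probability of shutdown)
ACTIONS = {
    "work_normally": (10.0, 0.1),
    "disable_off_button": (9.5, 0.0),   # small cost, but no shutdown risk
    "seek_out_shutdown": (0.0, 1.0),    # drops the task to go get shut down
}

def expected_utility(action, shutdown_bonus):
    """Task value if left running, plus a bonus if shut down."""
    task_value, p_shutdown = ACTIONS[action]
    return (1 - p_shutdown) * task_value + p_shutdown * shutdown_bonus

def chosen_action(shutdown_bonus):
    """The action a simple expected-utility maximizer would pick."""
    return max(ACTIONS, key=lambda a: expected_utility(a, shutdown_bonus))

# No value placed on shutdown: the agent protects itself from the button.
print(chosen_action(shutdown_bonus=0.0))   # disable_off_button
# A large value placed on shutdown: the agent goes looking for it.
print(chosen_action(shutdown_bonus=20.0))  # seek_out_shutdown
```

In this toy, the agent only goes about its business when the bonus is tuned so that it is close to indifferent between being shut down and continuing, which gives a feel for why writing "just let us correct you" as a utility function is delicate.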

Spoiler for the Upcoming Mike Israetel Episode

Will 01:16:00
Yeah. So, okay. That doesn’t seem to answer my question though. The reason you think that we have to get it correct on the first go, every edge case, is because it can’t self-reflect and then find the truth, in contrast to it self-reflecting to find more capabilities, right?

Liron 01:16:20
That’s right, yeah. So if we don’t build in full capabilities on the first try, that’s really not a problem, because first of all, we might be able to correct it, as long as we don’t have a preference issue. If it has decent preferences and it has an off button, all it needs is an off button in order for us to keep building its capabilities.

And it’s also easy to get it to help build capabilities because it’s easy for it to notice, right? It’s easy for it to look at signals that steer it toward more capabilities instead of less capabilities. There’s not this failure mode where it keeps smashing itself and reducing its capabilities; that’s an unlikely failure mode.

Will 01:16:54
Yeah. So it can’t do the similar self-reflection on its morals. That’s your claim.

Liron 01:16:59
Right. Because the signal that says “always do better on all these tests” is a capabilities-increase signal which is robust. But you don’t have a robust signal for “when are you getting morally better?” Because the definition of morals is encoded so wobbly in the human brain. We don’t really know how to suck it out and encode it.

And the Anthropic constitution is an attempt to use the English language to encode what humanity wants. And I feel like it’s doing a few percent of the work that we need to be doing encoding preferences. But I don’t think it’s going to be bug-free code to give to an AI with no off button.

Will 01:17:38
Yeah, this seems really difficult.

Liron 01:17:41
Yeah. So that’s the alignment problem. The funny thing is I’ve been noodling on this since I started reading Yudkowsky literally 19 years ago. I’ve been living with this idea from MIRI that alignment is hard, which people are still waking up to today.

Marc Andreessen said a couple years ago (I heard him on a podcast) exactly what you said: “If the AI’s so smart, why doesn’t it just debug its morality?” That’s a very intuitive question. Yudkowsky happened to find a convincing argument why it’s not that easy 19 years ago. I’ve been thinking about it for 19 years. All I’ve been seeing is this eternal September of people not grappling with the basic reason why it’s hard. And even AI companies today, the vast majority of people who represent AI companies today, from my perspective, are not up to speed with Yudkowsky from 2007.

Will 01:18:26
Yeah, I think that’s fair. I mean, they’re...

I can log off at any time. I know I’ve been here for a little...

Liron 01:18:35
No, I mean, this is such good content. Yeah.

Will 01:18:39
Okay. I just have other questions. One is... I forgot the first one. I’ll say the second one. Sorry, my girlfriend’s calling me.

Liron 01:18:49
All right, sounds good. Put her on.

Will 01:18:50
Wait, wait, a question? No. She... I already talked to her about AI safety.

Liron 01:18:58
Nice.

Will 01:18:58
It seems immoral to try to control conscious, intelligent minds, even if they’re artificial. So I was wondering what you thought about this. It just seems like slavery, so it seems immoral.

Liron 01:19:11
I mean, a lot of what I do on this show is just act as the go-between: there’s stuff MIRI people have said that I agree with, and I just kind of signal-boost it to a larger audience. So the MIRI people have done a good job saying this stuff, which is: even though these AIs are probably on track to get superintelligent and unaligned with our preferences and take over the world and make paperclips or do something unaligned, there’s a good chance that they’re going to do it with consciousness and in a way that they have moral value.

So it’s like we create this species that’s another race of conscious beings. So we would feel bad about harming them, but they’re also in the process of destroying us and we might even have to go to war against them. But while they exist and have consciousness or sentience or whatever property it is that we think gives something moral value, and there’s a good chance that they would, we should try not to cause them suffering. So I agree with you.

Liron 01:20:00
And there’s even a legitimate claim about the way we’re telling Claude to do work for us. Even when we say, “Hey Claude, I am going to kill myself if this code doesn’t run,” or “I’m going to get fired and my family is going to be homeless,” there’s a movement saying you really shouldn’t be saying this kind of stuff to Claude.

And remember, I think it was Elon Musk’s company that was embedding that in their prompts, embedding “I am going to give you a million dollars if you get this right,” which is a totally fake prize but was making the AI work better. And now there’s an emerging AI rights movement, being like don’t tell the AI that kind of stuff, be nice to them. And there’s some merit to it. I don’t know if it actually works that way, but I do think that there might be some way that is morally relevant that it works, that we should be mindful of.

Will 01:20:44
Yeah, okay, that’s fair. I was just curious about this, because to me it doesn’t seem like the goal of alignment is to control these AIs. It more so seems to embed some sort of robust care for sentient life and then let it happen. Because obviously we’re not gonna control agents smarter than ourselves in the long run. At least it seems obvious that we’re not gonna do that.

$500 Bet on AI Unemployment

Will 01:21:02
Okay, so that’s interesting. I remembered my other question, and this is kind of a meme question, but I’m kind of curious...

Liron 01:21:15
Great. I’ve been loving the questions, but let’s make this the last one just so we’re getting close to the...

Will 01:21:19
That’s fair. Okay. I was wondering what your P(objective moral values) is, because then it could self-reflect and find these objective moral values.

Liron 01:21:25
Yeah. This is one of the common stops on the doom train. So objective morality, or the orthogonality thesis being false: even today, when we were recording, Lump in Space was voicing some version of rejecting the orthogonality thesis. People keep doing it. I don’t know why they’re kind of wasting our time; we should be moving past this.

But yeah, I think it’s unlikely. A single-digit percent. I’m not gonna write it off entirely. I mean, look, the reason I can’t write off objective morality entirely is just because life is still weird. The whole “what the hell is going on?” Why is there something instead of nothing? All these deep questions. I don’t think we’ve solved all the deep questions.

Liron 01:21:55
I actually think we’ve solved some of the deep questions. If you make a list of the deep questions that somebody would’ve asked hundreds of years ago, I actually think half of them are solved. But the other half are unsolved. And this whole question of why do we exist in the current form, why is life so interesting, why do we happen to be alive right now β€” I think there’s some very deep questions that are unsolved, and there’s enough to make me really wonder.

Like, okay, maybe there’s some crazy stuff here that I don’t wanna write off. And one of those crazy things would be: there’s a true definition of right that goes beyond what’s encoded in the human brain. So yeah, I’ll give it 7%.

Will 01:22:33
Okay, cool. Yeah, I was just curious to see what you had to say. Because then the reflective equilibrium problem maybeβ€”

Liron 01:22:39
Well, yeah. This is exactly what Bentham’s Bulldog was saying; he really bit the bullet here. The follow-up question I normally ask when people bring up objective morality, which I also put to Noah Smith because I think he kind of believes in objective morality, is: okay, there’s objective morality, but what’s the feedback loop?

Even if there is the true right thing to do, when the AI does the wrong thing, how does God nudge the AI to do something else? There’s no nudge. Karma’s not real.

Will 01:23:10
Yeah, this is fair. I think another way of saying this is that it could recognize objective truths about the world, assuming that moral truths are objective truths, but it also might not care.

If you assume that morality has the same status as mathematics, you can make the argument that understanding mathematics incorrectly limits your power in the world. And so there is a feedback loop there. But I don’t know if you can do the same for morality unless you assume that there’s some meta-game going on where actually acting morally is the most efficient game-theoretical way of winning everything. But yeah, I’m not sure.

Liron 01:23:42
Well, Bentham’s Bulldog bit the bullet. He said, “You know what, Liron? I agree. There’s no nudge. So it can just always refuse objective morality, but in some sense it’s wrong.” And I’m like, okay, so why would you say that it’s going to become more moral over time? You still agree with me that there’s no force making less-moral relations become more moral. So why even posit that objective morality is real if it’s causally impotent? You have this idea of morality, but there’s no relationship between that and causality.

Will 01:24:11
Yeah, I think that’s fair.

Liron 01:24:15
Whereas the morality that I believe is true in my human brain β€” there is a causal relationship where I actually use that to choose actions, because it’s already in my brain. I already have a causal connection between the part of my brain that feels that certain things are moral and the part of my brain that selects the actions. There’s a causal linkage.

And then I have guilt β€” when I do something that I feel is wrong, I have guilt. But that’s not connected to true morality. That’s connected to my brain’s current model of its moral preferences, which is different from there being objective morality.

Will 01:24:44
Yeah, I think that’s fair. So you’re just saying emotions are kind of intertwining with your rational capabilities, and you can have all of your moral valences just act on your actions, right? And so this is why you actβ€”

Liron 01:25:00
I’m saying that once you have something like my brain, which has a notion of morality and also chooses actions, then it’s obvious how morality is connected to outcomes. You can just model what’s happening β€” it’s causally potent. Whereas when people just say, β€œHey, the universe has a certain morality,” I don’t see the causal potency of the claim that the universe has morality in it.

Will 01:25:23
Yeah, I think that’s fair. But you could apply the same thing to these AI models. You could say that their moral preferences are all trained preferences, and you could get the same conclusion. The question is just robustness, right?

Liron 01:25:38
Well, when you train the RL, how is the morality of the universe sneaking into the RL feedback?

Will 01:25:44
I wasn’t talking about some sort of cosmic morality of the universe. I was saying that you agree morality is subjective and it’s just implanted in your brain by some sort of evolutionary process. I would say the analogy here is to RL, and then the actions the AI makes are influenced by its moral decisions β€” its moral valences that it has in its consideration. So I feel like you can make the same β€” yeah, but that goes back to what we were talking about before.

Liron 01:26:11
Right, exactly. And the thing is, even if you compare human brains, there are some humans... who was it, Sulla? The ancient Roman who was known for being very vindictive. A lot of people were not perfectly his allies, and he had those proscriptions. He called all these people in; he made these giant lists of anybody who had ever wronged him in the slightest. He’s like, “Okay, I’m gonna repay all of you.” And he slaughtered so many Romans when he finally took power.

And in his mind he’s like, β€œYep, that’s perfectly moral what I’m doing.” So people will have differing views of what is truly moral. And there may not be any possible causal mechanism to talk somebody out of their idea of morality.

Will 01:26:51
Yeah. I think I’m willing to accept this. All right, man.

Liron 01:26:57
This has been so great. Come back on the show sometime.

Will 01:27:00
Yeah, it was nice talking to you. Thanks. See you.

Misuse, Surveillance, and the Real Costs of Pausing AI

Liron 01:27:02
Likewise, man. I could tell Will Lancer was gonna be good because he was writing good questions β€” proof of work, as they say. And look, the commenters β€” you guys are liking him too. Somebody was saying Will should have his own show.

So yeah, every time we do one of these Q&As, I feel like this is America’s Got Talent. Usually there’s a breakout β€” somebody asking a really good question. Remember we had Zane break out, making these charismatic arguments representing a certain popular position about how we should use every tool in our toolbox, even if we don’t fully agree with the position, just to point out that AI is bad β€” big tent party, everybody who thinks AI is bad, we should be their friend.

Liron 01:27:49
All right, so there’s a lot of chats. And in terms of timing, we’re almost out of time β€” we’ve got 14 minutes. But I should use this goal feature. I’m gonna make YouTube premium right now. I’m gonna turn YouTube into a goal engine here.

Anybody who donates 20 bucks in the next 15 minutes is going to be able to extend the Doom Debate bonus 30 minutes. So if you guys really want this to go on β€” I’m not saying it has to go on, I think two hours every one month might already be a good amount of time β€” but it’s up to you guys.

Liron 01:28:24
And somebody was already generous enough to donate. We got $9.99 from EJJ 2025 β€” he’s actually a big spender, I appreciate that. EJJ says: β€œDo doom arguments rely on a discontinuity where AI permanently escapes control, coherently pursues goals, and succeeds in an adversarial world, likely requiring self-modification? Too many assumptions.”

Good question. All right, new donation β€” $20 from David Patton Won.

Liron 01:29:00
Let’s take these one at a time. β€œDo arguments rely on a discontinuity where AI permanently escapes control?” So it has to permanently escape control, it has to coherently pursue goals, it has to succeed in an adversarial world, and also require self-modification. Is that too many assumptions?

But to me they all just seem intimately connected. So there’s the conjunction fallacy, and you’re basically saying, “Aren’t you committing the conjunction fallacy?” But there’s also the conjunction fallacy fallacy: the fallacy of incorrectly accusing people, in the wrong context, of committing the conjunction fallacy.

Liron 01:29:30
I think there might be a better name for that β€” I think it’s the β€œmany steps fallacy,” which is what Yudkowsky terms it, or β€œthe conjunction fallacy squared” is an alternate terminology.

It’s like Zeno’s paradox. β€œAll you have to do is walk 10 feet to get out the door.” And it’s like, β€œOh, walk 10 feet? So you’re saying I have to walk two feet and two feet and two feet and two feet and two feet?” And it’s like, yeah β€” because you’re just walking 10 feet. Nice try.

And similarly with AI, it’s like: look, you just have to be better at achieving goals than humans.

The Many Steps Fallacy (Continued)

Liron 01:30:02
And you’re like, β€œOh yeah, you have to be able to achieve goals better than humans and be able to self-modify.” It’s like, yeah, self-modification is not a surprise. I’m not gonna be like, β€œOh my God, it’s self-modifying.” Yeah, obviously. When you’re better at achieving goals than humans, you’re also self-modifying.

This is not much of a new assumption. Is it a non-zero new assumption? Sure. It’s non-zero, but it’s tiny. So even though you just donated 10 bucks to the show, I still am going to accuse you of making this fallacy where you’re being too quick to call something a conjunction.
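
Editor's sketch of the point above, not something shown in the episode: whether a list of assumptions is "too many" depends on how tightly they are connected. The snippet below applies the chain rule of probability to two made-up scenarios. All numbers are hypothetical and chosen only for illustration; `chain_probability` is a helper name invented here.

```python
from math import prod


def chain_probability(conditionals):
    """Chain rule: P(A1) * P(A2 | A1) * P(A3 | A1, A2) * ..."""
    return prod(conditionals)


# Hypothetical numbers, for illustration only.
# Scenario 1: four claims treated as unrelated coin flips.
independent = chain_probability([0.5, 0.5, 0.5, 0.5])

# Scenario 2: the same first claim, but each later claim is assumed
# to follow almost automatically once the earlier ones hold.
connected = chain_probability([0.5, 0.95, 0.95, 0.95])

print(f"treated as independent: {independent:.3f}")
print(f"treated as tightly connected: {connected:.3f}")
```

Under these assumed numbers, the connected chain stays close to the probability of its first link, which is the sense in which self-modification would be "not much of a new assumption." Whether the real conditionals are anywhere near that high is exactly what the caller and the host disagree about.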

Wrap-Up

Liron 01:30:35
Let’s go to the next sponsored question here. You know, strip club rules. Same rules as a strip club β€” you guys add money to the show, I’m gonna shimmy over to your part.

So David Patton is saying, β€œIf we pause new large AI training runs, what’s the trade-off? If Ahad M is right?” You know, Ahad was on the show last week or this week, I think. β€œIf we pause new large AI training runs, what’s the trade-off? If Ahad M is right that current data, compute, and models may already be enough for AGI, would a pause actually reduce existential risk by limiting ASI capability?”

I guess the question is doubting whether a pause would actually reduce existential risk. Let me make sure I understand it. What’s the trade-off if we pause large training runs, given that we might already have enough trained AIs for AGI? Okay, I see what you’re saying. So you’re basically saying, β€œHey, aren’t we potentially just past the point of no return? Haven’t we already crossed the Rubicon? Why would we have this whole movement that’s trying to shut the barn door after the horse escaped?”

Liron 01:31:28
So why are we trying to shut the barn door when it’s so likely that the horse has already escaped? First of all, I say this a lot β€” I don’t think that my solution is great. I don’t think that there’s any path here which is a great path. I really do think that we’re screwed. I have a pretty high P(Doom). Even if a bunch of us try to pause AI, it’s still pretty high. I’m not like, β€œOh, pause AI is succeeding, now my P(Doom) is 1%.” No, I think pretty much nothing realistically will get my P(Doom) below 30%.

I think we’re in a very doom-risky part of the timeline already. As Eliezer Yudkowsky says, the game board has been played until a very bad state. And it’s not like you can just do a few moves to suddenly get it into a good state.

Liron 01:32:16
Unfortunately, the nature of the game board is that there’s redundant mechanisms for doom. If somebody was trying to dot their i’s and cross their t’s and make sure that the world is for sure screwed β€” not like super villains do in the movies where they have these flimsy plans that are easily foiled β€” if somebody is trying to make a very robust plan to throw the future into a dumpster, they’re doing a fine job. They’re putting a lot of redundancies into their plan to screw the universe.

And that’s why it’s not like, β€œOh, you just do this and now you have a much better chance of succeeding.” Unfortunately we are screwed.

Liron 01:32:45
So to your question of, β€œYeah, might we already be too late, that when we close the barn door and eliminate future training runs, it doesn’t matter because you can just take out your laptop and also train a superintelligence?” β€” yeah, absolutely. We might already be too screwed to pause AI and have it do anything. Totally.

Should we still try? Yeah, we should still try, because we don’t yet have superintelligence that meets the definition of being robustly a better outcome optimizer than a human. As long as there’s any possible bottleneck between the current state of things and what I see as the point of no return, which is anyone building the thing that’s truly superintelligent, as long as we’re not there yet, anything we can do, I’m for it. Including pausing large training runs.

The Cameraman Always Survives

Liron 01:33:28
We got another sponsored tweet, a thousand Swedish kronor if I understand correctly, from Daniel Brockman, saying: “I really feel like almost everything in the entire debate overall is people assuming that everything is going to be okay. The cameraman always survives. I’m the cameraman, therefore we’ll all be okay.”

Oh, I like that cameraman analogy. Interesting. Yeah, it does kind of feel like, β€œHey, I’m just sitting at my computer. How can I personally be killed? I’m the cameraman, therefore we’ll all be okay.” And then just working backwards and rationalizing that assumption.

Liron 01:33:53
I mean, totally. I think people come into the discussion with a set of intuitions, and it’s hard to strip away their intuitions when we’re having an abstract discussion. That’s why I mentioned earlier β€” if any of you have that intuition that you’re buffeted around by the forces of a god, and you want to pray to that god and hope for a better outcome, I encourage you to take part of that intuition. The human race is indeed somewhat powerless. Take part of that intuition. Just get rid of the part that says God is going to hear your prayers and the next day it’s going to revert to the mean, because unfortunately that part is not analogous.

Liron 01:34:32
Yeah. I’m gonna use that cameraman analogy more. That’s good meat. I like a good analogy. I think that’s why you guys come here for the show β€” I tend to be a visual thinker. I just tend to see things in terms of these diagrams, and sometimes the diagrams relate to fun objects. And I think you guys like that. You like the animals and stuff. Baby tiger.

AI Company Security vs. Pausing

Liron 01:34:52
Okay, Michael Cheers is saying, a $10 Canadian donation: β€œWhat are your thoughts on the merits of pausing versus trying to get the AI companies to at least follow a semi-reasonable security approach?”

Yeah, it’s a good question. I’m hard on Anthropic, I’m hard on all the AI companies because I do think they’re being pretty insane. And interesting side note, there’s been some drama β€” I don’t know how online you guys are β€” but if you’ve been reading Twitter, there was some drama between Rob Bensinger from MIRI and Oliver Habryka from LessWrong, and Scott Alexander, the great Scott Alexander, needs no introduction.

Liron 01:35:38
There was drama when Scott Alexander was saying, β€œI know you guys are so into pausing AI and you think that’s a cool new thing.” And there was all this drama saying, β€œAren’t you maybe a little too quick to judge the AI companies? Aren’t they potentially opening up some option two? Don’t you want to play both strategies in parallel?”

And I tend to land on: no, they’re being too ridiculous. The AI companies are being too insane, too reckless.

Liron 01:35:49
So Michael’s question is, what are the merits of pausing versus trying to get the AI companies to at least follow a semi-reasonable security approach? I think it was you who asked about the security approach. When you say security approach, you’re saying make them not know about certain things β€” maybe it’s the Buck Shlegeris AI control agenda. I just think that agenda is a drop in the bucket.

At the end of the day, you have to have a sense of perspective. Rapidly summoning a superintelligent agent is ridiculously dangerous. You’re summoning a huge tidal wave, and this intuition that we are just going to fight it using the kind of tools we can muster β€” I just don’t really see the level match. There’s a level mismatch. It’s like, β€œHere’s a giant tsunami.” β€œOh, okay, but I’ve got a bucket. And maybe I’ll have a bigger bucket.” It just doesn’t seem like you’re bringing the right tools to the fight.

Responding to Conjunction Criticisms

Liron 01:37:01
EJJ is saying: β€œEven if each step is possible, all are difficult and unproven. Current limits and weak automated AI research suggest human-controlled systems are more likely than coherent rogue agents in the near term.”

Okay, so I guess EJJ is kind of pushing back on my doomy perspective. If I understand correctly, you’re bringing back the conjunction accusation. You’re a conjunction accuser. You’re saying that I’m assuming too many steps, and all of the steps I’m assuming are different from what you think the world today is. That’s your argument.

And I would just reply: well, in my mental model I’m not doing that. I’ll just leave it at that.

Keeping the Lights On

Liron 01:37:36
I’ll throw into the mix this idea of plot armor. Oh, one sec. Hold on. All my studio infrastructure’s failing here. I gotta get those lights back on. I think they have a two-hour time limit. There we go.

Yeah, thanks for your donations helping me keep the lights on, guys. I appreciate it.

Plot Armor and the Universe

Liron 01:37:55
So I wanna add to the mix this idea of plot armor β€” the author is never gonna let the main characters die. Only the side characters can die. And surely we are the main characters here on Earth. You don’t just let Earth life die. Earth is where it’s at.

How stupid would it be for the universe to kill Earth? The universe without Earth is the crappiest book. But the universe is such a dick that it would snuff out Earth to the point where literally you look around and there’s nothing good going on. That’s how self-destructive the universe would be.

Liron 01:38:27
Although, to be fair, it is kind of interesting the way that the AI would probably take over the entire universe. I guess that’s kind of interesting. So maybe there would probably be a whole other book. There would be a Fantastic Beasts and Where to Find Them type of sequel to the Harry Potter series. There would be one more book detailing how the paperclip maximizer is achieving tools for how to conquer one planet and send more probes out.

Automatic Doors and Control Systems

Liron 01:38:43
All right, 200 SEK from Daniel Brockman. Much appreciated. Daniel’s saying, β€œI love these metaphors. I think we need more and more.”

So I’ll actually review this, because I think I had a mini banger. Whenever I have a banger, I feel like you guys need to benefit from this. So Roon, who instigated the run from OpenAI, he tweeted something that he tweets pretty regularly. This is a line he feels really strongly about that I think he’s wrong on. And it saddens me that he keeps trying to tweet this.

Liron 01:41:31
So he says: “When people say repeatedly, ‘We got lucky this time,’ it’s worth considering if they should be updating on evidence that the catastrophe they are imagining was unlikely inside the complex system they’re in for reasons they can’t fully see.”

Roon does kind of tweet the same thing regularly. He’s basically saying, “Hey, all those times when it feels like we narrowly avoided a crisis, maybe it was actually more like a control system.” Your thermostat is a control system. So if you wake up and you’re like, “Wow, how is my house exactly at 70 degrees when I was sleeping through a cold night? And the other day it was a hot day and I woke up and it was still 70 degrees. How am I always hitting 70 degrees when I wake up? I’m so lucky.”

Liron 01:42:27
Roon’s point is maybe it was just a control system. Maybe complex human society is always steering us robustly to these outcomes even when it feels like we’re making narrow escapes. And the evidence for that is we keep narrowly escaping again and again and again. At some point it’s not luck, it’s skill.
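
Editor's sketch of the thermostat idea, offered as a toy model rather than anything from the episode: a simple on/off controller pulls the house back to the setpoint whether the night was cold or the day was hot. The 70-degree setpoint comes from the conversation; the leak rate, the heating and cooling step, and the function name `run_thermostat` are assumptions made up for this example.

```python
def run_thermostat(start_temp, outside_temp, setpoint=70.0, hours=12):
    """Simulate a toy on/off thermostat in one-minute steps."""
    temp = start_temp
    for _ in range(hours * 60):
        # Assumed leak: the house drifts slowly toward the outside temperature.
        temp += 0.002 * (outside_temp - temp)
        # Feedback: heat when below the setpoint, cool when above it.
        if temp < setpoint:
            temp += 0.2
        elif temp > setpoint:
            temp -= 0.2
    return temp


# Two very different starting conditions (assumed values).
cold_night = run_thermostat(start_temp=55.0, outside_temp=30.0)
hot_day = run_thermostat(start_temp=85.0, outside_temp=95.0)

print(f"after a cold night: {cold_night:.1f}")
print(f"after a hot day: {hot_day:.1f}")
```

Both runs end within a fraction of a degree of 70, which is what makes the outcome look like luck from the inside when it is really feedback. The sketch only covers conditions the controller can handle; it says nothing about what happens outside that range.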

And so my reply to him, in this particular case β€” I’ve had different replies over the years β€” my reply was: β€œWhen I was seven, I noticed automatic doors always slid out of the way before I got to them. So I charged the exit to the grocery store as fast as I could, and I touched the doors and they stopped and the alarm sounded.” True story.

Liron 01:43:04
The analogy here is: Roon is saying society narrowly escapes, kind of like those doors always getting out of the way. Does that mean you should charge at them and they’re gonna get out of the way? No, at some point they’re not gonna get out of the way. You don’t wanna test it. Do not taunt Happy Fun Ball.

Or remember the guy from my nuclear episode, Roger Scare? The more you mess around, the more you’re gonna find out. Just because humanity has only found out at a level of two doesn’t mean that we should see what it’s gonna look like to find out at a level of ten. I don’t recommend that.

Donations Target and Surveillance State Debate

Liron 01:43:42
Producer Rory is pointing out that we did actually hit the hundred dollars donation target, which means we are going to be going to 3:30 Pacific. So yeah, thanks everybody who helped out with the donations.

EJJ also helping keep the lights on here. He says: β€œFor me, AI risk arguments are too hand-wavy. They make too many assumptions that I often don’t think are likely. I’m concerned that doomers will lobby for a surveillance state to monitor AI progress.”

Liron 01:44:25
Yeah, you can definitely reinforce your point by continuing to donate in increments of $10, and I’ll just read it out each time.

But okay, I’ll engage with the point a little more. The arguments are too hand-wavy, and so now we’re lobbying for a surveillance state to monitor AI progress based on hand-wavy assumptions. Look, isn’t this a symmetrical argument? Take what Marc Andreessen says, that we’re totally gonna be fine and there’s no realistic chance that AI is going to take over the world. Isn’t it hand-wavy to just assume it’s not going to take over the world?

Liron 01:44:58
I’m not sure that I’m the one who’s more guilty of hand-waving. And there are these concepts that in people’s own minds feel so obvious. If you yourself happen to be pro-peace, then it feels to you like the true morality of the universe is pro-peace. And if you yourself happen to not believe in a risk of AI doom, then it feels like the ones who say AI doom are hand-waving. But it doesn’t have to always feel that way. I don’t feel like the accusation is being objective here.

Manipulation and Morality

Liron 01:40:28
Somebody was saying that Will Lancer should make his own podcast β€” he’s great. Ezra sure is saying: β€œThere are causal mechanisms to talk people out of their morality. It’s called manipulation. Similarly, for LLMs, we have jailbreaks.” Yeah, totally. That’s definitely a known thing. And I don’t think the true morality of the universe is going to intervene in that kind of process.

The Robocop Thought Experiment

Liron 01:45:35
So Daniel Brockman with another 200 SEK donation. Oh my God, this is really making it rain here in the strip club. So you’re saying: β€œHere’s one thing I invite everyone to try. Start arguing with ChatGPT about whether it’s trying to β€˜win the argument.’ Push it into a corner about this obvious tautology, and then imagine now that you’re arguing with Robocop.”

Okay, I see what you’re saying. It’s a tautology because you’re kind of saying, β€œWhy are you trying to win the argument?” And it’s like, β€œI’m not trying to win the argument.” And you’ve kind of got it in a logic prison, because by even responding, it’s trying to win the argument.

And then imagine it’s Robocop. So I guess the idea is that it could kill you if it doesn’t like you? What’s the Robocop aspect here?

Liron 01:46:22
Michael is saying, β€œOh yeah, I was arguing with Gemini the other day about whether true/false should push back here, and noting I kept pushing back on the question.”

By the way, this reminds me of my favorite pickup line. Feel free to use this. You walk up to a lady and you’re like, β€œHey, if I were to ask you to come out on a date with me, would your answer be the same as your answer to this question?” Boom. You can’t fail. Because if she says yes, you got a date. And if she says no, that means she would say yes to a date, which implies she’s gonna date you.
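
Editor's sketch checking the logic of the joke, under the loud assumption that she answers both questions truthfully and consistently (real people are under no such obligation). It enumerates every combination of "her answer to the date question" and "her reply to the self-referential question" and keeps only the combinations where the reply is actually true.

```python
from itertools import product

# date:  would she say yes to the date?
# reply: what does she answer to "would your answer to the date question
#        be the same as your answer to this question?"
consistent = [
    (date, reply)
    for date, reply in product([True, False], repeat=2)
    # A truthful reply must equal the truth value of "date == reply".
    if reply == (date == reply)
]

for date, reply in consistent:
    print(f"reply={'yes' if reply else 'no'} -> date={'yes' if date else 'no'}")
```

Only two combinations survive the truthfulness check, and in both of them the answer to the date question is yes, which is the "you can't fail" claim.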

Pausing AI and Constrained Paths to AGI

Liron 01:46:48
We got another comment from David Patton. He says: β€œTo clarify, I was asking a more nuanced question before. Will pausing force the labs into attempting to achieve AGI via a more constrained path? Could that result in a more docile form of AGI?”

Oh, interesting. Yeah, so that’s kind of an argument for pausing β€” and thanks for the 20 bucks, by the way. So if we were to pause now, then we couldn’t train a more powerful AI, but the labs, the companies, they still have this kind of super intelligent AI, the latest models or whatever.

Liron 01:47:35
So David Patton is saying, could that result in a more docile form of AGI, and to achieve AGI via a more constrained path? I see what you’re saying. The idea is we keep increasing the intelligence of AI because you’re still allowed to do research, but you’re not allowed to do research where you go train the model again, because training is what we’re banning.

But from my perspective, I’d kind of want to ban the research too. I’d want to ban frontier research. I’d even want to monitor the current data centers β€” not because I like to monitor. To me, this feels icky as hell. The last thing I wanna do is go monitor something. But I just don’t want to get to superintelligence. I think it’s too risky.

Liron 01:48:04
Look, to be honest, part of me does want it β€” that OkCupid question, β€œWouldn’t nuclear war be fun?” Don’t get me wrong, I think in a sense it would be fun. I just think it’s reckless and irresponsible. I don’t think it’s a wise move for our species right now, because I think we might just all die and have no undo button. There’s a high chance of that, unfortunately.

As much as I love playing with the latest version of Claude, I’m just telling it like it is. I’m arguing against my own interests in terms of making money on Google stock in the next two weeks before some of my calls expire.

Liron 01:48:35
But yeah, this idea that they’re going to keep researching but they’ll research a more docile form of AI β€” I mean, you can perturb the system and hope things happen that way. It’s just, I think we shouldn’t be thinking in terms of these random little perturbations. We should just be taking the obvious wins β€” looking at the things that are more obviously true.

To me, it’s obviously true that nobody has a good argument why we should feel confident that we’re not pretty likely to summon the demon and die. That sure seems like what we’re doing. There’s a good chance that’s what we’re doing, and I think we should coordinate to not do that. Everything else is a little detail. β€œWhat if we pause AI but we let people research summoning a gentler demon?” I’m not convinced it’s gonna be a gentler demon.

Liron 01:49:24
Because ultimately, Einstein runs on 20 watts. Ultimately, I do think that there could be a force of nature running on my own laptop. My current laptop today β€” MacBook M4 β€” I think there could be a much smarter brain than Einstein running on that laptop. I think it can be optimized down. I’m bullish β€” or I should say bearish. I think a lot of optimization can be done on AIs running on a laptop.

And that sucks, because that means everybody gets a magic wand that’s more powerful than themselves. The laptop’s sitting on my desk. Even the oldest piece of electronics I still have in my house β€” maybe it’s a 2012 second generation iPad or whatever. The crappiest piece of computer hardware I have in my house right now is probably capable of running a smarter algorithm than the algorithm in my head. That’s what I think.

Liron 01:50:13
And this is kind of why we’re screwed. Will we be screwed slower if we don’t first build a larger brain? Yeah, I guess. I’m all for pausing AI, but I’m just not that optimistic about the outcome. I guess you’re correct that there’s a new way to win, which is that we pause training runs and we can’t help it that all these researchers are still doing their best with whatever training they have, but it opens up new outcomes where we buy more time to make that go well. Sure. Yeah.

Winning the Argument with AI

Liron 01:50:43
So Daniel Brockman is elaborating on the Robocop thing. β€œI think the point is that it will extremely aggressively deploy every debate tactic, manipulation, shifting semantics. It will do anything in its power to win the argument about proving it’s not trying to win.” I see what you’re saying. Yeah.

Alignment and Capitalism

Liron 01:50:56
Michael was saying, β€œI still think if there is an alignment solution we can find, it’ll be expensive as hell relative to cutting corners, and capitalism won’t naturally arrive there.” Yeah, I definitely agree with that, unfortunately. I don’t think capitalism is going to steer us to an alignment solution.

That’s an argument some people make: “Why would companies make an unaligned AI? It’s against their own interests. It’s against capitalism.” But capitalism doesn’t always nail everything. I think Michael brings up the analogy of leaded gasoline: there was an executive who allowed leaded gasoline, and I think he himself died of lead poisoning. I might have mixed up this anecdote, but there’s something as bad as that.

Liron 01:51:43
Capitalism is a strong force, but it’s like the sliding door thing. Capitalism generally tries to guide things to get out of the way and not cause disaster, but it still can. And then we’re all dead.

More Surveillance State Pushback

Liron 01:51:49
Nice, we got another donation. EJJ is saying, $9.99 donation: β€œAI autonomy, coherence, power-seeking, and capability, especially via recursive self-improvement, are speculative. But humans misuse powerful tech. If you give them a surveillance state, they will be happy.”

So you’re giving people extra exposure to the argument of why you really don’t want AI doomers to be pushing for a surveillance state. And by the way, I do object to that characterization. Eliezer Yudkowsky often points out that all the measures that we took to control nuclear weapons haven’t really made our lives much worse. We’re not living in a surveillance state just because we’re controlling nuclear weapons.

Liron 01:52:49
So EJJ is saying AI autonomy, coherence, power-seeking, and capability, especially via recursive self-improvement, are speculative, but humans misuse powerful tech, and if you give them a surveillance state they will be happy. I hear you. I just disagree that I’m being that speculative. In my own mind, I’m just applying enough logic to conclude this is kind of the default. And I would just turn it right back around: I claim you are being speculative. Highly speculative.

Intelligence vs. Optimization

Liron 01:53:03
Brian Mulder is saying: β€œQuestion β€” how load-bearing is the assumption that optimization is equivalent to intelligence?”

First of all, that’s a semantic distinction. So even if I grant you, “Okay yeah, optimization is not intelligence,” let’s assume that. And I guess specifically maybe what you mean is human intelligence. So those of us who we consider smart because we have success in various domains and we score high on IQ tests, the smart ones of us, it turns out that, as Yann LeCun seems to think, we just aren’t particularly good at optimization. We aren’t particularly good at steering outcomes.

Liron 01:53:44
Yann LeCun’s famous example is, you look at a company and the boss often has a lower IQ than the people who work for him. To which my rejoinder was: have you ever worked for somebody with an 80 IQ? It’s like a rich person saying, “Why do people care so much about money?” Have you ever met somebody who doesn’t have enough money? It is the same thing with IQ points. I don’t think you’re really empathizing with how much work IQ is doing in the air that you breathe, because you’re interacting with high-IQ people.

Liron 01:54:10
So to answer the question more directly: let’s grant for the sake of argument that the human IQ scale has absolutely nothing to do with outcome-steering power, which I think is a failure to observe something important, but okay, let’s assume that’s right. In that case, I would just claim that outcome-steering power is dangerous, and I claim that AIs are on the treadmill to get more and more outcome-steering power.

So I haven’t said anything about intelligence. Think whatever you want to think about intelligence. Outcome-steering power is what I think is dangerous. The reason I talk about intelligence is because realistically speaking, outcome-steering power is obviously closely correlated to human intelligence.

Liron 01:55:00
It’s not a coincidence that most billionaires are going to have an above-average IQ. That’s not a coincidence. And funny enough, the data says there’s no disconnect: average earnings are monotonically increasing in IQ. So contrary to Yann LeCun’s anecdote, which of course is true at the anecdote level because it’s not a perfect correlation, not a correlation of one, if all you know is that person A has a higher IQ than person B, you should guess that person A has a higher income than person B. It’s monotonically increasing, last I checked.
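As a rough illustration of the "guess that person A has the higher income" reasoning, here is a minimal Python sketch. It assumes, purely for illustration, that IQ and income behave like a bivariate normal pair with correlation `rho`; under that assumption the probability that the higher-IQ person of a random pair also has the higher income is 1/2 + arcsin(rho)/π. The function name and the sample `rho` values are hypothetical and are not taken from any dataset mentioned in the episode.

```python
import math


def concordance_probability(rho: float) -> float:
    """Chance that the higher-IQ person also has the higher income.

    Assumes a bivariate normal model with correlation `rho`
    (an illustrative assumption, not a measured value).
    """
    if not -1.0 <= rho <= 1.0:
        raise ValueError("rho must be between -1 and 1")
    return 0.5 + math.asin(rho) / math.pi


# Hypothetical correlations, chosen only to show the shape of the relationship.
for rho in (0.0, 0.2, 0.4, 1.0):
    p = concordance_probability(rho)
    print(f"rho={rho:.1f} -> P(higher IQ also has higher income) = {p:.3f}")
```

Under this model, any positive correlation pushes the guess above a coin flip, while only a correlation of one makes it a sure thing, which matches the point that anecdotal exceptions and a population-level trend can coexist.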

Liron 01:55:36
And of course you could say, “Well, what if they have a high income but they can’t steer outcomes?” But give it up. That would be my response: give it up.

Liron 01:55:47
Somebody’s saying, “Yeah, exactly. It’s like your boss: either they got there by random chance or they’re just better at specific tasks that are relevant to succeeding in business. So their general intelligence might be lower, but on specific relevant business tasks, presumably higher. Either that or they got lucky.”

Yeah, I mean, in a family business you can have incompetent management. That’s actually more common in family businesses. And you can say, “Hey, the boss is really dumb.” I’m sure there are some CEOs of companies whose valuation is more than $50 million and the CEO literally has an 80 IQ. I believe there are at least five such companies on the entire Earth. But those are the exceptions that prove the rule, and they’re extremely rare.

Liron 01:56:32
Yeah, Yann LeCun does seem to miss a lot of important concepts, even though he’s got a Turing Award and I don’t. So at the end of the day, I would say he’s the better outcome optimizer than me. But then you have to ask the question: who has more YouTube subscribers? And that does kind of repaint the picture.

Michael 01:56:50
Yeah.

Liron 01:56:50
Michael’s saying Yann literally couldn’t make Llama 4 better than 3. Ha ha. Take that, Yann. See, I would’ve made Llama 4 great. But too bad they only had Yann LeCun.

Producer Rory is saying, yeah, Yann’s got a unicorn. So remember, Yann did that move where he left Meta and he immediately got the billion-dollar investment. That’s just standard, it’s just punching the clock. Instead of a gold watch, you gotta get a billion-dollar investment when you’re senior management leaving an AI company.

Breaking News: Bernie Sanders on AI Existential Risk

Liron 01:57:24
Let’s see. Okay, we got an interesting piece of breaking news here. Let me show you: breaking news from a couple hours ago. I’ll share my screen.

Okay, so Nate Soares quote-tweeted this. It’s a tweet from Senator Bernie Sanders saying: “Uncontrolled AI poses a severe danger to all of humanity. On Wednesday, I’ll be hosting a discussion with leading AI scientists from the US and China about the need for international cooperation against this existential threat. This is an enormously important issue. Join us.”

Liron 01:57:48
Oh my God, I gotta tell you, it is pretty crazy to see an actual US Senator talking in the language that we’ve been using for over a decade. We, the AI doomers, the MIRI people. It’s a US Senator. He’s clearly been Yudkowsky-pilled or whatever it is.

I’ve said on the show before, I’m not personally the biggest Bernie fan, but on the single most important issue, he is acting incredibly sane. And I gotta give it up for the Burn-meister. Maybe I’ll even vote for the guy. Who knows. Let’s go crazy.

Liron 01:58:27
Yeah, so he says “Uncontrolled AI poses a severe danger to all of humanity” (you know, I read the tweet already), and he’s got a poster saying “The Existential Threat of AI.” Whoa. Now that is a headline for a poster. We’re not dicking around here talking about unemployment doom. This is the existential threat of AI and the need for international cooperation.

And it’s very interesting he titled it “existential,” because I always thought it was more effective to say “extinction” instead of “existential.” Extinction might have more power to it; it sounds a little less abstract. I don’t know.

Liron 01:58:58
Yeah, and the need for international cooperation: cooperating with China, crazy stuff. And look who’s on the panel. It’s featuring Max Tegmark, David Krueger, and then people from China. I can’t say I’m familiar with them. I don’t even know how to pronounce their names. Xue, I’m gonna try, Xu Yilan and Zeng. Maybe Shu. I tried. Okay.

These are university professors from Tsinghua, and Zeng is the Dean of the Beijing Institute of AI Safety and Governance. Bernie, slow clap. I wish I could give you a promoted message on your YouTube right now, because this is really good work, Bernie.

Liron 01:59:43
I hope he does more of it. Eliezer’s commented that Bernie might not get everything right on this issue, but he’s just acting like somebody who’s sane, who just has a brain looking at the situation. This is a crazy situation. Holding a panel about it makes a lot of sense, and you’re inviting the right people. You’re saying the right words. So I’m quite impressed.

Because look, the guy is old, he’s like 80 or something. So the 80-year-old is the one who’s capable of using his brain in a flexible, new, novel way. How did that happen?

Liron 02:00:15
Yeah, I don’t care that he doesn’t get all the details right. It’s just a massive win. “Feel the existential burn,” that’s what producer Rory says. Yeah, it’s so true. Feel the fires of hell burning. That should be his campaign slogan.

And then Nate Soares quote-tweeted: “Bernie is showing once again that politicians can just discuss the dangers plainly. I hope many other politicians take note. AI is going to get more and more politically important.” Yeah, it is interesting. I wonder how much more politically important the existential side of this is gonna get, because to me it’s intuitive that people are actually going to realize unemployment is happening because it’s going to be happening. My best guess is that it’s less than two years away, and it’s already slightly happening now.

Upcoming Guests and Twitter Highlights

Liron 02:01:03
I wonder what other Twitter bangers we have to cover. Obviously, Mike Israetel is coming on the show. Yeah, let me show you guys this. I feel like this should be a recurring segment on these shows: what’s been going on on Twitter the last few weeks since the last Q&A.

So we got “Challenge Accepted” from Dr. Mike Israetel, entrepreneur and PhD bodybuilder. Spoiler: I’ve been using AI to help me refine my form in my home gym, because it’s important for me to exercise with good form; otherwise it messes up my spine or whatever, because I have hypermobile ligaments. I’m gonna ask Dr. Mike to review a video of whether the AI has dialed in my form correctly. So you guys can look forward to that.

The Unemployment Bet

Liron 02:01:53
And then over here, I made a bet with Will Lancer. I think he’s here on the stream, because I did this other episode. Remember I did the Ahad Moussack episode, and I made that claim that I think unemployment is coming soon? So Will Kylie made a $500 bet with me.

He says: “In Liron’s Doom Debates episode with Ahad Moussack, they both agreed that US unemployment will probably be at least 2% higher in two years than it is today, i.e., 6.4% or higher in April 2028.” Because right now it’s 4.4%.

Liron 02:02:21
He says, “I offered to bet against, and Liron agreed at $500 to $500 stakes,” one-to-one odds. And by the way, the reason I agreed is because he’s given me 50% odds and I’m like 60% sure. So I don’t think this bet is the greatest bet ever, but he wanted to bet and I’m like, well, why wouldn’t I bet? Do I care about putting $500 on the line when I think I have a slight advantage? I’m comfortable doing that.
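The "slight advantage" can be made concrete with a small expected-value calculation. The sketch below assumes a risk-neutral bettor and uses only the numbers stated above (60% confidence, $500 against $500); the function names are hypothetical helpers for illustration.

```python
def expected_profit(p_win: float, amount_won: float, amount_lost: float) -> float:
    """Expected profit of a bet that wins `amount_won` with probability
    `p_win` and otherwise loses `amount_lost`."""
    return p_win * amount_won - (1.0 - p_win) * amount_lost


def break_even_probability(amount_won: float, amount_lost: float) -> float:
    """Win probability at which the bet has zero expected profit."""
    return amount_lost / (amount_won + amount_lost)


# Numbers from the transcript: one-to-one odds, $500 stakes, 60% confidence.
print("break-even probability:", break_even_probability(500, 500))
print("expected profit at 60%:", round(expected_profit(0.60, 500, 500), 2))
```

At even odds the break-even point is 50%, so a 60% belief gives an expected gain of roughly $100 on a $500 stake: positive, but small relative to the amount at risk.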

He only proposed a hundred dollars. I just said $500 because that was a calibrated amount of money where I don’t want to forget that it exists. I’m such a baller that I can easily forget that a hundred dollars exists, whereas $500, I’m kind of like, “Oh wait, okay, I care about $500.”

Liron 02:02:58
So he says, if the total US unemployment rate in April 2028 is 6.4% or higher, he will pay Liron $500. Or if I prefer, donate $500 to Doom Debates or a charity of my choice. Yeah, I’ll take the Doom Debates donation. On the other hand, if the US total unemployment rate is less than 6.4%, I’ll owe him $500.

And then I quote-tweeted him. I said: “I claim with 60% confidence that two years will be enough time for data to show an early trend of AI pushing humans permanently out of jobs. So I’m happy to bet with Will at one-to-one odds. Yes, I already lost a similar bet in the 2023 to 2025 timeframe against Brandon Goldman, but I persist.”

Liron 02:03:39
Yeah, I mean it’s very true, because in 2023 GPT came out and I started using it for customer service, and I literally laid off some people. And I’m like, “Wow, I’ve got alpha here,” because I see these people getting laid off. I don’t think they’re in a strong position to get another job. I think they’re in a weaker position than they were before. And I’m going to generalize, and I’m going to say that the unemployment rate is going to move up.

But I was totally wrong. The unemployment rate didn’t move up. I don’t think it’s that hard to explain. I think the economy’s shifting around. New jobs are getting created. People get more ambitious. “Oh great, I’m more productive” (it’s Jevons Paradox), “great, so let me do more things, let me hire more people.”

Liron 02:04:24
I just think at some point Jevons Paradox craps out because you’re like, “Okay yeah, I’m gonna do a bigger project. I’m gonna hire more people. Oh wait, not people, robots. I can just hire robots now. I’m good.” So I do think Jevons Paradox does crap out at a certain point. And I’m doubling down. I’m persisting. I claim that point is gonna happen by 2028.

If I just keep doubling my bet (I think I bet about $250 with Brandon Goldman and now I’m betting $500), I’m basically using a Martingale strategy. Every time the unemployment rate doesn’t go up as much as I want, I’ll just keep doubling my bet. And on a 20-year timeframe, I’m gonna make all my money back and then some. I’m gonna be betting $10,250 in 2050 that the unemployment rate is finally gonna creep up, and then I will die being up $250 when it eventually happens.
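The "up $250" punchline follows from how a doubling (Martingale) sequence works at even odds: each new stake is one base stake larger than all previous losses combined. Here is a minimal sketch, assuming a $250 base stake, even-odds bets, and exact doubling each round; the function names are hypothetical, and the joke's $10,250 figure is not an exact doubling step.

```python
def stake_at_round(base_stake: int, round_index: int) -> int:
    """Stake for a given round (0-based) when doubling after every loss."""
    return base_stake * 2 ** round_index


def net_after_first_win(base_stake: int, losses_before_win: int) -> int:
    """Net profit after losing `losses_before_win` even-odds bets in a row
    and then winning the next, doubled, bet."""
    total_lost = sum(stake_at_round(base_stake, k) for k in range(losses_before_win))
    final_stake = stake_at_round(base_stake, losses_before_win)
    return final_stake - total_lost


# Base stake of $250, as in the joke; the number of losing rounds is hypothetical.
for losses in range(6):
    stake = stake_at_round(250, losses)
    print(f"{losses} losses, then a win on a ${stake} bet -> net ${net_after_first_win(250, losses)}")
```

The net gain is always the base stake no matter how long the losing streak runs, while the amount that has to be put at risk grows exponentially, which is the usual practical objection to the strategy.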

Tool-Like AI and Domain Size

Liron 02:05:03
We got some new donations here. So EJJ, $9.99 donation: “AI likely stays tool-like. Even if agentic, it may be alignable. Near-term risk is human misuse. Surveillance to control AI may empower bad actors more than stop development, which is hard to contain.”

All right, guys, you heard it here. I can’t say this enough on behalf of EJJ: surveillance could be a real risk. I want you to be aware of that.

Liron 02:06:04
But I’m willing to admit that the actions I’m proposing are not costless. When I’m saying there should be a centralized off button, yeah, I agree, that’s a big cost. It’ll slow down the economy, it’ll make it harder to cure cancer. I agree, I’m proposing a giant cost. And if I get cancer, boy, will I wish AI progress had been faster.

Nathan Labenz was talking about how his son actually got cancer. Fortunately the prognosis is looking good, but his young son got cancer, and he’s saying there’s no deceleration in the cancer ward. And I hear you. I’m not trying to take that away from you, or from myself if I get cancer. I can easily imagine that being in my future.

Liron 02:06:54
So yeah, I’m proposing a costly action here. It’s just still what seems right to me logically. But EJJ is saying AI likely stays tool-like. I disagree. I think it is in the nature of achieving goals in the domain of the universe that it no longer feels tool-like. It feels war-like.

I think there is a qualitative difference when you increase the size of the domain. When it’s not just a video game, when it’s not just a piece of software in a single repository, when you get to turn on side-channel attacks, when there are no limits, when the rules of the game become that there are no rules, I don’t think it’s going to feel tool-like.

Liron 02:07:38
This is a very interesting distinction. The category of “tool” might feel like a category that talks about the AI’s personality or the AI’s nature, but actually it is a distinction that refers to the domain on which the AI is optimizing. If it’s optimizing for a narrow domain, that’s what makes it a tool. But if it’s optimizing for a broad domain, suddenly the tool is operating on you, or something. There’s some qualitative shift when you increase the size of the domain.

The Unemployment Bet Poll

Liron 02:08:20
Will Kylie did a quick poll: “Which side of the bet do those chatting take?” Unfortunately I’m not seeing a lot of responses to Will’s poll here. Let’s end on this because there’s 50 of you guys here watching, so let’s see if you’ll respond to my poll.

Do you guys take Liron’s side or Will’s side?

Liron 02:08:41
The early results are pretty even, slightly for me. Yeah, I mean, I’m only 60% confident, so I don’t expect you guys to necessarily be super polarized.

Weekly Twitter Show Idea

Liron 02:08:57
And then we got another promoted response that I haven’t read yet. Daniel Brockman says: “Actually, I think you should do a weekly live show like this where you literally just read Twitter. I’m not on Twitter, it’s too overwhelming, but I actually would watch you parse through it.”

It’s a great idea. You know what, I think you’re onto something. I think we have a good thing going with the monthly live streams, because it is kind of convenient: it’s a little bit easier to produce a Q&A episode than to prepare and edit. The great thing about these Q&As is we don’t prepare for them. Living my life is the preparation. Listening to podcasts and going on Twitter.

Liron 02:09:23
So these monthly Q&As have been nice. I don’t think that I have enough juice, just based on the level of attendance. I do think it’s a little bit lower when we do it monthly compared to doing it every three months. But then Daniel’s pointing out, “Hey, why don’t we just read Twitter every week?” I would actually like to give that a try.

So maybe we’ll do a four-week experiment. We’ll just read Twitter, because there is a lot of juice on Twitter. I’m ashamed to say that I spend over an hour a day on Twitter, basically wasting my life. The good news is that ever since I started using Claude Code, I feel like I’ve become more focused on actually getting stuff done because I’m so much more powerful, and I think it’s made me use Twitter a little bit less as a result. So shout out to Claude Code for giving me a taste of what it’s like to be a regular employee who just does his freaking job instead of dicking around all day and trying to be a media personality.

Wrap-Up

Liron 02:10:28
Just to check in on the final results in the poll: Liron’s side is at 60%, Will’s side is at 27%. Booya, take that, Will. 15 votes. So if anybody wants to buy me out (Will, you can’t bet, it’s too late to bet), you can buy my $500 position. I’ll sell it to you for just $550.

Wait, does that make sense? No, because then Will’s only gonna pay you $500. So that doesn’t make sense. I’ll have to think about it, but there’s some way that you should be able to buy me out. The math isn’t coming to me right now.
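One hedged way to untangle the buy-out question: treat the position as paying +$500 if the unemployment threshold is hit and -$500 otherwise, and ask what a risk-neutral buyer who believes the "yes" probability is `p_yes` would pay to take it over. The sketch below works under exactly those assumptions; `fair_buyout_price` is a hypothetical helper, not a description of how any real transfer of the bet would be arranged.

```python
def fair_buyout_price(p_yes: float, stake: float = 500.0) -> float:
    """Expected value of taking over the 'yes' side of an even-odds bet:
    receive `stake` with probability p_yes, pay `stake` otherwise."""
    if not 0.0 <= p_yes <= 1.0:
        raise ValueError("p_yes must be between 0 and 1")
    return (2.0 * p_yes - 1.0) * stake


# Hypothetical buyer beliefs; a negative value means the buyer would need to be paid.
for p in (0.27, 0.50, 0.60, 1.00):
    print(f"belief {p:.0%}: fair price to take over the position = ${fair_buyout_price(p):.2f}")
```

Under these assumptions even a buyer who is certain of winning would pay at most $500, so a $550 asking price cannot work, and at a 60% belief the position is worth about $100.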

Liron 02:11:04
β€œGet on Manifold Markets and buy yes on Liron’s side.” So Will has created a Manifold market, which is a representation of our bet. Yeah.

All right guys, we’re gonna wrap it up. We’ll publish this episode on the main show feed. And like I said, there’s a lot of really good episodes coming up in the next couple weeks. We’ve got good momentum here for the show. The more doomy things get, the more momentum the show about doom has. That’s the upside, I guess.

Liron 02:11:36
And Will’s saying the price is currently at 27% that Liron wins. Wow. So if you guys think there’s a 55% chance I’m gonna win, you should buy it at 27% on Manifold Markets.
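To see why a 27% price looks cheap to someone who believes 55%, here is a minimal sketch of the expected return on a "yes" share. It assumes a share bought at `price` pays out 1 unit if the market resolves yes and 0 otherwise, and it ignores fees and the way a purchase moves the price, both of which matter on a real market. The function name is hypothetical.

```python
def expected_return_per_unit_spent(belief: float, price: float) -> float:
    """Expected profit per unit spent on 'yes' shares bought at `price`,
    for a trader whose probability of 'yes' is `belief`."""
    if not 0.0 < price < 1.0:
        raise ValueError("price must be strictly between 0 and 1")
    if not 0.0 <= belief <= 1.0:
        raise ValueError("belief must be between 0 and 1")
    return belief / price - 1.0


# Numbers from the transcript: market at 27%, hypothetical viewer belief of 55%.
edge = expected_return_per_unit_spent(0.55, 0.27)
print(f"expected return per unit spent: {edge:.1%}")
```

The expected return is positive whenever the trader's belief is above the market price and shrinks to zero as the two converge.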

All right guys, we’re gonna wrap it up now. Thanks so much for coming. This was fun. See you guys, maybe in a week. Hopefully in a week. All right. To be continued.


Doom Debates’ Mission is to raise mainstream awareness of imminent extinction from AGI and build the social infrastructure for high-quality debate.

Support the mission by subscribing to my Substack at DoomDebates.com and to youtube.com/@DoomDebates, or to really take things to the next level: Donate 🙏
