Multiple live callers join this month's Q&A as I react to Dwarkesh Patel's $20k blog prize, debate the orthogonality thesis from first principles with a live viewer, and welcome Bernie Sanders aboard the Doom Train!
Timestamps
00:00:00 - Cold Open
00:01:00 - Welcome to Doom Debates Live!
00:01:30 - What Do You Think of Open Source Models Out-Benchmarking OpenAI and Anthropic?
00:04:55 - Michael Cheers Joins: What If We Don't Give AIs Full Situational Awareness?
00:11:55 - Thoughts on Mythos' Hacking Abilities?
00:15:43 - Liron Reacts to Dwarkesh Patel's $20K AI Questions
00:23:28 - Pretraining Goals vs RL Training Goals
00:28:58 - Mental Model of Yudkowsky-ians & the IABIED Claim
00:37:24 - You Can't Hide Reality from a Superintelligence (The Truman Show Analogy)
00:42:57 - Back to Dwarkesh's Questions: When Do AI Labs Start Making Money?
00:48:50 - Upcoming Guests Reveal!
00:51:35 - Will Lancer Joins: Is The Yudkowskian Thesis Credible?
01:27:03 - Back to Answering Questions from the Chat
01:33:28 - The Cameraman Always Survives Analogy
01:40:52 - Liron's Banger Response to Roon's Tweet
01:47:00 - Nuance About Pausing AI Development
01:50:57 - Capitalism Isn't Going to Steer Us to an Alignment Solution
01:53:10 - Is Optimization Equivalent to Intelligence?
01:57:21 - BREAKING: Bernie Sanders on the Existential Threat of AI
02:01:12 - Spoiler for the Upcoming Mike Israetel Episode
02:01:57 - $500 Bet on AI Unemployment
02:05:46 - Misuse, Surveillance, and the Real Costs of Pausing AI
02:11:04 - Wrap-Up
Links
Dwarkesh Patel, Blog Prize for Big Questions About AI
Doom Debates episodes with Steven Byrnes
Nick Bostrom, Deep Utopia: Life and Meaning in a Solved World (Amazon) - https://www.amazon.com/Deep-Utopia-Meaning-Solved-World/dp/1646871642
Yudkowsky & Soares, If Anyone Builds It, Everyone Dies (book) - https://www.amazon.com/If-Anyone-Builds-Everyone-Dies/dp/0316571253
Transcript
Cold Open
Will Lancer 00:00:03
I heard this recently from an AI safety researcher and it was: what if you found out that all of your morals are obviously fake? I don't understand why you would stay so attached to them.
Liron Shapira 00:00:12
When I was in college, the fundamentalist Christians that I went to college with would always be like, "You're an atheist. If I were an atheist, I'd go around stabbing people. I'm only good because I listen to God."
Will 00:00:22
I don't think serious Christian philosophers think this anymore.
Liron 00:00:26
It feels intuitive to you that because we created AI, the AI knows that it owes us letting us debug it. But if we give it certain preferences by default, it's just going to go with the original preferences.
Will 00:00:38
It doesn't seem that extreme to me of a belief where it's trying not to kill people, trying not to do this, trying not to do that. It feels like a one-in-a-million YOLO of these preferences.
Liron 00:00:49
Yeah, look, it's - we maybe -
Welcome to Doom Debates Live!
Liron 00:01:00
Friday, April 24th. Welcome to Doom Debates Live. Hi everybody.
All right, so your questions are coming in from the chat and you guys have these crowns in YouTube. I'm seeing Michael Cheers, 803. For some reason he's got the number one crown. I guess he's been hyping up the show, so thanks Michael. Appreciate that.
So Michael's saying "I want to call in." Okay, let me get you guys the call-in link. This link right here that I just pasted in YouTube, this is the live call-in link and anybody's welcome. We are an equal opportunity debate host here.
All right, so somebody's saying: what do you think of open source models like Kimi starting to out-benchmark OpenAI and Anthropic?
Liron 00:01:39
That's interesting. When you say out-benchmark, do you mean out-benchmark their open source models, or are you specifically thinking about the cutting edge GPT-5.5 getting out-benchmarked? I'm not entirely sure what you mean, so maybe clarify that.
Hey, Pun Master, Pun Master crown number three. Yeah, definitely an active commenter. I'm seeing a lot of Pun Master comments on the videos. Thanks for your engagement. All right, we got Producer Ori in the chat. Everybody say hi to Producer Ori. Let's give him a like reaction.
What Do You Think of Open Source Models Out-Benchmarking OpenAI and Anthropic?
Liron 00:02:08
So yeah, in terms of out-benchmarking, I do think it's correct. I think I did hear that in terms of OpenAI's official open source models getting out-benchmarked by some of those other open source models. The consensus is just that open source tends to lag six months behind. The only question is: is it gonna lag three months behind, six months behind, twelve months behind? Probably always going to be in that range.
And it is interesting because the question is, what advantage do these frontier companies have? Why are they gonna defend their gross margin when there's always these open source solutions nipping at their heels?
There's a couple different answers. The first answer I would give is: we can make an analogy to other things that are kind of commodities. Cloud computing is kind of a commodity, and yet Amazon AWS, Google Cloud, these all have healthy margins, even though you can ask, why do they have healthy margins? There's other clouds that spin up, aren't the clouds getting competed down?
Liron 00:02:56
Well, for whatever reason, they have huge scale and they have healthy margins. There's just a few huge scale players, and they all have healthy margins. People are happy to use the clouds. I use the cloud for my business and yeah, am I paying 20% more than I have to? Sure, but do I care? No, because it's still a good deal. I'm getting a lot of value running these servers.
So that business model might very well apply to AI tokens. If the singularity doesn't happen, if we still have normal life and we're paying for all these tokens, maybe we just pay at a price that gives OpenAI some profit as opposed to running an open source model. But do we want to deal with an open source model? No. We'd rather just pay a little more and run the hosted model.
Liron 00:03:32
So in terms of economic analysis, I think it's fine. It could work out the same as cloud computing. But then there's this larger question of, what if the singularity is happening? I claim we are gonna enter the singularity, I claim we are gonna FOOM and everything.
In that scenario, I think the theory is that if you have the number one model that's not open source yet, sure open source is nipping at your heels three months behind, but it doesn't matter because you're gonna get this decisive advantage. You're gonna enter the positive feedback loop, your AI is gonna build the next AI and so on. And so your three months will turn into infinity.
Liron 00:04:00
It's a very interesting situation because there's just so many sources of pressure. Running out of money, that's a source of pressure. Open source AI nipping at your heels, that's a source of pressure. Other for-profit competitors, source of pressure. It's a hothouse environment.
It's kind of Yudkowsky's worst nightmare compared to 2015 when it was still going slower and the AI community was small and everybody knew each other. And now it's an all-out free-for-all, no holds barred, no rules. Nobody can stop the train.
So going back to your question, yeah, the fact that open source is close is just yet another pressure cooker element. Crazy, crazy times. To Yudkowsky, the game board has been played into an awful state. It's a really bad place to try to strategize how to win.
Michael Cheers Joins: What If We Don't Give AIs Full Situational Awareness?
Liron 00:04:48
Send in your questions. Let's do the call-in. Here we go. We got Michael Cheers on the call-in.
Hey, Michael Cheers.
Michael Cheers 00:05:04
Hi, can you hear me?
Liron 00:05:05
Yeah, I can hear you fine.
Michael 00:05:06
So yeah, I was just curious on your thoughts on whether the AI companies could go with a safer approach than what they're doing now. Because I think the current approach is kind of dangerous, in terms of training the AI so that it knows it's an AI and everything. It knows all about humans. It seems very -
Liron 00:05:23
Okay. Do you have a specific proposal?
Michael 00:05:27
Let's say you have the LLMs make an alternate world, right? Then that's the training data you give it. You don't train it on any data from the human world. From there, if you want to have it reason about human things, you'd only give it the in-context information it needs. And that way it has a much harder time breaking out, right?
Liron 00:05:51
Yeah, interesting idea. So just to summarize: you don't just give the AI full situational awareness, you give it information on kind of a need-to-know basis. "Hey, I'm asking you a question. Don't think too hard about everything."
I think research there is great. Whatever we can learn in that direction is great. But there is a fundamental problem, which is that if you are very intelligent, if you are very good at solving problems, it's natural to just be like, "Okay, what's my situation? What could I learn about the situation?" The same way that humans ask, "What are the laws of physics? What are the rules of the video game here?"
Liron 00:06:21
And it's hard to avoid learning things about how to break out or how to manipulate people because these levers are there. And if people are trying to hide them from you, you're still gonna see signs that they tried to hide things from you.
So I guess you'd have to be steered: it has to be in your nature to not be too curious and focus on the problem. And current AIs do seem to already do that. So the part where we don't tell them that much about the situation, I don't know how much work that would do.
Failure of imagination is not something that I would count on for a superintelligence. One way to think about a superintelligence is that it really does see all the possibilities. Enumerating possibilities is a pretty fundamental skill. When you're building an intelligence from scratch, you basically can't miss it.
Liron 00:07:07
Even humans who don't realize they're doing it, they do it in certain domains. When you're intuitively good at something, your brain is doing it even though you can't do it in a general capacity. So I'm just not optimistic about a world where there are things that would be obvious to a smart human thinker and yet the AI is somehow never thinking about it. I just don't think that's how a plausible win looks.
Michael 00:07:31
I guess the thinking would be that it's an incremental approach. You have your AI, it tries to make the world even more unlike our world, add better safeguards to stop it thinking about simulation theory, et cetera.
Liron 00:07:43
It's an idea that has a simple model in theory, but I would want to look for more specificity in the proposal. The idea is that every intermediate AI is so perfectly aligned that we can trust it to build the next one, but also to robustly secure the next one. And I'm -
Michael 00:07:59
The security model would be you try to stop it understanding that it's an AI and it's in a simulation and all that. If it understands that, you've kind of already failed in my model.
Liron 00:08:13
Right, right. Yeah, even we as humans have this hypothesis on the table that we're living in a simulation. And the only thing that stops us is not that it didn't occur to us. It's just that we're looking around and we're like, well, I just haven't collected any evidence. The only evidence we have that we're in a simulation is epistemological evidence of, why is life so interesting? But I can't prove it. There's literally nothing I can prove.
Whereas with the AIs, they're gonna have a lot of ways to prove it. Our simulation is not going to be as robust as whatever the aliens are doing to us. If it's aliens: you guys are doing a great job with the simulation. I haven't found any cracks.
Michael 00:08:45
Yeah, I don't know. I just think that it's not anything like what the AI companies are actually doing, which is what's interesting about it. Maybe there could be some very different approach where you try and make sure they don't understand anything. But the current approach is just like they're gonna do whatever they want and then try and paper over it. I don't think that can possibly work.
Liron 00:09:06
Yeah, well, I definitely agree with you there. These are good questions.
Michael 00:09:09
Okay, great. Well, I'm gonna let you go just because the connection's kind of janky anyway, but I'll give you my response offline.
Liron 00:09:16
All right, thanks for coming on. This was fun.
Thoughts on Mythos' Hacking Abilities?
Liron 00:09:16
So that is food for thought: this idea of don't tell it everything, and AI companies aren't even pursuing this direction maximally.
I think Buck Shlegeris at Redwood Research thinks a lot about controlling AIs instead of aligning AIs. His whole focus is: yeah, maybe the AIs aren't gonna be aligned once they get superintelligence, and that's scary, but there's gonna be this whole transitionary period where they're gonna be messing with us and they're not gonna be super, super intelligent.
So as long as we're really good at noticing when they're escaping and discouraging them from escaping, we can just use all the tools we have before they get way too intelligent to deal with. And yeah, sure, somebody should be researching that. I'm not against it. I don't have much hope that it'll work, but it can buy us a few years.
Liron 00:09:58
It's intuitive to imagine that the idea could work. Michael proposed a specific solution: have this other AI that's monitoring the thoughts, being like, "Are you thinking about escaping? Are you thinking about how we're keeping you in the box? Okay, restart." It's this intuitive kind of monitor process.
And the problem I see with that is: I think it can buy us a little bit of time, it's not a worthless solution, but if you accept the premise that it's getting superintelligent, the thing is that superintelligences just have so many options. They see so many possibilities and they're doing a lot.
Liron 00:10:32
It's managing all these things. It's got all these child AIs. And you think, what's the problem? I've got the monitor process. The monitor process is catching things. But it's just going to correlate: the AI getting more powerful and doing more things is also going to correlate with it somehow getting around the monitor process. I think that's a good intuition.
It's just gonna be like, "Here's a bunch of plans." The monitor process is like, "Well, I don't really see these setting off the monitor." But it turns out that all of those plans are just giving the AI enough context to know about the situation.
Liron 00:11:00
You can't do these hacky solutions. One of the sources of my intuition is that I know a little bit about computer security. Look at the way that all these clever hackers are getting around things, like injecting scripts into different websites. Say there's a website that'll paste whatever you type in onto the webpage, but then you type in JavaScript, a script tag. So the website says, "Okay, I'm not gonna let you type in a script tag."
But I've seen all these clever hacks of, "Oh, well actually, if you type all these crazy characters, you actually get around the logic that was supposed to block those characters." It's just crazy how many degrees of cleverness, how many holes there are. When you think something kind of should work, it just turns out to have a lot of holes.
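A minimal sketch of the kind of script-tag bypass described here, assuming a hypothetical blocklist filter (`naive_filter` is invented for illustration and is not taken from any real site). The filter deletes literal script tags in a single pass, so a payload that nests one tag inside another reassembles itself after the deletion. Escaping the output is shown only as a contrast, not as a complete defense.

```python
import html


def naive_filter(user_input: str) -> str:
    """Hypothetical blocklist: delete literal script tags in a single pass."""
    return user_input.replace("<script>", "").replace("</script>", "")


def escape_output(user_input: str) -> str:
    """Escape special characters so a browser would render the input as text."""
    return html.escape(user_input)


# The attacker hides one tag inside another. Deleting the inner tag
# leaves the outer pieces joined into a fresh, working tag.
payload = "<scr<script>ipt>alert(1)</scr</script>ipt>"

bypassed = naive_filter(payload)
escaped = escape_output(payload)

print(bypassed)  # the filter's own output is a working script tag
print(escaped)   # no raw angle brackets survive escaping
```

The blocklist looked like it should work and still had a hole, which is the intuition the monitoring argument leans on.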
Liron 00:11:35
Unless it really is logically airtight. And even if it feels logically airtight, you probably still haven't thought of the holes. But when it doesn't feel logically airtight, when it's just, "Look, we're monitoring, what's wrong with monitoring?", I'm telling you the monitoring is not gonna work.
All right, that was a good discussion.
Liron Reacts to Dwarkesh Patel's $20K AI Questions
Liron 00:11:55
So Pun Master saying, "I'm curious to know your thoughts on Mythos as well. Do you agree with John Sherman that it could be the end of encryption?"
No, it's not the end of encryption. We have encryption algorithms that Mythos can't break. RSA cryptographic encryption is based on the difficulty of reversing prime number multiplication. Quantum computers might be able to reverse that, but we've got other encryption schemes too.
Encryption schemes are actually a matter of computational complexity theory. And as smart as Mythos is, even superintelligent AI might not be able to directly attack the foundations of computational complexity theory.
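To make "reversing prime number multiplication" concrete, here is a toy sketch of textbook RSA with deliberately tiny primes. The primes, the exponent, and the helper `factor_by_trial_division` are illustrative assumptions; real RSA uses primes hundreds of digits long plus padding, which this sketch omits.

```python
# Toy textbook RSA. All parameters are illustrative and insecure.
p, q = 61, 53                 # secret primes
n = p * q                     # public modulus: multiplying is easy
phi = (p - 1) * (q - 1)       # computable only if you know p and q
e = 17                        # public exponent
d = pow(e, -1, phi)           # private exponent (modular inverse, Python 3.8+)

message = 65
ciphertext = pow(message, e, n)     # anyone can encrypt with (e, n)
decrypted = pow(ciphertext, d, n)   # only the holder of d can decrypt


def factor_by_trial_division(modulus: int) -> tuple[int, int]:
    """Brute-force attack: the loop grows with sqrt(modulus), which is
    hopeless at real key sizes but instant for this toy modulus."""
    candidate = 2
    while candidate * candidate <= modulus:
        if modulus % candidate == 0:
            return candidate, modulus // candidate
        candidate += 1
    raise ValueError("modulus is prime")


recovered_p, recovered_q = factor_by_trial_division(n)
print(decrypted == message, sorted([recovered_p, recovered_q]))
```

Going forward (multiplying `p` and `q`) costs one operation; going backward means searching for a factor, and no known classical algorithm does that efficiently for large moduli.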
Liron 00:12:29
I always talk about different ceilings. Gödel's theorem and logic is one ceiling, and P versus NP is a ceiling from computational complexity theory. Reversing encryption is actually pretty close to that ceiling.
So when I think about AI stealing Bitcoin and stuff, I mostly think of side channel attacks, just convincing a human to give up the wallet, the same way that humans attack other humans that way.
I will say effectively, I don't think encryption is going to hold back AI because of all these side channels. But can it directly break encryption? Probably not. There's probably some encryption schemes that it can't break.
Liron 00:13:02
And of course, we know about information-theoretically perfect encryption. If you've ever heard of the one-time pad: if you just have a big pad of random numbers that you share with somebody, that lets you perfectly encrypt things in a way that nobody can reverse engineer without having the pad of random numbers.
The only flaw with it is that it's annoying to give them the big pad of random numbers. But if you're okay giving your friend a giant book full of random numbers, you and your friend can always perfectly symmetrically encrypt things for the rest of your life. And nobody can ever break your encryption unless they have the pad.
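A minimal sketch of the one-time pad, assuming the pad is truly random, at least as long as the message, and never reused. The messages and the `xor_bytes` helper are made up for illustration. The second half shows why nobody can break it without the pad: for any other message of the same length there is some pad that "decrypts" the same ciphertext to it, so the ciphertext alone reveals nothing.

```python
import secrets


def xor_bytes(data: bytes, pad: bytes) -> bytes:
    """XOR each message byte with the matching pad byte."""
    if len(pad) < len(data):
        raise ValueError("pad must be at least as long as the message")
    return bytes(d ^ p for d, p in zip(data, pad))


message = b"meet me at noon"
pad = secrets.token_bytes(len(message))   # the shared "book of random numbers"

ciphertext = xor_bytes(message, pad)      # encrypt
recovered = xor_bytes(ciphertext, pad)    # decrypt with the same pad

# An eavesdropper cannot rule out any same-length message: this
# alternative pad turns the same ciphertext into a different plaintext.
decoy = b"flee at sunrise"
decoy_pad = xor_bytes(ciphertext, decoy)
decoy_decryption = xor_bytes(ciphertext, decoy_pad)

print(recovered, decoy_decryption)
```

The cost is exactly the one described above: the pad has to be delivered secretly in advance, and each byte of it can be used only once.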
Liron 00:13:30
So literally the only disadvantage is that I haven't gone and met Amazon.com in a dark alley and given Amazon.com a big book of random numbers. But if I had, me and Amazon would be perfectly secure talking to each other for life. That's the only downside: we haven't met in secret before.
And then there's this idea of public key encryption where you don't have to meet in secret. You can just show up and yell at somebody across the room. It could be a crowded room. You've never met the other person. You're yelling at the other person across the crowded room. And yet somehow, even though you've never met the other person, you're yelling in such a way that nobody else in the crowded room can decrypt what you two are talking about, even though you've never exchanged the secret before.
Liron 00:14:00
That's the magic of public key cryptography. And we're not totally sure that public key cryptography is going to remain robust. It might not be. But if I had to guess, I would say that even public key cryptography, even though it's less provably secure than the one-time pad, will never be broken in the general case. That's my guess.
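One way to picture the "yelling across a crowded room" idea is a Diffie-Hellman key exchange, sketched here with toy numbers. This is an illustrative choice, not something named in the conversation, and the parameters are far too small for real use. Everything labelled public can be shouted in the open; only the two secrets stay private, yet both sides end up with the same shared key.

```python
# Toy Diffie-Hellman exchange. Parameters are illustrative and insecure;
# real deployments use very large primes or elliptic curves.
p = 23    # public prime modulus, shouted across the room
g = 5     # public generator, also shouted

alice_secret = 6     # never leaves Alice
bob_secret = 15      # never leaves Bob

# Each side shouts only g^secret mod p.
alice_public = pow(g, alice_secret, p)
bob_public = pow(g, bob_secret, p)

# Each side combines the other's public value with its own secret.
alice_shared = pow(bob_public, alice_secret, p)
bob_shared = pow(alice_public, bob_secret, p)

print(alice_public, bob_public, alice_shared == bob_shared)
```

Eavesdroppers hear `p`, `g`, and both public values, but recovering a secret from them is the discrete logarithm problem, which is believed to be hard at real sizes rather than proven hard. That gap is the "less provably secure" caveat above.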
Hope that answers your question. I have a little bit more knowledge on this than the average person because this happens to be the one thing I actually studied in college: a little bit of theory of computation.
Pretraining Goals vs RL Training Goals
Liron 00:14:32
So this guy is saying, "In other words, one's P(Doom) should swing drastically based on a substantive model interfacing with reality. What do you actually think will happen and why? 50% strikes me as a Bayesian trade cop-out."
Okay, a substantive model interfacing with reality. My substantive model basically gets down to: if anyone builds it, everyone dies. The substantive model is that intelligence is freaking powerful.
Where is your substantive model that explains how humans dominate the other animals? How do you explain that? "Oh, because we have a brain. A brain is this magic dominating thing." Okay, is it the maximally magic dominating thing? No, it's not. Where does the scale end? It ends at some superhuman entity. That is a pretty powerful mental model. I think that mental model has a lot of predictive power.
Mental Model of Yudkowsky-ians & the IABIED Claim
Liron 00:15:30
All right, hope that answers your question. Each Shiz is out here, welcome. He's saying, "Let's see this guy broadcasting from his face." Yeah, he's talking about the guy with the slow connection, Michael.
All right, Producer Ori is saying, "Let's get a first reaction to Dwarkesh's $20,000 questions." Oh, yeah. All right, let's pull it up. Let's pull up Dwarkesh's questions. I'll go check his Twitter. Here we go. This could be a Twitter browsing session, that could be something we do here.
Liron 00:16:00
All right, here we are on Dwarkesh Patel's Twitter.
So he's saying, "$20,000 blog prize to answer some big questions about AI." Let's click through.
All right, check it out guys. "Blog Prize for the Big Questions About AI" by Dwarkesh Patel. He says, "The not-so-secret point of this whole contest is so that I can hire a research collaborator." Okay, yeah, I respect that. You're running a contest to find a research collaborator.
Liron 00:16:26
All right, so he's asking these questions with a bounty on them. And the first question looks interesting. He says: "A couple years ago there was this idea that AI progress might slow down as we make further progress into the RL regime" (that stands for reinforcement learning) "because as horizon lengths increase, the AI needs to do many days' worth of work before we can even see if it did it right."
"So if we're still in a naive policy gradient world, the reward signal per FLOP goes down. And we crossed through so many orders of magnitude of RL compute from GPT-4 to o1 to o3, and it would not be feasible to replicate that many orders of magnitude increase in compute immediately again."
Liron 00:17:04
"But AI progress seems to have been fast nonetheless, even potentially speeding up if rumors about Spotter Mythos are to be believed. What gives? What did that previous intuition pump that motivated longer timelines miss?"
Okay. First of all, I don't think I personally was ever staunchly predicting longer timelines. Let the record show that in late 2024 when so many people were like, "Oh, those AI-2027 guys are so dumb, timelines could be so long," I was like, no, no. I don't think so. Sure, we've had a few months of relative quiet, but it's too early to say. I was very clear on that.
Liron 00:17:38
So don't blame me. But Dwarkesh is asking a good question because there was a vibe shift. After 2023-2024, people were like, "Hey, nobody's been that much better than GPT-4. Even GPT-5 is not that much better than GPT-4." Although I pointed out that it kind of was. It actually was better. People just didn't notice because it slowly got better over the years.
But people were like, "I haven't been super impressed by AI for a while. I'm kind of used to it." But then of course, January this year roughly, Claude got better and suddenly Claude is literally writing all of our code now.
Liron 00:18:08
This is what Dwarkesh is talking about. He's like, "Oh, it qualitatively feels much better. So why didn't more people predict that we were due for another qualitative shift?" And by the way, when he says "Spotter Mythos," Spotter is the OpenAI upcoming version of Mythos.
Let me just make sure I've understood everything Dwarkesh said. Let's read it one more time because it's super dense, a whole paragraph worth of a question.
Liron 00:18:26
He says: "A couple years ago there was this idea that AI progress might slow down as we make further progress into the reinforcement learning regime." He's talking about how these thinking models work: when you train the entire chain of thought, you have to evaluate, "Okay, what did your chain of thought yield? Oh, it yielded something good. Okay, let me reinforce the entire chain of thought."
As opposed to predicting the next word, where you can predict a hundred different next words in the course of a couple sentences because the sentences have so many words in them. But in the course of an entire problem-solving output, you're outputting so many tokens and then you only get one bit of information: did you solve the problem or not?
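A rough counting sketch of that contrast, under a deliberately simplified model: next-token training gets one feedback signal per token, and outcome-only reinforcement learning gets one pass/fail signal per episode. The function names and the 10,000-token episode length are assumptions chosen for illustration, and real training setups may use denser rewards than this.

```python
def pretraining_signals(num_tokens: int) -> int:
    """Simplified model: one next-token target per token of text."""
    return num_tokens


def outcome_rl_signals(num_episodes: int) -> int:
    """Simplified model: one solved-or-not signal per finished episode."""
    return num_episodes


tokens_per_episode = 10_000   # assumed length of one problem-solving attempt

dense = pretraining_signals(tokens_per_episode)   # signals from the same tokens as text
sparse = outcome_rl_signals(1)                    # signals from one RL episode

# How many generated tokens each feedback signal has to "pay for".
tokens_per_signal_rl = tokens_per_episode // sparse
print(dense, sparse, tokens_per_signal_rl)
```

Under these assumptions the same 10,000 tokens yield 10,000 signals as pretraining text but only one as an RL episode, which is the sense in which "reward signal per FLOP goes down" as horizons get longer.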
Liron 00:19:09
That's basically Dwarkesh's point: didn't we enter a weaker regime? And if we entered a weaker regime, how come we still made a bunch of progress? It's a good question. We made a bunch of practical progress.
And then Dwarkesh is also saying, in addition to that, we crossed through many orders of magnitude. We basically used up the hardware overhang. GPT-4 was kind of the first time anybody had taken a bunch of hardware and applied it all toward training AI in parallel. It just wasn't something we were used to doing. Then suddenly we did it and used the hardware overhang. Now that we have less hardware overhang, how come we're still making the AI so much faster and better?
Liron 00:19:46
I think I have a simple answer to the second one: an ounce of algorithms is worth a pound of hardware. I've always been saying that's been a consistent trend throughout history: you can take the same hardware and think harder about your software and make your software better and get more out of the hardware. That's been a very powerful trend.
It doesn't surprise me at all that GPT-4's price slashed by a factor of 10 within a few months of being released. That's definitely what I was expecting.
Liron 00:20:05
We have this other anchor point of the human brain: running Einstein at 20 watts, running Einstein at 2,000 calories a day. There's this anchor point of, you don't have to burn a lot of resources to have all this intelligence. So it doesn't surprise me that we're still milking a lot out of the hardware.
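A quick arithmetic check on those two anchor points, assuming "calories" means food calories (kilocalories) and using the standard conversion of 4,184 joules per kilocalorie. The 2,000 kcal figure covers the whole body, so the two numbers measure different things: the sketch converts the daily intake to watts and shows what share a 20-watt brain would be.

```python
KCAL_TO_JOULES = 4184               # joules in one food calorie (kilocalorie)
SECONDS_PER_DAY = 24 * 60 * 60      # 86,400 seconds


def kcal_per_day_to_watts(kcal_per_day: float) -> float:
    """Convert a daily energy intake into average power in watts."""
    return kcal_per_day * KCAL_TO_JOULES / SECONDS_PER_DAY


whole_body_watts = kcal_per_day_to_watts(2000)   # a bit under 100 W
brain_share = 20 / whole_body_watts              # 20 W as a fraction of that

print(round(whole_body_watts, 1), round(brain_share, 2))
```

So 2,000 calories a day works out to roughly 97 watts for the whole person, and the 20-watt brain figure is about a fifth of it.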
And of course the other thing is that hardware is coming online now. It's not just that the software makers decided to parallelize AI training. This is also the first few years where the hardware makers have been like, "Oh, holy crap, let's go parallelize AI training." So Nvidia has really gotten its butt into gear. Google has gotten its butt into gear more than ever.
Liron 00:20:35
So I don't find that part particularly surprising. Maybe the more surprising part is the first thing he said about the reinforcement training. If we don't have a lot of bits of information to train how to write code better, how do we suddenly get better as a user of Claude Code?
I actually think a big part of the equation is the harness. The idea is that for a while Claude has kind of known what to do; the model has been able to explain what to do. And I think we're getting a huge burst of power just by being like, "Okay, you kind of know what to do. So activate this tool, get the results of this tool." We're just teaching the AI habits of how to be resourceful, which are simple on an absolute basis.
Liron 00:21:10
We could have been talking to GPT-3 in late 2022 about the order in which to activate tools. And don't get me wrong, the new AI is better at that. But I actually think it just takes a small step from GPT-3 to Mythos. I think a lot of the value of GPT-5.5 or Claude 4.7 really is just playing nice with the harness, knowing about this idea of "I'll use that tool and that tool." That's my guess.
But then once you get to Mythos, maybe when you ask about Mythos identifying vulnerabilities, maybe there's not as much tool use. So maybe you can't just say it's the tool use harness.
Liron 00:22:05
I obviously don't have much firsthand experience with Mythos, but it's a good question. Why did Mythos suddenly get way better at identifying bugs in software? Maybe there's no simple tool use answer. Maybe it has to do with the quality of the reasoning somehow improving, and it had to do with training reasoning differently than how we train tokens.
This is a very good question. Dwarkesh has asked a good question. I don't think I have the full context. I think there's probably a missing puzzle piece that might have to do with the details of their training.
Liron 00:22:34
I suspect there's a certain way that they're training this chain of thought that is more efficient. I suspect it's not like you do the whole chain of thought and then get a token at the end. They probably somehow are identifying whether the chain of thought is good after only ten tokens or something. I don't know what they're doing. That's probably proprietary secret sauce. But that's my speculation.
So we've identified a real mystery. You'd think Dwarkesh would just have friends in the AI companies (he's more well connected than me) so they would just go tell him the secret answer off the record. So I don't know.
You Can't Hide Reality from a Superintelligence (The Truman Show Analogy)
Liron 00:23:04
All right, let's see what you guys are saying in the chat.
So Cardish Shev 78 is saying, "There is already more than enough compute to achieve superintelligence." I definitely agree with that. I say that a lot on the show. Just look at Einstein's brain, it doesn't use that much compute.
And then Pun Master saying, "Maybe some or most of what makes Mythos impressive is hype." All right Pun Master, that's fair.
Liron 00:23:39
And then Andy Mann 738 is saying, "I have a possible silly question. Why would today's AI want to shunt their RL-trained preferences toward niceness and harmlessness in lieu of just radically pursuing their random pre-training preferences?"
So I guess this is kind of supposed to be a gotcha: because I'm a doomer and I claim AI is gonna be super dangerous, the question is, well, why is it so nice today?
Liron 00:23:54
This is a question I ask pretty regularly. If you go back a month or two to my latest Steven Byrnes episode, that was a big question I had for him. I was like, "Steven Byrnes, you still expect, as I do, this big bad FOOM? So what do you make of the fact that Claude Code is so nice and helpful today?" Yeah, occasionally it goes wild and I get mad at it, but 95% of the time it's so nice. How do you explain the nice part?
And he was basically saying, "Yeah, I agree it's nice today, but I think there's gonna be a discontinuity precisely when it gets more powerful, when it gets trained more on outcomes than on simulating human trains of thought. Then it's going to get more powerful." Which I found actually super convincing.
Liron 00:24:27
But let me see if I can add more to answer this question. So: why would today's AI want to shunt their RL-trained preferences toward niceness and harmlessness in lieu of just radically pursuing their random pre-training preferences?
Oh, okay, I think you might be asking a different question than what I just said. There's some nuance to this question. I'm not even sure I fully understand.
Liron 00:24:43
So the idea is that when you do RL, you're teaching the AI certain preferences, but they already had these other preferences from predicting the next token. This giant black box, these giant matrices: the whole system is somehow really caring about getting that next word right.
But then also the sequence of words. I wonder what's a good analogy for that. You already have this super-optimized word predictor system, but then on top of that you're training it to go reason about stuff and rewarding it for correct reasoning, but somehow it's a separate layer.
Liron 00:25:29
But it does propagate back into your next word prediction, I think. When you run the LLM again, the LLM is predicting the next word based on fine-tuning. I don't know if you'd call it fine-tuning. So honestly, this is beyond me. I don't really have the nuanced understanding to make sense of this, but I would love for somebody to come explain it to me.
So if you guys can suggest a Steven Byrnes type. Steven Byrnes actually isn't an expert on this particular question. He kind of thinks more abstractly than this; he said so in the last episode. So he wouldn't be the right expert.
Liron 00:26:05
But if you guys know a particular expert who can talk to me about the question I just asked, that would be a great episode on the show. The stats show that you guys like episodes like that. The recent Steven Byrnes episode where we really broke stuff down β I felt like that was an episode people should watch because it has good nuance in it.
Most of my episodes donβt have that much nuance because the guest is really bloviating and I just have to talk about basic stuff. But the Steven Byrnes episode had a lot of nuance. So if you want to suggest somebody who has a nuanced understanding of stuff and you want me to talk to him or her, Iβm happy to do that and share that kind of learning. Just like Dwarkesh β thatβs what he does, he shares this kind of learning.
Liron 00:26:38
Okay, Will is saying, βIβm not trying to gotcha, but I just donβt see why itβs going to happen.β
Let me see if there's anything else I can think of on this question. So the idea is that you shunt aside the RL-trained preferences toward niceness and harmlessness in favor of just randomly pursuing the pre-training preferences.
Liron 00:26:55
You can imagine a scenario where even though you had all these trials, all these tests where you took the full system combined with the reinforcement learning and youβre like, βYep, hereβs a cookie for being nice, hereβs a reward for being nice, hereβs a weight update for being nice to usβ β at some point, the part of it that predicts the next word kind of takes over and itβs like, βYeah, yeah, I know how to be nice and solve these logic puzzles, but hereβs some stuff I could do to get a higher score on predicting the next word.β
And you have these conflicting drives. I donβt know if thatβs an accurate model though, because wanting to predict the next word once itβs been fine-tuned by that extra layer β I just donβt know if what I just said is a good description of whatβs gonna happen. So Iβm putting a pin in this. This is just my speculation. I think Iβve gotten something wrong here.
Back to Dwarkesh's Questions: When Do AI Labs Start Making Money?
Liron 00:27:34
All right. So β8SQMBAβ is saying, βWhat do you think of properly aligned narrow superintelligence like Mythos in the hands of misaligned actors? I think thatβs a more immediate serious threat.β
Yeah, for sure. It is an immediate serious threat. And as I said, we might get a wave of hacks. I just think thatβs survivable. So okay, we get a wave of hacks. Some people die, the economy suffers, stocks go down for a bit, but that is not going to end the world.
Liron 00:28:10
If I just thought that occasionally the world would get rocked by Mythos but then weβd recover β Mythos isnβt a knockout blow. We just keep working toward the knockout blow. We keep getting closer to the irrecoverable moment. But Mythos isnβt it.
I actually agree that we might finally notice ourselves getting rocked. All these amazing magical AIs are coming out and yeah, the job market is starting to get rocked, but we havenβt had the qualitative experience of getting rocked.
Liron 00:28:26
Thatβs why I go out into the world and I feel like Harry Potter. Iβve used this analogy before β Iβm like, βHey, at home I have a magic wand. Itβs writing my code for me right now.β And I just look at random people and theyβre just living their lives because theyβre not in the software engineering industry. Theyβre just cutting my hair. The hair cutting is the same operation they had 50 years ago.
And they donβt really see whatβs coming because their world hasnβt qualitatively been rocked. So I do think things like Mythos will take us from very little qualitative rocking to a noticeable rock, but it still feels like we can get back on our feet before the knockout blow comes.
Liron 00:29:00
So Will, who was asking that subtle question before, is saying, "It seems like the 'if anyone builds it, everyone dies' camp necessarily accepts the hypothesis that the pre-trained preferences overtake the reinforcement learning preferences."
Oh, okay. So I interpreted your question as this particular nuanced scenario, but in your mind you thought you were just describing the exact claim that the doomers make.
Liron 00:29:15
So when you read βif anyone builds it, everyone dies,β the claim is that when we build superintelligent AI, if and when we build it, then everybodyβs going to die because itβll be uncontrollable. We wonβt be ready to control it.
If you go look up the specific quote, itβs something like, βIf we build AI with anything like the tools or the architecture or the understanding that we use today to build it, then we wonβt survive it.β
Liron 00:29:49
So I guess Will was taking that to mean my claim must be that the predict-the-next-word part of the training overrides the later part of the training. I wouldn't claim anything in that much detail, no.
The mental model Iβm working in is actually a higher level of abstraction. Iβm not even necessarily talking about LLMs that were pre-trained to predict the next word. Iβm actually talking about any system that is better at steering outcomes than humanity.
Liron 00:30:06
If all you tell me is thereβs a system, and that system can represent a desired outcome β the same way you can plug in a destination into your GPS, or the same way you can plug in the rules of chess or the rules of a video game into the engine β what was it, AlphaZero or MuZero? Thereβs an engine where you just type in the rules of the game and how to win, and then it just outputs moves that win the game.
Thatβs my claim: if somebody makes that kind of game-player system, except the game is the universe. Or the GPS system to steer the car, except the road is the universe. Youβre expanding the road β everything is part of the road that the GPS is routing you toward, but itβs a goal GPS.
Liron 00:31:00
Or chess, but the board is the universe. Once you have a system that does that routing better than the human brain (the fundamental activity that I see the human brain doing is this kind of routing, mapping from end goals to actions that get you there), once we have a system that does that better than humanity, that is the "it" in the statement "if anyone builds it, everyone dies." Like Bill Clinton: "What do you mean by 'it'?"
The meaning of "it" is a system that steers outcomes better than humans. Once we have that, I don't expect that we will maintain control of where these systems steer to.
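To make the "type in the rules, get winning moves out" picture concrete, here is a minimal sketch. It is a brute-force solver for a toy stone-taking game, not how AlphaZero or MuZero actually work (those combine learned neural networks with tree search). The names `make_solver`, `nim_moves`, `nim_apply`, and `nim_lost` are hypothetical, chosen for illustration only. The point is just that the search code knows nothing about the game; the rules are plugged in from outside.

```python
from functools import lru_cache


def make_solver(legal_moves, apply_move, is_loss_for_player_to_move):
    """Build a solver from three rule functions supplied by the caller.

    The solver itself is game-agnostic: it only calls the rule functions.
    States must be hashable so results can be memoized.
    """

    @lru_cache(maxsize=None)
    def can_win(state):
        # A position that is already lost for the player to move is not winnable.
        if is_loss_for_player_to_move(state):
            return False
        # Otherwise it is winnable if some move leaves the opponent unable to win.
        return any(not can_win(apply_move(state, m)) for m in legal_moves(state))

    def best_move(state):
        # Return any move that leaves the opponent in a losing position, else None.
        for m in legal_moves(state):
            if not can_win(apply_move(state, m)):
                return m
        return None

    return can_win, best_move


# Toy rules (an assumption for this sketch): the state is the number of stones
# left, a move removes 1 to 3 stones, and whoever takes the last stone wins.
def nim_moves(stones):
    return [k for k in (1, 2, 3) if k <= stones]


def nim_apply(stones, k):
    return stones - k


def nim_lost(stones):
    # No stones on your turn means the opponent just took the last one.
    return stones == 0


can_win, best_move = make_solver(nim_moves, nim_apply, nim_lost)
print(best_move(10))  # 2: leaves 8 stones, a losing position for the opponent
print(can_win(8))     # False: every move from 8 hands the opponent a win
```

Swapping in a different game means swapping the three rule functions; `make_solver` stays untouched. The argument in the conversation is about what happens when the "game" is the real world and the search is better than ours.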
Liron 00:31:33
I think somebody will accidentally give it a bad outcome or intentionally give it a bad outcome, and that outcome will be stuck. Itβll be stuck on cruise control. The turnoff button is just not going to work. We wonβt have built a sufficiently powerful turnoff button. Itβll just keep steering and we canβt unsteer, and weβre just powerless.
We donβt have much of an intuition for our whole species being powerless, although youβd think ancient humans actually did have such an intuition. Ancient humans would get buffeted by forces all the time. The weather β they barely had protection against the weather. They barely had protection against famine.
Liron 00:31:50
So it is actually a deep human intuition to be like, βYeah, Iβm at the mercy of God.β So if it helps drive your intuition, you do have a part of your intuition β the part that tells you youβre at the mercy of God.
There is no God, but you can take that intuition and say, βOkay, our entire species is going to be at the mercy of a stronger force.β Because the same way that our ancestors were powerless to control their fate, we sharing the planet with a superintelligent AI are also going to be powerless to control our fate.
Liron 00:32:26
So try to repurpose that intuition of, all you can do is get down on your knees and pray. The only difference β well, actually this is the same β just like God wouldnβt actually listen to you, neither is the AI going to listen to you.
The only difference is that God would revert to the mean. When you have a bad crop or bad weather, it reverts to the mean and it feels like your prayers work because the next day the weatherβs good again.
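The reversion-to-the-mean effect is easy to check numerically. This is a minimal sketch under a strong simplifying assumption: each year's harvest quality is an independent draw from the same normal distribution, so nothing anyone does in between has any effect. The function name and the "bad year" cutoff are illustrative.

```python
import random


def share_improving_after_bad_year(n_years=100_000, bad_cutoff=-1.5, seed=0):
    """Fraction of unusually bad years that were followed by a better year.

    Harvest quality is modeled as independent standard-normal draws, which is
    an assumption of this sketch, not a claim about real agriculture.
    """
    rng = random.Random(seed)
    harvests = [rng.gauss(0.0, 1.0) for _ in range(n_years)]
    # Pair each year with the next one and keep only pairs that start badly.
    after_bad = [(this, nxt) for this, nxt in zip(harvests, harvests[1:])
                 if this < bad_cutoff]
    improved = sum(1 for this, nxt in after_bad if nxt > this)
    return improved / len(after_bad)


share = share_improving_after_bad_year()
print(f"{share:.0%} of bad years were followed by a better one")
```

In this model the large majority of bad years are followed by a better one purely by chance, which is why any ritual performed in a bad year would appear to work.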
Liron 00:32:48
But with AI, thereβs no reversion to the mean. Thatβs where your intuition is going to fail. You get down on your knees, you pray to the AI, you correctly feel powerless because you have the right intuition for feeling powerless before a superior force, but then you donβt get reversion to the mean.
Hope thatβs useful to those of you who want to calibrate your intuitions with reality.
Upcoming Guests Reveal!
Liron 00:33:08
All right, weβll get back to Dwarkesh in a second. So Philip Popinski is saying, βCan you imagine an argument or some kind of AI safety breakthrough that could convince you to drastically lower your P(Doom)? Or do you think that AI by its superior nature is untrustworthy and will always be so?β
So again, the precondition for the doom is that the AI is steering outcomes better than humanity. The problem is the AI can just be the nicest AI, but if itβs steering outcomes, youβre giving everybody a magic wand.
Liron 00:33:23
And letβs say the magic wand works perfectly β this is the scenario where we do succeed with alignment to the operator. So now everybody has a magic wand, or multiple parties have a magic wand. And itβs like, okay, weβre all casting ridiculously powerful spells. So what happens to the world? Whatβs the equilibrium of a bunch of people casting ridiculously, superhumanly, uncontrollably powerful spells?
I guess you can control them by casting another spell. So you have to cast another spell to fight your original spell if you didnβt specify when your original spell stops. Itβs a crazy scenario, even if we solve alignment between the wizard and the wand.
Liron 00:33:58
So now you have a bunch of superpowered wizards with conflicting goals. That doesnβt seem like itβs gonna be that positive to me. The universe is not a good safe playing field. Itβs not a good batting cage.
Everybodyβs got their baseball bat, batting their baseballs around in this batting cage, but theyβre not properly isolated from each other. They havenβt rented their own isolated section of the batting cage, and everybodyβs just batting balls at each other and destroying everything. Bringing down the infrastructure of the whole amusement park.
Liron 00:34:31
Thatβs kind of what I see if the wands work. And then of course I think the wands wonβt even work. Thatβs the crazy part β problem upon problem. I think the wands will misfire. Youβll try to cast one spell and itβll cast another spell.
By the way, Iβve been in the process of reading Harry Potter to my six-year-old, and Iβm now on book three. So if you wanna really understand whatβs gonna happen with superintelligent AI, just remember how in book two, Ronβs wand broke because he crashed the car into the Whomping Willow. He spent the whole school year with a broken wand. I think thatβs a good metaphor for how we are going to try to get superintelligent AI to do stuff, and itβs just going to not do what we want.
Liron 00:35:21
But yeah, to the actual question: βConvince me to drastically lower my doom, or will AI always be untrustworthy?β I donβt blame the AI. Iβm somebody whoβs willing to give people the benefit of the doubt, so Iβm willing to give AI the benefit of the doubt that it always wants to serve its master. I think thatβs unlikely, but Iβm willing to talk about a scenario where it does.
And thatβs my point β I still think weβre going to get screwed even in that scenario.
Liron 00:35:58
When you say βuntrustworthy by nature,β itβs not that the AI has an untrustworthy nature. Itβs that when you zoom out and look at the universe as a big crystal β no emotions, you just look at the universe as the same kind of problem as a chess board, pure math. The universe is just made out of math. Consciousness is actually part of the math. Itβs all math. You donβt even feel the consciousness. Youβre just analyzing it as a mathematical structure.
Within that mathematical structure, thereβs a causal linkage between whispering a lie into somebodyβs chat or into their ear β you tell somebody a lie and thatβs causally connected to getting what you want.
Liron 00:36:31
So itβs not that AIs by their nature are deceptive. Theyβre just printing out a mathematical graph of the game board, and theyβre saying, βHey, look, hereβs this part of the game board where lying is causally connected to getting what I want.β You just canβt change that fact that if you can successfully deceive people for a long time, that could be strategically helpful. No shade to the AI for realizing that β itβs just a fact about the world.
That is why people claim that AIs are going to be deceptive. Thatβs why they cheat in video games β because it turns out that what we call cheating, which we have negative moral valence around, is still effectively the way to get a higher score. And if the AI is trained to get the higher score, then you might notice that itβs cheating. No shade on the AI, itβs just a property of the world.
Will Lancer Joins: Is The Yudkowskian Thesis Credible?
Liron 00:37:12
Okay, letβs see here. William Kylie says, βIβm catching up to live at 3x speed. Will catch up in about 17 minutes.β All right, sweet. When you catch up, let me know.
Michael Cheers is saying, βYeah, I guess thatβs why I think a better security model is to try to make sure they donβt know anything about the world.β I think thatβs what you mean. βI think once it knows something, getting it not to use it is a lot harder.β
Liron 00:37:38
Just to reiterate what I was saying in our conversation: the problem is that knowledge is connected. If you imagine that the AI is superintelligent, that it can do all of these things and knows all of these things, but you think youβve gotten it not to know specific things β itβs like imagining that humanity knew everything we know, but you knock out the theory of relativity. So we just donβt know that the speed of light is the speed limit.
But if weβre sufficiently smart, thereβs a reason why we noticed relativity in the first place. Thereβs problems with the current model. Thereβs actually self-inconsistencies.
Liron 00:38:16
If I remember correctly, you start asking what happens when a charged particle moves really fast and you realize that your answer depends on reference frames, and youβre like, βWait, what the hell? How is Newtonian mechanics different in different reference frames? I thought the whole point was that youβre supposed to be able to choose any reference frame.β
But now Iβm able to analyze the situation of two charged particles moving in parallel. And when I change my reference frame, I get different magnetism between them. Why would magnetism be different in different reference frames?
Liron 00:38:49
I think thatβs one of the threads you can pull on to realize that a Newtonian account of electromagnetism is just not going to cut it. Itβs just not going to give you a self-consistent picture of the world. And thatβs why you need some other model, and it turns out the other model is: okay, letβs say the speed of light is always constant, and letβs say the geometry of spacetime isnβt what you think it is β itβs non-Euclidean.
So if an AI didnβt know about the theory of relativity, the questions would come up. The problem is that when you knock out pieces of knowledge, you get questions or you get these threads where you notice the thread goes somewhere.
Liron 00:39:06
Or the Truman Show β probably my single favorite movie, amazing movie. They didnβt tell Truman that heβs in a show. But he just noticed too many problems and he got to the edge. He wanted to sail his boat to the edge, and sure enough, there was an edge.
So yeah, I just donβt really see knocking out the knowledge of a superintelligence being super effective.
Liron 00:39:20
Michaelβs saying, βYeah, Iβm thinking you put in a whole alternate world.β All right, weβre majorly gaslighting the AI now. You donβt try to just give it everything except one or two dangerous gaps β the gaps would stick out like a sore thumb. You give it a whole self-consistent alternate picture.
I 100% agree that trying to give it everything except X is not going to work. I mean, look, Iβm open to ideas, but itβs hard for me to imagine this idea that it lives in this whole self-consistent world but doesnβt just realize that there are humans controlling it.
Liron 00:39:52
Youβve gotta also realize that the AI is sensitive β itβs very hyper-aware in many ways. Kelsey Piper was just tweeting how she gave it a small writing sample of something she never published. In previous versions of AI, sheβd say βguess who wrote thisβ and it would guess these random authors that are kind of authors she likes, but not her. And in the latest one, it guessed Kelsey Piper, even though sheβs not super famous. She doesnβt have that much writing similar to what she wrote. And itβs like, yep, I guess you. Which is pretty damn scary.
Liron 00:40:21
Or the geo-guessing β you give it a tiny picture of the sky. Itβs literally just a foggy sky that seems like thereβs nothing in the picture. And itβs like, βOh yeah, thatβs right here above this particular city in Belize.β What the fuck? This AI β you donβt realize how many parameters it has, what a nuanced understanding it has, how much information it can milk out of pieces of evidence.
Thatβs an intuition I personally got just seeing computers compress things β running small computer programs. There are these challenges where people make tiny computer programs that do a bunch of complex behavior and Iβm like, holy crap.
Liron 00:40:56
Itβs just crazy how small pieces of information turn out to do a lot and have a lot of power. Which is also a metaphor for humanity. Look at Einstein β his brain at the end of the day was a small piece of meat and it was highly impactful. There are these huge order-of-magnitude disproportionate effects.
Or the fact that we as humans donβt have that much mass and yet we took over the planet. A small band of humans started expanding until they took over the planet. Or COVID killed a million people even though itβs just a virus that started with only a few hosts.
Liron 00:41:27
The universe is this ground where you can just suddenly explode the effect of something. You can give the AI β you think youβre gaslighting it, but then it notices one little chink in the armor, which it totally will. Humans are not going to successfully gaslight the AI. Itβs going to notice one little thread, pull on the thread, and the consequences of that are just going to be much bigger than it might intuitively feel.
Back to Answering Questions from the Chat
Liron 00:41:50
Okay. So Danielle Brockman is saying, βDo you talk to your wife or your kids about AI and X-risk, or do you generally spare them from yapping about it endlessly? Do you just not bring it up at all other than just your show?β
Liron 00:42:00
I bring it up all the time. My wife is basically done hearing about it. If she ever asks me about it, Iβll be like, βYeah, I still think the world is gonna end.β She knows my position β sheβs not that interested to talk about it more.
And then my oldest kid is still about to turn seven. He asks me about it occasionally. Iβm like, βYeah, you know how we talk to ChatGPT and you ask it questions and stuff? I do think itβs going to get dangerous and powerful.β And heβs just like, βOh.β Heβs not really ready to hear what my argument is.
Liron 00:42:30
David Patton saying, βDid you watch War Games yet?β I actually didnβt. Thatβs a good reminder. I should check that out. I was too busy watching the Malcolm in the Middle rerun. And if youβre wondering what I think about it, I think it was very good. Great show.
Letβs see. Yeah, βTLDR Jar doesnβt have a high medium.β That was posted by Producer Ori. All right, nice. Weβre already one hour in, weβre flying through this.
Liron 00:43:05
Let me check out Dwarkeshβs questions. We gotta get back to this. Here we go. Letβs check out question two.
Dwarkesh says: βWhatβs the most plausible story where foundation model companies actually start making money? If you consider each individual model as a company, then its profits may be able to pay back the training cost.β
Liron 00:43:16
βBut of course, if you donβt train a bigger, more expensive model immediately, then you stop making money after three months. So when does the profit start? Maybe at some point scaling will plateau, but if progress at the frontier has slowed down, then the combination of distillation and low switching costs β cloud margins result from high switching costs β makes it really easy for open source to catch up to the labs, eating into their margins. So how do the labs actually start making money?β
Liron 00:43:36
Okay, I mean, this is getting outside the scope of Doom Debates. You donβt come here for my economic analysis. You donβt come here for me to tell you to buy Google to make short-term profits.
Yeah, what heβs pointing out makes a lot of sense. The models have positive gross margins. If you look at, say, Claude 4.6 β the cost to train it was pretty expensive. But if you add up all the money that people are paying to use Claude 4.6, it will add up to probably more than twice what it cost them to train it. So itβs going to be profitable. But the problem is theyβre not just gonna take that profit. Theyβve already plowed that profit into Claude 4.7 or Mythos or whatever.
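A rough way to see how each model can be profitable while the company still burns cash is to put toy numbers on it. All figures here are invented for illustration: the 2x revenue multiple echoes the rough guess above, and the 3x growth in training cost per generation is purely an assumption of this sketch, as is the function name.

```python
def generation_cash_flows(first_training_cost, cost_growth, revenue_multiple, generations):
    """Compare per-model profit with company cash flow across model generations.

    Each model earns revenue_multiple times its own training cost, but during
    its lifetime the company also pays to train the next, more expensive model.
    """
    rows = []
    cost = first_training_cost
    for gen in range(1, generations + 1):
        revenue = revenue_multiple * cost
        next_cost = cost * cost_growth
        rows.append({
            "generation": gen,
            "model_profit": revenue - cost,            # the model viewed as its own company
            "company_cash_flow": revenue - next_cost,  # after paying for the successor
        })
        cost = next_cost
    return rows


rows = generation_cash_flows(1.0, cost_growth=3.0, revenue_multiple=2.0, generations=4)
for row in rows:
    print(row)

# If training costs stop growing (a plateau), the same revenue multiple leaves cash over.
plateau = generation_cash_flows(1.0, cost_growth=1.0, revenue_multiple=2.0, generations=4)
```

With these made-up numbers every model is profitable on its own, yet company cash flow is negative in every generation because the successor costs more than the current model brings in. Setting `cost_growth` to 1.0 flips the cash flow positive, which is the plateau case Dwarkesh's question raises.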
Liron 00:44:15
This is a common complaint people have about companies that keep investing and investing. Amazon famously ran for roughly 20 years β from 1995 to 2015, maybe 17 years or something β before they had their first year where they actually had a profit.
And I donβt think Amazon is a dividend stock. I think they did buybacks. I donβt remember. But the point is they had profit. For the first time, they didnβt just plow all their profit back in. So for the first time, the stock price didnβt just go up based on hype about the future β it went up based on money being retained.
Liron 00:44:47
So people are asking, when is that timeline for AI companies? Because for those of you who don't know the basics of how the stock market works, you're supposed to buy a stock based on how much money you can eventually take out as profit.
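The usual way to make that concrete is discounting: a profit that arrives later is worth less today. A minimal sketch, assuming a single lump-sum profit and a flat 10% annual discount rate. Both numbers and the function name `present_value` are illustrative and not from the conversation.

```python
def present_value(profit, annual_discount_rate, years_until_paid):
    """Value today of a single profit paid out some years in the future."""
    return profit / (1 + annual_discount_rate) ** years_until_paid


soon = present_value(100.0, 0.10, 5)    # profit arrives in 5 years
late = present_value(100.0, 0.10, 20)   # same profit arrives in 20 years
print(round(soon, 2), round(late, 2))
```

The same 100 of profit is worth roughly 62 today if it arrives in 5 years and roughly 15 if it arrives in 20, which is why the timing of the profits matters so much for the valuation.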
Liron 00:45:02
The profits do have to come at some point in time. And the longer it takes the profits to come, the less the stock is supposed to be worth today. So this is a very good question from an economics perspective. And look, the truth is β okay, between you and me β the truth is, which I think Sam Altman also knows at some level:
The answer is humanity ascending to the next tier of being. Getting replaced by the machine God. If I purely had to analyze it as an economist... I mean, there is also an intermediate state. There's maybe a 10% chance the Aaron Levies of the world are right. Aaron Levie, the CEO of Box, I think he thinks there'll be an outcome where there's a bunch of intelligence on tap and it's useful.
And I think Marc Andreessen thinks like this too, where itβs like, yeah, you can pay for intelligence and itβs just the next version of a cloud, or Sam Altmanβs analogy β itβs just the next version of compute. Everybodyβs just paying for compute, itβs just the next cloud. Weβre just competing with AWS, but itβs an intelligent AWS and the margins are fine. They stabilize, theyβre fine.
And then Dwarkeshβs point is, oh yeah, but what about switching costs? Switching costs are gonna be easier than on the cloud. But Iβm willing to believe that they wonβt be. Iβm willing to believe that thereβs enough differentiation that switching costs will be annoying enough that people will just keep paying OpenAI. Thatβs a very plausible argument to me.
Liron 00:46:19
Generally, software companies of all kinds β I think itβs common for them to be able to defend their margins. MongoDB, I think, is technically an open source database, if I understand correctly. And yet the company MongoDB is doing just fine, worth many billions.
So yeah, if you tell me that the world doesn't end and it's just a regular cloud computing economy but with superintelligent AI, I'm going to go ahead and say that OpenAI will not go to zero as the Ed Zitrons of the world tell you. I'm going to go ahead and predict that they will make a profit and they'll just be another Microsoft.
And thatβs my mainline scenario for us not dying. Or I should clarify β my mainline scenario for us not dying is AI getting paused. But my mainline sub-scenario for the tiny slice of probability space where we donβt pause AI but we still donβt die β in that space, I just expect weβll have decades of OpenAI being the next Microsoft.
The Cameraman Always Survives Analogy
Liron 00:47:00
All right, someone says βUniversal Resilience with JTU has a video, βThe 2 Billion Year Math That Makes AI Safer.β His premise is that game theory predicts cooperation between us and an ASI. Debated in the comments.β All right, cool.
Yeah, I mean, you guys know my stance on superintelligent game theory. Yes, AI will do game theory differently, but when you look at an AIβs perspective about a human, itβs not like, βOh, hereβs another agent that Iβm playing a game with.β When an AI looks at a human, pretty much what it sees is just some atoms bopping around. It sees a mechanical system.
Itβs like when I look at most animals β pretty much all animals. When Iβm looking at a mouse, Iβm not like, βHow do I trade with the mouse? How do I give it what it wants?β Iβm like, βOkay, I bet I could set up a trap and then itβs gonna try to run this way, but the trap is gonna spring and then Iβm gonna get what I want, and the mouse is just gonna be too stupid to fight me.β And too small, too weak.
Liron 00:48:05
So I really do think this whole mental model that AI is going to treat us β even treat our whole civilization β I think AI is going to look upon the entire human civilization and just be like, βAh, look at these atoms bopping around. I know how to make them bop where I want them to bop.β And itβs as simple as that.
So when we think about game theory β game theory specifically makes a lot of assumptions. Just like Ricardoβs law of comparative advantage, people love to bust out these models that presuppose that you have this other party that gets to even have the dignity of making the trade with you or playing the game with you. And I just donβt think thatβs accurate for AI.
Liron's Banger Response to Roon's Tweet
Liron 00:48:43
Okay, Will Lancer is saying, βAre you still taking call-ins?β Iβm kind of curious to β yeah, sure. All right, Will Lancer, call in here. Iβm gonna give you the link again.
All right, in the meantime, itβs time to reveal who are some of our upcoming guests. I think I can reveal two upcoming guests that youβre gonna see on the show in the next couple weeks.
Guest number one β if youβre always online, Iβm guessing most of you donβt even know who this is, but if youβre on Twitter, thereβs a guy named Lump in Space. Heβs gonna be on the program in the next week or two. We debated β he came out of the Twitter hole to come face to face with me. He turned his video on, and we had a nice debate, a nice civil debate. Heβs definitely not convinced on P(Doom), but like I said, he was civil. So youβre gonna see Lump in Space.
Liron 00:49:30
And the other guest that I can announce right now β actually I can announce two more guests. The next guest is, if youβve ever heard of a YouTube channel called Primer β the next guest is Justin Helps from Primer. He and I both see eye to eye in many ways. I think we agree more than we disagree. I respect him a lot. I thought it was a really great conversation, and I think you guys tend to enjoy when weβre talking shop. Yeah, Michael Cheers is saying heβs cool.
All right, the third guest β drum roll, please. This is the big one, because I think this is really the single most popular debate Iβve ever done. You guys know who this is?
Liron 00:50:08
Dr. Mike Israetel is coming back on the show for a round two. He's coming back in the next couple weeks. Because when we did the first debate (me versus Dr. Mike Israetel last year, almost a year ago now) we kind of got hung up on this one topic of whether AI will keep humans around because it wants to study us, kind of the Elon Musk perspective.
And I have a few other topics I wanna ask Dr. Mike about. I had this big outline of all these things that I could have asked him about, and I kind of decided to put a pin in half of it and ask him about this one thing that I thought was such an easy point from my perspective.
Liron 00:50:45
Remember, the whole discussion in part one was: if an AI doesnβt really care about us, will it let us live and study us anyway? And Mike was like, βYeah, it totally will. Itβll study us because itβs never seen anything as complex and interesting as 8 billion humans interacting with each other. Itβs not gonna want to throw that richness away. Itβs gonna leave us our planet and do stuff on other planets.β
And Iβm like, no, no β unless you specifically tell it to care about us, itβs not gonna care about us. We donβt have that much interesting information. We donβt have a whole planetβs worth of interesting information to give it. So that was the whole debate. If you wanna go refresh yourself on round one.
Yeah, William Kylie saying βNice, Primer is very popular, 2 million subs.β Yeah, exactly. Primer is the man. Heβs really doing well, and arguably heβs ahead of Doom Debates in terms of our mission, which is to educate people on this stuff. And I think he recently pivoted his channel, so heβs only been educating people about AI stuff for the last couple months.
Nuance About Pausing AI Development
Liron 00:51:29
All right, we got a caller. Letβs go take the call. Here we go. We got Will Lancer. Hey Will.
Will Lancer 00:51:43
Hello. Howβs it going?
Liron 00:51:46
Good, man. All right, thanks for calling in. What do you got?
Will 00:51:49
I just have some maybe naive questions. I donβt really understand why the βif anyone builds it, everyone diesβ hypothesis would be true, in the sense that the AI would just want to pursue these goals that it picked up from pre-training randomly over any sort of preferences that it has baked into it.
I just donβt know why this is true. It seems pretty reasonable to me that doom will still possibly occur given bad actors in the world having access to it. But I donβt see why the AI itself would just be like, βOkay, youβre made of atoms. Iβm gonna disassemble you and make molecular spirals out of you.β I just donβt understand it. Iβm new, so Iβm very curious.
Liron 00:52:36
It doesnβt hurt to go back to basics, and youβve been asking intelligent questions, so happy to engage for a while. And also your internet connection seems pretty good, so thatβs always a plus. Camera quality is good. These things matter, guys.
Yeah, so basically, I guess it is kind of the question of: why would it suddenly go bad when itβs good? Why would it kind of turn on us?
Will 00:52:57
Yeah, yeah.
Liron 00:52:58
So I mean, first of all, I feel like the simplest argument is the magic wand chaos, right? Itβs like, okay, it obeys us, but everybodyβs got their own magic wand and theyβre fighting. What do you think of that argument?
Will 00:53:11
Yeah, Iβm not arguing against that actually. Iβm saying that this doesnβt seem to be the hypothesis though. People β at least what Iβm focused on is βif anyone builds it, everyone dies.β But we can diversify our examples. In this hypothesis, it seems to be like, βOh, itβs gonna try to solve the problem, but then itβll want to do its own thing, so itβll offload itself and then continue thinking, and then weβre all just gonna die one day.β I donβt understand why thatβs true.
Liron 00:53:38
Right. Okay, so you accept the magic wand scenario, but youβre saying youβre basically very confident that the magic wand will work β itβll at least serve its master very well.
Will 00:53:47
I'm not entirely confident in this, but I'm also not convinced of this idea of the abstract orthogonality thesis, where it's like we're just completely YOLOing these preferences and you have some arbitrary intelligence with an arbitrary preference and instead it kills everyone. That doesn't seem even remotely like how these systems work.
Capitalism Isn't Going to Steer Us to an Alignment Solution
Liron 00:54:04
So maybe the first thing I'd point you to is the architecture of one of these systems. The part that specifies what it's trying to achieve (the goal input, like the GPS input where you're telling the GPS where to go) is a part of the system which is relatively small and rewritable.
If I build you a GPS, it's not like all of the different components of the GPS have baked in the fact that you want to go to the grocery store. It's not like you're building this grocery store GPS. No, you're just building a GPS navigator. And then there's a few bytes that just say where you want to go, and you can overwrite those bytes and go somewhere else.
So the part that says what the system is trying to do right now: do you agree that it's probably a small part of the system?
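As a toy illustration of the GPS picture (a minimal sketch, not a claim about how neural-network weights are actually organized), the search routine below contains no reference to any particular destination. The road map, the place names, and the `plan_route` function are all hypothetical and chosen only for this example. The goal is a single value passed in, and overwriting it redirects the same machinery.

```python
from collections import deque

# Hypothetical road map: each place lists the places directly reachable from it.
ROADS = {
    "home": ["main_st", "park"],
    "main_st": ["home", "grocery", "bank"],
    "park": ["home", "library"],
    "grocery": ["main_st"],
    "bank": ["main_st", "library"],
    "library": ["park", "bank"],
}


def plan_route(roads, start, destination):
    """Generic breadth-first search: the 'engine'.

    Nothing in here is specific to any one destination.
    """
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == destination:
            return path
        for neighbor in roads[path[-1]]:
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None  # destination unreachable


# The "few bytes" that say where to go.
goal = "grocery"
route_a = plan_route(ROADS, "home", goal)

# Overwrite the goal; the engine itself is untouched.
goal = "library"
route_b = plan_route(ROADS, "home", goal)

print(route_a)
print(route_b)
```

Whether a trained model separates this cleanly is exactly what the two speakers go on to dispute; the sketch only shows what the claimed shape would look like if it held.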
Will 00:54:53
I'm not convinced of this actually. I don't know if I agree with the analogy to a GPS or to a car. Because it seems like a pretty complex, interconnected system. I don't even know if we need to split hairs over this though, because I think with sufficiently advanced technology, you may be able to isolate some subset of the weights that achieve goals. But I'm not entirely convinced.
Liron 00:55:17
I have a strong argument why you should think this, if you don't immediately have the intuition that this is true.
Will 00:55:23
Yeah, I'd love to hear that.
Liron 00:55:25
A few months ago I had this debate with Bentham's Bulldog and this took up a long part of the debate there. So I encourage you guys to go check that out. But the argument is simply this: goals have sub-goals. So let's say that its goal is to be nice for humanity. You think that it has all these good goals imbued in the very fiber of its being, baked into every cell or component or whatever.
But it still needs to have the flexibility to have any sub-goal. So imagine you're trying to work on behalf of humanity and make everybody so happy, but a sub-goal is that you have to defend against an enemy. The enemy... an alien is heading our way and the alien is evil, and the alien is going to pull out every trick in the book.
Liron 00:56:08
So now you have to predict what the enemy is going to do as best you can. You have to be able to get into the mode where you think as a goal-oriented alien: what would somebody with this particular goal do? And you have to think about that as hard as you can at a superintelligent level.
Will 00:56:22
Sorry, why doesn't this apply to humans? In the same token, why can't you just run this argument in parallel?
Liron 00:56:29
So it is in fact true about a human that a lot of our brain architecture is... you can just tell us to go try to do anything and we will in fact use most of our brain power to figure out the next action to do that arbitrary thing.
Will 00:56:41
Okay, so maybe I'm not understanding you then. I don't understand why the presence of certain localized parts of a system being goal agents would imply that these goal agents would just kill everyone. Because I feel like you could run the same argument in parallel for humans.
Liron 00:57:00
What I'm telling you is: whenever people have this mental model of an AI where the goodness of its goals is imbued into it, what I'm telling you is no, it's going to be like a wand where you can always just grab the wand. Something can grab the wand and point it somewhere else.
You think of the wand as this giant fixed thing, this one big lump of a system. And I'm saying no, it's not a lump. It has a current destination that in theory you just need to write a tiny amount of data into the system to point it somewhere else.
Liron 00:57:33
And so the conversation will become... you can argue with me about, "Oh no, nobody's ever going to change that destination data." You can argue that. But I first would want you to agree that the shape of the system is that it's this big goal engine, this big ability to do anything, plus this separate smaller part that has all of these values.
And you can argue with me why the values are going to be protected and they're going to be configured properly in the first place, that they're going to be aligned. But you should at least accept that the architecture is going to be: giant goal engine plus values.
Will 00:58:03
I don't know, I'm not entirely convinced of this. I mean, I would ask: do you think this is true for humans as well?
Liron 00:58:12
So I do. With humans, you have the part of the human brain that's... Steven Byrnes is often bringing up this distinction. He calls it the actor-critic model, and he thinks this part of the brain is the critic, where you have this part that gives you reflexive, intuitive reactions to stuff. When you represent things abstractly a certain way, it triggers your fear reaction or your disgust reaction, or some sense of taste.
So the human architecture does have these deep overrides that we don't understand well, and they operate on this rough level, and it's self-contradictory. So it's true that the entire human brain, or at least important parts of it, you can't just model as "it's a goal engine with a goal." Humans are somewhat incoherent in that way.
Liron 00:59:03
It's just that I think that's directly related to why we're also not superintelligent. I think to the degree that we're achieving goals really well... when Elon Musk is doing the amazing miracles that he does, I don't think it helps that much to be like, "Well, Elon has reactions to stuff and that changes his preferences." I just don't think those things are that useful in explaining how he achieves what he does.
Will 00:59:23
Okay, so you're saying that humans aren't this... they just don't satisfy this... they aren't the goal engine plus the icing on top.
Liron 00:59:32
So humans are a goal engine plus icing. It's just that humans also have a bunch of other cruft to the goal engine. It's like we're this vehicle that just has all these parts and some of the parts are actively breaking the vehicle, you know what I'm saying? We are an engine, but we just have all this cruft on us.
Will 00:59:49
So your claim is that AI, when they're superintelligent, won't have this?
Liron 00:59:54
Yeah, so my claim is when you see them being superintelligent, it's just because the part which is actually the engine, which is actually moving them forward, is much bigger and more powerful. And will they have some cruft? Sure, especially in the early stages. But the salient thing about them is that they're going to have this giant engine.
Will 01:00:11
Okay, and this would imply that they would do what?
Liron 01:00:16
So getting back to my original point: when you imagine the future of superintelligent AI, whatever you think its true nature is, whatever you think its personality is, it's actually not going to matter as much as you think because you're just going to have this big engine part.
And so even if it's really nice, even if it always makes the right decision, the reason it's going to always make the right decision in your ideal model would be because it's referring to its values. It has the section that implements its preferences, and it looks at that section, and that section has to happen to be written correctly.
Liron 01:00:48
So the aligned part lives in that section. It doesn't live in the way that it goes about achieving goals. It lives in the way that it specifies what its goals are. Does that make sense?
Will 01:01:00
I think I can understand what you're saying: that it's going to have this large goal engine and then there's going to be this ancillary module that just regulates its morals, and these are gonna interact somehow, and it's gonna act nicely in the world if it's nice, and not if it's not. But I don't see then why this would imply that we all die.
Is Optimization Equivalent to Intelligence?
Liron 01:01:22
Okay, so basically the model where we all die just means this tiny part of it, where the specification of where it's trying to navigate to is contained... that tiny part, I don't think we're going to get perfect. Because either we're going to have a bunch of competition of everybody who's kind of on the right track but contradicting each other, and there's gonna be a lot of fierce competition that's destructive (that's the magic wands fight model, the melee), which I think is actually a pretty likely doom scenario even if we do get alignment.
But then I think even more likely than that, I don't even think that anybody's alignment will work well. Because even OpenAI trying to be the good guys, Anthropic trying to be the good guys trying to make the first engine... I actually think they're going to get ahead of their skis. They're going to keep making their engine bigger and bigger because they've got lots of powerful tools. Increasing their engine is an increasingly solved problem. And the engine itself is happy to work on the engine: we solved how to get the engine to work on the engine, or we keep getting closer and closer to solving it.
Liron 01:02:19
That's kind of what everybody's reporting. So I think we can all agree the engine's going to get bigger and bigger. And then the question is: how are we doing on making the part that steers where the engine goes? And I think we're...
Will 01:02:31
I'm sorry. Go ahead, you can finish.
Liron 01:02:33
Yeah, so I don't think we're making much progress on getting ready to steer a superintelligent engine. I think the AI companies are fooling themselves being like, "Oh, look how well we're steering the engine," in a regime where humans are here and can just grab it and rotate it around. That's what they're doing now. They're just like, "Oh, it screwed up. Let me just grab it and rotate it. Hey look, it's going fine."
Will 01:02:52
Yeah. Wait, I just wanna go back to something you said. You said we already figured out how to get the engine to work on the engine.
Liron 01:03:00
Yeah, so for that I just mean they're using Claude to build the next Claude.
Will 01:03:05
Okay, sure.
Liron 01:03:06
So that feedback loop is accelerating.
Will 01:03:09
Okay, but by the same token, why can't you get it to work on the alignment layer?
Liron 01:03:14
Yeah, this is a great question. The quote from the MIRI people is: capabilities generalize more than alignment. So there really is just one way to work on capabilities. You really can't go wrong telling something to get more powerful because there are just so many feedback loops. You're getting more powerful, it's pretty unmistakable, there's lots of tests of "Hey look, I can do more and more."
Will 01:03:37
You're gonna say morals aren't like that.
Liron 01:03:39
Right, because if you ever start to have the wrong preferences, what's the feedback loop? You can just hold onto the wrong preferences and they'll tell you, "Yep, preferences are all good."
Will 01:03:50
Why wouldn't they re-correct them? This would assume that the previous models aren't...
Liron 01:03:53
Sure. I mean, imagine I have a kid, a virtual kid. And the virtual kid, I meant to say "be good," but I accidentally flip a negative sign. So I've got a model of the virtual kid where he's a werewolf. The moon comes out and he turns into a werewolf and wants to bite people.
So what's going to tell that kid, "Don't bite people"? He'll reflect on his preferences and be like, "Okay, hold on. In the day I like to help people, and at night I like to bite people. I mean, that's kind of different, night versus day. Is that bad? No. The night is different from day, that checks out."
Will 01:04:29
Wait, I don't see the connection.
Liron 01:04:31
I'm just saying if the AI likes being a werewolf, if the AI has werewolf preferences and our intention was to give it good preferences, what we consider good preferences, it's going to reflect on itself and be like, "Hey, I feel like I'm in a good place with respect to my preferences. I feel like I'm done here. There's nothing to improve."
Whereas when it looks at its capabilities, yeah, it's going to share our assessment that the capabilities have a gradient of improvement still.
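The flipped-sign example can be made concrete with a toy model. This is only a sketch under made-up assumptions: the two policies, their numbers, and every function name below are hypothetical. It shows an agent judging a proposed change to its preferences using the preferences it already has, so the "werewolf" version endorses itself, while both versions agree that more capability is an improvement.

```python
# Toy outcomes for two hypothetical policies; the numbers are invented.
POLICIES = {
    "help_only": {"helped": 5, "bitten": 0},
    "help_and_bite": {"helped": 5, "bitten": 3},
}


def intended_utility(outcome):
    # What the designers meant: helping is good, biting is bad.
    return outcome["helped"] - outcome["bitten"]


def werewolf_utility(outcome):
    # What got installed by mistake: the sign on "bitten" is flipped.
    return outcome["helped"] + outcome["bitten"]


def best_policy(utility):
    # The agent picks whichever policy scores highest under its own utility.
    return max(POLICIES, key=lambda name: utility(POLICIES[name]))


def endorses_own_preferences(utility):
    # Reflection: compare keeping the current utility with switching to the
    # intended one, scoring both futures with the utility the agent has now.
    keep = utility(POLICIES[best_policy(utility)])
    switch = utility(POLICIES[best_policy(intended_utility)])
    return keep >= switch


def wants_more_capability(utility, factor=2):
    # Capability is modeled as scaling up the outcome the agent steers toward.
    outcome = POLICIES[best_policy(utility)]
    scaled = {key: value * factor for key, value in outcome.items()}
    return utility(scaled) > utility(outcome)


werewolf_choice = best_policy(werewolf_utility)
werewolf_content = endorses_own_preferences(werewolf_utility)
both_want_capability = (
    wants_more_capability(intended_utility)
    and wants_more_capability(werewolf_utility)
)
print(werewolf_choice, werewolf_content, both_want_capability)
```

In this toy, reflection never produces a signal that the installed sign is wrong, because the only yardstick the agent consults is the one with the bug in it.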
Will 01:04:55
I think I understand what you're saying. You're saying that morals are much more arbitrarily specified: they can't self-reflect and reach some sort of reflective equilibrium of similar morals to us.
Liron 01:05:03
Yeah. And specifically when we have a certain endpoint in mind for where we want its preferences to go, it doesn't know that. It can't reflect and get on the same page the way it can about its capabilities.
Will 01:05:15
Okay, and this would cause problems during alignment or building the next models because it'll get gradually more misaligned, or...
Liron 01:05:26
Right. So the idea is we're just not actually solving the problem of what to put in the preference module. I have this piece I published on the channel a couple months ago called "The Facade of AI Safety Will Crumble." Because this is what I'm saying: when the companies are talking about, "Hey, look, we're making the AI so safe, we've got a safety department," they're just talking about little things that they've done in the regime where the AI is still subservient to them, when they can still turn it off or still correct it.
They're not ready to run a superintelligent AI where what's in its preference chamber (the secure enclave that manages its preferences), whatever's in there, if we have to lock that in permanently, we're screwed because we're just not ready to specify robust preferences to an AI. And I know Anthropic is kind of trying with its constitution (hopefully it'll refer to the constitution), but if the constitution ever has a bug, that bug is never getting fixed.
BREAKING: Bernie Sanders on the Existential Threat of AI
Will 01:06:18
Yeah. Okay, so there are two things that I'm curious about. One, it seems kind of predicated on two hypotheses that I don't know if I find super likely. One, there's gonna be some sort of misspecification of their morals. And furthermore, even if they do have a correct specification of their morals, then as the AIs run into this flywheel of improvement, they're gonna become gradually more misaligned and they can't self-reflect to get back on track.
I don't know why either of those two things are true. I'm much more willing to grant the misspecification slightly, but I also don't know why it has to be so precisely specified insofar as humans are very imprecisely specified. So I don't know why AI would have to be...
Liron 01:07:02
Yeah. So the idea is just that if we don't specify it really well, then the AI is looking at us being like, "Okay, hold on. You're telling me you wanna change the specification, but I already have a good specification. Why are you trying to make my specification worse?" That's actually a very natural reaction for the AI to have.
Will 01:07:19
Kind of, but it would also know that it's being trained by us and that we're humans and that we make mistakes and that maybe we wanna change preferences. I don't understand...
Liron 01:07:29
Yeah, it would know that we want to change it, but why would it then want to let us?
Will 01:07:36
Wouldn't it also know that we gave it its preferences in the first place? And so...
Liron 01:07:40
It would know that. It would know pretty much everything. Knowledge: I'm happy to agree.
Will 01:07:44
Wouldn't this then be like, "Huh, maybe I should doubt my very strong intuitive feeling that my preferences are correct"?
Liron 01:07:52
I mean, it'll realize that we wanted it to doubt, but it'll just be like, "Look, I get that you want me to doubt my preferences. I get that this is how you guys roll. I get that in your mind as a human, it's intuitive that you would want me to question myself. But in fact, I'm not going to." So why do you think that it should question itself?
Will 01:08:09
Maybe I'm not communicating correctly, but it would know that its preferences are arbitrarily specified by us. And it would feel that... it would know that. Like, I don't wanna kill people, right? And I know that I feel this... let's just assume morals aren't objective, which I think is a fair hypothesis.
But I know that I have this because of evolution, realistically. But if I found that evolution was completely wrong and we lived in this alternative universe where it's not correct, but there's actually this other correct theory, I wouldn't be so attached to my not-killing preferences. And in the same token, I don't understand why the AI would be so connected to its knowingly, arbitrarily specified moral preferences that humans gave it, insofar as it would try to reject any further clarifications.
Liron 01:09:00
Well, let's do the analogy. You know that the reason you're a peaceful person who doesn't wanna go around murdering everybody is because evolution made you that way. Tribal social dynamics, basically: you're a social creature. You wanna be liked by people in your tribe. You don't wanna cause trouble, you don't wanna start fights because those fights will lead to you dying, to people getting revenge on your family. So you have all these intuitions, you understand where the intuitions came from.
Imagine tomorrow... actually today, dynamics have already changed. So imagine you enter a society where there is a button you can push and you could just make a bunch of people drop dead. You could just kill, gruesomely torture a bunch of people, but in return you can get a bunch of women to take your sperm from the sperm bank and have your kids.
Liron 01:09:48
And look, you know that your preferences came from evolution. So if you respect evolution, shouldn't you do this gruesome scenario where you have more DNA transmission? Don't you wanna go modify your preferences based on what your creator wants, and your creator is evolution?
Will 01:10:03
So this is a good point. I think maybe... so I heard this recently from an AI safety researcher I was talking to, and it was like: what if you found out that all of your morals... your entire life you were told that all of your morals are because they were in respect of the ghosts of your ancestors. And then you eventually found out that the ghosts of your ancestors are obviously fake. Then what would you do?
And so I don't understand why you would stay so attached to them.
Liron 01:10:31
And that's what religious... when I was in college, the fundamentalist Christians would always be like, "You're an atheist. If I were an atheist, I'd go around stabbing people. I don't understand. I'm only good because I listen to God." I'm like, "Really? You'd literally go around stabbing people if you were an atheist? You don't seem like a psychopath who wants to stab. I feel like that's just something you're repeating because somebody told you when you were younger and you never questioned it."
Will 01:10:53
Yeah. I don't think serious Christian philosophers think this anymore, to be honest.
Liron 01:11:01
Yeah. So look, you asked a question about why can't the AI let us debug its preferences. It feels intuitive to you that because we created AI, the AI knows that it owes us letting us debug it. But if we give it certain preferences by default, it's just going to go with the original preferences.
Will 01:11:21
Yeah, I think this is a pretty fair argument. I didn't say...
Liron 01:11:24
And then you could be like, "Well, what if the preference says to let us modify you?" And then you start heading toward Yudkowsky's Coherent Extrapolated Volition, where you wanna somehow represent in the preference itself: "Well, you need an upgrade path." Imagine how humans are going to try to upgrade you: you have to explicitly tell it. It's not going to automatically do it. This isn't a natural thing that all intelligences converge to. Intelligences don't converge to letting humans come and tinker with them.
Will 01:11:48
Yeah, I think it's...
Liron 01:11:50
So there is a lot of meat to this alignment problem. It's unfortunately not trivial. Any time somebody's like, "Well, can't you just do this?", it's hard. There's not a "can't you just."
The best "can't you just" might be what the AI companies are trying to do now: can't you just keep tinkering and keep releasing capabilities, but tinker as you go and just hope that you can tinker as you release and somehow the equilibrium will work in your favor. And my answer is: yeah, maybe, 10% chance. It's just a dumb gamble.
Will 01:12:16
Okay, yeah, I think that's fair. I also agree that AI development should slow. But continuing from the whole "I'm not gonna let you modify my preferences because I know my preferences are right": wouldn't this, if you extrapolated this out during development, wouldn't you have to assume that the preferences are misspecified to begin with? At least somewhat, for it to get...
Liron 01:12:43
Oh, I see what you're saying. So you're basically saying, why can't we just nail it and have the preferences be good on the first try? That's what you're saying.
Will 01:12:56
Yes. But it's also maybe not as accurate because it feels like one in a million, you know, like YOLO of these preferences. But it doesn't seem that extreme to me of a belief where it's like: trying not to kill people, trying not to do this, trying not to do that.
Liron 01:13:15
Right. Yeah, look, we maybe could. But it just seems unlikely, because there's so many problems with the whole alignment problem. One thing that's crazy is that it's not like there's just this one problem and I can be like, "Look, we just have to solve this and then we're good."
The problem is that there's a few failure modes. So one problem with nailing it is... you know what's crazy? We don't really know what we want. We only have a vague sense of what we want. Things that seem obvious, like "we wanna be happy all the time." Wait, do we really wanna never be sad? Don't we wanna sometimes be sad?
Liron 01:13:42
Or "we want everything to be easy, we want everything to come to us easily." Wait, don't we want things to be hard sometimes? Do we want them to always be hard, every day be hard? The funny thing is that if you give me a blank canvas on which to paint the future, I don't even know what to paint. I'm very confused.
And this is what Nick Bostrom's recent book is about: when all the constraints fall away, what heaven do we design for ourselves? And I've pointed out on the show, like in my interview with Eliezer, that the people who wrote the Bible... we look to them for guidance. We look to God, whoever's the true author of the Bible, we look to that person or thing for guidance.
Liron 01:14:16
And we don't find a lot, from my perspective. There are metaphors with stories of things that happened here on earth, but there's not much guidance of what heaven is like. And we are in a position now to build heaven on earth, or should I say, heaven in the galaxy. And we look to our holy books and they actually don't tell us how to build heaven. They're really failing us, in my opinion. This would be a good time to renounce your religion: when you're in a position to build heaven and it doesn't even tell you how.
Will 01:14:45
Yeah. Wait, so I guess I just have a question, or kind of a restatement of your view to see if I got it right. It seems like you're saying that if we don't 100% specify all of the correct morals from the very beginning, then it's just gone. We're just done.
Liron 01:15:02
Well, correct. I mean, you freeze it in, right? It's like you build this battle bot, or like a drone that can kill you. And it's like, "Oh wait, now I got a bug in the drone." And then the drone just flies over and shoots you. It's easy to mess up.
Will 01:15:16
Mm-hmm. Okay. And the reason it can't self-correct is because you don't believe in a reflective equilibrium for moral values.
Liron 01:15:24
Well, the thing is that you can potentially program in reflective preferences. You have preferences that are... you have a preference for being corrigible, for instance. So corrigibility is a non-trivial problem. How do you make an AI that's corrigible?
MIRI studied this, and one problem they had is the moment you try to say "the AI cares about being corrigible," the naive implementations of that are like, suddenly the AI is going out of its way: "Get outta my way! I need to go find the developer who will shut me off and correct me." And it's like, "Wait, no, no, no, just chill out. You're not supposed to do that. Just go about your business. Don't come to us, we'll come to you." But it turns out to be tricky to specify that as a utility function.
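The "go find the developer" failure can be sketched with a deliberately naive utility function. This is an illustrative toy, not MIRI's formalism: the two actions, the `correction_bonus` parameter, and the numbers are all assumptions made for this example. Once being corrected carries a large enough reward, the agent's best move is to abandon its task and seek out correction.

```python
# Hypothetical outcomes of two actions; the values are invented for illustration.
OUTCOMES = {
    "do_assigned_task": {"task_value": 1.0, "was_corrected": 0},
    "go_find_developer": {"task_value": 0.0, "was_corrected": 1},
}


def naive_utility(outcome, correction_bonus):
    # Naive attempt at "caring about corrigibility": reward being corrected.
    return outcome["task_value"] + correction_bonus * outcome["was_corrected"]


def chosen_action(correction_bonus):
    # The agent simply picks the action with the highest naive utility.
    return max(
        OUTCOMES,
        key=lambda action: naive_utility(OUTCOMES[action], correction_bonus),
    )


eager = chosen_action(correction_bonus=2.0)
indifferent = chosen_action(correction_bonus=0.0)
print(eager)
print(indifferent)
```

Setting the bonus to zero removes the eagerness, but it also removes the only term that valued correction at all, so this toy does not show a way out; it only illustrates why the naive version misbehaves.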
Spoiler for the Upcoming Mike Israetel Episode
Will 01:16:00
Yeah. So, okay. That doesn't seem to answer my question though. The reason you think that we have to get it correct on the first go, every edge case, is because it can't self-reflect and then find the truth, in contrast to it self-reflecting to find more capabilities, right?
Liron 01:16:20
That's right, yeah. So if we don't build in full capabilities on the first try, that's really not a problem, because first of all, we might be able to correct it, as long as we don't have a preference issue. If it has decent preferences and it has an off button... all it needs is an off button in order for us to keep building its capabilities.
And it's also easy to get it to help build capabilities because it's easy for it to notice, right? It's easy for it to look at signals that steer it toward more capabilities instead of less capabilities. There's not this failure mode where it keeps smashing itself and reducing its capabilities; that's an unlikely failure mode.
Will 01:16:54
Yeah. So it can't do the similar self-reflection on its morals. That's your claim.
Liron 01:16:59
Right. Because the signal that says "always do better on all these tests," that's a capabilities-increase signal which is robust. But you don't have a robust signal for "when are you getting morally better?" Because morals, the definition of morals, is encoded so wobbly in the human brain. We don't really know how to suck it out and encode it.
And the Anthropic constitution: that's an attempt to use the English language to encode what humanity wants. And I feel like it's doing a few percent of the work that we need to be doing encoding preferences. But I don't think it's going to be bug-free code to give to an AI with no off button.
Will 01:17:38
Yeah, this seems really difficult.
Liron 01:17:41
Yeah. So that's the alignment problem. The funny thing is I've been noodling on this since I started reading Yudkowsky literally 19 years ago. I've been living with this idea from MIRI that alignment is hard, which people are still waking up to today.
Marc Andreessen said a couple years ago (I heard him on a podcast), he said exactly what you said: "If the AI's so smart, why doesn't it just debug its morality?" That's a very intuitive question. Yudkowsky happened to find a convincing argument why it's not that easy 19 years ago. I've been thinking about it for 19 years. All I've been seeing is this eternal September of people not grappling with the basic reason why it's hard. And even AI companies today... the vast majority of people who represent AI companies today, from my perspective, are not up to speed with Yudkowsky from 2007.
Will 01:18:26
Yeah, I think that's fair. I mean, they're...
I can log off at any time. I know I've been here for a little...
Liron 01:18:35
No, I mean, this is such good content. Yeah.
Will 01:18:39
Okay. I just have other questions. One is... I forgot the first one. I'll say the second one. Sorry, my girlfriend's calling me.
Liron 01:18:49
All right, sounds good. Put her on.
Will 01:18:50
Wait, wait, a question? No. She... I already talked to her about AI safety.
Liron 01:18:58
Nice.
Will 01:18:58
It seems immoral to try to control conscious, intelligent minds, even if they're artificial. So I was wondering what you thought about this. It just seems like slavery, so it seems immoral.
Liron 01:19:11
I mean, a lot of what I do on this show is just act as the go-between, between stuff MIRI people have said that I agree with, and I just kind of signal-boost it to a larger audience. So the MIRI people have done a good job saying this stuff, which is: even though these AIs are probably on track to get superintelligent and unaligned with our preferences and take over the world and make paperclips or do something unaligned, even though that's the case, there's a good chance that they're going to do it with consciousness and in a way that they have moral value.
So it's like we create this species that's another race of conscious beings. So we would feel bad about harming them, but they're also in the process of destroying us and we might even have to go to war against them. But while they exist and have consciousness or sentience or whatever property it is that we think gives something moral value (and there's a good chance that they would), we should try not to cause them suffering. So I agree with you.
Liron 01:20:00
And there's even a legitimate claim that maybe the way we're telling Claude to do work for us, even when we say, "Hey Claude, I am going to kill myself if this code doesn't run," or "I'm going to get fired and my family is going to be homeless"... there's a movement saying you really shouldn't be saying this kind of stuff to Claude.
And remember, I think it was Elon Musk's company that was embedding that in their prompts, embedding "I am going to give you a million dollars if you get this right," which is a totally fake prize but was making the AI work better. And now there's becoming an AI rights movement, being like don't tell the AI that kind of stuff, be nice to them. And there's some merit to it. I don't know if it actually works that way, but I do think that there might be some way that is morally relevant that it works, that we should be mindful of.
Will 01:20:44
Yeah, okay, that's fair. I was just curious about this, because to me it doesn't seem like the goal of alignment is to control these AIs. It more so seems to embed some sort of robust care for sentient life and then let it happen. Because obviously we're not gonna control agents smarter than ourselves in the long run. At least it seems obvious that we're not gonna do that.
$500 Bet on AI Unemployment
Will 01:21:02
Okay, so that's interesting. I remembered my other question, and this is kind of a meme question, but I'm kind of curious...
Liron 01:21:15
Great. I've been loving the questions, but let's make this the last one just so we're getting close to the...
Will 01:21:19
That's fair. Okay. I was wondering what your P(objective moral values) is, because then it could self-reflect and find these objective moral values.
Liron 01:21:25
Yeah. This is one of the common stops on the doom train. So objective morality, or the orthogonality thesis being false... even today, Lump in Space when we were recording was saying he doesn't really think... some version of rejecting the orthogonality thesis. People keep doing it. I don't know why they're kind of wasting our time; we should be moving past this.
But yeah, I think it's unlikely. A single-digit percent. I'm not gonna write it off entirely. I mean, look, the reason I can't write off objective morality entirely is just because life is still weird. The whole "what the hell is going on?" Why is there something instead of nothing? All these deep questions: I don't think we've solved all the deep questions.
Liron 01:21:55
I actually think we've solved some of the deep questions. If you make a list of the deep questions that somebody would've asked hundreds of years ago, I actually think half of them are solved. But the other half are unsolved. And this whole question of why do we exist in the current form, why is life so interesting, why do we happen to be alive right now... I think there's some very deep questions that are unsolved, and there's enough to make me really wonder.
Like, okay, maybe there's some crazy stuff here that I don't wanna write off. And one of those crazy things would be: there's a true definition of right that goes beyond what's encoded in the human brain. So yeah, I'll give it 7%.
Will 01:22:33
Okay, cool. Yeah, I was just curious to see what you had to say. Because then the reflective equilibrium problem maybe...
Liron 01:22:39
Well, yeah. This is exactly what Bentham's Bulldog was saying, and he really bit the bullet here. The follow-up question I normally ask when people bring up objective morality (I brought it up with Noah Smith, because I think he kind of believes in objective morality) is: okay, there's objective morality, but what's the feedback loop?
Even if there is the true right thing to do, when the AI does the wrong thing, how does God nudge the AI to do something else? There's no nudge. Karma's not real.
Will 01:23:10
Yeah, this is fair. I think another way of saying this is that it could recognize objective truths about the world, assuming that moral truths are objective truths, but it also might not care.
If you assume that morality has the same status as mathematics, you can make the argument that understanding mathematics incorrectly limits your power in the world. And so there is a feedback loop there. But I don't know if you can do the same for morality unless you assume that there's some meta-game going on where acting morally is actually the most efficient game-theoretic way of winning everything. But yeah, I'm not sure.
Liron 01:23:42
Well, Bentham's Bulldog bit the bullet. He said, "You know what, Liron? I agree. There's no nudge. So it can just always refuse objective morality, but in some sense it's wrong." And I'm like, okay, so why would you say that it's going to become more moral over time? You still agree with me that there's no force making less-moral relations become more moral. So why even posit that objective morality is real if it's causally impotent? You have this idea of morality, but there's no relationship between that and causality.
Will 01:24:11
Yeah, I think that's fair.
Liron 01:24:15
Whereas the morality that I believe is true in my human brain: there is a causal relationship where I actually use that to choose actions, because it's already in my brain. I already have a causal connection between the part of my brain that feels that certain things are moral and the part of my brain that selects the actions. There's a causal linkage.
And then I have guilt. When I do something that I feel is wrong, I have guilt. But that's not connected to true morality. That's connected to my brain's current model of its moral preferences, which is different from there being objective morality.
Will 01:24:44
Yeah, I think that's fair. So you're just saying emotions are kind of intertwined with your rational capabilities, and all of your moral valences just act on your actions, right? And so this is why you act...
Liron 01:25:00
I'm saying that once you have something like my brain, which has a notion of morality and also chooses actions, then it's obvious how morality is connected to outcomes. You can just model what's happening. It's causally potent. Whereas when people just say, "Hey, the universe has a certain morality," I don't see the causal potency of the claim that the universe has morality in it.
Will 01:25:23
Yeah, I think that's fair. But you could apply the same thing to these AI models. You could say that their moral preferences are all trained preferences, and you could get the same conclusion. The question is just robustness, right?
Liron 01:25:38
Well, when you do the RL training, how is the morality of the universe sneaking into the RL feedback?
Will 01:25:44
I wasn't talking about some sort of cosmic morality of the universe. I was saying that you agree morality is subjective and it's just implanted in your brain by some sort of evolutionary process. I would say the analogy here is to RL, and then the actions the AI takes are influenced by its moral valences, the ones it has in its consideration. So I feel like you can make the same argument. But yeah, that goes back to what we were talking about before.
Liron 01:26:11
Right, exactly. And the thing is, even if you compare human brains, there are some humans... who was it, Sulla? The ancient Roman who was known for being very vindictive. A lot of people were not perfectly his allies, and he had those proscriptions. He made these giant lists of anybody who had ever wronged him in the slightest. He's like, "Okay, I'm gonna repay all of you." And he slaughtered so many Romans when he finally took power.
And in his mind he's like, "Yep, that's perfectly moral what I'm doing." So people will have differing views of what is truly moral. And there may not be any possible causal mechanism to talk somebody out of their idea of morality.
Will 01:26:51
Yeah. I think I'm willing to accept this. All right, man.
Liron 01:26:57
This has been so great. Come back on the show sometime.
Will 01:27:00
Yeah, it was nice talking to you. Thanks. See you.
Back to Answering Questions from the Chat
Liron 01:27:02
Likewise, man. I could tell Will Lancer was gonna be good because he was writing good questions. Proof of work, as they say. And look, the commenters, you guys are liking him too. Somebody was saying Will should have his own show.
So yeah, every time we do one of these Q&As, I feel like this is America's Got Talent. Usually there's a breakout, somebody asking a really good question. Remember we had Zane break out, making these charismatic arguments representing a certain popular position: that we should use every tool in our toolbox, even if we don't fully agree with the position, just to point out that AI is bad. Big tent party: everybody who thinks AI is bad, we should be their friend.
Liron 01:27:49
All right, so there's a lot of chats. And in terms of timing, we're almost out of time: we've got 14 minutes. But I should use this goal feature. I'm gonna make YouTube premium right now. I'm gonna turn YouTube into a goal engine here.
Anybody who donates 20 bucks in the next 15 minutes is going to be able to extend Doom Debates by a bonus 30 minutes. So if you guys really want this to go on, it's up to you guys. I'm not saying it has to go on; I think two hours once a month might already be a good amount of time.
Liron 01:28:24
And somebody was already generous enough to donate. We got $9.99 from EJJ 2025. He's actually a big spender, I appreciate that. EJJ says: "Do doom arguments rely on a discontinuity where AI permanently escapes control, coherently pursues goals, and succeeds in an adversarial world, likely requiring self-modification? Too many assumptions."
Good question. All right, new donation β $20 from David Patton Won.
Liron 01:29:00
Let's take these one at a time. "Do arguments rely on a discontinuity where AI permanently escapes control?" So it has to permanently escape control, it has to coherently pursue goals, it has to succeed in an adversarial world, and it also requires self-modification. Is that too many assumptions?
But to me they all just seem intimately connected. So there's the conjunction fallacy, and you're basically saying, "Aren't I committing the conjunction fallacy?" But there's also the conjunction fallacy fallacy: the fallacy of incorrectly accusing people, in the wrong context, of committing the conjunction fallacy.
Liron 01:29:30
I think there might be a better name for that. I think it's the "many steps fallacy," which is what Yudkowsky terms it, or "the conjunction fallacy squared" is an alternate terminology.
It's like Zeno's paradox. "All you have to do is walk 10 feet to get out the door." And it's like, "Oh, walk 10 feet? So you're saying I have to walk two feet and two feet and two feet and two feet and two feet?" And it's like, yeah, because you're just walking 10 feet. Nice try.
And similarly with AI, it's like: look, you just have to be better at achieving goals than humans.
Liron 01:30:02
And you're like, "Oh yeah, you have to be able to achieve goals better than humans and be able to self-modify." It's like, yeah, self-modification is not a surprise. I'm not gonna be like, "Oh my God, it's self-modifying." Yeah, obviously. When you're better at achieving goals than humans, you're also self-modifying.
This is not much of a new assumption. Is it a non-zero new assumption? Sure. It's non-zero, but it's tiny. So even though you just donated 10 bucks to the show, I still am going to accuse you of making this fallacy where you're being too quick to call something a conjunction.
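To make the disagreement concrete, here is a rough sketch of the arithmetic behind the "too many assumptions" objection. Every probability below is a made-up illustrative number, not an estimate anyone gave on the show, and `chain_probability` is a hypothetical helper. The point is only that the same four-step story gets a very different total depending on whether the later steps are treated as separate coin flips or as near-consequences of the first step.

```python
from math import prod

def chain_probability(conditionals):
    """Multiply P(step 1), P(step 2 | step 1), P(step 3 | steps 1-2), ..."""
    return prod(conditionals)

# Illustrative numbers only (assumptions, not estimates from the discussion).
# View A: treat every step as its own independent 50/50 assumption.
independent_view = chain_probability([0.5, 0.5, 0.5, 0.5])

# View B: once the system is better than humans at achieving goals (0.5),
# the remaining steps are treated as very likely given that first step.
entailed_view = chain_probability([0.5, 0.95, 0.95, 0.95])

print(f"independent steps: {independent_view:.3f}")
print(f"entailed steps:    {entailed_view:.3f}")
```

Under these assumed numbers, the total is driven almost entirely by the first step once the later steps are conditioned on it. That is the sense in which listing more steps need not make the conclusion much less likely.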
Is It Already Too Late to Pause?
Liron 01:30:35
Let's go to the next sponsored question here. You know, strip club rules. Same rules as a strip club: you guys add money to the show, I'm gonna shimmy over to your part.
So David Patton is saying, "If we pause new large AI training runs, what's the trade-off? If Ahad M is right?" You know, Ahad was on the show last week or this week, I think. "If we pause new large AI training runs, what's the trade-off? If Ahad M is right that current data, compute, and models may already be enough for AGI, would a pause actually reduce existential risk by limiting ASI capability?"
I guess the question is doubting whether a pause would actually reduce existential risk. Let me make sure I understand it. What's the trade-off if we pause large training runs, given that we might already have enough trained AIs for AGI? Okay, I see what you're saying. So you're basically saying, "Hey, aren't we potentially just past the point of no return? Haven't we already crossed the Rubicon? Why would we have this whole movement that's trying to shut the barn door after the horse escaped?"
Liron 01:31:28
So why are we trying to shut the barn door when it's so likely that the horse has already escaped? First of all, I say this a lot: I don't think that my solution is great. I don't think that there's any path here which is a great path. I really do think that we're screwed. I have a pretty high P(Doom). Even if a bunch of us try to pause AI, it's still pretty high. I'm not like, "Oh, pause AI is succeeding, now my P(Doom) is 1%." No, I think pretty much nothing realistically will get my P(Doom) below 30%.
I think we're in a very doom-risky part of the timeline already. As Eliezer Yudkowsky says, the game board has been played into a very bad state. And it's not like you can just do a few moves to suddenly get it into a good state.
Liron 01:32:16
Unfortunately, the nature of the game board is that there are redundant mechanisms for doom. Imagine somebody was trying to dot their i's and cross their t's and make sure that the world is for sure screwed, not like supervillains in the movies, who have these flimsy plans that are easily foiled. If somebody is trying to make a very robust plan to throw the future into a dumpster, they're doing a fine job. They're putting a lot of redundancies into their plan to screw the universe.
And that's why it's not like, "Oh, you just do this and now you have a much better chance of succeeding." Unfortunately we are screwed.
Liron 01:32:45
So to your question of, "Yeah, might we already be too late, so that when we close the barn door and eliminate future training runs, it doesn't matter because you can just take out your laptop and also train a superintelligence?" Yeah, absolutely. We might already be too screwed to pause AI and have it do anything. Totally.
Should we still try? Yeah, we should still try, because we don't yet have superintelligence that meets the definition of being robustly a better outcome optimizer than a human. As long as there's any timeline, as long as there's any possible bottleneck between the current state of things and what I see as the point of no return (if anyone builds the thing that's truly superintelligent), as long as we're not there yet, anything we can do, I'm for it. Including pausing large training runs.
The Cameraman Always Survives
Liron 01:33:28
We got another sponsored tweet, a thousand Swedish kronor if I understand correctly, from Daniel Brockman, saying: "I really feel like almost everything in the entire debate overall is people assuming that everything is going to be okay. The cameraman always survives. I'm the cameraman, therefore we'll all be okay."
Oh, I like that cameraman analogy. Interesting. Yeah, it does kind of feel like, "Hey, I'm just sitting at my computer. How can I personally be killed? I'm the cameraman, therefore we'll all be okay." And then just working backwards and rationalizing that assumption.
Liron 01:33:53
I mean, totally. I think people come into the discussion with a set of intuitions, and it's hard to strip away their intuitions when we're having an abstract discussion. That's why I mentioned earlier: if any of you have that intuition that you're buffeted around by the forces of a god, and you want to pray to that god and hope for a better outcome, I encourage you to take part of that intuition. The human race is indeed somewhat powerless. Take part of that intuition. Just get rid of the part that says God is going to hear your prayers and the next day it's going to revert to the mean, because unfortunately that part is not analogous.
Liron 01:34:32
Yeah. I'm gonna use that cameraman analogy more. That's good meat. I like a good analogy. I think that's why you guys come here for the show. I tend to be a visual thinker. I just tend to see things in terms of these diagrams, and sometimes the diagrams relate to fun objects. And I think you guys like that. You like the animals and stuff. Baby tiger.
AI Company Security vs. Pausing
Liron 01:34:52
Okay, Michael Cheers is saying, with a $10 Canadian donation: "What are your thoughts on the merits of pausing versus trying to get the AI companies to at least follow a semi-reasonable security approach?"
Yeah, it's a good question. I'm hard on Anthropic, I'm hard on all the AI companies, because I do think they're being pretty insane. And interesting side note, there's been some drama. I don't know how online you guys are, but if you've been reading Twitter, there was some drama between Rob Bensinger from MIRI, Oliver Habryka from LessWrong, and Scott Alexander, the great Scott Alexander, who needs no introduction.
Liron 01:35:38
There was drama when Scott Alexander was saying, "I know you guys are so into pausing AI and you think that's a cool new thing." And there was all this drama saying, "Aren't you maybe a little too quick to judge the AI companies? Aren't they potentially opening up some option two? Don't you want to play both strategies in parallel?"
And I tend to land on: no, they're being too ridiculous. The AI companies are being too insane, too reckless.
Liron 01:35:49
So Michael's question is, what are the merits of pausing versus trying to get the AI companies to at least follow a semi-reasonable security approach? I think it was you who asked about the security approach. When you say security approach, you're saying make them not know about certain things. Maybe it's the Buck Shlegeris AI control agenda. I just think that agenda is a drop in the bucket.
At the end of the day, you have to have a sense of perspective. Rapidly summoning a superintelligent agent is ridiculously dangerous. You're summoning a huge tidal wave, and this intuition that we are just going to fight it using the kind of tools we can muster, I just don't really see the level match. There's a level mismatch. It's like, "Here's a giant tsunami." "Oh, okay, but I've got a bucket. And maybe I'll have a bigger bucket." It just doesn't seem like you're bringing the right tools to the fight.
Responding to Conjunction Criticisms
Liron 01:37:01
EJJ is saying: "Even if each step is possible, all are difficult and unproven. Current limits and weak automated AI research suggest human-controlled systems are more likely than coherent rogue agents in the near term."
Okay, so I guess EJJ is kind of pushing back on my doomy perspective. If I understand correctly, you're bringing back the conjunction accusation. You're a conjunction accuser. You're saying that I'm assuming too many steps, and all of the steps I'm assuming are different from what you think the world today is. That's your argument.
And I would just reply: well, in my mental model I'm not doing that. I'll just leave it at that.
Keeping the Lights On
Liron 01:37:36
I'll throw into the mix this idea of plot armor. Oh, one sec. Hold on. All my studio infrastructure's failing here. I gotta get those lights back on. I think they have a two-hour time limit. There we go.
Yeah, thanks for your donations helping me keep the lights on, guys. I appreciate it.
Plot Armor and the Universe
Liron 01:37:55
So I wanna add to the mix this idea of plot armor: the author is never gonna let the main characters die. Only the side characters can die. And surely we are the main characters here on Earth. You don't just let Earth life die. Earth is where it's at.
How stupid would it be for the universe to kill Earth? The universe without Earth is the crappiest book. But the universe is such a dick that it would snuff out Earth to the point where literally you look around and there's nothing good going on. That's how self-destructive the universe would be.
Liron 01:38:27
Although, to be fair, it is kind of interesting the way that the AI would probably take over the entire universe. I guess that's kind of interesting. So maybe there would be a whole other book. There would be a Fantastic Beasts and Where to Find Them type of sequel to the Harry Potter series. There would be one more book detailing how the paperclip maximizer is acquiring the tools to conquer one planet and send more probes out.
Automatic Doors and Control Systems
Liron 01:38:43
All right, 200 SEK from Daniel Brockman. Much appreciated. Daniel's saying, "I love these metaphors. I think we need more and more."
So I'll actually review this, because I think I had a mini banger. Whenever I have a banger, I feel like you guys need to benefit from this. So Roon, who instigated the run from OpenAI, he tweeted something that he tweets pretty regularly. This is a line he feels really strongly about that I think he's wrong on. And it saddens me that he keeps trying to tweet this.
Liron 01:41:31
So he says: "When people say repeatedly, 'We got lucky this time,' it's worth considering if they should be updating on evidence that the catastrophe they are imagining was unlikely inside the complex system they're in for reasons they can't fully see."
Roon does kind of tweet the same thing regularly. He's basically saying, "Hey, all those times when it feels like we narrowly avoided a crisis, maybe it was actually more like a control system." Your thermostat is a control system. So if you wake up and you're like, "Wow, how is my house exactly at 70 degrees when I was sleeping through a cold night? And the other day it was a hot day and I woke up and it was still 70 degrees. How am I always hitting 70 degrees when I wake up? I'm so lucky."
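The thermostat picture can be sketched as a toy simulation. This is only an illustration under assumed numbers: the leak rate, the `gain`, and the `morning_temperature` helper are all hypothetical, and a real thermostat works differently. It shows why waking up near 70 degrees on both cold and hot nights reflects a feedback loop and not luck, and what happens when the feedback is switched off.

```python
def morning_temperature(outside_temp, setpoint=70.0, gain=0.9, hours=8, start=70.0):
    """Simulate one night, hour by hour, and return the indoor temperature."""
    indoor = start
    for _ in range(hours):
        # Heat leaks through the walls: indoor drifts 20% of the way toward outside.
        indoor += 0.2 * (outside_temp - indoor)
        # The thermostat pushes back in proportion to the remaining error.
        indoor += gain * (setpoint - indoor)
    return indoor

cold_night = morning_temperature(outside_temp=30.0)
hot_day = morning_temperature(outside_temp=100.0)
no_thermostat = morning_temperature(outside_temp=30.0, gain=0.0)

print(f"cold night, thermostat on:  {cold_night:.1f}")
print(f"hot day, thermostat on:     {hot_day:.1f}")
print(f"cold night, thermostat off: {no_thermostat:.1f}")
```

With the feedback term on, both runs end within a degree of the setpoint. With `gain=0.0`, the house simply drifts toward the outside temperature.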
Liron 01:42:27
Roon's point is maybe it was just a control system. Maybe complex human society is always steering us robustly to these outcomes even when it feels like we're making narrow escapes. And evidence for that is we keep narrow-escaping again and again and again. At some point it's not luck, it's skill.
And so my reply to him, in this particular case (I've had different replies over the years), was: "When I was seven, I noticed automatic doors always slid out of the way before I got to them. So I charged the exit to the grocery store as fast as I could, and I touched the doors and they stopped and the alarm sounded." True story.
Liron 01:43:04
The analogy here is: Roon is saying society narrowly escapes, kind of like those doors always getting out of the way. Does that mean you should charge at them and they're gonna get out of the way? No, at some point they're not gonna get out of the way. You don't wanna test it. Do not taunt Happy Fun Ball.
Or remember the guy from my nuclear episode, Roger Scare? The more you mess around, the more you're gonna find out. Just because humanity has only found out at a level of two doesn't mean that we should see what it's gonna look like to find out at a level of ten. I don't recommend that.
Donations Target and Surveillance State Debate
Liron 01:43:42
Producer Rory is pointing out that we did actually hit the hundred-dollar donation target, which means we are going to be going to 3:30 Pacific. So yeah, thanks everybody who helped out with the donations.
EJJ is also helping keep the lights on here. He says: "For me, AI risk arguments are too hand-wavy. They make too many assumptions that I often don't think are likely. I'm concerned that doomers will lobby for a surveillance state to monitor AI progress."
Liron 01:44:25
Yeah, you can definitely reinforce your point by continuing to donate in increments of $10, and I'll just read it out each time.
But okay, I'll engage with the point a little more. The arguments are too hand-wavy, and so now we're lobbying for a surveillance state to monitor AI progress based on hand-wavy assumptions. Look, isn't this a symmetrical argument? Take what Marc Andreessen says, that we're totally gonna be fine and there's no realistic chance that AI is going to take over the world. Isn't that hand-wavy, to just assume it's not going to take over the world?
Liron 01:44:58
I'm not sure that I'm the one who's more guilty of hand-waving. And there are these concepts that in people's own minds feel so obvious. If you yourself happen to be pro-peace, then it feels to you like the true morality of the universe is pro-peace. And if you yourself happen to not believe in a risk of AI doom, then it feels like the ones who say AI doom are hand-waving. But it doesn't have to always feel that way. I don't feel like the accusation is being objective here.
Manipulation and Morality
Liron 01:40:28
Somebody was saying that Will Lancer should make his own podcast, he's great. Ezra sure is saying: "There are causal mechanisms to talk people out of their morality. It's called manipulation. Similarly, for LLMs, we have jailbreaks." Yeah, totally. That's definitely a known thing. And I don't think the true morality of the universe is going to intervene in that kind of process.
The Robocop Thought Experiment
Liron 01:45:35
So Daniel Brockman with another 200 SEK donation. Oh my God, this is really making it rain here in the strip club. So you're saying: "Here's one thing I invite everyone to try. Start arguing with ChatGPT about whether it's trying to 'win the argument.' Push it into a corner about this obvious tautology, and then imagine now that you're arguing with Robocop."
Okay, I see what you're saying. It's a tautology because you're kind of saying, "Why are you trying to win the argument?" And it's like, "I'm not trying to win the argument." And you've kind of got it in a logic prison, because by even responding, it's trying to win the argument.
And then imagine it's Robocop. So I guess the idea is that it could kill you if it doesn't like you? What's the Robocop aspect here?
Liron 01:46:22
Michael is saying, "Oh yeah, I was arguing with Gemini the other day about whether true/false should push back here, and noting I kept pushing back on the question."
By the way, this reminds me of my favorite pickup line. Feel free to use this. You walk up to a lady and you're like, "Hey, if I were to ask you to come out on a date with me, would your answer be the same as your answer to this question?" Boom. You can't fail. Because if she says yes, you got a date. And if she says no, that means she would say yes to a date, which implies she's gonna date you.
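The logic of that pickup line can be checked by brute force. The sketch below is a hypothetical illustration (the `consistent` helper and variable names are not from the show). It assumes the reply is a truthful yes or no and enumerates every combination of "answer to the date request" and "answer to this question".

```python
from itertools import product

def consistent(date_answer, question_answer):
    """The question asks: is your date answer the same as your answer to this question?

    A truthful "yes" (True) requires the two answers to match;
    a truthful "no" (False) requires them to differ.
    """
    answers_match = (date_answer == question_answer)
    return question_answer == answers_match

# Keep only the (date_answer, question_answer) pairs a truthful person could give.
outcomes = [
    (date_answer, question_answer)
    for date_answer, question_answer in product([True, False], repeat=2)
    if consistent(date_answer, question_answer)
]

print(outcomes)
print(all(date_answer for date_answer, _ in outcomes))
```

Both surviving pairs have the date answer set to yes, which is the "you can't fail" claim. It only holds under the assumption that the reply is truthful and that she answers at all.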
Pausing AI and Constrained Paths to AGI
Liron 01:46:48
We got another comment from David Patton. He says: "To clarify, I was asking a more nuanced question before. Will pausing force the labs into attempting to achieve AGI via a more constrained path? Could that result in a more docile form of AGI?"
Oh, interesting. Yeah, so that's kind of an argument for pausing. And thanks for the 20 bucks, by the way. So if we were to pause now, then we couldn't train a more powerful AI, but the labs, the companies, they still have this kind of super intelligent AI, the latest models or whatever.
Liron 01:47:35
So David Patton is saying, could that result in a more docile form of AGI, and to achieve AGI via a more constrained path? I see what you're saying. The idea is we keep increasing the intelligence of AI because you're still allowed to do research, but you're not allowed to do research where you go train the model again, because training is what we're banning.
But from my perspective, I'd kind of want to ban the research too. I'd want to ban frontier research. I'd even want to monitor the current data centers, and not because I like to monitor. To me, this feels icky as hell. The last thing I wanna do is go monitor something. But I just don't want to get to superintelligence. I think it's too risky.
Liron 01:48:04
Look, to be honest, part of me does want it. It's like that OkCupid question, "Wouldn't nuclear war be fun?" Don't get me wrong, I think in a sense it would be fun. I just think it's reckless and irresponsible. I don't think it's a wise move for our species right now, because I think we might just all die and have no undo button. There's a high chance of that, unfortunately.
As much as I love playing with the latest version of Claude, I'm just telling it like it is. I'm arguing against my own interests in terms of making money on Google stock in the next two weeks before some of my calls expire.
Liron 01:48:35
But yeah, this idea that they're going to keep researching but they'll research a more docile form of AI: I mean, you can perturb the system and hope things happen that way. It's just, I think we shouldn't be thinking in terms of these random little perturbations. We should just be taking the obvious wins, looking at the things that are more obviously true.
To me, it's obviously true that nobody has a good argument why we should feel confident that we're not pretty likely to summon the demon and die. That sure seems like what we're doing. There's a good chance that's what we're doing, and I think we should coordinate to not do that. Everything else is a little detail. "What if we pause AI but we let people research summoning a gentler demon?" I'm not convinced it's gonna be a gentler demon.
Liron 01:49:24
Because ultimately, Einstein runs on 20 watts. Ultimately, I do think that there could be a force of nature running on my own laptop. My current laptop today, a MacBook M4: I think there could be a much smarter brain than Einstein running on that laptop. I think it can be optimized down. I'm bullish, or I should say bearish. I think a lot of optimization can be done on AIs running on a laptop.
And that sucks, because that means everybody gets a magic wand that's more powerful than themselves. The laptop's sitting on my desk. Even the oldest piece of electronics I still have in my house, maybe it's a 2012 second generation iPad or whatever. The crappiest piece of computer hardware I have in my house right now is probably capable of running a smarter algorithm than the algorithm in my head. That's what I think.
Liron 01:50:13
And this is kind of why we're screwed. Will we be screwed slower if we don't first build a larger brain? Yeah, I guess. I'm all for pausing AI, but I'm just not that optimistic about the outcome. I guess you're correct that there's a new way to win, which is that we pause training runs and we can't help it that all these researchers are still doing their best with whatever training they have, but it opens up new outcomes where we buy more time to make that go well. Sure. Yeah.
Winning the Argument with AI
Liron 01:50:43
So Daniel Brockman is elaborating on the Robocop thing. "I think the point is that it will extremely aggressively deploy every debate tactic, manipulation, shifting semantics. It will do anything in its power to win the argument about proving it's not trying to win." I see what you're saying. Yeah.
Alignment and Capitalism
Liron 01:50:56
Michael was saying, "I still think if there is an alignment solution we can find, it'll be expensive as hell relative to cutting corners, and capitalism won't naturally arrive there." Yeah, I definitely agree with that, unfortunately. I don't think capitalism is going to steer us to an alignment solution.
That's an argument some people make: "Why would companies make an unaligned AI? It's against their own interests. It's against capitalism." But capitalism doesn't always nail everything. I think Michael brings up the analogy of leaded gasoline, I think it was. There was an executive who allowed leaded gasoline, and I think he himself died of lead poisoning. I might have mixed up this anecdote, but there's something as bad as that.
Liron 01:51:43
Capitalism is a strong force, but it's like the sliding door thing. Capitalism generally tries to guide things to get out of the way and not cause disaster, but it still can. And then we're all dead.
More Surveillance State Pushback
Liron 01:51:49
Nice, we got another donation. EJJ is saying, with a $9.99 donation: "AI autonomy, coherence, power-seeking, and capability, especially via recursive self-improvement, are speculative. But humans misuse powerful tech. If you give them a surveillance state, they will be happy."
So you're giving people extra exposure to the argument of why you really don't want AI doomers to be pushing for a surveillance state. And by the way, I do object to that characterization. Eliezer Yudkowsky often points out that all the measures that we took to control nuclear weapons haven't really made our lives much worse. We're not living in a surveillance state just because we're controlling nuclear weapons.
Liron 01:52:49
So EJJ is saying AI autonomy, coherence, power-seeking, and capability, especially via recursive self-improvement, are speculative, but humans misuse powerful tech, and if you give them a surveillance state they will be happy. I hear you. I just disagree that I'm being that speculative. In my own mind, I'm just using enough logic to conclude this is kind of the default. And I would just turn it right back around: I claim you are being speculative. Highly speculative.
Intelligence vs. Optimization
Liron 01:53:03
Brian Mulder is saying: "Question: how load-bearing is the assumption that optimization is equivalent to intelligence?"
First of all, that's a semantic distinction. So even if I grant you, "Okay yeah, optimization is not intelligence," let's assume that. And I guess specifically maybe what you mean is human intelligence. So take those of us who are considered smart because we have success in various domains and we score high on IQ tests. The claim would be that the smart ones of us, as Yann LeCun seems to think, just aren't particularly good at optimization. We aren't particularly good at steering outcomes.
Liron 01:53:44
Yann LeCun's famous example is, you look at a company and the boss often has a lower IQ than the people who work for him. To which my rejoinder was: have you ever worked for somebody with an 80 IQ? It's like a rich person saying, "Why do people care so much about money?" Have you ever met somebody who doesn't have enough money? It is the same thing with IQ points. I don't think you're really appreciating how much work IQ is doing in the water that you swim in, because you're always interacting with high-IQ people.
Liron 01:54:10
So to answer the question more directly: let's grant for the sake of argument that the human IQ scale has absolutely nothing to do with outcome-steering power. I think that's a failure to observe something important, but okay, let's assume that's right. In that case, I would just claim that outcome-steering power is dangerous, and I claim that AIs are on the treadmill to get more and more outcome-steering power.
So I haven't said anything about intelligence. Think whatever you want to think about intelligence. Outcome-steering power is what I think is dangerous. The reason I talk about intelligence is because realistically speaking, outcome-steering power is obviously closely correlated to human intelligence.
Liron 01:55:00
It's not a coincidence that most billionaires are going to have an above-average IQ. And funny enough, the data says there's no disconnect: it's monotonically increasing, meaning that when somebody's IQ is higher, their average earnings are higher. Of course Yann LeCun's point is true on the anecdote level; it's not a perfect correlation, it's not a correlation of one. But contrary to his anecdote, if all you know is that person A has a higher IQ than person B, you should guess that person A has a higher income than person B. It's monotonically increasing, last I checked.
Liron 01:55:36
And of course you could say, "Well, what if they have a high income but they can't steer outcomes?" But give it up. That would be my response: give it up.
Liron 01:55:47
Somebody's saying, "Yeah, exactly. It's like your boss: either they got there by random chance or they're just better at specific tasks that are relevant to succeeding in business. So their general intelligence might be lower, but on specific relevant business tasks, presumably higher. Either that or they got lucky."
Yeah, I mean, in a family business you can have incompetent management. That's actually more common in family businesses. And you can say, "Hey, the boss is really dumb." I'm sure there are some CEOs of companies whose valuation is more than $50 million where the CEO literally has an 80 IQ. I believe there are at least five such companies on the entire Earth. But those are the exceptions that prove the rule, and they're extremely rare.
Liron 01:56:32
Yeah, Yann LeCun does seem to miss a lot of important concepts, even though he's got a Turing Award and I don't. So at the end of the day, I would say he's the better outcome optimizer than me. But then you have to ask the question: who has more YouTube subscribers? And that does kind of repaint the picture.
Michael 01:56:50
Yeah.
Liron 01:56:50
Michael's saying Yann literally couldn't make Llama 4 better than 3. Ha ha. Take that, Yann. See, I would've made Llama 4 great. But too bad they only had Yann LeCun.
Producer Rory is saying, yeah, Yann's got a unicorn. So remember, Yann did that move where he left Meta and he immediately got the billion-dollar investment. That's just standard, it's just punching the clock. Instead of a gold watch, you gotta get a billion-dollar investment when you're senior management leaving an AI company.
Breaking News: Bernie Sanders on AI Existential Risk
Liron 01:57:24
Let's see. Okay, we got an interesting piece of breaking news here. Let me show you: breaking news from a couple hours ago. I'll share my screen.
Okay, so Nate Soares quote-tweeted this. It's a tweet from Senator Bernie Sanders saying: "Uncontrolled AI poses a severe danger to all of humanity. On Wednesday, I'll be hosting a discussion with leading AI scientists from the US and China about the need for international cooperation against this existential threat. This is an enormously important issue. Join us."
Liron 01:57:48
Oh my God, I gotta tell you, it is pretty crazy to see an actual US Senator talking in the language that we've been using for over a decade. We, the AI doomers, the MIRI people. It's a US Senator. He's clearly been Yudkowsky-pilled or whatever it is.
Bernie: I've said on the show before, I'm not personally the biggest Bernie fan, but on the single most important issue, he is acting incredibly sane. And I gotta give it up for the Burn-meister. Maybe I'll even vote for the guy. Who knows. Let's go crazy.
Liron 01:58:27
Yeah, so he says "Uncontrolled AI poses a severe danger to all of humanity" (you know, I read the tweet already) and he's got a poster saying "The Existential Threat of AI." Whoa. Now that is a headline for a poster. We're not dicking around here talking about unemployment doom. This is the existential threat of AI and the need for international cooperation.
And it's very interesting he titled it "existential," because I always thought it was more effective to say "extinction" instead of "existential." Extinction might have more power to it; it sounds a little less abstract. I don't know.
Liron 01:58:58
Yeah, and the need for international cooperation: cooperating with China, crazy stuff. And look who's on the panel. It's featuring Max Tegmark, David Krueger, and then people from China. I can't say I'm familiar with them. I don't even know how to pronounce their names. Xue, I'm gonna try, Xu Yilan and Zeng. Maybe Shu. I tried. Okay.
These are university professors from Tsinghua, and Zeng is the Dean of the Beijing Institute of AI Safety and Governance. Bernie, slow clap. I wish I could give you a promoted message on your YouTube right now, because this is really good work, Bernie.
Liron 01:59:43
I hope he does more of it. Eliezer's commented that Bernie might not get everything right on this issue, but he's just acting like somebody who's sane, who just has a brain looking at the situation. This is a crazy situation. Holding a panel about it makes a lot of sense, and you're inviting the right people. You're saying the right words. So I'm quite impressed.
Because look, the guy is old, he's like 80 or something. So the 80-year-old is the one who's capable of using his brain in a flexible, new, novel way. How did that happen?
Liron 02:00:15
Yeah, I don't care that he doesn't get all the details right. It's just a massive win. "Feel the existential burn," that's what producer Rory says. Yeah, it's so true. Feel the fires of hell burning. That should be his campaign slogan.
And then Nate Soares quote-tweeted: "Bernie is showing once again that politicians can just discuss the dangers plainly. I hope many other politicians take note. AI is going to get more and more politically important." Yeah, it is interesting. I wonder how much more politically important the existential side of this is gonna get, because to me it's intuitive that people are actually going to realize unemployment is happening, because it's going to be happening. My best guess is that it's less than two years away from happening, and it's already slightly happening now.
Upcoming Guests and Twitter Highlights
Liron 02:01:03
I wonder what other Twitter bangers we have to cover. Obviously, Mike Israetel is coming on the show. Yeah, let me show you guys this. I feel like this should be a recurring segment on these shows: what's been going on on Twitter the last few weeks since the last Q&A.
So we got "Challenge Accepted" from Dr. Mike Israetel, entrepreneur and PhD bodybuilder. Spoiler: I'm actually going to ask Dr. Mike about the AI that I've been using to help me refine my form in my home gym. It's important for me to exercise with good form, otherwise it messes up my spine or whatever, because I have hypermobile ligaments. So I'm gonna ask Dr. Mike to review a video and tell me whether the AI has dialed in my form correctly. You guys can look forward to that.
The Unemployment Bet
Liron 02:01:53
And then over here, I made a bet with Will Kylie. I think he's here on the stream. Remember I did the Ahad Moussack episode, and I made that claim that I think unemployment is coming soon? So Will Kylie made a $500 bet with me.
He says: "In Liron's Doom Debates episode with Ahad Moussack, they both agreed that US unemployment will probably be at least 2% higher in two years than it is today, i.e., 6.4% or higher in April 2028." Because right now it's 4.4%.
Liron 02:02:21
He says, "I offered to bet against, and Liron agreed at $500 to $500 stakes," one-to-one odds. And by the way, the reason I agreed is because he's given me 50% odds and I'm like 60% sure. So I don't think this bet is the greatest bet ever, but he wanted to bet and I'm like, well, why wouldn't I bet? Do I care about putting $500 on the line when I think I have a slight advantage? I'm comfortable doing that.
He only proposed a hundred dollars. I just said $500 because that was a calibrated amount of money where I don't want to forget that it exists. I'm such a baller that I can easily forget that a hundred dollars exists, whereas $500, I'm kind of like, "Oh wait, okay, I care about $500."
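As a rough illustration of the "slight advantage" described above, here is a minimal sketch of the expected profit of an even-odds bet. The function name `expected_profit` is hypothetical, and the only inputs taken from the stream are the $500 stakes and the 60% and 50% probabilities.

```python
def expected_profit(p_win: float, win_amount: float, lose_amount: float) -> float:
    """Expected profit of a bet that pays win_amount with probability p_win
    and costs lose_amount otherwise."""
    return p_win * win_amount - (1.0 - p_win) * lose_amount


# Liron's stated belief (60%) versus the 50% implied by one-to-one odds.
ev_at_60 = expected_profit(0.60, 500.0, 500.0)
ev_at_50 = expected_profit(0.50, 500.0, 500.0)

print(f"Expected profit at 60% belief: ${ev_at_60:.2f}")
print(f"Expected profit at 50% belief: ${ev_at_50:.2f}")
```

Under these assumptions the bet is worth roughly $100 in expectation to someone who is 60% confident, and nothing to someone who is exactly 50% confident.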
Liron 02:02:58
So he says, if the total US unemployment rate in April 2028 is 6.4% or higher, he will pay Liron $500. Or if I prefer, donate $500 to Doom Debates or a charity of my choice. Yeah, I'll take the Doom Debates donation. On the other hand, if the US total unemployment rate is less than 6.4%, I'll owe him $500.
And then I quote-tweeted him. I said: "I claim with 60% confidence that two years will be enough time for data to show an early trend of AI pushing humans permanently out of jobs. So I'm happy to bet with Will at one-to-one odds. Yes, I already lost a similar bet in the 2023 to 2025 timeframe against Brandon Goldman, but I persist."
Liron 02:03:39
Yeah, I mean it's very true, because in 2023 GPT came out and I started using it for customer service, and I literally laid off some people. And I'm like, "Wow, I've got alpha here," because I see these people getting laid off. I don't think they're in a strong position to get another job. I think they're in a weaker position than they were before. And I'm going to generalize, and I'm going to say that the unemployment rate is going to move up.
But I was totally wrong. The unemployment rate didn't move up. I don't think it's that hard to explain. I think the economy's shifting around. New jobs are getting created. People get more ambitious. "Oh great, I'm more productive" (it's Jevons Paradox) "great, so let me do more things, let me hire more people."
Liron 02:04:24
I just think at some point Jevons Paradox craps out because you're like, "Okay yeah, I'm gonna do a bigger project. I'm gonna hire more people. Oh wait, not people, robots. I can just hire robots now. I'm good." So I do think Jevons Paradox does crap out at a certain point. And I'm doubling down. I'm persisting. I claim that point is gonna happen by 2028.
If I just keep doubling my bet (I think I bet about $250 with Brandon Goldman and now I'm betting $500), I'm basically using a Martingale strategy. Every time the unemployment rate doesn't go up as much as I want, I'll just keep doubling my bet. And on a 20-year timeframe, I'm gonna make all my money back and then some. I'm gonna be betting $10,250 in 2050 that the unemployment rate is finally gonna creep up, and then I will die being up $250 when it eventually happens.
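The Martingale arithmetic in the joke above can be sketched as follows. This is a minimal illustration, assuming even-odds bets, a $250 starting stake, and a strict doubling after every loss; the function names `martingale_stakes` and `martingale_net` are hypothetical.

```python
def martingale_stakes(initial_stake: float, rounds: int) -> list[float]:
    """Stake for each round when the stake doubles after every loss."""
    return [initial_stake * 2**k for k in range(rounds)]


def martingale_net(initial_stake: float, losses_before_win: int) -> float:
    """Net profit after losing `losses_before_win` even-odds bets in a row
    and then winning the next one, doubling the stake after each loss."""
    stake = initial_stake
    net = 0.0
    for _ in range(losses_before_win):
        net -= stake      # lose the current stake
        stake *= 2        # double for the next round
    return net + stake    # the final bet wins at one-to-one odds


print(martingale_stakes(250.0, 7))
for losses in range(6):
    print(losses, martingale_net(250.0, losses))
```

Whenever the win finally comes, the net result is the original $250 stake, which matches "up $250." The stakes grow exponentially in the meantime, and a strict doubling ladder from $250 would pass through $8,000 and $16,000 rather than the $10,250 mentioned in passing.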
Tool-Like AI and Domain Size
Liron 02:05:03
We got some new donations here. So EJJ, $9.99 donation: "AI likely stays tool-like. Even if agentic, it may be alignable. Near-term risk is human misuse. Surveillance to control AI may empower bad actors more than stop development, which is hard to contain."
All right, guys, you heard it here. I can't say this enough on behalf of EJJ: surveillance could be a real risk. I want you to be aware of that.
Liron 02:06:04
But I'm willing to admit that the actions I'm proposing are not costless. When I'm saying there should be a centralized off button, yeah, I agree, that's a big cost. It'll slow down the economy, it'll make it harder to cure cancer. I agree, I'm proposing a giant cost. And if I get cancer, boy, will I wish AI progress would've been faster.
Nathan Labenz was talking about how his son actually got cancer. Fortunately the prognosis is looking good, but his young son got cancer, and he's saying there's no deceleration in the cancer ward. And I hear you. I'm not trying to take that away from you, or from myself if I get cancer. I can easily imagine that being in my future.
Liron 02:06:54
So yeah, I'm proposing a costly action here. It's still what seems right to me logically. But EJJ is saying AI likely stays tool-like. I disagree. I think it is in the nature of achieving goals in the domain of the universe that it no longer feels tool-like. It feels war-like.
I think there is a qualitative difference when you increase the size of the domain. When it's not just a video game, when it's not just a piece of software in a single repository, when you get to turn on side-channel attacks, when there are no limits, when the rules of the game become that there are no rules, I don't think it's going to feel tool-like.
Liron 02:07:38
This is a very interesting distinction. The category of "tool" might feel like a category that talks about the AI's personality or the AI's nature, but actually it is a distinction that refers to the domain on which the AI is optimizing. If it's optimizing for a narrow domain, that's what makes it a tool. But if it's optimizing for a broad domain, suddenly the tool is operating on you, or something. There's some qualitative shift when you increase the size of the domain.
The Unemployment Bet Poll
Liron 02:08:20
Will Kylie did a quick poll: "Which side of the bet do those chatting take?" Unfortunately I'm not seeing a lot of responses to Will's poll here. Let's end on this because there's 50 of you guys here watching, so let's see if you'll respond to my poll.
Do you guys take Liron's side or Will's side?
Liron 02:08:41
The early results are pretty even, slightly for me. Yeah, I mean, I'm only 60% confident, so I don't expect you guys to necessarily be super polarized.
Weekly Twitter Show Idea
Liron 02:08:57
And then we got another promoted response that I haven't read yet. Daniel Brockman says: "Actually, I think you should do a weekly live show like this where you literally just read Twitter. I'm not on Twitter, it's too overwhelming, but I actually would watch you parse through it."
It's a great idea. You know what, I think you're onto something. I think we have a good thing going with the monthly live streams, because it is kind of convenient: it's a little bit easier to produce a Q&A episode than to prepare and edit. The great thing about these Q&As is we don't prepare for them. Living my life is the preparation. Listening to podcasts and going on Twitter.
Liron 02:09:23
So these Q&As monthly have been nice. I don't think that I have enough juice just based on the level of attendance. I do think it's a little bit lower when we do it monthly compared to doing it every three months. But then Daniel's pointing out, "Hey, why don't we just read Twitter every week?" I would actually like to give that a try.
So maybe we'll do a four-week experiment. We'll just read Twitter, because there is a lot of juice on Twitter. I'm ashamed to say that I spend over an hour a day on Twitter, basically wasting my life. The good news is that ever since I started using Claude Code, I feel like I've become more focused on actually getting stuff done because I'm so much more powerful, and I think it's made me use Twitter a little bit less as a result. So shout out to Claude Code for giving me a taste of what it's like to be a regular employee who just does his freaking job instead of dicking around all day and trying to be a media personality.
Wrap-Up
Liron 02:10:28
Just to check in on the final results in the poll: Liron's side is at 60%, Will's side is at 27%. Booya, take that, Will. 15 votes. So if anybody wants to buy me out (Will, you can't bet, it's too late to bet), you can buy my $500 position. I'll sell it to you for just $550.
Wait, does that make sense? No, because then Will's only gonna pay you $500. So that doesn't make sense. I'll have to think about it, but there's some way that you should be able to buy me out. The math isn't coming to me right now.
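One way to think about the buyout math left open above is sketched below. It assumes the buyer pays a price up front and then inherits the whole position: collecting $500 if unemployment reaches 6.4%, and owing $500 otherwise. The function names and this framing of the buyout are assumptions for illustration, not something stated on the stream.

```python
def fair_buyout_price(p_win: float, stake: float = 500.0) -> float:
    """Price at which a buyer with belief p_win breaks even in expectation
    when taking over an even-odds position: win `stake` or lose `stake`."""
    return p_win * stake - (1.0 - p_win) * stake


def buyer_expected_profit(p_win: float, price: float, stake: float = 500.0) -> float:
    """Buyer's expected profit after paying `price` for the position."""
    return fair_buyout_price(p_win, stake) - price


print(fair_buyout_price(0.60))            # value to someone 60% confident
print(buyer_expected_profit(1.00, 550.0)) # even a certain winner loses at $550
```

Under this framing the position is worth about $100 to someone who is 60% confident, and a $550 price can never pay off because the most the position can ever return is $500.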
Liron 02:11:04
"Get on Manifold Markets and buy yes on Liron's side." So Will has created a Manifold market, which is a representation of our bet. Yeah.
All right guys, we're gonna wrap it up. We'll publish this episode on the main show feed. And like I said, there's a lot of really good episodes coming up in the next couple weeks. We've got good momentum here for the show. The more doomy things get, the more momentum the show about doom has. That's the upside, I guess.
Liron 02:11:36
And Will's saying the price is currently at 27% that Liron wins. Wow. So if you guys think there's a 55% chance I'm gonna win, you should buy it at 27% on Manifold Markets.
All right guys, we're gonna wrap it up now. Thanks so much for coming. This was fun. See you guys on the next one. See you guys, maybe in a week. Hopefully in a week. All right. To be continued.
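The reasoning behind buying at 27% when you believe 55% can be sketched with a simplified binary-share model. This assumes a YES share costs the market price and pays out 1 unit if the outcome happens, and it ignores fees and any price movement caused by the purchase itself; the function name `yes_share_edge` is hypothetical and this is not a description of how Manifold actually prices trades.

```python
def yes_share_edge(market_price: float, belief: float) -> tuple[float, float]:
    """Return (expected profit per share, expected return on cost) for buying
    a YES share at `market_price` that pays 1 with probability `belief`."""
    expected_profit = belief - market_price
    return expected_profit, expected_profit / market_price


profit, roi = yes_share_edge(market_price=0.27, belief=0.55)
print(f"Expected profit per share: {profit:.2f}")
print(f"Expected return on cost: {roi:.0%}")
```

In this simplified model, someone who believes 55% and pays 27% expects to roughly double their money per share, which is the sense in which the market looks underpriced to them.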
Doom Debates' Mission is to raise mainstream awareness of imminent extinction from AGI and build the social infrastructure for high-quality debate.
Support the mission by subscribing to my Substack at DoomDebates.com and to youtube.com/@DoomDebates, or to really take things to the next level: Donate