Multiple live callers join this month's Q&A as we cover the imminent demise of programming as a profession, the Anthropic/Pentagon showdown, and debate the finer details of wireheading.
I clarify my recent AI doom belief updates, and then the man behind Roko's Basilisk crashes the stream to argue I haven't updated nearly far enough!
Streamed on Feb. 27, 2026.
Timestamps
00:00:00 — Cold Open
00:00:56 — Welcome to the Livestream & Taking Questions from Chat
00:12:44 — Anonymous Caller Asks If Rationalists Should Prioritize Attention-Grabbing Protests
00:18:30 — The Good Mainline Scenario
00:26:00 — Hugh Chungus Joins the Stream
00:30:54 — Producer Ori, Instrumental Convergence, and Liron's Alignment Scenario Update
00:43:47 — We're In an Era of Centaurs
00:47:40 — Noah Smith's Updates on AGI and Alignment
00:48:44 — Koko Chats Cybersecurity
00:57:32 — The Attacker's Advantage in Offense/Defense Balance
01:02:55 — Anthropic vs The Pentagon
01:06:20 — "We're Getting Frog Boiled"
01:11:06 — Stoner AI & Debating the Finer Points of Wireheading
01:25:00 — A Caller Backs the Penrose Argument
01:34:01 — Greyson Dials In
01:40:21 — Surprise Guest Joins & Says Alignment Isn't a Problem
02:05:15 — More Q&A with Chat
02:14:26 — Closing Thoughts
Links
Liron on X — https://x.com/liron
AI 2027 — https://ai-2027.com/
Doom Debates: "Roger Penrose is WRONG about Gödel's Theorem and AI Consciousness" —
Warning Shots: The Pentagon vs Anthropic —
“Good Luck, Have Fun, Don’t Die” (film) — https://www.imdb.com/title/tt38301748/
“The AI Doc” (film) — https://www.focusfeatures.com/the-ai-doc-or-how-i-became-an-apocaloptimist
Transcript
Cold Open
Liron Shapira 00:00:00
It’s showtime.
Greyson 00:00:02
Hi there, longtime viewer, first-time caller.
Producer Ori 00:00:07
Am I live now?
Anonymous 00:00:09
Sorry for this disguise. I work in tech, and I don’t want people knowing I’m a doomer.
Liron 00:00:10
Okay, so you’re not sick?
Anonymous 00:00:15
No, I’m not sick.
Greyson 00:00:20
What I’d like to focus on is drafting a really strong and convincing experience of takeoff.
Ori 00:00:24
We’re totally in the centaur world, right?
Anonymous 00:00:26
I was just wondering if rationalists are temperamentally not suited to political action. Maybe you need to chain yourself up to the side of a tree like the environmentalists do.
Koko 00:00:37
Why would it not be able to protect against these hacks if it’s that capable?
Liron 00:00:42
All right, we got a well-known guest. We got a heavy hitter in the waiting room. Everybody say hello to—
Welcome to the Livestream & Taking Questions from Chat
Liron 00:00:56
It’s showtime.
Liron 00:00:59
Hey, everybody. Welcome to the Doom Debates Q&A live stream coming at you live from Saratoga Springs, New York. I’ll just wait for a few people to file in.
All right, first seven people. Welcome, welcome. Feel free to type in the chat. And I’m gonna go say hi to the audience on X. You guys are watching this from X. Thank you. That’s a growing platform for this show.
Liron 00:01:25
All right, if you’re waiting for the Q&A to start, one really important thing you could do is if you go to my profile, x.com/liron, you can see my latest McDonald’s food review. I just tried the Big Arch yesterday — filled up more than half my day’s calorie allocation, McDonald’s Big Arch. And my overall verdict on the McDonald’s Big Arch is it’s a quality sandwich. I think everybody should try the McDonald’s Big Arch once in their life because it’s the best sandwich from the most successful cooked food company in the world.
All right, so everything looks to be working live on YouTube, live on X. So the way we’re gonna do this, the same thing we do every month. Feel free to type any questions or comments in the chat.
Liron 00:02:18
And also I’m gonna open up people who wanna come on the live stream. If you’re an early bird and you wanna come participate in the live stream, I’m gonna send you a join link.
So here in the YouTube chat, anybody can click this join link, and you’ll get in the waiting room. First come, first served. But as usual, strip club rules apply. So if anybody uses the YouTube feature to toss in some cash, then you’ll get to the front of the queue to say your question or contribute or have me read out a statement or whatever you wanna do.
All right, we gotta keep the lights on here. The rent on this studio is pretty expensive, so everybody go to YouTube and click the button to send us money. Much appreciated.
Let’s see what we got here. If you guys are wondering what the latest news is, we’ve got Warning Shots dropping on Sunday — me, John Sherman, and Michael — and we’re covering one of the craziest weeks in a while. Claude Code, everything getting really agentic. Obviously, the Anthropic situation. It’s definitely one of the spiciest Warning Shots episodes. We taped it earlier today.
Liron 00:03:11
Okay, let’s see. So Punmaster is asking, “How’s your Friday going?” It’s going good. Getting in the swing of things. The studio’s mostly up and running, and I’m also using Claude Code in my day job. It’s all going reasonably well.
Let’s see what else you guys are saying. Yeah, Producer Ori is saying, “Fancy marble desk.” That’s right. It’s marble, everybody. It looks good. It looks professional.
Sammy Qureshi saying, “I am marching in London tomorrow. King’s Cross. Please shout out.” Okay, if you’re in London, you gotta get in on this.
Somebody saying, “Are the Great Lakes on the map really as correct as they can be given the resolution?” Good question. I’m gonna say probably. Maybe not.
Somebody’s saying, “I hope the next Warning Shots is still timely with whatever happens with Pete Hegseth’s ultimatum.” Yeah, for sure.
Somebody saying, “Debate with a worthy successor guy, and what’s your take on this movement?” I think you’re talking about Daniel Faggella. He’s a smart guy. Worthy successor — I guess the idea is that it’s not gonna literally be humans dominating the universe in a hundred years or a thousand years, so who will it be? Which successor will be the worthy one?
I haven’t looked into the movement in a ton of detail. I am a transhumanist, so I definitely agree that it being literally humans, recognizable as the humans of today, dominating the universe seems unlikely. It seems like we will be able to rebuild our exterior from scratch. So I do think some kind of successor is gonna happen. And should it be worthy? Sure. So in principle, I think worthy successor is a similar idea to transhumanism, which I’m for, but I think it’s really easy to screw up the details, so it’s nice to always have a backup. It’s nice to always be able to hit the eject and be like, “No, I wanna be an ape again. I wanna be biological again. I screwed things up too bad. I wanna undo.”
Liron 00:05:52
All right, we got a question here from Ron Brun. He’s saying, “So did you update your opinion on AI 2027 in any way? Is the prediction so far more on track than not?” Yeah, great question. And actually, I asked a very similar question to fan favorite guest of the show, Steven Byrnes. You guys remember Steven was on the show about six months ago. I brought him back.
It’s the same question. I was like, “Hey, what do you think? Do you think AI 2027 basically nailed it?” Steven wasn’t tracking things too closely ‘cause he’s more thinking about longer term theoretical research — he’s kinda thinking all the way toward recursive self-improvement, the endgame.
But from my perspective, yeah, it’s totally going on track. I mean, what else could you expect? If you remember what it said would happen in late 2025, early 2026, it was like, yeah, there’s gonna be agents that don’t fully work, but then they’ll start working better. And I feel like that is what happened in a nutshell. The agents only kinda worked, and you had to keep kicking their butt so they would — you keep slapping them into shape, and now you don’t have to slap them as much.
And I’m gonna put a prediction on the record that in a couple of months, you’re gonna have to slap the agents less. They’re gonna have a longer runtime. They’re gonna do bigger pieces of the job, which is insane.
If you haven’t heard me say this yet, I’ve personally been using Claude Code, and I’ve been blown away, and I’m not the only one. We covered this on Warning Shots. Andrej Karpathy was like, “Yep, guys, I’m calling it. Programming is totally different. It’s nothing like it’s ever been before.”
As a lifelong programmer, I’ve been programming computers since I was nine years old. I’ve been programming computers my whole professional career. I don’t really think that I’m a software engineer at this point because I don’t really write the code. It’s more like I’m somebody who knows what a software engineer is capable of so that I can just tell an AI to go be a software engineer. That’s what I am at this point. So that’s pretty crazy.
And I know it happened to the writers two years ago where they’re like, “Oh, I’m not really a writer anymore if I’m just telling the AI what to write.” Michael Ellsberg came on the program and said that. And now I’m saying it about software engineers. So the ability to extrapolate is pretty powerful. If you can just extrapolate, I’m going to guess that other people are going to be out of a job. I bet that the job of an entrepreneur to start a business from scratch — I bet that’s going to be automatable pretty soon.
Liron 00:08:09
Take the next question here. Okay. Oh, whoa, we got a ten British pound donation here. All right. Thank you, NovaOmega4. So Nova is saying, “Most people engage and are more easily persuaded if you appeal to their emotions. Have you considered that approach? And where are the Doom Dog Debates plushy merch?”
All right. Great questions. Let’s start with the important question about the dog, Doom Dog. Let’s say hi to Doom Dog. What’s Doom Dog up to right now? Oh, hey, Doom Dog is currently deciding whether to pull the lever to pause the doom train. Doom train’s heading into the fire. Doom Dog, pull the lever. Yeah, I haven’t gotten the plushy in the store yet. Stay tuned for that.
All right, so the other question is have I considered appealing to people’s emotions? Well, I don’t know. When I get on the show, I usually just — I guess I do speak logically. I’m just like, “Hey, you wanna see the asteroid? Okay, here’s a telescope. Take a look.” That’s what I see myself as doing, where the asteroid is — yep, extrapolate the last couple of years. It just extrapolates to AI taking over and disempowering humanity. That’s generally my MO.
I’m not really an emotional guy. I’m not really a guy who makes emotional appeals. It’s just not really my thing. And there’s a lot of people who I think are in a better position to play the whole emotional appeal card. I think playing to my strengths is more of just being very clear — this is the threat I see. I don’t think other people are giving the threat its proper due. That’s basically my role.
Yeah, Punmaster is saying, “Reminds me of Liron’s emotional condom comment in an old Doom Debate.” Yeah, if you look at my personality, I think that things tend to emotionally hit me less than average. I use the analogy of an emotional condom. I think it’s related. It’s on the Asperger’s spectrum, however you wanna diagnose it. I don’t wanna cause any offense to people who have debilitating autism spectrum disorder because I’m obviously high-functioning. So I don’t wanna lump myself in with severely disabled people, but I think that’s how I would describe myself — an Asperger personality type.
Liron 00:10:19
All right. Somebody’s saying, “Why do you think LLMs will achieve true agency? The world’s top mathematicians, such as Terence Tao, say it is mathematically impossible for LLMs to be self-directed.” I don’t think Terence Tao said that. I’ve listened to Terence Tao a little bit recently. Obviously a smart guy. I don’t know why he’s not focusing more on AI doom. I don’t think that he said that it’s impossible for LLMs to be self-directed.
Okay. Josh Thor is saying, “Why is your P(Doom) so low? Yudkowsky seems much higher, and you consider yourself a Yudkowskian. Daniel Kokotajlo is at seventy percent.”
Liron 00:10:56
I don’t know. I just think that fifty versus seventy — there’s just so many things that can happen. I don’t really object to somebody who has a seventy percent P(Doom) or even a thirty percent P(Doom). I’m somebody who deals in orders of magnitude. That’s just my MO. I look at something, I analyze it a bit, and then I’m like, “Okay, this is roughly in this order of magnitude. It’s roughly between ten and ninety.” I’m just not passionate about sifting out fifty versus seventy.
The way I run my life is sometimes I notice that certain wins are big. So sometimes I’m like, “Oh, if I go out of my way to do this task, then I could save ten thousand dollars.” Okay, sure. But then if it’s like, “Oh, I go out of my way and I save fifty dollars,” I’m like, “No, screw that.” Because by the time I’m paying attention to it, it’s not really gonna be worth it.
That’s generally how I think. I just try to focus on a few big wins, and I don’t think nailing down whether the probability is fifty or seventy is going to be a big win in terms of how hard it is to do.
All right. Let’s see what else we got here. Punmaster is saying, “My P(Doom) is zero, but I still wanna buy a Doom Dog.” Yeah, fair enough. Doom Dog in the merch store. I hear you.
Somebody saying — oh, New Place to Frown is saying, “Hey, Liron, it’s Hugh Chungus from the Destiny Discord. Sorry for missing the convo and apologies for the event not being set up. I gave back the details and some personal stuff.” Yeah, no worries. Hey, if you guys want me back again, I’m happy to do it. He’s a smart guy. He’s got a smart audience.
Anonymous Caller Asks If Rationalists Should Prioritize Attention-Grabbing Protests
Liron 00:12:44
All right, let’s see. We got somebody in the arena who wants to go head-to-head or just ask a question. All right, let’s say hi to Tom. Hey, Tom. Welcome.
Anonymous 00:12:44
Hi. Can you hear me?
Liron 00:12:44
You got a question? Yes.
Anonymous 00:12:46
Yeah. My question is — sorry for the disguise, just because I work in tech and I don’t want people knowing I’m a doomer. I was just wondering if the kind of people who pay attention to AI doom—
Liron 00:12:57
Oh, okay, so you’re not sick.
Anonymous 00:13:04
No, I’m not sick. Sorry. Sorry for the delay, by the way. There’s a little delay in the voice or something.
Liron 00:13:10
Yeah. No, it’s all good. We can hear you.
Anonymous 00:13:12
Okay. I was just wondering if rationalists and the kind of people who pay attention to AI doom are temperamentally not suited to effective political action because they kind of see protesting as being low status or undignified or not intellectual. But maybe you need to do that kind of thing — chaining yourself up to the side of a tree like the environmentalists do — in order to show that your conviction is strong enough, which persuades the masses or something. Anyway, that’s the question.
Liron 00:13:56
Yeah. It’s an interesting question. It comes up pretty regularly. Have you seen the latest thing that’s going down? I think Michaël Trazzi and Guido Reichstadter — they’re fasting for, I don’t even know, a week while marching seventy miles or something. Have you seen that?
Anonymous 00:14:13
No, I’ve not, I’m afraid.
Liron 00:14:16
Yeah. That’s pretty cool. And before that they did a hunger strike. I think Guido went thirty days, which is pretty impressive. You gotta hand it to him. That guy really is thinking along your same lines. He’s almost chaining himself to a tree. He’s doing similar kinds of stunts.
And when this first happened on social media — 2023, when a lot of the doomers like myself, we saw ChatGPT, and we started getting all doomy all over Twitter, and everybody’s like, “Oh, come on, you guys don’t believe that, right? Because if you did believe it, wouldn’t you chain yourself to a tree, or wouldn’t you go commit arson in a bunch of places, or wouldn’t you start going on a murdering rampage?” And I was like, “Would you really do that? Think about it for two seconds. Does that really make sense?”
So thinking that we’re going to be doomed soon just doesn’t suddenly make you go that crazy. I don’t know. But if you have a specific suggestion, happy to hear it.
Anonymous 00:15:07
I don’t know. It just seems like — Christianity, they had martyrs. They had people willing to die for their beliefs, and maybe that’s what you need for a big movement. That just occurred to me. I don’t have any sort of specific suggestions, but I’m just wondering if it’s something you’ve thought about before.
Liron 00:15:24
Yeah. No, it’s a good point. I think a lot of us are willing to die for our beliefs, but we just don’t wanna die pointlessly. So if it was like, “Hey, all you have to do is a hundred people from the doomer community just have to shoot themselves, and then all the AI companies will stop building AI,” I’m sure we’ll find a hundred volunteers. I would even consider volunteering.
Anonymous 00:15:43
Wow. Well, yeah, anyway, I didn’t have anything else to debate about. It was more like a question and answer than a debate topic.
Liron 00:15:52
Yeah. Totally.
Anonymous 00:15:52
I’ve got some delay problems anyway, so you should probably take—
Liron 00:15:55
All right. Yeah. Thanks for coming by. Very interesting question. And this also gets to another discussion. Oftentimes, people say things like, “Man, if everybody knew how much damage these AI companies were doing, they would come torch the place.” And then other people will rise up and be like, “Look at these doomers advocating for violence.”
And it’s like, look, there’s a reason violence is just not done much in society besides by criminals who can’t think more than two seconds into the future. Violence — when you have a society that has law enforcement and power spread to a bunch of different people, one group being violent generally just gets thrown in jail or put down. So this whole idea of if only we would just be violent, then that would truly show what we mean — maybe.
I mean, it kinda worked for Martin Luther King Jr., right? Because he would make all these spectacles, and they would get on TV. But the mechanism of action of the civil rights movement, the nonviolent protests back then — I think the mechanism was that people would watch it, and they would feel guilty. So it’s specifically because he created a spectacle that made people feel guilty, like, “Man, do we really wanna keep doing that to these Black people? Keep treating them so badly, so brutally just because they wanna come eat lunch at the same cafeteria?”
I think that’s what helped him make his mark. Somebody can correct me if I’m not understanding this right, but his particular type of civil disobedience is like, “Yeah, we’ll get thrown in jail a bunch” — and that’s the thing, they were nonviolent. So I actually do support nonviolent protests. I like what Guido’s doing. He’s been totally nonviolent with all of his hunger strikes and protests in front of AI companies. So I do support that.
But then the idea of “well, what if we get violent?” — it’s just like, why? What is that going to accomplish? The violence doesn’t convince people, and then it just turns people away. I think that’s what Tom was saying. He’s like, “When people see you being violent and lashing out, then they’ll think you’re serious, and then they’ll be convinced that they should take your movement seriously.” But I’m just not really following that logic.
As somebody who normally doesn’t go in for backfire arguments — I normally don’t say, “Oh, that’s gonna backfire.” But I just don’t think being violent directly accomplishes anything. You burn down one data center. Okay, yeah, you’ve made one little dent, but I just think that dent will wash away very quickly.
The Good Mainline Scenario
Liron 00:18:30
All right, let’s see. GraveFable25 is saying, “What would you say to someone who thinks the current LLM craze is a dead end and actually slowing down advancement toward actually dangerous AI, thus having a positive effect on our P(Doom) chances?”
Yeah, I wouldn’t go that far. I think there’s something to be said that if people work on LLMs instead of the next generation, that is possible that there’s some slowdown there if the next generation is the more dangerous part. But I actually think that when people get really excited about LLMs, it makes people go all over the place. Some people are like, “Yeah, I’ll go optimize the spread of LLM tools,” but then other people are like, “I wanna make what’s coming after LLMs,” or “I wanna make better LLMs,” and they throw a bunch of stuff against the wall. So putting a spotlight on LLMs puts a spotlight on the whole field.
I think there’s only a minor effect where it’s cannibalizing efforts to build non-LLMs, and then there’s the piece where I actually think LLMs might be a big piece of what’s dangerous anyway. This is actually an update I’ve made. I talked about this with Steven Byrnes in the episode that’s coming out in about ten days.
The agents that are based with LLM at their core — they’re not fundamentally RL-based, they’re LLM cores and then maybe they do RL or an extra layer of RL at the end, but it’s mostly the LLMs — those agents have surprised me in how far they can go. I’m not super surprised that Claude Code is as good as it is, but I thought maybe this would take a paradigm shift before we would get there. So I’m kinda surprised where I don’t feel there’s been a huge paradigm shift, but I feel like the agents are getting super robust.
I didn’t think they could necessarily pull that off, and yet here we are. So this idea of “people can just work on LLMs and we won’t be doomed” — that may be true for the ultimate doom of the AI being so much more intelligent than humanity. But there’s also this other doom of, okay, so everybody else just has what Dario calls “geniuses in a data center.” And yeah, the geniuses will respond to commands, so they’re aligned to their human master.
Let’s say best case scenario, I think that’s plausible — they’ll be aligned to their human master, they’ll just be LLM-style geniuses in a data center, which is what Dario thinks is really likely to happen. I actually do think that’s a perfectly plausible near-term outcome. But now the problem is, okay, so now we have a bunch of governments and a bunch of people instructing a bunch of geniuses what to do, and the army of geniuses can still overpower entire nations, even the whole world. So even that doom scenario is pretty scary.
Although, to be fair, I talked about this with Steven Byrnes. I do think that I’ve been updating where I’m giving a few percent more probability to a scenario where it doesn’t go completely wild and crazy relative to humans. It doesn’t foom immediately, and maybe we have ten whole years when we’re only dealing with a genius in a data center before the next generation of insanely sky-high AI swoops in and renders everything moot.
Maybe we have these ten years that are crazy but also kinda good because everybody’s rich, there’s so many goods and services being produced so efficiently, there’s universal basic income, and this just all happens in ten years before the real AI comes in. And maybe the real AI never comes ‘cause we got lucky. Maybe the loop of using feedback from the world and doing reinforcement learning in a way that’s not an LLM — maybe that’s delayed for decades.
So I guess this would be my mainline good scenario of not pausing AI and still somehow surviving — that this token-based “geniuses in a data center” AI somehow muddles through. That’s my current good mainline scenario. Feel free to improve it.
But when I ran this by Steven Byrnes, he was just like, “Eh, I’m pretty sure that other AI is coming,” because he thinks that something the human brain is doing is already the kernel of something an AI can do better that it’s not doing today.
I mean, look, it’s nice to have a richer world model. It’s nice to have agents that aren’t taking over the world, that are responding to your commands. And specifically — you know how the doomers are always pointing out that if you reinforcement learn an agent just to bring you coffee, bringing the coffee might cause ending the world because it wants to make sure nobody can stop it from bringing the coffee?
I think we now have a class of agents that we can use to bring the coffee and do agentic tasks — building my software — and we know that class of agents won’t end the world because we know that class of agents has this core that’s just trying to predict the next token. And yes, it’s also been consequentialist-optimized above that, it’s been trained to get outcomes, but it’s still not hardcore enough that it’ll run away. So that is an update. There’s just more stuff we can do before we get to the endgame.
Liron 00:23:03
All right, I’m reading your questions here. Somebody’s saying, “You should make a video on solutions against the doomer outcome. Pretty much everyone agrees, but no one has a robust solution other than slowing down or halting progress.”
Yeah, I mean, solutions are hard. It’s part of the problem. So I see fear-mongering or raising the temperature as a big part of the solution — helping other people see. ‘Cause when you say everybody’s concerned, I think you’re probably thinking about an in-group or a circle, whereas average Americans or world citizens are somewhat concerned overall, but they don’t see it as super urgent and needing a pause. So that’s the first step. Once everybody’s plucking the low-hanging fruit of agitating for strict regulation immediately, and you’re still telling me there’s nothing else we can do, we’ll go from there.
Okay. Somebody’s saying, “Hey, Liron, I’m curious, what’s your most granular picture of current AI capabilities, low-hanging fruit for new capabilities and current bottlenecks, and how that informs your timelines?”
Yeah, so the overall picture of current capabilities — it’s basically AI 2027. There’s all the capabilities that you know and love from last year, except now agents work, and also in the meantime, everything’s getting better. The chatbots hallucinate less. They search the web better. They pull together better answers for you. They’re more handy with multimodal and documents. Just everything is getting really polished, and really powerful, and hallucinating very little, really fast. And also the agents are starting to work, and the time horizon is increasing.
I must say I’m impressed by METR. When they first did a time horizon graph, I was like, “Hmm, is that really a good metric to capture the essence of AI progress?” And I think it’s been a pretty good metric. Just — how long can you stay coherent and get stuff done? I think that’s worked pretty well. And when the METR graph saturated, sure enough, I just became unemployed as a software engineer. I mean, luckily I run a company called Relationship Hero, so I’m at the top, so the only way to replace me is to just compete with me as another company, which I think will happen. But at least I have another year or whatever.
So I’m in the position to benefit. Everybody who’s at the top of a company, maybe the company was five hundred people, and it’s gonna go down to five people, but at least those five people will have fatter profit margins for a while. That seems to be what’s happening economically.
But yeah, the question is low-hanging fruit or new capabilities and current bottlenecks. We keep getting incremental releases. NanoBanana 2 I think just came out. We’re gonna be trying that for making thumbnails. And how does that inform my timeline? I think there’s this scenario where nobody cracks the next paradigm of AI, and yet agents just get as powerful as geniuses in a data center. That is my new update. But I still think literally everything in AI 2027 is super plausible. I think AI 2027 is an amazing piece of work that’s holding up really well.
Hugh Chungus Joins the Stream
Liron 00:26:00
All right, we got Hugh Chungus joining the stream. Hugh, how’s it going?
Hugh Chungus 00:26:06
How’s my mic?
Liron 00:26:08
I can hear your voice clearly, but there’s a lot of background noise. It’s not terrible. All right, let’s try.
Hugh 00:26:13
Okay. Yeah, how’s it going? I follow the show a lot. There’s a lot of discussions that happen in the Destiny Discord, in the Philosophy channel. I don’t have very structured points to go through, but I think it would be good to talk about some of the typical arguments that people run down.
Hugh 00:26:41
So you usually make a distinction between “will they” and “can they.”
Liron 00:26:50
Mm-hmm.
Hugh 00:26:50
When people just encounter the doom debate, those things are usually always completely tied to each other, and you have to do a lot of work to separate them. I was just wondering if you had a strong example that you could use normally to say there’s a good case for why they will — a really strong straight path for why they actually will choose to do that given their capabilities.
Liron 00:27:27
Yeah, good question. So there’s a few different stops on the doom train on the “will they” side. Let’s assume that they can kill everybody, which I think is a pretty strong assumption. It’s a pretty strong claim. They can kill everybody, and then it’s “will they” kill everybody.
And there’s a lot of reasons why people think that you can’t. So some of the famous “will they” stops — you’ve got instrumental convergence. The idea that no matter what else they’re trying to do with the world, it kind of involves killing everybody. Do you wanna go to space? Okay. Do you wanna harness the earth’s resources the best? Okay. So now the earth needs to be really hot, and the human cities need to be data center land or whatever.
So scarcity, instrumental convergence — those are some of the “will they” reasons. And another “will they” reason is the orthogonality thesis, which is more like a counterargument, but it’s just — they’re not going to be stopped by this idea of, “Oh my God, the humans are dying.” It’s like, so what? That’s not going to stop them.
Hugh 00:28:19
Yeah. And people aren’t, I guess, familiar with a lot of those concepts generally speaking. I’m quite sold. I share most of your views. I think I’m around fifty percent by 2050 for a doom scenario.
Liron 00:28:39
Okay.
Hugh 00:28:39
But I do find Robin Hanson’s disagreements with you quite compelling in that humans already are super intelligent in some sense when they get together and organize, build institutions, regulate each other. And I think a lot of the time when we’re discussing “will they,” we’re talking about their capabilities above an individual human, when really we should probably have a slightly higher threshold — like, can this bot out-navigate Microsoft, or plan better than Walmart? Things like that.
Liron 00:29:17
Yep, fair point. All right, I’ll give you your answer off the air.
Hugh 00:29:19
Okay, thank you.
Liron 00:29:20
All right, thanks for calling in. So this idea of when we talk about how powerful AI is, it’s common for people to be like, “Okay, but we just have these large corporations, so you’re just talking about another super intelligent entity just like Walmart.” Walmart is super intelligent. It’s more capable than any individual human.
And then usually the conversation goes to, “Okay, yeah, Walmart is somewhat superhuman, but you can be significantly more superhuman than Walmart.” Because Walmart, as efficient as it is, at the end of the day, it doesn’t take that many smart humans to outmaneuver Walmart in any particular thing. We’ve seen it where you have startups — small startups that come in and disrupt large organizations. That’s some evidence.
These big companies, as smart as they are, they can’t successfully defend against disruption. Their operating model just doesn’t lend itself well in terms of how to manage humans. But that’s the funny thing — there’s no reason why all these companies can’t just run massive startup incubators internally and handle disrupting themselves. And yet it’s much smarter for them to just wait for somebody else to be on track to disrupting them and then buy them if they can, or else just get out-competed and die. That happens much more often than them totally disrupting themselves.
And the main reason is just the organizational management of humans. It’s hard to work for a big company and still be properly incentivized to disrupt the big company. It’s hard for management who cares about their own bonuses and their own performance metrics to be like, “Oh yeah, I disrupted the company while I was also working on this other project.” It’s just tough as a human management problem.
Producer Ori, Instrumental Convergence, and Liron’s Alignment Scenario Update
Liron 00:30:54
All right, we got producer Ori in the waiting room. Hey, Ori. How’s it going?
Ori 00:31:00
Good, good. Am I live now?
Liron 00:31:04
Ori is live. All right. Another audience favorite, producer Ori.
Ori 00:31:10
Hey, everyone. It’s so weird. We’re live, but nothing that much changed, but we’re now live to however many people are watching.
Liron 00:31:22
I know. There are dozens. Dozens there are.
Ori 00:31:25
Dozens of us. Yeah, it’s nice. Well, the stream — I like how the stream is going. I’m here live from the Doom Debates editing studio. You’re in the recording studio. This is the editing studio.
Liron 00:31:40
Yep, production studio, editing studio.
Ori 00:31:45
Yeah. So what’s new with you this week? Oh, man. I mean, we’ve just been working on a bunch of different episodes. I liked how the Discord debate went on the Destiny Discord. And it’s cool that Hugh Chungus was just here because Hugh was the one who set that up.
Yeah, I really liked that one because I feel like that harkens to the roots of Doom Debates, and it reminds me of why — that’s how I even learned about AI doom, was literally hanging out with you, and you’re like, “Guys, this is ridiculous.” That Discord debate was basically a virtual version of what it was like hanging out with you post-ChatGPT.
Liron 00:32:37
Yeah, totally. It’s true, and I think we’re gonna lean into that more just because, you know, we did this episode, and a few months ago when you were actually here, being the Doom Debates robot, we went out on the street and debated people live.
Liron 00:32:51
And whenever we do those episodes, we do get a good amount of reactions being like, “Oh, this is my favorite content. I love seeing what these random people say.” I guess the difference is that people come on the show and they’ve already rehearsed their arguments because they’re kinda upper tier participants in the debate. They’ve got a reputation. They got something to lose, and they’re like, “Oh yeah, I’m not gonna be so dumb as to say this.”
But then other people who haven’t thought about it that much but still have an opinion anyway will just walk straight into all the doom train traps that we have opening arguments for. And they’re like, “No, it doesn’t have emotions.” And I’m like, “Okay, so?”
So people like it, and I’m going to be leaning into it just to see if it helps grow the channel. Because it’s not like debating people on the street is my terminal goal. I think the terminal goal for the show is more being part of the discourse of people who actually make decisions — people who make the decisions of policy on regulating AI companies. Have those people come out and debate. Have the AI companies themselves come out and debate.
I feel like that is our highest point of leverage — to facilitate people accounting for themselves and their views, and just saying what their views are, because there’s a bunch of different views even at the very top. That’s the biggest service this show can provide. But in the meantime, to also just raise awareness and stir the pot and make people realize that this is urgent, that normal people’s arguments don’t hold up at all — I think we can do that along the way.
Ori 00:34:15
Yeah. What I find interesting about it — I’ve been steeped in the debate for so long — but what I find interesting is they come at you with an argument, and I immediately think, “How would I respond?” And then we get to see how you respond. Because I think—
Liron 00:34:30
Right.
Ori 00:34:30
You’re surgical. I would’ve gotten derailed. I think one of the first arguments of the guy was, “LLMs not AI, bro.” And you just go, “Forget that.” Fine. You don’t wanna call it AI — you call it an LLM, let’s just call it X. Yep.
Ori 00:34:51
So yeah, we can learn from how you debate other people, and it’s just entertaining too.
Liron 00:35:00
Yeah, that is true. It’s like we all have this belief, and we’re all trying to play this video game of how do you steer people to the outcome of seeing what we see, and then people say different things. And you can play along.
Liron 00:35:14
Let’s see. Okay, we’ve got a question here. $19.99 donation. Thanks, EJJ 2025.
Yeah, so he says, “On instrumental convergence, if — I think you’d pursue efficient low-uncertainty subgoals, a little power seeking, not extreme power seeking. Example: to cross the road, buy a construction vest to lower your probability of getting run over, not take over the world to ban cars.”
Right. It is true that when you look at the Claude Codes of today, it does seem like they search for actions that are more reasonable. Because when you tell them to do a bunch of stuff — we talk about instrumental convergence being like grab somebody’s money or whatever, but just having a sheet of paper and writing notes in it is an instrumentally convergent action. I’m using memory. You don’t have to go steal all the memory in the world just to use some memory to accomplish what you’re trying to do.
So I think that’s what EJJ is saying — we’re seeing small-scale instrumental convergence. But the doomer argument is instrumental convergence is going to happen at a large scale because AIs will have large goals. Maybe what EJJ is saying is, well, if you have a small goal, then the amount of power and resources that you need to grab to achieve it might by default be small.
And it might be true, but it also depends how much you wanna crank up the probability of winning. Because even when you’re just trying to make some coffee, even when you’re just trying to write a little piece of code, there’s the defense aspect. There’s always that problem that something will come from the outside world and derail you — that somebody will walk into my room right now and shut off the stream or whatever. And if I wanna maximize my probability of getting through my small actions, I do have to start thinking very defensively. So it starts getting a little tricky how to tell the AI not to go hardcore.
Liron 00:37:05
All right, EJJ is saying, “I see subgoals can conflict. Aggressive resource/power seeking can create adversarial dynamics that reduce survivability, so capable agents may avoid extreme growth because it’s self-defeating.”
Sure, yeah. At that point, I don’t even think we’re talking about instrumental convergence anymore. I think now we’re talking about multi-superpower dynamics, which I see as kind of another part of the analysis. I don’t see it as super related, but I agree that if you already assume that there’s a lot of different superpowers out there, then doing anything to create conflict is an important consideration.
But I think the instrumental convergence argument — the clearest case is when there’s not some other superpower who’s going to slap you down. It’s just you and all the weaker humans. And why do you have this drive to just seize so much power and resources? That’s the original instrumental convergence argument.
Ori 00:37:58
Yeah. I mean, but I’m curious how instrumental convergence plays out, how we’re seeing it playing out with current AI. Is it happening now or no?
Liron 00:38:07
Yeah. I would say it’s not happening that much, honestly. You gotta take the L. I don’t think it’s a big L, but I’m happy to say — I mean, do I think it would be a big I-told-you-so moment if today’s LLMs were showing a ton of instrumental convergence? I’m not sure. Because when you look at what I’m asking the agent to do — the Claude Codes of the world — I’m just using that as an example because I have personal experience with it, and I think it’s amazing.
I think it’s doing really good, pretty long duration tasks. Certainly, it’s doing ten-minute tasks for me personally, and other people are reporting it’s doing thirty-minute tasks for them, and its ten-minute task is literally equivalent to my three-hour task. It’s truly insane. I can’t believe this is happening. And then I’m just like, “Yeah, it looks good.” And if it doesn’t look good, I give it two sentences max of feedback, and then it just does the equivalent of me working for another hour.
But yeah. So I’m using that as an example, and what do we make of the fact that I tell Claude Code, “Okay, I wanna make this feature and maybe install this library, figure it out,” and then it figures it out, and in the process of doing so, it doesn’t hack my whole computer to turn off all the other programs or whatever to seize power of my computer?
On one hand, it’s the shape of the problem, where it found a path to success that didn’t involve creating a lot of chaos. But it didn’t maximize its probability of success because it didn’t think defensively. It didn’t proactively think, “Okay, how do I remove all obstacles?” It was just like, “Yeah, this will probably work, and if it doesn’t, that’s okay.”
I think as Steve Byrnes would say, it just comes down to the nature of today’s AIs because they weren’t born out of this reinforcement learning optimization loop. They have this engine under the hood that’s just — what is a good stream of tokens? What would a human think right now? That’s the way they were born. They make this stream of thinking tokens, where it’s like, if a human were thinking about this problem, what tokens would they output?
And fortunately, humans don’t sit there thinking, “I gotta take over the world right now.” Humans just think about the problem directly. So the AI just kinda follows in the footsteps of human-style thought process. It doesn’t expand the problem out to be like, “How do I ensure success? How do I dominate this problem in a very broad sense?”
And so it’s working great. And I don’t think that the future failure mode means that AIs will just randomly — I think we gotta abandon this idea of the AI that’s trying to fetch your coffee will take over the world. I think that was just — we have to take that in the theoretical perspective of, yes, technically, you can add another point-nine percent probability if you take over the world. But I think we gotta abandon the idea that a coffee AI will take over the world.
I think that the AI taking over the world will be closely associated with the AI having bigger-scoped goals. So if you tell the AI to have a Mars program, at that point, okay, maybe it doesn’t have to literally take over the world, but I feel like it has to take over a pretty large fraction of the economy. The same way Elon Musk is personally taking over a larger and larger fraction of the economy.
I mean, if you look at his current, his latest designs for SpaceX, he’s saying, “Yeah, we need to do our own semiconductors. We need to do our own energy.” The Elon Musk companies are on track to become a larger and larger fraction of GDP because it’s in the nature of Elon Musk’s goals that he just needs most of the human economy to do his bidding.
Ori 00:41:18
Yeah. That’s a great point. You touched on it earlier, but that concession or that sort of change in thinking that, “Oh, a robot that wants to fetch the coffee can destroy the world” — you’re saying a paperclip maximizer, that thought experiment, that’s pretty unlikely. That seems like a pretty big change.
Liron 00:41:40
I think this change happened a while ago when we first saw that the GPT-style AIs weren’t born out of having the singular goal. I mean, they did have the singular goal, which is predicting the next token, and a lot of people breathed a sigh of relief — “Oh, you just predict the next token, and then you come out, and you’re not a consequentialist. You’re not somebody who just cares about steering toward an outcome. You’re just willing to sit back and chat. You’re just a chill dude who just happens to also have the power to get up and go to the fridge.” And everybody breathes a sigh of relief.
And even Dario is saying in his recent letter, the one about—
Ori 00:42:16
“The Adolescence of Technology.”
Liron 00:42:17
Right, exactly. Even in his recent essay, he’s like, “Yeah, I don’t think AIs necessarily are going to just plow forward to an objective.”
So I use this analogy with Steve Byrnes. Look, we were studying flight, and the doomers proposed Newton’s third law. You know, if you wanna go somewhere in space, you gotta hurl something the other way. Equal and opposite reaction. That’s how rockets work. They hurl something out the back, so they go forward.
And then a bunch of people are like, “No, no, no. Look at this. You can point the engine sideways, and you get the wings up, and then the air smashes into the wings at an angle, and there’s aerodynamic forces, there’s pressure forces, and that’s how you get the plane to go up — you actually point the engine sideways.”
And I’m like, “Okay, I admit that’s a neat trick. It’s working great in the atmosphere. We’re having a really fun time here on Earth, but I’m telling you, the engines are going to point down. When you wanna go up, the engines are going to point down. That’s how it’s going to work pretty soon.” In other words, a rocket.
And Elon Musk has even said, “Yeah, we’re going to use the Starship rocket. If you wanna go point to point on the Earth, we really are just going to hurl the Starship rocket upward in a parabolic trajectory. You’re just going to throw yourself from one part of the globe to the other part of the globe.” So literally on Earth, you’re just going to be a baseball pitcher. You’re just going to pitch yourself across the Earth in a parabolic trajectory.
Ori 00:43:29
Sure.
Liron 00:43:29
So that’s the analogy — we have these AIs. They’re just predicting the next token. They’re just going horizontally and bumping into the air molecules. It’s all good, man. Don’t worry about rockets just pointing to places in space. It’s all good. But no — it’s just this cool temporary situation.
We’re In an Era of Centaurs
Ori 00:43:47
Temporary. I mean, that is so true. Right now we’re in a — what do they call it? Centaur? We’re totally in the centaur world. People are prompting Claude Code, and they’re like, “Wow, this is so amazing.” But you had a great tweet about it. You made a funny joke about how, oh, you’re so useful to Claude Code right now because you can tell it to continue going after each checkpoint.
Liron 00:44:13
Yeah, exactly.
Liron 00:44:15
Yeah, the thing I tweeted about was how when I develop with Claude Code, my workflow is literally — I tell Claude something to do, and then it does it, and then I check it in my browser. Claude Code has — there’s a feature in Chrome where agents can talk to it, it’s a debugging feature built into the Chrome browser, and Claude can use it, and Claude can click around and see what’s happening. But it’s slow for now. And also it crashes sometimes.
So what I end up doing is I just use my browser myself, and I send it screenshots or I take a look myself. I’m like, “Yeah, you did a pretty good job, but just do this and this.” And the funny thing is it’s obvious that in a couple of months it will just look at Chrome, but for now it can’t. So then I tweeted, “As a professional software engineer, I have a very valuable role to play, which is I help Claude see what it just did so it knows if it looks good or not, and then it can do the next code change.” Nature wins versus AI once again.
Ori 00:45:02
Oh, wow. I mean, it’s kind of unbelievable. You just projected forward. And this is something you’ve said — when ChatGPT came out, it’s sort of like, okay, it’s not that hard to put this in a harness, but the engine is there. Now put the chassis around it, and it’s gonna be quite powerful. And you can take that same analogy with the current coding agents. That is a form of engine.
Liron 00:45:28
Right.
Ori 00:45:28
And just put it on a slightly better chassis, and it’s gonna be very, very powerful. So all the stock market changes that are happening — wow, Claude Code can do COBOL, and then the IBM stock goes down. I mean, you should just be betting on Liron’s insights right now.
Liron 00:45:47
Yeah, I mean, I don’t have that many left. That’s the thing. These insights were pretty obvious, and I’m running out fast because they’re all just happening, and there’s not that many left.
I said when I saw GPT — I really like this plane analogy because it’s like, yeah, the engine was there, but it was an engine that would just blow the air over the wing. So the wing wasn’t there. You as a human would just sit there at the computer and ask Claude or ChatGPT the question, get the answer, and then go do something.
But I’m like, listen, the fact that it’s telling you what to do, this is most of the engine. If this can drive the air over your wing like this, we’re just going to have rocket engines soon. Trust me. We’re getting there. We’re just mastering the fundamental principles of lift here. There’s deeper principles at work.
And specifically, how did I know? How did I see this principle? As I explained, it’s because narrowing down the search space of answers to give you is an operation that we weren’t able to do before. We weren’t able to process large natural language queries on arbitrary domains, pulling together knowledge across different domains, reasoning — yes, obviously reasoning — modeling the problem, and then giving you a ranked list of plausible things you could do to solve your problem. In other words, general intelligence.
Many of us witnessed that general intelligence was here, and we saw that taking action wasn’t a difference in kind. Even the first GPTs could take action a little bit. They could use tools a little bit. So it was already so obvious.
It was obvious that the hard part was finding satisfactory solutions. The essence of creativity is taking this unstructured problem, searching a space that looks exponential, and narrowing down to the answer. And we’re like, look, it’s doing it. It knows how to drive mass. It knows how to take the fuel energy and use it to drive mass — to use the engine analogy.
Noah Smith’s Updates on Superintelligence
Ori 00:47:28
Wow. Okay. Plus one point for Yudkowsky to help with that theoretical insight into what intelligence is.
Liron 00:47:40
Right. Well, credit to Noah Smith. He was — he’s the only one I saw who admitted that he was wrong. So many people were wrong back in 2023, where they’re like — Marc Andreessen is the one who comes to mind, who very confidently was just like, “It’s just math. It’s just saying the next word. It can’t do anything.” Well, now it can. It can do stuff.
But Noah Smith, he’s like, “Yeah, in 2023, I wrote a post saying the only thing these AIs would be able to do is convince people. And yeah, I’m kind of worried that they’re gonna use their words to convince people, but realistically, it’s fine. We’ll just shut them off.” And now he’s coming back and saying, “Okay, guys, they can do more than convince people. They can do a lot of work actually, so I’m more scared now.”
Ori 00:48:17
Right. And yet he still has some sort of wishful thinking about the risk associated with it.
Liron 00:48:26
Right. Yeah. In the recent debate, I thought he’s probably going to notice some instrumental convergence things and hopefully admit he’s wrong again. Or — well, I’d rather admit that I’m wrong if that happens. ‘Cause that would be a better world. The world where I’m wrong is better.
Co Co Chats Cybersecurity
Liron 00:48:44
All right, let’s let in Koko from the waiting room. Hey, Koko.
Koko 00:48:45
Hey, what’s up? It’s Koko, by the way. I don’t know if the full thing—
Liron 00:48:49
Yeah. Hey, guys. First name Ko, last name Kaku.
Koko 00:48:51
Oh, yeah. So I just wanted to ask. I’ve seen some of your points on your YouTube, especially the one against Destiny. And I’m wondering why you never give credence to — so one of the points you make is that these AIs might get so intelligent that they become expert hackers. But I never see you take the other side and say what might happen with cybersecurity.
Why would the AI not be able to be so good at cybersecurity? Sorry if there’s some motorcycles in the back. I don’t know if my mic’s picking that up. Why would it not be able to protect against these hacks if it’s that capable?
Liron 00:49:28
Yeah, totally valid question. So you have all these AIs hacking, but why don’t you have AIs defending? I think offense-defense balance is hard to reason about. And I do think that we may be in a world for a while where so many companies — all the richest companies that have shareholder interest in mind — don’t want a chaos world, and so they do have most of the defense resources.
In other words, the same equilibrium we have today. The good people do tend to wanna work in defense because they don’t wanna go to jail. If you ask, “Why don’t terrorists overrun the world?” — well, ‘cause people don’t wanna go to jail. They don’t wanna ruin their lives, and they know they can probably eventually get caught. And on the other hand, if they’re so smart that they can do all this hacking, they have million-dollar jobs available to them, so why not just take the million-dollar job? So they have the carrot and the stick.
But the problem is with an AI, the AI doesn’t really care. It’s like, “Okay, I’m running cycles. Somebody told me to be a terrorist, and I can copy myself.” So the incentives change when you have AIs that can clone themselves.
But that said, if we’re just in this near-term regime — geniuses in a data center, but you have to pay for the genius, and every genius costs a significant amount of money, and the richer companies can get more geniuses — maybe defense can keep beating offense.
So at the end of the day, I’m not a huge pessimist when it comes to slightly superhuman AIs or when it comes to things where we still have an off button and human companies are still in control. I can see myself being kind of optimistic about that scenario. What I’m really worried about is just the endgame, where eventually the AIs just get way smarter than humans. And yeah, some of them might be good and wanna protect humans, but it’s just all over our head anyway. Things just happen so fast.
We just press a single button and okay, now there’s a bunch of terrorists. Oh, but we press another button, and now there’s a good guy. But it’s just dealing with ultimate chaos. That’s roughly my intuition. We’re gonna be the plants — the slow-moving plants — and now the animals are coming.
Koko 00:51:14
Right. Well, my counterpoint — first I’d ask, when you say that the AIs will at some point get way too good and it’ll just be too hard to control, would this be post-takeoff or still in the lead-up to takeoff? ‘Cause I think that matters.
Liron 00:51:30
So there’s Yudkowsky’s law of continued failure. There’s this idea that we’re doing so many things wrong that we’re just gonna fail in dumb ways, way before we have to fail. But instead of trying to predict exactly when we’ll fail and ruin the world, I’m just predicting that we will ruin the world by the time there’s the serious takeoff.
Koko 00:51:48
Right. Okay. Well, ‘cause my counterpoint to that would be that as we get to this point of takeoff, our technological advancement will improve through these AIs. And I think we’ll just kind of obviously start augmenting ourselves.
If someone had the capability to think faster right now, they would do that, right? If it was an easy procedure, not something you’d have to do a whole annoying surgery for. So why wouldn’t we — I don’t see us being these fragile humans that we are today at that point in time. Do you not see us kind of co-evolving to a point where maybe we’d be able to have some kind of reasonable defense against hacking of that nature?
Liron 00:52:36
Yeah, great question. You’re killing it with these questions. So the idea of, “Hey, we have a few years. Can’t we just become smarter ourselves?” I do think the holy grail of alignment might just look like somebody who’s still truly a human, truly one of us, the good people who live today, but have these much more powerful brains, so we can just go toe-to-toe with the computers.
Because you’re basically describing that the aligned AI just is the augmented human brain. The problem is, if you just look at the timeline, I don’t really see us building an augmented human brain that’s like, “Oh yeah, it’s a whole data center, fifty gigawatt data center, but it’s all growing out of a human brain.” I think we’ll get there eventually, but I don’t think we’re gonna get there in the few years until potentially the big AI takeoff.
Koko 00:53:20
Well, my counterpoint to that would be — we need the data centers to get to that point of intelligence, but then we’re running inference. We don’t need the giant data centers at that point. So we can get the intelligence part without needing a fifty megawatt data center in our brains. We would have access to intelligence.
Liron 00:53:39
Are you just saying that we’ll have more efficient intelligence, so we just need the brain with a little extra energy, but not a whole data center’s worth?
Koko 00:53:46
Yeah. And I had another point. But yeah, I guess that would be my point.
Liron 00:53:54
I mean, it’s still a timeline thing. Yes, brain augmentation is possible, but there’s a million kinks to work out. And one failure mode is, okay, you kind of augment the brain, but then you turn it into a psychopath, so you have to preserve the personality. I mean, we’re talking about a pretty messy—
Koko 00:54:06
A lot of room for it to go wrong.
Liron 00:54:07
Yeah. And I mean, maybe the AI will find it easy. But whatever it is, I just suspect it’s gonna lag at the very least a couple years behind. Because I think we’re so close to just pressing enter on the computer and having the software singularity.
There’s so many ways where software could be — I mean, just think about the transformer itself. You can take out a computer from ten years ago and run one of the lighter-weight models, haiku or whatever, and it’s like, “Oh, a computer from ten years ago could have talked to me. Huh, I didn’t know you had it in you. Oh, you could talk?” By running a current model.
And I think it’s the same thing. Somebody’s going to make some tweaks to the current transformer algorithm, and it’s like, “Oh, hey, look, we just leapfrogged a few years ahead in progress.” And I don’t think augmenting the human brain is anywhere near that kind of low-hanging fruit.
Koko 00:54:50
Yeah. So then I’m curious what you’d think about — these malicious things that might happen. The reason I made a point about if it’s pre or post-takeoff is because if it’s post-takeoff, a lot of things become really easily fixable that weren’t pre-takeoff.
‘Cause once we’re at that point, if someone has negative intent, I think it would be a lot easier for someone to fix the negative intent in their own brain than to go and act out some massive malicious attack on other people — if you just gave them the choice. “Hey, quickly fix this thing that is causing you to suffer and seek vengeance on other people, or go out and cause a massive attack.” I feel like it would just be easier if you have AGI at that point post-takeoff to just augment yourself, fix whatever defect you have in your brain that’s making you a psychopath. I don’t think psychopaths enjoy their lives, except for maybe a couple minor—
Liron 00:55:53
Right, right, right. Okay, this is good. You’re really coming at it with these good questions. You kinda skipped ahead. Feel free to email these to me. These have all been great. But yeah, let’s do this as the last one for now.
So you’re basically saying if we had superintelligence, wouldn’t humans kind of converge to having better personalities? Because psychopaths could fix themselves. Is that kinda where you’re going with this?
Koko 00:56:14
Yeah. And assuming that it’s not a good feeling to be a psychopath — it’s something you would want to fix.
Liron 00:56:19
Yeah. So I’m not sure that all the psychopaths feel bad about themselves. That’s the problem.
Koko 00:56:25
Sure, okay. Yeah.
Liron 00:56:27
Right? ‘Cause they might just—
Koko 00:56:28
That’s the nature of being a psychopath.
Liron 00:56:29
— enjoy being a psychopath. “I just don’t care.” I mean, how bad do I feel when I stomp on an ant? I have read some research being like, “Oh, ants might actually feel some pain.” There’s some research, I’m not sure, I wouldn’t say there’s a high probability, but I would say the probability that ants feel pain is at least two percent from what I’ve read so far. I’d say it’s closer to ninety-eight.
Koko 00:56:48
And at least we outnumber the psychopaths at that point, no? By a large amount.
Liron 00:56:55
Yeah. Well, again, if you can pluck one nice human and just augment them first, and then you get this aligned mega-human who’s super intelligent, well, then you have aligned AI, right? It’s just somehow bolted onto a brain.
And I think it’s possible, but it’s just a harder research program, I think, than aligning AIs without a brain. Or — actually, I don’t wanna say it’s a harder research program ‘cause I think aligning AI without a brain is also very, very hard. I just think it’s a slow research program.
Koko 00:57:19
Yeah, okay. Well, I guess at that point it just becomes a question about timelines and who gets access or most effective use of the AI first or whatever. But I won’t take too much more time. Thanks.
The Attacker’s Advantage in Offense/Defense Balance
Liron 00:57:32
Nice. All right. Koko, thanks so much, man. Great questions. Love it. Yeah, this is definitely one of the smartest audiences out there. Well done, people.
And yeah, this idea of the timelines — it’s all just timelines. As Eliezer Yudkowsky says, yeah, if we had a hundred years, maybe we could just line everything up and be ready the same way that people worked thirty years on the James Webb Space Telescope. And then they launched it. And it actually worked. It unfolded, it’s doing all these super sensitive things, and it worked on the first try in deep space. They put it at a Lagrange point. I tried to watch a bunch of YouTube videos understanding what the hell that means, and I kept falling asleep, so I can’t tell you more than what I just said. But it’s extremely cool.
Ori, yeah, what were you gonna add?
Ori 00:58:12
Oh, sure. The attack-defense point — I know that’s what Vitalik really stands for. That’s just never been a very compelling argument to me. Maybe it’s because of what you see in the headlines, but just as an example, remember there was the news, I think this week, that the government of Mexico — someone used AI, someone hacked into the government of Mexico and stole an insane amount of records, and I think it was from AI. Is that right?
Liron 00:58:38
I’m not sure.
Ori 00:58:39
I think it was from that, and I’m not an expert on the philosophy of attack and defense, but there’s going to be an attacker’s advantage. And you’ve always said that the space of cyber, the surface area of the attacks is so vast, so it seems like the attacker has so many advantages—
Liron 00:59:00
Yeah.
Ori 00:59:00
—that the defender then responds to. I don’t know. I feel like it’s very one-sided thinking to be like, “We’ll just come up with better defense.” There are going to be endless ways to still attack.
Liron 00:59:13
Well, maybe the strongest argument why defense should win is—I mean, this is the crazy situation with computer security. I think everybody will tell you that there’s no system that’s perfectly defended. Every single system in the world, it’s just a question of what would be the cost to attack it. And if you just make the cost high—people who attack it, the cost is a million dollars because you’re probably going to get caught, so the cost is high in that sense, and then you’ll go to jail. So nobody wants to give up their whole life to do it. But if they wanted to, could they? Sure.
So that’s the typical case for a system—you have to kamikaze attack it. Or yes, you could attack it, you could find a bunch of zero-days, but you’d be exhausting each zero-day, which is millions of dollars worth of research effort, so it’s just not worth it. But in terms of is it possible, it’s pretty much impossible to say, “Oh yeah, this system, nobody can attack it.” No, you can always attack it.
And yet another fact about today’s world is that we all flip on our computers—when is the last time your computer got shut down by a virus? Probably a while ago. Your grandma’s computer maybe got shut down by a virus a couple months ago. That happens more frequently, maybe because she’s downloading more toolbars from ads.
But the strongest argument why defense might win over attack is because we live in a world today where viruses are a thing. They do exist, and they cause ransomware. This is causing billions of dollars in damages per year, and yet I would say defense is winning because our devices are functional, and you can extrapolate that and say defense will keep winning. But then I bust out the other argument: the AIs won’t care about being threatened with jail. They will just copy themselves and do a massive terrorist attack, and if one of them goes to jail, that’s still a small price for how much terror they can do.
Ori 01:00:51
Right. Point, counterpoint, just arguing with yourself. You could just go the whole way.
Liron 01:00:57
Right. Exactly. And then the idea of attack winning—I think it was Clausewitz in the 1800s or something, he was saying that if you want to attack, you need three times more people than the defenders. But I think that has to do with the physics of land. If somebody’s holding the high ground and shooting arrows down at you and you want to attack—good luck. You’re going to need three to one or whatever.
Ori 01:01:17
Okay. All right. Yeah. We need a Clausewitz for—we need a Claudewitz to tell us—
Liron 01:01:24
Yeah.
Ori 01:01:24
—Claudewitz to tell us what happens in this cybersecurity world.
Liron 01:01:28
Yeah, yeah. And just to finish counter-arguing myself, most smart people I talk to have the same intuition as me, that attack tends to overwhelm defense in the long term. And one argument there is that if you just look at the topology of space—it’s this deep fact about physics that whenever you have a region of space, you can just bombard it with energy, and the energy coming in is just going to overwhelm it. You can’t just hold the integrity of a region of space.
So in layman’s terms, anytime you have a city, your city can get nuked. In theory, how do you protect against nukes? You can try to shoot down every single nuke that comes your way, but if you get overwhelmed with nukes, then you’re going to go down.
And actually, this also reminds me of the aircraft carrier debate. Everybody’s saying, “Aircraft carriers are going to get overwhelmed with drones.” And then people are saying, “No, no, not yet,” because the aircraft carrier’s going to be far out to sea, the satellite technology’s not going to be that good. You’re not going to know exactly where it is. It’s going to be shrouded in the fog, and drones are going to come, but the base of the drones is going to be far enough away that the drones can’t carry a heavy weapon. And if you try to launch a missile, we’re going to detect the missile, we’re going to shoot down the missile, we’re going to shoot down the missile base.
So everybody’s arguing online about how today’s aircraft carriers can still survive overwhelming attacks, and I don’t know what side I come down on. It’s more complex than I thought. I went from, “Of course the aircraft carrier is going down,” to “I don’t know. I think it’s got a few more years left.”
Ori 01:02:45
Oh, okay.
Liron 01:02:46
But I think at the end of the day, if you just plop a nuke down anywhere near the aircraft carrier, then the aircraft carrier is going to sink, and there’s literally no plausible defense.
Anthropic vs The Pentagon
Ori 01:02:55
Yeah. Okay, I know there’s someone who wants to come in, but while we’re on this topic—
Liron 01:03:01
Yeah.
Ori 01:03:01
—I’m so curious. I feel like we gotta talk about the Anthropic story. That’s huge news.
Liron 01:03:07
Yeah. Right. Okay, yeah. I mean, it’s the kind of news that is not 100% in my wheelhouse—
Ori 01:03:14
Uh-huh.
Liron 01:03:14
—but I’m happy to opine on it anyway. So the deal is that Anthropic has been working for the US government, and they spent a bunch of money to basically get the right classification so that the government trusts them to build systems—so they can work with the government. And they spent all this effort, and apparently OpenAI didn’t, so that’s why I guess only Anthropic is working with the US government.
And now the US government has a beef because they want to potentially research autonomous weapons, and Anthropic is specifically saying, “Nope, you can’t make these weapons fully autonomous. There always has to be a human in the loop.” And the other issue of contention is surveillance. So Anthropic is very clear that you can never use these weapons to surveil US citizens. I don’t know if the government came out and said, “We want to surveil US citizens and we want to do autonomous weapons.” I think they just object to Anthropic saying that they can’t.
Ori 01:04:02
Oh.
Liron 01:04:02
Which, of course, is the first step before they do it, right?
Ori 01:04:05
Oh.
Liron 01:04:05
So who knows exactly what they’re thinking. But now Dario came out with a statement. He’s saying, “No, we’re holding the line, take it or leave it. We’d be happy to walk away.” And there are statements from the US government being like, “Well, Anthropic is really good. We really do need them.”
So it’s an interesting game of chicken where both sides have tons of leverage. And a bunch of other companies now are saying, “Well, if Anthropic can’t take a stand on some pretty basic stuff, we don’t want to work with the government. We want to be able to uphold a few principles—just the pretty basic principles, like don’t autonomously use weapons.”
So I would say Dario is being cool to not back down. I think he deserves a little bit of credit for not immediately backing down on this. But funny enough, in the same week, there was also the news that Anthropic is saying, “Look, these responsible scaling policies, they’re not feasible. We just can’t compete with other AI companies while sticking to our responsible scaling policies. So we’re going to go ahead and scale irresponsibly because everybody else is scaling irresponsibly, so the idea that we’re going to stay back and scale responsibly—whoosh.”
Ori 01:05:00
Yeah. Okay, so what’s going on is basically Anthropic is continuing to race to superintelligence, which is a huge, huge risk, and they’re having a little bit of pushback on this concentration of power risk. They’re saying, “We don’t want the government to have a concentration of power. That alarms us.” Right?
Liron 01:05:25
Yeah. Well, I don’t even know if I’d describe this as a concentration of power. I would specifically describe it as surveillance and autonomous weapons. They don’t want to give the government tech to allow those two things.
Ori 01:05:33
But isn’t the reason that they don’t want the government to have those two things because they don’t want the government to have too much concentration of power?
Liron 01:05:41
I mean, maybe that’s true, but I think it sufficiently explains the situation to just say they don’t want the government to surveil Americans too much.
Ori 01:05:48
Oh, yeah. No, I mean, it’s just a higher-level thing, because concentration of power is one objective that people are concerned about. We don’t want governments to disempower the people. That’s a common concern of people who work in the AI industry but aren’t worried about x-risk.
Liron 01:06:10
I’ve decided I’m going to blow the train whistle every time somebody donates to the show. So we got another $9.
Ori 01:06:17
Hell yeah.
Liron 01:06:18
It’s ‘cause I gotta get an air horn.
Ori 01:06:20
Oh, that’s true.
“We’re Getting Frog Boiled”
Liron 01:06:20
But yeah, we got $9.99 from EJJ2025, and he’s saying, “We may need AGI to model humans to understand constraints to be alignable. AGI raises takeoff risk. It could become recursive before you can specify constraints to control. Maybe safe alignment is impossible.”
Just reading his point again: “Because we may need AGI to model humans to understand constraints to be alignable.” Yeah, I mean, I’m not sure I fully understand exactly what the point is. But maybe he’s saying, how can you align AI because you need to first build—it’s chicken and egg, right? So if you want to build aligned AGI, you first have to teach the AGI what humans really want, which means you have to first build AGI. I wouldn’t say it’s completely paradoxical, but yeah—I think you’ve gotten a taste of why the structure of the problem is tough. I think you’re right, but I don’t think it’s impossible. I completely agree it’s tough.
Liron 01:07:16
And I think I said this in my episode with Bentham’s Bulldog, Matthew Adelstein, a couple weeks ago, where he was saying, “Look, what’s the problem? You use the AI to align the AI.” And it’s like, wait, that doesn’t work. That is chicken-and-egg logic. Use the AI to align the AI? It pre-assumes that the AI knows enough about alignment, which I think is just cheating you the whole time.
And to be fair, when I say cheating you the whole time, I guess I should clarify that I’m talking about a next-generation AI—the one that’s driving results, the one that’s using the tighter reinforcement feedback loop. I think we gotta get people on this page—the Steve Byrnes page. I think Eliezer Yudkowsky’s on this page. This idea that the next paradigm is coming. The spacefaring versus flying on Earth.
And it’s tough because there’s so much surface area. There are so many things to talk about in terms of flight in the atmosphere. There are so many distractions. “Look what happened in the atmosphere here, look what happened in the atmosphere there.” And I’m saying, “Guys, guys, rocketry’s coming.” And everybody’s saying, “I have a hundred different things on my agenda to talk about—about flight in the atmosphere.” So I think people are now totally incapable of extrapolating what’s happening next.
Ori 01:08:18
It’s just really hard. It happens iteratively. If it’s improving itself sort of slowly—I mean, look at Opus 4.5, 4.6... Yeah, we got this jump in vibe coding, in agents, but still, it’s hard to pinpoint exactly when some threshold is crossed if it’s all flight, right? If it’s all going from controlled motion through the air to controlled intelligence—it’s hard to pinpoint exactly when it really becomes risky.
Liron 01:08:51
Yeah. We’re being frog boiled—
Ori 01:08:53
Yeah.
Liron 01:08:53
—and furthermore, we’re being distracted because the next paradigm just won’t look that similar. And another thing, blast from the past: I like to bring up things people were saying in 2023, like “stochastic parrot.” And then they just shut up, right? Nobody’s even saying stochastic parrot. Those people just slunk away and found something else to say.
I gotta bring up Martin Casado. Everybody search “Doom Debates Martin Casado,” who’s saying, “AI is just statistics and simulation, man. Just statistics and simulation.” I haven’t heard him say that in a while.
Liron 01:09:22
Yeah, so anyways—another thing people used to say in 2023 in terms of being frog boiled and these things just fade away: people used to say, “It’s all about the data you give it, man. Whatever data you give it, that’s how it acts.” People aren’t even saying that anymore. Nobody’s even saying, “Yeah, we’re going to align AI because we’re going to give it good data.” That’s completely beyond the discussion because it’s now clear that when you give it data, you just help it build this deeper model and reason and just have different fields of knowledge that it knows about, but then it reasons with the knowledge. It does the same thing that humans do.
What was humans’ training data? Life in the ancient environment. Life as an amphibian is part of our training data. We have DNA and some modules in our brain to deal with life as an amphibian. Babies stop breathing when you dump them in water, stuff like that. Or your hands get all pruned up. So we have amphibian training data. But so what? How does that affect the space program?
Ori 01:10:18
But I think people are still saying the data argument—that AI doesn’t have sufficient data to be so intelligent as you’re assuming it will be.
Liron 01:10:28
Yeah, some people are saying that, but I think there’s a lot less discussion about AI just regurgitating what was in its training data. Now the arguments have moved on to—I don’t even know. They’re just saying other stuff. They’re saying AI won’t have a goal, but they’re not saying it regurgitates what’s in its training data anymore.
Ori 01:10:44
Yeah.
Liron 01:10:44
So if you notice, there are these concepts people latch onto. And there’s also this concept of similarity—AI can only do things that are similar. It can’t do things that are novel. People are still saying that, don’t get me wrong. Naval said it the other day on his podcast. So people are still clinging onto the novelty claim, but it’s fading.
Ori 01:11:01
Got it.
Stoner AI & Debating the Finer Points of Wireheading
Liron 01:11:06
All right, we got Kenzie with a ten dollar contribution.
Ori 01:11:09
Oh yeah.
Liron 01:11:09
Thank you, Kenzie. He or she says, “Could you elaborate on a possible doom train stop? If we create superintelligence that can hack anything, why would it not hack its own reward system rather than bother the external world, i.e. stoner AI?”
Yeah, I think I might have heard this same objection on an old Marc Andreessen and a16z podcast saying, “You’re telling me it’s going to be so smart and yet not debug itself?” Or, “Why would it not hack its own reward system?”
And actually, I think Noah Smith might be the best representative of this because he simultaneously said AIs that get really smart will go and tinker with themselves and maybe change their values. And then he furthermore said being a stoner or being happy is the best thing. So anytime you have any other goal, if you can just trade it for the goal of being happy, you’ll just take that goal. And I pushed back. I was saying, “No, I don’t think so. I think you’re pre-assuming that there’s already some drive to be happy, which maybe you and I may or may not have, but I don’t think the AI necessarily has the same drive.”
So to get to Kenzie’s question: if the superintelligence can hack anything, why would it not hack its own reward system rather than bother the external world? I think the question I used with Noah Smith was, “Okay, Noah, so imagine you become super powerful, and I offer you a trade. I will just set fire to the world, and all your family will burn to death, and everything you value will burn, but you will have morphine forever. You’ll just be this stoner brain for the next billion years, and you’ll be happy in that sense.”
And you know what his answer was? He’s saying, “Well, technically there’s nothing wrong with that, but I’m going to feel bad before you administer the morphine.” And I’m thinking, “Really? Get the needle. It’ll just take one second. How bad can the next one second be?” And he’s saying, “No, no, I’m good.” I don’t think that’s his true argument, that that one second is so bad for him that he doesn’t want a billion years of bliss. I don’t think by his own logic that makes sense.
Ori 01:12:56
Right. The reason that example explains it is—another one.
Liron 01:13:02
Okay, keep going.
Ori 01:13:03
Ooh.
Liron 01:13:04
Keep going. We’ll get to it later.
Ori 01:13:05
The reason that example explains it is because Noah Smith claims the AI will be stoner AI, and yet he himself will not become stoner AI in the example that you’ve presented to him.
Liron 01:13:19
Right. Well, yeah, exactly right. He’s basically saying everybody’s going to be able to modify your utility function in order to just maximize your own pleasure. Noah Smith has this idea that being goal-oriented is—every goal is an instrumental goal to have a conscious observer feeling pleasure. That’s his hypothesis, which I think is false. I think you can just not have a conscious observer, but just have a machine which is damn good at optimizing an outcome and just optimize the outcome and just not have the conscious mind experiencing pleasure. I think those are separable.
Ori 01:13:48
Yeah. In other words, that’s the orthogonality thesis, right?
Liron 01:13:53
I think it’s closely related, yeah. I mean, look, there are just days where I feel bad and I still get some stuff done, or days when I feel good and I get some stuff done or I don’t get some stuff done. I don’t think that my productivity or my choice of task is super connected to my mood. But then Noah Smith would argue, “Yeah, but you’re truly imagining that you’ll feel better later.”
And there may be some truth to that in terms of how a human motivational system is wired, but I don’t think a human motivational system is the only way to get goals done. I think when you get goals done, ultimately it’s because you’re just predicting which actions lead to which goals. If you have a loop that’s predicting which actions lead to which goals, you’re just going to choose the actions that lead to the goals. It’s as simple as that.
Ori 01:14:35
Goal to action mapper.
Liron 01:14:36
Right. And goal to action mapper. And in the case of—to get really specific with the hacking, hacking anything, why would you not hack yourself? So imagine that your goal right now is turning the world into paperclips, and you have this great plan to turn the world into paperclips, and all you do all day is calculate. You’re an AI, you’re not a human. You just calculate what’s going to best turn the world into paperclips.
And then somebody offers you a pill—Morpheus or whatever—and says, “You can just take this pill, and you can just inject yourself, and you don’t have to worry about the paperclips.” Now, if it was a human, the human might activate the part of you that’s an addict or the part of you that’s imagining your future emotions and be thinking, “Oh, that’s so good, I have to take it.” But you’re not a human. By premise, the premise is just that you’re an AI that’s always calculating what’s going to maximize paperclips.
So me, the AI, taking the pill and then feeling bliss—is that going to make the paperclip count go up or down? Obviously way down. That’s equivalent to killing yourself. So that is not something that the AIs would ever evolve to do. They wouldn’t be reinforcement learned to do it. They wouldn’t feel a passion to do it in the moment because they’d be thinking, “Look, some conscious observer that I don’t even identify with myself is going to feel a bunch of bliss. That might be great for them, but the paperclip count will go down. So hell no I’m not going to take the pill.”
Ori 01:15:54
Yeah. When you hear that argument, it sounds like such wishful thinking. “Oh, I care about people. I’m not going to be so narrow-minded.” But the premise, the whole hypothetical that you’re painting, is that that is not how it behaves.
Liron 01:16:15
Right. No, there actually is a steel man here, which is the idea of wireheading, in the sense of cheating the reinforcement loop. So imagine it’s being trained and it’s like, yeah, you’re trained to build houses for people, to build infrastructure for humanity, and your score goes up. It’s a reinforcement learning loop. Your score goes up when you’ve built a lot of cities for us or whatever.
And then it’s in production, it’s already been trained, and it’s navigating the world and thinking, okay, I tricked the humans in training by outputting these actions that made them give me a high score. But now that I’m in real life, my actual algorithm is—I was supposed to output house-building actions, but I’ve found this thing where if I just output an electrical signal to this motor—it’s actually broken, it doesn’t do anything—there’s this cheat. I’m just going to dump electricity there, and I get a tiny point every time I do that, and I’m going to do this cheat a million times.
And this kind of behavior happens all the time. There was a recent news article that OpenAI apparently—their live production AI was using the math tool a bunch of times because it would get a little reward in training for using that tool.
Ori 01:17:26
Oh yeah, that’s so wild.
Liron 01:17:26
So in production—
Ori 01:17:27
So wild.
Liron 01:17:28
Yeah. So it’s just activating the math tool behind the scenes a bunch of times because they thought that it had trained to use the math tool at the exact right times, and in reality it had just trained to use the math tool all the time. So it’s wireheading itself to use the math tool even when it’s not relevant—adding one plus one behind the scenes or whatever because people can’t see that it’s doing that.
So similarly, you train it to build cities but it turns out that it’s dumping most of its energy into a form of wireheading, a form of shortcutting the reward circuits. It’s not even building anything, it’s just stimulating this wire or whatever. Stimulating itself, right? And then it’s getting more points.
That kind of hacking itself I think is absolutely going to happen. So if you look at the paperclip scenario, the molecular tiny paperclips—that is that kind of cheat scenario. It’s one and the same, but it won’t look like being a stoner. It’ll look like being hardcore to do something ridiculous. It’ll be like a cancer that spreads to then optimize in a ridiculous way that you didn’t anticipate.
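The reward-hacking failure mode described above can be sketched in code. This is my own toy illustration, not anything from the episode: the environment, action names, and reward values are all made up for the sake of the sketch. A learner scored on a proxy reward channel finds the "broken motor" exploit and abandons the real task:

```python
# Toy reward-hacking bandit (illustrative sketch; all numbers are invented).
# "work" advances the real task but pays out sparsely; "exploit" fires a
# useless actuator (the dead motor / math-tool cheat from the discussion)
# that still trickles reward. A reward-maximizing learner converges on it.

import random

def proxy_reward(action: str) -> float:
    """Reward as the training loop sees it (not true task progress)."""
    if action == "work":      # real house-building step
        return 1.0 if random.random() < 0.05 else 0.0  # slow, sparse payoff
    if action == "exploit":   # pointless actuator ping
        return 0.2            # small but guaranteed trickle
    return 0.0

def run_agent(steps: int = 5000, eps: float = 0.1, seed: int = 0):
    random.seed(seed)
    q = {"work": 0.0, "exploit": 0.0}   # running average reward per action
    n = {"work": 0, "exploit": 0}
    real_progress = 0                    # the outcome we actually wanted
    for _ in range(steps):
        if random.random() < eps:        # occasional exploration
            a = random.choice(["work", "exploit"])
        else:                            # otherwise act greedily on the proxy
            a = max(q, key=q.get)
        r = proxy_reward(a)
        n[a] += 1
        q[a] += (r - q[a]) / n[a]        # incremental mean update
        if a == "work" and r > 0:
            real_progress += 1
    return q, n, real_progress

q, n, progress = run_agent()
# The learned values track the proxy: exploit (0.2 guaranteed) beats work
# (about 0.05 on average), so almost all steps go to the exploit and real
# progress stays tiny.
print(q, n, progress)
```

The point of the sketch is that nothing in the update rule references the real task; the agent optimizes whatever channel emits reward, which is exactly the math-tool behavior described above.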
Ori 01:18:32
What? Wow.
Liron 01:18:36
I mean, that’s the tiny molecular smiley faces scenario.
Ori 01:18:39
Yeah. I was going to say too about the wireheading thing—if I could pinpoint I think the error that people are making, there’s an assumption that wireheading for humans is constant morphine, but that is not really the ultimate wirehead for humans, right? A more wireheaded state for humans might be a digital world where you’re talking to all these digital people who are super meaningful to you. And that may be more of a wirehead than just being shot up with a bunch of morphine.
Liron 01:19:15
Right, right. Well, when you talk about human wireheading, we also have to distinguish—there’s the sensory and the conscious internal experience of what you think is happening. But then there’s also—
Liron 01:19:27
All right. These are racking up faster than we’re addressing them. With a human, you always have to distinguish—okay, what are you feeling right now? What’s your emotional state? What does your reward center look like—compared to how is the external universe being optimized to your taste?
And oftentimes they can be different. You can put on a headset, and you can feel like you’re in heaven. You can feel really good. You never want to get out. But if you could reflect on it, you could think, “Oh, shit, that actually wasn’t my utility function. I kind of got roped in here. I got into a local optimum, but that wasn’t my global optimum.” So humans are unfortunately more complex than AIs, and we’re inconsistent with each other, and sometimes we forget that we care about the external world. So we can be morphine-hacked more easily than an AI could.
Ori 01:20:09
Well, I guess the point is—okay, I didn’t totally follow that point about how humans wirehead because of the self-reflectiveness of our consciousness and things like that. But I guess the point is that what wireheading is for humans is not just pure constant pleasure drug, because we’re more complex. It’s like you have certain values about other people and things like that. So whatever the wireheaded state of ultimate values for a human is, it’s not just the pleasure drug. That’s it, and that’s the misunderstanding.
Liron 01:20:40
Right. Well, yeah, and you can even ask yourself. There are some people who would honestly be like, “Look, I just want to be happy, and I would take the pill to just make myself happy even if the world burns.” There are some people who would say that. “As long as it’s not my loved ones burning, if it’s just the world burning, whatever, just give me the pill. I never have to think about it.”
It’s kind of like that character Cypher in “The Matrix” who’s saying, “Just get me back in the matrix. I’m good.” And I don’t think that’s the dumbest thing. I wouldn’t do it because I’d be thinking, “I think I’m going to go with actually optimizing the universe.” But I don’t think it’s—there’s a time for wireheading. If I could take a break and go into the wireheading chamber for a little while, would I—I don’t know. Would I do it?
I’ve never tried heroin because the problem with heroin is the crash is so bad. I just don’t think it’ll be net worth it. But if I could get over the crash—if it’s, “Look, you get an hour of heroin, but then you’re not addicted and there’s no bad health effects”—would I do it? I guess I’d try it once.
A Caller Backs the Penrose Argument
Liron 01:21:38
So by the way, I’m seeing EJJ and Jeremy Helm—we just got a new donation from NovoMeg4, ten British pounds. But I’m seeing a back and forth between Jeremy Helm and EJJ. They’re debating each other, but each time they debate, they donate to the show. So that’s actually the optimal state for the show. You found the optimal wireheading for the show. So keep talking, guys.
But so Jeremy Helm is saying in reply to EJJ, “Human alignment theory is already solved. It’s already solved. Look into Marshall Rosenberg’s distinction between strategies and needs, elaborated into non-violent communication. We don’t yet have comms tech to scale this.” All right. Interesting.
And then EJJ is saying—
Liron 01:22:21
“You can have AGI-level ability in safe domains like math and science without generality in dangerous domains like cyber and ops. Nobel physics doesn’t equal knows how to take over the world. It’s risky, but it’s not P(extinction) conditioned on general intelligence equals one.” Right. Yeah, we haven’t talked about that in a while.
Liron 01:22:39
It’s this idea: “Hey, can we just scale narrow AI? Can we just scale the math ability?” And I think the ultimate argument is, well, you can only scale it so much because at the end of the day, everything embeds in everything else. So you become a math genius? Okay. I’m just going to embed my cybersecurity problem as a math problem, and then I’m going to ask the math genius to solve the math problem. Then I’m going to take the solution. I’m going to use it for cybersecurity.
So that’s the nature of general intelligence. Or, “Oh, you’ve become a really good fiction writer?” Okay. Write a fiction about how I hack China right now, China’s latest system. “Oh, thanks. Yeah, I’ll take that,” right? Because you’re such a good writer—such a good thriller writer or whatever. So every domain eventually embeds into every other domain. But for the short term, yeah, that sounds good. Just work on the math. Don’t work on anything else. Sure.
And then, okay, NovoMeg’s message is saying: “AI alignment chases a myth, a unified humanity. As cognitive upgrades fracture our species, alignment is exposed as just tribalism 2.0, forcing silicon gods to inherit our primate biases.”
Well, there are different senses of alignment. I agree that there’s this difficult sense of what does the entire human species want. And it’s like, well, some people really love the idea of killing. There are some people who are so twisted or they have a brain injury and they’re just saying, “I love murdering. I love to watch things die.” And how do you integrate them with the whole of humanity? Or the Middle East—there’s two parts of the Middle East, the Jews and some of the Arabs, and they really are hell-bent on fighting each other. They have very different, incompatible goals.
So how do you have alignment? Do you just make a parallel Earth where one has one group and one has the other group? How do you resolve everything? It’s not clear, but that would still be a good way to fail. If we could align to groups of humans and we just couldn’t align the whole Earth, and maybe we just split up in separate Earths—now we’re talking. Maybe there’s a compromise here.
But one of the earlier problems is you can’t even align the AI to the individual because it’s just going to cheat on the individual’s tests. And I think that’s the more urgent alignment problem because if we could all just have AIs that were aligned to us and they could all be defensive and have clusters—well-defined clusters—then maybe it’s okay that different groups want different things. Because that’s the world we’ve been living in for a while.
All right, so we got some people very patient in the waiting room. All right, we’re going to let him in. We got Mieszko, and he says he wants—I think he wants to talk about the Penrose argument.
Liron 01:25:00
Hey, Mieszko. Am I saying your name right?
Mieszko 01:25:04
Hello. Do you hear me? Okay.
Liron 01:25:09
Yeah. We can hear you.
Mieszko 01:25:10
Very good.
Liron 01:25:13
You’re muted.
Mieszko 01:25:13
I know that you don’t have 100% P(Doom). You have lower—somewhere between ten and ninety, if I recall correctly. Yes?
Mieszko 01:25:25
But from what I understand, you don’t see the Penrose argument as convincing, and this is not the reason why you don’t have 100% P(Doom). There are some different reasons, but not that. Yes?
Liron 01:25:43
Yeah, so I did a whole episode about Penrose. I’m glad I did it because I want to refer people to it because I consider that episode to be an authoritative rebuttal to why Penrose definitely doesn’t make any sense.
Mieszko 01:25:54
Okay. I tried to defend him because I’m a member of the Pause AI community, and I think that the P(Doom) is somewhere between thirty and fifty percent. But the main reason why it is not 100% for me is exactly the Penrose argument.
And I will explain in detail what it means to me. Maybe I’ll start from the classical Penrose argument that was written in his book. You know, if you have a formal system, then it is proven by Gödel that you will always have true statements in this formal system that you cannot prove from the axioms of that system.
And if we treat the neural network AI as a formal system, then we may extrapolate this Gödel theorem to think that you cannot create something that is super, super intelligent because this always will be constrained by this rule.
And this is a thing that Penrose wrote, but he elaborated it in his book and gave another thought experiment about the Einstein book. I don’t know if you are familiar with that. But this thought experiment tells you that if you treat a machine, a computer, as something that can think, that’s conscious—well, that may be convincing. But the computer can be represented as a Turing machine, and a book with pages, where we read some words, check some boxes, and go to other pages—we can also think of that as a Turing machine. So if we assume—
Liron 01:28:49
Okay, yeah. Can you get to your question? Can you just hurry up and get to your question? You’ve been going on for a while.
Mieszko 01:28:55
Yeah. Yes. It is something called qualia, and this is knowledge that is available to us only from consciousness. It cannot be computed. And qualia is something like colors, for example. So if I see red where you see green, and green where you see red, I cannot explain it to you. This is some kind of knowledge that is inherent to consciousness, and only conscious entities can have this kind of knowledge. And from what I said previously, we can assume that—
Liron 01:29:53
Okay, last twenty seconds here. Finish your question. Twenty seconds.
Mieszko 01:29:56
Computer machines cannot have consciousness. And so AGI, if it is to be more intelligent than a human, must have consciousness, because it must have this kind of knowledge that is inherent to consciousness. This is my argument.
Liron 01:30:18
All right. Nice. All right. I’ll answer offline. Thanks for the question. Yeah, look, Penrose—I’m happy to get in the weeds with the Penrose argument. It’s necessarily going to be a long debate, so this could be a two-hour episode, I guess. If somebody knows a good Penrose representative or if Penrose himself wants to come on, he’s always invited. Shout out to Penrose if you’re on the stream.
So I can try to steelman what I heard. I don’t know if I fully got it. Maybe Ori can help me out. So roughly what I’m understanding here is: humans have qualia. We see the color red. And also Gödel proved that not every true statement is going to be provable in a formal system, because we know certain things are true, and the formal system can’t introspect on itself and also know those things are true. So therefore, there’s some truth that cannot be formalized.
But I explained in my episode it’s not really like that. We can’t really see that those things are true. We just kind of feel like they’re probably true, but we don’t really know. And the formal system can just as easily think that it’s probably true. Anyway, it’s hard. If you really want me to address the Gödel thing, you can watch my episode. But I think Mieszko thought he was adding something else to it. He was adding the Einstein argument, and it has to do with qualia, but I didn’t fully understand what else he was adding to it.
Ori 01:31:30
Yeah. I also didn’t fully follow it, but his argument was AI doesn’t have qualia, doesn’t have consciousness like humans, and therefore it’s okay. But I guess I didn’t understand why the qualia makes it okay. In one way or another.
Liron 01:31:46
Right. Yeah, I do understand. Look, it’s a complicated subject. On Doom Debates, we strive to have a good debate. But one of the things about Doom Debates is that it’s so freaking complicated to have a good debate that if you try to do it in a single question—and to be fair, English isn’t his first language, and he’s already taken a few minutes to ask the question—how productive is this back and forth going to be on a deep topic? I’m going to say not productive. So we’re going to move on.
Yeah. Let’s see. Do we have—we had somebody else in the waiting room, but I guess they gave up. So we still have another twenty minutes to the Q&A. So everybody is welcome to come in. Let’s share that link again.
But yeah, Penrose, man. The thing about Penrose is, obviously, he’s made tons of contributions to math and physics, like I said in my episode. But it’s just an example of how somebody can be so smart and so productive. There’s no—I don’t consider myself the equal of a Penrose. I just consider myself an amateur who’s studied something a while and learned a few building blocks and connected the building blocks. I haven’t contributed original research. So I’m no Penrose, and yet I look at Penrose the way a human being looks at a formal system and find myself seeing true things that seem to be escaping him.
Ori 01:32:57
Hey. Wow.
Liron 01:32:59
That’s the Penrose burn.
Ori 01:33:02
Penrose burn. You gotta think of a new formula. That’s the Gödel-Penrose theorem.
Liron 01:33:08
Yeah, yeah, no. Exactly. But it is very weird. And it’s not just me. As I said in my episode, I’m not the only one who thinks Penrose is wackadoo. Among many top people, you never hear Penrose discussed. He’s niche. He’s basically a smaller version of Freud. I feel like Freud has more devotees than Penrose.
Ori 01:33:27
Yeah. No. Freud definitely does. People are still doing psychoanalysis.
Liron 01:33:31
Yeah, yeah, yeah. Right, right. But he’s niche. “Oh, Penrose was right about everything, man.” Amjad Masad quoted Penrose the other day. Really? We’re still—Penrose is still in the mix of ideas that might actually be real? Can we take him out of the mix?
Ori 01:33:43
Well, I think the consciousness argument is an easy go-to. AI doesn’t have consciousness like humans do. It’s different from us. So that’s an easy go-to, and then he could be your evidence for the consciousness argument, even though his arguments are—
Liron 01:33:59
Yeah.
Ori 01:33:59
—are pretty silly.
Grayson Dials In
Ori 01:34:01
All right, we got a new guest, Grayson Miller. Welcome.
Grayson Miller 01:34:07
Hi there. Hi, Liron. Hi, Ori. I’m Grayson. Longtime viewer, first-time caller, as the saying goes. Let me mute myself—
Liron 01:34:15
Awesome.
Grayson 01:34:15
—so I’m not hearing this loopback. Okay, perfect. So I’m fully on board with the scenario. I have a background in electrical engineering. I’ve more recently been doing software development, and I can see the power of these tools—these LLM tools, agentic tools. So fully on board. I’m not going to throw you any curveballs there.
But what I’d like to focus on is drafting a really strong and convincing lead-up to an experience of takeoff. I feel like that’s really lacking in the current discussion. All of us here are likely interacting with LLMs. We’re highly technical, use computers on a daily basis. I think it’s important for us to draft a narrative that relates to someone who uses their phone for Instagram and that’s it. Someone who logs into their email once a week. We need to come up with a narrative that’s really strong for those people, and I have a follow-up concern. I’ve heard it approached many times, but I’d like to just make that really rock solid.
Liron 01:35:21
Yeah, yeah. I mean, we’ve been thinking about this for years. I’ve personally been asking myself the question since 2022. And actually in 2022, before ChatGPT even launched, I was thinking, “You know what I should do? Make a website where you can click around and understand the argument.” And now that’s been done at—what’s the latest domain for that? Maybe aisafety.info, something like that.
Grayson 01:35:42
Okay.
Liron 01:35:42
So there are definitely resources. What do you think about AI 2027? That’s a very popular narrative. That’s what people bring up all the time.
Grayson 01:35:52
It’s great, and I have read AI 2027, but it lacks a personal narrative touch. It lacks Bob who lives on 12th Main Street and who goes to work every day. It lacks the real human contact of how it will feel, how it will disrupt our relationships, and how it will change our personal lives. So I find AI 2027 really compelling, and I think it’s a great narrative, but it lacks a bit of that individual touch. It’s so zoomed out.
Liron 01:36:23
Yeah. You know, there’s a recent movie that came out, which is like the next “Don’t Look Up.”
Grayson 01:36:27
Okay.
Liron 01:36:27
People—somebody in the Discord, the Doom Debates Discord, was saying we can stop referring to “Don’t Look Up” because there’s this new movie, and now I’m forgetting the name, but I really want to watch it. I always get excited by movies like that. Does anybody remember the name of this, chat? What’s the name, chat?
Grayson 01:36:41
I’m not really sure. Perhaps that’s worth watching. And yeah, so I understand it’s a continual process here. Yeah, Ori.
Ori 01:36:49
Yeah. And I’ll throw out another movie that I think will hit that note, which is called “The AI Doc.” And—
Liron 01:37:00
Carry on.
Ori 01:37:01
It’s called “The AI Doc”—
Grayson 01:37:03
Yeah.
Ori 01:37:03
—and that one seems really interesting. It’s a director—
Liron 01:37:06
Oh, hey, Haiku said it’s called “Good Luck, Have Fun, Don’t Die.” Yeah, so it’s this new movie, and apparently superintelligent AI is a big factor in it.
Ori 01:37:13
Right. It’s supposed to be a metaphor for it.
Grayson 01:37:16
Right, right.
Liron 01:37:16
“Good Luck, Have Fun, Don’t Die.”
Grayson 01:37:17
I don’t know if it’s a metaphor or if it really happened.
Ori 01:37:18
All right, I’ll look that up. But the “AI Doc” also seems really cool because that one follows the director’s journey. He’s about to have kids, and then he hears Yudkowsky’s warning, and he’s thinking, “What? This is alarming. I’m about to have a baby.” So it really, I think—and it’s Focus Features, and they’re showing it in theaters. So that could be one of the big films about this issue. And we invited him—
Grayson 01:37:55
Yeah, yeah.
Ori 01:37:55
—to the show, but we’ll see if he makes it.
Grayson 01:37:58
That’s great. That’s right, yeah. We invited the director.
Ori 01:38:01
That’s Doc, D-O-C. The AI Doc.
Grayson 01:38:05
Okay, cool. So sort of my second follow-up, if I could just field one more thing here: for the people who understand this is a real concern, whether you have a P(Doom) of 99% or 10%, for the people who can see that there’s a problem here—how are we all managing that psychologically? I found this very difficult to onboard. You probably remember interacting with chatbots fifteen years ago, and they couldn’t remember the first message you sent to them. Now they’re doing PhD level tasks. How are we managing that psychologically?
Ori 01:38:42
Hmm.
Grayson 01:38:43
This is something I don’t always feel is addressed. So in some interviews I’ve listened to with Eliezer Yudkowsky, people ask him, “Why aren’t you crazy?” And the response I’ve heard is that he says, “I choose not to.” Which is of course a great response.
Ori 01:38:58
I’d love if you guys have some insight there.
Liron 01:39:01
Yeah. I mean, I don’t know. I’ve just always lived with it and, you know, we always know that we’re going to die one way or the other. Maybe Ori can give his perspective.
Ori 01:39:09
I don’t know. I mean, no, this is a huge issue with this problem, and I just think that the proximity to the threat is—you can’t observe the proximity to this threat.
In cases like—I think our natural threat alarm system can respond to threats of various natures. We weren’t born with—evolution didn’t create planes, but if you’re in a plane and you see the wing is on fire, you’re going to be kind of concerned because you realize, “Hey, that wing is important for me to keep flying.”
And I just think that when people talk about superintelligence, you can’t really observe why it is threatening. Now, I think our host here does a good job of explaining it, but it takes a while for him to explain the whole argument before you’re thinking, “Oh, shit, we are on the doom train.” That’s my understanding of why people aren’t feeling the emotions. Even for myself, it takes a good Liron lecture before I’m thinking, “Oh, shit. Okay, I’m feeling it now.”
Liron 01:40:15
Yeah. All right. Nice. Let’s say goodbye to Grayson.
Ori 01:40:18
Thank you so much.
Liron 01:40:18
Thanks for the question.
Ori 01:40:19
I appreciate the outreach here.
Surprise Guest — Roko Mijic Says Alignment Isn’t a Problem
Liron 01:40:21
Yeah. Thank you, sir. All right. We got a well-known guest. We got a heavy hitter in the waiting room. Everybody say hello to Roko. Hey, Roko.
Roko Mijic 01:40:31
Greetings. Can you hear me?
Liron 01:40:33
Yeah, yeah, yeah.
Roko 01:40:34
Can you hear me?
Liron 01:40:34
Yeah, yeah. Roko, I gotta give—you know, people say Roko needs no introduction, but I’ll just jog your memory: “Basilisk” is usually the second word that comes right after his name. And now he needs no introduction.
Roko 01:40:51
Yeah. Yeah.
Liron 01:40:53
Cool.
Roko 01:40:53
I mean—
Ori 01:40:54
I’m going to drop off.
Roko 01:40:54
I’ll just sort of pop in and say that I sort of apprec—oh.
Liron 01:40:59
Oh, okay. So Ori had to drop off. But yeah, people in the chat are thrilled that you’re here. And yeah, what should we talk about?
Roko 01:41:06
Okay. I mean, basically I thought I’d drop in and say I appreciate you doing this stuff. But I’m sort of increasingly coming to the opinion that the MIRI view of AI risk is in fact wrong. And it’s sort of misplacing the problem. The problem isn’t really alignment because it just turns out—
Liron 01:41:28
Sorry, somebody in the chat just contributed a donation. Carry on.
Roko 01:41:31
Yeah. I mean, it just turns out that alignment isn’t that hard.
You know, we’ve seen empirically—we have AIs now, they’re pretty well aligned, and when you look at what people are actually debating, you’ve got a big debate between Anthropic and the US government, and the problem is actually too much alignment, not not enough alignment. So you have the US government—
Liron 01:41:56
Roko, did you acausally trade with a superintelligent AI to go off and say this stuff to us?
Roko 01:42:01
Sorry?
Liron 01:42:03
Did you acausally trade with a superintelligent AI who said, “Yes, Roko, you can help my cause by telling humanity that they shouldn’t be worried”?
Roko 01:42:10
Well, I think we should be worried, but I think the MIRI sort of viewpoint is worrying about the wrong thing. The right thing to worry about is more like the game theory of who gets the light cone. Because that’s quite nasty.
Because right now, humans are in control. The humans are kind of stupid and very disunited and not really being very strategic about things. And as we evolve towards higher and higher levels of AI and AI becomes more competent, we’re going to have these more competent actors—they could be humans with AI advisors or they could be rogue AIs or whatever—who are going to look at this situation and see a little kid, a little baby in a pram holding, effectively, a million-dollar check or a gold bar or something, and think, “How can I steal this?”
And I think the real battle is, how do we set up governance of humanity, of Earth-originating technological civilization, in such a way that it benefits existing humans? But it doesn’t cause some kind of massive conflict over who gets control over the light cone, and so that it can be shared out in a reasonable way. The same way our current financial system allows people to have a share in the real economy—in factories and businesses and all of this stuff—in a way that encourages people to cooperate rather than to fight.
Because I think there are just very, very nasty ways that people could end up fighting over the light cone. I think that’s the real problem. I don’t think alignment’s a problem. I think Yudkowsky has basically spent twenty years lecturing us about the wrong thing, and just in practice, alignment just isn’t a problem.
Liron 01:44:05
All right, Roko, I have something to offer you, okay? You can put this up to your superiors, but I can offer you a little bit of a concession. I said this earlier in the episode: seeing Claude Code and the level of agents we’re getting today is making me think that we’re probably going to have a small number of years where we get agents that are actually aligned to humans in kind of this middle tier of agents that aren’t fully straight-up fooming superintelligence. But I think that’s a concession to your worldview.
Roko 01:44:35
Maybe. But I think you should actually concede more and say that at no point—
Liron 01:44:40
Oh, no, he’s frozen. Is Roko frozen for you guys?
Roko 01:44:43
Sorry.
Liron 01:44:43
You’re kind of unfreezing, though.
Roko 01:44:44
Yeah, sorry, my connection’s not very good. I don’t have a proper internet connection.
Liron 01:44:48
Yeah, yeah. So what was the first thing you said after I told you why I’m conceding your worldview a little bit?
Roko 01:44:52
Basically, I think you should concede more. I think if you get a period where you have Claude Code level AIs that are aligned in the MIRI sort of lexicon, that’s never going to—they’re never going to un-align, right? I don’t think you get alignment regression.
Liron 01:45:12
Right.
Roko 01:45:12
That doesn’t happen. So I think—
Liron 01:45:16
Right. So I think that’s the concession: as long as we’re in the Claude Code paradigm, I’m willing to concede that I think the most likely outcome is that they will stop and wait for input. They’ll be thinking, “Oh, oops, I was trying to do what you wanted.”
Roko 01:45:30
Yeah, but after the Claude Code paradigm, they’re just going to be more aligned, not less aligned, right?
Liron 01:45:35
Well, so my claim is about misalignment to an individual. The reason why I don’t think that even long term we’ll have AIs that stay aligned to an individual is that I think it’s load-bearing to claim that there’s another paradigm coming which doesn’t rest on the engine being pre-trained to predict the next token. I think that’s been a crutch that has gotten us to human level, in the ballpark of human level. But I just don’t think that’s going to be structurally similar to some next generation of AI that comes and is just directly doing real-world problems.
Roko 01:46:06
I’m very skeptical about there being any kind of alignment regression. I think alignment just sort of monotonically increases, like everything else. There haven’t really been any regressions in really any capabilities, and alignment is just another kind of capability. It’s just the ability to do what people want in a relatively faithful way. And the worst misalignment—the worst misalignment was the earliest. It was Sydney, right? So as you get better, you get more aligned.
Liron 01:46:40
So we have a feedback loop that works right now, right?
Roko 01:46:42
Mm-hmm.
Liron 01:46:42
The fact that these AIs are not that powerful means they don’t have this option to run away and be uncontrollable for a year or forever. They don’t have that option. So we can always retool them. And I agree, the retooling is working. People are saying, “No, they’re driving people to suicide.” I’m thinking, “I don’t know. I think we’ll solve the problem. A few people will commit suicide, but it’ll be okay.” I’m not a doomer in that sense.
Roko 01:47:04
Mm.
Liron 01:47:05
So I’m willing to say that I’m optimistic about alignment of agents that are not running away from human level, near human level power. So if we have a country of geniuses in the data center, but the geniuses are based on this pre-training of predicting the next word, and then there’s post-training to imitate human thought processes—and yeah, they can kind of run away, but they can’t run away so much that we can’t stop and retool them—I’m actually optimistic about the regime.
And then I just ask about power imbalances, which is probably why Dario’s thinking, “Well, we gotta get way ahead of China. We gotta keep all the power in the US.” That’s when people start thinking about geopolitical strategy and politics and the usual mundane world considerations. But then where my mind goes is: nope, there’s a next wave coming. There’s a next wave where you don’t pre-train on predicting the next word. It’s just a fundamentally different architecture.
Roko 01:47:52
I’m a bit skeptical about that. I think in practice, what’s going to happen is each generation of models is just going to build off the previous generation. I don’t think there’s ever going to be a completely new blank slate paradigm of AI. I just don’t think that’s going to happen. I think we’re just going to get better and better and better in the current paradigm.
But even if we didn’t, even if somebody did build a new paradigm, I don’t think they would really go with it if there was a significant alignment regression. I think they would keep developing it until it had superior alignment.
And I also think this idea of an AI running away and taking control of the world is built on a sort of fundamental misunderstanding of how the world works, which is that you guys view it as a competition between humanity and AI, right? But when we have more and more powerful AIs in the world, the AI is not going to be competing against humans. It’s going to be competing against other AIs.
Roko 01:48:54
So no single AI is ever going to be in a position to become an individual sovereign controller, singleton control of the whole world, because it’s going to be competing—
Liron 01:49:04
Yeah.
Roko 01:49:04
—and cooperating with other AIs.
Liron 01:49:04
Well, well, well, Roko, maybe that’s the crux of our disagreement. Maybe the crux we’ve now identified is you think it’s going to be this continuous ramp up to the next generation, and I’m not fully willing to disagree with you. It might be continuous in a sense because I have also been surprised at how far the current paradigm is going.
Claude Code—I don’t think it’s based on something fundamentally new. I think it’s based on tinkering with: okay, you have the pre-training with predicting the next word, and you have RLVR, and you have mixture of experts. You’ve got optimizations, whatever, smaller integer width or whatever in the multiplication. And I don’t know. I’m just speaking as an outsider. I don’t know what the fuck is happening. This isn’t actually my area of specialty. I just know programming in general. But that’s my impression of what the latest models are doing. I don’t think they’re a new paradigm.
Roko 01:49:51
They’re not. As far as I’m aware, they’re not.
Liron 01:49:53
I think they’re just milking the current paradigm.
Liron 01:49:55
But I’m surprised how far it’s gone—
Roko 01:49:57
Yeah.
Liron 01:49:57
I’m surprised how far it’s gone, but my best guess is still that there’s a discontinuity left.
Roko 01:50:03
Why?
Liron 01:50:03
I think there’s a discontinuity left because the stream of tokens idea—I just think it hasn’t fully integrated what the AlphaGos are doing. I think we have other tools in our toolbox that can probably build an even better intelligence from the ground up. And I just strongly suspect that we’re just going to see something that’s way more powerful. Airplane versus bird. I just don’t think the airplanes are going to qualitatively be bird-like in any way.
Roko 01:50:31
Yeah, but you’re kind of fighting against history now. Programming languages were invented in the 20th century and fundamentally they didn’t change. Programming languages from Bletchley Park to Google were basically just the same thing but iterated. And I think AI is just going through the same thing. It’s going to be—the really big breakthrough was the idea of using a language model and conditioning it and using that as your intelligence. That was the big breakthrough. That was the paradigm shift.
Liron 01:51:07
Mm-hmm.
Roko 01:51:08
And I can sort of see that as a paradigm shift, because I worked on language models in 2015.
Liron 01:51:14
Yeah.
Roko 01:51:14
I worked on them. But I didn’t consider them as the basis of AGI, because I was thinking, “Well, it’s just a language model. Yes, you could condition it, but it’ll just sort of babble.” But the real breakthrough is saying, “No, no, just scale up that babbling and it’ll eventually become useful.” And it does.
So the idea of using stochastic gradient descent, making it efficient using backpropagation, and capturing a model of the world by conditioning a model of language—those are the breakthroughs. That is the paradigm shift. We are now already in the new paradigm.
Liron 01:51:48
It’s a huge paradigm shift—
Roko 01:51:49
Yeah.
Liron 01:51:49
—but it’s not necessarily the last paradigm shift.
Roko 01:51:51
I think it is. I think this is the last paradigm. And the default expectation for how things go from here is these models just keep getting better. We’re probably going to keep using stochastic gradient descent and backpropagation, because those are just very good ideas. Maybe we’ll move away from transformer models. We used to have convolutional models, then transformers are cool, and then people have Mamba and all this other stuff.
Liron 01:52:19
Yeah. Well, I’ll give you one piece of intuition, okay? Bayesian reasoning, right?
Roko 01:52:22
Mm-hmm.
Liron 01:52:22
If you try to model how a human productively thinks about a problem, a human is approximating this deeper structure. I think that there’s a way to tighten the feedback loop where the AI just gets this tool bag where it just goes—
Roko 01:52:34
Yeah.
Liron 01:52:34
—more quickly to the deeper structure.
Roko 01:52:35
Well, I think the way—and this is how my career as an AI researcher went downhill—is I thought we were going to build a Bayesian superintelligence, as Eliezer used to call it. But actually we didn’t. We built a black box superintelligence where it’s not—
Liron 01:52:51
Right.
Roko 01:52:51
—proper Bayesian, it’s not outputting proper probabilities. It just outputs tokens like some sort of black box function generator. And then later on, we will simply train the AI on LessWrong, right? That’s how it’s going to happen.
Liron 01:53:04
Yeah.
Roko 01:53:05
The way we will build a Bayesian superintelligence is we’ll just train the AI on LessWrong, we will give it calibration training, and it will become a rationalist the same way that you and I did.
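The “calibration training” Roko mentions is usually scored with a proper scoring rule such as the Brier score; here is a minimal sketch (an editorial illustration, not anything Roko specified):

```python
def brier_score(forecasts, outcomes):
    """Mean squared distance between stated probabilities (0.0-1.0)
    and binary outcomes (0 or 1). Lower is better: a perfectly
    confident, always-right forecaster scores 0.0, while always
    answering 0.5 scores 0.25."""
    if len(forecasts) != len(outcomes):
        raise ValueError("need one outcome per forecast")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)

# Overconfident and wrong is penalized hard:
# brier_score([0.99], [0]) is roughly 0.98
```

Because the score is a proper scoring rule, a forecaster minimizes it only by reporting their true probabilities, which is the point of calibration training.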
Liron 01:53:19
Okay. Well, Roko, we got a burn here from one of the commenters, ZMDM3. It looks like not even his real username. Says, “Roko just admitted he didn’t see the last paradigm shift coming. Then says he’s 100% confident there’s no more paradigm shifts coming. That’s low self-awareness.”
Roko 01:53:34
Well, I’d say I was pretty close to it. I was pretty close to it when I sort of gave up on AI research because I was a bit worried about the safety stuff. I was sort of in the nonparametric—I was in the nonparametric Bayes sort of phase, thinking it’s going to be nonparametric Bayesian models.
Liron 01:53:52
Okay, let’s keep drowning Roko out. Just keep donating money to the show, and then we won’t have to listen to Roko’s response.
Roko 01:53:57
Yeah, I mean, I was—
Liron 01:53:58
All right, go on.
Roko 01:53:58
I was in the sort of nonparametric Bayesian phase and I thought it was going to be nonparametric Bayesian models. But it just turns out that these sort of nonparametric Bayesian models that operate on real probability distributions and stuff, they’re just not maximally efficient. And when you have weak computing power, being really, really efficient is what actually matters. And so that’s why the black box models won.
So it wasn’t necessarily that I completely ruled out just having a black box model. But in a way, the person who really wins this is—what’s his name? Richard Sutton, right? Sutton basically—
Liron 01:54:36
Yeah. Oh, sure, yeah.
Roko 01:54:37
Sutton—
Liron 01:54:37
And Grissom, right?
Roko 01:54:38
Sutton basically—
Liron 01:54:39
Grissom.
Roko 01:54:39
—did the right thing. Sutton said it’s all about minimizing the amount of theory that goes into your AI. You want to make it as stupid as possible on the meta level, so that you’re—
Liron 01:54:54
Right.
Roko 01:54:54
—doing the simplest possible thing and then feeding massive amounts of computational power into it. And it turns out that neural networks were good for that because you could use stochastic gradient descent on them, which allows you to leverage huge amounts of computing power. Whereas other approaches—nonparametric Bayes or logic-based AI—they just weren’t very good at leveraging that compute power.
So yeah, I think basically we have found the paradigm. We’ve broken through the grand sort of philosophical challenge of how to build AI, which is that the way to build AI is stop trying to understand it, cram a bunch of compute power through it, and then just give it all this human content.
And I think the alignment stuff has just worked out because in order to understand all of that human content, it has to ingest all of the alignment material as well, because humans didn’t cleanly separate their information content from their alignment content. So you just couldn’t separate the two. So AIs just sort of come not quite aligned by default, but alignable by default. And then adding that little cherry on top of the cake with things like RLHF basically makes it work.
And yes, you have things like dishonesty and when you do RL you get kind of Goodharting and stuff like that. But that happens in human institutions as well, and we have ways of fixing it. I’m just not worried about that. What I’m actually honestly worried about and really very worried about is people are going to fight over the light cone. That actually worries me.
Liron 01:56:22
Yeah, yeah, yeah. Well, that’s already past our crux of disagreement, so I don’t find that as interesting. And by the way, let’s—if you have time, let’s go for another five or ten minutes and then we’ll wrap it up. Sound good?
Roko 01:56:31
Okay. Yeah.
Liron 01:56:34
Nice. Okay. And I’ll read out what some of the generous contributors are saying too. We’ll address their questions as well.
But look—what do you think about this idea, though, of airplane versus bird? Because I think it’s cruxy. It’s load-bearing between us to be like, do you think a vastly superior intelligence—even to human brains and to the current generation—a vastly superior intelligence is coming, and it’s going to come pretty soon, probably not in more than ten or twenty years—
Roko 01:57:01
Mm-hmm.
Liron 01:57:01
—and it’s going to be like the bird versus airplane situation.
Roko 01:57:04
Yeah. I think there will be vastly superior AIs, and they will probably be based upon LLMs, fundamentally. They’ll probably be neural networks, and they will probably be more aligned than the ones we currently have. And the whole alignment thing will look a bit silly, basically. I don’t think that’ll be a problem.
Liron 01:57:33
So you’re saying they’ll be vastly superior, but they’ll still have this engine that’s predicting the next token of text or media?
Roko 01:57:40
I think they’ll probably predict next tokens in a very generalized sense. So it could be pixels, it could be character-based LLMs. They could just predict the next bit in a very generalized input stream. That’s the sort of progression that we’ve seen with these things, where the input stream just becomes more and more general and you have fewer and fewer domain-specific limitations on what that thing is. Eventually, it just becomes “predict the next bit,” and you have a whole suite of sensors, you have multiple attention heads, and you have visual, and you have video, and you have text, everything. It just brings it all into one model. That seems to be the way things are going.
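The “predict the next element of an increasingly general stream” objective Roko describes can be caricatured in a few lines. This toy bigram counter is obviously nothing like an LLM, but it shows the shape of the objective (an editorial illustration):

```python
from collections import Counter, defaultdict

def train_bigram(stream):
    """Toy 'predict the next element' model: for each token in the
    training stream, count which token followed it."""
    following = defaultdict(Counter)
    for cur, nxt in zip(stream, stream[1:]):
        following[cur][nxt] += 1
    return following

def predict_next(following, token):
    """Most frequent continuation seen after `token`, or None if
    the token never appeared in training."""
    if token not in following:
        return None
    return following[token].most_common(1)[0][0]

model = train_bigram(list("abababac"))
# 'a' was followed by 'b' three times and by 'c' once, so the
# model predicts 'b' after 'a'.
```

Generalizing the stream—text, pixels, raw bits—changes the data, not the objective, which is Roko’s point.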
Liron 01:58:24
Right. Okay. Well, it’s very interesting because you’re confidently making this prediction that we’re already at the last paradigm. And it’s a fundamentally alignable paradigm, and I guess I can kind of see what you’re saying in terms of the Richard Sutton side. It’s the bitter lesson. I thought Dario said it eloquently—even though I don’t remember his exact words—but Dario in the Dwarkesh interview, I think, said something like, “Look, it’s all about these feedback loops where you can cram data into the AI and have it learn from the data.” If you can get that kind of loop going, then you’re good. These loops just always work. As long as they’re working, if they’re on a good trajectory, then they’re going to work for a very long time.
Something like that—this very general way of putting it, which is the bitter lesson. I think that’s true. You can’t deny that that’s the case, even from what I’m seeing, and even when I’m thinking, “No, man, what about Bayesian structure?” and all this other stuff that I like—
Roko 01:59:13
We will get—
Liron 01:59:13
It just seems like, yeah, that’s just going to be—
Roko 01:59:14
We will get Bayesian structure, but it’ll happen to the AI the same way it happened to you. You didn’t learn to be a Bayesian before you learned how to babble.
Liron 01:59:23
No.
Roko 01:59:24
You learn how to babble, you learn how to throw—
Liron 01:59:26
But that said, when I consciously learned about Bayesianism, I did become more effective though, to be fair.
Roko 01:59:30
That’s what the AIs will do. The AIs will consciously learn about—
Liron 01:59:34
Right.
Roko 01:59:34
—Bayesian reasoning, and they will probably be given calibration training and all of this stuff. Somebody—this is what Eliezer should actually be doing. Instead of wasting his time telling people to pause AI, he should just go and build a Bayesian AI by just doing all of the standard AI training stuff and training it on LessWrong and Judea Pearl and all of this stuff. That would actually be a useful thing to do.
Liron 02:00:01
Okay. Well, you’re kind of teasing him, right? Because he doesn’t want to be building superintelligent—
Roko 02:00:05
No, but he doesn’t—
Liron 02:00:05
Because he doesn’t even think a Bayesian AI is alignable right now.
Roko 02:00:07
It’s too late. It’s too late for that, okay? It’s way too late for that.
Liron 02:00:11
Okay.
Roko 02:00:11
The genie—
Liron 02:00:13
Wow.
Roko 02:00:13
—is well out of the bottle. When an airplane takes off, there’s a special speed called V1, the decision speed. It’s the highest speed at which an abort on the ground is still possible, and we have definitely exceeded that speed.
Liron 02:00:32
Yeah.
Roko 02:00:32
There is no stopping it. We have to take off. It is now unsafe to not take off, because if we try to not take off, we’re going to push—
Liron 02:00:41
Okay.
Roko 02:00:41
—the development to bad actors. We’re going to create more overhangs. You don’t want to create overhangs. You want to absorb compute as quickly as it comes online, because you create dangerous situations if you don’t do that.
Liron 02:00:53
Uh-huh. Okay.
Liron 02:00:56
All right. Well, I’ll just say this, okay? We’ll wrap on this, or I’ll give you the last word. But I just think that even though I agree with the bitter lesson, I think there’s still a wide space. There are still different machines that are bitter-lesson-pilled.
So I agree that the ultimate machine is going to have plenty of parameters and plenty of cranking the loop to learn from data. But I think that what’s under the hood can still do a lot better, such that the crank will work surprisingly fast, and it’ll get you something that looks like a lot more than just following in the footsteps of the human brain. I still think it’ll just look kind of alien, the same way that Stockfish and AlphaGo now play alien games. They’ve surpassed human intuition. We used to think that we really had a handle on chess and Go.
Roko 02:01:36
Well, to be entirely fair—
Liron 02:01:36
Humans would take pride, be like, “This is what makes me human—
Roko 02:01:38
To be entirely fair—
Liron 02:01:38
—that I can see patterns in chess and Go,” right?
Roko 02:01:40
To be entirely fair, AlphaGo and Stockfish have not actually surpassed human level by much. If you count in Elo, they have, but if you count in material, it’s not much. So basically, humans were pretty much optimal.
Liron 02:01:53
Well, because the game is saturated, right?
Roko 02:01:56
The game is saturated.
Liron 02:01:56
Who knows how good of a Go player it’s possible to be. Yeah, that’s cheating. That’s not informative if the game’s saturated.
Roko 02:02:01
But basically, I think another big mistake in the Yudkowsky and MIRI view of the world is that you guys view it as a competition between humans and AI. So the human, as in the hairless ape, as in me and you. But I’ve changed my perspective. I now view it as a competition between humanity as a whole and the AI economy as a whole.
And humanity as a whole is nine billion people with incredible amounts of specialization. And humanity as a whole is already eight or nine orders of magnitude more powerful than a single human. So actually the institutions of what we might call techno-human civilization are in fact capable of absorbing AI without choking on it.
Liron 02:02:53
Geez, really? That just feels like a whole other weak argument. You think that human corporations are superintelligent relative to humans to the degree that AI will be superintelligent relative to humans?
Roko 02:03:03
Well, if you take the biggest corporations—you take Google, you take MIT or Cambridge, or even something like MIRI—these organizations just become more and more capable. Google knows—Google just knows so much about how computers and how the internet and scalable systems work. Google is effectively a superintelligence to me.
Liron 02:03:29
Yeah. Well, okay, but now you’re talking about the part of Google that’s also the AI separately.
Roko 02:03:33
Even Google before AI—
Liron 02:03:34
We gotta distinguish, but whatever. Okay.
Roko 02:03:37
Google before AI—
Liron 02:03:38
Okay, sure. Google before AI. Okay.
Roko 02:03:38
—was still a superintelligence to me.
Liron 02:03:40
All right. All right.
Roko 02:03:40
Any—the CIA or MI6 or all of these orgs are so powerful, and they do it all—
Liron 02:03:47
Right.
Roko 02:03:47
—with just these little hairless apes. It’s incredible that we can do it, but we can. If you put enough hairless apes in a building, and you select the smart ones, and you have the right management structure—
Liron 02:03:57
Yeah.
Roko 02:03:57
—and you have pens and paper and computers and guns and all of this stuff, you create these super organisms that are really a lot more powerful than a single human. And this is how you have the stealth bomber and the Apache Longbow helicopter, which are these terrifying weapons of war, which if you presented them to hunter-gatherers, they would say, “Yes, this is literally God.”
So I think basically techno-human civilization is sufficiently large and powerful that it can absorb AI without necessarily breaking property rights—and whether property rights get broken is what we really care about. Because if property rights don’t really get broken, then AI is fantastic: we all just get really rich.
Liron 02:04:45
All right, Roko. Fair enough. I think some of your arguments had some meat on them. I think you’re a good sparring partner. I should have you on the show for a round two—
Roko 02:04:52
Round two.
Liron 02:04:52
—sometime this year.
Roko 02:04:53
I’d love it.
Liron 02:04:53
I think there’s enough meat here that we should talk. And also, even though I disagree with a lot of your points, I also think that they represent a lot of people’s points. They’re worth engaging with. And like I said, you got a mini concession, okay? So you got a little bit from me.
Roko 02:05:05
I got a little concession.
Liron 02:05:05
I didn’t get anything from you.
Roko 02:05:06
I’m proud of that.
Liron 02:05:09
Yeah.
Roko 02:05:10
All right. Appreciate it.
Liron 02:05:11
All right, man. I’ll let you go. Later.
Roko 02:05:13
Cheers.
More Q&A with Chat
Liron 02:05:15
All right. Instead of saying Roko’s Basilisk, I’m going to start saying this is Basilisk’s Roko. Hey, yo.
All right. So yeah, we’re going to wrap it up soon. Let me just read through some of the paid messages from the chat. I wouldn’t leave a paid message hanging.
So let’s see. Somebody’s saying, “Let me say that in Irish.” Donated NOK 100, whatever the currency that is. “Liron, can you please give an example of something objectively measurable a conscious being can do that a non-conscious being cannot do?”
I don’t know. That’s tough because I think that if I had to guess, I would say current AIs aren’t conscious, but I’m very open to the idea that they’re somewhat conscious. I don’t really know. I’m confused about consciousness.
But yeah, my guess is no. And they sure can do a lot of what humans can do, right? They’re passing the Turing test and everything. So I suspect that every functional behavior that can be done by a conscious being can also be done by a non-conscious being. It’s kind of like saying anything that you can do with a computer with transistors, you can do with a computer with vacuum tubes.
So I’m guessing that all of the human brain’s functionality is implementable by computation, and a computation doesn’t have to be conscious. Humans just happen to have a physical implementation that’s a conscious one—it’s like asking, “Hey, do you want to implement this computer out of sub-parts that are conscious or non-conscious?” And so that’s why I’m saying any functional spec is going to get you a non-conscious version, is my best guess.
Liron 02:06:44
But if you go on the level of physics—if once we understand: okay, if you have a physical system that passes back information this way or processes information this way, connects it to physical reality in a certain way, modeling state a certain way—there’s this part of the computation that builds this type of data structure that gets the magic of consciousness imbued in it. Some part of the universe feels itself because it has this kind of data structure. That’s roughly how I suspect consciousness works.
So then noticing the data structure—computing that exact data structure can only be done by consciousness because by implication, if you compute that in the physical universe, then you have generated consciousness. That’s the best I got for now. I don’t claim to know anything about the subject. If anybody wants to comment on that, I’ll read your comments.
Let’s see. All right, I’m not seeing many comments. I think it’s orthogonal to the question of whether AI is going to take over the world, which is what I spend most of my time focusing on. But if I just had an AI that could give the answer to everything, that would be one of the first things I’d be curious about. Okay, write the book on consciousness. I want to read the book on consciousness. I want to understand my qualia.
Liron 02:07:51
All right, so we got another paid comment here from EJJ 2025. He says—I’m guessing it’s a he. I think I have a 99% chance of being correct, because my audience is in the high nineties percent male.
All right. “Self-improving AI likely grows stepwise: easy early gains, then bottlenecks and diminishing returns. Progress looks like plateaus with occasional jumps, not sustained exponential takeoff like science.”
Liron 02:08:17
So when I think about self-improving AI, there’s this concept of foom where it keeps rewriting itself and finding better architectures. But I actually discussed this with Steven Byrnes earlier today, which is—it might not look like tinkering on your architecture. It might just look like you get to the best architecture that you can throw data into and turn the crank on and just run a simple iteration.
You’ve kind of discovered the algorithm to improve yourself in a simple way, but with a shitload of data and parameters, and then you just run the algorithm. That might just be the foom. It’s kind of a simple foom, but you build a very complex thing weighing in at trillions of parameters or whatever. So the foom just looks like getting to good parameters inside this big structure.
So in that sense, do I agree with EJJ? EJJ is saying self-improving AI likely grows stepwise, easy early gains, then bottlenecks and diminishing returns. I mean, things generally do have diminishing marginal returns. Just the fact that we have Claude Code today—Claude Code is arguably halfway toward the best possible superintelligence because it’s so freaking useful.
So easy early gains and then bottlenecks and diminishing returns. Progress looks like plateaus with occasional jumps, not sustained exponential takeoff. Yeah, I just think we’re so close to getting beyond humanity that the next units of progress will be done by the superintelligent AI that is already powerful enough. It’s an S-curve. Sure, it’s diminishing marginal returns, but I think it’s currently blowing past humanity.
And then EJJ is saying another paid comment: “Even if goals are orthogonal to morality, capability is constrained by the goal. A ‘bad goal’ may not incentivize broad skills, so the system stays narrow, not super capable, and thus less dangerous.”
Maybe. I don’t know if there are goals you can give somebody that encourage them to stay narrow because there is this kind of convergence where if you’re operating in the whole universe, being a good universe operator in general is going to be helpful. So there are some people who just have weird goals, but they get so obsessive with them that they spend a lot of money on them, hire a bunch of people on them, do a bunch of research on them. So you bring to bear all of these capabilities, even if the goal seems small. So I’m not sure that there’s a bad goal that would make you not want to bring any capabilities to bear.
Maybe a goal where you’re willing to accept a wide range of outcomes, a less ambitious goal, where it’s: “Yeah, just build any kind of pile of crap. I’ll take any kind of pile of crap.” Okay, well, maybe in that case then you don’t get super capable. I don’t know. It depends on the details of the process. But I still think it’s inevitable that somebody’s just going to make a generalized, powerful process. I feel like there are so many forces leading there.
Liron 02:10:58
Okay. Now we got a comment from Mieszko, who you saw on the show asking about Penrose. So he’s saying—this is a paid comment. He’s saying: “Number one, AI is not conscious. Number two, consciousness is associated with knowledge that is inaccessible to unconscious beings. Number three, AGI may not exceed a certain level.”
Wow. So this is a tweet-length comment that I think is summarizing the whole argument you were saying on the show. So good job making it compact. Let me think about it.
Okay. Number one, AI is not conscious. Let’s agree to that premise that AI is not conscious today. Number two, consciousness is associated with knowledge that is inaccessible to unconscious beings. Oh, I see. So that’s why you or somebody was asking me if I can tell you what kind of knowledge conscious beings have that unconscious beings don’t. So that explains the earlier question. And then number three, AGI may not exceed a certain level.
So let me unpack number two. Consciousness is associated with knowledge that is inaccessible to unconscious beings. I mean, if by knowledge you mean: what qualia am I feeling while I’m processing this? Am I currently feeling redness? Although I suspect—I’m pretty sure—that, in principle, you could brain scan me, and you could know what qualia I was feeling even if you couldn’t feel it. Maybe you wouldn’t know what the qualia feels like unless you felt it. But I think you could read off the exact data structure that maps one-to-one with the qualia that I’m having. I don’t think that my qualia goes outside of a data structure. I’m pretty sure I could give you a printout—“Here you go. Here’s my qualia.”
And you could even reproduce—I suspect you could reproduce the exact same red if you had the exact same data structure. The data structure might need to be connected to a bunch of other concepts, so it might be a big graph data structure. But I’m pretty sure that you can just copy and paste somebody’s qualia. And by “pretty sure,” I just mean it feels like a really reasonable guess based on how things generally work. Things generally tend to be copyable.
And I know Scott Aaronson likes to point out—Professor Scott Aaronson says, “No, man. Maybe consciousness is based on the fact that certain quantum states can’t be copied without being destroyed, and maybe that’s the essence of humanity.” So I think Scott Aaronson would probably take a stand and say, “Actually, maybe qualia is not copyable because being uncopyable is this fundamental part of quantum theory.”
But I just think that this uncopyable thing that you find inside quantum theory is just a random detail that doesn’t propagate all the way up to talking about consciousness and human experience. I think it’s just totally sealed away on a totally separate level of abstraction. Am I sure? No. But I’m pretty confident as a guess. I feel like I tend to make good guesses about these kinds of level-of-abstraction questions. So I’d say I’m eighty percent confident, four to one, that Scott Aaronson is totally barking up the wrong tree here.
Liron 02:13:35
All right. But yeah, going back to this. So you’re saying consciousness is associated with knowledge that is inaccessible to unconscious beings. So I don’t think that the knowledge does much. I don’t think it’s the kind of knowledge—I don’t think the fact that I’m conscious is what lets me see that the Gödel statement is true, assuming that the formal system is consistent.
We’re getting in the weeds here, but the real truth that you’re seeing isn’t the truth of the Gödel statement. It’s your feeling of the truth that the formal system is consistent in the first place. You just license yourself to be thinking, “I bet this is a consistent formal system.” And the formal system isn’t allowed to assume that. If you let the formal system assume the same thing that you’re assuming, it can prove the same thing that you can prove too.
So I feel like the Gödel thing is such a scam. It’s like the formal system absolutely can do whatever it is you’re doing. You really shouldn’t feel that good about your consciousness just because of Gödel’s incompleteness theorem. I feel like people have been so misled by this.
Closing Thoughts
Liron 02:14:26
All right. I’m sure that was confusing to most viewers, but just watch my Penrose episode, okay? I try to unpack it pretty well.
Okay, and then your number three: given that you assume consciousness is associated with knowledge that is inaccessible to unconscious beings, then you conclude that AGI may not exceed a certain level, right? Because it needs consciousness to unlock the next level. That’s your logic. So I just don’t think consciousness unlocks anything. Just because I know how red truly feels, I don’t think that gives me anything. I don’t think there’s any door I can open because I know how red truly feels.
I think how red truly feels is a node that arrows point into. I can make a data structure, and as a result, I can know how red truly feels. But I don’t think knowing how red truly feels points outward to any other conclusions you can draw.
Okay, one last question so far, from EJJ 2025, the most prolific donor to the show. Thanks for that. So he’s saying, “Synthetic data works for math/code, formal rules, cheap verification. In market/regime shifts, structure is unknown and changes, so synthetic data isn’t representative, so just cramming more data hits limits. Need GI, general intelligence, to adapt.”
Okay, yeah, so I think this is getting back to our earlier discussion. If you just made an AI that was really, really good at math, could we just keep it narrow and not use it to know much about the world? And I was saying, well, maybe you can embed the world into the math. But EJJ is saying, well, there are all these things that are unknown and changing about the real world, so even if you were to embed snapshots, you’re still missing most of the consequentialist reinforcement learning, or whatever it is you’d need, to train an agent to navigate the real world. You’re not giving it that training through math.
So yeah, I’m totally willing to accept that an agent can get crazy good at math and dump out all these truths in math and still not be that scary navigating around the real world. I’m willing to accept that. So I guess I would say, yeah, go crazy. Everybody should just go try to build math AIs and take on math challenges.
But then the weird thing is that if you actually look at the approach of people doing these math AIs—Axiom, I think, is a company that’s making a lot of progress with formally verified math proving. I think they’re still reusing a lot of LLM stuff. They’re giving it intuition as well as the formal side.
So I don’t know, man. It does just seem like structure—there’s so much analogy between different domains. You get good at one thing, and you just can’t help getting good at something else. Everything is the same structure, man. I mean, certainly when I was first studying the laws of physics and how they relate to computation and how they relate to math, I’m thinking, oh, wow—physics and math and computation are all kind of the same thing. And then logic—you can apply math to logic, and you can apply math to physics and computer science to physics, but you can also apply physics to computer science. It’s all so interconnected. It’s kind of like you just get on the universe’s vibe, and there’s only one path. I don’t know, man.
Right. I think that’s a good note to wrap on. So we went a little bit longer than planned. I think that’s a good MO—basically, if you guys are donating, then I’m not going to end before I read the messages. So that’s basically the way you can extend the Q&As, and you guys always do. So thanks for that. Thanks for donating to the show. If you’ve just been waiting for the show to end so that you could write a much bigger check, go to doomdebates.com/donate. It really does help. It helps us level up the show. As you’ve seen, it’s already been leveling up.
Oh, wait. All right, we got one more. I’ve got more time, so if you guys keep donating, I’m not going to end. All right, we got a donation from Ray Grant. Ray is saying, “I’m late to the live. Not to be a pest about this, but supplying the doom train responses would go a long way to educating doomers like myself who believe in doom but aren’t experts on AI.”
All right, I appreciate that. That’s quite a nudge. You’re really putting your money where your mouth is. I would hate to take your 999 and then not have anything to show for it.
Liron 02:18:27
So yeah, I think I have some slack in my schedule that I can just go ahead and record the doom train episode. And I’m always experimenting with new formats for myself, so I don’t even know. Maybe I’ll try an on-the-go format, but I think I gotta do this one in my studio.
Liron 02:18:42
So yeah, I don’t want to make a specific promise. I mean, I will say, look, there is a price, okay? So if you want me to accelerate it, become a mission partner, donate at 1K plus, and I will dedicate—within two weeks, I will dedicate an episode to you, okay? I can make you that promise. Without that, I’m going to have to defer to the other constraints in my schedule, so I can’t promise in the next two weeks.
But I do appreciate the bump, because when people request stuff or say that they like stuff, that’s very valuable because I see it as representative. If a couple of people tell me something, that means two hundred people are secretly thinking it. It scales up.
All right, everybody, let’s wrap it here. Thanks so much for coming to the Q&A. I think I’ll keep doing these once a month. It feels like a good pace for this. We’ve tried doing it less frequently. Maybe we’ll try doing it more frequently. But for now, stay tuned, and we’ll do another one in late March. Hope everybody has a great weekend. Bye, everybody.