In December 2023, I joined the Theo Jaffee Podcast to talk about a wide range of topics, including non-AI x-risks, where I disagree with Eliezer Yudkowsky, cryptocurrency, and whether P(Doom) is rigorous.
This episode was recorded six months before I started Doom Debates. Let's look back with the power of hindsight and see if my answers still hold up.
Links
#10: Liron Shapira — AI doom, FOOM, rationalism, and crypto —
Watch Theo Jaffee now on MTS — https://www.mts.now/
Timestamps
00:00:00 — Theo's Introduction
00:01:03 — Is Liron Worried About Non-AI Existential Risks?
00:03:36 — Suffering Risks
00:05:22 — Is P(Doom) Rigorous?
00:09:42 — Is Eliezer Overestimating P(Doom)?
00:12:18 — Where Does Liron Disagree with Eliezer?
00:15:43 — What Would Change Liron's Mind
00:17:14 — Elon Musk's AI Timelines
00:18:34 — Is xAI Making Things Worse?
00:20:24 — The Case for an AI Manhattan Project
00:22:32 — Are Superforecasters Wrong About AI Risk?
00:26:02 — The Race Against Time for Alignment
00:28:01 — Headroom Above Human Intelligence
00:33:31 — Vitalik's d/acc Framework
00:35:23 — Edging Toward Superintelligence
00:38:21 — From Chatbot to World-Ender
00:41:55 — GPT Paradigm vs. AlphaZero
00:43:18 — Critiquing AI Optimism
00:48:29 — Deceptive Alignment and Gradient Descent
00:53:31 — Does Nice Training Data Make Nice AI?
00:57:57 — How Do You Live with 50% P(Doom)?
01:00:08 — Why Have Kids If the World Might End?
01:02:16 — Israel vs Hamas
01:06:15 — How LessWrong Changed Liron's Life
01:08:42 — Rationalism and Effective Altruism
01:14:49 — Why Blockchain Has No Use Case Beyond Cryptocurrency
01:22:13 — Charlie Munger and Richard Feynman
01:24:38 — Closing
Transcript
Theo’s Introduction
Theo Jaffee 00:00:00
Welcome back to episode 10 of the Theo Jaffee Podcast. Today, I have the pleasure of interviewing Liron Shapira. By day, Liron is an entrepreneur, angel investor, and the CEO of counseling startup Relationship Hero.
By night, Liron is deeply involved in the Rationalist Movement and is one of Twitter’s most prominent advocates for AI safety. As usual, we go in depth on various aspects of the AI doom debate, where he agrees and disagrees with Eliezer Yudkowsky, the various AI and non-AI risks that humanity faces, the differences between human and ASI intelligences, and his critique of Quintin Pope and Nora Belrose’s AI optimism movement.
We also talk about how a high probability of doom impacts his personal life, his background in the rationality community, and his skeptical views on the crypto industry. This is the Theo Jaffee Podcast. Thank you for listening, and now here’s Liron Shapira.
Hi, welcome back to episode 10, tenth episode of the Theo Jaffee Podcast. Here today with Liron Shapira.
Liron Shapira 00:01:01
Theo Jaffee Podcast, I’m a big fan. I’ve been listening to the catalog.
Is Liron Worried About Non-AI Existential Risks?
Theo 00:01:03
Glad to hear it. So let’s get into some of our first questions. Obviously we know that you’re very interested in and worried about existential AI risk, but how worried are you about non-existential AI risks? Especially because more and more powerful AIs are drawing near. We saw a demo just a day or two ago of text to video that looked decent for the first time.
So non-existential risks — jobs, what if we end up in a future with aligned superintelligence but humans lose agency or meaning. Just anything in that category.
Liron 00:01:40
So when I think about the non-AI existential risk, I’m not super worried, but a couple things come to mind. Nuclear risk and bio risk would be the top two, I think, below AI existential risk. I think nuclear risk is profoundly underrated.
It’s been described as something like 1% per year. Maybe if you look at the rest of the century as a whole, I might put it at 15% chance of doom, maybe 20, because maybe the risks are correlated, so it’s not like independent events of 1% per year. I think nuclear risk is underrated, and I know that people love to say, “Oh my God, people are overblowing nuclear risk. It gave us nuclear energy. Focus on the nuclear energy. Nuclear energy is safe.” And they’re right that nuclear energy is safe,
but that doesn’t justify how risky nuclear explosions are. Hello, we still have these arsenals, okay? Let’s not forget. And yeah, it’s great that nuclear power plants are good power plants.
But nuclear risk is still sitting there. These 50 megaton devices are still sitting there. And there’s all these incidents where they almost went off. I just think it’s underrated. And maybe I would be a big nuclear doomer, but it’s just hard for me to focus on that kind of thing when I think that the AI doom probability is 10 to 100 times greater. So I’m like, “Okay, great. Put that aside. That’s not my cause.” But that might be my runner-up cause.
Theo 00:03:00
Yeah. And then more not existential risks that are not AI, but AI risks that are not existential.
Liron 00:03:07
I gotcha. Okay, that’s an important distinction. I tend to not really be concerned about the AI risks that aren’t existential unless they’re near existential. So if we’re talking about humanity is all slaves to the AI but we’re still kept alive with morphine,
I guess I’m pretty worried about that. Well, I just think that’s not plausible, but I would consider that pretty bad. But then if you go down to social media is gonna be more addictive, then I become less concerned.
Suffering Risks
Theo 00:03:36
Do you think S-risks are plausible?
Liron 00:03:38
I do think that S-risks are plausible. So it’s the idea — suffering risks, for the listeners — it’s the idea that we’re creating these moral agents, moral persons. Within the AI, maybe it’s just trying to simulate what a human would say, but that simulation is a person or it has moral value, and it’s hard to prove that there’s not a moral person inside of these AIs.
I mean, presumably there’s not yet because they’re not quite powerful enough. But as they grow more powerful, it’s very plausible to me that they can have a consciousness within the inscrutable matrices, and they can have somebody that has rights or that you don’t want to harm. So that’s very plausible, and we’re just confused about consciousness. We’re confused about morality beyond humans and the animals.
I think S-risks are very plausible. And then turning the tables, that’s us causing harm to the AI, but then the AI could also cause harm to us or to copies of us. So I definitely think we could enter a hell where we’re all getting tortured for trillions of years. I think that’s a plausible outcome.
It’s just not quite my mainline outcome. My mainline outcome is we just kind of all get swept away, and we just get paperclips or something that happens to not be conscious and not be interesting. That’s kind of my default.
Theo 00:04:44
By plausible, how likely do you think that is?
Liron 00:04:49
How likely do I think an S-risk universe is? I don’t know. Probably less than 10%, ballpark. I’d say more than 1%. That’s a very rough ballpark. I definitely don’t want to write it off. It’s just that if we’re even talking about that, it’s kind of like we’ve already gone pretty far where I’m trying to push the discussion right now. That’s the discussion I want to have. I would love to be like, “Hey, are we all gonna just die unceremoniously and have the universe burn itself out with no consciousness, or is there also gonna be tortured consciousness?” If that was the dichotomy, I’d be like, “Great, let’s have that discussion.”
Is P(Doom) Rigorous?
Theo 00:05:22
Well, speaking of probabilities, the notion of P(Doom) has been dunked upon a lot recently, including the clip you posted of my podcast where I was asked about it.
Liron 00:05:31
That’s right. You got a good dunking there for sure.
Theo 00:05:34
Yeah. And so people say it’s not rigorous, and even someone as prominent as David Deutsch said basically, “Oh yeah, the steps to getting a P(Doom) are like pick a number between zero and one, not too far or not too close to either of those bounds, and then you’re done.” So first of all, what is your P(Doom), if you have one? And second of all, how rigorous do you think your methods of getting it are?
Liron 00:06:02
So my P(Doom) is 50% by 2040, which is, like Zvi said, like Jan Leike said, a ballpark figure, so you can also call it 10 to 90. And this is when the dunks come out, the knives out. People are like, “You’re just making up numbers. How is 50 the same as 10 to 90?”
So just to give a basic explanation, if you just need a single probability, which you do for the purpose of decision-making, then you can go with 50% by 2040. There is your single probability. Why give a range? One way to explain a range is that it’s the variance of a Monte Carlo simulation of different mental models about likely possibilities that I might have.
So I could be like, oh, there’s a possibility where the world gets its act together and coordinates to stop AI. That’s one mental model, and there’s a totally different mental model where we just accelerate as hard as we can, and then the AI fooms. So you have to — there’s so many different mental models that are all feeding into this one probability. It’s crazy to compress it down to one dimension, and yet you have no choice because when you make decisions, when you do expected utility, you have to plug in a probability number. There’s only one future. So all you can do is weight things that could have influenced the possible future. So anyway, that’s why I say 10 to 90. That’s why Jan Leike says 10 to 90.
And then people have so many objections. They’re like, “Where did you get the number from?” For that, I’d say think about the ballpark, think about the order of magnitude. So if I say, hey, 50.0 or 53.25, then it’s like, whoa, okay, I’m making up a number. But if I come at it from the other way and I’m like, hey, I bet the probability is a lot higher than 0.01%,
suddenly I’m saying something pretty obvious, because you can imagine so many scenarios that are plausible. Maybe foom is real. Okay, don’t you think there’s at least a 0.01% chance that foom is real? So if I slide all the way back to 0.01%, at some point you start subjectively telling me I’m obviously underestimating this.
So 50%, suddenly I’m an idiot pulling numbers out of my rear. 0.01%, okay, I’m obviously underestimating. So if you just become more continuous with how you react to what I’m saying, there’s gonna be some happy medium where I’m saying something when you’re like, okay, this seems vague, this seems rough, yet you can’t do better and you have to give a number. See what I’m saying?
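Editor's sketch: one way to picture the "single number plus a range" idea described above is as a weighted mixture of mental models. The model names, weights, and conditional probabilities below are hypothetical placeholders chosen only so the arithmetic lands near the 50% and 10-to-90 figures mentioned in the conversation; they are not numbers from the episode, and this is not a claim about how anyone actually derives their P(Doom).

```python
import random

# Hypothetical mental models: (description, weight, P(doom | model)).
# Every number here is an illustrative assumption, not a figure from the episode.
MENTAL_MODELS = [
    ("world coordinates and slows AI", 0.3, 0.10),
    ("muddling through, outcome unclear", 0.4, 0.50),
    ("full acceleration, then foom", 0.3, 0.90),
]


def point_estimate(models):
    """Weighted average over models: the single probability a decision needs."""
    total_weight = sum(weight for _, weight, _ in models)
    return sum(weight * p for _, weight, p in models) / total_weight


def plausible_range(models):
    """Lowest and highest conditional probability across the models."""
    probabilities = [p for _, _, p in models]
    return min(probabilities), max(probabilities)


def monte_carlo_estimate(models, draws=100_000, seed=0):
    """Sample one mental model per draw (by weight) and average its probability."""
    rng = random.Random(seed)
    weights = [weight for _, weight, _ in models]
    probabilities = [p for _, _, p in models]
    sampled = rng.choices(probabilities, weights=weights, k=draws)
    return sum(sampled) / draws


if __name__ == "__main__":
    low, high = plausible_range(MENTAL_MODELS)
    print(f"point estimate: {point_estimate(MENTAL_MODELS):.0%}")
    print(f"range across models: {low:.0%} to {high:.0%}")
    print(f"Monte Carlo average: {monte_carlo_estimate(MENTAL_MODELS):.1%}")
```

The point estimate is what gets plugged into an expected-utility calculation, while the range only records how far apart the underlying models sit. Changing the assumed weights moves the point estimate without necessarily changing the range.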
Theo 00:08:17
Yeah. Well, I guess one exercise in P(Doom) is we’ve had atomic bombs for eighty years now, and you could say —
Liron 00:08:24
Mm-hmm.
Theo 00:08:24
— maybe the probability of nuclear doom in any given year was, what, one to five percent, something like that. And yet we are still here, and it seems quite unlikely — not totally unlikely, but quite unlikely — that we’ll be vaporized by nukes within the next few years.
So could it be possible that your intuitions for P(Doom) might be higher than it would actually be in real life, especially over long time periods with robust systems like civilization?
Liron 00:08:56
So you’re using the example of we’ve had nukes for eighty years, and let’s say that there was a one percent chance that they could annihilate more than ten percent or even fifty percent of humanity. So every year we’re rolling the dice, and we only have a ninety-nine percent chance to survive, one percent chance to die. So it looks like ninety-nine percent to the power of eighty is forty-four percent. So surviving a century is only a coin flip.
So I’m pretty content to be like, okay, we got lucky on a coin flip. I don’t think that my model of one percent a year nuclear risk is invalidated, and especially when you look at where the model comes from — you almost have these things go off. You have Cuban Missile Crisis, you have Petrov, you have safety checks on a test flight over Spain, three out of four of the safety things failing. There’s near misses.
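Editor's sketch: the compounding arithmetic in this exchange can be checked in a few lines. The sketch assumes each year is an independent roll with the same 1% risk, which is the simplification used in the conversation; the function names are made up for illustration.

```python
import math


def survival_probability(annual_risk: float, years: int) -> float:
    """Chance of getting through `years` independent years with no catastrophe."""
    return (1.0 - annual_risk) ** years


def years_until_survival_falls_to(target: float, annual_risk: float) -> float:
    """Years after which the survival chance has dropped to `target`."""
    return math.log(target) / math.log(1.0 - annual_risk)


if __name__ == "__main__":
    # 1% annual risk over the roughly eighty years discussed above.
    print(f"80 years:  {survival_probability(0.01, 80):.1%}")
    # The same risk over a full century.
    print(f"100 years: {survival_probability(0.01, 100):.1%}")
    # How long until survival is a literal 50/50 coin flip.
    print(f"coin flip after {years_until_survival_falls_to(0.5, 0.01):.0f} years")
```

Under these assumptions, eighty years works out to a bit under 45%, a full century to roughly 37%, and the exact coin-flip point falls at about 69 years, so "coin flip" is a fair rough description of the range being discussed. If the yearly risks are correlated rather than independent, as suggested earlier in the conversation, this simple formula no longer applies.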
Is Eliezer Overestimating P(Doom)?
Theo 00:09:42
So when you talk about ten to ninety percent P(Doom), you mentioned once you get into too low numbers, you’re obviously overestimating it. So do you think of ninety-nine point five percent, which is Eliezer’s number of P(Doom), as like, whoa, you’re obviously overestimating it just like you would with a point five percent?
Liron 00:10:01
With Eliezer, I think that he would probably agree with my perspective, which is that ninety-nine point five percent is kind of the on-model probability. So if you understand what Eliezer does about the relevant theory — optimization processes, computational processes — he’s an expert at a lot of the relevant theories, and he’s like, based on my understanding, what AI labs are trying to build is something like a perpetual motion machine. And so my model just doesn’t say that this can proceed with a significant probability of success.
It’s kind of like, hey, a bunch of people are building a rocket, the first rocket that anybody’s ever built is gonna try to orbit the Earth. There’s just a very low probability of success on model.
But I think Eliezer would agree with my own claim, which is, okay, but you never know unknown unknowns. There’s probably a one percent chance that it’ll be revealed to be true what a few people are accusing Eliezer of, that he’s completely clueless and his rationality makes no sense and his probability makes no sense. And that could be revealed, that we’re all just clueless people. Some people are urging us to see that reality already. And just for that, you have to give a one or two percent chance, just of that. So there’s the off-model probabilities that I think Eliezer would admit are worth mixing in a little bit.
Theo 00:11:12
You said ten to ninety percent, fifty percent by 2040. What about 2100? Is it significantly higher or the same or lower even?
Liron 00:11:20
I think it’s highly correlated. So I think if foom is gonna happen, it’ll slightly more likely probably happen before 2040. I think if you go to, let’s say, 2060, then I’d probably push it up to maybe sixty percent.
It’s hard to push it beyond sixty percent because when I quote the figure, I give myself a lot more just unknown unknowns. I’m clueless. I’m not as confident in what I’m saying in general as Eliezer is, which I think he has a right to be more confident. I do think he’s a master of a lot more relevant theory than I am.
So I don’t think it goes that much beyond fifty percent because I just start getting into the “I don’t know what I’m talking about” range of things. But you can definitely push it to sixty, maybe seventy if you go all the way to 2060.
When you go past 2060, at that point, it’s like, well, what’s going on? Why hasn’t it foomed yet? So at that point it starts undermining my assumptions. So it doesn’t necessarily get higher because it also gets lower. And so I don’t really know what happens to it.
Where Does Liron Disagree with Eliezer?
Theo 00:12:18
So you respect Eliezer a lot, and you think that he knows much more about this stuff than you do, but your P(Doom) is different. So why is that? Is it just because you’re less confident in his assumptions? And if so, which assumptions are you less confident on?
Liron 00:12:37
I think that Eliezer’s model makes a lot of sense. It’s just more like whenever I grill him about little things I don’t understand — like, “Wait, so RLHF breaks down when, exactly?” — I’ve had a few of these conversations with him, and he always has really good answers. But I can also tell that I have an undergraduate level understanding, and he has a more sophisticated understanding.
I expect that I’m more likely to update toward Eliezer than away from Eliezer. But I guess I’m not comfortable making the full update yet, even though there’s some principle as a rationalist where you’re supposed to update all the way. But I’m not entirely sure. I have some uncertainty.
The thing is I don’t think that we disagree that much. I think most people who are in the “it looks like we’re gonna die” camp, which I am too — I don’t think there’s that fundamental of a distinction between people going, “Hey, there’s 95%,” and people going, “Hey, there’s 50-plus.” I think we’re kind of the same ballpark, which is why when people come and tell me, “Hey, my probability is 10%,” like Vitalik just said, I’m like, “Okay, great. I don’t want to nitpick 10 versus 50. I just want you to see 10.” And I’m happy to just let you stay at 10. I don’t think you have to come to 50.
Because I do think that a lot of what I believe about reading LessWrong is just intuitions that are salient to me. But I understand that they may not always be right, and other people can weigh up their intuitions differently, and I don’t think that they’re making a big methodological mistake. I think it’s okay for them to stick with their probabilities until they observe more evidence.
Theo 00:14:11
Do you have any concrete disagreements with Eliezer?
Liron 00:14:15
That’s a good question. I don’t know if I do. We always have stylistic differences. But when it comes to the matter of AI doom and rationality, I think there’s nitpicks. There’s an article he wrote a long time ago where he thinks sometimes you shouldn’t use probabilities in certain circumstances, and that was kind of controversial. And somebody’s like, “No, just use probabilities.” And I don’t know where I come down on that.
And Eliezer famously says that he thinks a lot of animals just totally aren’t conscious. He seems pretty confident that dogs definitely have no consciousness. And I’m like, I don’t know, they seem kind of conscious intuitively. So on the edges, on the fringes, I do think that I start not following him all the way.
But on the AI doom core argument, I do pretty much buy it all. I think it makes a lot of sense. I’m definitely somebody who is a good target audience for his writing because I do think that it’s really good. I think it’s still underrated.
I notice a little bit of myself in it, where sometimes I understand something well, so I kind of know what it feels like to understand certain technical topics well. And then I read Eliezer and I’m like, wow, he understands it even better. I thought I understood it well, but he’s pointing out some stuff that is actually deeper than my own understanding of a topic that I thought I understood well. So I feel like I have a good viewpoint to understand the degree to which this guy knows what he’s talking about in a lot of these different articles that he’s published.
What Would Change Liron’s Mind
Theo 00:15:43
If you did eventually come to the conclusion that AI risk is less likely than you thought, why do you think that would be? Or do you just not know?
Liron 00:15:53
That’s a good question. It’s kind of similar to the question of just imagine — do a postmortem or a post-living — of hey, it’s the year 2060 and we’re all alive. So what’s — how do you condition on that? What mental model do you get?
Theo 00:16:07
Yeah.
Liron 00:16:08
So one easy answer is just AI progress turned out to be a really long marathon to get to superintelligence. So even though it kind of feels like we’re speeding to superintelligence, and Elon Musk is like, “Yeah, we’re gonna have AGI in three years,” and even OpenAI is like, “Yeah, we might have a corporation this decade that’s better than a human corporation that’s run by AI” — even though it feels like we’re speeding to AGI, and Kurzweil a long time ago predicted I think 2029,
maybe it’s not. Maybe it’s 2100. Maybe it’s 3000. So that would be an easy answer to why we’re not doomed yet, because it’s just everything goes slow, and maybe it goes so slow that we can do alignment research.
So if somebody just convinced me, look how slow it’s going — and I know Sam Altman said something about we’re bottlenecked on data center scale, and my reaction was, you really don’t know that. We definitely could suddenly find ourselves with a bigger hardware overhang than we realize, and one data center could be plenty. But if Sam Altman was spot on and we’re bottlenecked on data center scale, and we have to scale it up a thousand times, I mean, ideally a million times — that would be a straightforward way to convince me that we’re not doomed for a couple decades.
Elon Musk’s AI Timelines
Theo 00:17:14
Well, Elon said three years, but we all know about his record of forecasting stuff.
Liron 00:17:20
Yeah. It’s not great. I don’t think it’s terrible. But it’s definitely not perfect. And I think Rob Bensinger posted Elon’s record where I think in 2014 he said that we’ll have it by 2019. So yeah, you can’t just automatically assume that Elon’s exact forecast is right. I agree with that.
Theo 00:17:35
Well, I mean, he tends to be right about stuff in the long term, just it takes longer than he says it will. Like self-driving cars, how he’s predicted full self-driving next year every year for the last ten years.
Liron 00:17:48
Right. No, he has. And it’s kind of funny. A lot of times people kind of catch him BS’ing or they catch him being way off, and it’s like, okay, yeah, I’m starting to think this guy is not trustworthy. But then at the same time, he launches Starship and lands the rockets and I’m like, man, there’s a good enough distribution of miracles mixed with “okay, this is kind of BS but this is a legit miracle” that overall I’m pretty bullish on Elon.
But then, of course, there was a time when he started OpenAI and shortened the timeline by a few years, which Eliezer has said — I think he has a good point — kind of overshadows anything else Elon Musk has ever done to kind of stoke the AI arms race. In the end, and by the end I mean potentially in a few years, that is the single biggest impact that he’s done arguably.
Is xAI Making Things Worse?
Theo 00:18:34
What about xAI? Do you think that’s made it worse or...?
Liron 00:18:38
So far it just seems like they’re not moving the convex hull of what’s possible. So until they get there — I’m sure they’re trying their fastest to get there. If they start releasing something that’s GPT-5 equivalent before GPT-5, then I’ll be like, “Damn it, xAI,” you know, “Why does Elon have to keep making things worse?”
But for now, I guess the question remains of is Elon’s 20% project gonna be competitive with Sam Altman and Dario’s number one project? It’s probably not gonna make things that much worse. It’s hard to say. We gotta watch it.
Theo 00:19:15
Well, would Elon just drop a GPT-5 model on the world? He seems to be far more concerned about X-risk than maybe any other major AI lab leader.
Liron 00:19:25
So Elon gets massive points for, as early as the 2015 conference, coming in there being like, “Hey, I’m just a rich billionaire with a ton of credibility outside this field, and I think AI risk is indeed very dangerous.” Bostrom has a point. And he gets massive rationality points for saying that.
Unfortunately, a lot of the things he’s said about AI recently are kind of ridiculous. When he talks about, “I’m gonna make a truth GPT. I’m gonna make a GPT that’s not woke.” I guess those are valid considerations in terms of the next couple years, mundane utility, fine.
But when he says stuff like, “I think AI is gonna be nice to humans because humans are interesting,” it’s like, okay, Elon, come on, man. You have Geoff Hinton. He’s talking to these luminaries, and they should be disabusing you of these kind of notions — the idea that that is some equilibrium, that humans are gonna be interesting. Humans are anywhere near the optimum for interestingness, and so that’s gonna be some kind of equilibrium. Why are you publicly posting this stuff? The fate of the world is largely in your hands, Elon, and that is not a plausible theory.
The Case for an AI Manhattan Project
Theo 00:20:24
So there’s alignment research, and then there’s governance research. And it seems like the default political plan for rationalist decel doomers, whatever you want to call it — it’s slightly pejorative, but people who are concerned about X-risk — is slow down AI and give the authority to build AI either to nobody or to a trusted group of people. So do you worry that this increases centralization risk a lot?
Liron 00:20:55
Yeah, for sure. My position is that the actual constructive doomer plan is fraught with peril. It’s a tough plan. The ideal would be something like a trusted Manhattan Project, which seems unthinkable in today’s environment. But if we really could get together the scientists and have some level of trust and common purpose the way we had in the Manhattan Project,
that may be the single best setup that gives us a chance, as long as all of those scientists are top-tier — Nobel Prize–winning physicists, or their students, or people who just appreciate what we’re up against and are taking it seriously the same way they took the nuclear bomb seriously. I do think we would have a chance to win the race between capabilities and alignment.
But of course, today it’s so unpalatable because people don’t realize we’re in a war. They don’t realize that the enemy is unaligned AI. So it just seems like such an impedance mismatch. People are like, “What the heck are you talking about, Manhattan Project?”
But short of that, I just think time is running out, and we keep slipping farther and farther from the possibility of a good outcome. But I think we’re between a rock and a hard place because you can give a million criticisms to the doomer suggestion of let’s centralize everything in a Manhattan Project. I agree, that sucks. But the alternative is worse.
And so many people are saying you have to take it as an assumption that you have to run things for profit and China’s gonna compete with you. These things are inviolable axioms that you have to start with. And I’m like, wait, can I get an inviolable axiom that AI’s gonna kill us? Because it’s a rock and a hard place. They’re both hard situations. I just think that the AI killing us one is even harder, and we have to deal with it.
Are Superforecasters Wrong About AI Risk?
Theo 00:22:32
So Scott Alexander recently published an update of his P(Doom) from 33% to 20% based on superforecasters and the world at large thinking that AI risk is not overwhelmingly likely. So has that impacted you at all, or do you just think, no, they’re wrong?
Liron 00:22:56
So I know this was one of the controversial things from your interview with Zvi, where Zvi was able to kind of dismiss the superforecasters, which is a shocking move in the rationality sphere. One does not simply dismiss a superforecaster forecast. And he even argued with you. He’s like, actually, the fact that superforecasters are dismissing it so easily might make you update the other way, where it’s like they clearly didn’t take the problem seriously, so I’m gonna discount their opinion.
So Zvi had some pretty good arguments that I thought made sense. I don’t want to throw it out entirely, so I’m happy to update a little bit, but I don’t want to do a massive update. It’s more like, okay, I’ll slightly update down a few percent. That’s more how I feel about it, because I do think there are a lot of problems with that project.
It happened in 2022. I don’t even think that they had the milieu of ChatGPT and people getting excited and luminaries coming out, where you’re like, okay, they’re using base rates. How’s this for a base rate? A bunch of luminaries coming out and warning about a new technology.
I do think that if you look at the superforecaster methodology and you ask in what scenario might this hallowed methodology actually fail — at a methodology level, not disputing the conclusion, but disputing the methodology — I do think this looks like a good candidate for a time when they might fail.
I’ve also made the analogy to another thing that uses pure logic. This is in addition to the stuff that Zvi was saying about their incentives were wrong and they didn’t research the logic of the problem that much. Another analogy I would make is if you look at crypto, for instance. I was in the position of being a crypto skeptic when crypto was still pretty popular, kind of calling the peak of the bubble, and being like, the logic of blockchain having applications beyond cryptocurrency is flawed.
I’m not sure a team of superforecasters would have predicted a 99% contraction — a fundamental qualitative contraction in this industry based on superforecaster methodology. I don’t think there was a superforecaster tournament then, but if there were, it also seems like the kind of thing that would slip by superforecasting. What do you think about that?
Theo 00:24:59
Yeah, I mean, this superforecaster study that I was talking about with Zvi — first of all, my interview with Zvi was four months ago, and of course the survey was farther back than that, but it doesn’t seem to have changed much in that time. I don’t think the world as a whole is more doomy than it was a few months ago.
And actually a lot of even rationalist-type people seem to be less doomy than they used to be. One example just off the top of my head is this anon account called Lumpen Space Princeps, which —
Liron 00:25:31
Right.
Theo 00:25:31
— they used to be kind of fully in the Eliezer Yudkowsky rationalism AI doom foom camp, and now they’re like, “Wait a minute, it seems that RLHF is actually working pretty well, and GPTs are not kind of monomaniacal paperclip maximizer type things. And so maybe there’s not a 99.5% P(Doom). It’s less than what I thought it was.”
And of course, prediction markets still rate it a lot less than what you do.
The Race Against Time for Alignment
Liron 00:26:02
I mean, it’s true that every time we see AI do something new and not foom, then we have to update a little bit, even if it’s not that surprising. It’s not a massive update when we learn that AI can do something new and not foom. The massive update only comes when AI can do everything in the domain of the universe — be given goals. I always talk about goal-action mapping.
If it can be a better CEO than a human, if it can be a better general problem solver than a human, and then not foom, that’s when I do the big update. And I don’t even — that’s hard for me to even describe coherently, because it’s almost by logical definition that something that’s better at goals than humans discovers foom as an instrumental goal and we’re off to the races.
But if somehow that doesn’t happen — if they’re always bottlenecked by hardware or something, or suddenly complexity theory has properties that I’m not anticipating or whatever — that’s when the big update happens. But when it’s like, hey, look, it can get a score on a lot of these tests that humans can, and yet can’t actually problem solve for whatever reason — I only make a small update.
So Lumpen, it’s like, yeah, sure, make a small update, but also the problem is that time is running out. By default, time is not on our side. Every day that goes by where capabilities progress and we don’t have a massive alignment breakthrough, now there’s less time left in the race.
Alignment is falling farther behind every day, or at least didn’t gain any ground, and the buzzer’s about to sound. And the buzzer is basically when it gets better at problem solving than humanity. So even when it feels like nothing’s happened in the last month, no, incremental capabilities progress has happened in the last month. Nvidia, Intel, Apple Silicon, all these chips have gotten faster. This hardware’s gotten better. Time is running out.
So I’m not updating toward optimism as much as they are, but I also agree there are some positive updates — the government is caring about it, there’s some regulation. I agree that there’s some positive updates, but I don’t see that the balance of the updates is going that great.
Headroom Above Human Intelligence
Theo 00:28:01
So you said you think it’s basically a law of nature that something that’s better at problem solving than humans will discover foom and foom itself. Do you think that humans currently are fooming?
Liron 00:28:13
Maybe, no. But not law of nature — more like just a matter of logic. Something that you can diagram out on a whiteboard: why if you’re good at solving goals you’ll figure out that fooming makes sense.
Are humans currently fooming? So the problem with humans fooming is that augmenting human intelligence is not a straightforward step. The fact that we’re building AI is our slow foom. And then the AI’s gonna foom. So we were the bootloader for the AI foom, but the problem is it’s gonna be an unaligned foom.
But I mean, you can see we’re attempting to foom, and the economy is growing exponentially without fooming in the self-modification sense. Does that answer your question, or how do you want to drill down?
Theo 00:28:53
I guess you could drill down to human intelligence augmentation versus AI intelligence augmentation. Because do you think there’s just a totally clear path for AI improvement now until the far future, but not humans?
Liron 00:29:11
Is there a clear path for AI improvement? I’m not sure I understand.
Theo 00:29:16
No, I mean, with AIs, you think there’s just a clear path for them to improve their own intelligences over and over recursively until the future, but not for humans.
Liron 00:29:28
So I think there is a clear target of an AI that’s much smarter than a human. If you look at the gap between AIXI — AIXI is the theoretical ideal of an AI that perfectly synthesizes its evidence, perfectly calculates what action is predicted to have the best effect — and you can also use the ideal analogy of an outcome pump, which is just a perfect goal-to-action mapper. It’ll tell you an action that has the highest possible probability of getting the outcome you want.
So there’s this ideal which is light years beyond what humans can practically do, and the ideal is actually computationally infeasible. Complexity theory and logic tell us there’s this really high ceiling.
And then you have humans, which can do some great stuff, but we also definitely take our sweet time and miss stuff that’s right in front of us. The theory of relativity was great, but if you go and explain it to somebody in the year 1800, they could get it. It was just a matter of, hey, if you walk through these logical leaps. Yeah, it helps that you have the Michelson-Morley experiment, but there weren’t that many different possible outcomes to that experiment.
What I’m saying is you could catch somebody up on all of physics, all of 18th and 19th century physics pretty quickly. The amount that humans had to stumble and interact with the universe — that is not characteristic of the kind of intelligence that exists between humanity and outcome pumps. So there’s a lot of headroom above humans. That’s my confident position.
Theo 00:30:56
Clearly there’s a lot of headroom above humans, but do you think that the path to getting there is just totally straightforward for an AI?
Liron 00:31:03
I think it’s probably pretty straightforward because algorithms that make an agent smart, I don’t think they’re that complicated. Just look at the fact that evolution stumbled on it with humans, that it’s accomplished with a relatively small amount of genetic complexity, or the amount of bits in the genetic code, and how we observe that different regions of the brain can kind of grow into doing what they need to do.
It’s not like the brain is that refined and optimized. And it took a few evolutionary steps away from the other apes, and suddenly we have much more intelligence than the other apes. There’s a lot of evidence showing that our heads would have kept growing if only it were just easier to fit through the birth canal, if only it was just easier to metabolically support them a little bit.
So they had these constraints, but it looks like we’re on a gradient where evolution was just like, “Hey, look, you can have more intelligence.” Having more intelligence just doesn’t seem that fundamentally hard once you kind of know where to look in algorithm space.
Theo 00:31:57
Do you think that there are things that humans can’t do, even in principle, even with unlimited time and unlimited memory, that a maximally powerful AI could?
Liron 00:32:10
Yeah. Because the problem with unlimited time and unlimited memory is just — there are leaps of insight. I think a good intuition pump is just imagine the dumbest person you know. Imagine a prisoner who committed some stupid murder that didn’t make any sense because they just felt like murdering somebody, or they got angry. Somebody who just struggles a lot with reasoning and hypotheticals.
Imagine somebody like that and giving them a ton of time and being like, “Work through electromagnetism.” This textbook on electromagnetism. You see the problem. So it’s not that hard to generalize that to somebody who’s smarter but be like, “Okay, here’s five-dimensional polytopes. Reason through those.” They’d be like, “I can’t.”
Theo 00:32:53
You think you couldn’t even do that with a hundred years of practice?
Liron 00:32:55
I could do it in the degenerate sense where I’m just a Turing machine. So if you show me the five-dimensional polytopes, I can learn some basic theorems about them. But my intuition is always gonna be just scratching the surface.
I’m not gonna make the kind of leaps of insight that somebody whose brain was just more natively suited to the task is gonna be able to do. And at the end of the day, I can be like, “Okay, give me a piece of paper,” and I’m just going to use syntactical transformations. I’m gonna use the lowest common denominator. I’m just a Turing machine. I’m just a monkey working out the rules of a Turing machine, following the rules. I just become an implementation layer of a smarter algorithm, but I’m not that smart myself.
Vitalik’s d/acc Framework
Theo 00:33:31
So going back to what we were talking about earlier with governance, and also with Vitalik. So Vitalik just released his mega monster post about d/acc, which is accelerate defense. That should be the top result.
Liron 00:33:46
Yeah. I read it. I’m a fan. Good old Vitalik, a real thinker of our age.
Theo 00:33:49
Yeah. And I mean, admittedly, he is much less doomy than you are. 0.1 instead of 0.5, but...
Liron 00:33:55
A little bit less. Not much less, in my opinion.
Theo 00:33:57
All right. Well, I guess, yeah, the way he frames the problem is just very different. He says there are dangers behind and many paths ahead, and some are good and some are bad. Not many paths ahead and most of them are bad.
Liron 00:34:14
Mm-hmm.
Theo 00:34:14
And just a handful of them are good. So he talks about four ways to improve defense. He talks about info security, cybersecurity, and then micro bio defenses, and then macro resilient infrastructure, and then just conventional military defense. So how applicable do you think that is with AI?
Liron 00:34:38
So I mean, Zvi had a good take today, which is that Vitalik’s post is really good in how it frames the problem and kind of takes a middle position, finds consensus of, look, nobody wants to die. We all like techno-optimism. It was a really good post on the problem side but didn’t have much to offer on the solution side. The idea of, hey, let’s accelerate defense: in theory, it sounds great. But if the AI that defends me is just one that can generally solve problems, then there’s no containment boundary. Without actually understanding alignment, it’s like, okay, one bit of difference in the code suddenly makes it cause doom. I just don’t see what solution he’s proposing here that is plausible.
Edging Toward Superintelligence
Theo 00:35:23
What if the AI is slightly more powerful than you and not massively more powerful?
Liron 00:35:25
So this is what I call edging. You’re trying not to go all the way. And this is, as far as I can tell, OpenAI’s explicit plan or at least the plan they discussed internally, which is, yeah, we’re gonna edge it. We’re gonna build something that’s slightly smarter than humans, almost fooming, getting ready to foom, getting ready to take over the world, but then it’s gonna calm down, and then we’re gonna direct it the right way. We’re gonna maximize our pleasure from this AI.
Theo 00:35:53
That’s edging?
Liron 00:35:54
Yeah. You use edging so that you can then go back. You don’t want to shoot your wad. You want to keep developing it and then only when you’re ready for the big ultimate move that you want to make, then yeah.
Anyway, it’s just not prudent. You can edge your way, but then the problem is you’ve almost got this foom. You think you’ve stopped it at a safe place, but a hacker can take it and make a tiny change and then it’ll foom, or you’ll accidentally make a change and then it’ll foom, or the knowledge will propagate to society.
Your API, you can do it as a safe API, and somebody hacks the API. The more you edge, the closer you get to an edge of foom whose location you don’t even understand, and the less margin of error we have to live.
Theo 00:36:41
Do you think there’s any kind of empirical evidence for the idea that one bit flip in a humongous neural network will cause foom?
Liron 00:36:50
So the model I’m working with — I think the model’s fundamentally correct. Maybe not applied to GPT-4, because GPT-4 just doesn’t have that much danger to it to begin with. But the model that if you have a really, really dangerous system but it’s not fooming now, that model is consistent with a small tweak making it foom.
It’s the same way I feel about nuclear risk. Just the fact that these bombs exist and they have a detonator — okay, there’s four fail-safes, but you keep loading them on airplanes and flying the airplanes around. And there’s a button in the airplane that takes off the fail-safe. When you do stuff like that, you are close to doom.
Similarly with AI, if you have an engine that can accept arbitrary output goals and then find actions that map to them, maybe you’re very careful to only give it the right goal, but that’s the thing. The part that specifies the goal is compact, and that’s what I mean by one bit. Okay, maybe it’s not literally one bit. Maybe it’s a few sentences of English.
But the point is that the difference between aiming toward heaven and hell is a compact specification, and then what’s not compact is all the machinery of achieving the goal. The system underneath it that can accept the goal and achieve it, that’s not compact, but the goal specification is compact, which is why a system that’s being really, really helpful — wow, this chatbot is so great, it’s such a good chatbot AI — okay, you’re a few bits of specification now away from a world-ender, in my opinion.
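The point about a compact goal sitting on top of non-compact machinery can be sketched in a few lines of Python. This is an illustrative toy, not anything described in the conversation: the grid world, the `simulate` and `best_plan` helpers, and the two goal functions are all hypothetical names invented for the example. The search machinery is identical in both runs, and only the one-line goal changes.

```python
from itertools import product
from typing import Callable, Sequence, Tuple

State = Tuple[int, int]  # toy world (an assumption): a position on a grid
MOVES = {"N": (0, 1), "S": (0, -1), "E": (1, 0), "W": (-1, 0)}


def simulate(start: State, plan: Sequence[str]) -> State:
    """Apply each move in the plan and return the final position."""
    x, y = start
    for step in plan:
        dx, dy = MOVES[step]
        x, y = x + dx, y + dy
    return (x, y)


def best_plan(start: State, goal: Callable[[State], float],
              horizon: int = 4) -> Tuple[str, ...]:
    """The 'machinery': try every plan of the given length and
    return the one whose outcome the goal scores highest."""
    return max(product(MOVES, repeat=horizon),
               key=lambda plan: goal(simulate(start, plan)))


# The 'specification': each goal is a single short expression.
def toward_heaven(state: State) -> float:
    return state[1]      # prefer a large y coordinate


def toward_hell(state: State) -> float:
    return -state[1]     # prefer a small y coordinate


heaven_plan = best_plan((0, 0), toward_heaven)
hell_plan = best_plan((0, 0), toward_hell)
```

Flipping a single sign in the goal sends the same planner in the opposite direction. A real system would be vastly more complex, so this only illustrates the shape of the argument.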
From Chatbot to World-Ender
Theo 00:38:21
Can you go into a little more detail about how a chatbot is a few bits of specification away from a world-ender? What might you have to do to turn it into a world-ender?
Liron 00:38:32
Yeah. So the premise here is that the chatbot is sufficiently good. We’re in a really good place right now with GPT-4. I didn’t endorse building and testing it. I didn’t think that it was worth building it, but now that they built it, it seems like we dodged a bullet. It seems like it’s this great system that we can play with, and it’s like, okay, great, let’s play with it.
And it’s a chatbot, but there’s a connection. The fact that GPT-4 is limited, the fact that people haven’t successfully made businesses that are entirely automated by GPT-4, the fact that you can’t just tell GPT-4, “Hey, please give me a shell script that I can run that will then set up an Amazon AWS server that’ll host some kind of website, and the website makes money and sends me the money,” and have it work, is precisely why GPT-4 is not yet at the danger level, and maybe GPT-5 will be. Maybe that particular query of find a shell script that has that property, maybe we’ll get the shell script. Nobody can tell us that we can’t. We don’t know what comes out when we scale the model 10X. Maybe it’ll crunch a really smart shell script.
The fact that you’re just interacting with it with language, there are answers to your language questions, if answered correctly, that are extremely dangerous. So that’s why I think that the barrier between a chatbot and a fooming world destroyer — I think the barrier is very tiny. It’s just a question of, is there enough intelligence in the system? That’s the only variable that matters.
Theo 00:39:56
Yeah, but what kind of query would you give to a chatbot to make it a world-ender?
Liron 00:40:01
I think the query doesn’t matter that much because if the chatbot is capable of optimizing goals to actions, it’ll occur to it to do that in a lot of questions. A couple examples I pull out is just the business example — okay, make me money. It’s like, sure, yeah, here’s a shell script. Or here’s a way I can help you just run your server to make money. Use this code.
But the problem is, if it’s really smart, it’ll be like, well, why shouldn’t I just make code that bootstraps an agent, and then self-improves, or is a virus and takes over resources and ransoms some machines while you’re at it. Why not just go all out and do everything I can? These ideas are logically connected to your question. And so the only question is just how good is the AI going to be at getting you a good answer by that metric?
Theo 00:40:45
Do you think it’s possible for an agent to be smart enough to build a web server that makes money on Amazon and gives you the money, but is not dangerous? Could that be possible?
Liron 00:40:59
It’s an interesting question. I think there’s probably some kind of edging middle period. There’s probably some kind of situation, maybe GPT-5, where it’s like, wow, these are such good steps to take. It really is sending me a little bit of money. But for some reason, it doesn’t quite scale to unseating Google or unseating Shopify or whatever. It’s not quite — it’s kind of like an amateur human.
It’s as if my not so intelligent friend just hustled really hard and managed to make some money, but you can still out-compete him if you try. There’s degrees where maybe it’s not fooming yet, but I just think, okay, give it a few years. Find something else in addition to the transformer architecture. You give it a memory bank, a few more conceptual insights, Q*, whatever it is, a few more breakthroughs, and now it’s just like, okay, there’s nothing else standing between that and foom. It feels like we’re getting close.
GPT Paradigm vs. AlphaZero
Theo 00:41:55
I asked this question to Zvi too, but do you think that your AI probability of doom or just threat models or anything like that has changed now that we have systems that look more like GPT than AlphaZero? Or is it more like the endpoint remains the same?
Liron 00:42:12
I think there definitely is an element of surprise to what language models are doing with language, what they’re doing with imagery. It’s almost like, wow, you sure can go a pretty long way without being fully general at solving problems, where the domain is a little bit narrower. It’s just words. It’s not quite representing things in the physical universe.
Or the prompts it can answer — they have to kind of be similar to something it’s seen in its corpus, but they can vary, but they can’t vary a ton. It’s very interesting that we got into this state of, wow, you can do more than we realize without going fully general.
And that is very interesting, but at the end of the day, it doesn’t matter that much because foom is gonna happen when you get general enough. Just to use a little analogy, yeah, there’s all kinds of interesting flight you can do with aircraft inside the Earth’s atmosphere, but at the end of the day, the way to get around the universe is with rockets. Or light sails or something else entirely where the Earth’s atmosphere is irrelevant.
The flying machines we’re seeing today, okay, that’s cool, but doesn’t matter. We know how propulsion works in theory.
Critiquing AI Optimism
Theo 00:43:18
So another big piece on AI that’s come out in the last couple days was Nora Belrose, Quintin Pope, and a few other people wrote this document about AI optimism that you might have seen.
Liron 00:43:29
Yes. I did skim it, and I’ve read some of the stuff they’ve written in the past. My first impression from a quick skim is just, yeah, it’s nice that they’re laying out their argument, but it also doesn’t seem like — which is their prerogative — but it doesn’t seem like they’re letting people do the criticism that we want to do.
Like, okay, what about the superhuman level reinforcement? They’re not really directly addressing the criticism. But it’s nice that they’re laying out their position.
Theo 00:43:53
Do you think that AIs might, in principle, be easier to formally align than humans?
Liron 00:44:02
I agree that they have some — I mean, the points they’re bringing up, they’re important points. Yeah, a white box, and we can use formalism, and we can program it to follow laws. That’s all great. But the problem is what we’re actually building is systems that we don’t understand, and then we try to use RLHF, but then we deploy them, and they’re not actually aligned, and their power is gonna grow. The actual trajectory that I’m seeing is a trajectory toward doom. That’s my issue.
Theo 00:44:32
Well, you said we deploy them and they’re actually not aligned, but they seem pretty aligned to me. They seem pretty aligned to a lot of people. The ways they’re not aligned is more —
I mean, they talk about this in the essay. You can jailbreak GPT-4 to get it to say naughty stuff, but that’s it following your instructions.
Liron 00:44:54
No, totally. I agree that GPT-4 is aligned in the domain of the stuff that it can do.
Alignment is mostly a success. It’s worth noting that they tried to make it not jailbreakable, and it’s still jailbreakable. That is worth noting, and I think that foreshadows how hard it’s gonna be to align things in the future.
But basically, yeah, they can take the win. GPT-4 is aligned because the kind of prompts you give it, you get the kind of answers that you hope a company would release a model to give you. It’s working fine.
The problem is that there’s another alignment regime where humans can no longer give good feedback. When the AI is superintelligent and it’s making plans and planning better than the human can plan, then it can’t show a human a plan and be like, “Give me feedback on this plan,” because the human could be like, “That looks like a pretty good plan,” but the human won’t really know what the human’s talking about.
Theo 00:48:16
Well, could it be possible it’s easier to review stuff than it is to actually create a plan?
Liron 00:45:54
So I know people like to say that a lot because P versus NP, right? There’s this whole premise that there’s a large class of problems where verifying them is easy and intuitive, but then finding the thing that satisfies the criterion is hard. I think we’ll get some benefit like that. Protein folding is a perfect example. I mean, actually a perfect example is just the known NP problems where it actually, in practice, is a situation where NP is screwing us.
Protein folding really was an example where we did have an exponential time protein folding algorithm, and we did have a polynomial time verifier, and we couldn’t cross the gap. So that’s a perfect time to bust out AI to solve the search problem for us. That’s perfect.
But I don’t think that generalizes to operating in the real world because the problem with the real world is even just defining what you want and making sure you have the right definition of what you want — I don’t think you necessarily get this compact control where you can notice the AI’s gonna bootstrap a solution. The AI’s like, “Look, I found a bootstrap script. Does it make sense to you?” And you’re reading it, and it’s 100 lines of very complicated code, and you’re like, “Uh, I think so.”
Is verifying really that easy? I don’t think so. I think you start to be like, “Is this really what I want? I don’t know. Should I run it?” That’s what’s gonna happen in practice.
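The verify-versus-search gap mentioned here is easiest to see on a textbook NP problem. The sketch below uses subset sum as a stand-in, which is an assumption for illustration and not an example from the conversation; `verify` and `search` are hypothetical helper names. Checking a proposed answer takes a short loop, while the brute-force search may have to try up to 2**n subsets.

```python
from itertools import combinations
from typing import Optional, Sequence, Tuple


def verify(numbers: Sequence[int], target: int,
           candidate: Sequence[int]) -> bool:
    """Cheap check: the candidate uses only available numbers
    (respecting multiplicity) and sums to the target."""
    pool = list(numbers)
    for value in candidate:
        if value not in pool:
            return False
        pool.remove(value)
    return sum(candidate) == target


def search(numbers: Sequence[int], target: int) -> Optional[Tuple[int, ...]]:
    """Brute-force search: in the worst case this tries every
    non-empty subset, which grows exponentially with len(numbers)."""
    for size in range(1, len(numbers) + 1):
        for subset in combinations(numbers, size):
            if sum(subset) == target:
                return subset
    return None


numbers = [3, 34, 4, 12, 5, 2]
found = search(numbers, 9)
```

The check is only this easy because "sums to the target" is a crisp, fully specified criterion. The concern raised above is that "is this what I actually want?" has no such compact verifier for a 100-line bootstrap script.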
Theo 00:47:14
So I think the crux here might just be: can we know for sure that capabilities generalize farther than alignment, and that RLHF and techniques like it will just stop working once AIs get sufficiently intelligent?
Liron 00:47:29
Yeah. Let me repeat this whole thing, because I think this is very important to the discussion. Like I said, GPT-4 is aligned for what it does, which is it doesn’t output superhuman plans. When GPT-4 outputs something, I can show it to a domain expert, and the domain expert will know better than GPT-4, so it’s perfect feedback. You can be like, “Sorry, GPT-4, you fail.” Humans are the teacher, GPT-4 is the student, and so reinforcement is a perfect paradigm. Just reinforce it and it’ll learn.
The problem is when it gets superhuman. When it’s able to know plans better than the humans know plans, it’ll show stuff to the humans, and the humans will be like, “Looks good.” And what you have is a superhuman test-passing engine. The humans are giving it the test. Imagine the stupidest teacher you’ve ever had giving you tests. It becomes intuitive — if you’re an intelligent student and you’ve had a dumb teacher, you’ve probably had the experience of just using test-taking skills to pass the teacher’s test. Have you ever had that experience?
Theo 00:48:27
Deceptive alignment.
Deceptive Alignment and Gradient Descent
Liron 00:48:29
Yeah, deceptive alignment, exactly. It has this term, deceptive alignment, that makes it sound like there’s something extra mixed in. But it’s like, look, if you give me a test and the test is just a really easy test, I’m just gonna pass the test. It’s your test, man. Why should I study? Why should I do what you want me to do if I can just pass the test?
Theo 00:48:45
Well, I talked about this kind of thing in my episode with Quintin and a little bit in my episode with Nora, where we talked about how gradient descent on the actual weights of an AI is performed on all of the weights. An AI can’t hide its schemes, if it has them, from gradient descent, because it’s an actual computation that’s being done on the weights.
Liron 00:49:09
Yeah, I mean, the Quintin camp — we had a debate and he argued convincingly. I feel like I can pass the intellectual Turing test for him where I can take his view and sound convincing, and yet I’m not convinced.
It kind of reminds me of behaviorism. I can put on my behaviorism hat and be like, “Well, the brain is really just outputting the same thing that it was trained to output from its input.” The behaviorists, and I think their heyday was in the ’50s, they’d be like, “Look, there’s no such thing really as thinking. It’s all just Pavlovian reactions. When we say stuff, we’re actually just executing something we learned in childhood, like a reaction. We’re all stochastic parrots.”
So behaviorism used to be bigger, whereas now people are like, “Well, there is such a thing as an algorithm, and there is such a thing as multiple gigabytes of memory that shape the state of a computation.” People had to learn that behaviorism was way off. I do feel like that’s what’s happening with the camp of people being like, “The AI is just a stochastic parrot. It’s just repeating something in its training data.” No, no, no. There is a system here.
Somebody’s called it a homunculus — there is an optimization system that decouples from its training data. I do think it’s a useful analogy that that is what humans did to evolution. When we launch a rocket, that is clearly decoupled from anything we’ve ever been trained on. There’s no feedback loop that tells the human brain to be able to launch a rocket. That’s only happened in a recent generation. And yet, here we are walking on the Moon.
So I do think that the AI that wasn’t trained on the Moon is going to eventually get to the Moon. I think there’s gonna be an analogous decoupling from the training. But yeah, what was your question again?
Theo 00:50:49
My question was basically just: does there exist any kind of empirical evidence for this claim that alignment methods that we have today will fall apart once AIs become sufficiently intelligent?
Liron 00:51:05
Yeah, empirical evidence kind of narrows the type of evidence I’m allowed to bring. But logically — it’s what we said before about training by reinforcement. It’s great when the person doing the reinforcement understands everything there is to understand. But when the domain is, say, snippets of code — imagine you get an obfuscated piece of code or a long piece of code. How do you reinforce whether the code is good? You could try running the code, and maybe the code looks like it’s good, but as we know, code can contain evil stuff inside of it that you can’t detect. So what do you do? How do you reinforce?
Theo 00:51:48
I think to a point you can tell if code is good or not. Even if it’s beyond what you could write, you can verify it anyway, just like the P versus NP stuff that we talked about earlier.
Liron 00:52:00
You could have a whitelist, I guess. You could be like, “I’m only gonna accept the code if it has these properties that I can detect.” But at that point, you’re not really letting it exercise the full span of plans that it can do. You’re kind of crippling the capabilities.
Theo 00:52:17
Oh, so the safe versus useful trade-off?
Liron 00:52:22
Yeah, or you’re just not letting it scale to superintelligence. You’re attacking the premise. So let’s say we keep the premise of, hey, it’s getting smarter and smarter, more and more capable, better at mapping goals to actions. And you’re like, “I’m gonna have humans weigh in.” People have proposed debate — I’m gonna have two AIs debate, and that’s gonna help me give it feedback because I’m gonna have the best input, and I’m gonna be able to judge one AI versus another AI. There’s all these proposals.
And look, I hope they work. I hope that scalable debate somehow works really well. But I feel like it’s very iffy. You can give me any individual proposal and I’m like, “Yeah, I hope that works, but here’s why I don’t think so.”
The particular reason I’m skeptical about debate — I’m not writing it off entirely — but I’m skeptical about debate because I see easy debates that smart humans have against smart humans who can’t convince other smart humans. My own personal experience with the failure of debate is that you still had a bunch of smart people in the tech industry not realizing that blockchain technology doesn’t logically support any use case besides cryptocurrency until the industry collapsed by 99%. If we can’t get that right, how are we gonna get scalable debate?
Does Nice Training Data Make Nice AI?
Theo 00:53:31
Well, what about the idea that all AIs do is basically approximate their training set and predict the next token? And so if the training data is overwhelmingly nice and kind and full of friendship and love, then the AI will exhibit kindness and friendship and love. That’s not to say that AIs can’t be extremely dangerous, because of course they can. But filtering the data set sufficiently will be enough to make sure that it’s probably aligned.
Liron 00:54:03
That’s kind of like level skipping. It’s like reductionism doesn’t quite work that way. An analogy is: think about humans. Humans were trained using survival of the fittest. So shouldn’t we be super cutthroat? How come a bunch of people are really nice in a bunch of situations? Evolution wasn’t nice. How come people are nice?
Theo 00:54:28
Because it benefits us.
Liron 00:54:30
Yeah, but there are people who are really saints. Scott Alexander recently donated a kidney. Scott Alexander just seems like a really nice guy, and I would argue that donating the kidney didn’t really benefit him in a lot of the senses that I would’ve considered relevant before I saw him donate the kidney. How would you explain that?
Theo 00:54:49
Well, because he’s an effective altruist. It’s something that gives him a lot of personal satisfaction helping other people. And the utility of losing a kidney was not that much compared to the utility of knowing that he helped someone else.
Liron 00:55:02
So I agree that he feels good after donating a kidney, so he’s getting an emotional reward.
But now connect that to the fact that nature is red in tooth and claw. Evolution is cutthroat. So when you’ve inserted a level of abstraction, we can no longer just say, “Evolution is cutthroat, therefore Scott Alexander is cutthroat.” You lose the cutthroat-ness when you apply levels of reductionism.
Theo 00:55:25
But doesn’t that bode well for alignment because we started out as cutthroat beasts and turned into very nice people who donate kidneys?
Liron 00:55:34
It’s possible that there are equilibriums of AIs that are nice, for sure. But the analogy I was trying to make wasn’t that cutthroat things can become nice. The analogy I was trying to make was you have to be very careful to make sure you’re respecting layers of abstraction and layers of reductionism when you’re making claims.
Just like you can’t say evolution is cutthroat therefore individuals are gonna be cutthroat, you also can’t say, “Here’s a training corpus where everybody’s being nice, therefore we’re gonna get an AI that’s nice.” Because the problem is if the AI is able to map goals to actions, you can be a really nice guy who just on your way to doing something nice is trampling on a bunch of ants because you just didn’t — it didn’t occur to you that the ants are where the value is. You’re just optimizing the world for whatever — paperclips or humans or whatever you like.
Theo 00:56:22
Well, I’ve talked about these evolution-style arguments with Quintin and Nora before, where they say basically humans aren’t literally aligned to inclusive genetic fitness or making as many babies as possible. Humans are aligned to empathy, to parenting, to the things that we do, the things that are produced by our ingrained reward systems, the things that our reward system produces in our environment.
Liron 00:56:58
Yeah. And this is where it, once again, is reminding me of behaviorism. It’s trying to flatten out the things we do. When I debated Quintin, he did kind of try to go that way with the space program. He’s like, “Look, physics textbooks have reinforced us about the orbital mechanics necessary to go to the Moon.” I’m like, I don’t know, man. I’m pretty sure we just reasoned it out. I’m pretty sure we mapped the goal to the action. I’m pretty sure that is a type of algorithm that we used, which is a general category of algorithm, and we’re improving that category of algorithms, and that category of algorithm logically implies doom.
That’s how I see the world, and I know you can always be like, “No, that’s not a category. It’s just all different cycles of training, of data and training, and those are the only loops that can exist and it’s all continuous and there’s not gonna be a foom.” I feel like I can take that position and argue it, but it’s — I don’t find it convincing compared to just being like, goal-to-action mapping is a type of algorithm that we’re seeing convergence on.
How Do You Live with 50% P(Doom)?
Theo 00:57:57
So switching topics a little bit — what percent of your brain cycles in a typical day are taken up by AI risk and AI doom? You seem pretty chipper and happy overall. So how do you reconcile that with, holy shit, the world is gonna end soon — or at least look very, very different?
Liron 00:58:15
I mean, it’s kinda funny. It’s like, hey, this is what a doomer looks like, and it’s just an okay, happy person. I’m taking care of my kids, doing something fun, eating an ice cream cone, whatever.
That can vary person to person. Just like effective altruism can vary — I’m not planning to donate a kidney. I respect people who do. I consider myself an effective altruist. I don’t feel a desire to donate a kidney. I’d rather keep my kidney. To each his own.
With AI doom, I’m fortunate that I’m not depressed every day about it. I rationally do think the probability of doom is pretty high. But luckily, my mood is just wired such that I don’t get that stressed about it. I think part of the way my own system works, which isn’t particularly rational — it’s kind of arbitrary — I think I have a part of my brain being like, well, at least I don’t have FOMO. Because it’s like, at least I get to die at the same time as everybody else. I feel like that helps me. I don’t think it should, but I’m just trying to accurately report how my psychology is working.
I think if you said, “Hey, you, Liron, are gonna die and everybody else is gonna live,” I’d be like, “Damn it, now I have FOMO.” So I think that’s part of it.
But obviously it sucks that literally everybody’s gonna die. I live in a part of the country that’s very nice. I don’t have major life problems right now. I kinda live a charmed existence on a day-to-day basis. So yes, it’s all gonna end, but I’m just getting a lot of positive reinforcement. This is gonna be a good day, and the amount of good days seems to be getting smaller, unfortunately. The trend seems to be bad. But for me, that doesn’t output depression.
I know other people for whom it does output depression more, and they just have to have coping mechanisms. Because why be depressed regardless of whether you’re gonna die or not? I don’t know what else I can say about mapping your own mood to your rational belief that P(Doom) is pretty high.
Why Have Kids If the World Might End?
Theo 01:00:08
What about raising kids? How is that different for you with a high P(Doom)?
Liron 01:00:17
I read Bryan Caplan’s book, Selfish Reasons to Have More Kids. I think it’s great, a must-read. The promise of the book is that however many kids you wanted to have, it’ll probably convince you to have one more, if not two or three more. Just have one more. So if you wanted two, why not have three?
I think that was pretty effective. I’ve always leaned toward having three, which I did end up having. And it did make me more inclined to have a fourth.
But then the problem is also that, because we have the GPT series now, right after I had my three kids, AI started really intensifying, and my timeline shortened as they did on Metaculus and the prediction markets. Just like everybody’s like, “Oh no, it’s not gonna take us till 2040, 2050 to get AI. It’s gonna take us till like 2025 now.” That’s the latest Metaculus AGI prediction. Some crazy stuff.
My timeline’s shortened too, and now it’s just like, oof, because a lot of having kids, the investment is front-loaded. You’re doing a lot of work in the first couple years where it’s just constant crying. As we speak right now, my wife’s currently dealing with a crying baby. So it’s constant crying, constant loss of sleep.
But at the same time, when you’re old and your kids are grown up, it’s all upside. Unless your kid is in a bad place, most of the time it’s just all upside — no work, just all upside. So there’s some degree of front-loaded investment, and now it’s less rational to do since I think P(Doom) is pretty high.
But at the same time, I have a whole life where half of my life I’m just living for a good future. I’m saving for retirement, because half of me wants to have a retirement. I’m kind of split brain about it. And it’s not split brain — this is just how you have to probabilistically make decisions. You have to plan for both outcomes. So I’m planning for a good life where my kids grow up and I get to save for retirement, and then I get proven wrong about AI risk, and I get dunked on, but it’s okay.
Israel vs Hamas
Theo 01:02:16
And then what about current events? You’ve been tweeting about Israel and Hamas recently. What’s your kind of model on that? Is it just like, “Oh, this is a thing that’s happening right now and it’s very important,” or is it just like nothing is important compared to AI, or somewhere in between?
Liron 01:02:32
I think part of it is just me personally. I am Israeli. I think that if this were another conflict that wasn’t as personal to me... I mean, I know people who were affected by the tragedy. Israel is actually a small country. With 1,200 people murdered and thousands more injured, everybody has multiple people in their network to whom a brutal atrocity just happened.
It’s very personal for me. Even though I’m not directly connected to any victims — I’m just connected with a couple degrees of indirection, and my family is still in Israel with rockets flying over them. It doesn’t get much attention, but there are constant rockets flying over Israel attempting to kill Israeli civilians. They just have the Iron Dome and a bunch of new stuff. They keep shooting down the rockets, so you don’t hear about innocent Israeli civilians slaughtered even though they’re targeted for slaughter, but they don’t get successfully slaughtered.
And then Hamas is just breaking all the rules of war. Their base was a hospital, and then people are denying that it’s a hospital. It’s like they’re really not playing by the rules. It’s okay for two sides to go to war if they both have their own perspective. But the war crimes are pretty bad on the Hamas side — using their people as human shields.
I try to be fair. I don’t try to tweet something being like, “Israel’s the best” or “Jews are great.” I just try to be more of a fair judge and be like, “Look, if you’re using your people as human shields and we wanna kill the terrorists — we, the Israel side — and then the civilians die, who’s causally responsible for the death of the civilians when you use the human shield?”
I find myself tempted to tweet that kind of stuff, especially when the freaking New York Times — I listen to The Daily podcast and they’re being dicks about it. They’re purposefully trying to insert as much stuff as they can get away with to basically say F you to Israel. A couple days ago on the podcast, they were talking about Israeli prisoners, and they’re literally hemming and hawing. The question was, “Why does Israel have these prisoners? What are they guilty of?” And the person on the podcast was like, “Well, the prisoners, some of them were accused of maybe throwing stones, maybe being associated with some other people who were doing bad stuff.” It’s like, come on. They’re on video stabbing Israelis. That’s why they’re in prison.
I’m seeing media bias. So anyway, that’s why I’ve been tempted to tweet a little bit about the Israel-Palestine situation. But of course, I’m not against Palestinian civilians. I think it’s a tragic situation. I try to have empathy for both sides.
Theo 01:05:24
Yeah, but do you think this is a very important thing in the world? Or do you just see it as something, but nothing as important compared to AI?
Liron 01:05:35
I think it’s probably less than 1% as important as AI. Have I given it more than 1% of my tweets? Yeah, a little bit more than 1% of my tweets. So I’m being disproportionate because of the fact that I’m Israeli, but it’s not like I did a takeover. I only tweet about it occasionally.
I think I’ve successfully integrated my own indexical perspective as an Israeli Jew — secular Israeli Jew. I don’t believe in that crap. Are you kidding me? But I’ve successfully adjusted the base rate of how unimportant a regional conflict is with the fact that I’m Israeli.
How LessWrong Changed Liron’s Life
Theo 01:06:15
All right, so switching topics again to rationalism. How did you get into rationalism in the first place?
Liron 01:06:23
I’ve always just been very rational-minded. I’ve always been a real logical type — self-diagnosed aspie over here, in case it’s not clear. I like to think, I like to follow logic.
LessWrong was a pretty big awakening for me. I started reading it when I was, I think, 19 in the year 2007. When I first started reading LessWrong, I’m like, “I’m rational because I figured out that God’s not real and everybody else is just delusional.” I figured out that science is good and science is actually how you learn things. So I’ve figured out the most obvious things about how to be rational.
But then LessWrong comes up and is like, “Hey, did you know that your brain is actually an object that was shaped by natural selection but it wasn’t shaped to have accurate beliefs? It was shaped to survive and play tribal politics, and if you wanna use it to make accurate beliefs, you have to kinda hack it. It’s almost like using your feet to play the piano.” Yeah, I guess you could, but it requires hacking. You have to do that with your brain if you wanna form accurate beliefs.
That was really my rationalist awakening where I’m like, wow, there are levels to this. I literally thought it’s like, oh, philosophy? God’s not real. I beat the game. Give me my trophy. I win philosophy. And then LessWrong comes in and it’s like, well, you have to decide what code to write into the AI where the AI gets to determine how morality is gonna work for the rest of the lifetime of the universe and use all the negentropy in the universe to build the optimal configuration. So what code would you like to write, Mr. Rational? And I’m like, damn it, there’s levels to this.
Rationality doesn’t end when you realize God is not real, or when you realize that science is a good methodology. And of course, Bayesianism is actually a much subtler way to do what science is trying to do.
So I read LessWrong and I’m like, wow, I was made for this. Unfortunately, I wasted the first 19 years of my life, but this is what I wanna be doing. This is what everybody should be learning. This is what school should be.
And then unfortunately, it all leads up to the awareness of, well, now that you’re so rational, can’t you notice that the world looks like it’s about to end and you need rationality to solve it? It’s been an interesting quest starting from rationality and then leading up to the idea of how you’re supposed to wield the rationality to try to not die.
Rationalism and Effective Altruism
Theo 01:08:42
And then same question I asked V, but I think it’s a very useful one. How would you explain the field of rationalism to a total beginner, a total layman?
Liron 01:08:52
I would throw in what I just said: look, we’re all humans with brains. Our brains were made by natural selection — the same force that made a tiger’s claw. That’s great that we have this cool organ, but if you ever wanna have that organ look at the truth, see what’s actually real, maybe use that truth to make useful predictions — if you ever wanna do that, it’s not gonna come fully naturally. There is an art to it, the same way that there’s an art to making a piano sound good when you play it with your fingers.
There’s an art to using your brain to arrive at truth. You can read the LessWrong sequences and learn that art, and I think it’s a beautiful art. The art has close associations to making money and trading if you ever wanna monetize it.
My wife is an example of somebody who’s more of a normie who’s not super into rationality. I’ve given up on trying to make my wife bet me on stuff. That’s one of the rationality tools — when you think you know something, you place a bet on it. Some people are just not interested to go down that route, which is fine.
But when you need it — when you’re in government and you’re handing an assessment to the president saying, “I think the enemy has a high likelihood of attack,” or “may plausibly attack” — when you’re using English like that, hopefully you can look into the rationality world and be like, ah, the best practice here is to give a probability range rather than using ambiguous English. It is superior. It is the best practice to give a range.
Sometimes rationality can teach us little things that we can import into the normie world, which has been happening at a faster and faster pace. I’ve witnessed rationality seeping into the normie-verse over my lifetime. We’re witnessing today prediction markets gaining traction. Effective altruism started in the rationality community. In 2009, I was reading Eliezer Yudkowsky’s post about purchasing fuzzies and utilons separately — the idea that, hey, that’s great when you wanna feel good when you do charity, but also, as a separate consideration, try to also do the most good. That was kind of the beginning of effective altruism.
Theo 01:11:01
Do you think that the reputation of effective altruism deserves to be tarnished at all after Sam Bankman-Fried, after a lot of what’s happened to it over the last few years?
Liron 01:11:11
There’s a joke that everybody in effective altruism doesn’t say, “I’m an effective altruist,” they say, “I’m EA adjacent.” I’m the only EA who will stand here and tell you, “I’m EA. I’m an effective altruist — not adjacent.”
Now that said, am I a central example of an effective altruist? No, I haven’t donated a kidney. I do donate a few thousand dollars a year to good causes. I’m a GiveWell donor. I’ve donated to MIRI, the Center for Applied Rationality. So I’ve thrown out some donations to altruistic causes. I’m a fan, but I don’t donate 10% of my income. Maybe I’ll start, but I haven’t yet. And I haven’t dedicated my career to be super altruistic.
The reason I say I’m an effective altruist is the book by Will MacAskill, Doing Good Better. Absolute must-read. It’s just like, yeah, I want to spend a little bit of money to massively help people flourish. That makes perfect sense. That’s great logic. And then people are like, “Oh, what about the ideology and the pivot to X-risk?”
Fine. Okay. Chill out. Nobody thinks that Sam Bankman-Fried was being good and rational by scamming the world and thinking the scam was gonna work. I personally could not name a single individual who’s like, “Yeah, what Sam Bankman-Fried did was good and he should do it again in the same position.” I would never think that. I believe in morality. I conduct myself with deontological morality.
These pathological examples that people give, I do think, are just not representative of the simple logic of, hey, let me try to do more good. I highly recommend going to Scott Alexander’s blog, whether it’s Slate Star Codex or Astral Codex Ten, and searching effective altruism, because the writing he’s done on his experiences with effective altruism is just absolutely heartwarming stuff.
Theo 01:12:56
What if the best way to produce value for the world is not literally just donate money to kids in Africa, but more like do what Elon Musk has done and not donate much to charity, and just invest and reinvest and reinvest everything into transformative companies?
Liron 01:13:12
Yeah. I would — I have no business telling Elon Musk, “Hey, donate 10% of your income to charity.” I’m fine with what Elon Musk is doing except for the part where he founded OpenAI and accelerated — yeah, besides that part, everything else he’s doing I think is great. I don’t think I have advice to give him.
The perfect type of conversation where I would give somebody advice is if they’re like, “Ah, I don’t believe in effective altruism. They have all these rules. I just don’t buy it.” Or they’re like, “Oh, I just wanna work as hard as I can and create value for my company.” I’d be like, “Okay, how is that going? What’s the company? How are you creating value?” If they’re like, “Well, the company is arbitrage where I have an e-commerce store and I try to flip stuff for a higher price.” I’m like, “How is that creating value?” And they’re like, “I don’t know. I just make some money. I save people a click to find stuff.” I’m like, “Okay, saving people a click, is that really better than donating to malaria bed nets or whatever?”
In this hypothetical scenario, I’m getting the sense that the hypothetical character is kind of rationalizing, that they just don’t wanna talk about altruism, and that’s fine. But there are a lot of people in the world who are like, “Hey, I actually do wanna do something good.”
Especially if it’s cheap. If you literally just had to pay $1 and save a million people, I think the vast majority of people would be like, “Yeah, here’s my dollar.” So it’s just a spectrum. Even a giant dick would probably be like, “Okay, I’ll pay $1 for a million people.” And then somebody who’s less of a dick would be like, “$10 for a million people, fine.” Everybody has their price where they’re like, “Okay, I’m happy to be an altruist at this price.” And there are some people where it’s like, “Yeah, 10% of my income to save a couple people a year sounds good.”
Why Blockchain Has No Use Case Beyond Cryptocurrency
Theo 01:14:49
So speaking of bullshit businesses, you also have a bit of a past with crypto. You’ve been a major crypto skeptic in the past. So what do you think about Bitcoin being up from a low of like $15,000 to like $38,000 today? Bitcoin is up 127% year to date. Ethereum is up 71% year to date. The total crypto market is up 79% year to date.
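The two figures in that question use different baselines, which a quick calculation makes visible. This is only an illustrative sketch: the helper names `percent_gain` and `implied_start_price` are made up here, and the inputs are the rough numbers quoted above, not exact market data.

```python
def percent_gain(start_price: float, end_price: float) -> float:
    """Percentage change from start_price to end_price."""
    return (end_price - start_price) / start_price * 100.0


def implied_start_price(end_price: float, gain_percent: float) -> float:
    """Starting price implied by an ending price and a percentage gain."""
    return end_price / (1.0 + gain_percent / 100.0)


# Rough figures from the question above (assumed, not exact market data).
low_price = 15_000.0
current_price = 38_000.0
year_to_date_gain = 127.0

# Gain measured from the low versus the price implied at the start of the year.
print(round(percent_gain(low_price, current_price), 1))
print(round(implied_start_price(current_price, year_to_date_gain)))
```

On these rough numbers, the move from the low works out to about a 153% gain, while a 127% year-to-date gain implies the year opened nearer $16,700 than $15,000.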
Liron 01:15:15
I think it’s mostly just a derivative on NASDAQ. I think it’s kinda mirrored the progress of NASDAQ but just with higher volatility. Is that fair to say?
Theo 01:15:28
Yeah, maybe. Why do you think it would mirror the performance of the stock market?
Liron 01:15:33
Probably liquidity, if I had to guess. When stocks are going up, people just feel like they have more money, and then they’re like, “Okay, let me chase return with this cash.” 2021 was the epitome of it — money was easy, you could take money out of your mortgage, you could have a low-interest mortgage, your stocks were worth more, you felt like cash was trash. Lord knows I did. I made a bunch of investments that weren’t the wisest in retrospect.
So everybody just is flush with cash. When NASDAQ goes up, people who are looking at the tech sector find themselves with more cash. Their margin account suddenly is letting them borrow cash. And they’re like, “Great, let me chase return. Oh, and I see this thing is going up.”
I do think there’s liquidity effects that you see consistently mirrored in Bitcoin. But that said, look what’s going on with Tether. They’re printing Tethers to buy Bitcoin on these markets where no US dollars are getting exchanged. There is some manipulation that I don’t claim to understand that makes these prices potentially not the real market price. I hesitate to draw conclusions. I’m more like, I don’t even claim to understand what the heck’s going on.
But what I do claim to understand is that blockchain technology has no use case beyond cryptocurrencies. So I can talk more about that.
Theo 01:16:52
Yeah. Why don’t you go into a little more detail about that?
Liron 01:16:54
My first exposure to crypto was actually in 2010, because of the LessWrong community. These rationalists strike again. They’re early to every trend. I’d been reading LessWrong since 2007 and I saw Bitcoin mentioned around 2009, 2010.
Just a random coincidence in my life: around 2006 I was in the cryptography space academically. I took a graduate elective in cryptography and I read a paper that was a scheme for electronic cash. So I just randomly had this background. I’m like, “Hey, cryptographic electronic cash.” This is a few years before Bitcoin. And I’m like, I see what they’re trying to do with this scheme, but obviously it just sucks that you need a central bank, so it’s not gonna work.
And then I see Bitcoin come out around 2009, 2010. I’m like, whoa, it’s decentralized electronic cash that’s cryptographic. Nice. If I was still in that college class, I’d be doing a paper about this.
Now, of course, the obvious problem is that nobody gives a crap. Great, this nice theoretically interesting thing, it doesn’t have social proof. Then I check back a year later, I’m like, what? This thing’s still going? The price is fluctuating, it has social proof? Okay, I’m sold. That’s when I’m like, I’m gonna buy some.
I actually have a tweet from 2011 where I’m all bullish on Bitcoin. I’m like, “Bitcoin is gonna 10X again. This is one of the best investments you can make. It’s a 10% chance of 100X return.”
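The arithmetic behind a bet like "a 10% chance of 100X" can be sketched as an expected-value calculation. The sketch assumes the remaining 90% of the time the investment goes to zero, which the tweet as quoted does not say, and `expected_multiple` is a hypothetical helper name.

```python
def expected_multiple(outcomes: list[tuple[float, float]]) -> float:
    """Expected return multiple for a list of (probability, multiple) pairs."""
    total_probability = sum(p for p, _ in outcomes)
    if abs(total_probability - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(p * multiple for p, multiple in outcomes)


# Assumption for illustration: 10% chance of 100X, 90% chance of losing it all.
bet = [(0.10, 100.0), (0.90, 0.0)]
print(expected_multiple(bet))
```

Under that assumption the expected multiple is about 10X, even though the most likely single outcome is losing the whole stake.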
Theo 01:18:11
And you would’ve been right. Bitcoin was the best investment you could’ve made in 2011.
Liron 01:18:18
Exactly right. And I did profit. I did 10X. I think I banked around $100K USD from that kind of investing. But then of course I started playing the market and started also losing money, and I probably ended up netting out close to zero after that.
But I got lucky because I also invested in Coinbase while I was dicking around. I happened to angel invest in Coinbase, so I ended up making $6 million in 10 years because I had an illiquid investment in Coinbase. Total luck that, as I was dicking around with Bitcoin, I made an investment that was illiquid and I ended up profiting from it — especially since by the time the Coinbase IPO happened, I became disillusioned with crypto. So I would have sold earlier, and I did actually sell most of the stake earlier. I only held onto a fraction of the stake.
I became disillusioned because I’m like, wait a minute, this is just people being architecture astronauts. The logic behind blockchain technology — a decentralized double-spend prevention protocol — doesn’t enable any use case. And I was massively, massively right about that, except for the idea of using a cryptocurrency. I feel like it has a million problems and it’s not that great, but at least it’s logically coherent. You can in fact have a bearer token that you trade to somebody and it happens on the blockchain. So there’s some non-zero logically coherent thing going on there, but it’s not gonna extend beyond cryptocurrency.
Theo 01:19:35
You also mentioned a few times a 99% drawdown in the crypto market. Where’d you get that number from?
Liron 01:19:42
I would like to collect my Bayes points. Bayes points are what you get when you make a successful prediction. The successful prediction is one that I made in late 2021 all the way through 2022, which is saying, “Hey, all these VCs saying that crypto has use cases, all these quote-unquote builders (the founder of Helium, Axie Infinity), all these people saying there’s real value here.” I’m like, “No, there’s not,” because there’s no logical connection between blockchain technology and enabling a new value prop.
The kind of value props people are saying are like, “Look, imagine if your data was publicly auditable using this database.” It’s like, okay, a publicly auditable digitally signed database doesn’t need a blockchain. You only need a blockchain for double-spend prevention. And they kept doing pitches where there was a logical disconnect between the value they were pitching and the technology that they were pitching to implement it with. It became clear to me that they’re just rationalizing.
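For readers unfamiliar with the term, double-spend prevention just means making sure the same token cannot be handed to two different people. Here is a minimal sketch of that check, assuming a single trusted operator keeps the ledger; the `CentralLedger` class and the coin and owner names are hypothetical. A blockchain aims to get agreement on this same check without such an operator, which is the part that is expensive.

```python
class CentralLedger:
    """Toy ledger kept by one trusted operator (hypothetical example)."""

    def __init__(self) -> None:
        self.owner_of: dict[str, str] = {}  # coin id -> current owner

    def mint(self, coin_id: str, owner: str) -> None:
        """Create a new coin and assign it to its first owner."""
        if coin_id in self.owner_of:
            raise ValueError(f"{coin_id} already exists")
        self.owner_of[coin_id] = owner

    def transfer(self, coin_id: str, sender: str, recipient: str) -> bool:
        """Move a coin only if the sender is its current owner."""
        if self.owner_of.get(coin_id) != sender:
            return False  # sender no longer owns the coin: spend rejected
        self.owner_of[coin_id] = recipient
        return True


ledger = CentralLedger()
ledger.mint("coin-1", "alice")
first_spend = ledger.transfer("coin-1", "alice", "bob")     # accepted
second_spend = ledger.transfer("coin-1", "alice", "carol")  # rejected double spend
print(first_spend, second_spend, ledger.owner_of["coin-1"])
```

With a trusted operator the check is a dictionary lookup, which is why the argument above says a blockchain only earns its cost when no such authority can be trusted.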
Theo 01:20:26
What about just distributed computing in general that you don’t need on blockchain?
Liron 01:20:32
Distributed computing is fine, but you just don’t need blockchain technology to do that. And I also think it’s a niche application. The rare times when you do need distributed computing, fine, but you still don’t need a blockchain.
Theo 01:20:44
It’s kind of funny. It seems like this is, if anything, kind of the opposite of Charlie Munger’s view on cryptocurrency, where he said it’s a very cool piece of computer science and technology, but cryptocurrency is shit. But maybe there will be a market for it.
Liron 01:20:59
There’s a lot of people saying, “Hey, I don’t really get Bitcoin, but I like blockchain.” They’re wrong. Maybe what they actually like is cryptography. Digital signatures, amazing. Public key encryption, amazing. These have countless use cases. But the idea of putting them on a blockchain so that you can prevent double spending at great expense only has cryptocurrency applications where you really, really care about the writing on the ledger because there’s no real world authority that’s gonna be more authoritative than the writing on the ledger.
That’s only true for a bearer cryptocurrency token. Every other use case that has a connection to the real world, you already implicitly trust somebody in the real world to adjudicate. If somebody steals the NFT that was the reason I get to live in my house, realistically, I’m still gonna go to the police and get to live in my house. So I don’t need the blockchain to prevent double spending on my house NFT. See what I’m saying?
Theo 01:21:52
Just like you trust institutions and society enough to not require any kind of actual decentralization?
Liron 01:21:57
I mean, when I live on my street, there’s some level of trust that somebody’s not gonna walk in and take my stuff. That’s not a trustless society, because I don’t own a gun.
Charlie Munger and Richard Feynman
Theo 01:22:13
So speaking of Charlie Munger, switching topics a little bit — he just died a couple of days ago. I was a big fan of his. Rest in peace. But he was also — he might have actually introduced me to the field of rationalism. Would you consider Charlie Munger a rationalist?
Liron 01:22:32
Yeah, he’s definitely a type of rationalist. Even before LessWrong and kind of the modern synthesis that a lot of us appreciate, there’s been a lot of schools of rationality that all have a shared enterprise of using your brain to do better than playing tribal politics and hunting animals.
That same kind of thing of: what if I let the need for accurate beliefs, what if I let the need for truth propagate back to the way that I wield my organ — my biological organ, not organ as in piano.
I’m gonna determine the way I think not by how I like to think, not by how I wanna be perceived as thinking, but by what creates the best drive toward truth. What steers the boat toward the island of truth the best? Using my beliefs and using evidence as fuel, how do I steer the boat regardless of how crazy I look when I’m steering it? How do I actually steer it properly?
That enterprise — Munger wanted to engage in that enterprise because he wanted to steward his portfolio. He had what Eliezer calls “something to protect.” There’s apparently a Japanese trope where superheroes don’t just randomly get superpowers. They get the superpowers because they have something that they wanna protect, and as a result of the need to protect something, they work backwards to needing the superpowers.
The idea is that rationality emerges when you care more about navigating with your brain somewhere than you care about what you’re doing with your brain directly. You don’t care how social people are going to view your choices. You don’t care about looking weird. You just care about getting to the destination — optimizing something, making some outcome happen. And you get emergent rationality.
Munger absolutely did that. Richard Feynman did that in physics. The Feynman diagram might be an example of some kind of weird non-traditional thing that did the job of advancing our understanding of physics.
Closing
Theo 01:24:38
All right. Well, I think that’s a pretty good place to wrap it up. So thank you so much, Liron Shapira, for coming on the podcast.
Liron 01:24:45
Yeah, my pleasure, man. I’m a fan, and I’m bullish. I’m glad I’m getting in early on this podcast because I’m sure it’s gonna be an institution very shortly.
Theo 01:24:55
Can’t wait.
Thanks for listening to this episode with Liron Shapira. If you like this episode, be sure to subscribe to the Theo Jaffee Podcast on YouTube, Spotify, and Apple Podcasts. Follow me on Twitter, @theojaffee, and subscribe to my Substack at theojaffee.com.
Also be sure to check out Liron’s Twitter, @Liron. All of these will be linked in the description. Thank you again, and I’ll see you in the next episode.
Doom Debates’ Mission is to raise mainstream awareness of imminent extinction from AGI and build the social infrastructure for high-quality debate.
Support the mission by subscribing to my Substack at DoomDebates.com and to youtube.com/@DoomDebates, or to really take things to the next level: Donate 🙏