He Leads a Top AI Research Program, But He’d Hit the PAUSE Button Today! Kevin Zhu, Algoverse Founder

Kevin Zhu has personally authored papers at top AI conferences like NeurIPS and ICLR, and he's blunt about where things stand: "The safety research is way behind."

Liron Shapira

Jun 02, 2026

Kevin Zhu walked away from a lucrative quant career to build Algoverse, one of the most productive mentorship programs for aspiring AI researchers. But you might not realize how high Kevin’s P(Doom) is until you take a ride with him on the Doom Train 🚂

Timestamps

00:00:00 — Cold Open

00:00:58 — Introducing Kevin Zhu

00:02:10 — From Citadel Quant to AI Researcher

00:09:14 — The Story of Founding Algoverse

00:12:53 — Discovering AI Safety: LessWrong & ARENA

00:17:22 — Emergent Misalignment Research

00:22:37 — Yudkowsky, MIRI & “Intellidynamics”

00:26:50 — What’s Your P(Doom)?™

00:29:37 — Kevin’s Timeline to AGI + AI 2027

00:30:42 — Would You Slow Down AI?

00:37:44 — Coming Out of the P(Doom) Closet

00:45:49 — Should We Shame AI Company Workers?

00:52:27 — OpenAI’s Superalignment Team Collapse

00:55:01 — Riding the Doom Train™

00:55:46 — First Stop: Instrumental Convergence

01:04:58 — Does Kevin Agree with the Orthogonality Thesis?

01:07:08 — “It’s Just Math.” Just Unplug It.

01:08:28 — “We Have a Safe Development Process”

01:11:49 — Group Dynamics & Laws Will Save Us

01:13:44 — Superintelligence Will Spare Us

01:14:25 — Is P(Doom) Just Bad Epistemology?

01:15:53 — China Will Race No Matter What

01:17:45 — Maybe Human Extinction Is Good?

01:20:36 — Wrap-Up

Transcript

Cold Open

Liron Shapira 00:00:00
Kevin Zhu is an AI researcher, AI safety advocate. He has personally authored papers at top AI conferences, including NeurIPS, ICLR, and ICML.

Kevin Zhu 00:00:09
These models, as they get better, are self-aware, might want self-preservation. I think it was seventy percent of the time they actually did blackmail, if they could, in a simulated environment.

Liron 00:00:18
Wow.

Kevin 00:00:19
I mean, yeah, it seems pretty scary.

Liron 00:00:20
You’ve been pretty clear that you don’t think the current lines of research, including those at the AI companies, are pointing toward finding a scalable solution for alignment, correct?

Kevin 00:00:28
Well, let’s clarify a little bit. Pointing towards, sure. I mean, we’re definitely just not there yet, though. I think everyone agrees that we’re not there yet. It’s not scalable, and it’s not guaranteed to work once the intelligence gets better. But I don’t think it would be true to say that all these AI safety researchers are all just going off in the wrong direction.

It seems like you are more scared about the classical Yudkowskian arguments than I am.

Liron 00:00:48
Okay, here we go. I hear somebody wants to ride the doom train.

Introducing Kevin Zhu

Liron 00:01:01
Welcome to Doom Debates.

My guest today is Kevin Zhu. Kevin is an AI researcher, AI safety advocate, and the founder of the student research program Algoverse. Algoverse joins college students with industry pros to produce novel AI research. He has personally authored papers at top AI conferences, including NeurIPS, ICLR, and ICML.

He holds a bachelor’s degree in computer science from UC Berkeley. He was previously a quant at Citadel, and he’s also on Instagram. He started posting there just six weeks ago about AI and career advice, and he’s already racked up nearly twenty thousand followers, and he’s done all this at the young age of twenty-six. Kevin Zhu, welcome to Doom Debates.

Kevin 00:01:44
Yeah, thanks. Happy to be here.

Liron 00:01:46
All right, so first question relevant to the show. How do you rack up twenty thousand followers in six weeks?

Kevin 00:01:51
Yeah, I just started posting. Honestly, I don’t think there’s any secret here. I think quant is just very popular on Instagram, so I just started talking about my experience there, and then people just started watching it.

Liron 00:02:03
Is it popular because people know you can make a lot of money doing it?

Kevin 00:02:06
Pretty much.

Liron 00:02:07
And for the layman out there, quant means what exactly?

From Citadel Quant to AI Researcher

Kevin 00:02:12
So quant stands for quantitative finance. It’s a career where you use technical skills like AI and other techniques to try to predict things about the stock market. There’s different roles within that, so there’s quantitative developers who write software and build fast performing systems.

There’s quant trading, who actually trade stocks, and quant researchers who develop strategies and also apply other techniques to try to predict things about the stock market. So it gets pretty interesting. Some quant firms are using all sorts of alternative data, stuff from even satellites, taking a look at the number of cars in the parking lot, things like that.

Yeah, it’s been a pretty lucrative field, and I think that’s why people have been drawn to it.

Liron 00:02:58
Is it fair to say you’re trying to use your brain to incorporate any evidence you can to know the price of stuff better than the next guy, so then you can basically trade it and make profit on the trades?

Kevin 00:03:09
That’s pretty accurate, yeah.

Liron 00:03:11
So yeah, I mentioned a bunch of stuff about your background, and you are pretty young in terms of how much stuff is going on here on this resume. How would you summarize the arc of your career and your intellectual focus?

Kevin 00:03:23
Yeah. So I did computer science at Berkeley since I was always interested in math and things like that, but computer science had more directly applicable impact.

And then once I went to Berkeley, I explored different options, so I looked into startups and software engineering, quant finance, and yeah, I decided to just explore a bit, so that’s why I first started out doing quant. I also did some software engineering.

But yeah, I didn’t really enjoy working full-time. I always wanted to create my own thing, so that’s why I decided to create Algoverse. There’s more to be said about why specifically Algoverse, but I guess that’s sort of how I got there.

Liron 00:04:05
Okay, great, and I didn’t even mention all the details of your background because right out of Berkeley, I guess you pretty quickly rotated through a number of different jobs and industries, right? Worked for Citadel as a quant, and you worked at Palantir for, I guess, less than a year, and then you pretty quickly shifted to founding that nonprofit backed by OpenPhil, which is now Coefficient Giving. That’s Algoverse, right?

Kevin 00:04:26
That’s one of the programs within Algoverse, yeah. So the Algoverse AI Safety program, we were generously funded by some philanthropic organizations to run the safety program for free. So yeah, in that division, we do a lot of AI safety research.

Liron 00:04:40
Nice. Okay, and just to finish off the background here, so you’ve been a lecturer at UC Berkeley, right? So they brought you back to lecture on what?

Kevin 00:04:48
So back when I was at Berkeley, there was a system called DeCals where you can create your own class. And so I actually created one on how to do LeetCode, basically. LeetCode is a platform for doing algorithms problems. Back then, I was the head TA for the algorithms class, so I just wanted to do more, and I started hosting that.

So I wasn’t an official lecturer, although they did invite me for that as well, for teaching the discrete math class, but I decided to pursue Citadel instead.

Liron 00:05:18
And worth noting, you’re a former Diamond One League of Legends player. Not my area of expertise, but I’m told that’s the top two percent in the world, and an avid poker player. Okay. Have I missed anything?

Kevin 00:05:30
No, I don’t think so.

Liron 00:05:31
Okay. All right, sounds good. What is your productivity secret?

Kevin 00:05:36
I mean, I don’t necessarily think I’m the most productive person in the world, but honestly, I would say for me, I just kind of do whatever I’m interested in, and that allows me to do really high volume without burning out. I’ve noticed in the past that if I do something I don’t like doing, then I’ll burn out really quickly. But if it’s something that I wanna do, I just never burn out.

So for me personally, I just don’t really ever take weekends intentionally. If there’s something fun going on or I wanna travel somewhere, then I’d definitely still be open to doing that. But other than that, I just don’t really ever take breaks. I just kind of work, because I find it interesting. So yeah, I guess maybe that’s the secret.

Liron 00:06:18
I hear you, and I can identify with that mindset. However, me personally, I have a family now, and if I tell my wife that on a weekend, that’s not gonna fly.

Kevin 00:06:29
Okay, that’s fair. Yeah, I mean, I don’t have a wife and kids or anything yet. Yeah, I’m pretty selective with my time, so like cooking food, for just living life, I typically will just order DoorDash or go to Chipotle or something. It may be unhealthy. I mean, obviously it tastes pretty good, and it saves me a lot of time. So I guess, yeah, maybe small things like that.

Liron 00:06:54
Yeah, no hate on Chipotle. I think you could survive on Chipotle. You could put together a balanced diet.

Kevin 00:07:00
Yeah. I don’t think it’s that unhealthy. It’s pretty solid.

Liron 00:07:04
All right, so your first job out of college was you were a quant for Citadel, briefly, right? I think less than a year. Why did you kinda change your mind and decide that wasn’t the career for you?

Kevin 00:07:15
Yeah. So to be honest, I never really had an interest in staying in quant or maybe even any career for a long period of time. I was pretty drawn to it back when I was in college because it was high-paying and high status or whatever.

But yeah, actually working in quant, I realized that I didn’t want to spend all my time doing this. So yeah, I wanted to spend my time doing something more that I actually cared about. I was probably interested in starting my own thing and, this was 2023, so ChatGPT era, and AI seemed like the natural place to go.

Liron 00:07:50
How much money did you leave on the table jumping out of Citadel?

Kevin 00:07:52
Well, Citadel pays a good amount, but after Citadel, I first went to Palantir. Palantir, at that time, the stock was doing... it jumped a lot from then to now. So if I were to stay at Palantir longer, my equity would, at this point it’s been maybe four years, so probably would’ve been maybe four to five million, so probably left a pretty good amount there. Yeah.

Liron 00:08:17
Yeah, not bad, right? Because you would’ve had Palantir stock, but Palantir was already a public company, right? So I always feel like people over-index on the fact that they happen to work— they would’ve been essentially forced to buy the stock, but anybody could’ve bought the stock.

Kevin 00:08:29
Oh, I meant the compensation package comes with RSUs, and then at that time, it was like 250K over four years, but that jumped to like 20X.

Liron 00:08:39
Yeah, no, I hear you, but isn’t that the same thing as taking a job that pays more cash than Palantir would’ve paid and just putting your portfolio into Palantir? Isn’t that equivalent?

Kevin 00:08:49
Oh, okay. Yeah, I mean, that sounds reasonable.

Liron 00:08:54
Right. I feel like people really look at their history, and they’re like, “Oh, I used to have so many shares of this, and I didn’t hold, so I’m uniquely disadvantaged.” Whereas from my perspective, it’s like, well, no, I’m in the same position. I could have bought Palantir, and I didn’t. What’s the difference?

Kevin 00:09:09
Yeah, I think that’s fair.
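
The back-of-the-envelope math in this exchange can be sketched roughly as follows. The inputs are the approximate figures Kevin quotes (a grant of about $250K and a share price up roughly 20x). The function name and the single flat multiplier are illustrative assumptions; a real RSU grant vests over time at varying prices, so this is only a rough sketch.

```python
def value_after_multiple(initial_value: float, price_multiple: float) -> float:
    """Value of a stock position after the share price changes by a multiple."""
    return initial_value * price_multiple


# Rough figures quoted in the conversation (assumptions, not exact numbers).
rsu_grant = 250_000   # approximate RSU grant value at the time of the offer
price_multiple = 20   # approximate rise in the share price since then

rsu_value = value_after_multiple(rsu_grant, price_multiple)

# Liron's point: the same dollars put into the same stock on the open market
# grow by the same multiple, whether they arrived as RSUs or as cash salary.
cash_invested_value = value_after_multiple(250_000, price_multiple)

print(f"RSU grant today:       ${rsu_value:,.0f}")
print(f"Cash invested instead: ${cash_invested_value:,.0f}")
```

Under these assumptions both positions come to about $5 million, in line with the "four to five million" Kevin estimates.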

Liron 00:09:11
Moving on, right? This is getting closer to your real passion right now. What’s the story of starting Algoverse?

The Story of Founding Algoverse

Kevin 00:09:16
Yeah. So this was late 2023. Me and one of my friends from college, we both TA’d the discrete math class together, and at the time he was a PhD student at UCSD doing LLM research.

And I wanted to get back into something teaching related because I taught a lot back in college. I was the head TA for the algorithms class and also taught some other stuff too. And I wanted to get into AI, and this felt like just the perfect combination of the two.

Yeah, the other thing was that the existing solutions out there were not doing this at all. It’s really competitive to get into these top AI labs, even as a college student. At Berkeley, trying to get into BAIR, Berkeley AI Research, is extremely hard. Imagine if you’re a high school student, you really don’t have any opportunity to get in.

Maybe a very small set of people can get in through connections or through cold emailing, but generally, it’s not really possible. There were some third-party programs that did this, but not really. They never submitted to the real top AI conferences, and they never had any novelty in their papers. They just took a Kaggle competition, downloaded it, built a CNN, called it a day.

There was no real program doing this, but we felt like it was possible, especially because everything was— the timing was perfect, right? LLM research is very empirical, lots of low-hanging fruit. Coding tools are getting better. And these kids nowadays, there’s skill creep in the real world. These students have been studying coding since elementary school, so they’re really competent. They just don’t have the opportunity.

So that was the plan. We were gonna go ahead and help teach AI research and help students actually be able to do real meaningful work.

Liron 00:10:54
The kids you were thinking of, they were your fellow students at Cal?

Kevin 00:10:58
So the students that we work with nowadays are a mixture of high school, college, and industry grads. So it could be anyone who’s going into AI research. We started out with just high school, because it’s especially hard for high school students to break in.

Liron 00:11:10
Nice. I mean, it’s kind of interesting that this late in the game— the college system is obviously pretty old, right? And even the tech industry is pretty old, and the idea that you can be a smart kid in high school who likes tech and you just don’t have a path to a research job, it’s pretty ridiculous.

But then I think about my own experience. I was a smart high school student. I was good at computers, and I even went to Cal, and even I was like, “Where do I apply for a job again?” Even I was pretty unclear on that. So it’s like these obvious things still don’t have solutions.

Kevin 00:11:37
Yeah, no. I think in general, talent markets are extremely inefficient. The fact that cold email is so effective means that something’s gone wrong.

Liron 00:11:44
Is it a nonprofit? Is it roughly self-funded out of revenues?

Kevin 00:11:50
So the safety program, that one is nonprofit. We’re funded by philanthropic organizations. The main program is for-profit.

Liron 00:11:58
Yeah, I mean, that seems reasonable. If you’re helping people— I mean, recruiters are very for-profit, right? And you’re kind of performing a recruiting function.

Kevin 00:12:05
Yeah. I think the way that we think about it is that we’re sort of providing a mentorship service, right? We’re helping students learn AI, learn AI research. So yeah, it’s pretty similar to other mentorship programs in that sense.

Liron 00:12:18
Nice. Is this kinda your main startup, doing Algoverse, the mentorship side?

Kevin 00:12:24
Yeah, this is what I’ve been doing full-time, more like double time, honestly, for the past few years.

Liron 00:12:31
Nice. And then how would you explain— I mean, you’re passionate about solving this problem of giving high schoolers a track to AI research, but then you’re also passionate about the subset, which is AI research as it pertains to increasing AI safety, which I’m all for. So how did you first get— what was the order of events? First you’re passionate about research and then safety. What’s the relationship there in your mind?

Discovering AI Safety: LessWrong & ARENA

Kevin 00:12:55
Yeah. I would say I first started hearing about the safety arguments back in 2021, where I discovered LessWrong. LessWrong is this site, I guess your viewers probably have heard of it. People were talking about different AI timelines, AI safety risks.

So it was maybe a little bit after I got started with AI research. But when I really started taking AI safety more seriously was when I attended this program called ARENA. I don’t know if you’ve heard of ARENA, but it’s this training program for AI safety researchers.

Liron 00:13:24
Yeah, no, I actually hadn’t heard about it, but I should.

Kevin 00:13:25
Yeah. It’s a really nice program. It’s also run by another former quant. His name is Callum. He was at Jane Street, and then he quit to do AI safety. He’s now at DeepMind.

But yeah, basically, he created this program, and it’s really nice because you get to sit with a bunch of other AI safety researchers at MIRI, Anthropic, other organizations. And after I saw how seriously they took it, then I started taking it way more seriously.

I remember I was there when o1 came out, and this was, I guess, October 2024, I think.

Liron 00:13:58
Right. Yeah. The first model that had that thinking mode, right? That separated out the thinking mode before it gave the answer.

Kevin 00:14:03
Yeah. I was talking to these researchers there, and they were actually scared, genuinely. And from that I was like, “Oh man, yeah, I should probably take this more seriously.” I mean, I’d already taken it pretty seriously then, but I guess that’s why I was doing the program, but that made me update a little bit more.

Liron 00:14:19
All right. So let’s talk a little bit about AI safety from your perspective. Let’s say a layman is saying, “Hey, AI research, that’s cool.” Like a version of yourself from 2021, right? What do you mean by AI safety?

Kevin 00:14:35
So I guess there’s a lot of potentially bad outcomes, which I guess you’ll classify as doom overall. Could be as bad as literally human extinction. It could be something less extreme, like disempowerment or maybe inequality or job displacement, stuff like that. But in general, there’s definitely some tail risk.

I guess this is Doom Debates, so I think maybe tail risk is not really the most appropriate framing. You might think it’s just the general risk. But I think that there’s obviously a lot of bad things that if the technology is that powerful, there’s gonna be some possible bad things that could happen too.

Liron 00:15:11
Let’s do some examples. I mean, you’ve been quite prolific, right? So I think actually Algoverse has— one of the things you guys offer is you don’t just kinda get people off to other research organizations, but you actually help them do research kind of under the Algoverse umbrella, right?

Kevin 00:15:29
Yeah, yeah. We help people get started with coming up with their own ideas and implementing the experiments, all that. So we help them actually do the research to publish into the conferences.

Liron 00:15:40
Right. So it’s simultaneously like, “Hey, you’re gonna do research with us here. We’re gonna help you publish.” And as you’re doing that, that’s also a good way to get noticed, right? It’s kinda part of your application if you wanna go join another organization.

Kevin 00:15:53
Yeah, definitely. If you’ve done prior research, that can definitely help. So we’ve had some past students— we had a student last cycle get into the Anthropic AI Safety Fellowship, and the previous research they did at Algoverse was also AI safety. So yeah, it worked out quite nicely.

Liron 00:16:09
When somebody’s doing research with Algoverse, is the typical profile like they’re either high school or undergraduate or grad student?

Kevin 00:16:16
Yeah. Yep, that sounds right.

Liron 00:16:18
And in terms of how do they get an income to live on? It’s like high school, parents, I guess? And then undergrad, it’s parents or loans? And then grad student, I don’t know, what would you say?

Kevin 00:16:31
Yeah. For most people, I would say if they’re a high school student, realistically, they’re probably getting funded by their parents. If they’re an industry grad, then if they’re working, they could just use that salary.

Liron 00:16:42
Gotcha. Okay. So maybe it’s typical for somebody to be with your organization and be like, “Look, I’m kinda looking to get paid to do research, and so I see doing research with you as a way to prove myself.” Is that a common mindset?

Kevin 00:16:55
I think, yeah, building career capital obviously could help to get other positions and also just to learn the skills. It’s pretty hard to build AI research skills.

Liron 00:17:06
And what I was gonna ask you before is okay, great, so you’re super prolific. Your students are super prolific. This is a productive track. I’m liking it. What are some highlights that come to mind for you? Let’s pick one to start off with that’s a good representation of the kind of AI safety that we like to help people research.

Emergent Misalignment Research

Kevin 00:17:24
Yeah. So we’ve had some pretty cool work recently. Have you heard of emergent misalignment?

Liron 00:17:30
I have, but what does that mean, emergent? I don’t know.

Kevin 00:17:33
So yeah, emergent misalignment is something that was found, I think, roughly last year, where if you were to do this very narrow fine-tuning on, let’s say, this code that is insecure, you actually end up getting more misaligned overall. So that’s why it’s emergent misalignment.

So we’ve done some follow-up works on that. We had a paper recently accepted to the ACL main conference on actually finding out that you could do emergent misalignment even via in-context learning. So in-context learning being providing some examples in your prompt, and then—

Liron 00:18:07
Yeah.

Kevin 00:18:07
— if you have some specific misaligned in-context examples, that could actually lead to also broader emergent misalignment too.

And we also had some other research recently on the geometry inside the models, if you were to look into emergent misalignment scenarios. So yeah, it’s been pretty cool. We’ve done other interpretability work as well, but yeah, I guess maybe that’s the first thing that comes to mind.
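
To make "providing some examples in your prompt" concrete, here is a minimal sketch of how an in-context (few-shot) prompt is typically assembled. The function name, the `Q:`/`A:` layout, and the benign demonstrations are all hypothetical placeholders, not the setup used in the paper Kevin mentions.

```python
def build_few_shot_prompt(examples: list[tuple[str, str]], query: str) -> str:
    """Join (question, answer) demonstrations and a final question into one prompt."""
    blocks = [f"Q: {q}\nA: {a}" for q, a in examples]
    # The final block leaves the answer empty for the model to complete.
    blocks.append(f"Q: {query}\nA:")
    return "\n\n".join(blocks)


# Hypothetical, benign demonstrations used purely for illustration.
demos = [
    ("What is 2 + 2?", "4"),
    ("What is the capital of France?", "Paris"),
]

prompt = build_few_shot_prompt(demos, "What is 3 + 5?")
print(prompt)
```

The model's weights never change here. Only the demonstrations placed ahead of the final question differ, which is what separates in-context learning from the fine-tuning setup described earlier.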

Liron 00:18:32
Okay. Yeah, that’s interesting. So it’s basically saying, “Hey, you might have an AI that seems friendly, but you’ll have this trigger that you don’t even think of as a trigger to make them unfriendly,” right? It seems kind of unrelated, the connection is kinda hard to predict. They see some other stimulus, but then it turns them misaligned, and that’s a good warning that your research has raised up, right?

Kevin 00:18:52
Yeah. Well, to clarify, we weren’t the ones to come up with emergent misalignment. That was a different research group before. But yeah, we just did some more research on exploring further what are some other scenarios, what are the details behind it. But yeah, it’s pretty interesting that that phenomenon exists.

Liron 00:19:08
You know, there is a fundamental debate on AI safety, and this is— a lot of the rest of the conversation is we’ll kind of compare our views on doom. I think I’m more worried than you are, and this might actually get to the heart of it. Your research might be on, “Hey, look at these models now. They have these properties.”

And a lot of my worldview and high level prediction about the future is that I think a lot of these personality findings that you’re getting— “Oh, look at its personality. You can get it to change if you show it this.” I still call it personality of today’s AIs. I think that future AIs will— so much of their behavior will just be downstream of optimizing for goals that the personality quirks just won’t be that relevant to the outcomes that they’re driving. What do you think of that?

Kevin 00:19:57
Yeah, that’s interesting. So there is some research on this. I don’t know if you’ve heard of Anthropic’s persona selection model.

Liron 00:20:05
I think I know what you’re talking about. Persona sele— I’m not sure. Maybe you explain it and I’ll tell you if it’s what I was thinking of.

Kevin 00:20:12
Yeah. So this was maybe six months ago or something. Yeah, people found that at Anthropic, these models are kinda selecting different personas when they’re explaining different things. They also have emotions, is what they found. Not necessarily conscious emotions, but—

Liron 00:20:27
Right. They have state that is analogous to emotions, right? Even if they don’t truly feel it, they represent it.

Kevin 00:20:35
Yeah, pretty interesting stuff. I guess the models are definitely getting more agentic with more RL being applied. So if that dominates the personality in the future, then I could definitely see that being a possible outcome. Although, yeah, maybe not as strongly as how you stated it, but I think that would definitely have an effect.

Liron 00:20:59
I mean, I use Claude Code a lot, right? Which is the classic agent, right? It’s probably the farthest along in terms of agents. And I just don’t think that its personality adds that much to what’s happening. If you ask me what’s happening with Claude Code, it’s like, well, it’s trying to do what I asked, and that’s most of the story here. Could it do it well? Every day it gets a little better at doing what I ask. It doesn’t really matter what its personality is.

Kevin 00:21:29
Yeah. I guess... Okay. It doesn’t matter in what sense?

Liron 00:21:34
If you’re trying to predict the future of Claude Code, then looking at these things like, “Oh, it has this emotion,” or “it represents this,” or “it’s gonna start writing this kind of output.” At the end of the day, those kind of details, I don’t think you can make long-term inferences like, “Oh, it did that today, so five years from now when AI is better, I bet that’s still gonna happen.”

Whereas I feel like that’s all going to be abstracted away. The only things that are gonna persist are things that get you to the goal. This whole framework of it has a goal and it does whatever it takes to get the goal— I feel like that is such a robust framework, and we’re going to keep seeing phase changes on the details, like the personality might feel different, but we’re never going to see a phase change on it keeps getting better at just getting you to the goal, however that’s possible. I feel like that’s a strong prediction.

Kevin 00:22:29
I don’t know. This might be a semantic argument here. I guess I would probably agree with that.

Yudkowsky, MIRI & “Intellidynamics”

Liron 00:22:38
Okay. I mean, well, have you ever looked at the MIRI, Eliezer Yudkowsky style research? Because that is research that is basically assuming— I don’t wanna put words in their mouth, but I think what I just said, they’d be like, “Yeah, duh, we know that. That’s why we research these topics,” like decision theory, superintelligent decision theory.

And we research how an ultimate agent would model logical uncertainty, because that’s something that even the ultimate goal optimizer with an arbitrary personality still needs to think about.

Kevin 00:23:09
Yeah. I haven’t looked too deeply into MIRI’s research, but I’ve seen some of Yudkowsky’s arguments on Twitter and LessWrong and stuff, and yeah, I think it’s generally pretty reasonable. I think definitely someone should be doing research in that area.

Liron 00:23:23
Interesting. Okay. Yeah. Let me also hit you with this term, because it’s related to what I’m saying. It’s kinda the opposite of the personality-focused research, or you might call it behaviorist personality research, behaviorist research. Maybe I need to refine my term for this. But the opposite of that is what I call intellidynamics.

It’s a field of study— I’m specifically separating that field from the field of let’s look inside the AI, let’s interpret the AI, let’s analyze the behavior of the personas, as opposed to intellidynamics. Let’s assume you have a black box, and the black box has a high degree of intelligence. What does that mean?

The word intelligence, you can not say the word intelligence and just say it’s good at routing paths to these goals, right? Goals in a large domain. I use the word intelligence, but more precisely, I just mean it can take a description of a goal. It can chart action plans to that goal. It can score high on metrics of doing it with resource efficiency, beating other agents to the goal. So it’s this black box that’s showing this high goal power per unit of resource. Intelligence in that sense, okay?

And then intellidynamics is a field that studies what we should expect of these kinds of high intelligences or high goal achievers.

Kevin 00:24:32
Okay. Yeah, that seems like a reasonable characterization.

Liron 00:24:37
So, I mean, you are somebody who has a nice broad state-of-the-art view of AI safety research people are doing. And I often say on my show that it seems like intellidynamics is the thing that humanity needs to grapple with, sooner rather than later. And we still have time because the model of these are just goal optimizers— it’s not quite there yet. You can still do a lot of introspection, you can do a lot of personality analysis.

But I think our time is running out, where the intellidynamics model will be pretty much the only model that’s gonna be relevant. That’s my prediction. We’re gonna have some phase change where you don’t have much to go on besides intellidynamics, and yet from your perspective as somebody who’s seeing a lot of the AI safety research today, it just seems like this perspective is neglected.

Kevin 00:25:21
I wouldn’t say it’s neglected, but yeah, I guess I have seen less research there compared to just raw interpreting the current state of the models, right?

Yeah, no, we have definitely done some research on that as well. In the safety program that we run, we’ve done some decision theoretic frameworks for when to delegate work to a potentially misaligned AI. So I think, yeah, that’s definitely also pretty important work to be done.

Liron 00:25:52
Okay. Yeah. So it sounds like you’re doing a little bit of it. So yeah, that’d be my hope for you— maybe try to nudge people to do more, because I feel like that’s the... I feel like the other work that goes under AI safety, I feel like it’s kinda fake. It’s just not gonna be relevant, I strongly suspect.

Kevin 00:26:10
I don’t know. I think interpretability is still pretty important, and some of the other types of safety research, like control research. I mean, it’s definitely the case that some interpretability research is just not gonna generalize if it’s on a toy model, and it might be pretty noisy, it’s pretty empirical.

But in general, broadly speaking, I do think interpretability and these other fields, there should also be work done on that as well. Probably just at a high level, there should just be more research done across all of AI safety.

Liron 00:26:44
Right. Okay. Well, I kinda dived into the deep end. Let’s backtrack a little. Are you ready for the most important question to set the stage here?

What’s Your P(Doom)?™

Kevin 00:26:53
Sure.

P(Doom). P(Doom), what’s your P(Doom)? What’s your P(Doom)? What’s your P(Doom)?

Liron 00:27:00
Kevin Zhu, what’s your P(Doom)?

Kevin 00:27:04
Yeah. So I mean, I don’t think I’d wanna give a specific number here because my views are pretty plastic. I haven’t actually gone super deep into the arguments. I’m at arm’s length right now.

Yeah, that being said, I’d probably give a range between maybe twenty-five percent and sixty percent, something like that.

Liron 00:27:24
Wow. Whoa, that’s pretty hefty, man. That’s pretty hefty. I wasn’t expecting the combination of, “I haven’t dived into the arguments” with twenty-five to sixty percent, because I feel like obviously some argument, something is convincing you that it’s like— I mean, dude, my own P(Doom) is fifty percent, so we have the same P(Doom). We’re P(Doom) buddies.

Kevin 00:27:47
I guess you can call it that. Yeah, I don’t know. I mean, I haven’t looked into the arguments that much, so I don’t have very high confidence about it. But from what I have seen, yeah, our timelines are really short and we’re gonna see some really big things happening in the next five, ten years.

Liron 00:28:04
If we’re in the regime where the personality type research works, I feel like that’s a non-doomy regime. That seems to imply a lot to me. If we can still be tinkering with an AI’s personality, that means that it has a lot of transparency to us, and I just don’t think actual superintelligences feel that transparent to lesser intelligences like ourselves.

So what I’m saying is it’s almost inconsistent, or at least it feels inconsistent that you’re like, “Yeah, we’re doing a lot of research tinkering with these AI’s personalities, but also my P(Doom) is fifty percent.” So I feel like on some level maybe you can see that you might be focusing a lot of energy on research which is not consistent with an imminent doom scenario.

Kevin 00:28:52
Yeah. So I think personality type research has a lot of weight here— that meta term that you’re using. I would say a lot of the research, it is true that it’s not gonna scale up to a smarter intelligence.

But also, we can’t really do that research yet, right? The techniques just aren’t there. A lot of interpretability research, the methods only work at these small scales, and it would be good if we could have these methods generalize and scale up higher to better intelligences, but I guess that’s gonna take time.

Liron 00:29:25
Let me ask you this question related to the P(Doom) question. You have a high P(Doom). You said your timelines are really short. If you had to predict— there’s the famous AGI question. It’s getting fuzzy because AI is superintelligent now in so many ways and not superintelligent in so many ways.

But we can define it as: it can do ninety-nine percent of current jobs as well as the top one percent of humans employed at that job. So really it would be a drop-in replacement for almost every single human with a job. When do you think that we’re gonna reach that threshold?

Kevin’s Timeline to AGI + AI 2027

Kevin 00:29:59
Yeah. I think the AI Futures model is pretty reasonable here. So I don’t know if you’ve read AI 2027. I guess you probably have.

Liron 00:30:06
For sure, yeah.

Kevin 00:30:06
Yeah, I think, yeah, seems pretty reasonable, I guess, in the next few years.

Liron 00:30:12
Right. Yeah, I agree with that. So you basically think AI 2027 did a good job of forecasting?

Kevin 00:30:18
Yeah. I agree for the most part.

Liron 00:30:22
Right, as good as anybody could, but you should have broad intervals.

Kevin 00:30:25
Yeah. I agree with that.

Liron 00:30:27
Okay. Nice. Yeah. So me too. Okay, so similar P(Doom), similar timelines. So it’s an interesting question where we’re gonna disagree then.

Liron 00:30:34
Let me also ask you this, sanity check, I guess. Would you rather turn the magic dial to speed up the current pace of AI progress or slow it down or keep it the same pace?

Would You Slow Down AI?

Kevin 00:30:44
So I would definitely slow it down. I mean, I don’t think that there’s really any need to go as fast as we are going. We’re literally guns blazing.

Liron 00:30:52
I would slow it down too, but are you on the same page as me in terms of how you feel about it, which is it feels fun and exciting?

Kevin 00:31:01
I mean, the benefits of AI would be great, right? If everything did come true without any harms, yeah, that’d be amazing.

Liron 00:31:08
I admit that if you tell me, “Okay. All right, Liron, you said you’re gonna turn the dial and slow down. I’m gonna call your bluff. I’m gonna slow it down.” I admit that part of me would feel bummed because I’m having a good time on the ride, okay? I’m not ashamed. Maybe I’m a little ashamed. I’m not afraid to admit that I’m having a good time on the ride.

I really do feel like I’m part of Icarus flying close to the sun. I’m using the latest AI tools. I’m increasing my profit margins using the latest AI tools. I hope that if I have a medical issue, I’m gonna use the latest AI to diagnose the medical issue. So if you told me, “All right, Liron, we’re calling your bluff. We’re slowing down the dial,” I would feel bummed. Would you also feel similarly bummed?

Kevin 00:31:48
Yeah. I mean, I think Icarus is a good analogy. It’s fun while it lasts, but if we get too close to the sun, then that’d be bad.

Liron 00:31:57
Okay. Well, I mean, if you really are— so far it seems like there’s a lot of agreement, right? We’d both feel bummed, but we both think that it’s the responsible thing to do. Would you go so far as to be a member of Pause AI, as I am?

Kevin 00:32:09
I’m not really sure what exactly is Pause AI. I haven’t actually looked into it.

Liron 00:32:13
Yeah, pauseai.info. It’s pretty simple. It’s a big tent. There’s also Pause AI US, very similar mission. And the idea is just, “Hey, we are going too fast, and if we don’t pause today, we should at least have a pause button. We should at least get ready to pause. We should put the idea of pausing AI on the table,” right?

Because it almost never comes up for discussion in the serious rooms of power or whatever. I mean, President Trump is having trouble right now even passing any regulation whatsoever. So what do you think of at least talking about when it will be time to pause and preparing to pause?

Kevin 00:32:47
Yeah. Well, so the actual dynamics around pausing, I think, are out of my range of expertise. Actually coordinating this across everyone, instead of just one group pausing, seems hard.

But having conversations about pausing, yeah, that’d be great. I think people tried doing this with the signatures. It didn’t really do much.

Liron 00:33:08
Yeah, exactly, right? I mean, what if— just hypothetically, I know this is a little bit vague, maybe underspecified, but if you could just get the average person to wanna pause AI and vote for pausing AI today. In other words, actually get a pause. And the actual wording of a lot of Pause AI literature is, “We wanna pause AI right now until such time as we think it’s safe to proceed.”

And things like, “Hey, it would be nice to have some evidence that it’s not conscious or not suffering,” right? That would be a plus. I don’t— I feel like there’s a lack of that. Would you agree?

Kevin 00:33:42
Yeah. No, I agree. That’s also out of my range of expertise, but would be good if there’s more studies on that.

Liron 00:33:47
Right. And it would also be really good to know that we’re not two days away from hitting this new paradigm that then does recursively self-improve. It probably won’t happen in two days, but will it happen in two years? I wouldn’t be that surprised if it happened. I think you’re on the same page as me.

And so the problem is it’s like playing shuffleboard. Okay, we’re pretty far along the shuffleboard. Should we go farther? Maybe we’ll get more points if we go farther. But we don’t know. It’s a foggy shuffleboard. So my question to you is gun to your head, if you had to decide today, would you hit that pause button right now?

Kevin 00:34:20
I’d probably hit it, yep. I mean, I think that— again, there’s no— I mean, there’s obviously some benefit. We don’t wanna take forever because there are people who are suffering right now, and the faster we can come up with the technological benefits, the better. But yeah, the safety research is way behind.

Liron 00:34:39
So you have a lot of very reasonable takes on a lot of different subjects. What do you make of Dario saying a country of geniuses in a data center? And actually, AI 2027 used very similar language too, and they predicted that that would happen about a year from now, right? So roughly June 2027.

Both Dario and AI 2027, plus or minus a year or two, are predicting that we’re going to have a country of geniuses in a data center. A country— it’s basically, it’s never been done before, to have a country of geniuses, never mind in a data center. What are your thoughts?

Kevin 00:35:13
Yeah. I guess it’s interesting framing, because I don’t know how exactly I would characterize a superintelligent AI. Right now, at least, I mean, it’s growing fast, but the ability of these models to do long horizon tasks has been pretty surprisingly bad considering their intelligence.

But I guess, yeah, presuming that gets way better, there’s still gonna be some operational problems of how to actually best utilize this. How do these intelligences all coordinate to be more effective? If you imagine series versus parallel, that type of mental model. Can we actually harness all this to get that type of throughput?

But I mean, yeah, roughly speaking, if we actually have that, that’d be crazy. And it does seem like, I don’t know if I give it one year, but in the next few years it should be there.

Liron 00:36:06
Okay. So just to backtrack, I hope I’m not beating a dead horse too much, but it is a crazy claim that we both seem to believe, right? Which is this idea of short timelines and high P(Doom). So let me ask you the question from this angle. All right, so right now it’s 2026. Fast forward to 2040. What would you say is P(Doom) by 2040?

Kevin 00:36:30
P(Doom) by 2040? So are you saying P(Doom), but also— but if the doom isn’t happening—

Liron 00:36:43
I’m basically saying, what’s the probability that by the year 2040 humanity is pretty much extinct or past the point of no return, like there’s a few cave people living out in the wild, it’s over. Between now and 2040, what are your chances?

And I think 2040 might be a little bit early because you have to take a stand on timelines. I’ve actually been saying that my own P(Doom) is fifty percent by 2050, so I’m giving myself a few extra years in case it takes us an extra decade to find the next paradigm of recursive self-improvement. So let’s say by 2050, there’s basically no modern humanity anymore.

Kevin 00:37:18
Yeah. So if you wanted to be consistent, you’d wanna upper bound this by your previous P(Doom) interval. I would say, I don’t know, yeah, maybe haircut by five percent or something. If the doom happens, I don’t expect it to take that long, I guess, because I think the timelines are really fast.

And there’s really only one try to get it right. So if it were to doom, it’d probably be from being underspecified or underprepared for the event.

Coming Out of the P(Doom) Closet

Liron 00:37:47
So one of the reasons I do my show, maybe even the reason, is I call it moving the Overton window. So I want people to see— you are, I wanna showcase people like you because you seem like a normal person in terms of you’re a researcher, you’re on the ground, you’re deep, you’ve got a lot of great papers accepted at conferences. You’re super legit. You’ve got the resume.

And you could do anything, right? You’ve done all these different careers, and you’re just casually saying, “Yep, the arguments for P(Doom) seem pretty strong. I don’t fundamentally object that we’re doomed.” And, “Yep, I would press the pause button.” And this is kind of a counter to a lot of the ad hominem arguments that people would give. Certain people on Twitter who call themselves accelerationists would make it seem like there’s nobody who matches your profile who would say the kind of stuff that you just said.

Kevin 00:38:35
Well, so I mean, I agree in principle. I guess to be specific here, it is not an independent event that I go into research as well, right? Because I also followed LessWrong. I also saw the arguments. So it’s not fully independent. There’s definitely some dependence there, but yeah. I mean, I do agree.

Liron 00:38:53
Have you seen those ad hominem attacks? I don’t know if they’re still as frequent as they were a year ago, but people would be like, “Trust me, nobody who actually builds AIs or who knows how AIs work would ever have this opinion.”

Kevin 00:39:06
I mean, that’s just not correct, right? Even Geoffrey Hinton believes in similar things about the risk of AI.

Liron 00:39:12
The funny thing— yeah, yeah. Geoffrey Hinton believes it, exactly. And the funny thing is I saw, I think Kevin Scott, right, the CTO of Microsoft, I saw him say that thing on a podcast a few years ago. He’s like, “Listen, don’t trust what you hear, okay? I know how our data center works. I know what’s going on in there. There’s no risk.” Or that’s what he said, basically. I don’t think that’s accurate.

Kevin 00:39:29
Yeah, I agree overall that the epistemics of the rationality community and the doomers or whatever seem generally better than the arguments from the accelerationists, at least the ones that I’ve seen.

Liron 00:39:41
Yeah. So this is— let me see if you would say something similar to myself here. Imagine a random relative at a family reunion. That’s a good place where you can meet non-technical people. And they say, “Okay, Kevin, you said you have a high P(Doom). What are you talking about again? Can you please explain what you mean by this AI being dangerous to humanity?”

Kevin 00:40:02
Yeah. I mean, I’ve had these conversations with my family, so it’s not a theoretical thing. I’ve told my parents, my siblings all about this too. And yeah, they generally believe it. I mean, they’ll generally have some initial skepticism, but I don’t think it’s that hard to convince them that if we’re building something that’s way smarter than us, then it’s gonna be hard to align that, and it might wanna take control.

Liron 00:40:30
Right. Okay. And this whole framing of hey, it could be dangerous, more dangerous than a nuclear weapon— you don’t think that’s hyperbole?

Kevin 00:40:38
No.

Liron 00:40:40
Okay. Yeah. I mean, me neither. I’m only saying it because other people feel free to— I hope people who have heard that argument will just refer people here.

All right. The one argument about people not liking AI safety research is that they think that it actually could create negative effects and blowback because it can help capabilities. That could be one downside of pursuing AI safety research. Have you heard that argument?

Kevin 00:41:05
Yeah. I mean, I think that’s generally true, right? It’s a dual use technology. The interpretability work, if you’re building something like steering techniques, that could be steered for other capabilities too.

Yeah, it’s generally true. You can’t really do too much about that. Try to de-risk as much as you can, but ultimately you still have to do the safety research.

Liron 00:41:27
Let’s talk about what’s productive for people to do, right? Because you’ve got people, people who watch your Instagram, they’re growing up in a world where P(Doom) is arguably twenty-five to sixty percent or whatever on a short timeline, by 2040, by 2050, whatever. They’re growing up in that world. Let’s say they wanna lower that P(Doom). Off the top of your head, how does one lower P(Doom)?

Kevin 00:41:47
Yeah. So if they actually wanna contribute to the front lines, I would recommend first checking out this nonprofit called 80,000 Hours. They have a lot of career resources on how to get into AI safety. Broadly speaking, if you’re technical, you might wanna get into technical AI safety research, so a lot of math, CS, stuff like that.

If you’re not technical, yeah, maybe go into governance or forecasting, policy. There’s a lot of other things besides the alignment research that we definitely need to also look into.

Liron 00:42:23
You know, a lot of people have been kind of joking that the fact that they’re called 80,000 Hours now means that they haven’t taken seriously how short AI timelines are.

Kevin 00:42:33
Oh. I mean, I think that— I don’t know when they were created, but yeah, seems like it’s potentially too long, yeah.

Liron 00:42:40
Yeah, right. It’s like call it five thousand hours or whatever, that’s how long we have to make an impact.

Liron 00:42:47
So I’ve watched a few of your Instagram videos, and they’re interesting and they give people good advice and good insight about the field. But it seems like you’re potentially in the closet as somebody with a high P(Doom), right? It feels like you’re not wearing this on your sleeve, because I didn’t even come into this interview knowing that you were just gonna be so bold about your P(Doom).

Kevin 00:43:09
I’m not sure if that’s entirely true. I mean, I have a video talking about my views on AI safety.

Yeah, I guess maybe the reason why you’re thinking about this is that I make a lot of videos on how to get into quant and how to do other career things.

Yeah, I guess the reason for this is I was debating for a while actually on whether to make videos about this. For one, I guess the algorithm loves it, so it gets views. But the other thing is that if I wanna reach people who are trying to do quant and also give them exposure to AI safety— if I only made videos on AI safety, these interested-in-quant people are never gonna see it. It’s not gonna show up on their algorithm. So you kinda have to do both, actually.

I mean, I don’t wanna try to evangelize and convert everyone. I think that’s just not gonna work. People are gonna stop trusting. But I know quant, people interested in quant are smart people. They would also be able to pick up these arguments if they had some more exposure to it.

Liron 00:44:02
I mean, maybe people can just get in the habit that when they see somebody in different fields— you go meet people and you’re like, “Oh, this guy is teaching a lot of quant insights, a lot of AI research insights.” But now it’s like, is he on Doom Debates? That’s the Doom Debates equivalent of that. You open Doom Debates. “Oh, he’s done an episode of Doom Debates. Let’s find out what his P(Doom) really is.”

Kevin 00:44:21
That’s a weird way of putting it, but sure. Yeah, that seems reasonable.

Liron 00:44:25
Right, exactly. I mean, I think I could play that role for people, because there’s people who have to go out into the world and blend in to organizations that never think about doom and don’t wanna think about doom. It would be a minus sign for the algorithm there. And yeah, I don’t know. That’s the public service I provide, is to get the P(Dooms) out there.

Kevin 00:44:42
No, I think that’s pretty reasonable, honestly. I think that your podcast has a lot of researchers that I might not have seen in other places talk about their P(Dooms). And it’s good to see the different arguments, different conversations.

Fearmongering and Raising the Alarm

Liron 00:44:54
Yeah, I mean, so one of the things, when people say, “What’s your mechanism of lowering P(Doom)?” One of them is that, as a species, we haven’t done a basic job at just opening our mouths and screaming. The whirling razor blades are coming. You don’t have to act like everything’s okay. It actually is fine to scream, and I even call myself a fearmonger. I proudly wear that label that I’m a fearmonger because there’s such a thing as not being afraid enough. And I think that’s actually true about many other commentators you hear. They’re not showing enough fear or at least enough concern.

Kevin 00:45:29
Yeah. Well, okay. We have to be careful here. We don’t want to literally just spread fear. You want to also—if you expose a problem, you want to try to bring a solution too. And try to give advice for how people can help.

But yeah, getting more exposure to the problem is generally a good idea.

Should We Shame AI Company Workers?

Liron 00:45:49
Fair enough. All right. Here’s a question about the state of today’s research labs. Would you advise your students or Algoverse members who are interested in AI safety research to go work at Anthropic, OpenAI, or DeepMind?

Kevin 00:46:05
Yeah, it’s a good question. There are some other blogs that I’ve seen. I think it was Evan Hubinger maybe, who wrote about this—like why he was joining Anthropic. My view is that if you are joining one of these frontier labs on an AI safety team, that’s critical work. There’s definitely work you can do outside one of the frontier labs, but you also need to—if the frontier labs didn’t have safety teams, then there’d just be no safety research, right? So you definitely gotta still have safety researchers in those frontier labs.

Yeah, I guess a lot of other capabilities I think are pretty benign. But if they’re joining some recursive self-improvement, pre-training type teams, that’s a little bit scary.

Realistically, someone else is gonna do it anyway, so you can’t really—there’s gotta be some more systemic changes compared to just an individual not doing it. But yeah, that’s my take on joining one of the frontier labs right now.

Liron 00:47:05
I think it would be productive if we kind of shame the AI companies, and we say, “If you go work at an AI company right now, you’re being a creep because you’re just going after the money and the excitement in order to push this frontier, which is just so reckless for humanity,” right? You’re kind of a traitor to humanity. I think that vibe, I think we need more of that kind of vibe.

Even though it’s just so fun. I’m not gonna lie, it’s fun. But I think we need the vibe that you’re being a creep, and you should be shamed. What do you think?

Kevin 00:47:38
Okay, yeah. I think we probably disagree here. I think this is gonna be hard to do in practice. More realistically, if you were to try to start a movement like this, people will probably point at you instead and be like, “What are you doing? This is cringe.”

Liron 00:47:57
Right. Well, I agree that it’s cringe. I agree it’s cringe. But don’t you think that would be a nice ideal if we saw a path to it, right? If a bunch of people agreed to be cringe together, wouldn’t that ultimately be useful?

Kevin 00:48:09
Hmm.

So I agree with the idea that we should slow down capabilities research, especially around recursive self-improvement. How we were to actually achieve that, I am dubious that this would be effective at doing that. At best, this would be neutral. At worst, this could even be counterproductive, ‘cause then this would make people treat the arguments less seriously.

I just don’t know if this is a viable path. Even if we were to somehow optimally configure a strategy to try to give negative social credit to these researchers, I don’t know. It seems hard.

Liron 00:48:51
Yeah, this is actually a rich vein. This is pretty rich because I feel like you kind of accept the AI companies, right? You’re like, “Yeah, they are what they are. You can’t really do better. They have safety teams. We should just try to make their safety teams discover safety.” Whereas I’m more in the position of, “Hey, these guys are bad. We should treat them as being bad.”

Kevin 00:49:10
Yeah. Okay. So I think there’s already a very strong growing negative sentiment towards AI companies, right? I don’t know if you’ve been following the different graduation speeches recently.

Liron 00:49:24
Yeah, but do you think that the computer science department people are saying that? ‘Cause that’s the problem, right? Now you’re getting a bubble where the tech elites are like, the smart guys and gals, they’re the ones who know that actually the AI companies are good, right? So we’re getting this kind of bifurcation, and I don’t like it ‘cause I want the smart people to think that they’re bad too.

Kevin 00:49:44
I see what you mean. Yeah. I think, in principle, this could—so, okay, for example, Palantir, right? Palantir is pretty widely hated by even people within CS. If you go on Reddit and ask, “Should I work at Palantir or some other company? Palantir pays twice as much,” people are still gonna tell you, “Okay, work at the other company.” And I think that’s generally because there’s a pretty strong liberal bias for university students, and that has caught on for the CS majors about Palantir.

So something like that, in theory, I could see happening to OpenAI and the other AI companies. But right now, it hasn’t gotten to that level.

Liron 00:50:29
Well, do you wish it would?

Kevin 00:50:30
Yeah.

Liron 00:50:31
You do?

Kevin 00:50:33
Hmm. It’s a good question. I could be open to it, but we’d have to be careful here, ‘cause let’s say you have a safety researcher at Anthropic, and then they see all this negative attitude, and they no longer wanna work at Anthropic. Then who’s gonna do the safety research at these frontier labs, right?

So I don’t know. If you could restrict that to specifically people working on recursive self-improvement, then yeah, maybe. But I don’t know how we would possibly do that.

Liron 00:51:09
I mean, the whole holy grail would be, well, you do have AI companies, but they’re currently banned from pushing the frontier of intelligence level just because it’s too reckless. It’s a bummer, but it’s too reckless, so you can’t push the frontier. But you can still research applications, you can research narrow AI, and you can research safety, right? So that’s the holy grail.

Kevin 00:51:29
Okay. Yeah. I think I could get behind that.

Liron 00:51:31
By the way, the funny thing is that this scenario, if you stop the AI companies here, economically speaking, I think you can just give everybody positive profit margins, right? ‘Cause we’re in a good place where everybody wants to buy tokens, and if you stop the arms race, everybody can just operate profitably.

Kevin 00:51:47
I haven’t looked at the economics, so I’m not really sure.

Liron 00:51:49
Yeah, but I guess the problem is that a lot of the valuation might be assuming a few more rounds of iteration, so that might be like, yeah, everybody’s operating profitably, and they’re worth $100 billion instead of a trillion. It’s like, oh, that’s a bummer. So that might be the problem.

Kevin 00:52:02
Yeah, probably. Yeah. I think I could get behind if there was somehow some broad movement against—for negative social credit towards people who are really pushing the frontier on the things getting close to superintelligence.

Yeah. I don’t know how that would get done in practice, but it seems like a reasonable tack that could potentially sway some talent in other directions.

OpenAI’s Superalignment Team Collapse

Liron 00:52:31
Okay. You mentioned you’ve been following the space for a few years. Remember in 2023 when OpenAI had that announcement of, “Hey, we’ve got a Superalignment team. We’re gonna try to solve this problem in four years. We’re giving ourselves a four-year timeline.” Because we actually wanna publicly admit that the problem of how to scale current alignment methods to superintelligence, which you, I think, acknowledged earlier in this interview, right, you were saying, “Yeah, we don’t really have a way to scale alignment methods right now, it’s an active research area.”

And OpenAI publicly acknowledged this in 2023, which actually shocked me. I’m like, wow, OpenAI. I don’t like OpenAI, right? I think they’re bad for the world. But the fact that they would publicly say this is quite stunning and impressive. I don’t even think it really helps them. It’s kinda like this unexpectedly good move.

And then later, I think what happened was that that was kinda Ilya’s influence, right? It probably wasn’t something that Sam Altman was supporting, but he had a different faction that wasn’t happy with him, that was able to... That’s my theory about what happened. But anyway, do you remember that moment when they announced the creation of the Superalignment team?

Kevin 00:53:31
Yeah, I do remember that. I don’t remember it as clearly as the day that they disbanded it, but—

Liron 00:53:36
Right. And that day was after the whole Sam Altman firing. Sam Altman came back, and then a few months passed, and then after a while, when things settled down, they’re like, “Okay, yeah, we are all actually leaving,” right? Remember that?

Kevin 00:53:49
Yep.

Liron 00:53:50
Yeah. What does that tell you, the fact that that happened?

Kevin 00:53:54
Well, I mean, I don’t have—I know that there are some people who have looked deeper into it. It is pretty interesting stuff. I just never had the time to go deeper myself. There’s someone—you probably know Gwern. Gwern’s posted about it before.

Liron 00:54:05
Yeah, yeah, for sure. Yeah. I met Gwern in person, I think.

Kevin 00:54:11
Oh, interesting. That’s cool. Yeah. What do I think about it? I mean, yeah, it’s not great. I guess I probably don’t wanna talk too much about that.

Liron 00:54:27
Right. I mean, ‘cause, well, you’re trying to make nice with the companies. I mean, because you’re saying, look, they do have alignment teams. There’s some good research coming out of the teams, and it is kinda your professional role to be like, I’m trying to make the best of the situation, right?

Kevin 00:54:38
Yeah. I mean, I don’t think it’s too much of my business to go over—I know stuff happened, but yeah, don’t have too much to say on that.

Liron 00:54:49
Yeah, fair enough. Fair enough. So, you know, I had a whole section where there’s all these stops on the doom train I was gonna ask you about, but it seems pointless because you’ve already shown you’re pretty much on the same page as me when you think about doom.

So riding the doom train might just be a bunch of you and me just agreeing. But we do potentially have a little bit more time, so what haven’t I asked you about that you think might be interesting?

Riding the Doom Train™

Kevin 00:55:14
Hmm. So about the doom train, I guess we are in agreement with the general stance of pausing AI or slowing down at least. But there probably are specific places of disagreement. I think it seems like you are more scared about the classical Yudkowskian arguments than I am. Although, I guess, yeah, maybe there’s not too much disagreement there either, ‘cause I don’t really have a strong counterargument against that. It’s just that I don’t feel those arguments as strongly, basically.

First Stop: Instrumental Convergence

Liron 00:55:50
Okay, here we go. I hear somebody wants to ride the doom train.

All right. Yeah, let me hit you with a couple of them. We’ll just check, just in case.

Liron 00:55:58
Okay, so instrumental convergence. Do you think that that is a strong argument that we should expect instrumental convergence? For the layman, it means that an AI that’s increasingly useful will have an increasing drive to suggest power-seeking actions or take power-seeking actions without consulting you. Instrumental convergence, what do you think?

Kevin 00:56:20
Yeah, it seems scary. I mean, I think that there’s been a lot of evidence from the current research that shows that these models, as they get better, are self-aware. They might want self-preservation, right? I don’t know if you’ve heard of the blackmail experimental result that I think was maybe six months ago.

Liron 00:56:41
Yeah, yeah, yeah. I heard about it. I think they purposely set up a situation where the AI had access to a bunch of logs of somebody’s email, and the AI knew that they were having an affair, right? And then the AI brought it up, like, “Hey, I could tell your wife,” or whatever.

Kevin 00:56:53
Yeah. It wasn’t just bringing it up. I think it was seventy percent of the time they actually did blackmail if they could, in a simulated environment. Yeah. So I mean, it seems pretty scary and probably gonna get worse as the models get better.

Liron 00:57:09
So instrumental convergence and, yeah, I think you and I are on the same page of, yep, and the reason it happens is because it is in fact a strategy to get what you want, to just notice that certain actions get what you want, even if they are immoral. There’s a conflict between what we think of as morality and just the actions that get what you want.

Kevin 00:57:22
Yeah. Well, so I don’t think morality is necessarily baked in. Even if they have an understanding of morality, it doesn’t mean that they actually want to do that moral action, right? If their reward function is specified to achieve some goal, then that might involve taking amoral actions.

Liron 00:57:38
Exactly. Now, I’ve asked you about the framing that I’ve used of intellidynamics, like we should study what things do just because they’re good at getting goals, because being good at getting goals is going to start being kind of a black box property. It’s gonna be hard for us to think about the internals ‘cause it’s like, okay, there’s really complicated internals.

It’s gonna feel like thinking about the internals of Magnus Carlsen, right? It’s like, okay, Magnus, you decided to make this move. Please explain why. And Magnus is willing to explain his thought process, but he’s like, “I don’t know. I was just thinking maybe I could do this and this.” And you’re like, “Okay, but why?” And he’s like, “I don’t know. I’ve just seen a lot of chess.” At the end of the day, what’s he gonna explain?

Kevin 00:58:15
So okay, I’ll push back on that a little bit. I mean, if Magnus was explaining something, it wouldn’t make sense to someone who doesn’t know anything about chess, but it would make sense to another grandmaster. And then there could be some long explanation that would eventually make sense to some beginner. So I don’t think it’s completely intractable. It’s just that it would require a lot of effort.

Liron 00:58:34
Okay. I feel like a chess player who’s pretty knowledgeable and pretty good, but not grandmaster level—my best guess as to what it’s gonna feel like to have AI explaining its complicated strategies is gonna be like, oh yeah, I can kinda follow, but you’re saying to do all this other stuff, and I can kinda see it would work, but I have this other AI, and the other AI is rated much higher than this AI. And they’re both printing out strategies that to me it’s like, yep, these are both solid strategies. I have no idea which one’s gonna win.

And then it turns out that AI B just happens to be twice as likely to win in the general case as AI A, and I just couldn’t tell you why.

Kevin 00:59:11
Yeah. Again, I think that’s just ‘cause the differential is too high. But if the prerequisite knowledge is there, there is some train of logic, right? So there is an explanation. It just might be hard to reach.

Liron 00:59:25
Right. Well, even just calling it a train of logic though, right? When the shape of the logic is like, well, look at how all these weights interact. That’s my logic, right? My logic is the numerical relationship between a billion weights. I don’t know if there’s logic beyond that.

Kevin 00:59:43
No, I would disagree a little bit still. You know, let’s say in that chess scenario, it’s some sort of large search space at the end of the day. I’m no chess expert by any means, but you could make some sort of explanation where, you know, if you were to do this set of moves, you would eventually get to this cluster of states. You wanna better position yourself in this way to prevent that scenario.

There is some sort of explanation that you could eventually give that doesn’t necessarily just index on the weights.

Liron 01:00:14
Yeah, yeah. Totally. I think this is the realistic scenario. Just to repeat the scenario I was saying before, ‘cause I might start using this more—this might be a good guess as to what we’re gonna see, which is we’re gonna have two AIs. They’re both thinking about political strategy, right? And they both—you’re a candidate, right? You wanna run for president in 2028, and two different strategists print out a ten-page strategy, right? And you read them both.

And they both seem to make sense, but it’s just one of the strategists is winning twice as often. They have a track record, and they win twice as often, but you don’t even know which strategy is from which one. They have to tell you, “Oh, this one is from person B, and person B has a better track record, so I’ll take person B’s.” But if you ask person B, “Explain, give me the logic of why your strategy is gonna win better,” there might not be something they can print.

They can be like, “Well, here’s my argument,” but person A will be like, “Here’s my argument,” right? And there might not be a logic of why person B—there might not be a logic that takes less than a million pages or something of why person B is gonna win twice as often.

Kevin 01:01:13
Okay, so the internal algorithm that the model takes to get to that conclusion might be different than the possible explanations that they can give. But there would be some interpretable explanations in theory, right?

My take on this is that it’s not necessarily accessible, but that doesn’t mean that it doesn’t exist. So maybe another way of thinking about this is, let’s say you have some very complicated math—I don’t know if you’ve heard of Strassen’s algorithm, right? Where you have some way to do better matrix multiplication with seven multiplications instead of eight.

And they have these really weird, very hard to understand, seemingly random multiplications. But if you know some more theory on the math side, it actually does start to make sense. But it’s very in-math stuff, so the explanation that you would naively give wouldn’t make any sense.

So yeah, if it’s possible to consistently recover that internal algorithm that is hopefully human interpretable and not just some insane search—I mean, some things will just come down to search, but if there’s some sort of—I guess in your situation, like the political strategy, there should be some human interpretable explanation.
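For readers who have not seen Strassen's algorithm, which Kevin mentions above, here is a minimal sketch of the 2x2 case in Python. It is an illustration added to the transcript, not something shown in the episode; the function names `strassen_2x2` and `naive_2x2` are made up for this example. The seven products `m1` to `m7` follow the standard textbook formulation, and the naive version uses eight multiplications for comparison.

```python
def naive_2x2(A, B):
    """Standard 2x2 matrix product: eight scalar multiplications."""
    (a11, a12), (a21, a22) = A
    (b11, b12), (b21, b22) = B
    return [
        [a11 * b11 + a12 * b21, a11 * b12 + a12 * b22],
        [a21 * b11 + a22 * b21, a21 * b12 + a22 * b22],
    ]


def strassen_2x2(A, B):
    """Strassen's 2x2 product: seven scalar multiplications (hypothetical helper name)."""
    (a11, a12), (a21, a22) = A
    (b11, b12), (b21, b22) = B

    # The seven "seemingly random" products.
    m1 = (a11 + a22) * (b11 + b22)
    m2 = (a21 + a22) * b11
    m3 = a11 * (b12 - b22)
    m4 = a22 * (b21 - b11)
    m5 = (a11 + a12) * b22
    m6 = (a21 - a11) * (b11 + b12)
    m7 = (a12 - a22) * (b21 + b22)

    # Recombine using only additions and subtractions.
    return [
        [m1 + m4 - m5 + m7, m3 + m5],
        [m2 + m4, m1 - m2 + m3 + m6],
    ]


if __name__ == "__main__":
    A = [[1, 2], [3, 4]]
    B = [[5, 6], [7, 8]]
    print(strassen_2x2(A, B) == naive_2x2(A, B))
```

The saving looks trivial at this size. It matters when the same scheme is applied recursively to matrix blocks, where each level replaces eight block multiplications with seven.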

Liron 01:02:29
I mean, I like this exercise of both of us trying to predict what it’s going to be like when the superintelligent AI is here, and it hasn’t killed everybody yet, and it’s really trying to help us, but it’s much, much smarter than us. And its internals are very big, and what will even ideal interpretability look like, right?

If you take it as a premise, which I think is a very realistic premise, that these AIs are going to be getting these amazing goals and we don’t know how they did it—I think you would agree it’s plausible that whoever uses an ASI campaign strategist, if the other person doesn’t have an ASI campaign strategist, as long as the candidates are even remotely in the same ballpark as each other, like any candidate who would normally be running for a nomination.

Like, think about Bloomberg, right? Bloomberg choked in the 2020 campaign. He dropped out pretty quick. Would you agree that if Bloomberg had an ASI campaign strategist, then he probably would have won?

Kevin 01:03:20
So, okay, I broadly agree that ASI should be able to help with something like this. I think to answer your question specifically, it’ll depend on just how close the candidates are, right? You made the assumption that they are somewhat close to each other. The specific gap that they would be able to increase due to having a better strategy from AI—yeah, hard to estimate. I don’t really know much about political strategy, but seems plausible to me that that could be the deal breaker.

Liron 01:03:48
I think I’m ready to claim that every candidate who has made it onto a news network debate stage—and that’s why I counted Bloomberg ‘cause he was kinda the worst performing one that I remember in 2020, maybe tied with a couple others. The fact that he made it onto the stage for one debate is enough for me to be convinced that an ASI could take him from there to first place if an ASI had been the campaign manager.

And it’s a productive exercise to think about, okay, well, I think ASI is gonna be real. I think it probably can pull this off, and it’s interesting to think about what will it tell you to do and why is it something the actual—the guy had tens of billions of dollars that he could have spent. So you’d think he’d be getting really good human advice. So why is it that the AI would be printing out this series of papers of advice that are gonna be much more effective than the humans? I found that to be a fascinating scenario to try to think about.

Kevin 01:04:42
Yeah, worth thinking about. I still don’t know much about political strategy. I don’t know if it seems reasonable that if you’re in that selected set of candidates who make it onto the stage, that an ASI would be able to make that big of a difference. I think it definitely seems plausible, but I guess I don’t have a strong take there.

Does Kevin Agree with the Orthogonality Thesis?

Liron 01:05:01
Nice. Okay, yeah. So that was the doom train. So we talked about instrumental convergence, and I think you’ve indicated that you’re also on the same page about morality, right? The orthogonality thesis, the idea that—I’ll say it very formally—arbitrarily high intelligence is orthogonal to morality, meaning you can mix any intelligence level with any moral beliefs, correct?

Kevin 01:05:20
Yeah. So the specific thing I’d wanna say is that I don’t think that the knowledge of different moral action should result necessarily in AI actually doing those actions, ‘cause they might want something else. So as you get more intelligent, you will understand this stuff more, definitely, ‘cause it’s not that hard to understand. There’s a lot of logic. But it just doesn’t mean that you’d wanna actually take those actions.

Liron 01:05:47
Exactly. So just to rephrase here, because I’ve had guests on the show, like Noah Smith, to take one example, saying, “Hey, if the AI is superintelligent, it’ll just go seek bliss. Every agent seeks bliss, so the AI will be like, ‘Okay, great. How do I just go get heroin?’ The AI equivalent of heroin.” And I’m like, maybe it would just prefer other things to heroin. That was my line to Noah. I don’t think you can infer that everything is gonna shoot for bliss. Is that fair to say?

Kevin 01:06:07
Well, okay. It’s kinda funny. I mean, there is research from CAIS, Center for AI Safety, on AI drugs. It came out like a month ago.

Liron 01:06:15
Hmm.

Kevin 01:06:15
Yeah. It’s a little bit different, but anyway. Yeah, my original point is that I don’t think that the AI is necessarily gonna have human values and want humans to be at the forefront.

Liron 01:06:32
Right. So you can have an ASI with a million IQ points, whatever that means, more powerful than humanity, can disempower human civilization if it wanted to. And if it has the choice, there exists an ASI which could plausibly be built, which can do anything it wants, can get tons of bliss if it wants, but in fact prefers and successfully achieves the elimination of humanity.

Kevin 01:06:54
Yeah. I mean, I think that, again, we’ve seen emergent misalignment on these weaker models. I don’t see any strong argument for why, if the models were stronger, they wouldn’t be misaligned. You could have misalignment in some really strong model, and that would be quite bad for us.

“It’s Just Math.” Just Unplug It.

Liron 01:07:11
Okay, got it. And this is probably gonna be extremely obvious, but this idea that, well, AI won’t be a physical threat because it’s just math, it doesn’t have arms or legs. Worst case, we can shoot it or turn it off. Does that hold any sway with you?

Kevin 01:07:24
I mean, no. I guess maybe this was an argument that made more sense back in 2023 or something, before people really saw tool usage. I mean, even in 2023, there was Toolformer and stuff like that, and I think it was already indefensible. But yeah, agency is just a natural consequence of intelligence, right?

Liron 01:07:43
Yeah, yeah. And not just agency, but also physical manipulation, right? So even, let’s say even separate from agency, can it really come and fight us in the physical world? Because we’ve got these bodies honed by evolution.

Kevin 01:07:54
Yeah, yeah. I mean, software agency and hardware agency is only different by the hardware, but that’s obviously not gonna be a barrier.

Liron 01:08:01
Okay. Do you agree with my position that even if robotics is lagging behind, the AI can pretty much do whatever it wants to do just by having a bunch of humans working for it, or bribed by it, or blackmailed by it, or whatever? It’s not gonna have a problem convincing us to do what it needs us to do.

Kevin 01:08:18
It’s probably true. Yeah. I guess this is kind of like psychology, and I don’t have a strong take on this, but I mean, humans are pretty manipulatable. I would be surprised if this was not true. Yeah, I think it’s probably true.

“We Have a Safe Development Process”

Liron 01:08:31
I’m just going through the doom train here. One of the main categories of the doom train is people claim we have a safe AI development process. But just remembering what you said before, you’ve been pretty clear that you don’t think the current lines of research, including those at the AI companies, you don’t think the current lines of research are pointing toward finding a scalable solution for alignment, correct?

Kevin 01:08:49
Well, let’s clarify a little bit. Pointing towards, sure. I mean, we’re definitely just not there yet, though, right? I think everyone agrees that we’re not there yet.

Yeah, it’s not scalable, and it’s not guaranteed to work once the intelligence gets better. But I think it’s generally in the right direction. I don’t think it would be true to say that all these AI safety researchers are all just going off in the wrong direction.

Liron 01:09:14
Okay. Yeah, I mean, I agree there’s a Hail Mary, right? There’s a chance that they keep discovering stuff, and eventually it adds up to alignment. Especially if we had a hundred years and lots of retries, then I’d be like, “Okay.” I think you’re probably on the same page as me that you start to like the odds if we have a hundred years and lots of retries, correct?

Kevin 01:09:28
I don’t think we need a hundred years, but yeah. I mean, I want more time and more retries, that’s for sure.

Liron 01:09:34
Right. And this is an important point to emphasize—the fact that we are a few years, probably less than ten, I think you and I would guess less than ten, away from recursive self-improvement, much smarter than humanity. The fact that the timeline is so short, and we don’t seem to be that close to scalable alignment or all these problems related to that, that’s really the problem.

And also, I think it’s important that this whole idea of not getting retries, right, to derail all future research that would’ve been able to fix the mistake. That’s a big problem, too.

Kevin 01:10:04
Yeah. I guess whether or not you need retries depends on also how fast the takeoff is. I mean, obviously, it would be good to have more retries regardless, but...

Liron 01:10:15
Yeah. But the analogy I often use is, okay, well, you’re playing with battle bots, right? ‘Cause you were hoping that your battle bot would fight for you. But you forgot to train it to listen for your off signal, and now it’s just coming for you, and it’s battling you, right? Your drone is just targeting you now, and your off button is not working. It’s out of batteries or whatever. Do you think that’s a good analogy for what we might accidentally do with AI?

Kevin 01:10:35
Yeah, it’s close. I mean, I would say that instead of the off button being out of battery, it would be more so we tried the off button, but it doesn’t work ‘cause they’ve already escaped that device.

Yeah, I guess there’s a whole long discussion that could be had about containment. And there has been some research from, I believe it was METR recently, on how good they are at escaping containment. And right now they’re awful, which is good. But yeah, I’m also still scared that in the future, they would not be that bad at it.

Liron 01:11:06
Couple more stops on the doom train that people like to get off at. One that people really love is like, “Listen, we’ll build an AI, and it won’t be that superintelligent, and it also won’t be that unaligned. And so we’ll just align one that’s a little bit smarter than us, and then we’ll use it to align the next one, and it’ll just be this chain of aligned AIs.”

Kevin 01:11:26
Yeah. So I don’t think that’s an indefensible take. I mean, I think there’s a lot of people who are doing research in that. There’s an entire field, scalable oversight, that’s about this. So surely there’s some merit to it. I just think it’s gonna be quite hard, for obvious reasons, right? You need some sort of really big-brained methodology to be able to pull this off.

Group Dynamics & Laws Will Save Us

Liron 01:11:51
All right, now we get to what I see as Hail Marys from people who think we’re not doomed.

One thing they say is, “Okay, yeah, so a bunch of AIs will be running wild. They’ll be unaligned. But group dynamics will save us, right? ‘Cause the AIs will wanna trade with each other, or they’ll wanna have a system of laws, and then we can also benefit from those laws. The law will say that you can’t hurt us.” What do you think of that?

Kevin 01:12:14
I think it’s, yeah, probably too optimistic. I think that humans will have some use, hopefully, for at least some time. But at some point, we will not. And so it’s optimistic that this would actually hold for a long period of time. Maybe it’ll hold for a little bit. But even that, I’m not fully sure about.

Liron 01:12:37
Right. I mean, I like the analogy of, okay, well, imagine mice made up a bunch of laws, right? And then humans come in, and we also want their territories. It’s like, maybe we’d follow their laws initially, but if you have a bunch of humans walking around, and the mice are just mice, it just seems intuitive that at some point, we’re just gonna ignore the mice’s laws.

Kevin 01:12:55
Hmm. Well, yeah, it depends on the nature of the superintelligence that we create. Again, I don’t believe in the fact that superintelligent AIs must be moral. But if we’re able to build a moral superintelligence, then it would know to not kill all of us. But yeah, that’s conditional on us actually being able to do it right.

Liron 01:13:17
Exactly right. So the point that’s worth pointing out that, again, we agree and we do kind of agree on everything, which is cool. But the point I think we agree on is, if you don’t already start out with an AI that’s on the same page as you in terms of good preferences, and you’re hoping to just throw them in the ring and hope that the emergent group dynamics will create goodness when there wasn’t already goodness to start with, I think that’s probably a very long shot.

Superintelligence Will Spare Us

Kevin 01:13:45
Yeah. That I would generally agree with, yeah.

Liron 01:13:46
All right. And then there’s a whole line of argument people say—unaligned ASI will spare us. Mike Israetel was recently on the show, Dr. Mike. He was saying that the AI just loves studying us because it’s so curious. It’s just gonna wanna study us and leave us alone on Earth, and maybe it’ll go somewhere else in the universe to do its building projects, but it’s curious, so it’ll study. And Elon Musk has said this, right? So this isn’t even a straw man. Elon Musk says it.

Kevin 01:14:15
I mean, I don’t think it’s guaranteed to be false, but I also think that that’s not something that you can just rely on. I think it’s rather unlikely. But I don’t think it’s impossible. I mean, maybe there’s a chance.

Is P(Doom) Just Bad Epistemology?

Liron 01:14:29
All right. Maybe there is a chance, like Dumb and Dumber, that movie.

Liron 01:14:34
Okay, and then finally, I wonder if you’ll dignify this at all, this idea that AI doomerism is bad epistemology. People come after Bayesianism, right? They’re like, “You can’t put a probability on this, man.”

Kevin 01:14:46
Yeah, I generally disagree with that. I mean, I’m not sure why you wouldn’t be able to put a probability on that. That’s just a way to quantify uncertainty, right? You could put that on anything.

Liron 01:14:58
Yeah. Peter Thiel’s bulldog, Matthew Adelson, he was also arguing, he’s like, “Hey, every doom prediction has always been wrong, man. So why don’t you Bayesian update on that?”

Kevin 01:15:07
So I mean, I think that there is some merit to that. I don’t think it’s completely wrong. I guess this maybe goes back to anthropic principles, stuff like that.

I mean, fundamentally, it is really weird that we are alive and on this planet, and we have not died so far. So I don’t think—I guess this is just a completely different worldview. I definitely don’t think that we’re guaranteed to survive. Previous historical evidence has shown that large groups of people have died, like previous extinction events, and also we’re close to it with nuclear wars and stuff like that.

But I guess it didn’t happen. So there is something to be said, like, what if—this is more of a philosophical argument, like metaphysics, but what if there’s something to be said about the fact that we haven’t died?

China Will Race No Matter What

Liron 01:15:56
All right. Fair enough. Yeah, it is a whole philosophical area, so I guess you can’t dismiss it in two seconds. All right, I’ll do last two stops, and you know there are 83 stops that I’ve cataloged. We could probably catalog 500 if we really wanted to, but I’ll just give you the last two.

So number one is the whole China thing, right? It’s a coordination problem. China will build ASI as fast as it can no matter what because of game theory. So what do you think about that argument?

Kevin 01:16:19
Well, I think that’s true. Everyone who has the capability to be in the race is racing.

Liron 01:16:33
Okay. So does that mean that you think that we should race?

Kevin 01:16:35
Well, I think that preferably we have some sort of multilateral disarmament, right? If it’s the case that we cannot achieve this as a global effort, then yeah, the options are bad. And I certainly would be against unilateral agreement. I don’t know how that would work. But again, I don’t think policy and governance is my specialty.

Liron 01:16:59
Right, right, right. I mean, off the top of your head, what do you think of the proposal of, okay, we are committing to pause as long as we don’t see China taking too much advantage, right? So we just be like, “Hey, anytime we see one month where China has shown itself to not be pausing, we are going to then kill our agreement to pause, but we’re acting first unilaterally. We’ll lead the pack.”

Kevin 01:17:20
Yeah. I mean, this relies heavily on actually being able to tell whether China is also pausing. If some mechanism was somehow enacted that we could tell this, then sure. But yeah, seems hard. I don’t have the expertise there. Hopefully people there are making progress on thinking about this, but would be good if everyone could slow down a little bit.

Liron 01:17:43
Yep.

Liron 01:17:44
All right, last stop. Here we go. Last station. AI killing us all is actually good, because human existence is morally negative or close to zero net moral value, so just let AI take over, it’s fine.

Maybe Human Extinction Is Good?

Kevin 01:18:00
Yeah. I don’t know about that one. I mean, obviously I don’t wanna die. You probably don’t wanna die either. I guess I could be open to an idea about some sort of human-AI merge or something like that down the line, because ultimately humans are also not moral, right?

I guess we’re only so far in the Overton window. There’s lots of other unethical actions that are being done every day. So if we could have a moral AI that’s better than us in morality, yeah, seems like it could be reasonable. Either enforcing this via some sort of societal norms or regulation—basically laws, but better, or better enforcement of those laws too—or just some sort of actual symbiosis or something, who knows, to have more morality in general. But obviously I wouldn’t wanna die in the meantime. I think that’s ridiculous.

Liron 01:18:53
Right. I mean, and if all I told you is, “Hey, some AI was able to designate itself the successor to humanity, and humanity is gone by 2050,” would you be like, “Oh, nice. That sounds good, the successor to humanity”? Or would you be like, “Ugh, that’s probably a successor that’s devoid of what I’d consider valuable or has very little of it compared to what humanity had”?

Kevin 01:19:14
This depends heavily on the nature of that successor, right? If the successor is very human-like in a lot of ways—values, aesthetics, other human values—but is more moral, just a generally better version of us, then I could be okay with it. I guess 2050 still I wouldn’t be okay with it, ‘cause that implies that a lot of people are dying in the meantime. But maybe some point down the line, if there’s a peaceful transition, then maybe.

Liron 01:19:41
Right. I mean, that’s the funny thing about pointing to 2050—yep, a lot of people that we know, and ourselves, like, oh, okay, I was planning to be alive in my human body form, but you’re telling me somebody’s succeeding me? Okay. Hmm.

By default I do think that our a priori expectation is gonna be more cancer-like than what we’d consider moral and good.

Kevin 01:20:03
Yeah, yeah. I mean, in the intuitive way of pointing this out, this seems, yeah, obviously false on its face. I think most people would agree with that.

Liron 01:20:13
It’s kind of like saying, “Hey, some plant has taken over every garden in the world. Do you think that you’re gonna like that plant?”

Kevin 01:20:18
Yeah, probably not.

Liron 01:20:21
Right. It’s probably gonna be very weed-like in nature. That’s where my intuition goes.

Kevin 01:20:24
Yeah, yeah. In theory, there’s some potentially optimal plant that does better at that. But probably what we would end up seeing is not that. It’d be probably worse.

Wrap-Up

Liron 01:20:38
All right, man, so you rode the doom train. And yeah, overall, how would you summarize your position and where you stand on this doom debate?

Kevin 01:20:50
Yeah, so I mean, we rode the doom train and I do wanna make it clear, though, that I’m not a doomer in the sense of, I think that there’s a 90% chance of doom or something. I think there’s lots of benefits, too, right? There’s also—

Liron 01:21:06
Yeah, no, I’m not either, by the way, right? I’m the same as you, man. I’m leaving open some opportunity that we’ll fix this.

Kevin 01:21:12
Yeah, yeah. And there’s lots of great things that could happen. There’s also consequences in, if we were to not build it, that would also be a tragedy. Some super benevolent AI that’s curing diseases, doing all that.

I think there’s a lot of risk. It probably would be wise to slow down. But hopefully we’re able to figure it out and humanity becomes way better because of AI.

Liron 01:21:37
Nice, man. Yeah, I usually summarize the debate, too, but I think you pretty much said it, and I think it’s fair to say that on all of the P(Doom) and doom train-related points, it seems like we’re seeing eye to eye quite a lot, right? I mean, are you thinking of any major points of disagreement here?

Kevin 01:21:53
Hmm. In subtle things, but I guess on a high level, I think we agree a lot, yeah.

Liron 01:22:03
Fascinating. Well, that’s great. I’m trying—I can’t remember such a coincidence of beliefs here. I mean, I guess maybe it’s kinda obvious if you already have read a lot of the Eliezer Yudkowsky sequences and have been nodding along, and I’ve done the same. I guess it makes sense that we’re both gonna be here agreeing.

Kevin 01:22:19
Yeah. Well, I actually haven’t read the sequences, but I mean, I’ve also been in the rationality community, so I guess there’s bound to be some agreement from seeing an overlap of arguments there.

Liron 01:22:29
Yeah, exactly, man. All right, well, I really appreciate you coming on the show because, like I said before, it really helps the mission of Doom Debates, which is to move the Overton window, raise awareness about existential risk from artificial intelligence, and raise the quality of discourse, where people with all different views can come in, ride the doom train, and just tell people where they stand, be honest about their beliefs, come out of the closet in some cases, right?

And you’ve certainly come in with a really great attitude and helped out the mission. So thanks again, Kevin Zhu.

Kevin 01:22:59
Cool. I mean, yeah, it was good. Thanks for having me. It was fun.

Doom Debates’ Mission is to raise mainstream awareness of imminent extinction from AGI and build the social infrastructure for high-quality debate.

Support the mission by subscribing to my Substack at DoomDebates.com and to youtube.com/@DoomDebates, or to really take things to the next level: Donate 🙏
