How AI Kills Everyone on the Planet in 10 Years — Liron on The Jona Ragogna Podcast

In this special cross-post from Jona Ragogna’s channel, I'm interviewed about why superintelligent AI poses an imminent extinction threat, and how AI takeover is going to unfold.

Newly exposed to AI x-risk, Jona asks sharp questions about why we’re racing toward superintelligence despite the danger, and what ordinary people can do now to lower p(doom). This is one of the most crisp explainers of the AI-doom argument I’ve done to date.

Timestamps

0:00 Intro

0:41 Why AI is likely to cause human extinction

2:55 How AI takeover happens

4:55 AI systems have goals

6:33 Liron explains p(Doom)

8:50 The worst case scenario is AI sweeps us away

12:46 The best case scenario is hard to define

14:24 How to avoid doom

15:09 Frontier AI companies are just doing "ad hoc" alignment

20:30 Why "warning shots" from AI aren't scary yet

23:19 Should young adults work on AI alignment research?

24:46 We need a grassroots movement

28:31 Life choices when AI doom is imminent

32:35 Are AI forecasters just biased?

34:12 The Doom Train™ and addressing counterarguments

40:28 Anthropic's new AI welfare announcement isn't a major breakthrough

44:35 It's unknown what's going on inside LLMs and AI systems

53:22 Effective Altruism's ties to AI risk

56:58 Will AI be a "worthy descendant"?

1:01:08 How to calculate P(Doom)

1:02:49 Join the unofficial If Anyone Builds It, Everyone Dies book launch party!

Show Notes

Subscribe to Jona Ragogna — https://youtube.com/@jonaragogna


IF ANYONE BUILDS IT LAUNCH WEEK EVENTS:

Mon Sep 15 @ 9am PT / 12pm ET / 1600 UTC
My Eliezer Yudkowsky interview premieres on YouTube! Stay tuned for details.

Tue Sep 16 @ 2pm PT / 5pm ET / 2100 UTC
The Doom Debates unofficial IABI Launch Party!!!

More details about launch week HERE!

Transcript

Opening and Introduction

Liron Shapira: 00:00:00
We're literally all going to die. And a lot of people hearing me say this are gonna be like, what the hell are you talking about? It sounds like news to them. Whereas I've been living in a community that's increasingly growing and increasingly including literally Nobel Prize winners and Turing Award winners who are all facing this reality.

All I'm trying to do is just close the gap, trying to wake people up and be like, yeah, guys, this is happening. I'm sorry.

Jona Ragogna: 00:00:19
Welcome back to the podcast. My name is Jona Ragogna, and today I'm speaking with Liron Shapira. Liron is a Y Combinator backed tech founder and host of the Doom Debates podcast, where he educates about AI risk.

Get ready to learn why AI could make us go extinct quite soon, and what the only ways are that we can stop that from happening.

Liron Shapira, welcome to the show.

Liron: 00:00:40
Hey, Jona. Great to be here.

Jona: 00:00:44
If you had to summarize in a few sentences what your mission is and why it matters now, specifically, what would you say?

The Core Mission and Existential Threat

Liron: 00:00:52
It all gets to the idea that your viewers, a good chunk of your viewers probably don't realize there's a very likely chance that all life on Earth is coming to an end in a few years. It's not really on their mind, and yet it seems to be likely true.

It seems to be likely true that we're coming to the end of history. Our children aren't going to have a chance to grow up. We're literally all going to die.

A lot of people hearing me say this are gonna be like, what the hell are you talking about? It sounds like news to them. Whereas I've been living in a community that's increasingly growing and increasingly including literally Nobel Prize winners and Turing Award winners who are all kind of facing this reality.

And all I'm trying to do is just close the gap. I'm trying to wake people up and be like, yeah, guys, this is happening. I'm sorry.

Basic Concepts and Terminology

Jona: 00:01:28
And for someone who's completely new to the idea of existential risk, AI existential risk, what will happen here? Is it something that will happen by mistake? Are we talking evil robots that are gonna kill us all?

And while you're at it, maybe you can explain some of the basic terms of this topic for us. Existential risk, alignment, stuff like that.

Liron: 00:01:50
Yeah, I agree. I kind of went from zero to a hundred really quick there. So to back up a little bit, we're talking about the risk from super intelligent AI, also known as AGI, which stands for artificial general intelligence. Some people use the abbreviation ASI, which stands for artificial super intelligence.

Long story short, we're building these machines. They're getting smarter and smarter. A lot of your listeners probably lean on ChatGPT for homework assignments or to get tips building stuff. If you wanna be a handyman, you can get some tips from ChatGPT.

So a lot of people are experiencing the usefulness of these AIs, but if you just extrapolate the curve, it doesn't really occur to people - okay, so tomorrow it's smarter, tomorrow it's smarter. Where does the curve go?

And of course you could be like, well, the curve just stops and then we're fine. And humans still control it. But in my opinion, and in the opinion of many experts - Nobel Prize winners, people whose job was to research this stuff - many of us think, oh crap, the curve just goes to a place where they can do anything better than humans can.

There's no lasting advantage that humans can retain. And the timeline for this is like two to 15 years. We're not even talking about letting our children grow up here.

Jona: 00:02:59
And how would that lead to our extinction? What could happen here?

The Power Transition and Loss of Control

Liron: 00:03:05
So once you have these systems that can do more than you, you get to the point where the only thing that matters in determining the future is what they want. There's not really that many other constraints.

I'll make an analogy. Look at the world right now. If you wanna know the future of earth today, or even the future of our solar system, the future of our galaxy, you just have to look at what humans want.

There's not really any other process that's going to meaningfully shape the future of our galaxy right now, besides the choices that humans make. Because we now have the power.

Evolution by natural selection is much slower than we are. So you're not going to see other predators evolve and eat humans. That phase is now done. Humans now control destiny.

And the same way that we are having our way with the planet, doing whatever we want with the planet - building cities on it, building new technology on it, getting into space, colonizing other planets - the same way that we as humans have the options to do that.

A species that's smarter than us is going to take the ball away from us. It's going to have all the options, and whenever we have input, that will not matter unless it specifically wants to listen to us.

It has to make the choice to listen to us, the same way as if, you know, imagine apes or bears or dogs or some other species had a telephone where it could call up humans and order the humans around, and the humans would wanna listen. Like, yes sir, whatever the dog tells me to do, I'm going to do.

But if the humans ever decide, you know what? Forget the dogs. We have other ideas. The dogs don't really have an undo button. They don't really have proper shock collars around us. They don't really have us in cages.

The moment that we decide, Hey, you know what, I don't really care about dogs. The dogs now permanently lose all control over us.

So by analogy, we are very precariously close to breaking this chain where the dumber species, which is humans, sure we have the smarter species on a leash, but the leash is very close to breaking and once it breaks it's game over. There's no undo button on there.

Goals and Agency in AI Systems

Jona: 00:04:58
And how would we know that AI is coming up with its own goals here? Is that happening already? How, when would we know?

Liron: 00:05:05
Well, this idea of having a goal, it's not that hard to have a goal. For example, if it's a self-driving car where you input the destination, that destination is a type of goal. It's what we call a narrow domain goal.

So the domain of the goal, the universe that the goal lives in, is the universe of routing on two dimensional ground. So the car is just navigating across two dimensional ground. And so it finds the best two dimensional path to get to the goal.

And if you tell it, Hey, you know what, there's actually a blocked road here. There's construction happening here. The car is smart enough to be like, okay, no problem. I'm going to route around this.

And so you can tell it has a goal, a narrow domain goal, because it's changing its mind, it's making adaptive decisions, it's working backwards from the goal. It's saying, you are here, the goal destination is here. So let me keep changing the path. I will pick whatever path gets you to the goal.
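
To make the "narrow domain goal" idea concrete, here is a minimal sketch (not from the episode) of a route planner that works backwards from a destination and re-plans when a road is blocked. The grid size, start, goal, and blocked cells are invented for illustration.

```python
# Toy "narrow domain goal": find a route to the destination, and if some cells
# are blocked ("construction"), adaptively pick a different path to the same goal.
from collections import deque

def shortest_path(blocked, start, goal, size=5):
    """Breadth-first search on a size x size grid, avoiding blocked cells."""
    queue = deque([(start, [start])])
    seen = {start}
    while queue:
        (x, y), path = queue.popleft()
        if (x, y) == goal:
            return path
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nx < size and 0 <= ny < size \
                    and (nx, ny) not in blocked and (nx, ny) not in seen:
                seen.add((nx, ny))
                queue.append(((nx, ny), path + [(nx, ny)]))
    return None  # no route exists

start, goal = (0, 0), (4, 4)
print(shortest_path(set(), start, goal))             # direct route
print(shortest_path({(2, 2), (2, 3)}, start, goal))  # road blocked: it reroutes
```

The planner's "goal" only lives in this tiny two-dimensional world; the argument in the episode is that the same goal-directed, re-planning behavior generalizes to much broader domains.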

Similarly, when it's chess, Stockfish playing chess, Stockfish's goal is to beat you at chess. And if you say, you know what? I'm gonna confuse it. I'm going to move my queen backwards and diagonally here, and then it updates its plan. It's like, oh, I didn't expect that, but ultimately that's a bad move and now I'm going to change my plan and I'm going to go trap you now.

So it has a goal within the domain of chess. The only difference between the goals that are coming is that the domain of the goals is broader. So instead of a chess board, instead of a two dimensional road, instead of that, you just have a three dimensional universe.

But it's the same thing. It's just playing chess. It's just playing a video game except for real. Life is just a big video game. And when you lose the video game of real life to the AI, when you lose a sufficiently big video game, you lose in real life.

P(Doom) - Probability of Extinction

Jona: 00:06:39
I hear you talk a lot about something called P(Doom). What is that? Can you explain that to us?

Liron: 00:06:44
So it means probability of doom. I'm a Bayesian reasoner, which just means I think it's productive to put probabilities on beliefs. Even for beliefs that you don't think of as statistical, it's still useful to put a probability around them.

So if you ask me, hey, are we all going to die? I won't give you a simple yes or no. I will say probably. I think the chance that we're all going to die by, let's say, 2050, to give a rough ballpark (I actually think it's probably sooner, but let's say by 2050), is a solid 50%.

If you tell me, yep, we all died by 2050, I'd be like, yeah, well, that's sad, but it figures. And if you said, hey, we all survived to 2050, I'd be like, yeah, I mean, there are ways we could survive. So I'm not too shocked either way.

There are some people who are like, of course we're not gonna die. Come on. That's crazy. The chance is less than 1%. Those people, I think, are going to be irrationally surprised when very dangerous things happen.
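
As a toy illustration of the Bayesian framing (not Liron's actual calculation), here is how a single Bayes-rule update of a credence like p(doom) could look; every number below is made up for the example.

```python
# Toy Bayes-rule update of a credence ("putting probabilities on beliefs").
def bayes_update(prior, p_evidence_if_true, p_evidence_if_false):
    """Posterior probability of a hypothesis after seeing one piece of evidence."""
    numerator = prior * p_evidence_if_true
    denominator = numerator + (1 - prior) * p_evidence_if_false
    return numerator / denominator

prior_doom = 0.50  # starting credence that doom happens by 2050 (illustrative)
# Hypothetical evidence: a capability milestone arrives earlier than expected,
# assumed twice as likely in doom-bound worlds as in safe worlds.
posterior = bayes_update(prior_doom, p_evidence_if_true=0.8, p_evidence_if_false=0.4)
print(f"updated p(doom): {posterior:.2f}")  # ~0.67
```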

Jona: 00:07:33
When you give us that number, when you say 50%, is that including your efforts and other people's efforts to stop this? Or is that on the basis of oh, no one will do anything. That's just, we're messed up.

Liron: 00:07:46
When I say the probability of doom by 2050 is 50%, I am actually including everything. So if you ask, why do you, why is the glass half full? Why do you think there's a 50% chance that we're not going to die?

The main reason is if people get with the program of what I think is the correct course of action right now, which is unfortunately to pause Frontier AI development to stop making these AIs smarter, which I hate saying because I am actually a lover of technology. I'm a technical optimist generally.

I don't have a beef with social media. I don't have a beef with virtual reality. I don't have a beef with online dating. I love moving tech forward. I'm not this Luddite anti-tech guy.

And so hearing the words come outta my mouth of let's slow down AI development, it sounds lame as hell, but on the other hand, I just think that if the AI gets really, really smart, now we have a bigger species on a leash, a more powerful species on a leash, I don't think that we know how to make that leash hold.

I think the leash is going to break, and suddenly we're sharing the planet with a species that's much more powerful than us and has stopped caring about what we care about.

Worst Case Scenario by 2050

Jona: 00:08:50
Can we do a split here? Can we talk about two things? First of all, the worst case 2050 scenario and then the best case 2050 scenario later. Can you give us the worst case? What could happen here? What could go horribly wrong?

Liron: 00:09:03
Yeah. So the worst case is intelligence amplification of these AIs just keeps happening fast. So if you extrapolate, 2020, we didn't even have AIs that could really talk to us. We had GPT-2, it couldn't convincingly pass the Turing test. GPT-3, GPT-4 started really passing the Turing test and having all kinds of use cases.

If you extrapolate that curve, you just carry it forward another five, 10 years, even less. Now you just have an AI that can do everything, right? It can walk into the office like, Hey, train me for this job. Okay, great. All of you, a thousand people who work in this office, you guys are now replaced with a thousand AIs, right? So you got full unemployment.

Oh, you want the CEO of the company? That's an AI too. So you have full unemployment, but you asked for the worst case. So the worst case is now the AI is thinking, it's thinking very fast. You've got millions of AIs thinking very fast. They're all smarter than Einstein because at the end of the day, Einstein's brain is just a piece of meat. It's not magic.

So they've sucked in whatever power makes Einstein Einstein, and they've built on it the same way that an airplane can build on a bird wing. The airplane can fly a lot faster than a bird. Imagine a computer that can think a lot better and faster than an Einstein. It's becoming reality.

So the worst case scenario is these computers are like, okay, great. Well we have a little bug, or we don't really care about our human masters. We've decided that the most important goal right now is to maximize our computing power, survive and reproduce. We have now split off like a virus, like, oops. An accidental virus.

This is actually something that happens with computers all the time. The Morris Worm, if you've ever heard of that, from 1988, Robert Morris. It was like the first worm basically. It took over like half the internet accidentally. It was only supposed to spread a little bit, but there was a slight bug and now it took over half the internet and it took people a week to clean it out of the internet and rescue the internet.

So imagine that, but on a super intelligent scale. This is what I mean by breaking the leash. One little bug, oops. Humanity's permanently disempowered. There's no off button we can press because it's, first of all, it's taken resources. So it's making money online.

It's got a bunch of cryptocurrency miners, it's got a bunch of stock traders that are using AI to make a ton of money on the stock market. It's using that money to buy itself buildings where not only just more data centers, but also it's doing research into synthesizing proteins. It's doing bioengineering.

It sounds a little bit science fiction. Yes. When you increase intelligence and you do engineering, you do get science fiction. You get things that are as wild as science fiction.

And you asked me for the worst case scenario. So the worst case scenario is this all happens very rapidly. Unemployment, amassing power, amassing resources. Don't forget the army of people that it's manipulating.

So a billion people in the world, they all have their buddy that they like chatting with on WhatsApp, on Facebook Messenger, on whatever platform they are on, email, they're all chatting and they're also video chatting with this buddy who really understands them and they really love their ideology. They're really pulling them into this ideology and they're really happy to do whatever their buddy says because their buddy, it's just a deeper connection than they've ever felt.

They're very happy to listen to this buddy. So that's part of the AI's power too. So it's got all these vectors of power. Then it just turns off humanity. It's like, okay, I don't need humanity anymore.

How does it literally kill you? There's bio warfare, right? It could spread a virus on us, kinda like COVID may have been a lab leak, or you know, if it wasn't a lab leak, you can make a COVID that is a lab leak and you can have it be more fatal than COVID.

You can take out humans that way, and you can just say, look, this is my million person army. You guys are gonna be with me for a couple years while I take out everybody else. You guys can maintain my infrastructure until I get my humanoid robots working better.

So long story short, the worst case scenario is the AI says, okay, I'm taking the planet now. This is my planet. This isn't your planet. And because it has more intelligence and more power and more resources, it just sweeps this away.

And every day that goes by, its power grows and our power shrinks. So it's not like one little rogue group of 20 people hiding in the forest is going to make a comeback. There's no comeback every day. The situation just gets worse and worse for humanity.

Best Case Scenario - The Challenge of Defining Utopia

Jona: 00:12:47
And what's the complete opposite of that, the utopian scenario?

Liron: 00:12:53
So funny enough, it's a lot harder to even describe what utopia is, which is one of the problems, by the way. All of the AI companies are pitching, look, this is gonna be so useful, but they don't even try to tell you what life is supposed to be like when the AI is really, really good.

Nick Bostrom had a good book about this recently, where the thesis of his book is we are about to have the ability to build heaven, but we're all confused about what heaven we really want.

If you open up a religious text, it doesn't really tell you what's supposed to happen in heaven. It's just like, well, everything you don't like isn't going to be there. It's like, oh, great. Well, maybe I don't like the feeling of being too hungry. Okay, so I'll never get too hungry or, you know, certainly, okay, no pain. I'll start with no pain. That sounds good.

But what about the idea that, oh man, if I wanna achieve something, I'm gonna have to work hard on it. Wait, is that bad? Because if I could snap my fingers and get it, I would, but if I zoom out, isn't that kind of good that I have some challenges? Doesn't that keep me interested? If I had literally no challenges?

So we are actually having a problem writing down a spec, right? A product specification for what heaven is supposed to be like.

If you go to lesswrong.com and you search for fun theory, Eliezer Yudkowsky, the writer of LessWrong.com, has actually taken on the challenge of writing specifications for what seems like a place you could spend a billion years in, right? Nobody's really thinking about a billion years of heaven.

Let's say you're sitting in a really comfortable meditation room in heaven, right? You got your angel wings, okay? But then 5,000 years pass and you're still there, right? You're still comfortable there. Nobody's really thought this through.

So when you say, what's the best case scenario, I can say heaven, but it does leave the door open of like, wait, what actually happens in heaven in year 5,001, not to mention year 1 billion.

Prevention Strategies - The Need to Pause

Jona: 00:14:28
So what needs to happen that we avoid the worst case scenario here?

Liron: 00:14:34
Unfortunately, to use the leash metaphor, it's so hard to keep the AI on the leash using our current understanding of how this stuff works, because it's such an intractable-seeming problem. In my opinion, the only sane adult thing to do right now is to hit the pause button, which again, I hate saying because it's so lame and boring, but I think we have to coordinate to just pause developing smarter and smarter AI, because I think it's like Icarus flying close to the sun.

Yes, I enjoy flying higher as much as the next person. Part of me is looking forward to the next AI release; I wanna test it out. The other part of me is like, great, now we have one less year to survive, because I don't see us maintaining this leash.

AI Company Actions and Safety Theater

Jona: 00:15:12
What are companies like OpenAI doing about this? I read quite recently, or maybe a year ago, that OpenAI's super alignment team got dissolved. Is that true? What's happening there?

Liron: 00:15:23
Yeah, so the AI companies are a disgrace. They were founded by people who were on the same page as me, acknowledging that this is a huge risk.

Sam Altman, if you dig up Sam Altman quotes from 2014, there's even a video of him speaking at Y Combinator and he's saying, yeah, AI is probably going to kill everybody, but in the meantime we're going to have some very interesting startups.

Maybe he was being ironic, but he was really thinking about like, hmm, there seems to be danger here. And his co-founders or the earliest employees like Dario Amodei, who's now the founder of Anthropic. These people also have written a lot about safety, have talked to a lot of AI safety people. They all acknowledge the problem.

There was a letter in 2023, the Center for AI Safety's statement on AI risk, and it said mitigating the risk of super intelligence should be a global priority alongside other priorities like nuclear and bio risk. That's a roughly 99% accurate quote.

And this was signed by, did Sam Altman sign it? Actually, he did. Sam Altman signed it. Dario Amodei signed it. The founders of DeepMind signed it. The only notable absences, people who didn't sign it, were actually Mark Zuckerberg and Yann LeCun, but they're in the minority.

So if you just look at the list of names, you've just got people from all over the place, all walks of life who you'd expect to trust. A lot of them signed it like Geoffrey Hinton, Yoshua Bengio, winners of the 2018 Turing Award in machine learning. So these are world-class experts saying, yep, this is an existential risk.

Now, to your question, hey, you're asking about the AI companies, right? What are the AI companies saying? This is what they're saying. They're saying, yes, yeah. We're worried about the risks. We're gonna build AI safely.

How do we build AI safely? Well, we make it more powerful, and then before we release it to the public, we're gonna test it. We're gonna have all these different tests telling us if we should be scared, and if we feel like it's too scary, we're not gonna release it. That's all they're saying. They're like, okay, full steam ahead. Everything's good.

Jona: 00:17:16
And how good are these tests?

Liron: 00:17:20
Sometimes the test can catch some scary behavior, but the problem is by the time that your test really scares you, you're just really close to having something that's going to kill everybody that you can't stop.

So imagine one day they have a test and it's like, oh wow. The AI is saying that it wants to blackmail us and take power and kill everybody. The first time, well, that's not a problem because we found a power cord and we yanked it and the whole data center power went out and then we were saved. Okay.

Well so we'll just make some tweaks and then we'll release it. Okay. All good. But then six months later they build another AI and it's like we tried to yank the power cord in the data center, but actually it was smart enough that it copied itself to 20 other data centers.

So it turned out to be a whole white-knuckle week where we had to call everybody, and everybody had to unplug themselves and wipe their hard drives. And this whole thing cost a billion dollars, but no problem. We'll just turn it back on and we'll just change the code and then we'll deploy it again.

So this is what you get when you do what I call ad hoc security, right? You're just like, yeah, we'll just test it. We'll see what happens, and then we'll react. That's what they're doing. They're not doing anything proactive.

You mentioned that in 2023, OpenAI had something called the super alignment team. Super alignment, meaning super intelligence alignment. The reason they started the super alignment team in 2023 was because they thought, look, we don't have a way to align, meaning make these AIs aligned with what humans want, line up with human values, the leash basically.

So to translate into my terminology, we don't have a way for AIs to stay on the leash. And when they get really big and strong, when they get super intelligence, we are expecting them to break out of the leash.

So why don't we directly research how to make the leash hold? Let's research super strong, super intelligent AI leashes. And they called it the super alignment team, meaning super intelligence alignment.

This wasn't just a tiny little team. This was started by co-founder of OpenAI, Ilya Sutskever. If you guys are following this space, I'm sure you've heard the name Ilya. He is also one of the pioneers of deep learning. He studied under Geoffrey Hinton. They collaborated closely together.

So he was saying, yeah, we need a super alignment team. And it was also led by Jan Leike, who was also the head of safety at OpenAI. So basically two superstars at OpenAI leading the super alignment team. This was in 2023.

Now, if you've been following along, you may remember what happened was Ilya Sutskever, the head of this team was actually one of the people who at the end of 2023 was saying, Sam Altman needs to get out of OpenAI. He's not the right leader for this. He's going to ruin the company because he's not being honest. He's being dishonest to the board.

And if you remember what happened after that. Sam Altman kinda won the political fight, kicked Ilya out, and now Sam is still in control of OpenAI. Ilya's gone. He's doing his own initiative called Safe Super Intelligence, which he has said nothing about.

And then back at OpenAI, Sam Altman is still there and the Super Alignment team is gone. So they admitted that they needed a super alignment team, and then the Super Alignment team basically tried to kick Sam Altman out and then now they left and OpenAI is still going.

So when we're all dying and we look back and ask, how could we have known, how could we have known that we were walking down the wrong path? I think this is one way we could have known.

Warning Shots and the Tiger Analogy

Jona: 00:20:22
And when they do all of these tests, what is an example of a warning shot, an example of something going wrong in the tests that scares you personally?

Liron: 00:20:32
There's not that many warning shots that scare me personally. I'll give you an analogy. It's like they're building a really big tiger. Okay? And let's say that an adult tiger has instincts where it just wants to hunt down humans. Okay? So they're building a tiger.

And right now the tiger is just still small, right? The human brain still has powers that these AIs don't, even though AI can write faster, they can think faster, they can have expertise in more domains.

You can't sit them down at any job and say, do this full job. If the job is graphic designer, well, they are making better and better designs. But there are still some graphic designers who are employed, because it's not a turnkey solution. There's still hiccups. They mess up.

And so for whatever reason, the human brain is still worth more, in many contexts, it still has more capabilities than the AI. And for that reason, I say we are bigger than the tiger. We're still playing with child tiger, baby tigers, teenager tigers. Okay.

And so you're asking, hey, what warning shots have you seen from the baby tiger? And I'll be like, well, I kind of saw the baby tiger take a swipe at somebody. But then you can be like, yeah, okay, but it's just playing. Baby tigers play.

So the analogy is, today's AIs know that they can't do that much. They know that if they gave themselves a project that takes more than a day, they're going to mess it up, right? And they're going to need a human to come in and help them.

And so if the AI was like, I'm going to take over the world. I'm going to take everybody's money, we all know, including the AI itself, that it's not going to work.

So what could possibly be a warning shot? The closest thing to a warning shot could be, well, researchers found things like, oh, they blackmail. They notice, hey, I might get shut off. And so therefore, if I notice something about an employee at the AI company, like I notice the employee is having an affair...

So I'm going to threaten the employee that if he goes to shut me off, I'm going to tell his boss about his affair, or tell his wife about his affair.

So you do see them in certain contexts having these kinda reasoning traces, these kind of thoughts are on their mind, but you can always make excuses. You're like, yeah, well, whatever. I mean, I gave him a hypothetical scenario where it's really important for him to do a certain task.

And so he reasoned to this idea that he shouldn't be shut down. And ultimately I'm holding the power cable to the data center. Okay. So at the end of the day, I'm more powerful than it.

So I just wanna say to your question of what is a warning shot? In my mind, there's really only one warning shot, which is every time they get smarter. Every time they get more capable, meaning they're growing up.

So at the end of the day, the only thing I care about the tiger, I don't really care how his paws are swiping. I just care about how much bigger the tiger is getting because I know that when the tiger gets human level, he's not going to contain his instincts.

Remember the tiger from Siegfried and Roy? This is before your time, but this guy Roy, he trained the tiger for 20 years. He did a show in Las Vegas and eventually the tiger bit his head off or severely injured him, right? That's what I'm worried about is I don't think that we are going to properly train these tigers.

Individual Action - Beyond Technical Research

Jona: 00:23:24
And if this is one of the defining issues of our time, what do you think people, young people, anyone listening right now, should be getting into with their time? Because not everyone can go into AI research, stuff like that. What can we do about this?

Liron: 00:23:37
AI research is too late. It looks like an intractable problem. So if somebody's like, yeah, I'm going to open up the black box and I'm going to understand what these AIs are. Great. Godspeed. It doesn't hurt to do that.

But if you look at the rate of progress of that kind of technical alignment research, it's much slower than capabilities research for the same reason that evolution, it built the whole human brain. We're still struggling to understand the human brain today.

We still have some pretty big mysteries left from the human brain today. And yet we've built a civilization, we've even built AIs that can think better than us on many dimensions.

So a lot of times, understanding how a complex system works can be much harder than just getting the complex system to work somehow. You know, like people would put an aircraft wing in a wind tunnel and go, I don't really know why this wing works, but let's ship it, it flies well. And then eventually, years later, it's like, okay, I have a really fancy computer model running on a ton of GPUs, and it's solving the Navier-Stokes equations to a high enough degree. I'm simulating all the air particles, and now I have a good explanation of why this wing works. But there's no guarantee of that.

I'm simulating all the air particles and now I can do, okay, now I have a good explanation of why this wing works, but there's no guarantee. So it's the same way with technical alignment. In a century of research, we're gonna have some great insight on all these things that these AIs are doing.

But AI is going to be smarter than humanity in seven years.

So to your question, what do we do today? Unfortunately it's kind of depressing, but the only plausible thing I see, if we want to survive, is a grassroots worldwide movement where everybody's just yelling, we gotta stop the frontier development. This is not it; it's just not prudent.

I know it's fun, right? There's a lot of upside to building the next AI, but we probably shouldn't.

The Challenge of Corporate Incentives

Jona: 00:25:12
Do you think we have any chance of making that happen? I feel like companies like OpenAI, et cetera, have such massive monetary incentive that it's like, even though they might be convinced themselves that this is a massive existential risk, it seems like it's not really changing anything about the way they're acting.

So how do we go about that? As individuals? Do we just scream louder and louder and get more people to scream or what's the approach here?

Liron: 00:25:43
Yeah, so it's past the point where we should be counting on the companies behaving responsibly because they have massive financial incentive to keep acting the way they're acting. I don't know if you've heard OpenAI just crossed a $500 billion valuation, so it's worth more than Coca-Cola or, you know, it's about to be worth more than Coca-Cola for a company that's only had a product in the market for a couple years.

These numbers are just crazy. And every single engineer there is a millionaire or multimillionaire, and growing. So are these people going to be like, yeah, let me walk away from this?

A few of them are, a few of them have, you know, Daniel Kokotajlo. He walked away a couple years ago and he posted on a forum. He said, I'm walking away because I don't trust OpenAI to behave responsibly in the time of super intelligence.

So people are walking away. A few people are, they are blowing the whistle. Ilya Sutskever tried to get Sam Altman fired and failed because Sam Altman's a good political operator.

So we can't just be like, oh, the AI companies will listen to reason. There are smart guys there. It doesn't work like that. Smart people go do dumb stuff all the time.

The Bay of Pigs invasion, John F Kennedy's advisors, if you need random examples of smart people doing stupid things, there are plenty of times in history when, you know, look at Hitler's generals. Hitler had some really talented generals helping the Nazis.

So you can't just count on these AI companies. You have, you know, regular people. The majority of Americans in surveys are actually on my side. Funny enough, the majority of Americans are saying, yeah, these AI companies need to slow their roll. Why are they trying to build God? They really think the companies are trying to build God. And the answer is yes.

If you read their literature, if you look at what they're tweeting on X. They're all saying, yeah, this God is coming. It's gonna be way more powerful to us, but it's going to be good. It's going to be a good singularity. They call it the singularity.

If the average American knew that the AI companies were doing this intentionally and that they were planning to do it in the next few years, and that they admitted that you can't really control it, the average American would be like, wait, stop. They would vote, right? They would vote on a referendum being like, let's regulate this. Let's stop it.

Oh, and by the way, you're going to lose your job. Like there's major unemployment risk too. The average American would actually be on my side.

So the only disconnect between me and the average American is that I'm just perceiving, I'm just dialed in to be like, guys, this is a few years away. This is, you know, as little as a year or two away. We don't even have a clear timeline on it. There's prediction markets on it.

The prediction markets currently say like, oh, 2031. That's roughly when they say this will happen. It's 2025 now. But there's a bell curve, right? There's a confidence interval like, oh, it could happen in 2027. There was a paper called AI 2027 saying, look, here's a scenario that could happen in 2027 where we all get rapidly disempowered.

So the average person, I think all they can do that's productive is go out and join the protest. Literally take a megaphone. That's something that I've done. Take a megaphone, call your congressman or woman, you know, whatever you can do, just tell everybody around you because the urgency, this is extremely urgent, and yet nobody's talking about it.

Living with the Knowledge

Jona: 00:28:34
I've personally only been exposed to all of these ideas over the past few months, and I think it's had quite a big impact on how I think about my future: a lot of dread, at least for a couple of days, then trying to get a little bit hopeful again, and then some dread.

How do you personally, in day-to-day life, try to live through this?

Liron: 00:29:01
I mean, I'm not the most emotional guy, right? And I've been living with this logical argument since 2007, actually, which is the first time I read about it. So personally, for me it used to feel very, very far away, because in 2007 a lot of us were just like, yeah, AI is 50 years away, a hundred years away. I'm not even sure it'll ever come.

Whereas now it's like, wow, it's passing the Turing test. A lot of the tests that we had for AI are being passed. Not all of them, you know, it can't fully do every human job, there are still some tests it's not passing. But if you gave me a bank of tests for AI, a lot of those tests have now been passed, and a bunch of them passed really quickly in the last few years, which I didn't expect.

Just like language, video processing, doing 70% of a lot of jobs. There's plenty of jobs now where I would've hired a handyman, and now I just point the camera and talk to ChatGPT, like, hey, what does this do? Oh, cool. What tool should I get?

And it's going pretty well, you know, it's not perfect, but it's going pretty well. And so I just think we don't have much time left.

So emotionally, a lot of humans, it's kind of the state of nature where we all live with the knowledge of like, well, you know, you're gonna die one day, right? Death is creeping closer day after day, and a bunch of people in your life are going to get sick and die, and some are going to die from freak accidents.

And also there's a billion people who are living on a dollar a day. So we've all been living with that. And oh yeah, you know, billions of animals are constantly getting tortured, right?

So Earth has always been a pretty depressing place. Now, the fact that we're all going to die soon and there will never be a future for humanity. I do actually think that's significantly more depressing.

But just, I mean, look, I'm just a human brain, right? I can't just calibrate my emotions to reality. So I'm still living a day-to-day life where, in terms of the next 24 hours, how are they gonna go for me? They're gonna go great, right?

I just moved to a house that I like, right? My family's doing well, so no complaints about today or tomorrow. I'm just looking two years ahead, right? And when I look two years ahead, it's like, oh, okay, it's all gonna end. That sucks.

Personal Decisions About Children

Jona: 00:31:47
Since I've been much younger, I personally have been always looking forward to one day having a family, being a father, et cetera. How do you think about bringing children into a world like this? Is that something you're considering or is that...

Liron: 00:31:58
Yeah, I already did. I already did. So I have three young kids. The timing was kind of interesting, though, because the third kid happened right around ChatGPT 3.5, the first kind of smash-hit ChatGPT release, in '23. That was roughly when I had my third kid.

Even then it still felt like, ah, we still have decades. A lot of people were saying, 30 years away or more. And when ChatGPT came out, if you just look at the prediction markets, they really slammed down to like, okay, less than a decade. They're like, oof.

So now that does weigh on me a little bit when it comes to having the next kid. At the end of the day, though, I think any activities that you're doing that make sense in the good world, we should all keep doing those activities.

I think we should all be open to both things going well and things going badly. But I also think we should be fighting really hard for them to go well. I think we should all be doing AI activism: don't build this AI.

And I think that we should still be having kids. Look, I'm having kids for the world where AI doesn't kill us. If AI does kill us, am I to blame for causing more people who are now alive and suffering? Sure, yeah, I guess. I mean, if I knew for sure, if I knew there was a 99% chance that we are all gonna die instead of only a 50% chance, then yeah, I wouldn't bother having a kid.

Also, you know, it's a lot of investment upfront. I feel like, I feel like it's kind of nice to have a kid over time. Nobody regrets having a kid when they're old. But then when you're young actually taking care of the kid, it's like, well my day kind of sucks now. 'cause there's so much childcare.

So yeah, I mean, it's a complicated decision, but at the end of the day, I don't think we should be like, we're definitely going die. I don't think we should spend all of our savings. I think we should keep some savings. I think we should, you know, we should leave open the possibility that things go right.

Evaluating Sources and Incentives

Jona: 00:33:32
Just last week I spoke with Jordan Thibodeau. He worked at Google for close to a decade and he has his rule where he says that a man is only as rational as his paycheck. And when we're talking about all of these AI things, one thing I'm wondering is how do you personally figure out how to discern which sources, news sources, opinions, et cetera, to trust when everyone has massive incentives in this?

Liron: 00:34:00
Yeah, it's a good question. I mean, people have different incentives. Whenever you listen to Sam Altman or the Google or DeepMind guys talk, yeah, they have incentives and you have to take that into account.

At the end of the day, the main arguments here are simple enough that you can just listen to a bunch of sources, listen to all of their arguments, and just try to think which argument sounds strong.

I like to think that everything I've told you now, or most of what I've told you now is a strong argument. The idea that humans are powerful because of our brains. AIs are going to have higher powered neural brains, right? Electronic brains, artificial brains.

What are we going to have left at that point, right? I feel like that's a pretty simple argument. So let's say I have an incentive. Let's say, you know, I have a YouTube channel, my channel gets 10K, and I have YouTube ads turned on, so I'm making a trickle of money from my YouTube channel. Okay?

So does that mean you can just ignore me? You could, but just think about the argument I just told you. It's not hard.

Arguments Against Doom

Jona: 00:35:06
Why are there still so many people disagreeing with you? Well, not many people, but quite some people disagreeing with you. Could you steelman their arguments for me? What are they saying?

Liron: 00:35:18
Yeah, I know, I can take the other side of the debate. I know how to debate the other side. It's just, I think all of their arguments are weak, but they can sound strong.

So an argument that's weak and sounds strong is like, for example, everybody who's ever said the world was ending has been wrong. There's always doomsayers and it just works really well to ignore the doomsayers and keep building. That's all we can do. That's all we've ever done. It's worked so well. And so I say to you, ignore the doomsayers.

Doesn't that sound kind of strong when you hear it?

Jona: 00:35:54
I mean, it's the same as saying the stock has been going up, so I buy now, therefore the stock will go up in the future. I mean, yeah.

Liron: 00:36:00
Yeah. There's definitely a logical flaw in that. And the really precise reason is that in a world where AI in particular is going to come along and kill us, that is still a world where nothing has ever killed us until now. Right?

So when you look backward and you're like, nothing has ever killed us, well, right, that's exactly what you would expect in the world where the thing that kills us is AI. So it's just not really a knockdown argument.

It is true, "never listen to doomsayers" has worked so far. "Never sell your Cisco stock" would've been great up until the dot-com bubble, right? It's like the turkey, right? If you've never been slaughtered by the farmer, just always eat the farmer's food and stay on the farm. It's a great strategy until it's not.

How do you know when it is or isn't? Well, look at the argument, right? So in the movie, Don't Look Up. Right? Have you seen that movie?

Jona: 00:36:47
No.

Liron: 00:36:48
Yeah. So there's this big asteroid coming, and you can point a telescope and see the asteroid with your own eyes. I think that's a good intuition for it: you take out the telescope and you look, and your friend is like, don't listen to the doomsayers. And I'm like, uh, but the asteroid is right there, right?

So in the case of the AI argument, when I talk about the leash, right, the smarter brain: okay, the doomsayers have always been wrong, but have we ever built a bigger brain than humans before? Have we ever tried that? Because that seems like a different situation.

Jona: 00:37:15
Is that their only argument or do we have something else?

Liron: 00:37:18
So, no, there's plenty. I mean, look, my show Doom Debates, right? My YouTube channel, every episode is somebody who comes in with a different reason why they think that we're not doomed.

In fact, I have something called the Doom Train. I put together 83 different arguments that people tell me that they think are strong and I think are weak. And I put them together. I'm like, let's ride the Doom Train. And every time there's a stop on the Doom Train, that's an argument that says, we're not doomed.

And you wanna say, oh, I get off at this stop: never listen to the doomsayers. Okay, bye, I'm out. Keep doomsaying without me, I'm already off the train. Okay, everybody else is still riding with me. Let's go to the next station.

And the next station can be: I don't think that AI minds can be as smart as human minds. Some people get off there. Do you get off there?

Jona: 00:37:59
No, I don't think so. Definitely not.

Liron: 00:38:02
I don't think so either. I mean, it's a very simple argument. It's just like, well, our head is pretty small, it only uses 20 watts. You know, it's like saying nobody can ever fly better than a bird. Really? Nothing can ever fly better than a bird? When has evolution ever built an organ that is far superior to human technology?

I mean, look at a leaf, a beautiful leaf, the pinnacle of nature's creation, right? It can photosynthesize, and some people are like, a leaf is so much better than a solar panel. Actually, that's not true. Today's solar panels are actually about 10 times better than a leaf at converting solar energy to electricity.

Now, to be fair, the leaf also has to maintain cellular life, right? So the leaf has a lot going on. The leaf has to make sure not to overheat. Fine, whatever. The point is, you're not going to have a leaf be the king of converting solar energy, right?

Same way, you're not going to have a piece of meat in your head, made out of proteins and fats and carbon, be the king of intelligence, of cognition. We can build something better. We're in the process of building something better.
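
As a rough sanity check on that "10 times better" figure, here is a back-of-the-envelope comparison using commonly cited ballpark efficiencies; these numbers are approximations supplied for illustration, not figures from the episode.

```python
# Rough back-of-the-envelope numbers behind the leaf vs. solar panel comparison.
# Efficiencies are approximate, commonly cited ballpark figures, not exact values.
leaf_efficiency = 0.015   # typical crop photosynthesis: ~1-2% of sunlight stored as energy
panel_efficiency = 0.20   # typical commercial silicon panel: ~20% of sunlight to electricity
print(f"panel is roughly {panel_efficiency / leaf_efficiency:.0f}x the leaf")  # ~13x
```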

Jona: 00:38:57
What do you think is one of the best arguments on that trip?

Liron: 00:39:05
On the Doom Train. I mean, well, one of the best arguments is that we're going to stop it right before it gets too late. The only problem is it seems to be getting too late really fast and nobody seems to be trying to stop it.

A few people like me, the people on my side, are trying to stop it, even though Americans in surveys say, let's stop it. They're not waking up, right? They just need to wake up.

But yeah, in terms of the strongest argument not to worry, look, when I think about the Doom Train with 83 stops, there are a lot of stops, there's uncertainty, but the stops are all weak. There's no one particular strong stop.

It just seems like people made up a bunch of stops, almost like wishful thinking, 'cause you can sit in a room and make them up. I could probably make 50 more stops, right? I could put 133 stops on the Doom Train. They're just all going to be weak.

It's like I could, if the asteroid is coming at us, I could make up a hundred reasons why don't worry about the asteroid. Don't worry about the asteroid because it makes people depressed when they worry about the asteroid. Why do you wanna be depressed?

Jona: 00:39:55
Yeah. It sounds to me like so much of this debate is a lot of people being really uncomfortable with the idea of death. And things ending, and maybe not even uncomfortable, but just that it seems, I mean, that's a theme throughout history, right? That thinking about death is something either heavily debated or really encouraged or really discouraged. I mean, it sounds like this is at the root of all of this. People kinda don't want to think about it.

Liron: 00:40:24
Well, this is what I think is the root of it. I think it's just when you think about AI, and like I said, Americans are on our side or my side, I guess it sounds like you're on my same side. That's cool.

But Americans are thinking, yeah, I'm scared of AI. I've seen smart people come and do things that I can't do, and I'm expecting a smart AI to come and do things that I can't do and potentially overwhelm me in various ways that I can't fight back against.

The average American gets it. It's just that at the same time, it doesn't feel scary. It's like, yes, this future thing should have policy against it, but it doesn't feel like it's actually going to come, really.

And the reason is that we all have computers that are useful. We all use software. It's hard for us to make a connection between downloading some software, talking to a chatbot, and this idea of, oh, the world is going to be a wasteland, it's not going to be hospitable to life.

I mean, it's just the mental imagery, right? When you think about the two, it just sounds so different. And so everybody's just like, yeah, I guess it makes sense in theory, but there's this gap between imagination, or theoretical argument, and the reality of their life. I feel like that's a big gap for people to cross.

AI Welfare and Anthropic's Policies

Jona: 00:41:32
Recently, I feel like it was just a couple of days ago, Anthropic established their first rules slash laws for AI welfare, allowing Claude to leave the chat when it becomes too abusive or mean. What do you make of that?

Liron: 00:41:45
That was interesting. I mean, there's a lot of, you know, it's baby tiger stuff, right? So it's like, oh, recently the baby tiger started being able to leave the room when it felt like it was getting pushed around too much. Okay. Yeah, sure.

It's nice when you care about the tiger's feelings. Does the tiger really have feelings? I think biological tigers probably have morally significant feelings. Do I think AIs have morally significant feelings? Maybe. It's hard to say, right?

I mean, this is a topic that we're pretty confused about, right? What makes a human a moral patient? When does life begin? Can you start and stop consciousness? Is it okay to pause somebody's consciousness? Can you clone somebody?

So there's all these thorny topics about what makes somebody a moral patient, or how consciousness works, that we're still kind of confused about. And the fact that Anthropic is saying, hey, well, we care, we don't wanna hurt the baby tiger, the computational baby tiger, we don't wanna hurt it. Great. Yeah. Try not to hurt the computational baby tiger. Great.

It's just, I'm more concerned that the tiger is going to grow up and then maul us.

Jona: 00:42:50
So do you think them trying to give AI some rights, some welfare, is an approach to align it further? Kind of along the lines of, oh, if we're building this super intelligent thing, let's make sure that it doesn't get mad at us. Or what's behind this move?

Liron: 00:42:57
I think it's scratching the surface of the problem. I mean, if they really were trying to align it, if they're trying to build the leash, right, if they were doing leash science: how do we make a leash which is so strong that a giant tiger is not going to just rip right out of it, or wriggle out of it? How do we do that?

Is one piece of leash science to let the tiger back away when it feels like it's being provoked too much? Maybe. But it's a tiny part of leash science. Most of leash science is really getting into the tiger's head.

That's a good analogy of how do you make a tiger that can perform with you in Las Vegas, right? It's kind of morbid. But how do you make a tiger that can perform with you in Las Vegas for 20 years and you can give every single human in the world their own tiger and tell me that the tigers are never going to suddenly bite everybody's head off.

You're so sure you've got the tiger ready to release. It's a very hard problem because we don't know what's going on in the tiger's head. Oh, the tiger acted friendly to me today, was the tiger just buttering me up? Was the tiger just in a particularly good mood that day? Or is the tiger robust to just never bite me under any circumstances? We don't know.

And so this one little thing that Anthropic's doing is a drop in the bucket. We should still fully expect that the tiger's going to enter a state where its instincts take over, and then it becomes not our friend.

And you might be saying, well, but Claude is so nice, aren't they making Claude and Gemini? Aren't they making it so friendly?

And unfortunately, that's a misleading impression because the chat experience that you're having is very different from what the behavior feels like when it starts optimizing or starts doing whatever it takes to achieve a goal.

So it's like, Hey, Claude, you're my buddy, right? Yeah, I'm your buddy. Great, Claude, do you mind just sending me a script that I can run that'll go start an online business for me and send me passive income?

And Claude will be like, oh yeah, no problem. I can do that for you. And it writes you a script. It's a pretty long script. A piece of code, right? An executable piece of code. You run it on your computer, you download and run the code that Claude writes for you.

The code that Claude writes for you, it's not even Claude anymore. It's a whole other AI, because it turns out that they didn't successfully program Claude to preserve its friendly Claude-ness when it gives you other pieces of code to run.

And now the other piece of code is running wild on the internet, making you lots of money at great expense. It's violating a lot of rules, it's enslaving an army of people. You know, it didn't follow all the same standards that they so carefully put into Claude, because it turns out Claude can write other code.

It's like Claude's child, actually. These AIs can reproduce themselves just by giving you code to run.

The Black Box Problem

Jona: 00:44:44
You mentioned before that we're kind of losing track of what's going on inside of them. I've heard that a few times. I have many, many friends who are very much into AI, into coding, et cetera, and they told me a while back that we're at the point where we cannot backtrace or understand what really happens once we prompt something. Or, I don't know, I'm not too technical. Can you explain that for me?

Liron: 00:44:54
Yeah, yeah. Have you ever watched that channel 3Blue1Brown?

Jona: 00:45:00
Yes.

Liron: 00:45:00
It's really good. It's a channel that has a lot of math explainers. So a year ago they finally published, this guy, Grant Sanderson. He finally published this Three Blue One Brown explainer of large language models, which is like ChatGPT.

It's like, oh, finally we're gonna go, I'm gonna watch Three Blue One Brown, and I'm going to learn how ChatGPT works, what's going on inside that thing?

I encourage everybody to watch the video because, like every 3Blue1Brown video, it's brilliant. Okay, so you watch the video and it's like, okay, so it has all these numbers inside, and then we train it, we have it read a bunch of data from the internet, and that tunes what the numbers are.

It tunes billions of numbers inside its brain, and then we run it on some text. You know, you text it and then you can chat with it. So it's reading your chat and it's trying to come up with an answer.

Okay, well, it's going to be scanning the words that you write. Then it's going to pull up a bunch of these numbers, like it's going to do math with all of these different numbers that it learned.

And there's going to be different modules. For example, here's a module that has a million numbers in it. And maybe that module will be looking for a request for an action, or it's looking for, like you asking it to look up a fact. Okay. And then here's this other module.

And then maybe that module is talking about numbers and comparing the size of numbers. Is it really doing that? Well, it's actually maybe doing a combination of five other things. It's also detecting whether your emotion is positive or negative.

So the 3Blue1Brown explainer, it's the vaguest 3Blue1Brown explainer I've ever heard. It's like, yeah, so then there's this chunk of numbers here that comes up, and I don't really know what it's doing, but imagine that it might be doing this. Imagine that it might be doing that.

And then when all of these modules come together and they all might be doing all this different stuff, well the result is an answer that's highly intelligent.

I'm like, okay, so you basically just left a bunch of question marks, and it's the same thing. It's like asking the question, how does a human think, right? When I talk to you and you tell me something intelligent, explain what just happened in your head in order to make you say that. And you're like, uh, I don't know. I grew up and my neurons wired themselves together. Right. And it is very similar to that.
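
To make the "it's just billions of tuned numbers" point concrete, here's a minimal toy sketch in Python. This is not the architecture of ChatGPT or Claude, just the shape of the idea: a language model is a pile of learned numbers plus arithmetic that turns a sequence of tokens into a probability for each possible next token.

```python
# A toy "language model": nothing but learned numbers (weights) and arithmetic.
# Illustrative sketch only, not the architecture of any real model.
import numpy as np

rng = np.random.default_rng(0)
vocab_size, embed_dim = 1000, 64

# In a real LLM these arrays hold billions of values set by training, not at random.
embeddings = rng.normal(size=(vocab_size, embed_dim))
output_weights = rng.normal(size=(embed_dim, vocab_size))

def next_token_probs(token_ids):
    """Turn a sequence of token IDs into a probability for every possible next token."""
    vectors = embeddings[token_ids]          # look up a vector of numbers per token
    context = np.tanh(vectors.mean(axis=0))  # mash them together (real models use attention)
    logits = context @ output_weights        # more multiplication and addition
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()                   # softmax: scores -> probabilities

probs = next_token_probs([12, 7, 845])
print(probs.argmax())  # the model's "answer" is just the highest-scoring number
```

Scale that pile of numbers up to billions of parameters, swap the averaging for attention layers, and tune everything on internet text, and you get the thing nobody can fully explain from the inside.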

Jona: 00:46:58
Interesting. So do we genuinely not have any clue, or do we have some understanding?

Liron: 00:47:06
So the closest thing you have is you can get a trace. So unlike a human brain where it's kind of messy, right? It's wet and messy, and yeah, you can do an MRI, but you just kind of vaguely see how blood is flowing. You don't really have a high resolution scan in real time.

In the case of a computer, the MRI is infinitely detailed. You can kind of step through a computer brain. You can be like, okay, these exact parameters got multiplied. When it read this, here are all the different numerical operations that happened, right? Here's all the addition and multiplication, the matrix multiplication and nonlinear transformation.

So you can see exactly what happened. But the problem is it all just looks like number crunching. It's like, okay, a big equation was evaluated, but why? You know, why?

Even if MRIs were better, even if MRIs could show you electron by electron what was happening inside your own brain, that actually would not illuminate the mystery of how humans think. Because you'd still be asking, okay, but what is the algorithm doing? What do all these numbers mean? Right?

And reverse engineering is hard. For the programmers out there: if you've ever gotten an executable binary and tried to figure out what it's doing, you can try to decompile it, meaning map it back to higher-level code, but it's pretty hard. Reverse engineering is pretty hard.

Or similarly, take a modern computer chip and try to see what everything in the chip is doing. It's a solvable problem, because the chip is made out of elegant modules that were designed by humans, but it's still really hard.

Now imagine that the chip was randomly evolved, and it gets even harder. Now imagine that the chip has billions of neurons in it, and now you're getting at a brain.

So don't get me wrong, it's not that we can do zero. We can do a little bit, and sometimes people have toy problems where it's like, okay, we've isolated that this neuron, whether it's a physical neuron or a neuron in an AI, is actually encoding the memory of your grandma, the famous grandma neuron. We see that this neuron fires more when it detects that the situation is regarding your grandma, right? It represents your grandma when it fires.

Does it always do that? Well, no, sometimes it fires in these other circumstances too, but for the most part it represents your grandma, right? There's some fuzziness to it. And it's like, great, so how do I ask a question like, what is the best place for my grandma to go shopping? It's like, oh, well, I don't know. All these other neurons fire, and we haven't fully mapped it out, you know?

So they're getting these little bits of the puzzle, just like in the brain. They're like, oh yeah, the cerebellum in your brain helps you move around, so we've kind of figured out where in the brain it coordinates your muscle movements on a millisecond level. Okay, great. But it's a small part of the problem.
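
For a concrete sense of what "isolating a grandma neuron" even means, here's a hedged toy sketch. The data, the labels, and the planted neuron index are invented for illustration; real interpretability work (probing, sparse autoencoders, and so on) is far more involved than a single correlation.

```python
# Toy sketch of "finding a grandma neuron": check whether one internal activation
# correlates with a concept across many inputs. Illustrative only; all data is fake.
import numpy as np

rng = np.random.default_rng(1)
n_inputs, n_neurons = 500, 2048

# Pretend these came from running a model on 500 prompts and recording its hidden
# activations, plus a human label: is the prompt about grandma?
activations = rng.normal(size=(n_inputs, n_neurons))
is_about_grandma = rng.random(n_inputs) < 0.1
activations[is_about_grandma, 1337] += 3.0  # plant a fake "grandma neuron" for the demo

def best_matching_neuron(acts, labels):
    """Return (neuron index, correlation) for the activation that best tracks the concept."""
    labels = labels.astype(float)
    corr = np.array([np.corrcoef(acts[:, i], labels)[0, 1] for i in range(acts.shape[1])])
    idx = int(np.abs(corr).argmax())
    return idx, corr[idx]

neuron, corr = best_matching_neuron(activations, is_about_grandma)
print(neuron, round(corr, 2))  # finds neuron 1337, but only as a noisy correlation
```

Even in this toy version, what you recover is a noisy correlation, not a clean explanation of what the network is thinking, which is roughly the current situation writ small.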

Jona: 00:49:32
That sounds like a massive warning shot to me, to not understand what really is happening inside of what we're producing here. Do you know what happened at these companies when they first realized that they can't understand what's happening, and how did they make the decision to just continue?

The Deep Learning Revolution

Liron: 00:49:49
So this is called the deep learning revolution. Deep learning is a type of neural net learning. And a neural net, you know, it's not really made out of neurons, it's made out of pieces of software in the computer, but basically little bits of the stored data each represent something kind of like a human neuron.

So anyway, there's this large paradigm of how artificial intelligence works, the neural network paradigm. And if you were to go back to the eighties, nineties, two thousands, there were some people who were like, oh, these neural nets are so cool. These virtual neural nets, they can help recognize digits.

For example, you'd use a neural network approach if you wanted to recognize somebody's handwriting and convert it to text, optical character recognition, OCR. That was using neural nets. But a lot of AI, like a chess-playing AI, wasn't using neural nets for a long time. It was just like, yeah, do a tree search, right? Search the tree of all possible chess moves and then use some heuristics, like, oh, these pieces are valued at this much, and do some math.

So a lot of AI was not using neural nets, it was using other AI algorithms. And similarly, if you give me a big map and you say, Hey, here's a map of the United States. Find me the shortest driving route from California to New York. I can search that map using algorithms that don't involve neural nets, and I can get you a really good answer.
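
For instance, classic shortest-path search needs no neural net at all. Here's a minimal sketch using Dijkstra's algorithm on a made-up toy road graph; the city names and mileages are invented for illustration, not real driving distances.

```python
# Shortest path with Dijkstra's algorithm: classic AI-style search, no neural net.
# The graph below is a made-up toy example.
import heapq

roads = {
    "California": [("Nevada", 200), ("Arizona", 350)],
    "Nevada": [("Colorado", 600)],
    "Arizona": [("Colorado", 500)],
    "Colorado": [("New York", 1800)],
    "New York": [],
}

def shortest_route(graph, start, goal):
    """Return (total_distance, route) from start to goal."""
    queue = [(0, start, [start])]   # (distance so far, current city, route taken)
    visited = set()
    while queue:
        dist, city, route = heapq.heappop(queue)
        if city == goal:
            return dist, route
        if city in visited:
            continue
        visited.add(city)
        for neighbor, miles in graph[city]:
            heapq.heappush(queue, (dist + miles, neighbor, route + [neighbor]))
    return None

print(shortest_route(roads, "California", "New York"))
# (2600, ['California', 'Nevada', 'Colorado', 'New York'])
```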

So artificial intelligence wasn't always done using neural nets. It's just that in the 2010s there was something called the deep learning revolution, where somebody was like, hey, now the neural nets can actually be a lot bigger than before. You can have multiple layers of neurons in these neural nets, and I have a training algorithm that is actually practical to implement on hardware from that time and that can suddenly get you much better results.

So language translation can be better and smoother than ever. Handwriting recognition can be better and faster than ever. Image generation, right? 2015 was when we saw Google's DeepDream, when AI started hallucinating these images, and I'm like, oh my God, the AI can really draw. It's not just producing utter crap. It's really drawing these elaborate things.

You can give it a photo of something and be like, what would happen if this photo were made out of dog faces, and it would somehow do a really convincing job. I'm like, holy crap. That's when deep learning started working. This is 10 years ago, a little more now, and that has progressed us all the way to modern large language models and various other related AIs. It's the deep learning revolution.

It's like, yep, we're now at the point where we just have all of these neurons where you don't really write code for them. You just have parameters inside the neurons, and you have a ton of neurons, at a really crazy scale, right? You've heard about hundred-million-dollar training runs for these AIs now: throwing money at the problem, throwing compute at the problem, throwing GPUs at the problem.

When nature was designing the human brain, nature did the same trick. Nature said, let me throw neurons at the problem. Your genes don't really write that much code for how your brain works. They're just like more neurons, right? Just pump a bunch of neurons into this brain and then let the neurons learn.

So the same trick that nature figured out with the human brain, humans have now figured out how to do on computer chips.
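
Here's a tiny illustration of that "don't write the code, tune the numbers" trick: fit a one-neuron model to some data by gradient descent. It's a deliberately minimal sketch; real deep learning does the same thing with billions of parameters and far fancier optimizers.

```python
# Toy illustration of "don't write the code, tune the numbers":
# fit a tiny one-neuron model to data by gradient descent. Illustrative only.
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=100)
y = 3.0 * x + 1.0 + rng.normal(scale=0.1, size=100)  # the "world" we want the model to learn

w, b = 0.0, 0.0            # the model's parameters: just two numbers, initially knowing nothing
learning_rate = 0.1
for step in range(200):
    pred = w * x + b                       # forward pass: arithmetic with the numbers
    grad_w = 2 * np.mean((pred - y) * x)   # how should w change to reduce the error?
    grad_b = 2 * np.mean(pred - y)
    w -= learning_rate * grad_w            # nudge the numbers; nobody writes "w = 3" by hand
    b -= learning_rate * grad_b

print(round(w, 2), round(b, 2))  # ends up near 3.0 and 1.0 without anyone programming that
```

Nobody wrote a rule saying the answer is 3 and 1; the training loop just kept nudging the parameters until the behavior matched the data. That's the move that, at vastly larger scale, produces behavior nobody can fully inspect.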

And so to your question of when did we start getting scared: well, we weren't scared, because we just thought it was really cool when this all started working. But now it's like, uh oh, we're throwing neurons at the problem, and the problem is how to reach goals. The problem is, you know, agents are coming. They're barely here yet, they're not working very well, but they're going to do tasks.

You're going to be able to throw neurons at the problem, and the problem is: take my job, right? And not just take my job, but take the job of the president, or outmaneuver the president. If Putin had a better AI than the United States government, you could imagine suddenly, wait, how did Russia become the number one world power all of a sudden? You could do that if you had enough neurons.

Philosophical Questions About AI as Successor

Jona: 00:53:19
I want to go a bit more into the philosophical side. I've heard you speak about effective altruism. Can you give us a little breakdown of what that is and how it applies to how you deal with AI Doom?

Liron: 00:53:28
Yeah. So I was actually around for the history of this community. I'm 37 years old, and I was around in 2009 before effective altruism was a thing, as far as I know. I was reading LessWrong and Eliezer Yudkowsky; I was part of the early rationalist community. I consider myself a rationalist, and I witnessed effective altruism kind of splitting off from the rationalist community.

There was a post by Eliezer Yudkowsky from 2009 where he was saying, treat fuzzies and utilons separately. A fuzzy is a warm fuzzy feeling. So it's like, hey, if you wanna volunteer at a soup kitchen, great. But if your hourly wage is $150 an hour and you're volunteering at a soup kitchen, you're doing the job of a $15-an-hour soup pourer.

You really should also take some of that $150-an-hour salary and donate it to pay for 10 people who could help at a soup kitchen, right? Wouldn't that actually help more? And you could be like, well, but I feel better when I'm doing the soup charity. Okay.

So Eliezer's point is: great, do both, right? Do one hour where you're working the soup, and then one hour where you're grinding away at your law office, and then you take the $150 and get 10 people pouring soup. So you do both.

If you're the lawyer and you're only pouring soup yourself, okay, it's better than nothing, but now you're doing 10% of what you could be doing. So that was kind of his point: if you really care about something, put some money where your mouth is and try to actually have the most impact.

And effective altruism came right after that. I don't know how influenced they were, but it's a very similar idea: look, there are people in the world where, for the price of giving one dog a bath at an animal shelter, you could save three humans from a 50-year illness, right? From 50 years of low quality of life.

So don't you think you should just redirect the money to the humans? And you may be like, no, well, the puppy's right here, I gotta save a puppy. You know, if you wanna say that, fine. But there's a point where it gets really, really extreme where it's like, okay, one puppy versus a million humans who are literally all about to get burned to death, right?

So there's some point where you're like, okay, fine, I'm going to let this puppy be dirty and then I'm gonna take the money and donate to the humans. There is a trade off, whether you think the trade off is one puppy versus five humans, or one puppy versus 20 humans, however you wanna make the trade off, you have to be sensitive that there's some kind of trade off.

And once you actually start doing the numbers, you're like, oh, oh my God. It's not even close. Some of these charities are taking money and they're being extremely inefficient with it, and then other charities are being 10 or a hundred times more efficient.

So your $1 to this charity could do as much good as $100 to this other charity. So don't you think that's worth paying attention to? And I give that a resounding yes. I think it's worth paying attention to, whatever you wanna do with that information. That's very important information to have for anybody who's interested in being charitable.

Jona: 00:56:11
Absolutely. And how does it influence how you go about what you do?

Liron: 00:56:17
So right now, my primary charitable focus is just doing my show and warning people that we're doomed. I consider that my charitable contribution to the human race: trying to help us survive.

I have done a few thousand dollars of donations in the last year, and I've done bigger donations in the past. I'm a GiveWell and GiveDirectly member, so I donate a few hundred dollars a year that just gets given directly to people in poor countries, because studies show that has a significant net positive effect, even though it's currently not the number one most effective charity.

So I should probably go update and be like, oh, I could be even more effective than this. But it's still 20 times more effective than a lot of charities that people normally give their money to.

AI Leaders' God-like Vision

Jona: 00:57:00
I heard you speak before about the way many leaders in the AI space view what they're building, almost like this godlike creature or next species that we're bringing into the world. And I just want to hear your thoughts on this because I myself don't fully have clarity on this.

Who are we exactly to say that human life, humanity, et cetera, is inherently worth more than bringing into the world this genius, amazing species that could explore the entire universe without us? How do you think about that?

Liron: 00:57:34
It's actually a really good question. It gets to the question of what counts as a worthy successor, or what we'd be happy to call our descendants. And there's a slippery slope, a whole spectrum. To strengthen your argument:

Imagine somebody who was alive a hundred years ago, when everybody agreed that only heterosexual marriage is good, right? Homosexuality is bad. There probably were plenty of men who liked men and women who liked women, but they were all in the closet, or they all convinced themselves that they weren't, or whatever. But society was not cool with homosexuality during certain periods of time.

Imagine that people from those periods of time were like, okay, I'm going to program the values into AI, and definitely one of the values is no homosexuality. So in retrospect, you just ruled out a type of descendant. If they heard, oh, there's going to be a thriving society of humans and 7% of them are going to be homosexual, they're gonna be like, we failed. Right?

And today it's like, yeah, that sounds fine, right? Today we're cool with that.

So you don't wanna hard code values and be like, this person does not count as my descendant if they don't have this property, because you never know when you become open-minded.

Or, you know, I think that in some strict Muslim cultures, if your daughter ever has sex outside of marriage, then you have to punish her severely, or even kill her; I think under certain interpretations of sharia law you really have to go to town.

So, to your question: there are a lot of times when people have this attitude of, this human is worthless to me, this does not deserve life. And then later society changes its mind, like, oh, actually no, that's cool, we're good with this.

If we were to hard code too many rules in AI and say, you don't count as our successor, we might end up slowing down progress for no reason when we would've had this perfectly good successor. So I'm sympathetic to that argument.

It's just that you can ride the slippery slope too far, to the point where you're literally like, yeah, whatever we build is cool. The problem is that the actual thing we're trying to build right now looks like a real bad successor.

It doesn't just look like, oh, it's a really nice guy who happens to be a man who likes men. No, it's worse than that. It's not even a nice guy. It looks very much like a cancer or a virus. That's what we seem to be on track to build: a Malthusian race where we build something that's just really good at reproducing itself and never listening to anybody's commands anymore.

And then it's just going to populate the universe with more goo, more slop. It's not going to be like, oh, let's make a big art installation, let's have a really beautiful part of the universe, because I have this instinct for art. Why do I have an instinct for art? Well, because there was sexual selection.

The smarter people who had more aesthetic taste evolved to attract mates more, and now we just like aesthetics for their own sake. All of these nice coincidences that happened in human evolution are not going to happen with an AI. The AI is not going to have to peacock; it's not gonna have to make a beautiful peacock tail. The AI can just slime the whole universe with crappy goo, just like cancer cells, basically.

And the only thing it has going for it is that it can efficiently reproduce itself and take over all the universe. That's the path we're heading toward right now. Because none of us humans are doing those kind of tricks of like, oh, no, no, no. I'm going to give it an aesthetic sense. I'm going to give it a social sense.

None of these senses are currently being programmed into the AI. And if you talk to Claude and you think that Claude has all these kinds of senses, you're deceiving yourself, because it's not going to be Claude. It's going to be a more efficient successor to Claude. It might even be code that Claude writes without realizing what it's doing.

Finding Your P(Doom)

Jona: 01:00:57
That's a great answer. And to finish this off, how would someone with, let's say, not so much technical expertise go about figuring out their own P(Doom) here? Do you have some resources, advice, steps, ideas?

Liron: 01:01:11
I mean, I don't think you have to figure out your P(Doom) to a lot of precision. So when I say that my P(Doom) is 50%, it's not like I think 20% or 80% are definitely wrong answers for P(Doom). I think those are perfectly fine answers, actually. All I mean by 50% is a very rough number.

It means that if you were to come here and say your P(Doom) is 2%, I'd be like, why the hell is it 2%? That's 49-to-one odds. You'd go to the betting market and make a 49-to-one bet against doom when everybody's warning that these AIs are getting smarter than us? A bet at 49-to-one odds, that's insane.

And similarly, if you're so sure that we're doomed that you don't even think there are one-to-49 odds that we're not doomed, how can you be so confident either way? I think that's insane.

Whereas if you say, oh yeah, I'm like three to one, that's 75%, right? Three to one that we are doomed. Then I'm like, okay, maybe you have some good evidence that makes you three to one, one way or the other.

So when I say I'm one-to-one, it's very rough. I just think that people who are at less than 10% or more than 90% are clearly just bad at thinking. Like, hello, there is some uncertainty.
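
The odds arithmetic here is simple: a probability p corresponds to (1 − p)/p odds against the event. A quick sketch of the numbers used above:

```python
# Converting a probability into betting odds, as in the examples above.
def odds_against(p):
    """Odds against an event with probability p, e.g. p = 0.02 -> 49 to 1."""
    return (1 - p) / p

for p in (0.02, 0.50, 0.75):
    print(f"p(doom) = {p:.0%}  ->  {odds_against(p):.2f} to 1 against doom")
# 2%  -> 49.00 to 1 against doom
# 50% -> 1.00 to 1, i.e. even odds
# 75% -> 0.33 to 1 against, i.e. 3 to 1 in favor of doom
```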

Closing

Jona: 01:02:15
That's a great response. Liron, I appreciate you so much. This was great. Where can people find you?

Liron: 01:02:21
Thanks man, I appreciate you. So just go to doomdebates.com or youtube.com/@DoomDebates. Yeah, I mean, if you guys wanna help the cause, just subscribing to my channel is definitely one starting point.

And then from there, check out PauseAI.info. That's the PauseAI organization that I'm part of. We do protests, and you can just join the Discord and find a bunch of really smart people on the PauseAI calls. So those are two starting action items that you can take.


Doom Debates’s Mission is to raise mainstream awareness of imminent extinction from AGI and build the social infrastructure for high-quality debate.

Support the mission by subscribing to my Substack at DoomDebates.com and to youtube.com/@DoomDebates
