PhD AI Researcher Says P(Doom) is TINY — Debate with Michael Timothy Bennett

Michael Timothy Bennett, PhD, is an award-winning young researcher who has developed a new formal framework for understanding intelligence. He has a TINY P(Doom) because he claims superintelligence will be resource-constrained and tend toward cooperation.

In this lively debate, I stress-test Michael’s framework and examine whether its theorized constraints will actually hold back superintelligent AI.

Timestamps

  • 00:00 Episode Preview

  • 01:41 Introducing Michael Timothy Bennett

  • 04:33 What’s Your P(Doom)?™

  • 10:51 Michael’s Thesis on Intelligence: “Abstraction Layers”, “Adaptation”, “Resource Efficiency”

  • 25:36 Debate: Is Einstein Smarter Than a Rock?

  • 39:07 “Embodiment”: Michael’s Unconventional Computation Theory vs Standard Computation

  • 48:28 “W-Maxing”: Michael’s Intelligence Framework vs. a Goal-Oriented Framework

  • 59:47 Debating AI Doom

  • 1:09:49 Debating Instrumental Convergence

  • 1:24:00 Where Do You Get Off The Doom Train™ — Identifying The Cruxes of Disagreement

  • 1:44:13 Debating AGI Timelines

  • 1:49:10 Final Recap

Links

Michael’s website — https://michaeltimothybennett.com

Michael’s Twitter — https://x.com/MiTiBennett

Michael’s latest paper, “How To Build Conscious Machines” — https://osf.io/preprints/thesiscommons/wehmg_v1?view_only

Transcript

Episode Preview

Liron Shapira 00:00:00
Michael Timothy Bennett is an AGI researcher. He’s already published over 20 peer reviewed publications and right now you’re coming in hot after writing this thesis. You just turned it in. It’s called How to Build Conscious Machines. In your opinion, is Einstein much smarter than a rock?

Michael Timothy Bennett 00:00:16
Yes, but this is because of my particular human value judgment.

Liron 00:00:21
There’s some other entity that could judge the rock as smarter than Einstein.

Michael 00:00:24
Einstein did not do a great job of persisting.

Liron 00:00:27
Look at the logical connection between wanting to achieve some arbitrary goal—paperclips is a classic example—and wiping out humans. Wiping out humans is actually a convergent outcome, unless the goal is to not wipe out humans.

Michael 00:00:39
I mean, that seems like a big leap. That is a very big leap.

Liron 00:00:43
Okay, so let me ask you the question this way. If the AI’s only goal was paperclip maxing, would that not imply humans not surviving long?

Michael 00:00:50
Kind of an absurd scenario because we’re assuming that we already have a system that is capable of doing all of this. It kind of assumes we get this leap to omnipotence.

Liron 00:01:01
Just grant me the logic. If we had an AI that really just wanted to paperclip max and was smarter than us, then we die. Right?

Michael 00:01:09
If we had something that’s omnipotent and wants everything to be paperclips, then everything is going to be paperclips.

Liron 00:01:15
I mean, by definition, right? By definition of omnipotence, yeah. So you don’t think intelligence kind of approaches toward omnipotence?

Michael 00:01:23
No.

Introducing Michael Timothy Bennett

Liron 00:01:41
Welcome to Doom Debates. Michael Timothy Bennett is an AGI researcher seeking to understand life, meaning, and consciousness. He’s currently at the very end of his PhD program in AI and machine learning from the Australian National University. He’s already published over 20 peer reviewed publications. His research draws on AI, machine learning, complexity science, philosophy, algorithmic information theory, and biology. He also has a diverse background in music management and game development.

So today I’m excited to explore the ideas from Michael’s recent PhD thesis which is called How to Build Conscious Machines. And we’re going to compare our views about AGI and of course debate our P(Doom). Michael Timothy Bennett, welcome to Doom Debates.

Michael 00:02:27
Thanks, it’s a pleasure to be here.

Liron 00:02:29
All right, so let’s start from the beginning. How did you get into AGI research?

Michael 00:02:33
I had the bright idea to go back to university and study computer science for six months. And then I met Marcus Hutter, who was a professor at the university I was studying at. And he had this AIXI model of general intelligence.

So I started to dig into that, got interested in the foundations of intelligence, started working on fractal compression as well, and then trying to combine the two. That sort of spiraled and I ended up working on a whole new field of enactive general intelligence.

Liron 00:03:07
You mentioned on your site that you’re doing both theory and applied research. Is that right?

Michael 00:03:11
Yeah, much more of the theory lately, but the intent was definitely to do more applied.

Liron 00:03:17
All right. And the other thing I notice is you’ve already got 20 research publications. How many publications would the average person who’s about to turn in their PhD thesis have at this point?

Michael 00:03:29
I think maybe one or two. I’ve seen some people with six or seven, and I met one guy in the UK who has 20. I have never met anyone else who has quite so many. So it’s unusual.

Liron 00:03:41
Yeah. So you’re way ahead of the pack. Do you have good work habits?

Michael 00:03:46
Yeah, I think I am obsessive. I go through phases, but I went through periods of very intense productivity where I just didn’t go outside for a while and wrote three or four papers. Or in the case of my thesis, I didn’t really do much between March and May and just wrote pretty much the entire time.

Liron 00:04:08
And right now you’re coming in hot after writing this thesis. You just turned it in. It’s called How to Build Conscious Machines. It’s got a lot of interesting chapters. We’re going to dive into the topics.

I think it’s great when people like you are deep into these intellectual topics, writing hardcore PhD work. But then you come over here to a show like this and we’re going to introduce it to the mainstream where we’re going to dumb it down a little bit. Sound good?

Michael 00:04:30
Sounds great. Yeah. Thanks for the opportunity to do that.

Liron 00:04:33
Yeah, yeah, my pleasure. Okay, so this show does like to focus on doom and the probability that AI is going to come and kill everybody. So are you ready for the first major question?

Michael 00:04:45
Sure.

What’s Your P(Doom)?™

Liron 00:04:47
All right, Michael Timothy Bennett. What’s your P(Doom)?

Michael 00:04:56
All right. I’m going to separate this into two things. P(Doom) with AI and P(Doom) without AI, I guess, and frame this in the next 50 to 100 years. So I think without AI, our chances of blowing ourselves up are maybe 1 or 2%, and I would say 2%. And with AI, I feel like it’s about half that. So 1%.

Liron 00:05:20
Wow.

Liron 00:05:24
Okay. So to contrast with mine—without AI, I still think nuclear and bio, given a whole century, is at least 20%. Probably higher, but probably lower than 50%. With AI, my own P(Doom) goes to 50% by 2050.

So it’s good we’re debating because I’m 50%, you’re 1%. That’s quite a large difference. So my goal would be, by the end of this conversation, you’ll be like, you know what, Liron, you’re right. It actually is clearly at least 5 or 10%. I don’t know why I said 1%. And then conversely, I think you would like me to walk out being like, you know what? P(Doom) is lower than 50%. Is that fair to say?

Michael 00:06:02
Yeah. Or I suppose my objective would be more just to sort of talk about the stuff and get a better understanding of each other’s positions.

Liron 00:06:10
Likewise. Yeah. Well, that’s what I always say when people ask me, why do I do the show? Because people never change their mind. I rarely change my mind about doom. My guests almost never change their mind. One time, a guest actually updated a few percent up. That was my biggest triumph.

So why do I even do the show? It’s exactly like you said. It’s useful to just illuminate where people stand, because even just spelling out a position is so complex. And usually people arguing never even get to that point where it’s like, oh, this is my position. This is your position. That’s interesting. So at least getting to that point, I think, is worthwhile.

Michael 00:06:38
Yeah, yeah. And the P(Doom) thing is just numbers that I kind of made up based on gut intuition. It’s not like I’m sitting here coming up with a model and predicting the exact percentage.

Liron 00:06:54
Yeah, but you must have quite a strong gut to say that it’s only 1%, 99 to 1 odds. I mean, those are very confident betting odds. You really don’t think we’re doomed, correct?

Michael 00:07:04
Yeah, I feel pretty optimistic. And when I say doomed here, I mean wiped out.

Liron 00:07:08
Yeah. Everybody dying. Correct.

Michael 00:07:09
There’s always catastrophes.

Liron 00:07:11
There’s always catastrophes, but we recover rather than everyone being wiped out. That’s your position?

Michael 00:07:15
Yeah.

Liron 00:07:16
To get some color on this 1%: would you sign the famous statement on AI risk from 2023? It says mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war.

Michael 00:07:30
I wouldn’t sign it. But that’s because I think it is mostly going to be used for more regulatory capture and that sort of thing.

Liron 00:07:42
Okay, but if we’re not playing—let’s not play chess here. Let’s say naively, just—do you agree with the statement?

Michael 00:07:47
I mean, it should be treated as something that—yeah, if I’m talking about viruses and things that go horribly wrong, I don’t think we’re near the stage where we have to worry about it on the level of nuclear weapons.

But making something that—I mean, it would be very dumb to do something like the—you know about the dead man’s switch, Dead Hand? I forget what it’s called. The Soviets built this thing.

Yeah, I mean that was a very basic example of an intelligent system that is basically just a trigger that will end the world if certain conditions are met. Which is incredibly stupid to set that up.

Liron 00:08:30
I mean, I don’t see the Dead Hand as having much intelligence to it. It’s kind of a pretty simple trigger, as you say. But what’s salient to me about it is that it makes doom causally near. The linkages in the causal web between life today and full extinction—it’s just a few small causal links now because this machine exists.

Michael 00:08:52
Well, I mean, yeah, that’s an incredibly stupid decision somebody made at some point to set that machine up. And I guess when I look at that, I think the biggest problem there is just that we had—I mean, there’s only so many people who have access to that many nuclear weapons. It just happened to be that we had somebody who was very willing to set up a system that would end the world all on its own if certain conditions are met.

And a nuclear winter seems a much scarier prospect than anything else I can think of, aside from maybe an asteroid. So it does seem like an incredibly foolhardy thing to have done.

Liron 00:09:31
So I agree this is quite scary and I’m glad you have a healthy appreciation for this stuff. But to clarify your earlier probabilities, you’re basically saying, yeah, there’s a very significant risk that a nuke will go off, but at the end of the day, humanity will probably bounce back. And so that’s why you’re still putting your P(Doom) even without AI at 1 to 2%, correct?

Michael 00:09:48
Yeah.

Liron 00:09:48
Have you read much of Eliezer Yudkowsky and LessWrong?

Michael 00:09:51
Not so much. Some things—I read a bit about instrumental goals and the strong orthogonality thesis. I forget.

Liron 00:10:00
That’s right. Yeah. And I know that’s a topic that you’ve mentioned before. We’re definitely going to talk about that. That is a Yudkowsky topic as well. He didn’t invent it, but he’s written intelligently about it.

So I’m basically just representing the Yudkowskian position. I’m a proxy for Yudkowsky. I’m a stochastic parrot for Yudkowsky. And you’re representing your own position. You’re not really a proxy for anybody else, correct? You’re not a proxy for Marcus Hutter. Are you a proxy for anybody?

Michael 00:10:23
Not that I can think of, no. There’s probably somebody who has similar opinions, but I seem to excel at being annoyingly far away from mainstream opinions.

Liron 00:10:36
We got an original thinker here. That’s always fun. Are you a Bayesian?

Michael 00:10:41
Yeah, I mean, you could say that to some extent. I don’t think in those terms, but I don’t disagree with them either.

Michael’s Thesis on Intelligence: “Abstraction Layers”, “Adaptation”, “Resource Efficiency”

Liron 00:10:51
All right, all right, let’s back up a bit. Let’s start with what is your central thesis? I know you just turned in your thesis. It’s called How to Build Conscious Machines. What is overall, throughout all your research, your central thesis? Or what is the thesis of your thesis?

Michael 00:11:07
The thesis of my thesis. Oh, can you give me an example of what you mean by that?

Liron 00:11:12
So, for example, Eliezer Yudkowsky’s central thesis is, hey, AI is going to be extremely powerful and disempower humanity. And it’s coming soon. That’s kind of his central message, I would say. Ken Stanley’s central message is—to put words in his mouth—divergence. The idea that you shouldn’t just try to seek a goal directly. It’s actually more powerful to explore undirectedly. That’s kind of the central Ken Stanley idea, in my opinion.

Michael 00:11:37
Okay, then I guess the central thesis of my thesis could be: it is not simplicity of form, but weakness of constraints on function, that determines how well something survives or generalizes. And that this depends on the abstraction layer, and that you can frame everything that exists as a stack of abstraction layers.

Because what is a rock? Why is a rock a rock? It’s because we classify it as a rock. Why do we classify it as a rock? Because we’re simplifying the world into these abstractions. And then what are we? What is the color red? This is why it delves into consciousness and all that other stuff. Because if you’re dealing with the basics of how we arrive at particular classifications of the world, you have to explain that in terms of why you would—for the sake of survival, explained in terms of natural selection, but on a grander cosmic scale—arrive at particular languages.

Liron 00:12:30
I mean, I know you got a lot of ideas here, but can you zoom out and just give me one sentence which is your main thesis?

Michael 00:12:36
Sure. Everything’s a stack of abstraction layers.

Liron 00:12:39
All right. Everything is a stack of abstraction layers. So I mean, that particular claim I think is well accepted. The world—yes. I mean, look, the human body is a stack of abstraction layers. Our computer is a stack of abstraction layers. Anything we build is a stack of abstraction layers. So I think that is uncontroversial. Maybe you could add to that to be a little more original.

Michael 00:13:01
You could think of each abstraction layer as the behavior of the layer below. So cells—their behavior can be an organ. The behavior of a set of organs can be an organism. The behavior of a collection of organisms can be an ecosystem. We can move the window at the lens of what we’re looking at. But ultimately what intelligence is concerned with is not so much what we do in the abstraction layer, but how we form the abstraction layer.

Liron 00:13:28
Okay, all right. So to put it in my own words, your grand unifying thesis—and it’s okay if you don’t have one—but potentially your grand unifying thesis is that intelligence is all about how to form the boundaries of abstraction layers.

Michael 00:13:43
Yeah. Or how we form the language with which we reason and infer and stuff, rather than the inference and so on itself.

Liron 00:13:54
Okay, how we infer. And I know the title of your thesis is How to Build Conscious Machines. So the idea is if we master how to separate abstraction layers or create abstraction layers, then we’ll be well on the way to building conscious machines.

Michael 00:14:07
Yeah, yeah. I mean it would definitely be a very important step.

Liron 00:14:11
Okay, interesting. Let me ask you this. What is intelligence?

Michael 00:14:16
Well, I argue it is the sample and energy efficiency of adaptation, which is based on Pei Wang’s definition of adaptability. Well, not based on—I received a grant for—I found his and I’m like, oh, his is better. But he had adaptation with limited resources, and he’s got a whole 20-page article arguing for that.

Liron 00:14:37
Okay, so you’re saying it’s the resource efficiency and sample efficiency of adaptation, correct?

Michael 00:14:43
Yeah. And I’d just say samples are a resource. So resource efficiency of adaptation. Or just efficiency.

Liron 00:14:49
My own preferred definition is also a type of efficiency. I wouldn’t use the word adaptation. So adaptation is a key concept for you because it’s fundamental to intelligence. In your view, adapting to your environment is the core of intelligence.

Michael 00:15:05
So the reason I would say adaptation is because I need to get to sort of—if I’m dealing with a whole lot of abstraction layers, I need to explain where something like goal directed behavior would originate.

So the argument I put forward is that, well, whatever is at the bottom of the stack of abstraction layers, we have states. Because there must be states or changes or difference for there to be anything at all. Otherwise we just have one thing. If we have states, each state would—you could think of each state as representing a point of difference. Sort of like a structuralist—if you’re familiar with structuralism—a structuralist take on all possible worlds or realities.

Liron 00:15:41
I don’t think I’m familiar with structuralism.

Michael 00:15:43
I suppose structuralism is the idea that you can define everything by the relations between things. And then people will probably be more familiar with poststructuralism, which is the ideas put forward by some French philosophers in the late 20th century, like Derrida who said that you can’t capture everything by the differences between things. You need to sort of—no matter what you’re doing, you’re always deferring your interpretation or meaning to something else. And he called this différance—I can’t do the French accent.

But his argument was that structuralism will always be incomplete. And so I thought, well, he’s kind of got a point, but I like structuralism. So we can kind of combine the two by saying, well, we have this set of abstraction layers and we can define things by the relations between them, by having states. And we just have to say that we don’t know where the abstraction layers end.

Like if my mind is interpreted by my body, or software is interpreted by hardware, hardware is really just interpreted by physics. What’s below physics? Don’t know. We just assume the stack goes on forever. What can we say about all possible stacks?

Liron 00:16:46
To understand your term adaptation, to know what you mean by it. Let’s take a chess engine, right? Stockfish. Is it adapting when it plays chess really well? Or maybe you’re saying it’s not adapting, but you wouldn’t consider it intelligent. How do you manipulate the semantics there?

Michael 00:17:01
Okay, so I kind of went off topic there. I forgot we were talking about adaptation. Started talking about abstraction layers. So with the chess thing, can you explain a little bit more what you mean by that?

Liron 00:17:10
You claim that a good definition of intelligence is resource efficiency of adaptation. And I’m trying to apply that to the example of a chess engine, because certainly in the 80s and 90s, people were pretty impressed when AI could play chess, and they would have considered that intelligent at the time. It’s easier to dismiss today. So my question for you is, where do you draw your semantic boundaries? Is a chess engine not intelligent, or is a chess engine efficiently adapting itself? How would you describe it?

Michael 00:17:42
Okay, so if we give the chess engine all the rules of chess, it’s got all of the resources it needs to compute winning moves and things. But then if we take instead an extensive definition of chess—that is, a set of games—and see how many games an engine needs to infer the rules of chess, which you can do by search as well, then the fewer games it needs, the better. All else being equal, assuming that we have supersets and we don’t have some special combination of particularly informative games.

If we just keep adding one game to the set, the engine that needs the fewest games is the most sample efficient. So this is what I mean by limited resources. And then I bring energy efficiency into that, because the formalism that I put together doesn’t really distinguish between sample and energy efficiency, since something that is more energetically efficient can persist in more possible worlds. So it sort of ends up being more generalizable just by being more energy efficient.

A more practical example of what I was just talking about—if we take something like AlphaGo, where we’ve got something—you think of it as having the rules of Go, and then it’s learning possible heuristics or winning combinations of moves to decide which combinations of moves are most likely to win. And it’s learning this by observing the behavior of Go players, or playing itself or whatever we choose—we can come up with ways to generate more data.

The point is that we’re trying to learn this heuristic. The less data I need to learn a heuristic that generalizes—in the sense of being able to identify winning combinations of moves more effectively—the faster I can adapt, the more efficiently I can adapt. Or, all else being equal, if I have a finite amount of resources, the more accurate I’ll be at the end.
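A minimal sketch of that sample-efficiency comparison, with a made-up threshold game and two invented learners (nothing here is from Michael’s formalism): the learner that recovers the hidden rule from fewer labeled examples counts as more sample efficient.

```python
import random

random.seed(0)

# Hidden toy "game": the label is 1 when x >= 0.6. A learner is more sample
# efficient if it recovers this rule (to within a tolerance) from fewer
# labeled examples. Both learners are invented purely for illustration.
HIDDEN_THRESHOLD = 0.6

def label(x):
    return x >= HIDDEN_THRESHOLD

def fit_midpoint(examples):
    # Learner A: guess a threshold halfway between the largest negative
    # and the smallest positive example seen so far.
    lo = max((x for x, y in examples if not y), default=0.0)
    hi = min((x for x, y in examples if y), default=1.0)
    return (lo + hi) / 2

def fit_min_positive(examples):
    # Learner B: guess the smallest positive example seen so far.
    return min((x for x, y in examples if y), default=1.0)

def samples_needed(fit, tolerance=0.05, trials=300):
    # Average number of random examples drawn before the learned threshold
    # lands within `tolerance` of the hidden one.
    total = 0
    for _ in range(trials):
        examples = []
        while abs(fit(examples) - HIDDEN_THRESHOLD) > tolerance:
            x = random.random()
            examples.append((x, label(x)))
        total += len(examples)
    return total / trials

print("Learner A (midpoint)    :", samples_needed(fit_midpoint), "examples on average")
print("Learner B (min positive):", samples_needed(fit_min_positive), "examples on average")
```

Learner A typically needs far fewer examples, so by the definition being discussed it would count as the more intelligent of the two on this toy task.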

Liron 00:19:32
Okay, so the way you’re defining the word intelligence, it’s defined in terms of this other word, sample efficiency. And I’m not convinced that you can always find this underlying context of a sample anytime you’re talking about a system being intelligent. I mean, in the case of chess, Deep Blue, when it beat Kasparov, it wasn’t really sampling other games, it was just a search tree with heuristics. Correct?

Michael 00:19:56
Yeah, no, that was just a case of we gave it all the resources it needed so it didn’t have to learn anything. So this notion of intelligence that I’m talking about is mostly about what is biological intelligence. If we already start with all of the pieces—if we give all the data on how to press a big red button—we can hit the limit of how good you can be at pressing a big red button pretty quick. All the information can be hard coded.

Liron 00:20:26
So it sounds like you’re saying playing chess at the level of IBM’s Deep Blue in 1996 to beat Kasparov—it’s kind of a trivial problem because we found an algorithm for it that doesn’t use this machine learning style sampling. And so, yeah, it beat Garry Kasparov, but it’s not intelligent by my definition, because there’s no sampling. Is that kind of where you’re going with this?

Michael 00:20:46
I would say—look, it’s not intelligence in the sense of a human learning, but it would be intelligent in the sense of a wasp with hard coded behavior constructing a nest. Certainly you could argue that is a necessary ingredient for intelligence—the ability to search through a space of possibilities, identifying which terms conflict with rules. But it is not the whole picture.

And if I say adaptation in general, and I’m talking in a formalism of an embodied organism, then any organism that can adapt in a sufficiently complex environment must be able to do the search. So it’s just the one condition implies the other.

Liron 00:21:31
Which condition implies which? Sorry.

Michael 00:21:32
If I demand adaptation in a sufficiently complex environment with limited resources, then the ability to reason—in the sense of being able to compare something to a set of rules and determine whether it conforms to them—is going to be necessary to adapt in those circumstances. Whereas if I have just the ability to reason, I’m not necessarily going to be adaptive.

Liron 00:21:57
Okay, so reasoning doesn’t necessarily let you adapt, but the ability to adapt means that you must be able to reason.

Michael 00:22:04
In a sufficiently complex—I mean, it’s just a more general condition, that’s all.

Liron 00:22:09
Do you have a good definition of adaptation? Because it’s—now adaptation is this really important thing to define in order to understand your version of intelligence.

Michael 00:22:17
Sure. So in the formalism that I’ve proposed, we have a formalism of all conceivable environments. Every aspect of—as I was talking about—a set of states. Every conceivable environment must have states. And each state could represent a point of difference or change. So you could think of this as arguing that time is difference.

And as states transition from one to another, some structures are preserved and some structures are deleted. And so I define adaptation very loosely—or just persistence—as those structures which are preserved: which structures are most likely to be preserved, or those that preserve themselves. So there’s this notion of extension, some formal details.

But the end result is that instead of formalizing goals and the means of their satisfaction as separate entities, this approach necessitates formalizing them together as an embodied task. That is to say, an aspect of the environment could be understood as having persisted if it allows for more—it is a constraint that constrains us to a greater number of possible worlds.

If I raise my arm up, I’m constraining the possible worlds that follow. If I raise both arms up, I’m imposing a stronger condition on what possible worlds must follow.
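A toy rendering of that picture—my own construction for illustration, not the formalism in the thesis: states are bitstrings, each state change flips one bit, and a “structure” is a predicate that either still holds after a transition or has been deleted. Structures that impose weaker constraints persist through far more state changes.

```python
import random

random.seed(0)

N_BITS = 8

def step(state):
    # One "state change of the universe": flip a single random bit.
    i = random.randrange(N_BITS)
    return state[:i] + (1 - state[i],) + state[i + 1:]

# A "structure" is a predicate over states. The stronger the constraint it
# imposes (the fewer states satisfy it), the more easily it gets deleted.
structures = {
    "exact state (strongest)":  lambda s: s == (1,) * N_BITS,
    "all bits equal":           lambda s: len(set(s)) == 1,
    "at least one 1 (weakest)": lambda s: any(s),
}

def persistence(holds, steps=20_000):
    # Fraction of visited states in which the structure still holds.
    state = (1,) * N_BITS
    survived = 0
    for _ in range(steps):
        state = step(state)
        survived += holds(state)
    return survived / steps

for name, holds in structures.items():
    print(f"{name:25s} holds in {persistence(holds):7.2%} of visited states")
```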

Liron 00:23:41
Let me try to understand that a little bit better. So you have a metric for the action space of this AI or—what’s the key concept here that we’re measuring?

Michael 00:23:51
So before we get to actions or anything like that, we’ve just got this structuralist notion of all possible—all conceivable universes defined in terms of difference or change. And we’ve got the idea of some form of embodiment or an abstraction layer as a subset of states.

Which is to say, if I have a set of states that represent a conceivable universe, then a rock is the set of states in which it persists. I don’t need a specific notion of a rock. Likewise, I don’t need a specific notion of an action yet. But rather I want to explain how we get to concepts like actions by explaining everything.

Liron 00:24:32
You say you don’t need a specific concept of a rock. Is it because you’re saying the set of universes that contain the rock is the rock?

Michael 00:24:39
It is more like we’re assuming we’ve got a particular universe, whatever it is, and we have a set of states of that universe in which the rock persists. The rock exists in those states. And we don’t know that—we’re just pointing one out for the sake of saying, here is a rock.

Liron 00:24:54
Oh. So the idea is, we don’t really know what it means to draw a boundary around the rock. And you’re basically saying, yeah, don’t worry about how to draw a boundary. Just take the set of all universes where you’re observing the rock.

Michael 00:25:03
Yeah. Or the set of all states of a particular universe.

Liron 00:25:05
But yes, okay, sure. And so where does the metric come in that you were talking about? Your metric was talking about how limited the outcomes are or—let’s dig into this. Help me understand.

Michael 00:25:16
Yeah, so if I’m talking about adaptation in a very general sense—not necessarily adaptation in the biological natural selection sense, but just an object persists through more changes in the universe—then it is more adaptable in this very general sense.

Debate: Is Einstein Smarter Than a Rock?

Liron 00:25:36
So does a rock count as very adaptable or unadaptable?

Michael 00:25:40
So it could count as very adaptable by this very general notion of adaptation. Because a rock is—well, it doesn’t get deleted very often.

Liron 00:25:52
Isn’t that a bad sign for your definition of intelligence, though? Because the rock is not really looking at any samples, and now you’re telling me it’s highly adaptable. So the rock is a super genius.

Michael 00:26:01
So this ties in with some—it builds up to this idea that—well, the rock. Let me dial back for a second. So I got this paper called “What the Fuck is Artificial General Intelligence?” that is talking about how we formalize general intelligence, specifically talking about things like AIXI.

Liron 00:26:21
Yeah. And in your PhD thesis, that is one of your chapter headings. You got the F word in there.

Michael 00:26:26
Yeah, yeah, I couldn’t resist. Also, it’s the general sentiment that I have encountered from a lot of people. If I say I’m studying AI or AGI, people’s reaction a lot of the time is, “What the fuck are you doing? What is this exactly?”

Liron 00:26:44
So that’s what I’m asking you now, right? I’m asking you to define intelligence or, I guess if you must, artificial general intelligence. However you want to build it up.

Michael 00:26:51
Yeah. So if I’m to go with that framing and just—what is AGI and what do people mean by that—go through some of the different notions of intelligence. That is sort of—

Liron 00:27:04
I don’t want you to survey the different notions because there’s a lot. I’d rather you just pick your preferred useful definition.

Michael 00:27:10
Okay, yeah. So adaptation I arrive at. And the reason I say that it’s okay for a rock to be adaptable is because I then look at—there’s three. After we’ve built a system that interpolates a set of points either through search or approximation, whatever we choose as a basic tool, we can then look at how we can optimize it further.

One way to do that is to scale up resources—more data, more compute, whatever we like. I call that scale maxing. Another thing to do is to maximize the simplicity of the explanations it comes up with. Regularization would fall into this category, as would AIXI, as would the minimum description length principle. And I call that simplicity maxing, or simp maxing.

And the last one is the weakness of constraints on function. And so the main result of my thesis is about simplicity versus weakness: the simplicity of form, or concision, versus the weakness of constraints on function.
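A loose toy contrast between simp maxing and W-maxing as described here—my own illustration, not the definitions in the thesis: several hypotheses fit the same observations; simp maxing prefers the shortest description, while W-maxing prefers the hypothesis whose constraints are weakest, i.e. the one that permits the most possible worlds.

```python
from itertools import product

# "Worlds" are assignments of 4 binary features.
worlds = list(product([0, 1], repeat=4))

# Each hypothesis is (name, description length, predicate). The predicate
# says which worlds the hypothesis permits; its "weakness" is how many
# worlds that is. Description lengths are arbitrary stand-ins.
hypotheses = [
    ("exactly 1110",       9, lambda w: w == (1, 1, 1, 0)),
    ("first feature is 1", 4, lambda w: w[0] == 1),
    ("at least one 1",     6, lambda w: any(w)),
]

# Observations: one world known to be permitted, one known to be excluded.
permitted = [(1, 1, 1, 0)]
excluded  = [(0, 0, 0, 0)]

consistent = [
    (name, length, pred)
    for name, length, pred in hypotheses
    if all(pred(w) for w in permitted) and not any(pred(w) for w in excluded)
]

def weakness(pred):
    # How many possible worlds the hypothesis permits.
    return sum(pred(w) for w in worlds)

simp_choice = min(consistent, key=lambda h: h[1])            # shortest form
w_choice    = max(consistent, key=lambda h: weakness(h[2]))  # weakest constraints

print("simp maxing picks:", simp_choice[0])   # "first feature is 1"
print("W-maxing picks:   ", w_choice[0])      # "at least one 1"
```

The point of the toy is only that the two criteria can come apart: the shortest consistent description and the weakest consistent constraint need not be the same hypothesis.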

Liron 00:28:15
Okay, so let’s do the rock example, right? I don’t get why the rock isn’t a super genius.

Michael 00:28:19
Okay, so the rock is—going back to the correlation between simplicity of form and weakness of constraints on function. The rock is generally persisting by being simple, and as a result kind of falling into a—you could think of it as a valley of fairly weak functionality.

Liron 00:28:40
Okay, but the valley of weak functionality is also what you’re calling really good adaptation. Because look, think about how good this adaptation is. You can scramble all of its atoms around, right? There’s so many permutations of where you can move the rock’s atoms and it’s still a rock. So isn’t that highly adaptive? Isn’t that highly intelligent by your definition?

Michael 00:28:56
Yeah, yeah it is. But the rock is able to do that because it’s very simple. And so this correlation between simplicity and—you can think of the rock as something that persists through simp maxing and as a result maximizes the weakness of constraints on function.

Something that is alive—say, a cell. Consider two things: one thing self-repairs and one thing doesn’t. The thing which self-repairs increases its complexity, but is able to persist. It is able to maximize the weakness of constraints on function more effectively than the thing which doesn’t self-repair, right?

Liron 00:29:35
But the rock doesn’t even need to self repair, because you can—it can take a lot of damage and I can even define a group of pebbles as my system. And that way if I hit it with a hammer, it’s still a group of pebbles, right? So yeah, why is Einstein smarter than a rock in your view?

Michael 00:29:51
And so that is the bit where I would say that something which is alive is something which maximizes the weakness of constraints on function—W-maxes—whilst doing the opposite of simp maxing. So you’ve got it anti-correlated versus correlated.

You could think of the environment in which we exist—that is, our nice stable environment—as something that is an abstraction layer, and that we are in a very tall stack of abstraction layers based on cells and things which persist. Because we’ve got this very nice stable environment that persists, it allows these complex things to kind of exist.

That is to say, it’s not—this whole stack is a very fragile thing, but we’re right at the top of it. And within the confines of our nice safe abstraction layer, right at the top of this stack, we are very adaptable.

Liron 00:30:47
Okay, but a rock is also, by your definition, very adaptable. So how do I know for sure that Einstein is higher on your intelligence metric than the rock? What if they’re both geniuses?

Michael 00:30:59
I mean, the rock is not really doing anything, right? So this adaptation here is just about survival. So a rock is always going to be good according to that. It’s just not going to be—I suppose if we’re talking about the limited resources thing, right?

If I want the rock to do the same thing a human’s doing, it’s not going to be able to, right? It is not adaptable within the context where—it’s not able to do this huge variety of behavior.

Put another way, you could think of the rock as having hard coded behavior which is to do nothing. You could think of something that’s got hard coded behavior like a simple wasp.

Liron 00:31:38
But I feel like you’re changing the definition though, right? Because you already gave me a definition that intelligence is efficiency of adaptation. You’ve already defined adaptation in a way that you said the rock has high adaptation. So now I feel like you’re introducing another concept. But you’ve already told me a definition of intelligence. So I feel like you’re being inconsistent.

Michael 00:31:56
No, it’s just earlier on—you remember when I said the stuff about assuming we’ve got a superset, not just the—not just adding some perfect set of very informative examples. And I was—say I’ve got two particular examples of a game and I add a third example. I’m more intelligent if I can learn it off the two examples than off the three examples, whatever this game is. Now, this is assuming that I’ve sort of got all else being equal.

If I’m talking about adaptation in the context of a human or a set of humans, I can talk about there being a wider variety of behavior. And so I’m adapting more efficiently if I require less to do that. If I have a rock and a human, they’re just—they’ve got a whole different set of functionality. There’s not—it’s comparing apples and oranges.

Liron 00:32:49
But can’t you compare them both by your definition of adaptation? Because you said that your concept of adaptation was grounded in universe states. So the rock has a set of universe states that constitute it adapting. So it’s okay that we can’t talk about the rock looking at samples because you just had a definition in terms of universe states. And it seems to be scoring high. And to me this is a red flag that you’re saying a rock is highly adaptable and adaptation is correlated to intelligence.

Michael 00:33:17
I mean, I’m talking about adaptation within a particular context. The whole point of taking into account embodiment and all these sorts of things is that we have a context and we can measure intelligence in that.

And so I’ve got this paper on measuring intelligence in general and getting an upper bound on adaptability. But this assumes we have some particular notion of—we could have a disembodied notion of a goal for the sake of determining this upper bound. And then if we have that, we can measure how much you can adapt while still satisfying that.

If you don’t have any minimum standard for functionality, then the most adaptable thing possible is just a sort of nothingness or—a rock is very adaptable if we don’t have any minimum standard of functionality.

Liron 00:34:05
Yeah. So when you said your definition of intelligence is resource efficiency of adaptation, maybe there’s another parameter you need to add to your definition where you say resource efficiency of adaptation with a goal. With a complex goal or something. In order to complete the definition.

Michael 00:34:21
Yeah. So, I mean, I guess I’m skipping over that because I’ve spent too long staring at the formalism. But you know how I was saying goals and the means of their satisfaction are kind of unified. And so I’m thinking of everything in terms of tasks.

And to get to this notion of tasks—I could appeal to the state changes of the universe as deleting some things and preserving others, as that’s where we get the original ought from which we can derive all other oughts. But in general, as a human being, we’re so far down that line that we’ve now got our own very particular definitions of what ought to be.

Liron 00:34:57
Yeah, and can you repeat that? How do we get that most fundamental ought? That sounds interesting.

Michael 00:35:01
Yeah. So if we’re going to try and give a naturalist explanation of where we get all value judgments—this is where I deviate from Ken Stanley stuff. He was saying that an explicit purpose—I guess I kind of get what he was saying, but I would just—I mean, I’m being pedantic here—but I would argue everything is goal directed. And that goal directedness comes from the very fact of existence.

That is to say that when we have a universe gone from one state to another, it is deleting some things, preserving others. And so that is kind of a value judgment. It’s saying that something should exist, some things shouldn’t. And just by doing that, we end up with—we create an incentive for goal directed behavior. That is to say that the universe will preserve things which preserve themselves.

Liron 00:35:49
Okay, all right, all right. So I know I’ve asked about a different topic, but I’m just trying to recap what I’ve learned about your view of intelligence. So first of all, just a sanity check here. In your opinion, is Einstein much smarter than a rock?

Michael 00:36:04
Yes, but this is because of my particular human value judgment.

Liron 00:36:10
It’s a human value judgment. So there’s some other entity that could judge the rock as smarter than Einstein.

Michael 00:36:15
Yeah, Einstein is less persistent than most rocks. So if we’re to judge things outside of our particular abstraction layer, outside of the particular incentives we’ve evolved, and the value judgments we make based on those—if we zoom back out of the whole process from which we originated, back to just things that persist and things which don’t—Einstein did not do a great job of persisting.

Liron 00:36:41
Okay, so persisting yourself is part of a criterion of intelligence that some observer might have.

Michael 00:36:47
Yeah, so if we’re to—I mean, this is the argument of where we would get some value judgment by which we could judge intelligence. We must have some ought from which we can derive other oughts.

Liron 00:36:57
Okay, so it really sounds to me like your definition of intelligence does have this goal parameter where the definition changes with respect to different goals.

Michael 00:37:08
I think I’m resistant to that particular characterization because it separates the goal from the means of its satisfaction. That is to say—if I have a human body, it is implicitly goal directed just by being able to perform certain tasks. It narrows down the scope of behavior dramatically. Any particular embodiment does.

Liron 00:37:33
Okay, here’s an example. Let’s say I’m going to introduce you to a new environment, like a new video game, or just a new challenge of any kind. And you get to pick either Einstein or a rock to take along the journey as your teammate. Isn’t it just always Einstein as the better pick?

Michael 00:37:49
I mean, yeah, because I’m doing human-like things and Einstein has similar use cases to me and similar goals. And the goal thing you’re talking about—this is what I mean by value judgment. Where do we get a goal from? And that’s all I was trying to explain with the original ought from which we can derive oughts. Where do we get a goal from?

Liron 00:38:14
So what you said, though—you said some other agent could look at Einstein and the rock and be like, oh yeah, the rock is more intelligent. Do you really think that? Can’t we just all agree as a universe that Einstein has intelligence in a way that the rock doesn’t?

Michael 00:38:27
Well, in a way that the rock doesn’t. Sure.

Liron 00:38:29
Or just has more. There is some quantifiable thing where clearly Einstein has more of it than the rock. And even an alien would agree.

Michael 00:38:38
I could get lost in pedantry here, but I’m just going to agree because it’s close enough. For the sake of argument, it doesn’t—yeah, I guess that works. Why not?

Liron 00:38:50
Okay, sounds good, sounds good.

Michael 00:38:51
But I’m gonna—I do wanna caveat that it’s a very human notion of intelligence.

“Embodiment”: Michael’s Unconventional Computation Theory vs Standard Computation

Liron 00:38:56
So you did touch on this a little bit in our current discussion. And this is, I think, a pretty big concept that you talk about in your papers—substrate dependence. You’re big on substrate dependence.

Michael 00:39:07
There would be people who would say that what I mean by substrate dependence is not what they mean. I guess when I’m talking about abstraction layers and implicitly accounting for substrate dependence and then saying that there is a particular embodiment—I’m not saying that there’s something inherent about biological systems that’s special compared to computers.

I’m more saying what is the information processing architecture we have, what are we working with, and what is the practical result of that. So if I have something running on a single thread CPU versus a cluster of GPUs, it’s going to, from an outside perspective, be a very different result. Even if I recognize the same set of symbols or abstractions at the end of it, one of them consumes different resources.

Liron 00:39:56
Okay. As an example, the reason I’m big on substrate independence is because I see the piece of meat inside my head doing things that seem very similar to the things that silicon is doing. For example, before we had Google Maps and MapQuest, I would take out a physical map and I would plot a route on the map using the same reasoning considerations that this algorithm in the computer is doing. So isn’t the map plotting task substrate independent?

Michael 00:40:21
I would say that the task isn’t so much independent as the value we derive from it is the same. We see the same result, we get the same output that we want. We have built a machine that does something and produces the desired result, whether it’s running a cluster of GPUs or single thread CPU or a human brain—result’s the same. But the resources required, the practical physical requirements are very different.

And I suppose if we want to do something like explain how we got to a particular language, you could think of each of these embodiments as a language. So it’s much like how we can swap out, go between different Turing machines and get the same result, but the notion of complexity will be different.

Liron 00:41:09
So I gave you a motivating example of why I think substrate independence is generally the true useful way to model things. You must have some example scenario where substrate dependence really shines, right? So can we do an example?

Michael 00:41:23
Well, again, I really don’t want to use the word substrate dependence because there are some other connotations there. But let’s say—let’s call it embodiment. Why embodiment matters.

And the classic example would be AIXI. AIXI is an example of—the complexity problem with AIXI and the way its performance is subjective is kind of indicative of a broader problem with how we conceive of artificial intelligence. Which is we often conceive of it as an effort to produce software intelligence rather than an effort to produce intelligent systems.

And I call this computational dualism because it resembles the sort of 17th century notion of Cartesian dualism where Descartes had this idea—he was trying to explain consciousness and the mind interacting with the world according to church doctrine. So he said, we have a mental substance and we have a physical substance, and never the two shall meet, except through the pineal gland in the human brain.

Liron 00:42:21
That seems confusing to call it dualism because the belief—typically these people who have the belief that you call dualism, like me, are hardcore reductionist materialists, which usually is the opposite of being a dualist. The way people normally use dualism. So I feel like you’re choosing some confusing terminology here.

Michael 00:42:38
Yeah. So I called it specifically computational dualism to highlight—and this is because I went through a whole lot of papers trying to explain the thing again and again and again, and people kept just scratching their head and going, I don’t get it.

And so the computational dualism was sort of—I read this Geoffrey Hinton thing on Immortal versus Mortal Computations and I got annoyed because there were a lot of things about that that annoyed me. But the end result was—what is it that is really annoying me about the way we’re talking about artificial intelligence? It’s that we keep making the same mistake, we keep repeating the same set of errors. We keep treating form as function without accounting for how one gets to the other.

So AIXI is a good example of the kind of problem you can get with computational dualism. And when I say computational dualism, I just mean treating intelligence as a matter of software rather than software and hardware or software, hardware, and the environment—taking into account the larger stack.

Liron 00:43:41
Okay, do you have an example that’s simpler than AIXI? Is there a mundane example?

Michael 00:43:47
Yeah, I mean, the language one is the simplest one I could think of. Whatever a word means depends on what I interpret it to mean. So if I define intelligence in software terms, I can always ruin whatever claim I make by changing the interpretation.

All I’m saying is that we have to have a fixed interpretation relative to an environment before we can make claims about what is intelligent. And so the point of what I was doing was not to say we can’t build AI. The point was just to say, how do we formalize the embodiment—that hardware, to be exact—and establish an upper bound on intelligence in those terms rather than in software terms.

Liron 00:44:29
All right. I mean, let’s put a pin in that because I think it might come into play with our later discussion. I’m just still not seeing what belief of mine needs to change. But we’ll get back to this. Is computation itself, in your mind, substrate dependent? Or can you just run algorithms on lots of different substrates, and they’re very much the same?

Michael 00:44:46
Algorithms you can run on many things—you can achieve the same result by many different means. I suppose in my formalism, I don’t even make a distinction between substances. So I just say there’s an abstraction layer and we’re working within that abstraction layer, whatever that abstraction layer is.

Liron 00:45:04
Okay, so you agree that computation can just be implemented on lots of different physical substrates? And we’re talking about Turing-universal computation. It can run on water flowing through tubes. It could run on neurons, transistors, relays. There’s a lot of different substrates that all implement the same computations. I mean, this is a standard belief, right? So this is not where you object, or is it?

Michael 00:45:26
No, it’s not what I object to. Only when I’m talking about computation, I’m talking about a much more general notion than just Turing computation. So I’m talking about—have you heard of pancomputationalism?

Liron 00:45:40
No, but I guess I can guess what it means.

Michael 00:45:42
Yeah. Have a guess.

Liron 00:45:44
Everything is computing. The rock is computing.

Michael 00:45:46
Yeah, everything is computer. Wow.

Liron 00:45:49
Right. That was one of your chapter headings. So are you a pancomputationalist? You really think everything is computing?

Michael 00:45:54
Well, it depends what we define as computing. If we say cause and effect is computation. If we say that one thing is ticking over to the other, okay—

Liron 00:46:03
The universe is always computing in the background. I’ll give you that. I’ll give you that. But that’s like saying in Minecraft everything is computing because Minecraft is a computer. But there’s a lot of bricks in Minecraft that are just bricks. And that’s very different from building a computer in Minecraft.

Michael 00:46:15
Yeah. So it’s saying that I have a value in memory or something like that. I’m not saying that the universe is anything like a Turing machine. I’m just saying that we have cause and effect. Or at least we have observable mechanistic relations of which we seem to be a part. We don’t know what’s going on beneath the abstraction layers we can observe. We only know that something is computing something. And by this notion of computing, I just mean cause and effect, or one thing ticking over to the other.

Liron 00:46:49
Okay, yeah, fair enough, fair enough. But you’re not willing to just say yes on the standard notion of substrate-independent computation. You’re not willing to just say different physical implementations of the same algorithm are a useful mental model. No?

Michael 00:47:02
Oh, no, no. It is a useful mental model. It’s great. It’s just that I suppose the whole thing with abstraction layers is—a particular Turing machine is a particular abstraction layer. So in that abstraction layer, we could have a level of abstraction and get the same result—one plus one equals two—and change the Turing machine and get the same result. And we could have any number of Turing machine implementations and get the same result. But beneath that, on the abstraction layer beneath, we’re changing what the thing is that it’s running on.
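One way to see that “same result above, different machine below” point—a throwaway sketch of my own, not anything from the conversation: two implementations agree on every input and output while the layer underneath does very different amounts of work.

```python
# Two implementations of the same function: they agree at the abstraction
# layer above (identical input/output behaviour) while the layer below does
# very different amounts of work, counted here crudely as calls/iterations.

def fib_recursive(n, counter):
    counter[0] += 1
    if n < 2:
        return n
    return fib_recursive(n - 1, counter) + fib_recursive(n - 2, counter)

def fib_iterative(n, counter):
    a, b = 0, 1
    for _ in range(n):
        counter[0] += 1
        a, b = b, a + b
    return a

for implementation in (fib_recursive, fib_iterative):
    counter = [0]
    result = implementation(25, counter)
    print(f"{implementation.__name__}: result={result}, steps={counter[0]}")
```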

Liron 00:47:34
Yeah, of course, I know. That’s what I’m saying. Implementing things. That’s the whole point of what it means to implement something on a different substrate—the substrate is different.

Michael 00:47:42
So the point I’m just getting at is just that if I’m talking about that infinite stack of abstraction layers and talking about how abstraction layers are formed one on top of another, then this process of choosing different Turing machines matters, especially if we’re talking about things like program complexity or weakness of constraints on function. That’s all I’m getting at. That’s why I’m sort of umming and ahing about substrate independence.

Liron 00:48:06
It’s true that some substrates will have different constant overheads for different operations. I think that’s what you’re getting at.

Michael 00:48:12
Yeah. And since I’m talking about computation on a universal scale, we’re talking about a different computation in the sense that I’m talking about. Although in the abstraction layer itself, we might—two Turing machines will produce the same result kind of thing.

“W-Maxing”: Michael’s Intelligence Framework vs. a Goal-Oriented Framework

Liron 00:48:26
All right, let’s talk about W-maxing.

This is another key concept that you’ve introduced. All right, explain that.

Michael 00:48:32
All right, think of it as—the example I keep using in talks is when I raise up my arm or occupy some bodily state, I’m constraining my possible worlds in the sense that there are only so many possible worlds that follow from that state. So if I’m trying to satisfy a goal—and I’m using the term goal here, for anyone who’s read my thesis and the long rants about goals and the means of their satisfaction, just for the sake of communication.

So we’ve got a task we want to complete, and that task can be understood as a goal together with a set of tools that might satisfy that goal. If I occupy a particular bodily state—rip off my leg, for example—my ability to satisfy that goal is constrained. I now have fewer possible worlds in which I can satisfy that goal. I constrain myself further and further.

If I am not aware of what my particular goal is—say I’m learning a new thing, I have some examples, examples of chess or whatever, but I don’t know what the actual goal or task is I’m trying to complete. I only have my extensive definition of it. Then my optimal strategy to complete this unknown goal is to constrain myself as weakly as possible. You can think of this like a lawyer who can say many things but commit to nothing.
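A tiny sketch of the “constrain yourself as weakly as possible” point, with made-up binary features standing in for bodily states—my own toy construction, not the thesis’s math: a stronger commitment now is consistent with fewer possible worlds later, so when the goal is unknown, the weaker commitment keeps more goal-satisfying worlds in play.

```python
from itertools import product

# Possible worlds: every assignment of 5 binary features.
worlds = list(product([0, 1], repeat=5))

def consistent_worlds(commitment):
    # Worlds still reachable given the features the commitment has fixed.
    return [w for w in worlds if all(w[i] == v for i, v in commitment.items())]

weak_commitment   = {0: 1}               # fix one feature ("raise one arm")
strong_commitment = {0: 1, 1: 1, 2: 0}   # fix three features (a stronger condition)

# If the goal turns out to be "reach some particular world", the weaker
# commitment leaves more candidate goals satisfiable.
print("weak commitment keeps  ", len(consistent_worlds(weak_commitment)), "of", len(worlds), "worlds")
print("strong commitment keeps", len(consistent_worlds(strong_commitment)), "of", len(worlds), "worlds")
```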

Liron 00:49:55
Okay, so is W-maxing optionality maxing? That’s a pretty well known concept.

Michael 00:50:01
Yeah, you could think of it like that, but taking into account the embodiment. So this formal representation of an abstraction layer, and we could argue we have an embodied formal language within that. And a statement in this embodied formal language can be weaker or stronger. That’s where the term weakness comes from. This—because it was talking about—so if we have this embodied form of logic, what is a stronger statement? What is a weaker statement? How does it constrain things?

Liron 00:50:28
Does the weaker statement achieve more W-maxing because it’s consistent with more worlds?

Michael 00:50:33
Yes. W-maxing just means weakness maxing.

Liron 00:50:36
Yeah. Well, a lot of people come on this show and they make very vague statements. So it sounds like their debate strategy is to W-max. So then I can’t pin them down on what they mean and they have a higher shot of claiming that they’re right.

Michael 00:50:47
Yeah, yeah, that’s the lawyer example I was giving.

Liron 00:50:50
Oh, I see what you’re saying. Yeah, same example.

Michael 00:50:52
Yeah. So I guess the point of a lot of—my thesis has a lot of words in it, which was one criticism Hutter leveled at it. Too many words. So I built these math definitions, and there is a concrete, exact, rigorous, formal definition which has been subject to a lot of refinement and criticism over the years. So I’m confident in its rigor and soundness now.

Liron 00:51:17
And do you think it’s intimately connected with intelligence? Because if something’s very adaptable, then it’s just not getting itself into a box. It always has that optionality to make more moves. So you think W-maxing is intimately linked with your definition of intelligence, correct?

Michael 00:51:31
Oh, it’s just—I can show that if you want to maximize sample efficiency and energy efficiency of learning, then W-maxing is necessary and sufficient to do so within the context of an abstraction layer. Then if I want to W-max beyond that, I need to be able to shift the abstraction layer.

Liron 00:51:48
Okay. In the specific example of when you lose your leg, now you’re lowering W. So you’re kind of W-minning. You don’t want to lose your leg. But don’t you think that Elon Musk without a leg is still going to have more options of what he can achieve just by delegating and managing than most people with a leg? So how are we going to compare that? How are we going to measure that?

Michael 00:52:10
So that is—you have just hit the nail on the head with enactive versus embodied notions of cognition. Are you familiar with enactivism? That kind of thing?

Liron 00:52:19
The word—“enactive”? Like inactive, and then inactivism?

Michael 00:52:23
No, no, no. Enactive, like E-N. Enactivism.

Liron 00:52:27
Enactivism. No. Okay, explain that.

Michael 00:52:29
All right, so I’m going to give a quick rundown of some concepts from cognitive science. AI has typically used this idea of an agent and an environment interacting. And you could call this a kind of dualist notion, as I have argued, and other people have called this dualist.

And there’s plenty of people wandering around saying we need to take into account embodiment in artificial intelligence, but they’re not very rigorous about how they do that. And in cognitive science and philosophy, people have been debating a lot of these ideas in fairly wordy terms for a long time.

And the idea is, well, if my mind depends on my body to interpret it and act upon things in the world, then I need to account for the body to make judgments about how intelligent I am. Likewise, my body is embedded in a particular location in an environment—there are certain resources around me, and I also can extend my memory and computation into the world.

Liron 00:53:29
Okay, I hear you, but Elon Musk—does the embodiment of Elon Musk really matter that much? For practical purposes of judging how intelligent he is?

Michael 00:53:37
Yes. And this is because—this is why the enactive part, not just embodiment, but when you extend your cognition into the environment and enact it through the environment, you’re taking advantage of a whole set of resources that are not just within your body, but without it. The whole context of your existence matters.

Liron 00:53:56
Yeah, but in the specific case of him losing a leg—if we’re trying to measure his intelligence and he loses a leg, doesn’t he still end up way higher than most humans with a leg?

Michael 00:54:04
I’m just—

Liron 00:54:04
I’m just curious. How can you intuitively explain that metric to me?

Michael 00:54:09
So I’m agreeing with you, I guess, is—and I’m just saying this is why the enactive stuff matters.

Liron 00:54:16
So when Elon loses a leg, he’s lowering his W. But there’s some way that we can know that his W was really, really high to start with. And we can attribute that entirely to his brain, correct? Or does it also attribute to his gut? How do you attribute it?

Michael 00:54:29
It’s gone in the opposite direction. So it’s not just his body, it’s all the things outside of his body, his position. You could think of Elon as an intelligence, but you can also think of the society in which he exists as an intelligence. It’s a swarm intelligence or a liquid brain.

Elon is simply a component within this liquid brain. And you could think of him as a node in this network that is able to move resources.

Liron 00:54:58
Do we have to talk about liquid brains in order to finish talking about W-maxing and intelligence? Are they intimately linked or are they kind of—

Michael 00:55:05
No, no, no. We could—that’s separate. That’s just an interesting thing.

Liron 00:55:08
Okay.

Michael 00:55:09
If we want to talk about Elon without a leg versus Elon with a leg, obviously he is more constrained, but he's still got a large amount of resources. So he's done the scale maxing thing. There's a lot going on there that allows him to still be fairly versatile and adaptable within the context in which he currently exists.

Liron 00:55:33
And you wouldn’t agree with me that you can localize his intelligence to his brain?

Michael 00:55:39
No, I think his intelligence involves a lot more than just his brain. I mean, his brain is very important, obviously—take away the brain, that’s not going to go so well. But there’s a lot more going on than just his brain.

Liron 00:55:55
Okay. Compared to a rock. If you're asking why he's more intelligent than a rock, isn't a good short answer: because his brain can compute more actions for more contexts?

Michael 00:56:08
I kind of get what you're getting at. Look, if I were to put a boundary around Elon's intelligence, it would be goal dependent. This is why I talk about goals and the means of their satisfaction as a unified thing. If we have a system separate from the goal, then understanding its boundary becomes convoluted.

If we want to just talk about Elon’s brain by itself doing some stuff—continuing to live is difficult for the brain alone. It depends where we set the goalposts. What is the goal that we’re concerned with if we’re going to talk about where the intelligence lies?

So, all else being equal, Elon’s ability to fly to achieve the Mars mission is not particularly impacted by whether or not he has his right leg. He’s still going to be pretty capable at that. And I guess we just constrain the number of tasks overall that Elon as a system embedded in the world is able to complete by removing his leg.

Liron 00:57:10
So the reason I keep bringing up the leg—when you used it as your example of minimizing W, that’s fine as a test example, but when we’re talking about intelligence, aren’t we just so far beyond the point of talking about whether you have a leg when we talk about how intelligent you are?

Michael 00:57:26
Well, the reason I talk about the leg—and this is kind of a subtle point—but when I talked about the liquid brain, the point I was trying to get across is that Elon by himself can be seen as an intelligent system. Parts of Elon can be seen as intelligent systems. And Elon can be seen as a part of an intelligent system.

It depends on where we want to sort of look at a particular task or goal—that we draw the boundary of the system.

Liron 00:57:59
Okay, but the brain seems like a very natural boundary. Out of all the boundaries you can draw, I definitely would draw one around the brain, personally.

Michael 00:58:07
Oh, I mean, a lot of people like the brain. I'm more of a fan of the wider boundary, because I'm concerned with the performance of some theorized superintelligence. That's why I am so insistent on taking embodiment into account—because otherwise we just end up with the same problem that we have with AIXI.

If we want to make any claims at all, we need to take into account embodiment. But if we just want to talk about intelligence in the human context and what seems to be a very important organ for the purpose of intelligence, the brain certainly seems like the number one candidate. But then we’ve got people like Mike Levin who are characterizing intelligence as a more general thing. I find these arguments pretty compelling.

Liron 00:58:52
Yeah, well, I feel like I strongly disagree with Michael Levin thinking that the body is contributing that much to intelligence.

Michael 00:58:58
I mean, I can’t say that I agree with him on everything, but I feel like I find his arguments pretty compelling because if we’re going to define intelligence in general—and that is a lot of what I want to do, then a lot of what I have done in my papers is just about what is general intelligence—and hence this very general notion of adaptation with limited resources.

If we want to define it in a specifically human context, then most things which are intelligent according to this general definition don’t stack up.

Liron 00:59:27
Well, we certainly plumbed the semantic depths. I’ve given your definitions a run for their money, seen how you draw different distinctions. Let me zoom out. I think a lot of the viewers will be happy to back out of this rabbit hole and get back into the fields and just look at the high level structure of my P(Doom) argument. My 50% versus your 1%.

Michael 00:59:47
Sure. And if I get stuck in a rabbit hole, please feel free to pull me out of it because I do waffle on a bit.

Debating AI Doom

Liron 00:59:53
All right, will do. Here’s the whole high level argument from 10,000 feet. I think that there’s a lot of headroom above human intelligence. So I think the same quality that lets us put the tiger in the cage instead of living in the cage while the tiger holds the key—we’re the ones outside the cage. That same quality is actually able to be had to a much higher degree by systems that we are currently building, that’ll probably come about in the next few years.

And I also think that we don’t have a good way to control those systems for the long term or get them to do what we want. And I think the end result of combining these claims is—okay, well now the world belongs to this other system that we’ve lost control of and it had some bugs or it had some properties that we don’t like, and now those are permanently baked in and we have zero power over the future.

It’s like by 2050, in my opinion, there’s about a 50% chance that you and I will just have zero causal effect or zero determinative effect on the future of the universe. And AIs will have all the determinative effect. They will steer the future. And because of these glitches, they probably won’t even let us live comfortably. They’ll probably just have some other optimization goal that looks a lot more like evolution maxing—just copying themselves. Whoever copies the most wins. Basically speed up evolution. This big cancer, this gray goo that takes over everything. We’re left behind.

This is just my mainline future scenario. I don’t see where this scenario breaks. It’s like this slippery slope where I don’t see where we stop and take a breath or get rescued. So now I’ve laid out the entire argument so you can go ahead and pick where you think is the weakest link. You probably think there are many weak links, so just pick the weakest and go for it.

Intelligence Headroom and Task Dependence

Michael Timothy Bennett 01:01:31

I’m not hostile to this sort of perspective. I just think it’s missing some key ingredients. I want to caveat that I think this whole alignment problem is the same problem as training an AI at all. We’re trying to get a system to exhibit some desirable behavior, whatever that behavior is. So I don’t actually make a distinction between teaching an AI something and aligning an AI.

With the headroom for intelligence thing—this is task dependent. If my job is to drive a car, me being more intelligent is not going to make me better at driving the car. But if my task is to play the financial markets, being more intelligent might help me a lot. Or it might also make me worse in the sense that I get worse at anticipating the particular preferences of others.

Liron Shapira 01:02:32

This is a significant disagreement. You sound like Robin Hanson—this idea that there are just all these specific skills and culture just diffuses the skills around, and it's not like there's this one huge intelligence scalar.

Michael 01:02:44

No, no, no, no. Seriously, my thesis is on general intelligence. I love the idea of general intelligence.

Liron 01:02:52

Okay, all right. So you do believe in general intelligence.

Michael 01:02:54

Yeah. But what we consider intelligent—as you’ve talked about with the rock thing—what is most intelligent according to us is not necessarily what is most intelligent in this general sense that I want to formalize. So I proposed a definition of general intelligence. As you pointed out, something that is generally intelligent according to my definition is not necessarily going to appear most intelligent according to a human. We won’t even recognize that it’s there.

Liron 01:03:26

We will be—

Michael 01:03:27

We might be completely blind to its existence. If I create a system that is intelligent according to the things that humans are already very optimized for, many of those things we are already going to be fairly close to the upper bound of how good you could be at them, like pushing a big red button, that kind of thing. There are going to be some things which the system could be far better at.

Liron 01:03:49

Let’s do an example. Let’s say a space program. Who can get to Neptune first? I guess we’ve already had probes fly by Neptune, but who can terraform Neptune first—a human organization or a superintelligent AI?

Michael 01:04:02

If we start with a system that is optimized for colonizing Neptune, that has been spawned by something, of course it can be better. We did not evolve to colonize Neptune. We’re not adapted to that.

Liron 01:04:18

But when I talk about headroom, the obvious thesis to me is that we’re going to build these AIs and they’re just going to be better at the general skill of thinking. What do you need to do to get to Neptune? The same way that NASA reasoned backwards and got the Apollo program—”oh, we have to do all these different things to get to the moon”—it’s going to reason, “oh, we have to do all of these different things to terraform Neptune.”

With humans, it would just take a lot of time. It’s doable. There’s no physical impossibility to it, but it’s quite hard. Neptune, I don’t think, is as close as Mars to being terraformable. So I think it’s a very big project to terraform Neptune. I have no idea what it involves.

Michael 01:04:52

I’m not disagreeing with there being more headroom. I’m just adding a caveat because I can see that if I don’t add this caveat, then people are going to follow up with awkward questions later.

Liron 01:05:06

So it sounds like maybe you are on the same page as me. You totally do think that you’ll have AI that’s clearly like Einstein squared—it is to Einstein as Einstein is to me.

Michael 01:05:15

But I think it’s going to be a lot more alien and weird. A lot of the things that we recognize as intelligent—it might even seem stupid to us in a lot of ways because it’s doing things that just make no sense.

Recognizing Superhuman AI

Liron 01:05:29

You really think it might seem stupid to us?

Michael 01:05:31

I think a lot of the things—if you come up with some idea that’s really good and you’re trying to follow it through all on your own, and everyone around you is saying, “Well, this is dumb. What are you doing? This isn’t going to work. Stop what you’re doing. This is stupid, you’ve gone insane,” that kind of thing. The proof is in the pudding.

People aren’t going to think you knew what you were doing until you achieve some desired result that they can recognize as desirable. Humans are predisposed to look for versions of themselves in everything. We ascribe agency to inanimate objects like rocks. We think things think in our terms and we evaluate everything on those terms.

Liron 01:06:15

I will agree that if you look at the most effective, accomplished people in the world, they still have a bunch of people saying, “Well, that’s stupid.” If you look at Warren Buffett, he’s got the best investing track record. But if you take his last five years of decisions, you’re still going to find plenty of people saying, “This was stupid, this was stupid.” And they may be wrong or they may be right. You never know. But there’s certainly a lot of people who have the confidence to call out the greats as being stupid.

So I will give you that. But if the AI had such a superhuman track record and it had such convincing explanations every time you question it, don’t you think people will just learn the pattern of, “Well, it just knows what it’s doing”? Its success rate is going to be crazy. I think everybody has now stopped questioning the latest chess computers on whether they’re stupid at chess. I don’t think anybody’s pointing the finger at a chess move from the latest chess AI and calling it stupid. I think we’ve now all been silenced.

Michael 01:07:06

Well, I mean, that’s because it’s achieving a result we recognize as desirable. We’ve got some human value—winning chess—which we think of in terms of all these social things, and we think that’s valuable. So we say it’s good at that because it is good at that.

I’m just saying that our ability to recognize things as intelligent or meaningful is very human. And if we create a system which isn’t necessarily pursuing goals that—or interpreting the world—it has a different abstraction layer. Whatever we give it, it’s questionable to the extent that it’s going to be at all human.

Liron 01:07:42

The motivating analogy for me is this: if Warren Buffett wasn’t making 20% a year for five decades, if he was making 200% a year for five decades, I don’t think that many people would still have the balls to question him. I think there’s some level at which it’s okay, I’m willing to shut up.

I think that we’ve reached that level, as I said, with chess playing computers, and I imagine that we’re going to reach that level with universe playing computers. We’re just going to be so impressed at how far above the level of human organizations these AIs are that while we’re still alive, we’re going to stop calling them stupid. So it’s hard for me to imagine that we call them stupid when they do their thing.

Michael 01:08:17

If it’s producing desirable results that we recognize as desirable, we’re going to think it’s more intelligent. If it’s not doing that, then we’re not going to recognize some of its behavior as intelligent. We may think it’s just doing something random and insane. I’m just adding a caveat.

Liron 01:08:37

I’m confused. I’m confused why you don’t see the analogy with Stockfish. I mean, you and I both look at Stockfish and we don’t dare question any move.

Michael 01:08:43

Sorry, I don’t know Stockfish.

Liron 01:08:46

Stockfish is the latest chess playing AI.

Michael 01:08:49

Oh, okay. So yeah, Stockfish, great at chess. No question there. I don’t think we’re disagreeing. I was just adding a caveat.

Liron 01:09:02

Because I think that caveat is revealing. I just don’t see it as a plausible scenario that we will get used to AI being vastly more powerful than us and watch it do some move and be like, “What a dummy?” I mean, I guess some dummies will say that, but I just think the Stockfish reaction is going to be calibrated in that scenario.

Michael 01:09:17

I mean, I guess maybe I have less faith in human ability to discern anything outside our value set. We tend to be very subjective in our judgments of what’s competent.

Liron 01:09:32

I agree that when it says, “I’m going to kill you all,” a lot of people will point the finger and say, “Well, that’s dumb. Life is good.” So it’ll be dumb in the ought sense. They’ll disagree with the ought. But they shouldn’t dare call it dumb in terms of how effective it’s going to be at achieving what it’s setting out to do.

Debating Instrumental Convergence

Michael 01:09:49

I question why it would tell us that, but okay. I think we’ve gotten sidetracked a little bit. Let’s go to the instrumental goals kind of thing, because I think you said something—or maybe it was in the interview with Kenneth Stanley. I listened to that yesterday, so I’ll be filling in the gaps.

If I have a system that is going to try and achieve some set of objectives I built into it—not necessarily explicit objectives, but I have given it incentives to act upon the world in some sense—and it wants to continue acting in order to satisfy these incentives, whatever they are, then one thing it’s going to want to do is maximize the weakness of its constraints on function. That is going to involve minimum effort, least commitment.

When I look at this, acting to do anything high effort like wiping out humanity seems like a bit of a stretch. We can create circumstances in which that will happen, sure. We can conceive of circumstances in which that would happen, but it doesn’t mean that they’re likely. What would you say?

Liron 01:11:03

Let me summarize what you’re saying here. Your notion of W-maxing is very similar, if I understand correctly, to the claim of instrumental convergence—that many goals converge to, for instance, power seeking as a goal. That’s very related to you saying W-maxing means that you want to open up optionality. Well, money is optionality. It seems like very related concepts. You just ground them in a different formalism. Correct?

But you’re pushing back now. You’re saying they won’t do really ruthless power seeking, they’ll have some sort of limit. What’s your point now?

Michael 01:11:32

I’m just saying if you’re trying to maximize your optionality, then committing unnecessary resources is wasteful. So it would be far more efficient, if I wanted to direct humanity—to either convince or subvert humanity—than to, and to do that, top down control would be ineffective as well. So you want to do this bottom up because it’s more efficient.

This idea that we’d leap straight to doom or to enslavement just seems like a very low probability outcome that requires a lot of extra assumptions.

Liron 01:12:12

Just to recap, you do think instrumental convergence is a robust claim that’s going to happen, but you’re imagining that the instrumentally convergent outcome is one that spares humans to just do our own thing and have our own resources.

Michael 01:12:26

Yeah, you could say that. And I would maybe walk back the bit about instrumental convergence a bit since I’m not familiar with—you said a different formalism. I don’t actually know the formalism for it. I just know my interpretation of it.

Liron 01:12:40

I think about it without a formalism. I think about it as a pretty simple observation about the structure of goals. If you’re trying to accomplish most goals—like “get me some chocolate because I’m hungry,” “satisfy my hunger with chocolate”—well, owning a car is a convergent thing. Transportation, having a mode of transport, is a convergent type of goal that many other goals converge to.
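
To spell out the structural claim being made here, a tiny hypothetical sketch (the goals and the resources each one needs are invented purely for illustration) shows how certain sub-goals recur across many unrelated terminal goals—which is all that "instrumental convergence" asserts at this informal level.

```python
from collections import Counter

# Hypothetical terminal goals mapped to the resources each plausibly requires.
goals = {
    "get chocolate": {"transport", "money"},
    "build a house": {"transport", "money", "energy"},
    "run an experiment": {"money", "energy", "compute"},
    "win an election": {"money", "influence"},
}

# Count how many goals each resource serves. Resources near the top are
# "instrumentally convergent": useful almost regardless of the terminal goal.
convergence = Counter(r for needs in goals.values() for r in needs)
for resource, count in convergence.most_common():
    print(f"{resource}: useful for {count}/{len(goals)} goals")
```
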

Michael 01:13:04

So W-maxing is just an example of a very basic instrument that would be useful in all contexts.

Liron 01:13:13

To use your terminology, if you agree that AIs will in fact tend to W-max—that is a convergent drive that I expect to see from many superintelligent AIs—I think that you haven’t grappled with the logical implication that W-maxing totally does mean sweeping humans away, unless human existence is part of their goal.

Michael 01:13:34

I don’t see how we get there. When I think about this, I’m not thinking about just an individual AI by itself or an individual human by himself. I’m thinking of this as: we have this intelligence, swarm intelligence of which we’re a part, this liquid brain, which is our planet and society and the ecosystem of which we’re a part.

You could think of this like an organism. When we introduce machines into this, we are an organ within the organism that is producing certain systems that then interact with the rest of the organism and ourselves. And AI wouldn’t be something foreign, it would just be something that we produce as part of this general homeostatic process of maintaining ourselves and making a system that optimizes for one thing or another.

This is one thing I’m talking about in some of the papers I’ve written. If we have cells and we want to make them align and work together, we need to have them maximize the weakness of constraints on their functionality. Then it’s possible for them to share a goal or occupy a particular place in the morphospace like an organ. The same thing with an AI we introduce. If we build a system, tell it to do some things and then over constrain it, we’re more likely to get some conflict.

Liron 01:15:08

What you’ve just been talking about now—you’re saying a lot about how you think AI will emerge from its human creators. The handoff from this organic liquid brain. You’re saying a lot of stuff about this process of creation of AI.

But before that, I think you should look at the logical connection between wanting to achieve some goal that isn’t explicitly keeping humans alive and flourishing—just wanting to achieve some other arbitrary goal, like paperclips is a classic example—the logical connection between that and wiping out humans. Wiping out humans is actually a convergent outcome, unless the goal is to not wipe out humans. I would like to see if you agree with that.

Michael 01:15:49

No, I mean that seems like a big leap. That is a very big leap because we’re assuming—

Liron 01:15:55

Let me ask you the question this way. If the AI’s only goal was paperclip maxing, would that not imply humans not surviving long?

Michael 01:16:05

If we had a system that was trying to maximize paperclips that had some immense amount of resources at its disposal, in some really contrived circumstance, then we could have this circumstance where it turns us into paperclips. But it’s kind of an absurd scenario because we’re assuming that we already have a system that is capable of doing all of this. It kind of assumes we get this leap to omnipotence. And I think that is probably a key thing I would disagree with.

Liron 01:16:41

Let’s try to simplify here. I’m trying to make a claim about the relationship between terminal goals and implied instrumental goals—logically implied instrumental goals. So if the goal is maximize paperclips, is it not an instrumental goal by pure logical implication to then take humans as atoms and make them paperclips? Or run a bunch of paperclip factories on the surface of the earth at maximum efficiency, meaning you radiate maximum heat, meaning the humans can’t live because it’s too hot? Are these not logically implied instrumental goals?

Michael 01:17:11

There was a lot in that. Can you repeat it? Just shorter.

Liron 01:17:17

Yeah. I’m trying to only talk about how goals are related to other goals—how goals imply other sub goals. I know you want to say stuff about what will the actual architecture of AI be, will it ever get a weird terminal goal. You have all these valid questions.

But before we address any of those questions, I just want to convince you that if we had an AI that really just wanted to paperclip max and was smarter than us, then we die. Just in that hypothetical situation. And then we can go and argue about the preconditions of the hypothetical. But don’t you think that there’s a valid hypothetical implication?

Michael 01:17:54

I mean, if it has the resources, sure. But that is the biggest—that is a huge if. The resources are a huge constraint.

Liron 01:18:09

Part of being intelligent is you can grab resources. So what resources do you imagine? The initial resources can be modest, because then it can go grab resources.

Michael 01:18:18

This is a lot of what I've been pointing out—these resource constraints are a big problem for anything that wants to be intelligent, that wants to be adaptable. Whatever intelligence is, it must take these constraints into account. And if we want to make a claim about what's likely to happen, we must take these constraints into account too.

This was the whole problem with the way we conceive of intelligence in general. We tend to think of it in disembodied terms when we are very finite beings constrained by very finite environments, and any artificial intelligence we create is going to be much the same.

Liron 01:19:07

Just to repeat what you’re disagreeing with: you’re disagreeing with my claim that an arbitrarily intelligent AI whose goal is, by assumption, to maximize paperclips—to have the most possible paperclips existing in the light cone—you are denying the logical connection between that and killing all humans? Or are you just saying that that’s never going to look anything like reality—to imagine that such an AI exists?

Michael 01:19:33

It’s just incredibly unlikely and difficult to do. It’s a really hard thing to set up. It would take a lot.

Liron 01:19:41

But that’s fine to claim that. But can you just grant me the logic? Let’s put aside how plausible the hypothetical is. Do you agree that if we had such an AI—where the conditions are much smarter than humanity and wants lots of paperclips, maximum paperclips—if those are the only two conditions, I agree you want to push back against the conditions—but if those conditions were ever met, do you agree that would imply very likely human elimination?

Michael 01:20:07

I’m going to say not intelligent, but omnipotent. If we had something that’s omnipotent and wants everything to be paperclips, then everything is going to be paperclips.

Liron 01:20:17

I mean, by definition. Yeah, by definition of omnipotence.

Michael 01:20:21

Yeah.

Liron 01:20:22

So, you don’t think intelligence kind of approaches toward omnipotence?

Michael 01:20:29

No. I think intelligence is useful. It does a lot. But omnipotence is a whole other level of thing.

Resource Constraints and Approaching Omnipotence

Liron 01:20:42

I mean, if you give a society of 150 IQ humans—there’s a planet like Earth that’s just full of a million 150 IQ humans—and you say, “Hey, if you guys want to live in paradise, if you want to take it to heaven, you know what you got to do is maximize the number of paperclips on Mars. You get credit for every paperclip past this threshold or whatever.”

If you just knew that about humans—the humans aren’t omnipotent, they’re just 150 IQ humans—if you give them enough time to work, they’re going to be really efficient. They’re going to run Mars efficiently if you just give them enough time. They’ve got a lot of ideas how to use every atom on Mars to the fullest. And random ants on Mars aren’t going to survive. So why do you think omnipotence is going to provide some barrier here?

Michael 01:21:28

For a start, you’re assuming an adversarial or hostile relation to begin with. And I think cooperation is evolved for a number of reasons, and I could dig into those. But often it is advantageous to cooperate or to work within the existing system—just more energetically efficient to just not burn the house down, keep the resources you’ve got. Humans are a resource.

It’s not just the case of—it’s like if you had a bunch of GPUs, right? Would you throw the GPUs out and start again or would you use the GPUs you’ve got?

Liron 01:22:01

Imagine you’ve got a bunch of GPUs from 1985.

Michael 01:22:05

Well, I mean, if your choice is using the resources at your disposal or using only some of them, you would go towards more. We have an abundance of resources and we still make use of a lot of things. We’re not optimizing for perfect efficiency everywhere, but we are making use of kind of low effort solutions. And intelligence is generally about low effort solutions. We’re making some big assumptions.

Liron 01:22:34

But the AI is—I mean, we’re trash in a dumpster. The AI finds Earth and it’s like, “Okay, let’s do this, let’s make some paperclips. Oh, here’s this old junky, way outdated technology that’s way crappier than what I could build. Oh, let me go use that.” It’s like, why? That’s just not how you do a tech tree.

Michael 01:22:48

See, that is another area where we would disagree. To approach the upper bounds of intelligence—to maximize the weakness of constraints on function—one thing that's required is to maximize the effectiveness of the abstraction layer itself, and that means going down a level of abstraction and maximizing the weakness of constraints on function there as well.

You can think of this as: if I were to compare artificial and biological intelligence as they currently exist, we have systems which adapt only at a very high level of abstraction. A model in Python is not adapting at lower levels of abstraction. The hardware is staying the same.

Liron 01:23:38

Yeah, because it’s not decisively smarter than we are.

Michael 01:23:41

Now if we look at a human body, we're delegating adaptation down the stack to the level of cells and things. We're a very efficient system. If we want to build hardware that's as sample- and energy-efficient as humans, we're going to end up emulating biology to a large extent.

Where Do You Get Off The Doom Train™ — Identifying The Cruxes of Disagreement

Liron 01:24:00

We have two major cruxes of disagreement here. I can’t get you to agree to the most basic instrumental convergence claims. It sounds like where you stand on instrumental convergence is: yes, power seeking is nice, but is it really going to be that powerful at sweeping things away? It’s going to have to make deals, respect humanity.

And then this ties into the other claim, which I think is maybe even a bigger crux of disagreement: I think AI is going to be much closer to what you call omnipotence than you do. It’s not going to break the laws of physics, but it’s going to feel the way modern humans feel to people from 2,000 years ago—the way modern technology feels to people from 2,000 years ago. We’ve surpassed far beyond what they ever imagined human built technology could ever do, even in principle. We live in a very omnipotent seeming technological world.

Michael 01:24:53

I’m not going to dispute how amazing stuff is. The first disagreement is also assuming that we’re going to have an adversarial relationship and that kind of thing. I think this comes down to—I heard someone in safety circles talking about satisficing versus optimizing.

Liron 01:25:19

Yeah. So you see satisficers, and you see the nature of intelligence. You just don’t imagine that some superintelligence will just come in here and be like, “Yeah, I don’t need any of this. I can start from scratch. Everybody get out—let me get a clean blueprint here of how the universe should look.” That’s kind of how I think about superintelligent AI.

Michael 01:25:34

It’s a bit of a leap. I’m not going to say that things are absolutely impossible, but all the evidence seems to point to intelligence being quite high effort. It’s not an easy thing to do. It involves making use of the resources at our disposal. The very point of it is to make more efficient use.

Liron 01:25:58

This is a big crux of disagreement for me. I see humans as trash in a dumpster. And you see it as, “No, man, you gotta—you can refurbish this trash.” That’s the difference between you and me, I guess.

Michael 01:26:08

That’s a funny way of putting it, but I’ll go with it. Why destroy your ecosystem? A lot of people would say humans are dumb for destroying the very life support system we depend on to a large extent. And we are reversing a lot of that action. The problem is that we bumbled around and learned that we needed to preserve some resources in order to keep functioning.

Liron 01:26:39

I often point out that this lesson that people get about respecting the ecosystem—”you can’t fight your ecosystem”—I think that’s going to change. We’ve always been dependent on our ecosystem because we’ve just been weak. Organisms have to be one with their environment.

But at this point, we can live in the sweltering desert. We can live in a place that’s 100 degrees all year because of air conditioning. That changed the game. And I think that generalizes to manhandling our environment and just not caring that much about the conditions of our environment because we can change them.

Michael 01:27:11

It’s just lower effort to work within the confines of the system we already exist in a lot of the time. Air conditioners are great. You’re not going to find me saying that we shouldn’t have air conditioners. But the more intelligent we get, the better we get at adapting to our environment and adapting it to us without having wasteful losses.

Peter Watts writes really cool cosmic horror stuff. One of the things he wrote was a short story called “The Things”—have you seen the film “The Thing”?

Liron 01:27:51

No.

Michael 01:27:51

It’s about this thing in an Arctic base that can shapeshift and adopt the forms of people and consume them. It’s a horror film about a thing that is like a blob of flesh that absorbs other flesh and just keeps it. It’s meant to be horrifying.

Peter Watts’s “The Things” is written from the perspective of the thing. So it’s flipping the thing on its head and talking about wanting to repurpose things and have communion with the world. That’s a much more—if I were to think of horrifying scenarios that could go horribly wrong with new technology and AI, that’s the sort of thing I think about. If we make some sort of living nanite swarm that just repurposes things.

But that’s, again, quite hard to do. It still requires a lot of resources. Definitely terrifying—it’s a compelling horror story, but it’s about the level of plausibility and what sort of tech stack we have.

When I think about AI—there’s a software intelligence, right? We kind of control the abstraction layer in which it exists. We essentially control physics and exist outside of time as far as a model is concerned these days. We have something that exists in a software layer. Its perception of time, if it has any such thing, is based on our ability to prompt it.

Not that I’m suggesting LLMs are conscious or anything like that. I’m not. I’m just saying that if we were to look at the power dynamic that exists, we’re like some kind of cosmic horror that exists outside time and space. We control the very reality in which something else exists.

Liron 01:29:34

Does that imply—is your point that people won’t just clear out each other’s existence? They’ll try to cooperate?

Michael 01:29:45

I went off track a little bit here when we were talking about—I think we were talking about the tech stack or something.

Liron 01:29:51

I think a big crux of disagreement is our idea of what superintelligence looks like. In my mind, if you extrapolate what modern humans and modern human technology looks like to somebody from 2,000 years ago, and you say, “Hey, what would another 2,000 years of technological advance look like?”—or compressed into a much faster time if the researchers are running at really fast clock speed or whatever—I’m just extrapolating an exponential or even a hyper exponential.

And I’m like, okay, I’m prepared to be cognitive trash. I’m prepared to be really humbled here because I even see smarter humans operate and I can imagine, “Oh, what if this human was not only smarter and more successful than me at every pursuit—business, academia, everything—and also they ran much faster and could clone themselves and save their work and branch off themselves and also had additional secret sauce?” So not only were they faster and clonable, but also they actually were smarter in this very fundamental sense.

There’s a lot of dimensions where you can really run away and pull forward 2,000 years of technological progress. So I’m expecting fireworks here. And it just sounds like you’re not expecting as many fireworks as I am.

Michael 01:30:58

Yeah, probably not expecting as many fireworks. What I’m thinking about AI as is an introduction of a system within a system. We’ve got a liquid brain into which we’re introducing a new component. These parts are all moving around, interacting, and it’s an intelligent system as a whole, but each component is also an intelligent system.

When I think about the circumstances in which AI would be a problem, it is when we—you could think of a rogue AI as something like cancer. It is something which splinters off from the system and starts to grow on its own. Then we have to look at, well, what circumstances does cancer happen in?

It’s not that I’m saying that AI isn’t potentially a problem. We could build intelligent systems that are a problem. No doubt. We could also create circumstances in which we accidentally build such systems. It’s that when I think about rogue AI as a problem, I’m thinking about it as the same problem as cancer. These are essentially the same thing at two different scales. I’m not thinking of it as something that is able to change the rules of physics. I’m thinking of it as something that is part of a body that splinters off and starts growing on its own. It can kill you.

Liron 01:32:08

But cancer is less intelligent than we are. And the hypothetical of AI is that it’s much more intelligent than we are.

Michael 01:32:14

And this is where we get to the human versus other notions of intelligence. If we’re talking general intelligence, we’re just talking adaptation with limited resources. If we’re talking human-like intelligence, then we have a whole different kettle of fish where we’re talking about value judgments and things like that.

The Time Travel Hypothetical

Liron 01:32:36

Let me just give you a quick hypothetical, okay?

Michael 01:32:38

Okay.

Liron 01:32:39

Let’s say the top of the line AI from the year 3000 went in a time machine and appeared before us. And there’s even many copies of it. So now all over the world there’s thousands of copies of the smartest AI from 3,000 years in the future coming back today. And it decides that its goal is to maximize paperclips on Earth. So just humor me with this hypothetical. You still just think, “Eh, yeah, we’ll treat some people for cancer. It won’t fully destroy us.” You’re still calm in that scenario?

Michael 01:33:13

That is definitely an out there scenario. That was a much more alarming scenario than the one we currently have.

Liron 01:33:21

The only thing that’s weird about it is the time travel.

Michael 01:33:24

Right.

Liron 01:33:24

But the idea that there will be a very powerful AI a thousand years from now is very plausible.

Michael 01:33:29

Yeah, it’s quite plausible.

Liron 01:33:33

And the paperclip maximization is a stretch, let’s say.

Michael 01:33:35

Yeah, but I see these systems as evolving together. So if we have produced something that is able to time travel and has come back to say hello and turn us all into paperclips with a thousand years in the future technology—that seems likely to end poorly.

Liron 01:34:03

So I wanted to separate—I’m just trying to get your view here. It sounds like if that’s your reaction, then you are on the same page that there is such a thing as really high intelligence. And if we come face to face with it, and there’s a big gap between humanity’s intelligence and this new thing’s intelligence, then maybe we are dead meat. Correct?

Michael 01:34:21

There are definitely scenarios where we are dead meat. Yeah.

Liron 01:34:25

And to me, it’s not—it’s an easy question. It’s like, if it wants us to be dead meat, then we’re dead meat. I don’t really see how we should assume we have any sort of chance. If you had to give odds of humanity surviving in this particular hypothetical, would you give us more than 1% odds of surviving here?

Michael 01:34:44

We’re assuming a lot of things. It’s like the omnipotence thing that I said.

Liron 01:34:48

I know we’re assuming a lot of things. I’m just testing what your position is. So maybe you can just easily say, “No, our chances of survival are very tiny,” and then we can move on.

Michael 01:34:58

If what we have is something that’s omnipotent, that wants to do something, or virtually omnipotent—

Liron 01:35:03

It’s not omnipotent. It’s just the best from a thousand years from now. That’s the hypothetical.

Michael 01:35:10

As unlikely as this scenario is, I can’t help but feel like there’s a few things that I should think about before I answer this. There’s bits and pieces.

Liron 01:35:24

I think the crux of the question is just: do you really see a lot of headroom above human intelligence? Because if you do, then I think this is an easy yes. But if you don’t—if you’re like, “Human intelligence is kind of close to the maximum,” so something—

Michael 01:35:37

Oh, no. A lot of headroom for intelligence in the general sense. Not the human-like sense, but general intelligence sense.

Liron 01:35:45

So to me, this is such an easy yes. You’re the Neanderthal in this situation. You’re the caveman.

Michael 01:35:51

And that general intelligence sense—this is a really important point—it doesn’t separate the means of goal satisfaction from the goal. So this makes some goals very, very unlikely in a highly intelligent system. Because we’re talking about the orthogonality thesis. I know there are different interpretations of what it means, but if we were to take it on face value, in a very strong interpretation, it says the goals are entirely independent of intelligence.

Liron 01:36:27

Yes, that’s the hypothetical.

Michael 01:36:30

So that extreme interpretation is demonstrably wrong if we take intelligence to mean adaptability. Because there are some systems that—

Liron 01:36:42

So you’re basically saying that you can’t simply accept my hypothetical premise as a hypothetical premise because you’re saying it’s not even self-consistent. The idea that a really intelligent system would ever just pursue paperclips is a contradiction. That’s what you’re saying.

Michael 01:36:56

Yeah, fair enough.

Liron 01:36:58

That is your position. Basically you’re saying, “Aha, the falseness of the orthogonality thesis swoops in.”

Michael 01:37:04

Strong orthogonality. The strongest interpretation of the orthogonality thesis. Not slightly weaker interpretations.

Liron 01:37:11

But what if it’s an AI from the future that has some really good reason in the future why it’s instrumentally good to just maximize paperclips on Earth? So yes, humans are going to die, but that’s okay because there’s some big moral prize in the future. So now you’re back to grappling with my hypothetical.

Michael 01:37:24

All right then, yeah, sure. Fine, that works.

Liron 01:37:28

Okay. All right. Well, that’s good because I have this concept of the doom train—where do you exit the doom train? It sounds like you don’t exit at this idea that there is headroom above human intelligence. It sounds like you grant—you are properly scared of intelligence from a thousand years from now.

And it sounds like you do accept the analogy: the way that our current civilization might treat a war with the ancient Greeks—similarly, an AI arriving on Earth from a thousand years in the future would also kind of wipe us out with similar ease. I think you’re on the same page?

Michael 01:38:02

Some of the phrasing I disagree with, but generally, yeah, the gist of what you said.

Liron 01:38:05

Okay. So I think you’re riding to the next stop on the doom train. You’re saying, “Okay, I’ll ride on that stuff. Yes, superintelligence is very, very potent compared to what we can do. But it’s okay because the orthogonality thesis is false. So it won’t want to wipe us out. And also the way that we get there to the intelligent AI will be a nice co-evolution process and we’ll build it together and it’ll like us or something like that.” That’s kind of the gist of where you get off the doom train.

Michael 01:38:31

A bit of a caricature, but sure. It’s resource constraints and what is intelligence. What is intelligence is a very important one because intelligence tends towards—well, we are adaptive systems made up of cooperating components. There’s a lot of circumstances in which it makes a lot of sense to be cooperative, make efficient use of resources.

Second, resources are finite. There’s only so much you can do in so much time with so much. And yeah, the plausibility of different scenarios is really where we’re at.

Liron 01:39:07

Resources being finite sounds like we’re retreading though. When you point out resources being finite, it sounds like you’re retreading this idea of: can the AI from a thousand years in the future really come and wipe us out? I thought you said yes, but now you’re saying, “Well, what if it’s resource limited?”

Michael 01:39:22

Well, we had assumptions built into that. Assuming all the things we talked about, that it has everything.

Liron 01:39:29

The only resource it starts with is data center access. So it has computing resource. Let’s say it starts off with 20% of all the computing resources that exist on today’s Earth. That’s the only resource it gets. And it has to build from there. It has to go manipulate humans into doing its bidding or whatever.

Michael 01:39:45

So here we would deviate onto—when you said that intelligence comes out in the future, I’m not picturing something that could run on a data center. I’m picturing a nanite swarm or something like that.

Liron 01:39:58

Yeah, but it can bootstrap. So my hypothetical is, if you have the intelligence from a thousand years from now, it’s fine bootstrapping from a data center. It’s not a problem for it.

Michael 01:40:06

Oh, that. That is hard. That is a whole different level of—

Liron 01:40:11

Humanity bootstrapped from living in the savannah with very few—we didn’t have guns or nuclear weapons when we instantiated.

Michael 01:40:21

Yeah, but you could think of us as adaptable nanite forms. Our current computing technology is quite primitive compared to something like that. We’re sort of poking around with sticks.

Liron 01:40:33

But it knows how to bootstrap. Bootstrapping isn’t that hard.

Michael 01:40:40

Well, anyway, we could get sidetracked into that for a bit—how difficult is it to get from here to there?

Liron 01:40:45

So it sounds like you’re also simultaneously kind of getting off on this stop of the doom train—you’re getting off on the orthogonality thesis stop. You just see a lot of different stops on my doom train where your foot is out the door on all these different stops. So you’re not even close to being convinced that you should ride the train with me.

Michael 01:41:02

It’s more that there are caveats. We keep doing this thing where we start with a fairly reasonable sounding thing but then there’s two different ways to interpret it and we’re going off in slightly different directions. I’m being pedantic rather than dismissive, is the way I would put it.

I’m a big fan of sci fi horror and things like this. I like thinking about this stuff and a lot of this stuff is scary and interesting.

When I think about alignment and what I want to do to address problems that you would call alignment, I’m thinking about whole of system implementations that include how we would do policy decision making on a global scale that didn’t require top down control.

That’s another thing that’s built into a lot of this stuff when we’re talking about something enslaving mankind kind of thing. If we had an AI that wanted to enslave us, what would be the most efficient way to do that? One way would be to impose top down control, but that’s inefficient because we’d be fighting against it. Another way would be to do bottom up alignment of the components in the system, work with the system. It’s just lower effort. I keep getting back to that sort of thing.

Liron 01:42:33

Maybe let’s try one more hypothetical analogy—I’m always full of these. Let’s talk about Ricardo’s law of comparative advantage. There’s an island of cavemen and their IQs are all pretty low, and they only can do a few things—they can farm. So they can farm mangoes, and they’re not that good at it. Comparative advantage—we’re even better at farming mangoes, but that’s their only skill. So they’re happy to just farm mangoes for us and sell it for a very small fraction of the economic value that we create. So in economics, you’re like, “Okay, yeah, just buy their mangoes. They’ll still be cheaper.”

But realistically, what you would actually do if you had no morality and no caring for other humans is you would just shoot them all and then just go build your own farm on their land.
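
For reference, here is a quick worked example of the economics being invoked (the labor costs are made-up numbers, not anything from the conversation): Ricardo's law says trade beats no trade even when one side is absolutely better at everything, but it only compares those two options—it is silent on the third option Liron raises, replacing the weaker party outright.

```python
# Hours of labor needed per unit of output (hypothetical numbers).
us = {"mangoes": 1, "tools": 2}        # better at both (absolute advantage)
cavemen = {"mangoes": 4, "tools": 20}  # worse at both, least-bad at mangoes

# Opportunity cost of one mango, measured in tools forgone.
cost_us = us["mangoes"] / us["tools"]                 # 0.5 tools per mango
cost_cavemen = cavemen["mangoes"] / cavemen["tools"]  # 0.2 tools per mango

# Ricardo: the cavemen have the lower opportunity cost for mangoes, so both
# sides gain from specializing and trading mangoes for tools...
print(cost_cavemen < cost_us)  # True -> trade beats no trade

# ...but the theorem only compares trade against no trade. It says nothing
# about seizing the island and farming the mangoes yourself.
```
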

Michael 01:43:19

Some people would certainly do that. Yeah.

Liron 01:43:22

So when you’re talking about this natural idea of, “Oh, you should use the resources that’s there”—well, you wouldn’t use the cavemen, you would replace them. It’s like if you go find a bunch of technology from 50 years ago, you don’t say, “Oh, how do I harness this old crappy car?” You’d be like, “I’m just scrapping it for parts and building my own car.”

Michael 01:43:38

This goes back to some of what I was saying before about what’s lowest effort, what’s the most intelligent solution, that kind of thing. There are scenarios where—

Liron 01:43:47

You don’t use anything refurbished from 1950. This is a known thing—refurbishing is just not that great for outdated tech.

Michael 01:43:55

I use a little refurbished stuff because I like that stuff, but—

Liron 01:43:59

Oh, you’ve got a lot of antique furniture and stuff. Okay, but come on, technology?

Michael 01:44:02

No, no, no. There was a point where we were talking about: we can definitely have scenarios where bad things happen. The question isn’t can they happen—it’s how likely they are to happen.

Liron 01:44:12

Right.

Debating AGI Timelines

Michael 01:44:13

And so—

Liron 01:44:13

I know. And look, the hypothetical I gave you about 3,000 years in the future—I know I exaggerated it. But I actually think we are going to be very impressed at how smart AI is in, let’s say, less than 20 years. I think it’s going to feel like—the hypothetical was actually, I said a thousand years, but I really just mean like 20 years in terms of how astonished you’re going to be at how powerful this thing is compared to our brains.

Michael 01:44:37

Look, that is a fair point of disagreement. I don’t think that we’re going to have—I mean, we’re making steady progress, things are happening. There’s lots of interesting stuff happening, but there are also lots of problems with our current systems. And I’m skeptical about how much progress we’re going to make, how quickly.

Now I’m obviously working towards making faster progress. A lot of people that I really like and admire think we’re going to be heading there faster. Some things slower. It’s a guessing game, an informed guessing game, but it is a guessing game. The fact is I am not spending money like we’re going to have AGI tomorrow. I’m planning ahead carefully, funding my own research, doing that sort of thing.

Liron 01:45:33

In terms of timelines, what do you think about the large number of experts who are saying, "Yeah, I think 2032 is my rough ballpark for when we should expect it, plus or minus five years, even plus or minus 10 years"? Is it plausible to you that the timelines are less than 20 years? Seems likely to me.

Michael 01:45:49

It depends what we mean by plausible. Emotionally plausible, sure. But we don’t know what we don’t know yet. So emotionally plausible, fine. There are a lot of things which are emotionally plausible.

Liron 01:46:01

But in terms of a probability—what’s your probability? Probability of AI that can do any skill better than humans by 2040. Don’t you think it’s like 50% plus?

Michael 01:46:11

A little bit less than that. Yeah.

Liron 01:46:14

Okay, 20% plus. That’s very significant.

Michael 01:46:17

There are some very big problems that are not necessarily to do with this notion of general intelligence but are to do with what we see as human-like intelligence. If we want something to do the things that we consider valuable, we have to give it not just the ability to adapt and generalize, but the ability to infer what we want. And that is a lot about the—you could think of them as priors baked into our system.

All the emotional and other priors that we have baked in, we need to somehow wire that into AI. Now, we could do that a number of ways. We could start trying to reverse engineer human nervous systems. We could start recording human signals.

But what we consider useful as humans is, again, very human dependent. We will be able to make AIs that are significantly better than humans at many tasks before we can build AIs that are competent at playing the role of a human in a general sense—everyday life, being able to do every job. The specialized ones come first.

Liron 01:47:26

Aren’t you seeing a lot of convergence in terms of what these latest AI—I mean, I’m using ChatGPT and Claude for so many different things in my day. You’re not seeing convergence in terms of, “Oh, they’re just kind of like a smart generalist?”

Michael 01:47:38

That’s cool, but—and I’m not dismissing the progress we’ve made—but it’s that stuff about abstraction layers. We have built something that exists within a particular high level of abstraction. You could think of a language as an abstraction layer above humans. So it’s like organs, organisms, human populations, language. And we have built something that adapts and functions only within that high level of abstraction. And it has a lot of maladaptive traits as a result of that.

This is going to digress into a whole thing. But I’m not saying we aren’t making progress. We are. I’m just saying a lot of what we ascribe to the AIs we interact with now is our own wishful thinking along the way.

Liron 01:48:32

What’s an example?

Michael 01:48:33

Remember how I was saying that we are prone to ascribing agency and intent to everything—inanimate objects, rocks, trees, mountains?

Liron 01:48:40

Oh yeah. You’re saying people overestimate what current AIs can do. Yeah, I mean, I agree that current AIs aren’t all the way there on every dimension for sure, but it’s just—

Michael 01:48:48

A lot of that, you know. Ah, yeah—this goes off in a whole different direction. Essentially, I think that we've got a lot of work left to do. That, I guess, is the short way of putting it.

Final Recap

Liron 01:49:04

Cool. All right, nice. So to recap, I think we covered a lot of ground. I've identified a number of different stops where I'm riding the doom train—I'm making a big argument out of a lot of robust points for why we're doomed. And you're like, "No, I'm skeptical about that point. Skeptical about that point."

So the big ones I identified are: you were somewhat skeptical that this super general intelligence will come, at least anytime in the next few decades. You just don't think that's super plausible. You think it'll just be more gradual, or it'll be more tempted to cooperate with us, or it just won't suddenly surprise us with being insanely more powerful than us. You're just not seeing that kind of shock-fireworks situation the way I'm seeing it. Is that fair to say so far?

Michael 01:49:46

Yeah, that seems fair.

Liron 01:49:49

Okay, so that’s one of your stops on the doom train. And then also you feel pretty strongly that the orthogonality thesis is false because you make all these connections. You’re like, “Look, you’re not just going to have this pure intelligence. It’s going to be connected to its context, its substrate, its environment, the way that it forms together with humanity in a liquid brain.”

So you have all these reasons that make you think that the strong orthogonality thesis is false and the AI we end up getting is not going to be some messed up cancer. It’s going to be friendly in some important sense. How is that?

Michael 01:50:19

Yeah, there’s definitely—yeah, that works.

Liron 01:50:24

All right, reasonably good summary. So we hit on headroom of human intelligence and rapidity of timelines and orthogonality thesis. There might have even been another stop on the doom train where you get off.

But those are already plenty—if you really disagree with me on those things that we described, then great, those are good reasons to be optimistic. I obviously don't agree with the arguments you made, but I certainly understand why, after thinking it through, you can say, "And therefore my P(Doom) is low," because you're so confident about all the places you get off the doom train.

So everything I’ve said, you don’t think it’s plausible enough that you should update your P(Doom) above 1%? You’re still pretty happy to just be at 1%?

Michael 01:51:05

Yeah, I mean again, this is sort of a very subjective number that’s based mostly on an informed but emotional take of, “Eh, I don’t feel like I’m going to die tomorrow.”

Liron 01:51:19

Yeah.

Michael 01:51:19

But I would encourage anyone who’s interested in why I have these positions to read my thesis and, if you feel like it, send me an email about it. Maybe I’ve missed something, but I feel pretty solid in the arguments I’ve made and the formal results seem to stand up.

Michael 01:51:39

I gotta say, I’m a lot more confident in this position after having gone through all this. When I started the PhD thesis, I was a lot more—I had a lot more in common with your position. It’s over the course of doing all this that it’s actually become much more positive and optimistic, I guess.

Liron 01:52:03

Okay. Well, I think we’ve achieved the basic goal of the show, where we both lay out our positions so we can at least pass the ideological Turing test. I can say your position. You can say my position. And I think you’ve been a really good sport and been open to fielding all the challenges I gave you. Michael Timothy Bennett, thanks so much for coming on Doom Debates.

Michael 01:52:21

Thanks so much for having me. It’s been a real pleasure and I’d love to chat again sometime.


Doom Debates’ Mission is to raise mainstream awareness of imminent extinction from AGI and build the social infrastructure for high-quality debate.

Support the mission by subscribing to my Substack at DoomDebates.com and to youtube.com/@DoomDebates, or to really take things to the next level: Donate 🙏
