We Found AI's Preferences — What David Shapiro MISSED in this bombshell Center for AI Safety paper

LLMs already have unaligned utility functions!

The Center for AI Safety just dropped a fascinating paper — they discovered that today’s AIs like GPT-4 and Claude have preferences! As in, coherent utility functions. We knew this was inevitable, but we didn’t know it was already happening.

This episode has two parts:

In Part I (48 minutes), I react to David Shapiro’s coverage of the paper and push back on many of his points.

In Part II (60 minutes), I explain the paper myself.


00:00 Episode Introduction

05:25 PART I: REACTING TO DAVID SHAPIRO

10:06 Critique of David Shapiro's Analysis

19:19 Reproducing the Experiment

35:50 David's Definition of Coherence

37:14 Does AI have “Temporal Urgency”?

40:32 Universal Values and AI Alignment

49:13 PART II: EXPLAINING THE PAPER

51:37 How The Experiment Works

01:11:33 Instrumental Values and Coherence in AI

01:13:04 Exchange Rates and AI Biases

01:17:10 Temporal Discounting in AI Models

01:19:55 Power Seeking, Fitness Maximization, and Corrigibility

01:20:20 Utility Control and Bias Mitigation

01:21:17 Implicit Association Test

01:28:01 Emailing with the Paper’s Authors

01:43:23 My Takeaway


Show Notes

David’s source video: https://www.youtube.com/watch?v=XGu6ejtRz-0

The research paper: http://emergent-values.ai


Watch the Lethal Intelligence Guide, the ultimate introduction to AI x-risk! https://www.youtube.com/@lethal-intelligence

PauseAI, the volunteer organization I’m part of: https://pauseai.info

Join the PauseAI Discord — https://discord.gg/2XXWXvErfA — and say hi to me in the #doom-debates-podcast channel!


Doom Debates’ Mission is to raise mainstream awareness of imminent extinction from AGI and build the social infrastructure for high-quality debate.

Support the mission by subscribing to my Substack at https://doomdebates.com and to https://youtube.com/@DoomDebates