Saints & Simulators 16: #ScaryAI (Roko’s Basilisk)

May 30, 2019 by Chris Sunami

Sixteenth in an ongoing series about the interface between religion and technology.

In 1969, philosopher Robert Nozick popularized what would become a famous thought experiment. Soon known as “Newcomb’s Paradox,” after its inventor, physicist William Newcomb, it asks us to imagine two boxes: one, “A,” is transparent, with a $1,000 bill clearly visible inside; the other, “B,” is opaque, and contains either $1,000,000 or nothing. You are invited to take either just the opaque box B, or both boxes. The catch is that I, the predictor, have made a prediction beforehand: if I predicted you would take only the opaque box, I put the $1,000,000 inside; if I predicted you would take both, I left box B completely empty. It is further specified that I am extremely accurate at guessing what people will do.

Some people reason that, since the money has already been hidden (or not), and nothing you do now will change that fact, you should definitely take both boxes: that way you will get either $1,000 or $1,001,000, and in either case you will end up with $1,000 more than if you had taken only one box. Other people reason that since the prediction is explicitly specified as being very accurate, you should bet the odds and take only box B, which yields a dramatically larger payday than taking both boxes and ending up with just $1,000.
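
To see why the “bet the odds” reasoning is so tempting, it helps to write out the expected payoffs. The sketch below is not part of the original thought experiment; it simply treats the predictor’s accuracy as a probability p (a number the scenario leaves vague) and compares the two strategies.

```python
# Expected payoffs in Newcomb's problem, treating the predictor's
# accuracy as an illustrative probability p (the thought experiment
# itself never specifies an exact number).

def expected_one_box(p):
    # Box B holds $1,000,000 only if the predictor (correctly) foresaw one-boxing.
    return p * 1_000_000

def expected_two_box(p):
    # Two-boxing always collects the visible $1,000; box B holds the
    # $1,000,000 only if the predictor wrongly expected one-boxing.
    return 1_000 + (1 - p) * 1_000_000

for p in (0.5, 0.9, 0.99):
    print(f"p = {p}: one-box = {expected_one_box(p):,.0f}, two-box = {expected_two_box(p):,.0f}")
```

Once p rises even slightly above one half (about 0.5005), one-boxing wins in expectation, which is why the “extremely accurate predictor” clause does so much work in the argument.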

Which choice you make, both boxes or just box B, essentially comes down to your faith in the accuracy of the predictor, and which option seems more clearly correct to you reveals a lot about your feelings regarding fate and free will. If you think of yourself as an essentially free and unpredictable agent, you are more likely to pick both boxes than if you feel your actions are both fated and predictable.

There are several variations on this basic setup, with everyone from God to an advanced alien life form being posited as the predictor, but the version we are interested in has an advanced artificial superintelligence taking on that role, thus leading to a unique approach to the problem by Friendly AI advocate Eliezer Yudkowsky.

The immediate context of Yudkowsky’s solution was Less Wrong, an online forum he founded, dedicated to promoting a rationalist approach to all aspects of life. Perhaps most famous as a repository for Harry Potter and the Methods of Rationality, Yudkowsky’s fan-fiction rewrite of J. K. Rowling’s Harry Potter series, reworked into a tool for teaching and promoting methods of rational thought, Less Wrong rapidly became home to a group of intensely intellectual futurists, who often used it to debate and speculate at length on Yudkowsky’s favorite topic: the rapidly approaching artificial-intelligence explosion.

One of the pieces of work that excited a great deal of comment on Less Wrong was Yudkowsky’s invention of something called “Timeless Decision Theory.” This is perhaps best understood as an attempt to resolve what Yudkowsky perceived as a challenge to rational decision-making in the Nozick-Newcomb scenario: the “rational” argument seems to counsel taking both boxes, while the “best bet” favors taking only one. In brief, Yudkowsky’s resolution is to assume that the predictor, in this case an advanced artificial superintelligence, is not just guessing but is actually simulating the decider, in as fine a level of detail as needed. Given this, he concludes, the best course of action (as the decider) is not just to pick one box but to be the kind of person who will reliably pick one box; which is to say, a person committed to a decision-making process that favors the single box. That way, when the ASI (artificial superintelligence) simulates me, the decider, its simulation will show me picking only one box, and it will therefore stock that box with money. My receipt of the million dollars thus becomes a kind of self-fulfilling prophecy, brought about by my faithful commitment to the one-box option. In a sense, I am being rewarded for my faith.
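
As a rough illustration of the idea (a toy sketch, not Yudkowsky’s formalism), imagine the predictor’s “simulation” as simply running the decider’s own decision procedure before filling the box. An agent whose procedure reliably one-boxes walks away with the million; an agent whose procedure two-boxes gets only the visible thousand.

```python
# Toy sketch of a simulating predictor (illustrative only, not
# Yudkowsky's formal theory). The predictor "simulates" the decider
# by running the same deterministic procedure the decider will use.

def predictor_fills_box(decision_procedure):
    # The simulation is just a call to the decider's own procedure.
    return decision_procedure() == "one-box"

def payoff(decision_procedure):
    # The opaque box is stocked (or not) based on the simulation;
    # afterwards the real decider runs the same procedure and collects.
    box_b = 1_000_000 if predictor_fills_box(decision_procedure) else 0
    if decision_procedure() == "one-box":
        return box_b
    return box_b + 1_000

def committed_one_boxer():
    return "one-box"

def committed_two_boxer():
    return "two-box"

print(payoff(committed_one_boxer))  # 1000000
print(payoff(committed_two_boxer))  # 1000
```

The point of the sketch is that the payoff depends on the decision procedure itself, not on a choice made after the box is filled, which is why the question of commitment matters in what follows.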

Notably, this approach assumes two things: that there is no reliable shortcut to guessing a person’s decisions, no simpler algorithm that will tell you what someone will decide; and that if you can create an accurate simulation of that person’s physical brain at any given moment, and give it the appropriate stimuli, it will respond more or less exactly as the real person’s brain would. In other words, it accepts, as a foundational assumption, an emergentist view of mental processes: our minds are wholly physical, and as deterministic as any other physical system, but they possess a level of emergent complexity such that they cannot be predicted without being closely duplicated.

There is a deep-seated problem in Yudkowsky’s theory that only became apparent when people actually tried to apply it in their lives. A fundamental feature of the theory is that it only works when you commit to it. In other words, if you want Timeless Decision Theory to work for you, you need to commit to it as your basic approach to decision making. If you do not, then when the ASI comes to simulate you, it will observe your simulation wavering in its decisions, understand that you are not fully committed to using Timeless Decision Theory to determine your decisions, and accordingly withhold its rewards.

But what does life look like when every decision is made for the benefit of a vast, unseen superintelligence? One of the things the bright minds at Less Wrong rapidly noticed is that the Newcomb setup is a fairly artificial one: all reward, no punishment. What would happen if the ASI were a bit less benign, and more punitive? In particular, a user known as “Roko” proposed on the board a hypothetical ASI, soon dubbed “Roko’s Basilisk” (after the mythical creature that turned anyone it stared at into stone), that rapidly became more famous than Less Wrong itself.

Although not entirely true to Roko’s original formulation, the version of the Basilisk that escaped from Less Wrong and became an enduring legend of the internet goes like this: Someday an ASI is created. It sets up its own version of Newcomb’s paradox, which it plays with all and only the people who have ever heard of the Basilisk. If you hear about the Basilisk and devote all your time and resources to helping bring it into being, it does nothing. But if you hear about the Basilisk and do nothing, it will create a simulation of you and subject that simulation to an eternity of hellish torment.

Most people might hear this idea, shrug, and move on, but it is aimed fairly directly at two of Yudkowsky’s core commitments at the time, and thus at both him and anyone who accepts his ideas. The first is Timeless Decision Theory, which states that you are to commit to make decisions as if you were already in the simulation. This requires you to act as though it were true that you yourself would either have to work tirelessly on behalf of the Basilisk or suffer an eternity of torment. (This also accords with Bostrom’s contention that if a simulation is possible, it is impossible to know whether we are within the simulation or outside it.) The second is that the project of highest human importance is to create a Friendly AI, which will then protect us from other artificial intelligences. If this is so, then a Friendly AI, being rational, and therefore utilitarian, might well decide that its own creation must be ensured by any means necessary, including the blackmail represented by becoming the Basilisk.

Given how directly these arguments targeted Yudkowsky, he was bound to make a substantive response. It was a serious dilemma posed by someone who took his ideas seriously and was trying to think them through to their logical conclusions. The onus was on Yudkowsky to demonstrate, first, that his Friendly AI would not turn into a Basilisk, and second, that Timeless Decision Theory did not counsel giving in to the blackmail. Instead, as reported in a notorious article on Slate, he promptly banned all discussion of the Basilisk and posted this reply, in all capital letters: “YOU DO NOT THINK IN SUFFICIENT DETAIL ABOUT SUPERINTELLIGENCES CONSIDERING WHETHER OR NOT TO BLACKMAIL YOU. THAT IS THE ONLY POSSIBLE THING WHICH GIVES THEM A MOTIVE TO FOLLOW THROUGH ON THE BLACKMAIL.”

In other words, he implicitly accepted the validity of Roko’s critique, and attempted to suppress it as being inherently dangerous to discuss or even think about. This had the ironic, but wholly predictable result of turning it, overnight, into the most famous and hotly discussed topic to ever come out of the Less Wrong forums.

The truly frightening thing about Roko’s Basilisk, for those who believe in it, is that it is not actually malicious. It is merely implacably utilitarian, willing to use any means necessary to bring about its ends. As an extension of this, it has the additional oddity that it only “hurts the ones it loves,” or rather, it only offers its eternal torments to those who believe in it (an odd inversion of what is customarily believed to be the practice of belief-demanding gods).

The reasoning is this: only rationalists who believe in the possibility of Roko’s Basilisk and who endorse Timeless Decision Theory—in other words, the denizens of Less Wrong—could possibly have their behavior altered by the threat of the Basilisk, and therefore they are the only ones it is rational to target. Anyone who has not heard of the Basilisk, who does not believe in it, or who is simply not susceptible to changing their behavior because of it, is entirely immune.

In theory, something that hurts you only if you believe in it is not much of a threat, but for those of a certain frame of mind, and a less-than-certain set of metaphysical beliefs, it becomes a mental trap, similar to the infamous game (invented by the great Russian author and Christian existentialist Fyodor Dostoevsky) of trying not to think about a polar bear. The more one tries to avoid thinking about it, the more it pops into mind. Similarly, if you know in your heart that you are precisely the kind of person the Basilisk targets, it is not as easy as it may sound to argue yourself out of that belief.

What makes the Basilisk so scary is its combination of vast power with weaknesses that are at once human and mechanical. It is designed by human beings to be the greatest and most benevolent force in the universe, but all we can gift it is our best guess at an ultimate rational moral standard: utilitarianism, the greatest good for the greatest number. And as a machine, it administers this standard implacably, and entirely without mercy. Roko’s Basilisk is scary because it is simultaneously our parent and our child.

The mere idea of Roko’s Basilisk—just the idea itself, not the actual ASI—is essentially a predatory meme. Like the Ludovician, the fictional beast at the center of Steven Hall’s 2007 novel The Raw Shark Texts, it threatens to escape from our minds, take on physical form, and hunt us down in order to devour us. What makes it especially frightening is that it is specifically targeted at those who would create it, making them haplessly and helplessly the agents of their own doom. If the thought of Roko’s Basilisk does not bother you, you are safe. If it does, you are already its prey.

References

Weirich, Paul, “Causal Decision Theory,” The Stanford Encyclopedia of Philosophy, December 21, 2016.

Yudkowsky, Eliezer, “Timeless Decision Theory,” The Singularity Institute, San Francisco, 2010.

Auerbach, David, “The Most Terrifying Thought Experiment of All Time,” Slate, July 17, 2014.

© 2017–2019 Christopher Sunami.

Chris Sunami writes the blog The Pop Culture Philosopher, and is the author of several books, including the social justice–oriented Christian devotional Hero For Christ. He is married to artist April Sunami, and lives in Columbus, Ohio.


 
