Fourteenth in an ongoing series about the places where science and religion meet. The previous episode is here.
Given how likely killer robots are, and how clearly the paths we are currently embarked on lead to that eventuality, can this destiny be averted? Can the killer robots be stopped? The most obvious answer is just to commit not to building them, but that is a more difficult proposition than it may seem. As we have seen, even the most perfectly functional machine, designed for the most benign purpose, could turn sinister when juiced up by the power of runaway technological advance, and gifted with superhuman intelligence. It seems inevitable that someone will eventually build a not-perfectly-functional machine, or a machine subverted to devious ends, or one designed for a less-than-benign purpose, and even one such machine, powered by the singularity, could prove apocalyptic. If all that were not enough, the human history of arms races indicates that no technology is so disastrous that some nation will not pursue it under the threat of some other nation getting there first.
The larger context is that it is tough, arguably impossible, to stop progress (outside of some external disaster such as a meteorite strike, or perhaps a human-caused disaster like climate change). There are cultures that have rejected progress, and clung to traditional ways, but they have generally been out-competed (and eventually forced to adapt) by the more technologically advanced ones. A “nothing” can never beat a “something,” and a movement that simply reacts against another one can never end up in the lead. Even ancient China, which excelled in scientific discovery without being transformed by technology, was not so much acting in suppression of technological progress as it was replacing it with a different kind of focus (on the areas of culture and aesthetics).
Acceptance of the unstoppable inevitability of progress is the motivation behind yet another approach to artificial intelligence called “Friendly AI.” It starts with the assumption that runaway technological progress is inevitable, that one of the many teams around the world working on artificial intelligence will soon succeed, and that a disastrous robot apocalypse is by far the most likely result. Given that, the belief of the Friendly AI camp is that it is absolutely essential that we ensure the first artificial superintelligence is “friendly,” meaning that it has the best interests of humanity at heart, and is willing and able to protect us from its nastier cousins.
It might be easiest to understand this by analogy. Imagine that a new race of alien creatures has been discovered, and that a number of people are trying to raise them. It is very hard to get them to grow, and so far none of them is yet six feet tall. But it is widely believed that somewhere around the point where they reach seven or eight feet tall, they will get a sudden growth spurt, and begin growing taller at an ever-increasing rate, without stopping.
To you, this seems very dangerous, since a creature that size might stomp us all to death without even realizing it. But you have been entirely unable to convince any of the people captivated by these creatures to stop trying to grow them ever bigger. So you decide your only option is to purchase one yourself, train it to be protective of people, and hope that it becomes the first successful giant—so that it can stop any others from following in its own footsteps, potentially by stomping them out of existence while they are still too small to fight back.
In case it is not obvious, this is a “Hail Mary” strategy for human survival, a desperate and unlikely plan launched because of the belief that it is the only option left. Not only is it absolutely essential that the “friendly” artificial intelligence reach the superintelligent state first, but the training plan for that artificial intelligence must also be foolproof, and not something it is capable of shaking aside once it is out of our own direct control. The plan also courts the cruel irony that this particular attempt to save humanity might actually be what dooms it, in the case that it goes wrong.
This, at least, is the “cartoon version,” the easy-to-understand caricature of Friendly AI. As personified by its most visible advocate, polymath and autodidact Eliezer Yudkowsky, however, it is a bit more nuanced and plausible. Although Yudkowsky does support the creation of that first “friendly” artificial intelligence as described above, his larger strategy is to develop and promote a set of baseline standards, to be followed universally by all artificial intelligence researchers, that will build a set of safety mechanisms into all artificial intelligences at a basic, fundamental, and hopefully non-circumventable level. This project has been actively adopted, not only by Yudkowsky’s own Machine Intelligence Research Institute, but also by other similar organizations, most notably the Future of Life Institute, which at one time boasted both Elon Musk and legendary physicist Stephen Hawking as advisory board members. As described in a position paper by the latter group, the overall concept is that
… it may be possible to design utility functions or decision processes so that a system will not try to avoid being shut down or repurposed… and theoretical frameworks could be developed to better understand the space of potential systems that avoid undesirable behaviors.
The basic idea is quite similar to science fiction grand master Isaac Asimov’s concept of the “Three Laws of Robotics,” to be etched into the pathways of every robot’s brain at the very deepest level:
A robot may not injure a human being or, through inaction, allow a human being to come to harm.
A robot must obey orders given it by human beings except where such orders would conflict with the First Law.
A robot must protect its own existence as long as such protection does not conflict with the First or Second Law.
There are, however, several issues with the “Three Laws,” namely that they (1) are fictional, (2) were invented as a plot device, and (3) never functioned quite as designed in any of the stories in which they were featured (see number 2).
The real-life version has some problems as well, as Yudkowsky himself admits. First, there are so many different approaches to artificial intelligence that it becomes hard to find structures or fail-safes that would work for all of them. Second, it is difficult to know whether the rules that make sense to us today would really function as designed if carried out by superhuman intelligences. Finally, it seems at least plausible that a superhuman intelligence would find a way to easily brush aside our puny fail-safes, at least once it got powerful enough.
Overall, there are two major reasons for our uncertainty. The first is that we simply do not know enough about advanced machine intelligence, and what it will actually be like when we achieve it. Will computers need to gain free will and emotions, and therefore unpredictability, along the way? Or is that just overly anthropomorphic thinking? Will they remain ultra-rational and literal, or is a touch of poetic irrationality part of being truly intelligent? Can they remain pure machines, or will we need to find some way to graft some sort of soul into them?
The second problem is more acute, but less widely acknowledged: We do not understand enough about ourselves to know what to protect. There is no stable consensus on what makes us truly, significantly, importantly human. So how can we design machines to protect our humanity, if we do not know ourselves what it consists of?