Episode #431
Introduction
0:00
If we create general superintelligences, I don't see a good outcome long-term for humanity. So there is X-risk, existential risk: everyone's dead. There is S-risk, suffering risks, where everyone wishes they were dead. We also have the idea of I-risk, ikigai risks, where we've lost our meaning. The systems can be more creative. They can do all the jobs. It's not obvious what you have to contribute to a world where superintelligence exists. Of course, you can have all the variants you mentioned, where we are safe, we are kept alive, but we are not in control. We are not deciding anything. We're like animals in a zoo. There are, again, possibilities we can come up with as very smart humans, and then possibilities something a thousand times smarter can come up with, for reasons we cannot comprehend. The following is a conversation with Roman Yampolskiy, an AI safety and security researcher and author of a new book titled AI: Unexplainable, Unpredictable, Uncontrollable. He argues that there's almost a 100% chance that AGI will eventually destroy human civilization. As an aside, let me say that I'll have many often technical conversations on the topic of AI, often with engineers building the state-of-the-art AI systems. I would say those folks put the infamous P(doom), or the probability of AGI killing all humans, at around 1 to 20%, but it's also important to talk to folks who put that value at 70, 80, 90, and, as in the case of Roman, at 99.99 and many more nines percent.
Existential risk of AGI
2:20
What to you is the probability that superintelligent AI will destroy all human civilization? What's the timeframe?
Ikigai risk
8:32
I would love to dig into each of those: X-risk, S-risk, and I-risk. So can you linger on I-risk? What is that? So the Japanese concept of ikigai: you find something which allows you to make money, you are good at it, and the society says we need it. So you have this awesome job. You are a podcaster; it gives you a lot of meaning. You have a good life. I assume you're happy. That's what we want more people to find, to have. For many intellectuals, it is their occupation which gives them a lot of meaning. I am a researcher, philosopher, scholar; that means something to me. In a world where an artist is not feeling appreciated because his art is just not competitive with what is produced by machines, or a writer, or a scientist, we will lose a lot of that. At the lower level, we're talking about complete technological unemployment. We're not losing 10% of jobs, we're losing all jobs. What do people do with all that free time? What happens then? Everything society is built on is completely modified in one generation. It's not a slow process where we get to figure out how to live that new lifestyle; it's pretty quick.
Suffering risk
16:44
Okay, so what's S-risk? What are the possible things that you're imagining with S-risk? So, mass suffering of humans caused by AGI, what are we talking about there? So there are many malevolent actors. We can talk about psychopaths, crazies, hackers, doomsday cults. We know from history they tried killing everyone. They tried on purpose to cause the maximum amount of damage: terrorism. What if someone malevolent wants, on purpose, to torture all humans as long as possible? You solve aging, so now you have functional immortality, and you just try to be as creative as you can.
Timeline to AGI
20:19
Okay, we'll talk about possible solutions and what not playing it means, but what are the possible timelines here to you? What are we talking about? Are we talking about years, decades, centuries? What do you think? I don't know for sure. The prediction markets right now are saying 2026 for AGI. I heard the same thing from the CEOs of Anthropic and DeepMind. So maybe we are two years away, which seems very soon, given we don't have a working safety mechanism in place, or even a prototype for one. And there are people trying to accelerate those timelines, because they feel we're not getting there quick enough.
AGI Turing test
24:51
So what's a good test, to you, that measures whether an artificial intelligence system has reached human-level intelligence, and what's a good test for whether it has superseded human-level intelligence to reach that land of AGI? I'm old-fashioned. I like Turing tests. I have a paper where I equate passing the Turing test to solving AI-complete problems, because you can encode any question about any domain into the Turing test. You don't have to talk about how your day was. You can ask anything. So the system has to be as smart as a human to pass it in a true sense.
Yann LeCun and open source AI
30:14
Let me ask about Yann LeCun. He's somebody who you've had a few exchanges with, and he's somebody who actively pushes back against this view that AI is going to lead to the destruction of human civilization, also known as AI doomerism. So in one example that he tweeted, he said, "I do acknowledge risks, but," two points, "One, open research and open source are the best ways to understand and mitigate the risks. Two, AI is not something that just happens. We build it. We have agency in what it becomes. Hence, we control the risks. We meaning humans. It's not some sort of natural phenomena that we have no control over." Can you make the case that he's right, and can you try to make the case that he's wrong? I cannot make a case that he's right. He is wrong in so many ways, it's difficult for me to remember all of them. He's a Facebook buddy, so I have a lot of fun having those little debates with him. So I'm trying to remember his arguments. So one, he says we are not gifted this intelligence from aliens; we are designing it, we are making decisions about it. That's not true. It was true when we had expert systems, symbolic AI, decision trees. Today, you set up parameters for a model and you water this plant. You give it data, you give it compute, and it grows. And after it's finished growing into this alien plant, you start testing it to find out what capabilities it has. It takes years to figure out, even for existing models. If it's trained for six months, it'll take you two, three years to figure out the basic capabilities of that system. We still discover new capabilities in systems which are already out there. So that's not the case.
AI control
43:06
So let's focus then on the control problem. At which point does the system become uncontrollable? Why is it the more likely trajectory for you that the system becomes uncontrollable? So, I think at some point it becomes capable of getting out of control. For game-theoretic reasons, it may decide not to do anything right away and, for a long time, just collect more resources, accumulate strategic advantage. Right away, it may still be a young, weak superintelligence. Give it a decade: it's in charge of a lot more resources, it has had time to make backups. So it's not obvious to me that it will strike as soon as it can.
Social engineering
45:33
Maybe the social engineering. For social engineering, AI systems don't need any hardware access; it's all software. So they can start manipulating you through social media and so on. You have AI assistants; they're going to help you manage a lot of your day-to-day, and then they start doing social engineering. But for a system that's so capable that it can escape the control of the humans that created it, such a system being deployed at mass scale and trusted by people, it feels like that would take a lot of convincing. So, we've been deploying systems which had hidden capabilities.
Fearmongering
48:06
So as I mentioned, just to linger on the fear of the unknown: the Pessimist Archive has documented this. Let's look at the data of the past, at history. There's been a lot of fear-mongering about technology, and the Pessimist Archive does a really good job of documenting how crazily afraid we are of every piece of technology. There's a blog post where Louis Anslow, who created Pessimist Archive, writes about the fact that we've been fear-mongering about robots and automation for over 100 years. So why is AGI different from the kinds of technologies we've been afraid of in the past? So, two things. One, we're switching from tools to agents. Tools don't have negative or positive impact; people using tools do. So guns don't kill, people with guns do. Agents can make their own decisions. They can be positive or negative. A pit bull can decide to harm you; it's an agent. The fears are the same. The only difference is now we have this technology. Back then they were afraid of humanoid robots; 100 years ago, they had none. Today, every major company in the world is investing billions to create them. Well, not every, but you understand what I'm saying?
AI deception
57:57
But see, I'm very concerned about systems being used to control the masses. But in that case, the developers know about the kind of control that's happening. You're more concerned about the next stage, where even the developers don't know about the deception. Correct. I don't think developers know everything about what they are creating. They have lots of great knowledge; we're making progress on explaining parts of a network. We can understand, okay, this node gets excited when this input is presented, this cluster of nodes. But we're nowhere near close to understanding the full picture, and I think it's impossible. You need to be able to survey an explanation. The size of those models prevents a single human from absorbing all this information, even if provided by the system. So either we're getting a model as an explanation for what's happening, and that's not comprehensible to us, or we're getting a compressed explanation, lossy compression, where, "Here are the top 10 reasons you got fired." It's something, but it's not the full picture.
Verification
1:04:30
But a lot of the conversation I'm having with you now is also kind of wondering, almost at a technical level, how can AI escape control? What would that system look like? Because it, to me, is terrifying and fascinating. And also fascinating to me is maybe the optimistic notion that it's possible to engineer systems that defend against that. One of the things you write a lot about in your book is verifiers. So, not humans; humans are also verifiers, but software systems that look at AI systems and help you understand, "This thing is getting real weird." Help you analyze those systems. So maybe this is a good time to talk about verification. What is this beautiful notion of verification? My claim is, again, that there are very strong limits on what we can and cannot verify. A lot of times when you post something on social media, people go, "Oh, I need a citation to a peer-reviewed article." But what is a peer-reviewed article? You found two people in a world of hundreds of thousands of scientists who said, "Ah, whatever, publish it. I don't care." That's the verifier of that process. When people say, "Oh, it's formally verified software or a mathematical proof," we accept something close to a 100% chance of it being free of all problems. But if you actually look at the research, software is full of bugs; old mathematical theorems which have been "proven" for hundreds of years have been discovered to contain bugs, on top of which we generate new proofs, and now we have to redo all that.
Self-improving AI
1:11:29
So, this paper is really interesting: from 2011, "Artificial Intelligence Safety Engineering: Why Machine Ethics Is a Wrong Approach." The grand challenge, you write, of AI safety engineering: "We propose the problem of developing safety mechanisms for self-improving systems." Self-improving systems. By the way, that's an interesting term for the thing that we're talking about. Is self-improving more general than learning? Self-improving, that's an interesting term. You can improve the rate at which you are learning; you can become more efficient, a meta-optimizer.
Pausing AI development
1:23:42
The condition would be not time, but capabilities: pause until you can do X, Y, Z. And if I'm right and you cannot, it's impossible, then it becomes a permanent ban. But if you're right, and it's possible, then as soon as you have those safety capabilities, go ahead. Right. Are there any actual explicit capabilities that you can put on paper, that we as a human civilization could put on paper? Is it possible to make it explicit like that, versus a kind of vague notion? Just like you said, it's very vague: we want AI systems to do good and we want them to be safe. Those are very vague notions. Are there more formal notions?
AI safety
1:29:59
Can you help me understand what the hopeful path here is for you, solution-wise, out of this? It sounds like you're saying AI systems in the end are unverifiable, unpredictable, and, as the book says, unexplainable, uncontrollable. That's the big one.
Current AI
1:39:43
Yeah, it's definitely not like 1 or 0%. Yeah. What are your thoughts, by the way, about current systems, where they stand? GPT-4o, Claude 2, Grok, Gemini. On the path to superintelligence, to agent-like superintelligence, where are we? I think they're all about the same. Obviously there are nuanced differences, but in terms of capability, I don't see a huge difference between them. As I said, in my opinion, across all possible tasks, they exceed the performance of an average person. I think they're starting to be better than an average master's student at my university, but they still have very big limitations. If the next model is as improved as GPT-4 versus GPT-3, we may see something very, very, very capable.
Simulation
1:45:05
What's the probability that we live in a simulation? I know never to say 100%, but pretty close to that.
Aliens
1:52:24
There is a lot of real estate out there. It would be surprising if it was all for nothing, if it was empty. And the moment there is an advanced enough biological civilization, a kind of self-starting civilization, it probably starts sending out von Neumann probes everywhere. And so for every biological one, there are going to be trillions of robot-populated planets, which probably do more of the same. So statistically it is likely. So the fact that we haven't seen them... one answer is we're in a simulation. It would be hard to simulate, or it'd be not interesting to simulate, all those other intelligences. It's better for the narrative.
Human mind
1:53:57
Some humans. Humans on the whole. And we would like to preserve the flame of human consciousness. What do you think makes humans special, that we would like to preserve them? Are we just being selfish or is there something special about humans?
Neuralink
2:00:17
Incredible technology in a narrow sense, to help the disabled. Just amazing; I support it 100%. For long-term hybrid models, both parts need to contribute something to the overall system. Right now we are still more capable in many ways, so having this connection to AI would be incredible; it would make me superhuman in many ways. After a while, if I'm no longer smarter, no longer more creative, and really don't contribute much, the system sees me as a biological bottleneck, and either explicitly or implicitly I'm removed from any participation in the system. So it's like the appendix. By the way, the appendix is still around. So even if it's... You said bottleneck. I don't know if we'd become a bottleneck; we just might not have much use. That's a different thing than a bottleneck.
Hope for the future
2:09:23
I could be wrong. I've been wrong before. If you look 100 years from now, and you're immortal, and you look back, and it turns out this whole conversation... you said a lot of things that were very wrong, now looking 100 years back. What would be the explanation? What happened in those 100 years that made you wrong, that made the words you said today wrong?
Meaning of life
2:13:18
Oh man, humans. What do you think is the meaning of this whole thing? We've been talking about humans, and humans not dying, but why are we here? It's a simulation. We're being tested. The test is: will you be dumb enough to create superintelligence and release it?