55 minutes 1 second
Speaker 1
00:00:08 - 00:00:28
So, you've dedicated your life to this extraordinary project of trying to organize electrons into patterns in which they can think useful thoughts. I mean, if your younger self could have a quick visit and an update from you today, how surprised would he be at what had actually happened?
Speaker 2
00:00:28 - 00:00:56
So, I think... You know, when you're young, you're extremely enthusiastic, and you don't realize how complex things are, right? And so you're a little naive about how easy it would be to build intelligent machines or discover new principles. And that's what makes you fearless. And in a way, that's why, you know, when you're young, you're a little more creative, right? Because you're not scared by the complexity of what you're imagining.
Speaker 2
00:00:56 - 00:01:14
And so very often when you're young, you know, you think you have an idea that, you know, is very different from what everybody else is doing, and then you push it, and then you realize someone else thought about it 20 years ago and couldn't make it work. Or you realize maybe people thought about it and couldn't make it work, but maybe it's still a good idea and you should push, and that's basically what happened to me.
Speaker 1
00:01:14 - 00:01:18
Maybe if you'd seen how hard it actually was, you'd have thought, I don't know, dentistry seems pretty appealing.
Speaker 2
00:01:19 - 00:01:19
Yeah.
Speaker 1
00:01:21 - 00:01:33
A couple of years ago, you won the ACM Turing Award with Geoffrey Hinton and Yoshua Bengio. That's the most prestigious prize in computer science. What was that awarded for?
Speaker 2
00:01:34 - 00:02:18
It was awarded to us for basically having promoted and also developed some of the early algorithms for what we now call deep learning. And so deep learning is this idea that you can train a machine end-to-end to do a particular task. So machine learning has been around for a long time, but deep learning is sort of an extreme form of it, if you want. And it's based on the idea that a learning machine can be built as a large network of very simple elements, which are somewhat analogous to the neurons in the brain, but not really the same thing. It's like they're as analogous to the neurons in the brain as the wings of an airplane are to the wings of a bird.
Speaker 2
00:02:18 - 00:02:41
So it's not that similar. And that idea is very old. The roots of it go back to the 1940s, and there was kind of a wave of interest for this in the 50s and 60s, and it died off. And then it came back to the fore in the mid-80s, late 80s. And Geoff had been working on it for a while.
Speaker 2
00:02:41 - 00:03:15
That's when I started my career, and Yoshua as well. And there was a wave of interest that kind of died off again in the mid-90s. But the three of us knew this was a good set of techniques and that one day the community would kind of get interested in it again. And so we basically started a conspiracy, you can think of it this way, or at least a deliberate attempt at demonstrating that those methods worked well, and that succeeded beyond our wildest dreams. So it sort of started a new wave of interest in those methods and basically started a new industry.
Speaker 1
00:03:15 - 00:03:45
So that's almost, as a sort of first-order approximation, a history of AI: it was interesting for a long time, it kind of didn't get very far, and then suddenly, about 10 years ago, certainly from a commercial point of view, it exploded. And the reason it exploded was because of deep learning, which you helped create. So talk to us about it. I mean, you're credited with creating these things called convolutional neural networks. What is that?
Speaker 2
00:03:45 - 00:04:14
Right. So a convolutional neural network is a particular, what we call, architecture of a neural net. So a neural net, as I said, is a network of simple, neuron-like elements. And a convolutional net is a particular way of connecting those neurons with each other so that the architecture is particularly well suited to deal with input data that comes to the system in the form of an array of numbers. So an image is basically an array of numbers. A speech signal can be represented as an array of numbers as well.
Speaker 2
00:04:15 - 00:04:38
And so there are a lot of signals, natural signals and less natural ones, that you can represent this way. And convolutional nets are well suited to these kinds of signals. Now, their architecture is inspired by what we know about the architecture of the visual cortex. So there's a lot of inspiration from neuroscience in this. And it's called a convolutional neural net because it's based on a mathematical operation called convolution.
Speaker 2
00:04:40 - 00:04:54
And it's one of those multi-layer neural nets that automatically learn representations of, say, images that are hierarchical. So the representations get more and more abstract as you go up in the layers.
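To make the convolution operation concrete, here is a minimal sketch in Python. It is not from the conversation: the random image, the hand-made edge filter, and the sizes are illustrative assumptions, and a real convolutional net would learn many such filters from data and run them with optimized library code.

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide one small filter over a grayscale image (an array of numbers)."""
    kh, kw = kernel.shape
    out = np.zeros((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # weighted sum of one local patch of pixels
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.random.rand(28, 28)             # toy image: a 28x28 array of numbers
edge_filter = np.array([[1.0, 0.0, -1.0],  # a hand-made vertical-edge detector
                        [1.0, 0.0, -1.0],
                        [1.0, 0.0, -1.0]])
feature_map = convolve2d(image, edge_filter)
print(feature_map.shape)                   # (26, 26): one feature map for one filter
```

Stacking layers of such filters, each one looking at the feature maps produced by the layer below, is what yields the increasingly abstract, hierarchical representations just described.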
Speaker 1
00:04:54 - 00:05:12
Right, so that hierarchy is a fundamental part of the structure. At one point you're seeing a colored pixel, and then you see that that's part of a little shape, and then that shape is part of a more complex object. How does that relate to this idea of backpropagation, which seems to be fundamental to getting deep learning to work?
Speaker 2
00:05:12 - 00:05:35
Yeah, so really, one of the things that Geoff, and to some extent I, are famous for is backpropagation. And backpropagation is a way... So I have to explain how machine learning works. So there is a very simple form of machine learning called supervised learning. And the way it works is that, say, you want to train a machine to distinguish images of cars from images of airplanes.
Speaker 2
00:05:35 - 00:05:57
You show an image of a car. You run it through the neural net and wait for the output to come out. If the output is car, you basically don't do anything. If the output is not car, then you adjust parameters inside of the neural net so that the output gets closer to the output you want, to car, right? And those parameters are the strengths of the connections between the neurons.
Speaker 2
00:05:57 - 00:06:06
So each connection between neurons, and there could be billions of them, has sort of a strength, an adjustable positive or negative number.
Speaker 1
00:06:06 - 00:06:26
And this is the model where the AI is being trained on human-tagged images. So here is an image, this is a car, this is not a car. And so the net is going, OK, I was successful, this is a car. And when it's successful, it strengthens the connections that took it to that point? Is that how to think about it?
Speaker 2
00:06:26 - 00:06:47
That's right, strengthened or not. But it figures out good configurations of all the connections that will produce the correct answer whenever you show it one of the training samples it's been trained on. The magic of this is: what is it going to do when you show it an image it's never seen before? Another image of a car it's never seen, for example, is it going to produce car? That's called the generalization ability.
Speaker 2
00:06:48 - 00:07:20
So here is the problem that backpropagation solves. What you have to figure out is in which direction and by how much you have to change a particular weight, among the billions that there are in the network, so that the output gets closer to the one you want. And you can do this by basically computing the equivalent of a derivative. So the function that the network computes produces a number at the output, or a series of numbers. And you have to figure out: if I change this weight a little bit in this direction, is this number going to go up or down?
Speaker 2
00:07:20 - 00:07:37
And that's a derivative, right? Now, you have one of those derivatives for every single one of the weights in the network, so you might have a giant list of a billion derivatives, and that's called a gradient. Backpropagation is a way of computing this gradient very efficiently, by essentially propagating signals backwards inside of the neural net.
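As a rough illustration of the procedure just described (a toy sketch, not anyone's production code), here is a two-layer network trained on a single made-up example: the forward pass produces an output, the error is measured against the target, and derivatives propagated backwards through the layers say in which direction, and by how much, to nudge each weight.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(4)              # a tiny "image": four input numbers
target = np.array([1.0])                # desired output, e.g. 1.0 meaning "car"

W1 = 0.5 * rng.standard_normal((3, 4))  # connection strengths: inputs -> 3 hidden neurons
W2 = 0.5 * rng.standard_normal((1, 3))  # connection strengths: hidden -> 1 output neuron

for step in range(100):
    # forward pass
    h = np.tanh(W1 @ x)                  # hidden activations
    y = 1.0 / (1.0 + np.exp(-(W2 @ h)))  # output between 0 and 1
    loss = 0.5 * np.sum((y - target) ** 2)

    # backward pass: the chain rule, applied from the output back towards the input
    dz2 = (y - target) * y * (1 - y)     # derivative of the loss w.r.t. the output sum
    dW2 = np.outer(dz2, h)
    dz1 = (W2.T @ dz2) * (1 - h ** 2)    # error signal propagated backwards
    dW1 = np.outer(dz1, x)

    # nudge every weight a little against its derivative
    W2 -= 0.5 * dW2
    W1 -= 0.5 * dW1

print(round(float(loss), 4))             # the output has moved close to "car"
```

The full list of those per-weight derivatives is the gradient; backpropagation computes the whole list in one backward sweep rather than perturbing each weight separately.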
Speaker 1
00:07:38 - 00:08:00
Is there an analog to how brains work there? Because we've heard from neuroscientists that you can think of the brain as a prediction machine where information flows up this hierarchy, but for the brain to interpret it, it's constantly back-propagating down the hierarchy to a certain set of expectations. Is there something similar going on in this deep learning process?
Speaker 2
00:08:00 - 00:08:30
Okay, so there's a terrible confession that all neuroscientists have to make, which is that we actually have no idea what learning algorithm the brain uses. I mean, we have some idea, of course; we can see the effect of it. We know that learning affects the efficacy of the synapses, which are the connections between neurons. And there are certain rules that we know those kinds of synaptic modifications obey. So for example, there's something called spike-timing-dependent plasticity.
Speaker 2
00:08:31 - 00:08:59
And it means that if a neuron that connects to another one is very often active, and the second neuron also becomes active as a consequence of the first one, then the connection strengthens. But that's probably just a side effect of something much more complicated that we don't understand. And so in my mind, the question is: is the brain doing something similar to a learning machine, which means optimizing some sort of error between the output it produces and the output it wants to produce?
Speaker 2
00:08:59 - 00:09:24
And then the question is, where is this output that it wants to produce? Where does it come from? So that's the first question. And then if it does do that, does it do it by what we call a gradient-based algorithm, which means a method that will evaluate those gradient derivatives that I was talking about earlier. And what is pretty clear is that if the brain does something like this, it's not using straight back propagation.
Speaker 2
00:09:24 - 00:09:33
It's using something a little different, which may have the same effect in the end, but it's not entirely clear what it is. And so there's a lot of hypotheses about this, but no real answer.
Speaker 1
00:09:33 - 00:09:55
Interesting. So often, the picture people have is that we're trying to reverse-engineer the brain, but actually that may be a harder job than just trying out different forms of building AI, rather than trying to copy what is still deeply mysterious?
Speaker 2
00:09:55 - 00:10:20
Yeah, I think the sort of bird analogy is very good there, because you can try to build airplanes by copying birds, but then birds have a lot of details to them, like feathers and muscles and things like this, that may be irrelevant to building airplanes, in fact, are irrelevant to building airplanes. And they have the advantage of having very fast control, with a brain and vision and stuff like that, which you cannot actually reproduce with machines, certainly not with 19th-century technology.
Speaker 1
00:10:20 - 00:10:27
Which makes them... It'd be pretty awesome to sit in the plane and look out the window and see the wings flapping like this. That would be kind of incredible.
Speaker 2
00:10:27 - 00:10:49
It's incredible, and you know, birds exploit all kinds of properties of aerodynamics. And so the underlying mechanism of flight, whether it's the flight of airplanes or birds, is aerodynamics. And so the question is, we need to understand aerodynamics to build airplanes. Even though they were inspired by birds initially, they're very different. But the underlying principle is the same.
Speaker 2
00:10:49 - 00:11:04
You generate lift by pushing yourself through the air. What are the underlying principles behind intelligence and learning? What is the equivalent of aerodynamics, if you want, for intelligence and learning? I mean, that's the quest of my entire life, OK? And we don't really have the answer yet.
Speaker 1
00:11:04 - 00:11:29
So you described a form of machine learning, supervised machine learning, which is, I guess, the form that's most understood by the lay public: these sorts of masses of labeled images or videos that a computer gradually gets to recognize. But the real power of machine learning can go far beyond that, to something that I guess you call self-supervised learning. Describe that.
Speaker 2
00:11:29 - 00:11:44
OK, well, there are really three forms of learning, or paradigms of learning, I should say, that people use today. One is supervised learning, which we just talked about. In supervised learning, you tell the machine what the correct answer is, right? And it adjusts itself to get closer to that. In reinforcement learning, you don't tell the machine what the correct answer is.
Speaker 2
00:11:45 - 00:12:00
You only tell it whether the answer it produced was good or bad. And so if there are lots and lots of possible answers, it's much less efficient. The system has to try many things before it figures out how to produce the right answer. It's very successful in games.
Speaker 2
00:12:01 - 00:12:28
So if you want to train a machine to play Go, play chess, play a game, something like this, you have several copies of this machine play against itself. And by this reinforcement system, it basically improves itself. And it doesn't need to be fed the correct answer by humans. So it's successful in a number of those situations. There are very few applications in real life where it's useful, because it's so inefficient at learning.
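To see why reward-only feedback is so much less efficient, here is a toy sketch, purely illustrative and nothing like a real game-playing system: the learner is never told which action is correct, only whether the action it tried paid off, so it needs thousands of trials before its value estimates settle.

```python
import numpy as np

rng = np.random.default_rng(0)
true_payoff = np.array([0.2, 0.5, 0.8])  # hidden probability of reward for each action
values = np.zeros(3)                     # learner's running estimate per action
counts = np.zeros(3)

for trial in range(5000):
    # mostly pick the action currently believed best, sometimes explore at random
    if rng.random() < 0.1:
        action = int(rng.integers(3))
    else:
        action = int(np.argmax(values))
    reward = float(rng.random() < true_payoff[action])  # feedback is only "good or bad"
    counts[action] += 1
    values[action] += (reward - values[action]) / counts[action]  # running average

print(values.round(2))  # only after many trials do the estimates approach the hidden payoffs
```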
Speaker 2
00:12:28 - 00:12:42
But there are a few. And then the third form is self-supervised learning. And there are several names for it. People in the past used to call this unsupervised learning. And I don't like that name because it's a loaded term and it doesn't really reflect what's going on.
Speaker 2
00:12:42 - 00:13:10
So the idea of self-supervised learning is that it's the type of learning that we think we're observing in young animals and baby humans. It's the ability to learn how the world works by observation. Babies have very little ability to interact with the world. They can't act on it, but they observe all the time. And just by observation, in the space of just a few months, they can distinguish between animate and inanimate objects.
Speaker 2
00:13:10 - 00:13:30
They figure out that if an object is hidden by another one, it still exists. That's object permanence. And they figure out that if an object has a certain shape, it's not going to stay stable: if you put it on a table, it's going to fall. And that's how you can tell whether a baby has learned a particular concept: by what happens when that concept is violated by reality.
Speaker 2
00:13:31 - 00:14:02
So we learn models of the world. And those models of the world allow us to represent the world. And that allows us to learn any particular task very quickly. Because by representing the visual world, for example, we have a good idea of what objects are, that objects are in front of backgrounds, that the world is three-dimensional, that there are objects that move, that there are animals with four legs, and blah, blah, blah. And so now I show you an example of an elephant, and you know what an elephant is.
Speaker 2
00:14:02 - 00:14:04
I don't need to show you a million examples of elephants.
Speaker 1
00:14:05 - 00:14:39
So in terms of understanding the power of that, is this an example? So one obvious application for AI is self-driving cars. On the supervised learning model, you know, a car is driving, you're looking at lots of videos of, I don't know, children near a road or tree branches waving near a road. You know, just at the level of pixels, those may not be fundamentally different. The car can't really decide how to alter its behavior based on that.
Speaker 1
00:14:39 - 00:14:57
But if you start to know: branch, it doesn't matter if you hit it; child, oh my goodness. So actually learning what an object is and what category it belongs to, that feels like a hugely steep curve to climb, but that's what you've been working on?
Speaker 2
00:14:57 - 00:15:25
Yes. You know, learning the concept of objects and geometry, learning to represent the world, learning that certain objects behave in certain ways. So on the street, a car and a pedestrian will not behave the same way. And you can tell there's a car at a stop sign, and you're coming up to the car, and you know that the person is not looking at your car, so you know that maybe there is danger. You're kind of slowing down, because you can tell. So there's a lot of those things.
Speaker 2
00:15:25 - 00:15:59
Your model of the world basically allows you to learn things quickly and to do them safely. The current methods of reinforcement learning, for example, that we could use to train cars to drive themselves are so inefficient that you would have to have a car drive itself for millions of hours, and even then it may not be very reliable. And it's because it's starting from zero. So the idea of self-supervised learning is that you don't want the system to start from zero. You want it to learn as much as possible about how the world works beforehand.
Speaker 2
00:16:00 - 00:16:21
If you don't have a model of the world and you're learning to drive, and you're driving a car right next to a cliff, you have no idea that by turning the wheel to the right, the car is going to run off the cliff and you're going to get killed. Whereas if you do have a model of the world, and you don't need to have a very sophisticated one, you know that's going to happen, so you don't even try it. You know it's going to be a bad outcome.
Speaker 1
00:16:21 - 00:16:32
So how much progress are we making on that? To what extent are there computer programs now that have some kind of compelling model of the world that they can operate and navigate through?
Speaker 2
00:16:32 - 00:17:06
Okay, there are two reasons for self-supervised learning. One is learning models of the world that are predictive, so you can use them for planning in self-driving cars, robotics, et cetera, basically to allow a machine to predict in advance the consequences of its actions so that it can plan to reach a particular goal. That's one use. But the other one is just learning to represent the world: basically, learning as much as you can about how the world works so that this knowledge can be used for learning a particular task subsequently.
Speaker 2
00:17:07 - 00:17:30
And those are kind of two very similar things that self-supervised learning might help us solve. There's been a lot of success in the second one, learning representations. And there's been a lot of success in prediction, but only for text, basically, not for things like video or images or the real world. There's been some progress, but it's not there yet.
Speaker 2
00:17:30 - 00:17:31
Okay, so the...
Speaker 1
00:17:31 - 00:17:37
So can you give us an example of that for text? You're talking here basically about written language, translation, for example.
Speaker 2
00:17:37 - 00:18:13
That's right. So the way the best natural language understanding systems are trained today, for natural language understanding, translation, anything that deals with text, the best systems today are trained in self-supervised mode, or at least pre-trained in self-supervised mode. And the way they're trained is that you take a segment of a sequence of words from a text, from a large text corpus, and you remove some of the words, or you substitute them with others. And then you train a very large neural net, which may have billions of parameters, to predict the missing words or the words that have been changed. Tell me which word should be here.
Speaker 2
00:18:15 - 00:18:46
And just by doing this, the system has to learn a good representation of words that will allow it to actually solve that problem. So it needs to learn that when a sentence talks about a pet chasing another animal, it can be a cat chasing a mouse, or it could be a lion chasing an antelope in the savannah, if it's not in a house, depending on the context. And so, you know, it learns basically the structure of the world just by figuring out how to fill in the blanks in text.
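The pretext task itself is simple to write down. Here is a minimal sketch in Python of how a training example could be generated; the sentence, the mask token, and the masking fraction are illustrative assumptions, and real systems do this over billions of words and feed the corrupted sequence to a large network that must score every word in the vocabulary for each blank.

```python
import random

random.seed(0)

def make_masked_example(words, mask_fraction=0.2, mask_token="[MASK]"):
    """Blank out a fraction of the words; the blanked words become the targets."""
    n_mask = max(1, int(len(words) * mask_fraction))
    positions = random.sample(range(len(words)), n_mask)
    corrupted = list(words)
    targets = {}
    for pos in positions:
        targets[pos] = corrupted[pos]   # the word the network must recover
        corrupted[pos] = mask_token
    return corrupted, targets

sentence = "the cat chased the mouse across the kitchen floor".split()
corrupted, targets = make_masked_example(sentence)
print(" ".join(corrupted))  # the same sentence with some words blanked out
print(targets)              # the supervision comes from the text itself, no human labels
```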
Speaker 1
00:18:46 - 00:18:47
So that's...
Speaker 2
00:18:47 - 00:18:59
And the amazing thing is that those systems seem to acquire a surprisingly large amount of knowledge about how the world works without actually having any connection with any reality other than text representations of it.
Speaker 1
00:18:59 - 00:19:32
OK, so that is amazing. Let me play that back and see if I've got that right. So someone might think that to teach a computer how to understand language, you have to focus on the rules of grammar and all the rest of it. What you're saying is, no, you take a bunch of text, you just delete words from it, pretty much at random, almost, and let a computer loose on it, and basically all you have to do is say: try a bunch of things, have you been successful in identifying the missing words? And that can all be done by a computer.
Speaker 1
00:19:32 - 00:19:49
So a computer can run millions of different attempts, algorithmic attempts, at identifying the words. You have no idea what it's doing. But at the end of it, when it starts to find the missing words, amazingly, along with that comes a kind of understanding of language?
Speaker 2
00:19:49 - 00:20:13
Yes, so you collect an enormous amount of text data, you know, probably text with billions of words. You take a segment of about a thousand words, you remove maybe 20 percent of them, maybe a couple hundred words. And then you run this segment through this giant neural net that may have billions of weights in it. And it's got some sort of memory built in. Those are called transformer architectures.
Speaker 2
00:20:13 - 00:20:31
They appeared about two and a half years ago. And then you train the system to predict the missing words, right, or the substituted words. Now, the system cannot do a perfect job at it. It can only predict, you know: I think the word that should be here is probably a pet of some kind. I don't know if it's a cat or a dog, but it's something like that, right?
Speaker 2
00:20:31 - 00:21:05
So it gives you a big list of numbers, which are basically scores for each possible word in the dictionary. So a long list of 100,000 numbers, basically, telling you, for each word in the dictionary, how likely it is that the word appears here. And then what happens is the system learns to represent text by doing this. And then what you can do, when you're faced with a real problem like understanding text or wanting to translate it, is you give a sequence of words to the system. You run it through a subset of the layers of that network.
Speaker 2
00:21:05 - 00:21:20
You cut at some point pretty close to the output. And that's a representation of the meaning of the input sentence. And then using this representation, you train something to predict whatever it is that you want to predict. Is this... What topic is this talking about?
Speaker 2
00:21:20 - 00:21:35
Is this a news article? Is this positive or negative? Is this hate speech or not? Things like that. And the beauty of it is that now you can do this multilingually, so you can train those systems to represent language, regardless of which language it is.
Speaker 2
00:21:35 - 00:21:41
And it basically produces a representation of the meaning independently of language. That is super important and is completely revolutionary.
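A rough sketch of that second stage, under stated assumptions: `sentence_representation` below is only a stand-in for "run the text through the pretrained layers and cut near the output", and the data is a toy set. The point is that only a small classifier head is trained for the new task, on top of the fixed representation.

```python
import numpy as np

def sentence_representation(sentence, dim=16):
    """Stand-in for a pretrained network's representation of a sentence's meaning."""
    local = np.random.default_rng(abs(hash(sentence)) % (2 ** 32))
    return local.standard_normal(dim)

texts = ["great product, works really well", "terrible purchase, waste of money"] * 50
labels = np.array([1, 0] * 50)                    # 1 = positive, 0 = negative
X = np.stack([sentence_representation(t) for t in texts])

# a small logistic-regression head: the only part trained for this particular task
w, b = np.zeros(X.shape[1]), 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))        # predicted probability of "positive"
    w -= 0.5 * (X.T @ (p - labels)) / len(labels)
    b -= 0.5 * np.mean(p - labels)

print(float((np.round(p) == labels).mean()))      # accuracy on the toy data
```

The same frozen representation could feed several such heads, one per question (topic, sentiment, hate speech, and so on), which is what makes the pretrained part so reusable.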
Speaker 1
00:21:41 - 00:21:56
And the program that results from that, the software that actually results from it: it's not like a human engineer can go in and look at the code line by line and say, oh, I get what it's doing there. The code itself is kind of impenetrable to us, right?
Speaker 2
00:21:56 - 00:22:17
Yeah, because this program is actually very simple. In a lot of those neural nets, like a typical convolutional net, the program could fit in basically a few lines of code. Basically, a program says, take a bunch of inputs, compute a weighted sum of those inputs where the weights are those parameters that are learned, and then compare this to a threshold. If it's above a threshold, turn on the output of the neuron. If it's not, turn it off.
Speaker 2
00:22:17 - 00:22:45
And you do this with a big loop that runs over millions of neurons, billions of connections, layer after layer. Those transformer models I was talking about for text are a little more complicated, but it's basically the same principle. So the program is very simple. The knowledge of the program resides in the values of those weights, but that's not part of the program. That's part of the data, if you want.
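Taken literally, the "few lines of code" being described could look like the following sketch. The sizes and random weights are placeholders, and real nets use smooth activation functions rather than a hard threshold so that derivatives can flow; the point is that all the knowledge sits in the weight arrays, which are data, not program.

```python
import numpy as np

def run_network(x, layers):
    """layers is a list of (weights, thresholds) pairs, one pair per layer."""
    for weights, thresholds in layers:
        weighted_sums = weights @ x                      # one weighted sum per neuron
        x = (weighted_sums > thresholds).astype(float)   # on if above threshold, else off
    return x

rng = np.random.default_rng(0)
layers = [
    (rng.standard_normal((5, 8)), np.zeros(5)),  # 8 inputs  -> 5 neurons
    (rng.standard_normal((2, 5)), np.zeros(2)),  # 5 neurons -> 2 output neurons
]
print(run_network(rng.standard_normal(8), layers))
```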
Speaker 1
00:22:45 - 00:23:05
Right. So those few lines of code can generate an algorithm that may be, what, thousands, millions of lines of code, effectively, and that works. And you can show that it works, but it's basically impenetrable to humans: you can't understand, on a line-by-line basis, exactly how that machine learning algorithm is working.
Speaker 2
00:23:05 - 00:23:16
Yeah, I mean, once we build, you know, reliable self-driving cars, the way they do it would be as impenetrable as the brain of your taxi driver.
Speaker 1
00:23:16 - 00:23:19
Right. We don't understand either, but they work.
Speaker 2
00:23:20 - 00:23:20
Right.
Speaker 1
00:23:21 - 00:23:28
Talk to me, Yann, about how Facebook is using AI right now. What are some of the main ways it's using it?
Speaker 2
00:23:28 - 00:24:07
Well, so AI, machine learning, deep learning more generally, has become an essential part of the functioning of Facebook. It's true also of Google and various other companies. You take deep learning out of Facebook now, and it basically crumbles; it's built around it. Everything from ranking, newsfeed ranking, for example, ad ranking, content filtering, which is an enormously important topic, recommendation on Instagram, for example, translation, of course, and subtitling of videos.
Speaker 2
00:24:07 - 00:24:11
All of this is AI. All of this uses deep learning in some form.
Speaker 1
00:24:12 - 00:24:30
And how do you feel about the effectiveness of that? There's obviously, Facebook has been, you know, criticized for how some content is handled. Do you think that AI has played a role in that? Could it play a much bigger role in solving some of those issues?
Speaker 2
00:24:31 - 00:25:00
Right. So, I mean, AI is playing a very important role there, for the better. Without AI there would not be any possibility of, for example, hate speech filtering, you know, detecting harassment, child exploitation, terrorist propaganda. I mean, there are all kinds of really bad things that people want to post on Facebook, and you have to take them down, not post hoc, after they've been posted and thousands of people have seen them and flagged them. You'd like to take them down beforehand.
Speaker 1
00:25:00 - 00:25:11
Give us a sense of the scale there. So several billion users posting... I mean, how many words a day are posted on Facebook, for example?
Speaker 2
00:25:11 - 00:25:42
I would have to look it up. It's an absolutely gigantic number. I mean, just the number of photos posted on Facebook, just Facebook, the blue site. I'm not talking about Instagram, which is enormous, just Facebook photos. And every single one of those photos goes through convolutional neural nets that recognize the content of the photo, and that's then used to decide whether to show it to your friends. Maybe it's a picture of your sailboat, and your friend is interested in sailing, so that photo is going to be shown to that friend. But maybe it's a picture of your cat, and your friend has no interest in cats.
Speaker 1
00:25:43 - 00:26:22
A lot of people's, I guess, naive or initial model is you write something on Facebook and then it goes to people in your feed or something like that. But what's actually happening is that everything is being processed as soon as it's submitted to this vast AI system that is looking for a bunch of things. It's looking for hate speech, it's looking for child abuse, a number of other sort of things that are perceived to be really harmful. And then presumably it's also looking for, is this likely to be of interest to a lot of people? Is it going to be sticky?
Speaker 1
00:26:22 - 00:26:26
Is it going to hold people's attention and addict people to Facebook?
Speaker 2
00:26:26 - 00:26:26
That's right.
Speaker 1
00:26:26 - 00:26:27
It's looking for that as well, right?
Speaker 2
00:26:27 - 00:26:53
It's also looking: is it from a reliable source? Is it from a fake account? Is it from... So there are all kinds of things that are signals, if you want, that are used to decide whether a piece of content will be distributed or not, or will be distributed widely or not, and to whom, because you have to match this with people's interests. That's what makes the value of the service: that you show people what they're likely to be interested in.
Speaker 1
00:26:54 - 00:27:03
And there's a bunch of content where the algorithm itself can't decide what to make of it, and so it will push it out to a group of human testers or assessors, for example.
Speaker 2
00:27:03 - 00:27:24
Yeah, that's right. So there is a large number of human moderators who are there to make decisions where it's too difficult for AI to do. I mean, AI technology is not where we want it to be. It's very far from perfect. It will never be perfect in a way, because, I mean, certainly taking down adversarial content is a cat-and-mouse game, so that will never be over.
Speaker 2
00:27:24 - 00:27:40
But then to do things like: what is hate speech, right? How do you define hate speech? That's... it's not a technological question. It's more of a product design, ethics question; it's a very, very difficult issue.
Speaker 1
00:27:40 - 00:28:01
So I'm interested in talking for a bit about ways in which AI can go wrong. I mean, you've been quite eloquent in arguing that the familiar apocalyptic scenarios, where AI turns bad and turns on humanity, are really unlikely to happen. Give us that argument briefly.
Speaker 2
00:28:01 - 00:28:33
Right. So there is a bit of a fantasy, which we are conditioned into by science fiction, by Hollywood really, of AI taking over the world, robots killing humans and things like this. And it's a projection of human nature onto robots, which really is not appropriate. So we have this idea somehow that because an entity is intelligent, it will want to take over the world. It will want to make decisions.
Speaker 2
00:28:34 - 00:28:52
It will be curious. It will want freedom. Those are human characteristics. And they're human characteristics that have been built into us by evolution. And there's absolutely no reason that intelligent machines will have those, unless we explicitly build those desires into them.
Speaker 2
00:28:54 - 00:29:18
Even within the context of humanity, it's not because a person has superior intelligence that that person wants to control everybody else. In fact, in my experience as a scientist, it's kind of the opposite. The most intelligent people are people who are just interested in doing science. They don't want to have anything to do with anybody else. So that's not correlated with intelligence.
Speaker 2
00:29:19 - 00:29:37
You know, I sometimes say that the ones among us who want to become the leaders are not necessarily the smartest, and we have very good examples on the international political scene at the moment.
Speaker 1
00:29:38 - 00:30:21
But isn't there a risk, though, that if you create something of immense power, even if it has no evolutionary instinct to be evil to humans, there's just huge room for unintended consequences? I mean, the famous one is the paperclip factory: you optimize an AI to make as many paperclips as efficiently as possible, and it decides it's going to turn the world into paperclips, which seems far-fetched. But a lot of people would say, let's take Facebook now, take the current moment, where, at a first-level approximation, it seems like the goal of the AI is attention. The business model is dependent on harvesting as much attention as possible.
Speaker 1
00:30:21 - 00:30:30
So you use all the tricks of AI to figure out which are the posts that will attract most clicks, most likes, most viewing, will hold people to the screen.
Speaker 2
00:30:31 - 00:30:32
Not anymore, actually.
Speaker 1
00:30:32 - 00:30:55
You discover a year later that outrage is what works, that clustering people into sort of isolated groups where they can reinforce each other's anger and outrage is an incredibly compelling attention-garnering machine, and it's causing damage. No one intended it, but it's causing damage. Isn't that a fair critique?
Speaker 2
00:30:55 - 00:31:18
Okay, so there is a very important question there, which is a real question, and it's the problem of value alignment. So there are two kinds of intelligent systems. There are systems that are just built to do a particular task, like drive your car, or, I don't know, recognize what's in an image. And those are not autonomous AI systems. They're systems that are trained for just one task, and there's no ambiguity as to what they need to do.
Speaker 2
00:31:18 - 00:31:47
And then there is intelligent systems that have some autonomy. So basically, they're designed to optimize a particular objective, but the way they optimize it is not determined a priori. So the machine can learn to satisfy that objective in basically any way at its disposal. And if you do not design this objective properly, the machine will find loopholes in it. So it will find ways to optimize the objective without actually solving the problem you're interested in.
Speaker 2
00:31:47 - 00:32:32
Or it will actually satisfy the objective, but then you will quickly realize that this objective is not the one you actually wanted, and that you need to put guardrails in place to prevent the system from going haywire, even though it technically satisfies the objective. So this is a real problem, which is called objective or value function design, and value alignment: how you align the objective of your system with the goal that you want your system to fulfill. So let's take a very immediate example, which is Facebook, let's say. Indeed, you could design a system like Facebook to just maximize how much time people spend on it. And in fact, that's probably what Facebook was doing until a few years ago.
Speaker 2
00:32:32 - 00:32:51
And you create an incentive for, for example, clickbait companies to just post clickbait so that they attract people and make money from clicks, right? So, you know, Facebook realized this at some point and then changed the criterion, changed the objective.
Speaker 1
00:32:53 - 00:32:55
Talk about those changes. How were those criteria changed?
Speaker 2
00:32:55 - 00:33:23
So at the end of 2017, early 2018, for example, there was a big change. I mean, there was this continuous change, but there was a big change that was announced on Facebook, which is to stop basically trying to maximize the time people spend on Facebook, mostly kind of passively consuming content, and then put more emphasis on sort of meaningful interactions. So content that is mostly posted by your friends and that you interact with. You post comments. You like.
Speaker 2
00:33:24 - 00:33:43
You kind of exchange. You have kind of a meaningful exchange. Because the research showed that people end up being more satisfied with their interaction. They spend less time on Facebook, but they are more satisfied by it. Whereas when they just consume passively, it sort of satisfies them in the moment, but then they have the feeling of having wasted their time.
Speaker 2
00:33:43 - 00:34:03
So that was completely changed. It took a while; the decision to change this was made a while back, but it took until about the end of 2017 for it to be implemented. There were a number of other changes that were made, for example, to remove the economic incentive for clickbait. So the way news is shared now is that publishers cannot promote their own articles.
Speaker 2
00:34:05 - 00:34:31
You see a piece of news, like an article from the New York Times or something, because one of your friends shared it or someone you trust shared it. And so that's more organic, and the result is that it basically kills the whole idea of clickbait. So it's evolving continuously, and a lot of the ideas that people have about Facebook are ideas from maybe 2014 or something that are not relevant anymore; it's completely different now.
Speaker 1
00:34:31 - 00:35:07
Aren't there still circumstances where there's at least the theoretical possibility of a conflict between the commercial goals of the company and what an AI that was programmed with purely human well-being in mind would do? Have you run into any of those conflicts? I'm almost curious how your own goals are defined in this AI role. Are you ever bumping into the fact, I would like to do this, but it's not being implemented because there is a commercial reason why it should not be.
Speaker 2
00:35:07 - 00:35:39
I mean, there are a lot of things that Facebook decides not to do because of, you know, negative consequences on society, for example. I mean, there are organizations at Facebook that are entirely devoted to basically envisioning the societal consequences of technology in general and AI in particular. And so, I mean, this is certainly a big question, but it's something that appeared relatively recently. So those organizations are being built. I mean, there are similar organizations at Google and Microsoft.
Speaker 2
00:35:40 - 00:36:07
So yeah, I mean, those questions, people who are deploying AI and designing products are asking themselves those questions all the time. I'm a little bit outside of my comfort zone here in the sense that I do fundamental research in AI. So those questions are a little more distant, but they are, of course, intellectually and philosophically very interesting, and we are sort of concerned by it, because we feed technology into this.
Speaker 1
00:36:07 - 00:36:32
I've met many of the people at Facebook who are seeking to tackle misinformation and so forth, and it's clear that there is a huge effort underway. But the company remains controversial. I mean, as someone focused on AI, what are you working on that you're most excited about, that will actually make Facebook, and I guess the Internet, a better place?
Speaker 2
00:36:32 - 00:36:53
Right. So there's a big question, a big scientific question. You know, I was talking about what's the equivalent of aerodynamics for intelligence, right? What are the principles underlying intelligence and learning? And the reason for looking for this would be to build machines that can learn a bit like animals and humans: acquire knowledge about the world by observation.
Speaker 2
00:36:54 - 00:37:28
And perhaps this is the basis of what we call common sense. The fact that if I tell you that there's a glass on the table next to me, and the glass just flipped over, you know that what's underneath is most likely wet. Because you know how the world works, you have this kind of intuitive physics model of the world. You have an intuitive model of me also, right? You can't exactly tell whether I'm going to move my hands in this particular way, or move my head to the left or to the right, or what exact word I'm going to say.
Speaker 2
00:37:28 - 00:38:04
But you know I'm not going to turn into a frog right now, even though I'm French. So there are things you know about how the world works that are the basis of common sense. And what I'd like to build, or discover, is the recipe, the underlying techniques, that would allow a machine to learn enough about the world so that some sort of common sense will emerge. And this will be the basis for completely new AI-based systems and services, things like intelligent virtual assistants that are not frustrating to talk to, that you can hold a conversation with, and that can help you in your daily lives. You can carry them with you everywhere you go if you want.
Speaker 2
00:38:05 - 00:38:32
And they can serve as your extra memory if you want, your exocortex if you want. So that's one application. There are, of course, tons of applications in virtual reality, augmented reality, and things of that type, and things like self-driving cars that are reliable, and accelerating the progress of science. There is a huge amount of application of AI in science. Science is progressing faster because of deep learning.
Speaker 2
00:38:32 - 00:38:35
Something not many people know.
Speaker 1
00:38:35 - 00:39:16
Let's come back to common sense for a minute, because that strikes me as the key project, if it could be solved. And in a way, if I understood how you were describing self-supervised machine learning, the logical consequence, once you've really got that going, is that the end game is to develop this sort of common-sense model of the world. I mean, how far are we along that journey? You know, I'm thinking about the movie Her, where the Scarlett Johansson virtual personal assistant says to the Joaquin Phoenix character: oh, by the way, I read all your email, and wow, you're a good writer, I'm submitting your book for publication, or words to that effect.
Speaker 1
00:39:16 - 00:39:25
I mean, how far are we away from that kind of relationship with a personal assistant?
Speaker 2
00:39:25 - 00:39:49
Very far. That's the bad news. Okay, so we don't have... It's not just a question of whether we have the machines, whether we have the technology; we don't even have the science and the mathematical principles to get machines to learn like humans and animals and to acquire common sense. So once we figure this out, it may take two years, five years, 10 years, 20 years, 50 years; we don't actually know, really.
Speaker 2
00:39:49 - 00:40:14
I have a good hope that this may happen in the next 5 years. But who knows? AI researchers have a long history of being overly optimistic about the progress. So it may be more difficult than we think. But if it does happen in the foreseeable future, then this will create a new revolution in AI where machines will be considerably more intelligent and have some level of common sense.
Speaker 2
00:40:14 - 00:40:56
Think about the level of common sense of a house cat. A house cat is way more intelligent than the most intelligent machines that we have. A house cat cannot talk, but when you think about the intuitive physics model of the world that a cat has, it's incredible, right? A cat can do all kinds of things, you know, jump and walk around; there are all those videos on YouTube you can watch of cats walking around very fragile things without touching a single one of them. So, you know, there is a form of physical intelligence in there that we're not able to reproduce with machines, probably because we don't have a good recipe for self-supervised learning and similar things.
Speaker 2
00:40:56 - 00:41:02
And a cat only has about 800 million neurons in the brain. It's, you know, a lot smaller than...
Speaker 1
00:41:02 - 00:41:09
So it's all in the architecture, it's all in the software. We've got plenty of hardware, we just need to organize it the right way.
Speaker 2
00:41:09 - 00:41:17
No, it's not just the hardware, it's not the details, there is an underlying principle that we don't understand that we need to discover.
Speaker 1
00:41:17 - 00:42:03
But if your work has shown anything, it's shown that you can have nothing happen for a long, long time, but if you then get the architecture right, suddenly, boom, amazing things can happen. Just from the basic structure of deep learning, you know, within a few years it's been applied to every industry in the most spectacular ways. And so I guess, given how much computing power there is, given how cheap it's getting, once something like self-supervised learning really takes off, isn't it possible that we go from nothing happening here to "oh my God, that is breathtaking" in a shockingly short period of time? That it's likely to be a curve that looks like that?
Speaker 2
00:42:03 - 00:42:26
Absolutely, but we need a conceptual breakthrough, and we have no idea when that conceptual breakthrough will occur. And I may be wrong about it. It may be that, you know, we have a brick wall in front of us, and we're trying to kind of pierce through it, and we don't know how many walls there are behind it. So maybe that first wall is not actually going to unlock all the things.
Speaker 2
00:42:26 - 00:42:58
But I'm pretty hopeful that it will. I mean, very often for a revolution like this to occur, a lot of different things have to happen at the same time. So neural nets have been around for a long time, since the 50s. Then multi-layer neural nets with back propagation have been around since the 80s. But then it required the internet, basically, so that we could collect lots of data and then GPUs, okay, so basically the gaming industry, so that we could have powerful, cheap computers that we could run neural nets on, large neural nets on.
Speaker 2
00:42:58 - 00:43:17
And then the combination of back propagation, a few new algorithms, big data sets, and GPUs is what basically started the new revolution of AI. You know, it's the same for aviation. People had the idea of airplanes for a very long time before it was technologically practical to actually build airplanes.
Speaker 1
00:43:17 - 00:43:25
Are there any surprises about to be unveiled on Facebook that are built on AI that we could look forward to?
Speaker 2
00:43:26 - 00:43:52
Oh, wow, OK. I mean, for people who are very connected to the field, it's probably not very surprising. But yes, I mean, the degree to which natural language understanding, multilingual natural language understanding, and translation are used at Facebook is gigantic. Image recognition and video interpretation are also used on a very, very large scale. There's also something called similarity search.
Speaker 2
00:43:52 - 00:44:14
So you want to be able to tell if a photo is similar to another one, or if a video is similar to another one. It's very important because it's good for recommendation. For example, you're on Instagram, you're browsing through pictures, and you want to be shown pictures that are similar somehow. But it's also very important for the integrity of the data as well.
Speaker 2
00:44:14 - 00:44:44
Someone posts a terrorist propaganda video, and then all kinds of groups can repost that video, but they modify it a little bit, because they know that it's easy for an online service to detect an exact copy of a video. So now you have to basically detect similar versions of the same video that have been transformed so that they are somewhat dissimilar. And you have to do this for billions and billions of items that are posted on Facebook every day.
Speaker 2
00:44:44 - 00:44:47
So it's very, very challenging technologically.
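A small sketch of the similarity-search idea, purely illustrative: the embeddings below are random stand-ins for what a neural net would produce for each photo or video, and at that scale the brute-force comparison would be replaced by an approximate nearest-neighbor index. The point is that a slightly edited repost still lands close to the original in the embedding space, so it can be found by comparing vectors rather than exact file copies.

```python
import numpy as np

rng = np.random.default_rng(0)
catalog = rng.standard_normal((10_000, 128))               # embeddings of known items
catalog /= np.linalg.norm(catalog, axis=1, keepdims=True)  # normalize for cosine similarity

flagged = catalog[42]                                      # embedding of a known bad video
repost = flagged + 0.02 * rng.standard_normal(128)         # slightly edited copy of it
repost /= np.linalg.norm(repost)

scores = catalog @ repost                                  # cosine similarity to every item
best = int(np.argmax(scores))
print(best, round(float(scores[best]), 3))                 # 42, with a score close to 1.0
```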
Speaker 1
00:44:49 - 00:45:07
Does Facebook have any vision of a world where you can befriend anyone in the world and basically speak to them in real time, in whatever language, and Facebook will look after the translation layer? Is that coming soon?
Speaker 2
00:45:08 - 00:45:35
Yeah, that's a vision for which, you know, the technology is almost available. And, I mean, certainly Facebook has services through Portal, for example, which is kind of a physical video conferencing system, and Messenger and WhatsApp. And eventually those systems will have real-time simultaneous translation. I don't think the translation is good enough now for it to be non-frustrating, but it's going to come pretty soon, I think.
Speaker 1
00:45:35 - 00:46:06
Is it realistic that any time in the reasonably near future, the AI analysis of written content will be good enough to know when something is being posted that is just misleading, or weaponized, or obnoxious, or dangerous in some way, so that it's possible to scale the kind of warnings that I guess most people would want to see on a platform as powerful as Facebook?
Speaker 2
00:46:06 - 00:46:35
Yeah, well, so, you know, it's a very complicated question, both from the technical point of view and from the sort of policy or product design point of view, content policy. So content policy is a very difficult question. Like, should you suppress speech of a particular kind? How do you define, where do you put, the boundary between speech that should be suppressed and speech that is fine? And at what point do you start basically practicing censorship?
Speaker 2
00:46:36 - 00:47:06
Is it the role of Facebook to be the arbiter of truth, for example? Facebook so far has been extremely careful not to get onto the slippery slope of establishing itself as the arbiter of truth. So misinformation that is directly dangerous to people certainly needs to be taken down. And that is done by Facebook. When there are public health issues, for example, an anti-vax page that has a lot of followers is dangerous to public health.
Speaker 2
00:47:06 - 00:47:23
So that is taken down. But then where do you put the limit? People tell all kinds of false stories on Facebook, and they should be able to do so. I mean, poetry is one of those, right? So it's more a question of policy, I think, than a question of technology, really.
Speaker 2
00:47:23 - 00:47:26
But technology certainly is not where we want it to be.
Speaker 1
00:47:26 - 00:48:10
The issue is often framed as what should be censored and what shouldn't, which is a polarizing way of framing it. Because the real question is not whether something should be censored, but whether it should be amplified; the real question is how many people get to see that dangerous information. So you could picture a scenario where it would be a very extreme case where you'd actually censor something, but as a matter of routine, you would not amplify the stuff that, even though it was attention-attracting, was dangerous or obnoxious or damaging to civil society. That, I think, is the kind of AI that people would love to see: sort of, you know what, there could be a problem here, let's not amplify it.
Speaker 2
00:48:10 - 00:48:39
Right, so there are content policies that are clear, where a piece of content that is clearly hate speech is just taken down. And then there are other policies that are, you know, is this source of information reliable, and things like this. And the content will be promoted more or less depending on things like the reliability of the source and the character of the message being transmitted. So the question is whether hurtful messages are amplified or not. I personally don't think they're amplified.
Speaker 2
00:48:39 - 00:49:16
There are certainly no systems in Facebook that attempt to amplify controversial statements. But what I'm seeing is that everyone on Facebook, on both sides of the political spectrum or of an issue, is absolutely convinced that Facebook favors the other side. And it's true in both directions. And so because Facebook could have the power to shut down your opponents, you might want to accuse Facebook of not helping you enough, right? But then it's a slippery slope.
Speaker 2
00:49:17 - 00:49:42
Is it a role that Facebook should play, knowing that Facebook has such a large footprint? Do you want a single large private American company to be the arbiter of truth? And my answer to this is no. Mark Zuckerberg's answer to this is no. And I think everybody's answer should be no, because you don't want a single large entity with a huge footprint to basically determine what is truth, in particular what is political truth.
Speaker 2
00:49:42 - 00:49:56
This would be the role of a highly diverse and independent press, and interest groups, advocacy groups, individuals; Facebook is basically a platform for them. So, um...
Speaker 1
00:49:56 - 00:50:37
But many people would argue that there might be a way for the company to, you know, use the power of crowd wisdom, perhaps, to more thoughtfully address issues where there is a clear reason why you'd think that certain content might be damaging. Like I say, it's not about censorship, it's about trying to use a broad... trying to tap into the power of crowd wisdom to dial stuff up or down. It's a much longer conversation, and I appreciate that it's a hard conversation to have, almost, in your role and in general. So I'm going to ask you one other question.
Speaker 2
00:50:37 - 00:50:43
Yeah, it's also a conversation for another person from Facebook, because I don't do policy at Facebook. You know, what I'm telling you now is my personal opinion.
Speaker 1
00:50:43 - 00:50:46
If we could persuade that person to come on to TED, we'd have it in our pocket.
Speaker 2
00:50:46 - 00:51:01
You know, it's my personal opinion, from the inside, from knowing how this functions. And people have a lot of wrong ideas about Facebook's motivations and how it operates. But, you know, it's a complicated story. Yeah.
Speaker 1
00:51:02 - 00:51:13
So let me ask this as a final question. Just tell us a story about the future, a possible future, that makes you feel hopeful, could make us feel hopeful.
Speaker 2
00:51:14 - 00:51:53
So, as I said, AI, deep learning, is used in science more and more, and it's going to accelerate the progress of science, which includes things like medicine, et cetera. So there are good systems now to predict the conformation of proteins, based on deep learning, systems that can predict whether a protein is going to stick to another one. Maybe that's going to be a way of designing monoclonal antibodies for things like COVID-19 or other diseases. Lots of new diagnosis systems based on AI. Medical imaging is a huge success for convolutional nets in particular, and deep learning more generally.
Speaker 2
00:51:53 - 00:52:37
So this is starting to be deployed everywhere. Facebook actually has worked on a project in collaboration with NYU on accelerating the data collection for MRIs to collect data for, I don't know, knee replacement surgery or something like that. And so that makes the whole thing faster, cheaper, which means it's going to be more available. And so I think in medicine, in safety, so driving assistance for example, we may not have autonomous cars yet, but we do have what's called EABS, emergency automatic braking systems. And those are systems that are now built into most new cars that come out.
Speaker 2
00:52:37 - 00:52:55
So I think pretty much every car in Europe has to have those, even low-end cars. And those systems, it's basically accomplished, you know, with a camera that looks out the window, and it's going to brake if a pedestrian happens to cross the street in front of you and you didn't pay attention. And those things reduce collisions by 40%.
Speaker 2
00:52:57 - 00:53:24
It's a very large number. And so AI is going to save lives, right? And then: are we going to have robots to take care of everything in our house? Are you going to have virtual assistants of the type that you see in the movie Her, Spike Jonze's movie, et cetera, that can help you in your daily lives? Possibly. That may take a long time, but possibly. Better telepresence systems, things like that.
Speaker 2
00:53:24 - 00:53:53
People are just going to be more connected. People are going to have better access to information. Basically, we'll have AI assistants to kind of manage the enormous flow of information that we are bombarded with every day. In some ways, the, you know, Google search engine and Facebook's ranking system are basically of that type. They select information that those systems think are relevant to you, and they're trained, they're learning systems, right?
Speaker 2
00:53:53 - 00:53:56
Those systems are trained by your tastes, essentially.
Speaker 1
00:53:57 - 00:54:31
Mm. Well, it's always encouraging to hear that someone who's working deeply in AI remains convinced that, net net, it's going to have a positive impact on the world. Yeah, there's this idea, Yann, of the adjacent possible, where some inventions expand the possibility of what humanity can achieve. And it really seems to me like your work has made a massive increase in the size of the adjacent possible. You know, however the future turns out, what you've done has been very consequential.
Speaker 1
00:54:31 - 00:54:38
So thank you for what you're doing and for sharing so openly with us how you're thinking. Thank you for this conversation.
Speaker 2
00:54:38 - 00:54:59
Well, thank you so much. I'm a believer in technology. Technology can be used for good and bad. I don't see myself as having the legitimacy to decide for society whether a particular piece of technology should be used for one thing or another. I think it's the role of the democratic process to do this for us.
Speaker 1
00:54:59 - 00:54:59
Thank you. Thank you, Yann.