WEF Panel - The Expanding Universe of Generative Models
World Economic Forum Annual Meeting

In Davos, I sat down with five great AI innovators and entrepreneurs: Yann LeCun, Kai-Fu Lee, Daphne Koller, Andrew Ng, and Aidan Gomez. Here is a transcript of our conversation about scaling, progress, open source, and the quest for AGI.

You can watch the panel here, if you'd like.

Transcript:

Nick Thompson: Hello, everybody. Good afternoon. I'm Nicholas Thompson. I'm CEO of The Atlantic. I'm the moderator here. I was told this panel sold out in 30 seconds. I know that's not because of me, because I moderate a lot of panels, and that doesn't normally happen. I am extremely excited for this group. 

We have an incredible group here. In fact, I'm so excited that when I was walking here, I was looking at my questions getting ready, walking down the promenade, and I saw a group of people ahead of me. And I'm reading my questions, walking, and they're kind of slowing me down. So I'm doing the New Yorker thing, and I move in between them. And then I notice they all have machine guns. And so then suddenly, two guys with machine guns look at this little guy squirreling through them. It's John Kerry's entourage. 

But I was very eager to get here. I'm going to introduce our panelists in chronological order from their first major contribution to the field of AI and machine learning. What's amazing about this group of five is that they all have done incredible and different things in the past and are doing incredible and different things right now. 

So on my left, Yann LeCun, Turing Award winner, inventor of the architecture of convolutional neural networks. He also runs AI at Meta. We have Kai-Fu Lee, machine learning for speech recognition. He now runs 01.AI, which is booming up the Hugging Face charts. We have Andrew Ng, who first learned to scale GPUs for AI. Of his many affiliations right now, he runs the AI Fund. We have Daphne Koller, using Bayesian models in AI for the first time, now runs Insitro. And we have Aidan Gomez, one of the people who came up with the transformer architecture. In GPT, the T is for transformer. And he now runs Cohere. 

So an amazing panel, what they did, what they're doing. This should be great. First question. The thing I most want to understand from all of you, when I leave this room, I want to understand what the rate of change in AI will be in the years to come. I want to know whether the crazy innovations we've had the last two years will continue, will scaling laws continue, will they continue to go faster than Moore's Law, or are we approaching a plateau of some sort? Will things start to slow down? 

So I'm going to take that question, I'm going to give it to you. Kai-Fu Lee, tell me whether the rate of change will increase or slow down in the years to come, and why.

Kai-Fu Lee: I think it will slow down a little bit, but I think it will still be at incredible rates. Just look at how much the quality of these models has improved in the last two years. You know, two years ago, the MMLU, which is roughly a measure of intelligence, was in the 40s, 50s, now it's 90, and there's more room to grow. Obviously, there is growing by adding more compute and data, but there's also room for tuning and improving different aspects as more and more entrepreneurs and large companies get in the game. 

We are very late to the game, and we were able to make substantial improvements over existing models, not by going after scaling, but by doing, you know, different things in the tweaking of the data, the training, the infrastructure, and so on. So I'm optimistic.

NT: You're optimistic it will stay, both because you think the recent scaling laws will hold, and also because there'll be innovations? 

KFL: They will hold, but they will obviously slow down. There's a diminishing return to everything that one does, but it's definitely not at the plateau.

NT: Andrew, what do you think? 

Andrew Ng: So scaling gets harder and harder, but I feel like the pace will feel like it's still accelerating to most of us because of the number of innovations and our algorithmic evaluations. So some quick examples. We saw the text revolution happen last year, kind of. I think this year we'll see the image processing revolution take place. It's kind of here already with GPT-4V and Gemini Ultra, but really, computers will see much better. I'm seeing a lot of innovation in autonomous agents. Rather than prompting an LLM and getting a response, you can give an LLM an instruction, and it'll go off and do work for you for half an hour, browse web pages, do a lot of research, and come back. This is not totally working right now, but a lot of people are working on it to achieve another innovation. 

Edge AI. We're used to running LLMs in the cloud, but because of open source and other things, Yann, Meta's done a lot of good work on this and other things as well. I think we'll be running a lot more large language models on our own devices in the future. With all of these factors of innovation, I'm actually optimistic that it will feel like the future is continuing to accelerate forward.

NT: Well, let me ask you this variation on the question, which is, if you were able to double the amount of compute applied to large language models right now, will you double their power? Will you cube their power? What will happen as we increase the amount of compute?

Aidan Gomez: I mean, I think, I would say that I agree it's going to keep pace, or I would even go as far to say it's going to start to accelerate. There are huge bottlenecks to what we have today. We know the limitations of the architectures that we have, the methods that we're using, and I think that's going to get easier. At the same time, the hardware platforms are getting better and better. So the next generation of GPUs are going to be a big step over the generation we have today, and that unlocks new scale, much more expensive algorithms and methods to run.

So I'm very optimistic things are going to accelerate. In terms of the specific question of if you double the compute capacity, what sort of decisions do you make? I would say at the moment, we're not done with scaling. We still need to push up. All of us, pretty much all of us building large language models are, and so I would double my model size.

NT: I think everybody would double their model size if it was free. Well, Yann or Daphne, let me ask you. Part of the reason I'm curious about this is that if the relationship holds, and that the better the GPUs, the more compute you have, electricity you have, the better your models, then it means that power will consolidate with the small number of companies that have access to it. If there's more of a plateau, it makes it a more competitive market, right?

Daphne Koller: I'm actually going to turn your question in a slightly different direction. 

NT: Tweak it however you want. 

DK: I'd say that in your taxonomy of the enabling forces, you mentioned compute, you mentioned electricity, you did not mention data, and I think that has been probably the single biggest enabler to the incredible amount of progress that we've seen today. And I think we're only starting to scratch the surface of the data that are going to become available to models over time. So right now, for example, yes, we train on all the web scale data, and that's amazing, and that's incredible, but these agents are not yet embodied. They don't yet interact with the world. And so as we start to carry these things around with augmented reality, as we start to get more data from self-driving cars, as we start to tap into data modalities such as biology and healthcare and all sorts of data that are currently hidden in silos, I think those models will develop new levels of capability that they don't currently have today. And so I think that's going to be a major contributor to the expansion of capabilities of these models.

NT: That's so interesting, because I've had conversations in Davos where people have said, well, we're running out of data, right? There's not that much more on the web.

DK: True. 

NT: That's great. 

DK: Perhaps. 

NT: But self-driving cars, more data. Yann, where are you?

Yann LeCun: I totally agree with Daphne. Certainly if we concentrate on the paradigm of LLM, autoregressive LLMs, it's saturating. There's no question it's saturating. Indeed, we're running out of data. We're basically using all the public data on the internet. The latest LLMs are trained with, what, 10 trillion tokens? Okay, that's about two bytes per token, so it's about 2 × 10^13 bytes of training data. It would take most of us here between 150,000 and 200,000 years to read this, okay? Now, think about what a child sees through vision and try to put a number on how much information a four-year-old child has seen during his or her life. And it's about 20 megabytes per second going through the optical nerve for 16,000 wake hours in the first four years of life. And 3,600 seconds per hour, and you do the calculation, and that's 10^15 bytes. So what that tells you is that a four-year-old child has seen 50 times more information than the biggest LLMs that we have. And a four-year-old child is way smarter than the biggest LLMs that we have. 

The amount of knowledge that's accumulated is apparently smaller because it's in a different form, but in fact, a four-year-old child has learned an enormous amount of knowledge about how the world works. And we can't do this with LLMs today. And so we're missing some essential science and new architectures to take advantage of sensory input that future AI systems would be capable of taking advantage of. This will require a few scientific and technological breakthroughs, which may happen in the next year, three years, five years, 10 years. We don't know, it's hard.
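
A quick back-of-envelope check of the figures Yann cites, as a minimal Python sketch. The token count, bytes per token, optic-nerve rate, and waking hours come from his remarks; the reading-speed figures (words per token, words per minute, hours of reading per day) are assumptions added here for illustration and were not stated on the panel.

```python
# Figures quoted on the panel.
TRAINING_TOKENS = 10 * 10**12        # about 10 trillion tokens
BYTES_PER_TOKEN = 2                  # about two bytes per token
OPTIC_NERVE_BYTES_PER_SEC = 20 * 10**6   # about 20 megabytes per second
WAKING_HOURS = 16_000                # first four years of life
SECONDS_PER_HOUR = 3_600

text_bytes = TRAINING_TOKENS * BYTES_PER_TOKEN
visual_bytes = OPTIC_NERVE_BYTES_PER_SEC * WAKING_HOURS * SECONDS_PER_HOUR
ratio = visual_bytes / text_bytes

# Assumptions (not from the panel): 0.75 words per token, 250 words per
# minute, 8 hours of reading per day, 365 days per year.
WORDS_PER_TOKEN = 0.75
WORDS_PER_MINUTE = 250
READING_HOURS_PER_DAY = 8
reading_minutes = TRAINING_TOKENS * WORDS_PER_TOKEN / WORDS_PER_MINUTE
reading_years = reading_minutes / 60 / READING_HOURS_PER_DAY / 365

print(f"text data:   {text_bytes:.2e} bytes")
print(f"visual data: {visual_bytes:.2e} bytes")
print(f"visual / text ratio: {ratio:.1f}")
print(f"years to read the text: {reading_years:,.0f}")
```

Under these assumptions the visual figure comes out slightly above 10^15 bytes, so the ratio lands a little above the rounded "50 times" quoted, and the reading time falls inside the 150,000 to 200,000 year range.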

NT: I want to make sure I understand you here, Yann. So the amount of text data that's available will grow, but not infinitely. But the amount of visual data that we could potentially put into these machines is massive, much more.

YL: Well, the 16,000 hours of video I was telling you about, that is 30 minutes of uploads on YouTube. I mean, we have way more data than we can deal with. The question is, how do we get machines to learn from video? We don't know.

NT: Right, so what is the new architecture that is needed if the next step is gonna be video inputs? Obviously, a large language model isn't exactly right. The way it's been constructed isn't optimized for it. What do we have to build now?

YL: Okay, so large language models are trained in—or NLP systems more generally—are trained in one way. You take a piece of text, you corrupt it, and then you train some gigantic neural net to reconstruct the full text, to predict the words that are missing, basically. You corrupt it by removing some of the words. LLMs, like ChatGPT and Llama and others: you train them by just removing the last word. I mean, technically, it's more complicated, but it's basically what they do, right?

So you train the system to reconstruct missing information about the input. So of course, the obvious idea is, why don't we do this with images, right? Take an image, corrupt it by removing some pieces or corrupting it, and then train some big neural net to recover the image. And that doesn't work, or it doesn't work very well.

There is a whole thread of efforts in that direction that has been going on for a while, and it doesn't really work very well. It doesn't work for video either. I've been working for nine years on video prediction. Show a piece of video to a system, and then train it to predict what's gonna happen next. And if the system is capable of doing this, it probably has understood something about the underlying nature of the world, the same way a text system that is trained to predict the next words captures something about the meaning of the sentence. But that doesn't work either.
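
The text-corruption setup Yann describes at the start of this answer can be illustrated with a small Python sketch. It only builds the training pairs (corrupted input, words to recover); it does not train anything. The function names, the `[MASK]` placeholder, and the masking fraction are hypothetical choices made for this illustration, not details from the panel.

```python
import random

MASK = "[MASK]"  # hypothetical placeholder for a removed word


def masked_example(tokens, mask_fraction=0.3, seed=0):
    """Corrupt a text by removing some words; the targets are the removed words."""
    rng = random.Random(seed)
    n_mask = max(1, round(len(tokens) * mask_fraction))
    positions = set(rng.sample(range(len(tokens)), n_mask))
    corrupted = [MASK if i in positions else t for i, t in enumerate(tokens)]
    targets = {i: tokens[i] for i in positions}
    return corrupted, targets


def next_word_examples(tokens):
    """Autoregressive variant: hide only the last word of each prefix."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


sentence = "the pen falls to the floor".split()

corrupted, targets = masked_example(sentence)
print(corrupted, targets)

for context, target in next_word_examples(sentence):
    print(context, "->", target)
```

A model trained on the first kind of pair learns to fill in missing words anywhere in the text; one trained on the second kind learns to predict the next word from what came before.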

NT: And so what you mean is you take a video and you have me dropping a pen, and it'll be able to predict that the pen will fall. But right now a machine can't do that.

YL: A machine can't. So the question is: your pen has a particular configuration. When you drop it, it's gonna follow a particular trajectory. Most of us cannot predict exactly what the trajectory is, but we can predict that the object is gonna fall. It takes babies about nine months to figure out that an object that is not supported falls. Intuitive physics, that takes us nine months to learn when we're babies. How do we do this with machines?

NT: Wait, sorry if this is a dumb question, but I don't understand. If in the future these things are gonna work and continue to be revolutionary because they're gonna understand video because that's where the data is, but we don't understand video, how do you square that?

YL: So the potential solution to this, so there is no real solution yet, but the things that are most promising at the moment, at least the things that work for image recognition, I'm gonna surprise everybody, are not generative. Okay, so the models that work best do not generate images. They do not reconstruct. They do not predict. What they do is they predict, but in a space of abstract representation. So the same way I cannot predict exactly how the pen will fall in your hand, I can predict that it will fall. So at some abstract level, a pen being here or there without the details of exactly what its configuration is, I can make that prediction. So what's necessary would be to make predictions in abstract representation space as opposed to pixel space. That's why, you know, all the predictions in pixel space have failed so far. It's just too complicated.

DK: But Nick, it's more than just a video. I think the other thing that babies learn is the notion of cause and effect, which they learn by intervening in the world and seeing what happens. And we have not yet done that at all with LLMs. I mean, they are entirely predictive engines. They're just doing associations. Getting to causality, which is so critical when one tries to cross the chasm between bits and atoms, that's a huge capability that's missing in current day models. It's missing in models that are embodied. It's missing in the ability of our computers to do common sense reasoning. It's missing when we try to go to other applications, whether it's manufacturing or biology or anything that interacts with the physical world.

YL: Well in an embodied system, it's actually kind of working. So I mean, some of the systems have world models. Here's a representation of the state of the world at time T. Here's an action I might take. Tell me the state of the world at time T plus one. And this is called a world model. And if you have this kind of world model, you can plan a sequence of actions to arrive at a particular goal. And we don't have any AI systems based on this principle at the moment, except very simple kinds of robotic-like systems that don't run very fast. 

And so once we can scale this kind of model up, we'll have systems that can understand the world, understand the physical world. They can plan, they can reason, they understand causality because they understand what effect an action will have. And it will be goal-oriented, objective-oriented because we can give them goals to satisfy with this planning. So that's the future architecture of AI systems. And in my opinion, once we figure out how to make this work, nobody in their right mind would use an autoregressive LLM anymore.
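
The world-model idea in this answer (state at time T, an action, predicted state at time T plus one, then planning a sequence of actions toward a goal) can be sketched in a few lines of Python. This is a toy under loud assumptions: the "world" is a position on a line from 0 to 10, the model is hand-written rather than learned, and the planner is a brute-force search. None of these choices come from the panel.

```python
from itertools import product

ACTIONS = (-1, 0, 1)  # hypothetical actions: step left, stay, step right


def world_model(state, action):
    """Predict the state at time T+1 from the state at time T and an action.

    Hand-coded toy dynamics: a position clamped to the range 0..10.
    """
    return max(0, min(10, state + action))


def plan(state, goal, horizon=5):
    """Search all action sequences and keep the one predicted to end nearest the goal."""
    best_seq, best_cost = None, float("inf")
    for seq in product(ACTIONS, repeat=horizon):
        predicted = state
        for action in seq:
            predicted = world_model(predicted, action)
        cost = abs(goal - predicted)
        if cost < best_cost:
            best_seq, best_cost = seq, cost
    return best_seq, best_cost


actions, cost = plan(state=2, goal=5)
print(actions, cost)
```

The planner never acts in the world; it only imagines outcomes through `world_model`, which is why the quality of that model is the hard part in any realistic setting.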

NT: All right, well, I don't understand how to make a neural network, but I do know a little bit about children. I've raised three of them. I think you guys overrate babies. Cause and effect. When my youngest was nine months old, I remember him standing at the side of the crib, hollering, we were trying to sleep train, and he flipped out. Thankfully, he landed on his butt, but I don't think he understood the notion of objects falling and exactly what cause and effect are. 

Aidan and Kai-Fu, is Yann correct on what needs to happen? Is he pursuing it the right way? Or is he falling short because his ideas are wrong?

KFL: No, Yann is always right. 

YL: Thank you, Kai-Fu. 

KFL: However, we shouldn't lose sight of the incredible commercial value that exists in the text-based LLMs, right? I mean, they give an incredible pretense of logical reasoning, even common sense. They solve real problems. They can generate content. They dramatically improve our productivity. They're being deployed everywhere. So, putting on my more entrepreneur hat, I just see so much value that remains to be reaped. On this opportunity to have a world model, I think that's a great thing for researchers to work on. For me as a startup company, that's something that's a bit farther out. And we'd love to have academia and large company research labs make the discoveries, then we'll follow.

NT: So your view is that even if we stick to text-based large language models, if we don't move on to all this crazy stuff that Yann is talking about here on my left, the world will still be turned upside down?

KFL: We're already seeing it, absolutely. I mean, we're seeing content generation, the emulation of people, the creation of interesting experiences, the making of better search engines, and basically everywhere you can imagine, office productivity, creating PowerPoints content. We have way too many things for us to think about and work on when we think about what can make the most money and produce the most value for users today.

NT: Aidan, do you agree with Kai-Fu? 

AG: Yeah, I definitely agree about the market opportunity and the value that exists today and will be coming very, very soon, even if we don't make it all the way to AGI. I think Yann answered a really interesting question, which may have been different than the one asked. I think you were asking if we just keep doing what we're currently doing with autoregressive models, will we make it? And I would agree. The answer is no. And for the reasons that both of you state, which is grounding and the ability to actually experience the real world gives you those causal insights. 

I do believe those aren't insurmountable hurdles. And I think you both believe that as well. And so I think people are working on that. So far, the dumb strategy has worked so well that we've just been able to be like, “Build a bigger supercomputer, scale up more data,” and we get performance. We get extraordinary performance. And so what happens when that starts to tire? I think we know what's next. We need online experience. We need to be able to actually interact with the real world. 

The way these models are deployed today, we do all of this offline training, and offline means there's no interaction with a person, there's no interaction with the environment. And we deploy them, we put them into a product. It's static. It doesn't learn, it's fixed from there. Nothing you do changes that model, its weights. And so that needs to change in order for these things to continuously learn. And the other big hurdle is, we humans, we learn through debate like this, right? We discover new ideas, we explore the space of truth and what's possible knowledge.

Models, they need to be able to do that amongst themselves as well. So this idea of self-play and self-improvement. Right now, a major bottleneck, I'm sure Kai-Fu feels it as well, I'm sure we all feel it, is getting access to data that is smarter than our model as it is. So before, you could just pull anyone off the street and be like, please teach my model to speak properly. And it would improve your model, it would increase the score. They're starting to get really, really smart. And so you can't go to the average person off the street. You need to go to a master's student in mathematics. You need to go to a bio student, and then a PhD, and then who? And so humanity and its knowledge is kind of a limiting, an upper limit to the current strategy. So we need to break through that.

NT: So hold on. So eventually, we've got the man off the street, then we've got the PhD, then we've got the smartest person in the world, and then eventually the machines are just talking to each other, they're training each other, they're creating all this synthetic data, they're training on each other's synthetic data. We meager humans have no idea what's going on. How do we know that they're not corrupting themselves, polluting themselves?

AG: I would just interrupt to say, I think Daphne would tell you that it's not just synthetic data and them interacting with themselves in this little box of isolation. They need access to the real world to run those experiments and experience that. To form a hypothesis, test a hypothesis, fail 1,000 times and succeed once, just like humans do, to discover new things. Sorry, I cut you off. What was your question?

NT: No, that was awesome. That was great.

AN: Can I jump in and comment on the technology limitation? I don't know if Yann and I agree or disagree on this, but people talk about how bad large language models are at math. If you ask them to predict the next character, multiply two large numbers, it often gives a wrong answer. Turns out humans are also really bad at that. But if you give a human pencil and paper, if you give me pencil and paper, I'm much better at math than if you force me to just spit out the answer without working it through. And I think large language models are like that too. 

So I think one of the ways to overcome some of these limitations is tool use. If you give a large language model the equivalent of a scratch pad, you could actually get it to be much better at math. In fact, large language models today are using all sorts of tools like a scratch pad to actually make things happen in the world, to browse web pages. And these tools are one way, in the short term (and there are probably more fundamental things to be done in the long term), to break through some of these limitations of purely autoregressive models.
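
A minimal Python sketch of the tool-use idea Andrew describes: instead of asking the model to spit out a large product directly, the surrounding program lets it write a calculator request and fills in the exact result. The `CALC(...)` syntax, the function name, and the hard-coded draft reply are all hypothetical stand-ins for illustration; no real model or vendor API is involved.

```python
import re

# Hypothetical tool-call syntax: CALC(<int> <op> <int>), with op in + - *
CALL = re.compile(r"CALC\((\d+)\s*([+\-*])\s*(\d+)\)")


def run_calculator_calls(text):
    """Replace every CALC(...) request in the text with its exact result."""

    def _apply(match):
        a, op, b = int(match.group(1)), match.group(2), int(match.group(3))
        results = {"+": a + b, "-": a - b, "*": a * b}
        return str(results[op])

    return CALL.sub(_apply, text)


# Stand-in for a model reply that defers the arithmetic to a tool.
draft = "The product is CALC(123456789 * 987654321)."
print(run_calculator_calls(draft))
```

Python integers have arbitrary precision, so the calculator step is exact however large the numbers get, which is the "pencil and paper" the model lacks on its own.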

DK: But I think more broadly, and coming back to the point that Aidan attributed to me, and thank you for that, we do not have the ability at this point to create an in silico model of the world. The world is really complicated. And the ability that we have to experiment with the world and see what happens and learn from that, I think is absolutely critical to what makes for human intelligence. So if we want these machines to grow, we need to give them the ability, not just in silico talking to each other and kind of basting in the juice of their own little universe, but really to experiment with the world and generate the kind of data that helps them continue to grow and develop. 

And I think the big differentiator as we move forward is giving the computers access to designing experiments, whether it's simple experiments, like what happens when you drop the pen, or more complex experiments, which is what happens when I put these five chemicals together in a cell, what happens to the cell, or what happens to the human? Those are the kinds of experiments that are going to teach the computer about the incredible complexity of the world and allow it to really go beyond what a person can currently teach them once you've kind of plateaued with the math expert or the biology expert.

NT: That's so interesting. I feel like I'm getting a lot smarter, so maybe I'm useful to your machine for like one extra day.

KFL: Can I just jump in and say that, yes, those are all great aspirational goals, but we have so much engineering work (you could call it a patch if you want) to make the current algorithms better. Take the problem that it makes things up out of the blue: you can correct that by gluing a RAG search engine to it. Mathematical problems, theoretically you could glue Wolfram Alpha to it. Various companies have attempted to do that, and I think a lot of the issues that we see today can be addressed by that. I know it is not elegant. One could argue GPT-4V is a smart way to glue things as well, but I think we have a lot more engineering gluing that can cover up a lot of the issues today as researchers aspire to build embodied AI and advanced things.

NT: I want to jump back to something that Aidan said, where you're talking about AGI, and then you talked about making it, and so artificial general intelligence is a term I don't want to get stuck in a definition, but super intelligent machines, or the goal of OpenAI, the stated goal of OpenAI is to build a machine that is smarter than humans at everything. That was the original goal of FAIR at Facebook.

YL: Still the goal. 

NT: So I want to ask all of you, is this the proper goal for so many AI researchers? Should we be trying to build machines that are better than humans in general ways?

YL: Should we build airplanes that go faster than birds? The purpose of building airplanes was not to go faster than birds, but to figure out how you fly. And I think the problem of FAIR research at the moment is discovering the underlying principles behind intelligence and learning, whether it applies to humans or not. And the best way to do this is to actually build machines that are intelligent. You can't preserve sanity, I guess, without actually building those things. 

And so there's a scientific question, what is intelligence? What are the required components to make machines intelligent? One of them is learning, we know about that. We've made some progress on this, but we so far have not been able to reproduce the type of learning that allows a 10-year-old to learn to clear out the dinner table and fill up the dishwasher in one shot, or allows a 17-year-old to learn to drive in 20 hours. We still don't have level five self-driving cars. So what is that? What type of learning takes place in humans and animals?

NT: But Yann, isn't your metaphor—I would push back on your metaphor because what we did is we took one thing that a bird does, which it flies, and we got way better at it. We didn't try to create a better bird. We said, oh, look, a bird, right? So I would think that maybe the parallel would be AI researchers saying, instead of trying to make something that's like a human mind and better at everything, let's figure out how AI can best serve human biology, for example.

DK: So I'm gonna, since you asked us to disagree, I'm going to disagree with Yann. 

NT: Beautiful. 

DK: You're welcome. 

NT: No one ever disagrees with Yann, as you can tell from Twitter. 

DK: Well, so I'm going to try and disagree with Yann, which is to say, I'm not convinced that the current way in which we're designing AI systems is teaching us about human intelligence. It's teaching us about an intelligence. We're certainly learning about how to build an intelligence. I'm not sure that building planes has taught us how birds fly. I'm not sure that building LLMs has taught us how people reason. So I think it is a very worthy goal. I'm not sure that the path that we're on is necessarily going to lead us to that specific goal of understanding human cognition. 

On the other hand, I'm going to now agree with Kai-Fu, which is to say there are multiple other endeavors that one can undertake in this world of AI, which is to solve really hard, important, aspirational problems that people maybe are not capable of solving and we will not get there by trying to replicate how people reason and the problems that you alluded to that I'm working on, which is problems in biology and health. People are actually really bad at solving those problems. And I think that because people are really bad at finding these subtle patterns in very large amounts of heterogeneous, multimodal data that just don't fit together into the kind of natural patterns that a baby learns when they're interacting with the world. 

And so you can imagine that a different aspirational goal, not to diminish from Yann's goal, is to just build computers that are capable of addressing really hard societal problems, whether it's in health or in agriculture and the environment, climate change, and so on, that people are just not going to get on their own. And so I think that is kind of two different paths, if you will.

YL: It's absolutely the case that we understand how birds fly because we built airplanes. It's absolutely the case that we understand much better about, for example, human perception because of neural nets. There are literally thousands of papers in neuroscience that use convolutional nets as a model for the visual cortex.

DK: We understand it better, but it's not the same. 

YL: It's as good as it gets. It's much better than whatever we were using before, which was basically template matching. However, people are trying to do the equivalent using LLMs to explain how the human brain understands stories, for example. That doesn't work. So convnets are a good model of visual perception. LLMs are not a good model of text understanding.

DK: Okay, so we agree. 

NT: I want to stay on AGI because I think it's leading to some very interesting places. Andrew, what do you think of it as a goal?

AN: Just to share, I think AGI is a great goal, but the AI community is big enough. We shouldn't all have one goal. I think we should have some of us, let's work on AGI. Some of us work on climate change. Some of us work on life sciences, and let's do all of these things and celebrate. And let's just celebrate them all.

AG: I agree. 

NT: Thank you. 

KFL: Well, when I was 18, I had AGI as my dream, to figure out how human cognition works. And while I can't disagree with Yann's analogy about how human cognition helped the state of the art, I have to side more with Daphne in that today's LLM is a really different kind of animal that you teach differently. It's really good at some things. It's really bad at some other things. And I think it's admirable to try to fix some of the problems with LLMs, but its upside is totally unrealized right now. 

So there's so much you can do that will make LLMs create more value for the world without really necessarily making it emulate or learn or beat humans. It's like when people invented the automobile, no one ever wanted to teach it to walk, right? But cars became trucks, became all kinds of other things and engines started making airplanes. And so I think we're at that phase when Henry Ford, I guess you're kind of the Henry Ford who invented the technology. And we're now trying to build on top and say there are so many more different kinds of things you can build once you have the engine. And whether that engine is or isn't like the human brain, I'm less interested in that. 

I think realizing the value ought to cause us to want to find ways to make that new LLM engine create more value. So that's what I'm focused on. But I think other people can love AGI and try to fix problems. I think scientifically, that's interesting. But I think the fastest path to value creation is to take this great engine we have and build automobiles, trucks and airplanes from it.

NT: So Aidan, when you were talking about AGI, about a minute after that you said something along the lines of “if we're gonna make it.” And I wanted to pause and ask you what “make it” meant, but you had such an interesting answer going that I didn't. But now I wanna ask you: when you said “make it,” did you mean AGI or did you mean continuously improving AI that serves humanity in some broader way? Because there are a lot of people in the field for whom it's almost like AGI is the end zone. How do you view this?

AG: No, it's continuous, right? 

NT: I don't think it's discrete. 

AG: I think you can be superhuman in a specific domain, right? And so the models might be superhuman in particular aspects. We have models that play Go extremely well, beyond any human player. So I think it's a continuum. Maybe there is some point where models are better than humans at everything, but I think it's probably an ill-specified definition, and this is much more of a continuous change. 

On the question of AGI and whether it's the right goal, I mean, for me personally, I just wanna create value for the world. And so I don't care if we stop short of AGI, if we fall short, there's a lot to be done with that technology, there's a ton of value to be created, and I'll be happy with it. But of course I hope that we achieve the maximum value possible, and I think that does necessitate having a tool that we can lean on, which is as powerful as possible. I think that's what all of us here are trying to build.

AN: One funny thing about the definition of AGI is there's been the biological path to intelligence, say human brains, and then there's been the digital path to intelligence, AI models we have. And they have different strengths and weaknesses. And the funny thing about the definition of AGI is we're forcing the digital path to intelligence to be benchmarked against a biological path. And it's worth doing, but that makes it really tough to actually hit AGI, even as clearly we're building some incredibly valuable digital intelligences.

NT: I love that point. 

YL: I think a big difference here is on the progression of science and technology, there is an off ramp where you can sort of milk an advance to make practical solutions. And a lot of people in this room are either investors or technologists or CEOs of companies who need to do this in the short term because that's their purpose. Or it could be long term.

DK: I wouldn't call it an off ramp when you're trying to get machines to help you deconvolute human biology and make better medicine. I wouldn't call that an off ramp.

YL: It's a perfectly good goal. 

DK: Okay. 

AN: But just not as good a goal as… haha. 

YL: No, it's a different goal. No, I think it's perfectly fine. It's a different goal.

YL: And all of us were inspired by the science of making progress towards AI. To some extent, you two have abandoned it. I haven't. I'm still 19 years old.

AN: I have not abandoned it. I just haven't.

YL: Good for you. Okay, great.

NT: All right, I wanna talk about one of the most controversial ideas in AI, which is open source and the role of open source in future AI. So as we've seen by following the conversation in the United States, within the United States government, there is a lot of fear of open source AI. And there is a fear that if there's open source AI that people can modify, that they can use, that can be shared, then lots of bad guys will be able to do bad things with AI. And not only that, horrible things might happen. Like you might end up with a Chinese-based large language model that was based partly on the infrastructure of an American-based model built by a French guy, took the infrastructure, built a different model, different data. You might end up with something terrible like that that would shoot up the Hugging Face rankings. 

So a lot of fear about open source, but in this panel, I suspect we're gonna lean somewhat towards open source. You are one of the chief advocates of open source for everything in AI. So why don't we start with you, present your view, and then let's push back and see where we go.

YL: So it's been the case in the history of technology that when something is viewed as a basic common platform, it ends up being open source, right? That's true for the software infrastructure of the internet. And the reason is that it's safer, it progresses faster. The reason why we've seen such fast progress in AI over the last decade or so is because people practice open research. It's because as soon as you get a result, you put it on arXiv, and you open source your code, and everybody can reuse it, and we have common software platforms like PyTorch that everybody uses. And so it makes the entire field accelerate because you have a lot of people working on it, you know, that are self-selected. 

That's why, that's the magic of open research and open source. And so the way to keep that progress fast is to keep the research open. So if you legislate open source out of existence because of fears, you slow down progress, and the bad guys are gonna get access to it anyway, and they're just gonna catch up with the rest of the world.

NT: So you're in favor of maximalist open source all the way. So even when you have Llama 200, and you know, people are building, you know, I don't know, infinitely conversant sex bots, and people no longer relate to each other because the Llama 2000, Llama 200 bots are just so amazing. Continue to be open, let people do whatever they want? There's no point where you'll say, eh?

YL: Okay, so there is a big question there, which is, what do you consider acceptable behavior for the intelligence system? And our answers would be different in this one. Particularly if you are from, you know, countries with different languages, cultures, values, or centers of interest. So it's not like we're going to have the possibility of having AI systems that cater to our interests if they are produced by a handful of companies on the West Coast of the US. The only way we can have AI systems, you know, all of our interaction with the digital world in the somewhat not too distant future is gonna be mediated by AI systems. They're gonna live in intelligent glasses, or smartphones, or whatever device, and they are going to be like human assistants, assisting us in our daily lives. All of our digital diet will be mediated by AI systems. You do not want this to be under the control of a small number of private companies.

So the basic foundation models will have to be open source, and then on top of this, you will have systems that are built for particular cultures, languages, values, centers of interest, and whatever, by groups that can fine tune those systems for that purpose. And for the same reason we need a diverse press. I'm talking to a journalist, a former journalist at least.

NT: I just wanted an alive press, but keep going. 

YL: You know, we need a diverse press. We'll need a diverse population of AI systems to cater to all the interests. And so, you know, this is not gonna be legislated, it should not be legislated by any.

NT: So does everybody here agree with the chief AI researcher at Facebook on the desire to maximally devolve power from the large West Coast tech companies? 

AG: I was gonna say I'm in favor of that. I like that, but there are points I want to push back on.

NT: Identify the point, well, go ahead. Aidan, go ahead, and then Andrew.

AG: Yeah, I was just gonna say that I don't think it's a binary. I don't think you have to choose between open source and closed source, and I completely agree you should not regulate it and create a policy that forbids open source, I think. I mean, I don't know a lot of people who are proposing that, but anyone who is, they are wrong.

I do think there's a middle ground here, and I think that there are some categories of models that can be open source and should be open source and fuel the creators, right? Like the hackers who wanna build new experiences, experiment, try to subvert the large West Coast tech companies. They should absolutely have access to technology to do that. At the same time, there are organizations who are trying to build businesses around this. We wanna create a competitive advantage. We should have the right to keep our data and keep our models closed, and that should be okay. There should be this ecosystem of a hybrid as opposed to one. And I know middle answers are boring, and I should probably take the extreme to make this a little bit more interesting, but I think that's the reality of that.

NT: But there are lots. Not people who are directly trying to legislate open source out of existence, but the Biden administration passed an executive order, which requires massive legal teams to submit red team results to the United States government. It's not as though some open source AI company can comply with that. I mean, there are legislative proposals that would have the effect of consolidating power in a relatively small number of companies.

KFL: That's true. 

AG: Yeah, I believe that's true. 

NT: So does anybody wanna push back or augment Yann's arguments about open source? 

AN: Yeah, so I wanna agree with Yann on something and disagree on one thing, which is so, I think that, to me, it's a question of, do we think the world is better off with more intelligence? We use, primarily, human intelligence, now we have artificial intelligence. I think that intelligence, net net, tends to make societies wealthier, make people better off, and with open source intelligence, we can make it available to more people, acknowledging that, yes, intelligence can, in some cases, be used for nefarious purposes, but on average, I think it actually makes us all much better off. 

There's one thing that I may have a different perspective than Yann on, which is that infrastructure naturally gravitates to open source. I wish it were true, but today we all build on, you know, NVIDIA and AMD and Intel, we all build on the clouds, but the semiconductor designs are very closed, not at all open. And to me, I think it's actually up to us to work with governments on positive regulations. It's actually up to us collectively how open or closed a society we want. And there are definitely powerful forces at this moment that, to some extent, have succeeded in pushing forward very tricky proposals to put in place very burdensome requirements on open source. I think we're actually in a very dangerous moment where there's a significantly greater risk of over-regulation than under-regulation. And frankly, when I'm in Washington, DC, interacting with people, I can almost feel the tens of hundreds of millions of dollars of lobbyists' attempts to regulate AI, with a significant part of the agenda being to shut out open source that some companies would maybe rather not have to compete with.

NT: Kai-Fu, are you maximally open source? 

KFL: I'm not maximally, but I do agree with almost everything that was said. I think I would amplify one point, which is that one of the issues with one or a few companies having the most power and dominating the model is that it creates tremendous inequality, and not just with people who are less wealthy in less wealthy countries, but also professors, researchers, students, entrepreneurs, hobbyists. If there were no open source, what would they do to learn? Because they might be the next creator, inventor, or developer of applications, and to force them to develop on one or two platforms I think will dramatically limit innovations. So I think that's why we've also followed in the footsteps of Meta.

NT: Is that necessarily true? Last night I heard Sam Altman at the Innovators Dinner, I believe you were there too, make the argument that essentially OpenAI, the best position would be for OpenAI to build sort of the perfect machine that exceeds AGI and then to give it away for free in order to reduce income inequality. So you don't actually need open source to reduce income inequality.

KFL: Well, that's the kind of altruism that I think everyone will naturally doubt, right? So I think it's much better to make available technologies to the most people possible, and I'm absolutely against regulation. And one point I would say is I'm not sure I would agree with Yann that the foundational model has to be totally open source. I think for some companies they might want to create a competitive advantage at the foundation model level. But I think both are viable possibilities, and we'll see how it goes.

NT: Wait, and Daphne. Okay, important interjection and then Daphne next.

YL: Interjection to what you just said. 

NT: I didn't say anything. 

YL: Well, you said what Sam Altman said. OpenAI does not have a monopoly on good ideas. They are not going to get to AGI, whatever they mean by that, by themselves. In fact, they're using PyTorch. They're using transformers. I mean, they're using stuff that was published by many of us, infrastructure, and they just clammed up recently because of competitive advantage, and that's the only way to generate revenue. But they are profiting from that open research landscape that I was talking about earlier.

KFL: And the two largest companies in this space are limiting publication more than I've ever seen in any field that's growing like this. 

YL: Well, not Meta. 

KFL: No, I said two largest. 

YL: Well, Meta is one of the largest. 

KFL: Okay, all right, you know which two I'm talking about. 

NT: Two of the three. Daphne? 

DK: But continuing on this theme, I mean, the models that are constructed by these companies might be great for certain applications but might not be the right foundation for other applications. And so, if you're restricting the set of available foundations to those of one or two or three companies that are designing towards a particular use case, it might not be the right foundation for the next innovator to build their new application that uses their new creative idea or a new data modality that wasn't used in the original conception of the models that were developed by these companies. I think you're stifling innovation if you do not create a strong foundation of open source models on which other people can build. Now, I agree with Aidan and with Kai-Fu that not every model that anyone generates has to be open source. I don't think Yann was claiming that.

YL: No, I agree, I agree. 

DK: I think that people can certainly have a competitive advantage by a new model architecture, by a new type of data modality, but there needs to be a strong foundation of open source models for the community to build on. 

NT: All right, so we have about a minute and a half left, so I wanna do a quick round where we go through one of the things Kai-Fu talked about is that open source is essential to make the world more equal. The last revolution in technology seemed, more or less, economists would argue, to make the world less equal. What has to happen in AI to make the world more equal? Just quickly, one thing you wanna see happen.

AG: Can we come back to me? 

NT: Yes, Yann, we'll go this way. 

YL: I think the question of equality, of course, open source dissemination is a big factor for equality. Now, for economic equality, it's a fiscal policy question. It's not a technological question.

NT: Andrew? 

AN: I think training. When I look at adoption of AI in enterprises, often one of the biggest bottlenecks in many corporations is training. The idea is that there are options in there. You need to upskill the workforce, not just to build AI apps, but also just to use ChatGPT and bots as that drives productivity gains really quickly.

NT: Daphne? 

DK: Maybe not surprisingly, because Andrew and I co-founded Coursera, I'm going to twist Andrew's answer a little bit and say education. One of the biggest delimiters between people who are able to use this technology and ones who are not is structured thinking. Structured thinking is not something that we teach our kids nearly well enough, and I think that is a skill that we need to start at the elementary level because that will be the big differentiator between people who successfully use this technology and ones who do not.

KFL: Global competition. I think not only is it dangerous to have one company dominate everything, that stifles innovation. More competition will help address that, but also global because we have to be careful that this is the first time a technological platform comes with its own ideology, values, and biases. So we have to let different people, cultures, countries, religions develop their own model and compete equally globally and not let any one country dominate.

AG: Yeah, I think it's access. I think access to the technology, making sure that everyone is able to contribute to it, that it speaks their language, that it knows their culture, is going to be crucial to equality.

NT: All right, amazing panel. Should have sold out in 15 seconds. Thank you guys. You're all great. Thanks.
