From Horse Carriages to Rockets: Why Data Quality Drives the Speed of Innovation

I was invited to the 9th Quality Conference in Riyadh, Saudi Arabia, to participate in a panel discussion about data quality, and its impact on entrepreneurship, with a particular view from an AI and startup perspective.

In this conversation about the transformative power of high-quality data in AI innovation, I reflect on my journey from the early days of AI in the 1980s to today’s exponential advances. Through personal anecdotes and observations, I share how a single dataset (ImageNet, meticulously curated by Fei-Fei Li) has catalyzed the current AI revolution, demonstrating how high-quality data can spark extraordinary technological leaps.

I discuss what I call the Paradigm of Jolting Technologies, where innovation’s rate of acceleration is increasing. Using analogies from horse-drawn carriages to rockets, I explain why data quality becomes increasingly critical as the speed of innovation grows. Just as a horse carriage can correct course slowly while a rocket requires split-second precision, today’s rapidly evolving AI systems demand increasingly sophisticated data quality controls.

I also address the human element of data quality and innovation, emphasizing that despite our technological advances, success ultimately depends on building and nurturing teams that feel empowered to experiment, fail, and innovate. As we look toward a future where autonomous systems will generate unprecedented amounts of data, I believe we must resist the temptation to standardize everything globally, instead embracing the rich diversity of approaches that different perspectives can bring to AI development.

Watch the video of the supercut of my remarks at the 9th Quality Conference in Riyadh, and read the edited transcript.

Mohammed Alsolami, panel moderator: Can you share how high quality data has catalyzed AI-driven innovation and improved strategic outcomes for startups especially?

David Orban: I would like to tell a couple of anecdotes with respect to this. I started in AI in the 1980s. At that time, we had very ambitious aspirations around what AI could do. There was a company called Hecht-Nielsen Corporation, HNC, that produced a hardware accelerator that, for the first time, used neural networks to reliably recognize handwritten numbers. That was the start of applying neural networks in financial applications. But because the available data was of low quality and covered few fields, the promise of widespread applications of this particular technology and approach couldn’t be delivered.

Now, fast forward 30 years. It is incredibly important to acknowledge that the current revolution in AI, the exciting and powerful applications of deep learning and other types of neural networks that we are seeing, is actually due to the effort of one person, who started by collecting, cleaning, and then providing a new data set for others to use in order to catalyze their efforts. This person was Fei-Fei Li at Stanford University, working with very little money and very scarce resources. For over two years, she collected, cleaned, and made available millions of accurately labeled, high-quality images. They became the basis of the ImageNet competition, where AlexNet’s win in 2012 ignited the enthusiasm that we are seeing today, and where systems went on to perform at human level and then superhuman level.

Because even though we are all very much aligned with the objective results of scientific research and their applications, it is also the case that we are human beings and we need to see that our efforts can produce what is needed. That is why these fundamental steps are so powerful and I think it is important to remember it is high quality data that started the current wave of AI revolution.

Mohammed: In what ways does high-quality data accelerate innovation and make businesses more competitive, especially when leveraging AI?

David: The opportunity for innovation to improve products and services is accelerating thanks to the technologies that we apply to the processes that we use to design and deploy those products and services. The next step is going to be to think about the business models themselves. How can we increase the speed of change in how companies, startups, enterprises, or even countries collaborate efficiently and sustainably using advanced technologies?

The role of AI in this is fundamental, especially now that the latest generations of tools have made unstructured data accessible. Generative AI is wonderful in the sense of being able to analyze and help human leadership make decisions in a faster but well-informed manner.

Now, the question of data quality becomes important in this dynamic scenario because of the speed of innovation that requires a fine-grained, sure-footed set of controls in the feedback mechanism. If you are driving a slow vehicle, let’s say a horse-drawn carriage, just very slowly moving along your path, if the horse gets distracted and wants to eat something on the side of the road, you will have plenty of time to correct and bring the carriage back to the middle of the road. Nothing happens. But if you are inside a rocket taking off, accelerating with all the engines, and you have cycles of sensor feedback that need to be in a millisecond cycle, then decisions need to be made rapidly and reliably. If you are not able to do them, maybe because your sensors are faulty and the data that you acquire is of low quality, then it is a guaranteed disaster.

Today, that is what we are seeing in a lot of scenarios where not only is innovation very fast, but the speed of innovation itself is increasing. We are all familiar with Moore’s Law, which for over 50 years dictated, in a kind of self-fulfilling prophecy sustained by the common effort of thousands of engineers all over the world, that the next smartphone or computer you buy will have more power, approximately doubling every couple of years. Statistically speaking, this has been true for 50 years, an amazing development.

However, recently we have started to take advantage of a new type of innovation where rather than the doubling rate being constant, it is itself shrinking and the rate of innovation is increasing. I call this the Paradigm of Jolting Technologies. If you go and watch almost any presentation by Jensen Huang, the CEO of NVIDIA, he will show charts that on a logarithmic scale show an exponential curve. It is the representation, a graphical representation of a double exponential, of an increasing rate of innovation, of a shortening cycle of doubling power.

If they were merely following Moore’s Law, they would have increased the power of their AI systems in particular by less than a thousandfold over 10 years, which is already amazing. But what happened is that they increased it more than a millionfold, because it is a new type of innovation. This is why the types of planning, the types of control, the types of execution, and the types of feedback that we need must be based on not only quality data, but quality that is itself increasing. Sometimes a quality measure tells us that the data is not very good, but at least it is reliably not good. We have to be able to base product, process, and business model innovation on our ability to close this high-speed feedback loop. And that can only be done with high quality data.
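
As a rough illustration of the arithmetic behind these figures, the short Python sketch below compares constant doubling with a doubling period that itself shrinks. The function names, the two-year starting period, and the 10% shrink per doubling are illustrative assumptions, not figures from the talk or from NVIDIA.

```python
import math


def growth_constant_doubling(years: float, doubling_years: float) -> float:
    """Growth factor when capability doubles every `doubling_years` (Moore's-Law style)."""
    return 2.0 ** (years / doubling_years)


def implied_doubling_years(years: float, growth_factor: float) -> float:
    """Average doubling time needed to reach `growth_factor` within `years`."""
    return years / math.log2(growth_factor)


def growth_shrinking_doubling(years: float, first_doubling_years: float, shrink: float) -> float:
    """Growth factor when each doubling period is `shrink` times the previous one.

    `shrink` = 1.0 reproduces constant doubling; values below 1.0 model an
    accelerating rate of innovation (an assumed, simplified model).
    """
    if not 0.0 < shrink <= 1.0:
        raise ValueError("shrink must be in (0, 1]")
    elapsed, period, doublings = 0.0, first_doubling_years, 0.0
    while elapsed + period <= years:
        elapsed += period
        doublings += 1.0
        period *= shrink
        if period < 1e-9:
            # The periods sum to a finite horizon; growth is unbounded past it.
            raise ValueError("doubling period collapsed before the horizon")
    # Count the unfinished doubling at the end as a fraction.
    doublings += (years - elapsed) / period
    return 2.0 ** doublings


if __name__ == "__main__":
    print(growth_constant_doubling(10, 2))           # 32.0
    print(implied_doubling_years(10, 1_000_000))     # average doubling time in years
    print(growth_shrinking_doubling(10, 2, 0.9))     # assumed 10% shorter period each time
```

With a constant two-year doubling, ten years yields a 32-fold gain, while a millionfold gain over the same decade implies an average doubling time of roughly six months. Any shrink factor below 1.0 pushes the result above the constant-doubling baseline, which is the qualitative point of the comparison.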

Mohammed: On those terms, we can see in Elon Musk’s company SpaceX how the power of AI is very useful for every single thing and you mentioned the change in milliseconds. Would you elaborate more on that?

David: I’m only an external observer of what various companies at the forefront do, like Tesla and SpaceX and others like xAI, which was recently in the news for its ability to deploy a cluster of NVIDIA servers for its data center in record time. It is definitely the case that in each of these operations, there’s a lot of data being deployed. Acquiring and using data at high reliability, at high speed, is a key component of their success.

Mohammed: How can high quality data be leveraged to predict market trends and sustain business growth over time?

David: I would like to offer two or three different readings of this, following on the previous remarks. It is always useful to start by asking ourselves what we are measuring, why, and how. As an example, there could be a smart city wanting to understand pedestrian traffic. A very simple way to address that could be putting cameras at the intersections and using image recognition to say, these are the people passing through the intersection. It would immediately raise privacy concerns: what is happening with the data, who is handling the data, who has access to the data. But instead of going with traditional cameras, today we have technology that can use the interference of Wi-Fi signals as they bounce off people who are walking around, and that is inherently incapable of identifying personal features.

Just by using a different technology, a lot of issues are eliminated, everything is streamlined, and the solution may become even more scalable and faster to deploy in a more sustainable, trust-leveraging manner, as long as the information is also well shared, not only internally but also with every stakeholder: the regulators, the public, and so on.

Then the question also arises, if technology is such a defining factor, what are the sources of data? What are potentially new sources of data that can be used? For a long time now, almost 20 years, we have become accustomed to the fact that we need humans to efficiently deploy sensors around the world, because these sensors are our smartphones. They are the best sources of data with dozens of different types of sources that we can leverage – their location, the way they are used, the applications, such a rich source of data. However, we are only 8 billion people on the planet. We cannot sell more than 8 billion smartphones on the planet.

One of the unexpected consequences of autonomous vehicles is going to be that at the beginning we will think, oh, of course, cars without a driver. It is exactly like horseless carriages. At the beginning, they looked exactly like the ones with a horse, just without a horse. And our self-driving cars will look exactly like cars. But very soon we will realize, oh, there are 100 pounds of human, or 200 in my case, that are not needed in the car. Maybe the shape of the self-driving vehicle can be very different. And we will have an explosion of form factors from very big to very small. And maybe in 10 years’ time, a conference like this will be full of self-driving things, autonomous things of all kinds, of all form factors, supporting completely novel use cases of data.

The last remark that I want to make as a consequence of this, not only of why we measure and how and what we measure and the unexpected sources of data, is that this imagination drives innovation. The reason why we have this enormous interest in robotics research, in particular humanoid robots, is the unquenchable thirst of AI applications for new kinds of data. Yes, the traditional sources of human-generated data are relatively limited. So we need to generate orders of magnitude more data, which we are in the process of doing thanks to autonomous platforms that interact and interface with the physical world and are able to generate an unbounded set of valuable data that we will then leverage and use in our applications.

This is what I would like to offer, that too many people see the economy as a zero-sum game and competition as a win-lose proposition. But the way I see it is that technology, the economy, and the way human societies work together is a positive-sum game. The more technology, the more collaboration, the more all of us together can advance.

Mohammed: Innovation always comes after regulations. And entrepreneurs are driven by innovations. So there is another side of slowing the innovations when we add more standards, sometimes regulations. Would you elaborate more on that?

David: The way I would put it is a question rather than an answer. I think it is very important for standards bodies and for regulators to ask themselves, when is the right time to standardize and to regulate? Too late and maybe the playing field is monopolized by a single player or a small set of players rather than giving rise to healthy competition among a multiplicity of players. Too late and maybe some unintended consequences play out in the market in other ways that a regulator’s responsibility is to prevent.

However, too soon, and it could very well be that the regulation and the standardization that has been put in place is only looking at the early iterations of that particular technology with new and novel ways of using it and deploying it that the regulator cannot take into consideration yet. So it is a delicate balance and a very important responsibility. Without pretending to be able to offer a solution, I think that a smart approach is to incorporate a sunset clause in every possible regulation and standard, if possible, so that when a certain limit in time is reached, it must be re-evaluated with an equal effort as it went into its design initially in order to establish if it is still applicable or it must be redesigned from scratch. And if that effort is not invested, then the regulation expires and it is back to radical innovation again or the standards body or the regulator can assess a new update with the various stakeholders collaborating knowing that the technology that evolved in the meantime can thrive with the help of the new standard.

Mohammed: What are the data quality issues you have encountered? How did you address them to drive innovation?

David: I love your question because you mentioned exactly the key component, which is the human component. Building, motivating, retaining, growing teams is a beautiful, complex challenge. It is fascinating to see how a novel generation of tools is starting to help with that as well. Better and better ways, not only to recruit and to interview, but also starting to quantify things that were intangible before, like the evolution and the support of a positive company culture.

How do you measure the feeling of your team, the buzz? How are the meetings? Are they productive? What is the emotional fingerprint of the interactions of people? All of this is becoming possible to capture, and it lets us envision a new kind of enterprise where ever more sophisticated automation and the forthcoming availability of AI agents on one side are complemented by human operators, powerful members of your team who are empowered and feel enfranchised to speak up, to experiment, to be proud of the tests that they execute, knowing that a lot of them will fail and knowing that the company culture allows it and embraces this kind of iterative discovery and innovation. So it is about designing and deploying tools that are like the ghost car for a super-performing organization, so that you know how to measure yourself against the best possible performance as you sit in the driver’s seat and try to excel in what you do. This is a novel opportunity that innovative organizations will embrace to the fullest.

Mohammed: Will high quality data become the main competitive advantage for businesses in the next decade?

David: Yes, absolutely. Without it, people won’t be able to start. They will look back and say, how could people innovate? How could people build an enterprise without what we have available today? And that will be the natural way.

Mohammed: Should data quality be regulated by a strict global standard?

David: If you want a single word, then the answer is no. But if I am allowed to give you a sentence, let’s take, for example, the issue of AI models that are biased. Do we really believe that a single AI model can work for the entire world? Or are we ready to acknowledge that multiple points of view require different approaches and, as a consequence, different data sets that inform and allow the training of different kinds of AI? I think that is actually a richer and more resilient approach than believing that a monoculture should dominate instead.
