Trust and Safety in AI: How do we proactively identify and address emerging threats?

There is a small room in Scotland’s famed Edinburgh Castle that offers a more illuminating view than any that can be witnessed from its epic ramparts. It’s a rectangular room, with high ceilings and long, built-in bench tables that run the length of each wall. On the tables stand thick, wide books, one beside the other, each no further than a few inches from the next. At intervals there are candles, lit out of respect and memory for what lies inside those weighty tomes. Each book holds hundreds of pages, and on the front and back of each page — in excruciatingly small print — are dozens of names listed in alphabetical order. These are the names of the Scottish civilians who perished in World War II air raids. 

It puts the scope of conflict’s cost into a physical context that boggles the mind before leaving it somber and reflective. What stood out most prominently to me amongst the many lessons held within those walls was that each of those people had their lives cut short by a technology that hadn’t even existed just a few short decades before. Flight. Explosives and advanced weaponry were not new in the Second World War, but flight had progressed by leaps and bounds since air raids were first introduced during World War I. Those advances allowed hundreds of thousands of tons of explosives to be dropped, indiscriminately and from thousands of feet up in the sky. It changed the face of global conflict forever, and brought war home to civilians at a previously unseen scale.

Did the inventors of flight foresee that it would be used to bring destructive conflict to the doorstep of civilians around the world for years to come? It’s unlikely. For most innovators, intentions are both largely good and wildly idealistic. Taking to the skies was an act motivated by curiosity, by adventure, by the allure of expanded personal freedom. But regardless of intention, every new wave of technology brings with it a new set of threats. Whether those threats are the result of bad actors or simply unforeseen circumstances, it’s critical that we understand what they are and how we can shore up those vulnerabilities proactively. If we don’t, the consequences can be dire and long-lasting.

Let’s rewind a few thousand years to one of the first technological advances our species ever made — the ability to preserve and store food. The development of these rudimentary technologies allowed people who lived in more productive environments to accumulate wealth, another first in human history. The accumulation of wealth began humanity’s transition away from small, tribal cultures into enormous, hierarchical societies. That tilted the playing field drastically toward people who lived in areas that could support permanent settlements at scale, while putting tribal and nomadic cultures in existential danger. Essentially, our ability to preserve and store food led directly to the first instances of inequity.

Fast forward to the advent of the Internet, a tool that has opened communication, commerce, knowledge and opportunity to billions of people around the globe. It is also a place where toxic, fringe ideas go to find support in its dark corners. It has provided a new attack surface for terrorists, thieves and rogue nations. It has made it easier to steal personal information, to conduct surveillance on average citizens, to mislead enormous groups of people with unfiltered misinformation. Like our ability to preserve and store food, like the power of flight, these threats have the very real power to derail and stunt societal progress for hundreds if not thousands of years into the future. 

With applicable, scalable artificial intelligence finally knocking at the door, we are now facing down a technological shift that has the potential to impact our lives more than every other advancement in human history combined. Understanding the potential threats it poses will be an ongoing process – mistakes and missteps are an inevitability, and the price of creative problem solving. But it also has to be a proactive process, because simply reacting to new threats as they emerge is a game of whack-a-mole we will not win. 

We are already beginning to see the first wave of these threats unfold. But collectively, we’re much smarter about technology than we were twenty or even ten years ago. So how do we use those insights and instincts to our advantage, to build environments that are safe, secure and trustworthy? Understanding what these threats look like is the first step in identifying others that will certainly come to light as the stakes get higher. At a high level, let’s walk through each of the primary threats emerging around AI and ML today.

Model Tampering. Model tampering is at the heart of all AI-based security threats today, and it is exactly what it sounds like — tampering with a model, with malicious intent. This can include changing parameters, poisoning data, inferring critical information by observing inputs and outputs, or even planting backdoor triggers that produce the wrong outcome every single time. All of the specific threats discussed below fall under this broader umbrella of model tampering, and we have likely seen only a fraction of the ways in which it can be done.
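
To make the idea concrete, here is a minimal sketch of the simplest form of tampering: directly altering a trained model's parameters. It uses scikit-learn and a synthetic dataset purely for illustration; the data, the model and the attack are hypothetical, not a real incident.

```python
# A toy illustration of parameter tampering (synthetic data, hypothetical attack).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("accuracy before tampering:", model.score(X_test, y_test))

# An attacker with write access to the stored model flips the sign of the
# learned weights -- a tiny, hard-to-spot change that inverts its decisions.
model.coef_ = -model.coef_
model.intercept_ = -model.intercept_
print("accuracy after tampering: ", model.score(X_test, y_test))
```

The point is less about this specific edit and more that a serialized model file is just data: if an attacker can modify it, they can quietly control what it predicts.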

Data Poisoning. Machine learning models depend on data, some more than others. Data poisoning is exactly what it sounds like — polluting that data so that a machine learns the wrong thing or reaches the wrong conclusion. This doesn’t simply mean altering the data. If data is altered with strategic intent — say, around a specific investment strategy for a financial firm — it can be beneficial. When data is manipulated without intent, or with malicious intent, the results can be incredibly detrimental. Data poisoning tends to target a couple of areas in particular: sentiment analysis and recommendation systems. This can look like injecting fake ratings into a system so that a bad product looks good, or flooding a feedback channel with complaints from automated bots.
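
As a toy sketch of the fake-ratings case, imagine a recommender that simply ranks products by average rating; the product names and numbers below are made up for illustration.

```python
# A toy rating-injection example (all products and ratings are invented).
from statistics import mean

genuine_ratings = {
    "good_product": [5, 4, 5, 4, 5],
    "bad_product": [1, 2, 1, 2, 1],
}

def top_product(ratings):
    # Naive recommender: rank products by their average rating.
    return max(ratings, key=lambda name: mean(ratings[name]))

print("before poisoning:", top_product(genuine_ratings))  # good_product

# An attacker floods the system with fake five-star reviews for the bad product.
poisoned = dict(genuine_ratings)
poisoned["bad_product"] = genuine_ratings["bad_product"] + [5] * 50
print("after poisoning: ", top_product(poisoned))  # bad_product
```

Real systems are more sophisticated than a simple average, but the failure mode is the same: if the inputs can be flooded, the conclusion can be bought.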

Model extraction. Model extraction involves repeatedly querying a model and observing its responses in order to reconstruct an approximation of the model itself. In the financial sector, for example, a competitor could extrapolate how your model is built by carefully observing its outputs, diluting the value of your model. It also exposes security vulnerabilities that can then be sold to attackers.
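
Here is a minimal sketch of what that can look like: an attacker who only has query access trains a "surrogate" model on the victim's responses. The victim and surrogate below are hypothetical stand-ins built on synthetic data, not any particular product.

```python
# Toy model-extraction sketch: learn a copy of a model from query access alone.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=3000, n_features=10, random_state=1)
victim = LogisticRegression(max_iter=1000).fit(X, y)  # the model being attacked

# The attacker never sees the training data; they only send queries
# and record the victim's predicted labels.
rng = np.random.default_rng(1)
queries = rng.normal(size=(5000, 10))
stolen_labels = victim.predict(queries)

# A surrogate trained on (query, response) pairs approximates the victim.
surrogate = DecisionTreeClassifier(random_state=1).fit(queries, stolen_labels)
agreement = (surrogate.predict(X) == victim.predict(X)).mean()
print(f"surrogate agrees with the victim on {agreement:.0%} of inputs")
```

Rate-limiting queries and auditing unusual query patterns are among the common ways to make this kind of extraction harder.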

Inference attacks. Similar to model extraction, inference attacks allow bad actors to establish whether a specific record or data set was used to train your model, or to infer sensitive attributes of the individuals in that data by observing the model’s output. An attacker might deduce undisclosed sensitive attributes, like the presence of a disease, about a person in an insurer’s data set.
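
Here is a toy sketch of the membership flavor of this attack: an overfit model is noticeably more confident on records it was trained on, and that confidence gap is all an attacker needs. The data and model below are illustrative assumptions, not drawn from any real system.

```python
# Toy membership-inference sketch: exploit a model's confidence gap between
# records it was trained on and records it has never seen.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=2)
X_member, X_outside, y_member, _ = train_test_split(X, y, test_size=0.5, random_state=2)

# A deliberately overfit model, to exaggerate the effect.
model = RandomForestClassifier(n_estimators=25, random_state=2).fit(X_member, y_member)

def avg_confidence(records):
    # The attacker only needs the model's output probabilities.
    return model.predict_proba(records).max(axis=1).mean()

print("avg confidence on training members:", avg_confidence(X_member))
print("avg confidence on outsiders:       ", avg_confidence(X_outside))
# A clear gap lets an attacker guess whether a given record was in the
# training set, and by extension learn sensitive facts about that person.
```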

Resource exhaustion. Similar to DDoS attacks, resource exhaustion means overworking or overloading a model until it produces the wrong outcome or shuts down altogether. Imagine posting on Instagram about a small gathering at your house, and a thousand people show up instead of the dozen you invited. There’s no way you have the snacks, drinks or seating to entertain that crowd, because your party wasn’t meant for a thousand people. An attack like this can take your model completely offline for extended periods of time, which can have significant financial consequences for an organization.
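
A minimal sketch of why this matters, and how a simple per-client quota blunts it; the capacity numbers and client names are illustrative assumptions, not a real serving setup.

```python
# Toy resource-exhaustion sketch: one abusive client floods a serving budget,
# and a per-client quota keeps legitimate traffic alive.
from collections import defaultdict

CAPACITY_PER_SECOND = 100        # what the model server can handle in a second
MAX_PER_CLIENT_PER_SECOND = 10   # quota applied to each client

requests_by_client = defaultdict(int)
served = 0
outcomes = defaultdict(int)

def handle_request(client_id):
    global served
    # Refuse clients that exceed their quota before touching the model at all.
    if requests_by_client[client_id] >= MAX_PER_CLIENT_PER_SECOND:
        return "throttled"
    if served >= CAPACITY_PER_SECOND:
        return "dropped"  # server saturated: legitimate users suffer too
    requests_by_client[client_id] += 1
    served += 1
    return "served"

# One attacker sends 1,000 requests in a second; 50 normal users send one each.
traffic = ["attacker"] * 1000 + [f"user_{i}" for i in range(50)]
for client in traffic:
    outcomes[handle_request(client)] += 1

print(dict(outcomes))  # with the quota in place, every normal user is served
```

Without the per-client quota, the attacker alone would consume the entire capacity and the fifty legitimate requests would all be dropped.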

These are just a small sample of the vulnerabilities that exist in AI systems, and they can have real adverse impact if not properly addressed. If you are thinking about these issues and want to talk more, I would love to hear from you.


