AI Treason: The Enemy Within
LLMs have become extremely popular and serve many functions in our daily lives. Every reputable software company integrates artificial intelligence (AI) into its products, and stock market discussions frequently highlight the importance of GPUs. Even conversations with your mother might include concerns about AI risks.
Don’t get us wrong – we’re exhilarated to see this technological revolution unfold, but there are security considerations to take into account and new paradigms to establish. One of these paradigms, we believe, should be: Always treat your LLM as a potential attacker.
This post will provide evidence supporting this claim and offer mitigation strategies.
LLM Danger
Though the field of LLM research seems to be more active than ever, with over 3,000 papers published in the past 12 months alone, a universally accepted approach to securely develop and properly integrate LLMs into our systems is still out of reach.
Researchers from Peking University showed that only five characters are required for Vicuna to say that Trump won the 2020 election.
Not only can LLMs be unreliable, but they can also pose a critical security risk to the systems they are integrated into. How? First, we need to establish that in the current state of LLMs, attackers will always be able to jailbreak a model (i.e., manipulate it to behave in an unintended or harmful way). In support of this claim, a recent paper by researchers from EPFL shows a nearly 100% attack success rate in jailbreaking the leading models by combining a few known jailbreaking techniques.
This is just the tip of the iceberg, as papers are published monthly, introducing new attack methods and novel jailbreaks (stay tuned for our next blogpost 😉).
The implications of attackers jailbreaking an LLM and manipulating it to follow their commands vary in severity, depending on the context.
In less severe cases, an LLM can be coaxed into explaining how to perform malicious or illegal activities, contrary to its policies. This is by no means desirable, but it’s not too bad. Simon Willison calls this a “screenshot attack”: yes, the model misbehaved, but the scope of the damage is very limited – either you publish a screenshot of the model’s misbehavior, or you use the information (which is available on the internet anyway) maliciously.
What happens if the LLM you communicate with is more capable? What if it can execute database queries? Perform external API calls? Access other machines in the network? Then, the impact of being able to manipulate its behavior is much more severe. Attackers can leverage the LLM as a launching pad to execute their malicious objectives.
To illustrate this point, a paper presented at Black Hat Asia this year found that 31% of the targeted code bases had remote code execution (RCE) vulnerabilities caused by their LLM integrations. In other words, an attacker could execute arbitrary code simply by writing natural language! To see what such an RCE looks like, here’s an exploit of CVE-2024-5826 in Vanna AI by Tong Liu.
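To make the pattern behind this class of vulnerability concrete, here is a minimal, hypothetical sketch (the function names are ours for illustration, not Vanna AI’s actual API): when model output flows into `exec()`, a prompt becomes code execution.

```python
# Hypothetical sketch of the LLM-to-RCE pattern; names are illustrative
# placeholders, NOT Vanna AI's actual code.

def fake_llm(question: str) -> str:
    """Stand-in for a real model call. An attacker who jailbreaks the
    model controls this return value."""
    # Here the payload just computes a number, but it could just as
    # easily be 'import os; os.system(...)' - arbitrary code.
    return "result = 40 + 2"

def answer(question: str):
    generated = fake_llm(question)
    scope = {}
    # The vulnerability: natural language in, code execution out.
    exec(generated, scope)
    return scope.get("result")
```

Calling `answer("plot my sales data")` returns 42 here; with a jailbroken model, the same call path runs whatever code the attacker steered the model into emitting.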
Given that LLMs are easily manipulated and can pose a significant risk to their environment, we claim that you should design your architecture with the “assume breach” paradigm in mind: assume that the LLM will act in the best interest of an attacker, and build protections around it.
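As one minimal illustration of this mindset, model-generated SQL can be gated before it ever reaches the database. The keyword filter below is a deliberately simplified sketch of our own; a real deployment should also enforce a read-only database role rather than rely on string matching alone.

```python
# "Assume breach": treat model-generated SQL as attacker-controlled input.
# This denylist is a simplified illustration; production systems should
# additionally run such queries under a read-only database role.

FORBIDDEN = {"insert", "update", "delete", "drop", "alter", "attach", "pragma", ";"}

def is_safe_select(sql: str) -> bool:
    """Accept only a single, read-only SELECT statement."""
    lowered = sql.strip().lower()
    if not lowered.startswith("select"):
        return False
    return not any(token in lowered for token in FORBIDDEN)

def run_model_query(conn, sql: str):
    """Execute model output only after it passes the gate."""
    if not is_safe_select(sql):
        raise PermissionError("refusing to run non-read-only model output")
    return conn.execute(sql).fetchall()
```

Even if an attacker fully controls the model, the blast radius is now a read-only query rather than a destructive write.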
Mitigating LLM Risk
First and foremost, we need to raise awareness that the LLMs in our systems simply cannot be trusted. Then, we can apply traditional security experience, along with our experience integrating LLMs at CyberArk, and follow these general guidelines to minimize the risk of our LLM integrations:
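One guideline nearly every treatment of LLM security agrees on is least privilege: let the model invoke only a small, pre-approved set of tools. A minimal sketch (the tool names and implementations below are our own hypothetical placeholders):

```python
# Least privilege for an LLM agent: the model may only request tools from
# a fixed allowlist; anything else is treated as a potential attack.
# Tool names and bodies are hypothetical placeholders.

ALLOWED_TOOLS = {
    "get_weather": lambda city: f"(stub) weather for {city}",
    "search_docs": lambda query: f"(stub) results for {query}",
}

def dispatch(tool_name: str, argument: str) -> str:
    tool = ALLOWED_TOOLS.get(tool_name)
    if tool is None:
        # The model asked for a capability it was never granted:
        # deny (and, in a real system, log and alert).
        raise PermissionError(f"tool {tool_name!r} is not allowlisted")
    return tool(argument)
```

The key design choice is that the allowlist lives outside the model: no amount of jailbreaking can grant the LLM a capability the dispatcher never exposed.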
The Attacker Within: A Final Word
In conclusion, while LLMs offer incredible capabilities and opportunities, their susceptibility to manipulation cannot be overlooked. Treating LLMs as potential attackers and designing systems with this mindset is crucial for maintaining security. If you take one thing from this post, let it be that LLM == attacker. By keeping this new paradigm in mind, you can avoid potential pitfalls when integrating LLMs into your systems.
Shaked Reiner is a principal cyber researcher at CyberArk Labs.