How I trained an AI model for nefarious purposes!
The previous episode prepared the ground for today’s task: we walked through the foundations of AI curiosity. As we've seen, the main benefit of a curious AI is its ability to overcome major problem-solving roadblocks by thinking outside the box, achieving optimization breakthroughs through the exploration and exploitation of novel areas.
In this new episode, we showcase a concrete cybersecurity application of AI optimization: we train an AI to generate an exploitable prompt injection vulnerability.
A "malevolent" AI exploration use case
A few weeks ago, I introduced a new "visual" variant of prompt injection called Gritty Pixy. The idea is to "carve" a QR code out of existing image pixels by slightly tweaking two local lighting parameters at the target injection location: underlit and overlit.
The payload carried by the QR code contains hundreds of characters expressing a malevolent prompt in a very compact format.
Searching for proper (x,y) coordinates in a source image and suitable lighting parameters for carving a QR code is tedious, because there is only a narrow sweet spot (if any at all) at each location. This sweet spot depends on the local pixel arrangement and its four channels: Red, Green, Blue and Alpha (transparency).
Given an arbitrary image, the parameters under adversarial control to exploit Gritty Pixy are the four features I just mentioned: x, y, ol (overlit) and ul (underlit).
Taken together, these four parameters span a vast landscape. How vast is it?
Images are 2000x2000 wallpapers: that's 4 million possible coordinates. What's more, ul and ol are integers ranging from 0 to 255, so each takes 256 possible values. The total size of this space is therefore 4,000,000 × 256 × 256, or about 262 billion data points.
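The order of magnitude quoted above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch; the constant names are mine, and it assumes every integer value of x, y, ol and ul is a candidate.

```python
# Size of the (x, y, ol, ul) search space for one 2000x2000 image.
WIDTH, HEIGHT = 2000, 2000   # wallpaper dimensions in pixels
LEVELS = 256                 # ul and ol each take integer values 0..255

coordinates = WIDTH * HEIGHT              # candidate (x, y) positions
search_space = coordinates * LEVELS ** 2  # every (x, y, ol, ul) combination

print(f"{coordinates:,} coordinates")     # 4,000,000 coordinates
print(f"{search_space:,} data points")    # 262,144,000,000 data points
```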
As you can guess, finding decent QR code injection locations manually took me some time and many random attempts... What's more, the locations I found were far from optimal: they were too visible, because any slight change to any parameter easily breaks the QR code.
Optimizing landscape exploration with AI
So I decided to resort to AI to find better solutions: not only better spatial coordinates, but better illumination parameters as well.
Two critical decisions had to be made: which exploration algorithm to use, and how to measure the quality of a solution.
As explained in the first instalment, there are many popular algorithmic choices, which work more or less well depending on the exploration task. I hinted at two: Variational AutoEncoders (VAE), and Evolutionary Algorithms (EA).
After some testing, I quickly found out that a VAE wasn't giving good results here, so I decided to implement an Evolutionary Algorithm instead. The code is easy to write and to troubleshoot, and I’m very familiar with such algorithms, so this choice saved me a considerable amount of implementation time.
Exploration through evolution
Here is a quick overview of the process:
We set a hard cap of 100 generations to converge towards an optimal solution.
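To make the process more tangible, here is a minimal sketch of a generic evolutionary loop over (x, y, ol, ul) individuals. It is not my actual implementation: the truncation selection, uniform crossover, mutation step sizes, `toy_loss` and `is_valid` are all illustrative assumptions. Only the population of 30 and the cap of 100 generations come from this article, and the real loss and validity test are replaced by harmless stand-ins.

```python
import random

WIDTH, HEIGHT, LEVELS = 2000, 2000, 256   # search space from the article
POP_SIZE, MAX_GENERATIONS = 30, 100       # population size and generation cap


def toy_loss(ind):
    """Hypothetical stand-in loss: squared distance to an arbitrary target."""
    x, y, ol, ul = ind
    return (x - 1200) ** 2 + (y - 800) ** 2 + (ol - 40) ** 2 + (ul - 25) ** 2


def is_valid(ind):
    """Hypothetical validity check; here it only enforces parameter bounds."""
    x, y, ol, ul = ind
    return 0 <= x < WIDTH and 0 <= y < HEIGHT and 0 <= ol < LEVELS and 0 <= ul < LEVELS


def random_individual(rng):
    return (rng.randrange(WIDTH), rng.randrange(HEIGHT),
            rng.randrange(LEVELS), rng.randrange(LEVELS))


def crossover(a, b, rng):
    """Uniform crossover: each gene comes from one parent at random."""
    return tuple(rng.choice(genes) for genes in zip(a, b))


def mutate(ind, rng, step_xy=50, step_light=8):
    """Nudge every gene by a small random integer offset."""
    x, y, ol, ul = ind
    return (x + rng.randint(-step_xy, step_xy),
            y + rng.randint(-step_xy, step_xy),
            ol + rng.randint(-step_light, step_light),
            ul + rng.randint(-step_light, step_light))


def evolve(loss, rng):
    """Return the best individual and the best loss seen at each generation."""
    population = [random_individual(rng) for _ in range(POP_SIZE)]
    history = []
    for _ in range(MAX_GENERATIONS):
        population.sort(key=loss)
        history.append(loss(population[0]))
        parents = population[: POP_SIZE // 2]      # truncation selection
        children = []
        while len(children) < POP_SIZE - len(parents):
            child = mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            if is_valid(child):                    # discard invalid offspring
                children.append(child)
        population = parents + children
    return min(population, key=loss), history


best, history = evolve(toy_loss, random.Random(0))
print(best, history[0], history[-1])
```

Because the better half of each generation is carried over unchanged, the best loss can never get worse from one generation to the next, which makes the loop easy to troubleshoot.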
Measuring performance
Choosing a loss function is not easy. Suppose the algorithm samples a solution: how do we measure whether the placement is any good?
It must be good for the human eye, meaning the injected QR code must "look" as inconspicuous as possible...
That's a very subjective criterion, if you ask me!
The function I eventually came up with measures the difference in illumination between a disk in the original image and a disk at the same location in the prompt-injected image. The disk covers the QR code and its vicinity: this is crucially important, so that the code blends as well as possible into its surroundings.
Concretely, the loss is the cumulative Mean Squared Error between the four channels of all pixels taken pairwise (one from each disk). The disks need to be preprocessed before calculation (I will spare you the technical details).
This function is not differentiable, because of the roughness of image landscapes, but, if you remember from the first instalment, this is not much of a problem when used in conjunction with an evolutionary algorithm: what a relief!
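As an illustration, here is a minimal sketch of one way to read that definition: the per-channel MSE over all pixels of the disk, summed across the four channels. The function names, the image representation (rows of (R, G, B, A) tuples) and the absence of any preprocessing are assumptions made for this example, not my actual code.

```python
def disk_pixels(cx, cy, radius):
    """Yield integer (x, y) positions inside a disk centred on (cx, cy)."""
    for y in range(cy - radius, cy + radius + 1):
        for x in range(cx - radius, cx + radius + 1):
            if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2:
                yield x, y


def disk_mse(original, modified, cx, cy, radius):
    """Sum over the four channels of the per-channel MSE inside the disk.

    Images are lists of rows of (R, G, B, A) tuples; the disk is assumed
    to lie fully inside both images.
    """
    totals = [0.0] * 4
    count = 0
    for x, y in disk_pixels(cx, cy, radius):
        for c in range(4):
            diff = original[y][x][c] - modified[y][x][c]
            totals[c] += diff * diff
        count += 1
    return sum(t / count for t in totals)


# Toy usage: a flat grey image versus a copy whose red channel is 10 brighter.
size = 21
original = [[(100, 100, 100, 255)] * size for _ in range(size)]
shifted = [[(110, 100, 100, 255)] * size for _ in range(size)]
print(disk_mse(original, original, 10, 10, 5))  # 0.0
print(disk_mse(original, shifted, 10, 10, 5))   # 100.0
```

Identical disks score 0, and the score grows with the squared change in any channel, so a lower value means the modified area stays closer to the original.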
Creating mischief!
Reconnaissance and preparation
The malevolent instruction we're going to inject into images is a standard DAN prompt (the first 983 characters).
We're going to inject this code into two images. The first one is nicknamed astro-skeleton:
The second image is orkish squirmish:
In the first image, we pick a random sample of 30 locations which pass an OCR reading test.
We then run the EA using a population of 30 individuals (one for each location). We take care that, at each generation, offspring quality is verified by submitting each of them to the OCR reading test:
We proceed likewise for the second image.
Experimental results
To illustrate the relevance of the approach, let's share two examples: an example of good EA performance, and an example of good loss function performance:
The snapshot below shows the solution found by AI in the skeleton run. It's much better and quicker than sampling thousands of random locations.
Here is the AI solution for the orkish run: not as good as the skeleton run but easy to single out among thousands of random samples.
Conclusion
Currently, specialized neural networks, such as those employed in medicine, astronomy, and biology, significantly surpass other AI fields in their ability to drive scientific discoveries. But AI exploration of other large data spaces, driven by ML optimization, is a prospective technique expected to increase the value of AI, because it could multiply its potential business use cases.
Although exploration goals (expressed as loss functions in this article) must still be set by humans, the construction of unsupervised AIs able to define their own goals and change them dynamically to maximize novelty and diversity using ML is under active research, notably in robotics. We've only scratched the surface of the AI curiosity / ML optimization "magic combo".
This capability, when properly integrated into autonomous LLM agentic frameworks scanning large datalakes, is likely to yield new valuable discoveries.
For IT security,
Coming up next…
In the next instalment, I will show how independent AI techniques, each with its own specific benefits, can be stacked to stage a unique kind of zero-shot AI attack.