Watermark for LLM-Generated Text

Researchers at Google have developed a watermark for LLM-generated text. The basics are pretty obvious: the LLM chooses between tokens partly based on a cryptographic key, and someone with knowledge of the key can detect those choices. What makes this hard is (1) how much text is required for the watermark to work, and (2) how robust the watermark is to post-generation editing. Google’s version looks pretty good: it’s detectable in text as short as 200 tokens.
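Roughly, the mechanism is a keyed pseudo-random nudge at each sampling step. The sketch below is a generic illustration of that idea (a keyed “green list” bias plus a counting detector), not Google’s specific algorithm; the key, the `green_list` helper, the half-and-half vocabulary split, and the `bias` parameter are all assumptions made for illustration.

```python
import hashlib
import random

# Hypothetical shared secret; in a real deployment only the model operator holds it.
SECRET_KEY = b"example-watermark-key"

def green_list(prev_token: int, vocab_size: int) -> set:
    """Keyed pseudo-random half of the vocabulary, seeded by the previous token."""
    seed = hashlib.sha256(SECRET_KEY + prev_token.to_bytes(4, "big")).digest()
    rng = random.Random(seed)
    ids = list(range(vocab_size))
    rng.shuffle(ids)
    return set(ids[: vocab_size // 2])

def sample_watermarked(probs: dict, prev_token: int, vocab_size: int, bias: float = 2.0) -> int:
    """Sample the next token, up-weighting green-list tokens by `bias`.

    Near-equiprobable candidates absorb the nudge invisibly; a single
    overwhelmingly likely token still wins, so text quality barely changes.
    """
    green = green_list(prev_token, vocab_size)
    weights = {t: p * (bias if t in green else 1.0) for t, p in probs.items()}
    r = random.random() * sum(weights.values())
    acc = 0.0
    for t, w in weights.items():
        acc += w
        if r <= acc:
            return t
    return max(weights, key=weights.get)

def green_fraction(tokens: list, vocab_size: int) -> float:
    """Detector: fraction of tokens in the keyed green list (~0.5 for unwatermarked text)."""
    hits = sum(1 for prev, cur in zip(tokens, tokens[1:])
               if cur in green_list(prev, vocab_size))
    return hits / max(1, len(tokens) - 1)
```

Anyone without the key sees ordinary-looking text; anyone with it can check whether the green fraction sits significantly above one half, which is why a couple of hundred tokens is enough for a statistical call.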

Posted on October 25, 2024 at 9:56 AM

Comments

Clive Robinson October 25, 2024 11:12 AM

Any real difference between it and the Digital Watermarking that failed in the 1990s?

By the sound of it, it is little different from the variation of “Low Probability of Intercept” (LPI) systems they tried and failed to get working in the 1990s for Digital Watermarking of picture, sound, and audio files.

The “secret sauce” this time is the use of encryption to provide “pseudo-randomness” for token selection.

Think, if you will, of “words from a dictionary” as being the same as “letters from an alphabet”. The word or letter is a token from a limited subset of the dictionary or alphabet at each step.

The trick is to have, say, two or more words of equal probability at each step.

The problem is that the words are very very far from independent of each other at each step.

That is, each word is in effect selected by the preceding words along so many different spectrums/vectors that it could easily boil down to only a single word after as few as five preceding words.

Spotting such word sequences and the random weighting depends on how well your detector predicts word sequences.

In a way it’s a form of frequency analysis, the earliest known crypto attack.
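(As a toy illustration of that counting: assuming the hypothetical green-fraction detector sketched under the post, the “frequency analysis” reduces to a one-sided z-test against the 50% you would expect from unwatermarked text.)

```python
from math import sqrt

def z_score(green_hits: int, n: int, p0: float = 0.5) -> float:
    """How many standard deviations the observed green count sits above chance."""
    return (green_hits - p0 * n) / sqrt(n * p0 * (1 - p0))

# e.g. 130 green tokens out of 200 gives z ≈ 4.2, well past a typical detection threshold
print(z_score(130, 200))
```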

Whilst detecting the presence of the “syndrome” or “distinguisher” will not be that hard in a simple system, the question is whether it can be “fully removed” without leaving a trace in some spectrum.

My gut feeling on this, based on previous experience of designing matched-filter systems to remove non-random artifacts (man-made as opposed to natural noise), is that you will get an exponential cost. That is, removing the first 50% will cost the same as the next 25%, and the same again for the next 12.5%, or worse. At each step there will be a residual trace that can be shown if you know the “secret key” that defines the pseudorandom signal.

BCS October 25, 2024 12:08 PM

I wonder how well the same encoding could be used for steganography. Could you encode LLM version strings? Or maybe do an English-to-English “translation” of human-written text that adds extra metadata?
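(One toy way that could look, reusing the hypothetical green_list() from the sketch under the post: each payload bit steers the sampler toward the keyed “green” or “red” half of the vocabulary at that position, so a short version string could ride along in the token choices. Purely illustrative, not anything from the paper.)

```python
def embed_bit(probs: dict, prev_token: int, vocab_size: int, bit: int) -> int:
    """Toy payload embedding: bit 1 prefers the keyed green half, bit 0 the red half.

    Reuses the hypothetical green_list() from the watermarking sketch above.
    """
    green = green_list(prev_token, vocab_size)
    prefer = green if bit else set(range(vocab_size)) - green
    candidates = {t: p for t, p in probs.items() if t in prefer}
    pool = candidates or probs      # fall back if no candidate is in the preferred half
    return max(pool, key=pool.get)  # greedy pick keeps the example short
```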

ratwithahat October 25, 2024 1:27 PM


With text containing around 200 tokens, the authors showed that they could still detect the watermark, even when a second LLM was used to paraphrase the text.

Sounds fantastic, but I’m wondering why they didn’t test real people paraphrasing it, which seems an obvious test and a huge oversight.

Also wondering how many AI makers would implement this. It could become a regulatory requirement for big/popular LLMs, I suppose. Of course, this wouldn’t be very effective against homemade AI.

Morley October 25, 2024 5:23 PM

It might be important for short text, like from social media bots. That seems harder to solve. Maybe something involving fingerprinting.

Daniel Popescu October 26, 2024 5:48 AM

Apart from what Clive said (which I understood about 30% of 😁), I suppose the copyright industry would gain a lot, along with the academic integrity world and so on, if this were implemented on a larger scale.

Clive Robinson October 26, 2024 7:24 AM

@ Morley,

With regards,

“Maybe something involving fingerprinting.”

What you are trying to do is actually quite hard: adding “traceable forensics”, which belong to the tangible physical world, to the intangible information world.

Like adding oil to water, unless you do it right they are very easy to separate, to the point where any measurement is meaningless within the noise.

This is because information has no actual physical component, that is, no matter or energy (and thus no forces, etc.) to be measured. The physical component comes from the matter/energy that the information is impressed onto or modulates, and from “the coding method used”.

So “fingerprinting information” has to be done information-to-information, and both have to have certain properties for it to work.

The first essential property is “redundancy” within the base information (not its coding), and that is not always consistently available[1]. In fact, all too often there is not sufficient “redundancy” but “complexity”, and that can be distinguished in a way that facilitates its removal or negation.

The second essential property is that there has to be some form of “reliable distinguisher” in the fingerprint, otherwise its presence is not going to be found on examination, or is going to be open to challenge.

There are other essential features but the problems of these two alone are probably enough to stop any “general fingerprinting” system in current Generative AI output.

The fact is we have a clear indicator this is likely to be so. Students have been accused of using Generative AI to write coursework, and on testing, the systems that “find traces of AI” all have significant failings; challenges through legal processes have been started.

Educational establishments tend to be good targets for legal action, because they tend to have a lot of fixed assets like buildings, land, and patents, but little in the way of available assets to pay for decent legal representation.

It’s likely that the “AI spotting AI” and similar systems we currently see will be sufficiently discredited in the not too distant future that, like the 1990s DRM watermarking, they will quickly become “forgotten” like many other bad ideas… and thus lurk, just waiting to be “re-invented” in a couple of decades or so. Such is the history of crime, charlatans, and their con and snake-oil systems.

[1] It’s seen as “fairly easy” to find redundant information in a picture of a forest, but how about the oldest “trade mark” of a red triangle on a pale-coloured background? You only have two colours, and three edges that have a fixed ratio to each other, to play with. The next obvious thing is scale, but you don’t have a reference to work against. So the next thought is to make the edges slightly fuzzy by adding noise or dither and hiding your fingerprint in there… only it’s way too easy to remove. Have a look through the 1990s papers on “Digital Rights Management” (DRM) additive watermarking schemes. They were all too fragile and not universal enough, and thus failed. In part this was because the human senses had so much error correction that simple blanketing or distorting techniques could be used to bury any DRM watermarks to uselessness, using just random noise and distortion that had no relationship to the watermarking coding. You will find this still holds true for any current type of Generative AI output. So your “fingerprint” gets smudged and distorted beyond the point where it is usable.

Jelo 117 October 27, 2024 9:36 PM

This seems in principle like the alterations that cause images to be machine-recognised as things completely different from what the eye sees. These alterations were intended to defeat machine recognition, but could just as well be considered a kind of watermarking.

In the image case, the alterations are essentially never features of a “natural” image, so their use as watermarking is reliable.

Text characteristics are more apparent, and it seems harder to claim that text which appears natural to us includes characteristics that would be unlikely ever to occur naturally, so their use as watermarking is less reliable.

probable October 31, 2024 9:41 AM

@mark

This is pointless unless it’s mandated by law and all the chatbots are coded to add the watermark.

Also when you ask ChatGPT to write a text, you do not need to accept the first one. You can ask for other versions, or a version in a particular style (e.g. in the style of Scott Westerfeld). It’s doubtful that all variants would have the tokens needed for the watermark.
