
OpenAI’s Attempts to Watermark AI Text Hit Limits

by Ana Lopez

Did a human write that, or ChatGPT? It can be hard to say – maybe too hard, thinks its creator, OpenAI, which is why it’s working on a way to “watermark” AI-generated content.

In a lecture at the University of Texas at Austin, computer science professor Scott Aaronson, currently a visiting researcher at OpenAI, revealed that OpenAI is developing a tool for “statistically watermarking the output of a text [AI system].” Whenever a system, say ChatGPT, generates text, the tool would embed an “imperceptible secret signal” indicating where the text came from.

OpenAI engineer Hendrik Kirchner built a working prototype, Aaronson says, and the hope is to build it into future OpenAI-developed systems.

“We want it to be much harder to take [an AI system’s] output and pass it off as if it came from a human,” Aaronson said in his remarks. “This could be helpful for preventing academic plagiarism, obviously, but also, for example, the mass generation of propaganda — you know, spamming every blog with seemingly on-topic comments supporting Russia’s invasion of Ukraine without even a building full of trolls in Moscow. Or impersonating someone’s writing style in order to incriminate them.”

Exploiting randomness

Why the need for a watermark? ChatGPT is a good example. The chatbot developed by OpenAI has taken the internet by storm, showing an aptitude not only for answering challenging questions, but also for writing poetry, solving programming puzzles and waxing poetic about any number of philosophical topics.

While ChatGPT is a lot of fun – and genuinely useful – the system raises obvious ethical concerns. Like many text-generating systems before it, ChatGPT could be used to write high-quality phishing emails and harmful malware, or to cheat on school assignments. And as a question-answering tool, it’s factually inconsistent – a shortcoming that led the programming Q&A site Stack Overflow to ban answers originating from ChatGPT until further notice.

To understand the technical underpinnings of OpenAI’s watermarking tool, it’s helpful to know why systems like ChatGPT work as well as they do. These systems treat input and output text as strings of “tokens,” which can be words, punctuation marks or parts of words. At their core, the systems repeatedly generate a mathematical function called a probability distribution to decide which token (e.g. word) to output next, taking into account all previously output tokens.
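As a rough illustration (toy code, not any real model), the core loop can be thought of as a function mapping the tokens so far to a probability distribution over the next token:

```python
# Illustrative sketch only: a language model maps the tokens seen so far
# to a probability distribution over the next token. A real system computes
# these probabilities with a neural network; here they are hard-coded.
from typing import Dict, List

def next_token_distribution(context: List[str]) -> Dict[str, float]:
    """Toy stand-in for a real model: returns P(next token | context)."""
    if context[-1] == "The":
        return {"cat": 0.5, "dog": 0.3, "weather": 0.2}
    return {"is": 0.6, "was": 0.3, "runs": 0.1}

print(next_token_distribution(["The"]))  # {'cat': 0.5, 'dog': 0.3, 'weather': 0.2}
```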

In the case of OpenAI-hosted systems like ChatGPT, once the distribution is generated, OpenAI’s server does the job of sampling tokens according to that distribution. There’s some randomness in this selection, which is why the same text prompt can yield different answers.
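Continuing the toy sketch above (reusing the hypothetical `next_token_distribution`), the sampling step might look like this:

```python
# The server samples from the distribution, so the same prompt can yield
# different completions on different runs.
import random
from typing import Dict

def sample_token(dist: Dict[str, float]) -> str:
    tokens, probs = zip(*dist.items())
    return random.choices(tokens, weights=probs, k=1)[0]

context = ["The"]
print(sample_token(next_token_distribution(context)))  # e.g. 'cat' one run, 'dog' the next
```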

OpenAI’s watermarking tool acts as a “wrapper” over existing text-generating systems, Aaronson said during the talk, using a cryptographic function running at the server level to “pseudo-randomly” select the next token. In theory, the system-generated text would still look random to you or me, but anyone holding the “key” to the cryptographic function could discover a watermark.
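Aaronson didn’t share implementation details, so what follows is only a sketch of the general idea under stated assumptions: replace the server’s sampling randomness with a keyed cryptographic function (HMAC-SHA256 here), so that anyone holding the key can reproduce the choices. The `r ** (1 / p)` rule below is a known trick for preserving the output distribution in expectation; its use here, the key, and all names are illustrative, not OpenAI’s actual scheme.

```python
# Hedged sketch, not OpenAI's code: drive token selection with a keyed
# cryptographic function so the "randomness" is reproducible with the key.
import hashlib
import hmac
from typing import Dict, List

SECRET_KEY = b"held-by-the-provider"  # hypothetical key

def prf(key: bytes, context: List[str], token: str) -> float:
    """Keyed pseudorandom number in (0, 1) for this (context, token) pair."""
    msg = ("|".join(context) + "#" + token).encode()
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return (int.from_bytes(digest[:8], "big") + 1) / (2**64 + 2)

def watermarked_choice(dist: Dict[str, float], context: List[str]) -> str:
    """Pick the token maximizing r ** (1 / p): this matches the original
    sampling distribution in expectation, but is deterministic given the
    key and the context, which is what makes detection possible."""
    return max(dist, key=lambda t: prf(SECRET_KEY, context, t) ** (1.0 / dist[t]))
```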

“Empirically, a few hundred tokens seem like enough to get a reasonable signal that yes, this text came from [an AI system]. In principle, you could even take a long text and isolate which parts probably came from [the system] and which parts probably didn’t,” Aaronson said. “[The tool] can create the watermark with a secret key and it can check the watermark with the same key.”
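Detection under the same assumed scheme (reusing the hypothetical `prf` and `SECRET_KEY` above) would recompute each token’s pseudorandom value with the key and check whether it is suspiciously large on average. For ordinary human text the expected per-token score below is about 1.0; watermarked text scores higher, which is consistent with Aaronson’s remark that a few hundred tokens give a reasonable signal.

```python
# Detection sketch: with the key, re-derive each token's pseudorandom value
# and average a score over the text. For text not generated under this key,
# r is effectively uniform and E[-log(1 - r)] = 1.0; watermarked text, which
# favors tokens with large r, scores noticeably higher.
import math
from typing import List

def watermark_score(tokens: List[str], key: bytes) -> float:
    score = 0.0
    for i in range(1, len(tokens)):
        r = prf(key, tokens[:i], tokens[i])
        score += -math.log(1.0 - r)  # large r => large contribution
    return score / max(1, len(tokens) - 1)

# An average well above 1.0 suggests the text was watermarked with this key.
```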

Main limitations

Watermarking AI-generated text is not a new idea. Previous attempts, most of them rules-based, relied on techniques such as synonym substitution and syntax-specific word changes. But outside of theoretical research published by the German institute CISPA last March, OpenAI’s appears to be one of the first cryptography-based approaches to the problem.

When contacted for comment, Aaronson declined to reveal more about the watermark prototype, except that he expects to co-author a research paper in the coming months. OpenAI also declined, saying only that watermarking is one of several “provenance techniques” it is exploring to detect output generated by AI.

However, unaffiliated academics and industry experts shared mixed opinions. They note that the tool is server-side, meaning it wouldn’t necessarily work with all text-generating systems. And they argue that it would be trivial for adversaries to work around.

“I think it would be pretty easy to get around it by rephrasing it, using synonyms, etc,” Srini Devadas, a computer science professor at MIT, told businessroundups.org via email. “This is a bit of a tug of war.”

Jack Hessel, a research scientist at the Allen Institute for AI, pointed out that it would be difficult to imperceptibly fingerprint AI-generated text because each token is a discrete choice. A fingerprint that’s too obvious can result in odd word choices that degrade fluency, while one that’s too subtle leaves room for doubt when the fingerprint is searched for.

[Image: ChatGPT answers a question.]

Yoav Shoham, the co-founder and co-CEO of AI21 Labs, a rival to OpenAI, doesn’t think statistical watermarks will be enough to help identify the source of AI-generated text. He calls for a “more comprehensive” approach that includes differential watermarking, in which different parts of the text are watermarked differently, and AI systems that more accurately cite the sources of factual text.

This particular watermarking approach also places a lot of trust, and power, in OpenAI, experts noted.

“An ideal fingerprint would be imperceptible to a human reader and allow highly reliable detection,” Hessel said via email. “Depending on how it’s set up, OpenAI itself may be the only party that can confidently provide that detection because of the way the ‘signing process’ works.”

In his talk, Aaronson acknowledged that the plan would only really work in a world where companies like OpenAI are at the forefront of scaling up advanced systems, and where they all agree to be responsible players. Even if OpenAI shared the watermarking tool with other text-generating system providers, such as Cohere and AI21 Labs, it wouldn’t stop others from choosing not to use it.

“If [it] becomes a free-for-all, many of the safety measures become harder, and perhaps even impossible, at least without government regulation,” Aaronson said. “In a world where anyone could build their own text model that was as good as [ChatGPT, for example] … what would you do there?”

That’s how things have played out in the text-to-image domain. Unlike OpenAI, whose DALL-E 2 image-generating system is only available through an API, Stability AI open-sourced its text-to-image technology, Stable Diffusion. While DALL-E 2 has a number of API-level filters to prevent problematic images from being generated (plus watermarks on the images it generates), the open source Stable Diffusion does not. Bad actors have used it to create deepfake porn, among other things.

Aaronson, for his part, is optimistic. In the talk, he expressed his belief that, if OpenAI can demonstrate that watermarking works and does not affect the quality of the generated text, it has the potential to become an industry standard.

Not everyone agrees. As Devadas points out, the tool requires a key, meaning it can’t be fully open source – potentially limiting adoption to organizations that agree to partner with OpenAI. (If the key were made public, anyone could deduce the pattern behind the watermarks and defeat their purpose.)

But maybe it’s not so far-fetched. A Quora representative said the company would be interested in using such a system, and it probably wouldn’t be the only one.

“You could worry that all this stuff about trying to be safe and responsible when scaling AI … once it seriously hurts the bottom lines of Google and Meta and Alibaba and the other big players, a lot of it will go out the window,” Aaronson said. “On the other hand, we’ve seen over the past 30 years that the big internet companies can agree on certain minimum standards, whether out of fear of being sued, a desire to be seen as a responsible player, or whatever.”
