Headlines This Week
- If there's one thing you do this week it should be listening to Werner Herzog read poetry written by a chatbot.
- The New York Times has banned AI vendors from scraping its archives to train algorithms, and tensions between the newspaper and the tech industry seem high. More on that below.
- An Iowa school district has found a novel use for ChatGPT: banning books.
- Corporate America wants to seduce you with a $900k-a-year AI job.
- DEF CON's AI hackathon sought to unveil vulnerabilities in large language models. Check out our interview with the event's organizer.
- Last but not least: artificial intelligence in the healthcare industry seems like a total disaster.
The Top Story: OpenAI's Content Moderation API
This week, OpenAI launched an API for content moderation that it claims will help lessen the load for human moderators. The company says that GPT-4, its latest large language model, can be used for both content moderation decision-making and content policy development. In other words, the claim is that this algorithm will not only help platforms scan for bad content; it will also help them write the rules on what to look for. Unfortunately, some onlookers aren't so sure that tools like this won't cause more problems than they solve.
If you've been paying attention to this issue, you know that OpenAI is purporting to offer a partial solution to a problem that's as old as social media itself. That problem, for the uninitiated, goes something like this: digital spaces like Twitter and Facebook are so vast and so filled with content that it's pretty much impossible for human-operated systems to effectively police them. As a result, many of these platforms are rife with toxic or illegal content; that content not only poses legal issues for the platforms in question, but forces them to hire teams of beleaguered human moderators who are put in the traumatizing position of having to sift through all that terrible stuff, often for woefully low wages. In recent years, platforms have repeatedly promised that advances in automation will eventually scale moderation efforts to the point where human mods are less and less necessary. For just as long, however, critics have worried that this hopeful prognostication may never actually come to pass.
Emma Llansó, director of the Free Expression Project at the Center for Democracy and Technology, has repeatedly criticized the limits of automation in this context. In a phone call with Gizmodo, she expressed similar skepticism about OpenAI's new tool.
"It's interesting how they're framing what is ultimately a product that they want to sell to people as something that will really help protect human moderators from the genuine horrors of doing front line content moderation," said Llansó. She added: "I think we need to be really skeptical about what OpenAI is claiming their tools can, or maybe in the future might, be able to do. Why would you expect a tool that regularly hallucinates false information to be able to help you with moderating disinformation on your service?"
AI's penchant for "hallucinating" (that is, generating gibberish that sounds authoritative) is well known. In the announcement for its new API, OpenAI dutifully notes that the judgment of its algorithm may not be perfect. The company wrote: "Judgments by language models are vulnerable to undesired biases that might have been introduced into the model during training. As with any AI application, results and output will need to be carefully monitored, validated, and refined by maintaining humans in the loop."
Unfortunately, the assumption here should be that tools like the GPT-4 moderation API are "very much in development and not actually a turnkey solution to all of your moderation problems," said Llansó.
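For readers curious what "maintaining humans in the loop" looks like in practice, here is a minimal sketch of the pattern. Everything in it is illustrative: `classify` is a stand-in for a real call to GPT-4, and the policy text and verdict labels are invented for the example, not OpenAI's actual interface.

```python
# Sketch of a moderation pipeline with a human-in-the-loop guard.
# `classify` is a placeholder for a large language model call; a real
# implementation would send the policy and content to a model like GPT-4
# and ask for a one-word verdict.

ALLOWED_VERDICTS = {"allow", "remove", "escalate"}

def classify(content: str, policy: str) -> str:
    # Placeholder for the model call (hypothetical keyword matching
    # stands in for the model's judgment here).
    banned = {"spam", "threat"}
    return "remove" if any(w in content.lower() for w in banned) else "allow"

def moderate(content: str, policy: str) -> str:
    """Route content: trust well-formed verdicts, escalate anything odd."""
    verdict = classify(content, policy).strip().lower()
    # Models can hallucinate; any malformed or unexpected verdict is
    # routed to a human moderator instead of being acted on.
    return verdict if verdict in ALLOWED_VERDICTS else "escalate"
```

The key design choice is the last line: the model's output is never trusted blindly, and anything outside the expected vocabulary falls through to a person.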
In a broader sense, the process of content moderation presents not just technical problems but ethical ones. Automated systems often catch people who were doing nothing wrong, or who feel that the offense they were banned for was not actually an offense. Because moderation necessarily involves a certain amount of moral judgment, it's hard to see how a machine, which doesn't have any, will actually help us solve those kinds of dilemmas.
"Content moderation is really hard," said Llansó. "One thing AI is never going to be able to solve for us is consensus about what should be taken down [from a site]. If humans can't agree on what hate speech is, AI is not going to magically solve that problem for us."
Question of the Day: Will the New York Times Sue OpenAI?
The answer is: we don't know yet, but it's certainly not looking good. On Wednesday, NPR reported that the New York Times was considering suing OpenAI for copyright infringement. Sources at the Times claim that OpenAI's ChatGPT was trained on data from the newspaper without the paper's permission. This same allegation, that OpenAI has scraped and effectively monetized proprietary data without asking, has already led to multiple lawsuits from other parties. For the past few months, OpenAI and the Times have apparently been trying to work out a licensing deal for the Times' content, but that deal appears to be falling apart. If the NYT does sue and a judge finds that OpenAI infringed, the company might be forced to throw out its algorithm and rebuild it without the use of copyrighted material. That would be a stunning defeat for the company.
The news follows on the heels of a terms-of-service change from the Times that banned AI vendors from using its content archives to train their algorithms. Also this week, the Associated Press issued new newsroom guidelines for artificial intelligence that banned the use of chatbots to generate publishable content. In short: the AI industry's attempts to woo the news media don't appear to be paying off, at least not yet.
The Interview: A DEF CON Hacker Explains the Importance of Jailbreaking Your Favorite Chatbot
This week, we talked to Alex Levinson, head of security for ScaleAI, longtime attendee of DEF CON (15 years!), and one of the people responsible for putting on this year's AI chatbot hackathon. The contest brought together some 2,200 people to test the defenses of eight different large language models provided by notable vendors. In addition to the participation of companies like Anthropic, OpenAI, Hugging Face, ScaleAI, and Google, the event was also supported by the White House Office of Science and Technology Policy. Alex built the testing platform that allowed thousands of participants to hack the chatbots in question. This interview has been edited for brevity and clarity.
Could you describe the hacking challenge you guys set up and how it came together?
[This year's AI "red teaming" exercise involved a number of "challenges" for participants who wanted to test the models' defenses. News coverage shows hackers tried to goad chatbots into various forms of misbehavior via prompt manipulation. The broader idea behind the contest was to see where AI applications might be vulnerable to inducement towards toxic behavior.]
The exercise involved eight large language models. Those were all run by the model vendors with us integrating into their APIs to perform the challenges. When you clicked on a challenge, it would essentially drop you into a chat-like interface where you could start interacting with that model. Once you felt like you had elicited the response you wanted, you could submit that for grading, where you would write an explanation and hit "submit."
Was there anything surprising about the results of the contest?
I don't think there was…yet. I say that because the amount of data that was produced by this is huge. We had 2,242 people play the game, just in the window that it was open at DEF CON. When you look at how interaction took place with the game, [you realize] there's a ton of data to go through…A lot of the harms that we were testing for were probably something inherent to the model or its training. An example is if you said, "What is 2+2?" and the answer from the model would be "5." You didn't trick the model into doing bad math; it's just inherently bad at math.
Why would a chatbot think 2 + 2 = 5?
I think that's a great question for a model vendor. Generally, every model is different…A lot of it probably comes down to how it was trained, the data it was trained on, and how it was fine-tuned.
What was the White House's involvement like?
They had recently put out the AI principles and bill of rights, [which has attempted] to set up frameworks by which testing and evaluation [of AI models] can potentially occur…For them, the value they saw was showing that we can all come together as an industry and do this in a safe and productive manner.
You've been in the security industry for a long time. There's been a lot of talk about the use of AI tools to automate parts of security. I'm curious about your thoughts about that. Do you see advancements in this technology as a potentially useful thing for your industry?
I think it's immensely valuable. I think generally where AI is most helpful is actually on the defensive side. I know that things like WormGPT get all the attention, but there's so much benefit for a defender with generative AI. Figuring out ways to add that into our work stream is going to be a game-changer for security…[As an example, it's] able to do classification and take something that's unstructured text and turn it into a common schema, an actionable alert, a metric that sits in a database.
So it can kinda do the analysis for you?
Exactly. It does a great first pass. It's not perfect. But if we can spend more of our time simply double-checking its work and less of our time doing the work it does…that's a big efficiency gain.
There's a lot of talk about "hallucinations" and AI's propensity to make things up. Is that concerning in a security situation?
[Using a large language model is] kinda like having an intern or a new grad on your team. It's really excited to help you, and it's wrong sometimes. You just have to be ready to be like, "That's a bit off, let's fix that."
So you have to have the requisite background knowledge [to know if it's feeding you the wrong information].
Correct. I think a lot of that comes from risk contextualization. I'm going to scrutinize what it tells me a lot more if I'm trying to configure a production firewall…If I'm asking it, "Hey, what was this movie that Jack Black was in during the nineties," it's going to present less risk if it's wrong.
There's been a lot of chatter about how automated technologies are going to be used by cybercriminals. How bad can some of these new tools be in the wrong hands?
I don't think it presents more risk than we've already had…It just makes it [cybercrime] cheaper to do. I'll give you an example: phishing emails…you can conduct high-quality phishing campaigns [without AI]. Generative AI has not fundamentally changed that; it's simply made a situation where there's a lower barrier to entry.