AI Training and the Slow Poison of Opt-Out
Asking users to opt out of AI training is a deceptive pattern. Governments and regulators must step in to enforce opt-in as the mandated international standard. In my opinion.
In May 2024, European users of Instagram and Facebook got a new system message informing them that all their public posts would be used for training AI starting June 26th. To exclude their content from this program, each user (and each business account) would have to actively opt out - a process that requires knowing where to go and what to do. Additionally, even if you do opt out, and even if you don't have a Facebook account at all, Meta grants itself generous rights to use any content it can get its hands on for AI training. From their How Meta uses information for generative AI models and features page:
"Even if you don’t use our Products and services or have an account, we may still process information about you to develop and improve AI at Meta. For example, this could happen if you appear anywhere in an image shared on our Products or services by someone who does use them or if someone mentions information about you in posts or captions that they share on our Products and services."
Bottom Trawling the Internet
Meta is not alone in this. The established standard for acquiring AI training data has been to scrape the internet for any publicly available data and use it as each AI company sees fit. And as with bottom trawling, the consequences for privacy, copyright, and the livelihoods of many creators are severe.
Historically, AI scraping has been done by default, without warning or even acknowledgement, often as part of general web scraping to support search indexes. As awareness of this practice has grown, some companies like Automattic (WordPress.com, Tumblr, etc.) and now Meta offer opt-out features so users can exclude their content from AI scraping, but this often comes with direct consequences for visibility and functionality. My cynical hunch is that the platform companies are aware of the public pushback against these practices and are now covering themselves legally. My hope is that platforms offering an explicit opt-out option means they have realized the wholesale scraping of the web is ethically problematic and are at least trying to do something about it.
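To make this concrete, here is a minimal sketch of what site-level opt-out looks like in practice today: naming each known AI crawler in robots.txt and trusting it to comply. The user-agent tokens below are ones their operators have published, but the list is illustrative, not exhaustive, and the snippet only uses Python's standard robots.txt parser to show the default-allow behaviour.

```python
# A sketch of today's opt-out mechanism: a robots.txt that must name
# each AI crawler individually. Compliance is voluntary, and any bot
# you fail to name is allowed by default.
import urllib.robotparser

ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /
"""

parser = urllib.robotparser.RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The crawlers you knew to name are asked to stay out...
assert not parser.can_fetch("GPTBot", "https://example.com/my-post")
# ...but an unnamed or brand-new scraper is permitted by default.
assert parser.can_fetch("SomeNewAIBot", "https://example.com/my-post")
```

Note the shape of the burden: the publisher has to know every scraper's name, keep the list current, and hope it is honoured. The default answer is always yes.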
Here's the thing: The opt-out is part of the problem!
Power and the Principle of Least Privilege
A few years ago I attended a conference where each attendee was given a choice to attach a black or red lanyard to their badge. Black meant the event had permission to take photos and videos of the attendee, red meant it did not. If you didn't choose (or, like me, didn't listen when it was explained), they gave you a red lanyard.
This is a real-world implementation of the Principle of Least Privilege: photographers were only allowed to create images of people who gave explicit permission, meaning the attendees who opted in.
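For the programmers in the room, here is a minimal sketch of that policy. Every name is hypothetical; the only load-bearing detail is the default value of the consent flag.

```python
from dataclasses import dataclass

@dataclass
class Attendee:
    name: str
    # Least privilege: a red lanyard (no consent) unless you actively opt in.
    photo_consent: bool = False

def may_photograph(attendee: Attendee) -> bool:
    # Default-deny: the privilege exists only where it was explicitly granted.
    return attendee.photo_consent

undecided = Attendee("Didn't choose, or didn't listen")
opted_in = Attendee("Chose the black lanyard", photo_consent=True)

assert not may_photograph(undecided)  # no choice made means no permission
assert may_photograph(opted_in)       # explicit opt-in grants the permission
```

Flip that one default to True and silence suddenly counts as consent.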
At a different conference that same year I saw the reverse of this approach: Scattered around the venue were posters reading as follows:
"The [Conference] reserves the right to photograph any attendee for use in promotional materials. If you do not wish to be in the pictures, please notify the roaming photographers."
Here, the attendees were opted in by default, and it was up to each attendee to actively opt out at each interaction with a photographer. Needless to say, this is not feasible, and as a result everyone at the conference either resigned themselves to having their pictures taken or left.
I think most will agree the first conference acted ethically towards its attendees while the second did not. In fact, the second conference experienced a major backlash after the event, and the following year they handed out "NO PHOTO" stickers for attendees to put on their badges if they so desired.
There are two important takeaways here:
First, when it's a real-world situation, most people immediately see the ethical missteps of the second conference. And second, even so, most attendees stayed at the conference knowing they might be photographed against their will.
The conference created a power dynamic where people who didn't want to be photographed were left with bad options: constantly stay on guard so they could tell the roaming photographers they did not want their picture taken, or leave a conference they had paid for and probably travelled to attend. For the organizers, it's unethical but not explicitly illegal, and in the end it means more promo shots to use. So be it if some attendees are uncomfortable.
AI scraping and the current opt-out strategy fall squarely into the same category as the second conference. While the obvious ethical choice is to let people opt in to AI scraping, an opt-out option provides just enough cover to not get sued while ensuring broad access to content, because most users won't go through the trouble of opting out - especially if you make the feature hard to find and hard to use.
My Content, My Choice
Platforms have long argued they can do what they will with user content. In fact, using user content to meet business needs is the economic basis for most platforms, and this is the bargain we've collectively agreed to.
Building on this premise, platforms and AI companies now want to extend this principle to AI training, claiming both that they have a right to use the data without explicit permission because it's public, and that not being able to use it without explicit permission would make it impossible for them to operate at all.
I think it's high time we question both of these stances:
Letting platforms do what they wish with our content was always a Devil's bargain, and we're now acutely aware of how bad a deal it really was. The negative effects of surveillance capitalism, filter bubbles, and ad-driven online radicalization engines (née "recommendation algorithms") are plain to see and play a significant part in the erosion of everything from privacy to democracy.
The claim that an entire business category can't be competitive unless it has free access to raw materials is one we've heard before, and again we know the consequences. Bottom trawling and overfishing have depleted our oceans, pollution chokes our air and water, and the exploitation of cheap labour in the global south keeps billions of people in chronic poverty. To say these are false equivalences is to ignore the reality of what we're talking about. While the actual bits and bytes collected during an AI scrape are not a finite resource, the creative energy that went into creating them is. And the purpose of scraping data from any source is to train a machine to mimic and otherwise use that data in place of a human mind.
Opt-out is a slow poison because it puts choice just far enough away that it becomes out of reach for most people. It makes a choice on our behalf and then forces us to negate it. It's the exact opposite of how it should be.
The Choice is Ours
We are at the very beginning of a new era of technology, and we're still figuring it all out. This means right now we have the power to make decisions, and the responsibility to make the right ones.
This is the moment for us to learn from our mistakes with surveillance capitalism and take bold steps to build a more just and equitable world for everyone who interacts with technology.
One of the first and most straightforward steps we can take right now is to adopt a simple regulation for all tech companies dealing with user data:
Users must opt-in to any change in how their data is handled.
And to protect users:
Choosing not to opt in must not impact the user experience of existing features.
This puts the onus on the AI companies to get consent when collecting data to train their models, and gives users agency to choose what, if any, AI training they want their data included in.
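As a sketch of what that pair of rules implies for anyone building these systems (all names here are hypothetical, not any real platform's API), consent would be recorded per purpose and default to no, so a new use like AI training starts unpermitted even for long-time users:

```python
from datetime import datetime, timezone

class ConsentLedger:
    """Per-purpose consent store; the absence of a record means no."""

    def __init__(self) -> None:
        self._grants: dict[tuple[str, str], datetime] = {}

    def opt_in(self, user_id: str, purpose: str) -> None:
        # Consent is granted per purpose and never inherited from older terms.
        self._grants[(user_id, purpose)] = datetime.now(timezone.utc)

    def is_permitted(self, user_id: str, purpose: str) -> bool:
        # Default-deny: any purpose the user never opted in to is off-limits,
        # regardless of what earlier terms of service said.
        return (user_id, purpose) in self._grants

ledger = ConsentLedger()
ledger.opt_in("user-42", "hosting")  # the original bargain: host and show my posts

assert ledger.is_permitted("user-42", "hosting")
assert not ledger.is_permitted("user-42", "ai-training")  # needs a fresh opt-in
```

And because declining is simply the absence of a record, nothing about the existing hosting feature changes for users who never opt in, which is what the second rule demands.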
If I wanted to make a name for myself in the political realm, this is where I'd start: With a self-evident regulation protecting the rights of every person to own their own work.
We shall see.
--
Morten Rand-Hendriksen is a philosopher and tech educator specializing in AI, emerging technologies, the web, and the intersection between technology and humanity. He creates courses at LinkedIn Learning, speaks at major conferences, and voices his opinions about technology and how it shapes us across several channels.
Morten Rand-Hendriksen, pertinent as usual.
Content Manager for ES Library in Tech, AI, Data Science, Cybersecurity, Business Software and Marketing @ LinkedIn Learning
Any option different from opting in (for whatever use a company makes of user data) gives private entities power over individual rights to privacy, IP, and so on. It is extremely harsh on non-users, because they can't even have a say. Also, scraping content by default may violate many kinds of content licenses, like CC, especially when attribution is required, when commercial use or modification of derivatives is not allowed, or when share-alike licensing is required. Nobody would question that scraping Disney+ movies to train another service without paying and without Disney's permission would be illegal. Why would it be legal to do the same with a random person's blog? There is no doubt any self-respecting company would defend its IP tooth and nail. But when it comes to training AI models, some just feel entitled to take whatever they want from wherever it is, and that isn't fair. The end does not justify the means.
Front-End Developer | Web Designer | Wordpress | Elementor | Woocommerce | HTML | CSS | Javascript
Thank you for raising this flag, Morten Rand-Hendriksen :) I'm very curious if all of this, including the elusive opt-out process, is even legal under EU regulations.
The Data Diva | Data Privacy & Emerging Technologies Advisor | Technologist | Keynote Speaker | Helping Companies Make Data Privacy and Business Advantage | Advisor | Futurist | #1 Data Privacy Podcast Host | Polymath
Love this. Thank you and I agree. Opt-in should be the global default.
Connectivity Specialist-Environment, climate change, air quality,Impact Rater
Good point!