AI Training and the Slow Poison of Opt-Out
Asking users to opt out of AI training is a deceptive pattern. Governments and regulators must step in to enforce opt-in as the mandated international standard. In my opinion.
In May 2024, European users of Instagram and Facebook got a new system message informing them that all their public posts would be used for training AI starting June 26th. To exclude their content from this program, each user (and each business account) would have to actively opt out - a process that requires knowing where to go and what to do. Additionally, even if you do opt out, and even if you don't have a Facebook account at all, Meta grants itself generous rights to use any content it can get its hands on for AI training. From their How Meta uses information for generative AI models and features page:
"Even if you don’t use our Products and services or have an account, we may still process information about you to develop and improve AI at Meta. For example, this could happen if you appear anywhere in an image shared on our Products or services by someone who does use them or if someone mentions information about you in posts or captions that they share on our Products and services."
Bottom Trawling the Internet
Meta is not alone in this. The established standard for acquiring AI training data has been to scrape the internet for any publicly available data and use it as each AI company sees fit. And as with bottom trawling, the consequences for privacy, copyright, and the livelihoods of many creators are severe.
Historically, AI scraping has been done by default, without warning or even acknowledgement, often as part of general web scraping to support search indexes. As awareness of this practice has grown, some companies like Automattic (WordPress.com, Tumblr, etc.) and now Meta offer opt-out features so users can exclude their content from AI scraping, but this often comes with direct consequences for visibility and functionality. My cynical hunch is that the platform companies are aware of the public pushback against these practices and are now covering themselves legally. My hope is that platforms offering an explicit opt-out option means they have realized the wholesale scraping of the web is ethically problematic and are at least trying to do something about it.
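To make this concrete, here is a minimal sketch of what site-level opt-out looks like in practice today: naming each known AI crawler in robots.txt and trusting it to comply. The user-agent tokens below are ones their operators have published, but the list is illustrative, not exhaustive, and the snippet only uses Python's standard robots.txt parser to show the default-allow behaviour.

```python
# A sketch of today's opt-out mechanism: a robots.txt that must name
# each AI crawler individually. Compliance is voluntary, and any bot
# you fail to name is allowed by default.
import urllib.robotparser

ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /
"""

parser = urllib.robotparser.RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The crawlers you knew to name are asked to stay out...
assert not parser.can_fetch("GPTBot", "https://example.com/my-post")
# ...but an unnamed or brand-new scraper is permitted by default.
assert parser.can_fetch("SomeNewAIBot", "https://example.com/my-post")
```

Note the shape of the burden: the publisher has to know every scraper's name, keep the list current, and hope it is honoured. The default answer is always yes.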
Here's the thing: The opt-out is part of the problem!
Power and the Principle of Least Privilege
A few years ago I attended a conference where each attendee was given a choice to attach a black or red lanyard to their badge. Black meant the event had permission to take photos and videos of the attendee, red meant it did not. If you didn't choose (or, like me, didn't listen when it was explained), they gave you a red lanyard.
This is a real-world implementation of the Principle of Least Privilege: photographers were only allowed to create images of people who gave explicit permission, meaning the attendees who opted in.
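For the programmers in the room, here is a minimal sketch of that policy. Every name is hypothetical; the only load-bearing detail is the default value of the consent flag.

```python
from dataclasses import dataclass

@dataclass
class Attendee:
    name: str
    # Least privilege: a red lanyard (no consent) unless you actively opt in.
    photo_consent: bool = False

def may_photograph(attendee: Attendee) -> bool:
    # Default-deny: the privilege exists only where it was explicitly granted.
    return attendee.photo_consent

undecided = Attendee("Didn't choose, or didn't listen")
opted_in = Attendee("Chose the black lanyard", photo_consent=True)

assert not may_photograph(undecided)  # no choice made means no permission
assert may_photograph(opted_in)       # explicit opt-in grants the permission
```

Flip that one default to True and silence suddenly counts as consent.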
At a different conference that same year I saw the reverse of this approach: Scattered around the venue were posters reading as follows:
"The [Conference] reserves the right to photograph any attendee for use in promotional materials. If you do not wish to be in the pictures, please notify the roaming photographers."
Here, the attendees were opted in by default, and it was up to each attendee to actively opt out at each interaction with a photographer. Needless to say, this is not feasible, and as a result everyone at the conference either resigned themselves to having their pictures taken or left.
I think most will agree the first conference acted ethically towards its attendees while the second did not. In fact, the second conference experienced a major backlash after the event, and the following year they handed out "NO PHOTO" stickers for attendees to put on their badges if they so desired.
There are two important takeaways here:
First, when it's a real-world situation, most people immediately see the ethical missteps of the second conference. And second, even so, most attendees stayed at the conference knowing they might be photographed against their will.
The conference created a power dynamic where people who didn't want to be photographed were left with bad options: constantly stay on guard so they could tell the roaming photographers they did not want their picture taken, or leave a conference they had paid for and probably travelled to attend. For the organizers, it's unethical but not explicitly illegal, and in the end it means more promo shots to use. So be it if some attendees are uncomfortable.
AI scraping and the current opt-out strategy fall squarely into the same category as the second conference. While the obvious ethical choice is to let people opt in to AI scraping, an opt-out option provides just enough cover to not get sued while ensuring broad access to content, because most users won't go through the trouble of opting out - especially if you make the feature hard to find and hard to use.
My Content, My Choice
Platforms have long argued they can do what they will with user content. In fact, using user content to meet business needs is the economic basis for most platforms, and this is the bargain we've collectively agreed to.
Building on this premise, platforms and AI companies now want to extend this principle to AI training, claiming both that they have a right to use the data without explicit permission because it's public, and that not being able to use it without explicit permission would make it impossible for them to operate at all.
I think it's high time we question both of these stances:
Letting platforms do what they wish with our content was always a Devil's bargain, and we're now acutely aware of how bad a deal it really was. The negative effects of surveillance capitalism, filter bubbles, and ad-driven online radicalization engines (née "recommendation algorithms") are plain to see and play a significant part in the erosion of everything from privacy to democracy.
The claim that an entire business category can't be competitive unless it has free access to raw materials is one we've heard before, and again we know the consequences. Bottom trawling and overfishing have depleted our oceans, pollution chokes our air and water, and the exploitation of cheap labour in the global south keeps billions of people in chronic poverty. To say these are false equivalences is to ignore the reality of what we're talking about. While the actual bits and bytes collected during an AI scrape are not a finite resource, the creative energy that went into creating them is. And the purpose of scraping data from any source is to train a machine to mimic and otherwise use that data in place of a human mind.
Opt-out is a slow poison because it puts choice just far enough away that it becomes out of reach for most people. It makes a choice on our behalf and then forces us to negate it. It's the exact opposite of how it should be.
The Choice is Ours
We are at the very beginning of a new era of technology, and we're still figuring it all out. This means right now we have the power to make decisions, and the responsibility to make the right ones.
This is the moment for us to learn from our mistakes with surveillance capitalism and take bold steps to build a more just and equitable world for everyone who interacts with technology.
One of the first and most straightforward steps we can take right now is to adopt a simple regulation for all tech companies dealing with user data:
Users must opt-in to any change in how their data is handled.
And to protect users:
Choosing not to opt in must not impact the user experience of existing features.
This puts the onus on the AI companies to get consent when collecting data to train their models, and gives users agency to choose what, if any, AI training they want their data included in.
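As a sketch of what that pair of rules implies for anyone building these systems (all names here are hypothetical, not any real platform's API), consent would be recorded per purpose and default to no, so a new use like AI training starts unpermitted even for long-time users:

```python
from datetime import datetime, timezone

class ConsentLedger:
    """Per-purpose consent store; the absence of a record means no."""

    def __init__(self) -> None:
        self._grants: dict[tuple[str, str], datetime] = {}

    def opt_in(self, user_id: str, purpose: str) -> None:
        # Consent is granted per purpose and never inherited from older terms.
        self._grants[(user_id, purpose)] = datetime.now(timezone.utc)

    def is_permitted(self, user_id: str, purpose: str) -> bool:
        # Default-deny: any purpose the user never opted in to is off-limits,
        # regardless of what earlier terms of service said.
        return (user_id, purpose) in self._grants

ledger = ConsentLedger()
ledger.opt_in("user-42", "hosting")  # the original bargain: host and show my posts

assert ledger.is_permitted("user-42", "hosting")
assert not ledger.is_permitted("user-42", "ai-training")  # needs a fresh opt-in
```

And because declining is simply the absence of a record, nothing about the existing hosting feature changes for users who never opt in, which is what the second rule demands.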
If I wanted to make a name for myself in the political realm, this is where I'd start: With a self-evident regulation protecting the rights of every person to own their own work.
We shall see.
--
Morten Rand-Hendriksen is a philosopher and tech educator specializing in AI, emerging technologies, the web, and the intersection between technology and humanity. He creates courses at LinkedIn Learning, speaks at major conferences, and voices his opinions about technology and how it shapes us across several channels.
Morten Rand-Hendriksen, pertinent as usual.
Content Manager for ES Library in Tech, AI, Data Science, Cybersecurity, Business Software and Marketing @ LinkedIn Learning
Any option different from opting in (for whatever use a company makes of user data) gives private entities power over individual rights to privacy, IP, and so on. It is extremely harsh on non-users, because they can't even have a say. Also, scraping content by default may violate many kinds of content licenses, like CC, especially when attribution is required, when commercial use or modification of derivatives is not allowed, or when share-alike licensing is required. Nobody would question that scraping Disney+ movies to train another service without paying and without Disney's permission would be illegal. Why would it be legal to do the same with a random person's blog? There is no doubt any self-respecting company would defend its IP tooth and nail. But when it comes to training AI models, some just feel entitled to take whatever they want from wherever it is, and that isn't fair. The end does not justify the means.
Front-End Developer | Web Designer | Wordpress | Elementor | Woocommerce | HTML | CSS | Javascript
Thank you for raising this flag, Morten Rand-Hendriksen :) I'm very curious if all of this, including the elusive opt-out process, is even legal under EU regulations.
The Data Diva | Data Privacy & Emerging Technologies Advisor | Technologist | Keynote Speaker | Helping Companies Make Data Privacy and Business Advantage | Advisor | Futurist | #1 Data Privacy Podcast Host | Polymath
Love this. Thank you and I agree. Opt-in should be the global default.
Connectivity Specialist-Environment, climate change, air quality,Impact Rater
Good point!