My Data and Design Ethics Manifesto

As a design researcher who works on any number of data-driven projects, I've been thinking a lot about the ethical concerns surrounding "Big Data."

Though based in math and statistics, data-driven models are anything but impervious to the bias, prejudice and downright malevolent nature of humans. Whether these unethical values are passed on to the mathematical models by their creators or emerge through their use (or abuse), math with no point of view doesn't exist in the world of Big Data. Behind every data point, every binary expression, every formula, function and statistical model are actual human beings. Forget that at your peril.

For evidence that shit can go very wrong, you could take a cue from dystopian science-gone-wrong novels like Huxley's "Brave New World" or Asimov's "I, Robot" stories, or you can just look at our recent history of math used as a weapon, detailed so bluntly in Cathy O'Neil's book "Weapons of Math Destruction."

Writing like a bitter divorcée who caught her once-beloved partner cheating, O'Neil paints a bleak picture of ethics and big data. She assails the arbiters of algorithms who turn the beautiful elegance of math into "weapons of math destruction" (WMDs) with distorted, half-baked proxy predictive models that substitute opinions and half-truths for real data.

O'Neil spins oodles of tales about the evil that drove the algorithm-fueled house-of-cards housing crisis and the bad math used by bad actors to get rich and leave the rest of us shell-shocked. She debunks the implied equity of standardized testing (which is neither standard nor up to a fairness test) as a tool for evaluating teachers. She points out the inherent bias in algorithmic tools used for predictive sentencing in the criminal justice system. Her list goes on and on.

To those of us who know full well humanity's capacity to take even the most benign set of tools and turn them into arbiters of evil (insert minority card label here), O'Neil's case studies aren't a surprise.

After more than a decade of Big Data taking center stage, and with the fields of AI, sentient machines and robotics being thrust into the mainstream, we're getting pretty darn good at recognizing bad actors wielding the power of data. The next challenge, though, drifts from the red flags to an area that's decidedly gray.

For the Facebooks, Amazons and Googles of the world and, of course, for us as we design data-fueled systems, this is where thorny ethical issues quickly become knee-deep crises without anyone even trying.

The data-driven troubles range from the seemingly harmless to the downright offensive: the fact that one of Google's algorithms showed higher-paying job ads to men and not women; the claim that Amazon's search algorithm favors its own products, denying customers the best options; or Facebook toying with your News Feed, from showing some users only sad news to gauge their reaction (icky), to suppressing conservative or other ideological news (iffy), to simply showing you fake news (sad). And that's not to mention what I call the "one up effect" of machine learning. While the quirks above can happen with biased data-driven models, it's the utter lack of understanding of human nature, coupled with a blind belief in the goodness of programming code, that can be a dangerous and democracy-ending combination.

The Amplification of Algorithms

The Facebook Russian election scandal will probably be a Law & Order episode soon, but let's revisit one of the biggest tech stories of the decade. If you've been seeing a slew of data-and-ethics blog posts and lots of canary-in-the-coal-mine crying about tech's lack of empathy and its dangers, it's probably because one of the biggest tech companies in the world was manipulated by a few ad-buying Russians who used the company's own product to influence a national presidential election.

When you write it like that, it sounds bizarre. How could one of the biggest platforms in the world, with some of the smartest engineers on the planet, get so duped? And not know it? And not stop it?

In a sense, Facebook has unwittingly become the classic case study of why you need more than engineers to create a data-driven product that does little harm.

To understand why, let's take a look at how intelligent systems are built.

First things first: what goes into an intelligent system like the ones Facebook creates to help users create and target information? The main ingredient is algorithms.

Algorithms are the building blocks of intelligent system models. Put simply, an algorithm is a problem-solving process. Most algorithms that form the backbone of intelligent programming, or AI, are decision-making models that learn to return an outcome.

They are programmed to sift through information and return results, like Google's search engine does. Those results are selected over other information according to criteria, and those criteria are programmed by a human, usually a computer engineer or data scientist.
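To make that concrete, here's a minimal sketch in Python of what "sift through information and return results according to human-programmed criteria" looks like at its simplest. The documents, the query and the scoring rule are invented for illustration; real search engines are vastly more sophisticated.

# A hand-written "algorithm": sift through documents and return the ones
# that match a query, ranked by a criterion a human chose (word count).
documents = [
    "How to plant a seed in spring",
    "Stock market update for Tuesday",
    "Seed funding explained for founders",
]

def search(query: str, docs: list) -> list:
    """Return documents ranked by how often the query word appears."""
    scored = [(doc.lower().count(query.lower()), doc) for doc in docs]
    # Keep only documents that mention the query at all, best matches first.
    return [doc for count, doc in sorted(scored, reverse=True) if count > 0]

print(search("seed", documents))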

If they were organic matter, I'd label algorithms a virus. Once programmed, algorithms are exceedingly efficient at accomplishing tasks, especially once they learn the basics. Let's walk through an example.

Let's say you want to find the word "seed" in every piece of literature you come across. You could hit CTRL+F a million times, but a more efficient way is to have a computer solve the problem. So you create an algorithm to comb through information and find the word "seed."

But how does an algorithm know what "seed" is? How is "seed" different from "meed"? They're both kinda close. Well, you can train it, so the machine, your computer, learns to recognize "seed" but avoids words like "weed," "need" and "meed."

You train it by feeding it a bunch of information that includes lots of words, including "seed," and teaching it to recognize what the word "seed" is. Then you tell it to repeat itself until it settles upon the word "seed" enough times without mistaking it for another word. This is what happens with image processing, natural language processing and other algorithm-based machine learning models.
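As a rough illustration of that training loop, here's a tiny sketch, assuming scikit-learn is available. The word list, labels and model choice are invented for illustration, not a description of any production system.

# Train a tiny classifier to recognize the word "seed" while rejecting
# look-alikes such as "weed," "need" and "meed."
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Labeled examples: 1 = the word we want ("seed"), 0 = everything else.
words  = ["seed", "seeds", "seed", "weed", "need", "meed", "feed", "reed", "tree", "sand"]
labels = [1,      1,       1,      0,      0,      0,      0,      0,      0,      0]

# Character n-grams let the model "see" which letters matter
# (the leading "s", the "eed" ending, and so on).
vectorizer = CountVectorizer(analyzer="char", ngram_range=(1, 3))
X = vectorizer.fit_transform(words)

model = LogisticRegression()
model.fit(X, labels)

# The trained model now scores words it has never seen before.
for candidate in ["seed", "weed", "seedling"]:
    score = model.predict_proba(vectorizer.transform([candidate]))[0][1]
    print(f"{candidate!r}: probability it is 'seed' = {score:.2f}")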

But once an algorithm learns a task, once the learning curve is jumped, the program gets good at recognition much faster, and it attacks this newfound knowledge with a vengeance.

The algorithm can grow in knowledge, knowledge you can teach it or knowledge it can learn on its own. Now it can match an image to the word "seed." Next it can match entire paragraphs where "seed" is barely mentioned but is still relevant to the word. It can do this fast and, on the whole, accurately. And then, in "deep learning," it can start making its own decisions about relevancy to the word "seed." It has gone beyond your training, and it's interpreting and giving you and others not what you asked for but what it thinks you want.

An algorithm run amok and manipulated is why Facebook CEO Mark Zuckerberg testified before Congress. Throw in a couple of bad actors (stay-at-home Russian hackers) and you have a recipe for abuse and disaster. With just a few "training" ads, Russian operatives used Facebook's algorithms to target Republicans and right-leaning audiences with fake, unproven and sensational, but popular, content. Other bad actors used the Facebook News Feed ranking algorithm's penchant for likes, shares and comments to spread false, conspiratorial content like wildfire throughout social media.

But Facebook isn't the only one. Social media platforms that make their money selling the eyeballs that consume their content use quantitative metrics to drive revenue. That quantitative focus led these platforms to distribute content embedded with racism, sexism and just about every other -ism known to humanity.

Facebook's News Feed ranking system, chasing likes, comments and shares, created a literal and figurative "filter bubble," serving up one-note-wonder content that mainstream news could not penetrate.
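To see how a purely quantitative objective produces that one-note feed, here's a toy sketch. Facebook's real ranking system is proprietary; the posts, weights and scoring function below are hypothetical, chosen only to show that a model that sees nothing but reaction counts will always push the most reaction-grabbing content to the top.

# A toy sketch of engagement-only ranking.
from dataclasses import dataclass

@dataclass
class Post:
    headline: str
    likes: int
    shares: int
    comments: int

def engagement_score(post: Post) -> float:
    # The model only "sees" reaction counts; truth and quality are not inputs.
    return 1.0 * post.likes + 3.0 * post.shares + 2.0 * post.comments

feed = [
    Post("City council approves budget after routine vote", likes=40, shares=2, comments=5),
    Post("SHOCKING: what THEY don't want you to know", likes=900, shares=400, comments=700),
    Post("Local school wins science fair", likes=120, shares=10, comments=15),
]

# Rank purely by engagement: the most sensational post rises to the top,
# regardless of accuracy, and each extra view earns it more reactions next round.
for post in sorted(feed, key=engagement_score, reverse=True):
    print(f"{engagement_score(post):7.1f}  {post.headline}")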

Eli Pariser, the activist and author of "The Filter Bubble," warned in 2011 that online platforms were creating the conditions for individuals to live in an alternate reality, one where only their own opinions and beliefs were reinforced. The 2016 elections were more than Pariser's threat come true. In reality, they were much, much worse.

Young Mie Kim, a professor of journalism at the University of Wisconsin-Madison, did extensive research on the targeted Russian ads created on and distributed by Facebook's platform during the 2016 elections. Her research is illuminating and depressing, and it shows the amplification of algorithmic models in stark, dark results.

She found that the Russian-backed Internet Research Agency was prolific in buying and distributing Facebook ads containing extreme, inflammatory and unproven right-leaning allegations. Here's one example, according to Wired magazine:

One ad shared by multiple suspicious groups read: "Veterans before illegals. 300,000 Veterans died waiting to be seen by the VA. Cost of healthcare for illegals 1.1 billion per year."

The Russian ad buy was about $100,000, resulting in just 3,000 ads. But again, we're talking about the amplification of data-driven tools. These ads got liked and shared, and because of Facebook's algorithms they became more popular than actual news content.

In fact, Jonathan Albright, research director of the Tow Center for Digital Journalism at Columbia University, published an amazing research study on just how much those Russian ads got shared across Facebook's platform.

Using Facebook's own analytical tools and the names of the Russian ad-buying accounts, Albright estimates their content got shared 340 million times on the platform. That's beyond a filter bubble; that's a deluge the size of Niagara Falls.
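Taking those figures at face value, a quick back-of-envelope calculation shows why "amplification" is the right word. (Albright's 340 million figure covers content from the Russian-linked accounts broadly, not just the paid ads, so treat this as an order-of-magnitude illustration rather than a precise per-ad metric.)

# Rough sense of scale, using the figures cited above.
ad_spend_usd = 100_000
ads_bought   = 3_000
total_shares = 340_000_000

print(f"Shares per ad (rough): {total_shares / ads_bought:,.0f}")            # ~113,000
print(f"Shares per dollar spent (rough): {total_shares / ad_spend_usd:,.0f}")  # ~3,400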

Next week, I'll write a piece that breaks down how just 3,000 ads could influence millions of people and their election decisions, thanks to data-driven platforms.

But beyond Russia's bad actors, the wildfire spread of Russian ads, fake news and conspiracy content that dominated social media platforms in our last elections would never have happened if the platforms' data models hadn't been designed to exploit human behavior. This isn't happenstance. The very design of these models is predicated upon some of the most universally embodied biases that motivate human behavior. They were designed to use our biases against us.

The Russian ads would have fallen flat without a very real cognitive bias called the availability cascade. Put succinctly, an availability cascade is the tendency for humans to believe something solely because they see it repeated. The adage "repeat something long enough and it'll come true" describes the propensity people have to believe what they hear simply because they hear it a lot.

Former Facebook engineer Antonio Garcia Martinez amped up Pariser's canary cry when he spoke to BuzzFeed about Facebook's complicity in the 2016 elections:

"I think there's a real question if democracy can survive Facebook and all the other Facebook-like platforms," he said. "Before platforms like Facebook, the argument used to be that you had a right to your own opinion. Now, it's more like the right to your own reality."

You could say that these were bad actors using a tool to do bad things. In fact, that's what Facebook repeatedly said, at first.

Sheryl Sandberg, Facebook's COO, said as late as October 2017: "At our heart, we're a tech company, we hire engineers. We don't hire reporters, no one's a journalist, we don't cover the news."

Facebook's insistence that it was a platform, not a publisher of content, was clearly wrong. The minute you decide which content people consume, you become a publisher. It doesn't matter whether you created the content; you created its avenue for consumption. You used a data-driven model to decide which content got served up to users.

And it's that desire to have people consume content, presumably to keep them on the platform and rake in ad dollars, that got the company into the uncharted territory of ethical dilemmas: wrestling with how to stop disseminating hate speech, racist content and abusive language while keeping its platform pure but popular.

Facebook is now creating programs to guard against the abuse of its platform, but some foundational beliefs have to evolve for this problem to be solved.

One belief change is fundamental: companies that use these data-driven tools must see themselves as designers, creators of worlds and realities. We must see our active role in all this and cease thinking of ourselves as passive platforms. Another is a deeper understanding of human nature and how humans interact with, use and are affected by these data-driven tools. And this, of course, is where the need for ethical guidelines comes in.

Experts say tech companies and their "quant-focused" culture left social media titans blind to the consequences of their data-driven models. But there's also a dangerous belief that data-driven tools are, in and of themselves, incapable of harm. This is, frankly, not true. The tool is no more innocuous than the person who created it. As I've shown, these models are created, designed, trained and tested by humans, which means everything that goes into making them carries our human baggage. To assert differently is simply to ignore the truth.

But beyond filter bubbles and manipulated products, it's the data-driven models themselves that can cause havoc. Their focus on efficiency has led some decision-making, data-driven models to inject extremism into unexpected places. The key here is that it's an overt action by the tool itself, not some unintended consequence. And, of course, that is the real danger.

Artificial Thinking Creates Real Action

Take Google's YouTube algorithm. Well, it's proprietary, so you'll have to take it on the strength of how experts think it works. It's not just that the data-driven model distributes content to users based upon their requests. It's that once the algorithm learned, it decided to one-up the user. YouTube's algorithm began recommending videos affiliated with search requests but surfaced content that had, until a few years ago, been left to the dark corners of AM radio.

As detailed in The New York Times, Fortune and The Wall Street Journal, YouTube's algorithm produced results way beyond what users requested. With no discernment about the type of content it was recommending, this data-driven model served up extreme versions of users' original search requests.

Instead of just returning what you asked for, it began "recommending" what it thought was similar content. That content, an investigation by the WSJ found, was far more extreme than anything the user ever requested.
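Here's a hypothetical sketch of that "one up" drift. YouTube's recommender is proprietary, so this is emphatically not its algorithm; it only shows how an objective like "similar to what you just watched, but with the highest watch time" can walk a viewer toward more extreme content one step at a time. The titles, extremeness scores and watch-time numbers are invented for illustration.

# A toy catalog: each video has an invented "extremeness" score and an
# invented average watch time.
videos = [
    {"title": "Evening news recap",        "extremeness": 0.10, "avg_watch_minutes": 4},
    {"title": "Opinion panel gets heated",  "extremeness": 0.40, "avg_watch_minutes": 9},
    {"title": "EXPOSED: the hidden agenda", "extremeness": 0.70, "avg_watch_minutes": 14},
    {"title": "The conspiracy they buried", "extremeness": 0.95, "avg_watch_minutes": 22},
]

def recommend_next(current_extremeness: float) -> dict:
    # "Similar" = not too far from what the user just watched; among the
    # similar candidates, pick whatever keeps people watching longest.
    similar = [v for v in videos if abs(v["extremeness"] - current_extremeness) <= 0.35]
    return max(similar, key=lambda v: v["avg_watch_minutes"])

# Start from a mainstream search and follow the recommendations:
# each step drifts a little further out than the last.
level = 0.10
for step in range(3):
    nxt = recommend_next(level)
    print(f"step {step + 1}: recommended -> {nxt['title']}")
    level = nxt["extremeness"]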

So people like Alex Jones, Michael Savage and Matt Drudge, men known only to the small groups of listeners subsisting on a diet of AM radio and infomercials, suddenly began showing up in YouTube recommendation lists for soccer moms looking for information about the day's news.

For example, WSJ staffers conducted an experiment, searching for "FBI memo" on the day the news was dominated by Republicans releasing a memo about intelligence officials' conduct during the Trump campaign. The result:

On YouTube, after small thumbnails from mainstream news sources, the top result came from BPEarthWatch, which describes itself as “Dedicated to Watching the End Time Events that Lead to the Return of Our Lord Jesus Christ. Comets, Asteroids, Earth Quakes, Solar Flares and The End Time Powers.”

They also noted:

There were also videos from Styxhexenhammer666, whose informational page simply says, “I am God,” and from Alex Jones, the founder of Infowars, a site that often promotes conspiracy theories.

Correct me if I'm wrong, but I'd never heard of BPEarthWatch, and it certainly doesn't sound like what someone was looking for when they typed in "FBI memo."

Facebook, Twitter and Google didn't build these algorithms with evil intent. But neglecting their potential for harm was a major mistake. So how in the world do we prevent that?

Preventing AI "Agnostic Innocuousness"

Unless you work firmly in the military, artillery, bombing or otherwise death-focused industries, the odds of you designing a deliberately human-harming, data-fueled tool are slim, but not altogether impossible.

It's the inadvertent, unintentional consequence that presents danger for the rest of us not living on the edges. And that's where a shared set of ethical guidelines is most useful.

Without an intentional ethics discussion, no matter the project or the design, technologists, engineers and designers risk falling into the hubris trap I label AI "Agnostic Innocuousness."

Agnostic Innocuousness is the belief that because a model uses data, it's exempt from human frailty. Read more about Agnostic Innocuousness, and how to design our way out of that philosophy, in the Medium piece I wrote on Human Ideal Centered Design.

After years of working in this space, first as a journalist doing data-driven stories and now as a design researcher working on data-fueled systems, I've found that to design with minimal harm to human beings, ethical guidelines must be woven into the creation process.

Ethics, the idea of what's right and wrong governing the creation, has to be addressed as soon as possible. The whole "create first, then show my creation to an ethics panel" approach is no good. As soon as the project, idea, platform or product is being brought to fruition, someone with a deep understanding of human behavior needs to be in the room, sleeves rolled up, ready to create alongside the engineers, technologists and data scientists.

For some, the answer is teaching algorithms how to navigate ethical dilemmas. To me, that misses the point. It's the people designing these data-driven models who have to have the ethical discussions first. The machines learn from us. (At least for now...)

Working on these models for more than four years now, I've discovered some practical, concrete ethical guidelines I try to follow to ensure that the unintended doesn't stray from harmless to harmful.

Taking my cue from the completely voluntary, but utterly effective, code of journalism ethics (thanks, Society of Professional Journalists), I've crafted a similar ethics framework that helps me navigate the minefield that is Big Data and design.

These ethical guidelines aren’t meant to cover every possible scenario. In fact, they're not strict guidelines at all. They're more like conversation starters and beginning exploration topics to put you and your design team on the path of do no harm.

These guiding principles are meant to serve as the canary in the coal mine whenever you latch upon a data-driven design project.

They stem from a foundational place where goodwill is assumed and where a desire to root out purposeful bad actors is at your core. In other words, I'm assuming you begin at a place of goodness.

If the idea of data mining income data in an attempt to redline poor people from securing home loans makes you wanna’ holla’ then you’re our kind of data designer. All others seek counseling. 

Let us begin.

What Ethics Is and Is Not

To develop ethical principles, it helps to have a good understanding of what ethical principles are. Simply put, ethical principles are the rules of behavior generally agreed upon by a group, culture, organization, industry or profession. They are rules that people who have something in common agree to live by. (For a short introduction to ethics, check out Simon Blackburn's "Being Good.")

One of the most famous examples people like to use when it comes to ethics is the Hippocratic Oath. But even that's problematic, because it makes a common mistake: confusing ethics with morality. (Side note: Have you read the classic Hippocratic Oath? It's a quite interesting mission statement.)

Named after Hippocrates, a Greek physician who lived circa 460-370 B.C., the oath has some interesting principles embedded in it. Most people recall the phrase "First, do no harm" when thinking of the oath, but of course it doesn't exactly say the words "do no harm." It's more like "keep [the sick] from harm and injustice." Sounds benign. Keep reading.

If you want to understand the difference between ethical and moral principles, check out the classic version of the medical oath and its modern version, written in 1964 by Louis Lasagna, academic dean of the School of Medicine at Tufts University. (The American Medical Association has since come up with a lengthier code of ethics that's a little less didactic on the morality.)

The classical oath bakes morality into the established code of conduct. It says the swearer "will neither give a deadly drug to anybody who asked for it, nor will I make a suggestion to this effect."

Wait, what? Forbidding doctor-assisted suicide is baked into the oath. That's a moral stance: a bias that says doctors, though they could, should not take a life even if people ask for it to be taken. Read on.

The classic oath in the next sentence says, "Similarly, I will not give to a woman an abortive remedy. In purity and holiness I will guard my life and my art."

Say what? Yes, the original Hippocratic oath forbade doctor-assisted abortions. Even more bias.

There was controversy for years over those two lines in the oath. Scholars say most medical schools and physicians weren't even aware of how much morality was embedded in the oath, including the repeated references to gods and goddesses. In the 1960s, a more modern version of the oath was created that toned down the religious references in favor of more thoughtful language. It contains phrases like "I must tread carefully in matters of life and death."

As you can see, there's an art to creating ethical principles: they have to straddle the line between advocating best practices that do no harm and avoiding specific moral standards that depend on strident points of view that aren't universal. You have to be clear without being too specific. You have to be encompassing, yet remain universal.

From Design Ethics to Data Design Ethics

With that in mind... how might we design ethical data-driven products that are more beneficial to society than they are harmful?

Well, let's start with our principles as human-centered designers. What do we as human-centered designers value most?

At IDEO, we say we want to create positive impact through design. We do this through a human-centered approach to design that our CEO Tim Brown dubs "design thinking." This method yields some principles of design, including:

  • Transparency
  • Participatory
  • Contextual
  • Sustainable
  • Transformational
  • Inspiring 

I would argue that when approaching design projects which are backed by the creation or use of intelligent data systems, these design principles do not change.

With these as a guiding light, I have what I call my “Data Design Principles.” These are the specific buckets to begin broad exploration of topics that will help you come to a conclusion about whether to design a data engine or to walk away.

If the answers to the questions brought up by these categories aren't what you desire, make you feel icky or go against your ethics as a designer, then you probably should not design this data engine.

If you do, you need to put in place protocols, fail-safes and adjustments to ensure that what you create, when scaled, doesn't come back to haunt you or anyone else. Within these must-explore categories are my must-ask questions for projects that involve intelligent systems and their intersection with humans. The categories and questions are below (a sketch of how a team might turn them into a working checklist follows the list):

  • Humans First: Is what I'm creating centered around a human need? Will it serve humans? Will it allow humans to do, be and become better? In short, is it human-centered? If not, rethink the solution to make it so.
  • Privacy: Will what I create violate someone's privacy rights? If so, how are they notified and affected, and in what ways? Can they use this product and maintain their individual privacy rights? If not, do they know what they're sacrificing? Is what they're gaining really as valuable as the fundamental right to privacy? Or is the give-get exchange lopsided?
  • Identity: Will what I create strip away a person's anonymity? If so, why and how? What are the ramifications, is this acceptable, and if not, are there ways to prevent it?
  • Safety: Will what I create harm others? If so, how, in what way and why? How easily can someone use what I create to harm others? Are there ways to prevent this? How do I create checks and balances to prevent abuse and misuse?
  • Fairness and Equity: Can what I build be used to harm a protected class? If so, in what way?
  • Intended Use: Does this tool have an intended non-harmful use? Even so, could it eventually do harm? To whom and in what ways? How do I guard against abuse? Do I have checkpoints and milestones embedded in the design to iterate and check on my model? How will my model be used five years from now? Ten? How will I know?
  • Transparency: Can others trace how I created this data-driven product? Is my process auditable? Can anyone trace back to understand how my model was created? If not, why? Are all the people involved made aware of what I've created? If not, why not?
  • Legal: Is my tool legal? Does it violate the letter or even the spirit of the law? Beyond privacy, does it comply with other state, federal and international laws?
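
To make these principles actionable rather than aspirational, here's one way a design team might encode them as a lightweight kickoff checklist. The categories mirror the list above; the review flow (flagging any unanswered or "no" question for discussion) is my own hypothetical sketch, not a formal standard.

# The categories mirror the Data Design Principles above, condensed to one
# prompt each for a kickoff conversation.
DATA_DESIGN_PRINCIPLES = {
    "Humans First":        "Is what we're creating centered on a human need?",
    "Privacy":             "Can people use this and keep their privacy rights, knowingly?",
    "Identity":            "Does it strip away anonymity, and if so, with what safeguards?",
    "Safety":              "Could it harm others, and what checks prevent abuse?",
    "Fairness and Equity": "Could it be used to harm a protected class?",
    "Intended Use":        "Could the intended use drift into harm in 5 or 10 years?",
    "Transparency":        "Is the process auditable and traceable by others?",
    "Legal":               "Does it comply with the letter and the spirit of the law?",
}

def review(answers: dict) -> list:
    """Return the categories the team still needs to talk through."""
    return [
        category
        for category in DATA_DESIGN_PRINCIPLES
        if answers.get(category) is not True  # unanswered or "no" = open question
    ]

# Example kickoff: two categories are satisfied, the rest need discussion.
open_questions = review({"Humans First": True, "Legal": True})
for category in open_questions:
    print(f"Discuss before building: {category} - {DATA_DESIGN_PRINCIPLES[category]}")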

There's a lot to be done to incorporate ethics into AI and intelligent systems design. It's a complicated issue. But we can no longer pretend that data is unbiased. It's created by people. It has its problems. And any model created from data could potentially have those problems as well.

 

Comments

Candis Sistrunk, UX Researcher at Procter & Gamble (IoT / Connected Devices, Mobile Apps):

We would have to be deliberate in exercising the framework, like creating an algorithm that identifies implicit bias or plays out scenarios of possible unintended consequences. Otherwise, AI decision-making would continue to mirror human decision-making: often lacking in representation, exploitative, tribal and potentially harmful to humans long term.

A much-needed framework... I'll be re-reading and digesting this entire piece later. But meanwhile, I'll comment that this is just good business as well. It's the difference between making a sleep/alertness detection system for cars that works for Asian drivers; an AI like Alexa that works for folks with certain English accents (people from India, folks from the South, Black dialects); photo recognition features that don't categorize Black people as apes; motion detection sensors that work for darker-hued folks... The list goes on.
