All Else Equal

Daniel Tunkelang

Query Understanding

Published Sep 10, 2024

In The Three-Body Problem, Liu Cixin describes how an alien species drives scientists to suicide by making it impossible for them to produce consistent experimental results. Some might find it difficult to relate to the scientists’ existential despair, but I found the premise compelling and chilling.

In this post, I do not tackle anything so sinister or abstract. Rather, I challenge a key assumption of A/B testing — namely, that all else is equal. I hope to inspire curiosity and reflection rather than existential despair.

A/B Testing: A Simple Example

A/B testing is the most popular method for online experimentation. It compares two versions of an application to determine which one performs better. Typically, one is the “treatment” we are considering as a change and the other is a “control” that represents the current state of the application.

For example, consider a simple A/B test to determine whether increasing the page size for search results from 10 (control) to 20 (treatment) leads to an increased conversion rate. This is about as simple an A/B test as it gets.

Let us imagine that the test is successful, delivering a statistically significant increase in the conversion rate. What does this tell us?
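
One common way to decide whether such an increase is statistically significant is a two-sided two-proportion z-test. The sketch below is a minimal illustration that assumes independent sessions and large samples. The function name and the conversion counts are hypothetical and are not results from a real test.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates.

    conv_a / n_a: conversions and sessions in the control (A).
    conv_b / n_b: conversions and sessions in the treatment (B).
    Returns (absolute lift, z statistic, two-sided p-value).
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that A and B convert equally.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_b - p_a, z, p_value

# Hypothetical counts: control shows 10 results per page, treatment shows 20.
lift, z, p = two_proportion_z_test(conv_a=2000, n_a=50000, conv_b=2150, n_b=50000)
print(f"lift={lift:+.4f}  z={z:.2f}  p={p:.3f}")  # p is below 0.05
```

A small p-value here only says that the treatment beat the control under the conditions that held while the test ran.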

The World is Not So Simple

The answer may seem obvious: doubling the page size increases the conversion rate. More precisely, this result only holds if all else is equal, since the change in page size might interact with other changes, such as a change to the page design. But we have to be even more pedantic: the result only holds given the current state of the world.

Consider screen size and network latency, both of which are determined by searchers' devices and locations. Both interact with page size to shape the experience. A longer page may help searchers on large screens with fast connections but frustrate those on small screens with slow ones. Increasing the page size may therefore increase conversion under one set of conditions but decrease it under others.
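
To illustrate how such an interaction can flip the overall result, here is a toy calculation with made-up numbers. It assumes two hypothetical segments, each combining a screen size and a network condition, with fixed per-segment effects. Only the traffic mix changes, yet the blended lift changes sign.

```python
# Hypothetical per-segment conversion rates: (control, treatment).
# Segments stand in for combinations of screen size and network latency.
SEGMENT_RATES = {
    "large_screen_fast_network": (0.050, 0.056),  # treatment helps here
    "small_screen_slow_network": (0.030, 0.027),  # treatment hurts here
}

def blended_lift(mix, rates=SEGMENT_RATES):
    """Overall change in conversion rate for a given traffic mix.

    mix maps segment name -> share of traffic; shares must sum to 1.
    """
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("traffic shares must sum to 1")
    return sum(share * (rates[seg][1] - rates[seg][0]) for seg, share in mix.items())

# Hypothetical mix at the time of the test vs. a later mix skewed toward small screens.
mix_at_test_time = {"large_screen_fast_network": 0.8, "small_screen_slow_network": 0.2}
mix_later = {"large_screen_fast_network": 0.3, "small_screen_slow_network": 0.7}

print(f"lift at test time: {blended_lift(mix_at_test_time):+.4f}")  # positive
print(f"lift later:        {blended_lift(mix_later):+.4f}")         # negative
```

Real segment effects would themselves drift over time. Holding them fixed isolates the point that the same change can win or lose depending on who is searching.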

In the physical world, we do not generally worry about the laws of nature being time-dependent. We treat the law of gravity and the speed of light as constants. However, the digital world changes far more rapidly than the physical one, as does user behavior. That makes it dangerous to assume that the conditions for an experiment hold indefinitely.

Do Not Despair!

If you rely on A/B testing as part of your day job, you might find this state of affairs disheartening. But please do not despair! We have it much better than the scientists in The Three-Body Problem. No aliens are out to get us!

Fortunately, there are things we can do to detect changes likely to invalidate our experiments over time. Here are a few:

Reverse Testing. You can revisit an A/B test by reversing it — that is, using the current version as the control and the old version as the treatment. The catch is that maintaining the ability to perform reverse tests requires discipline and can incur technical debt.
Long-Term Holdout. Typical A/B tests are short, e.g., two weeks. Running the control for longer (e.g., three months or a year) hedges against conditions changing during that time.
Monitoring. While it is important to look at metrics when evaluating a change as part of an A/B test, it is also important to look broadly at metrics over time when you are not making any changes. Trends or sudden changes in metrics can tell you when the world is changing.
Snapshots. While monitoring can alert you to unexpected changes in metrics, a more direct approach is to take a snapshot of the metrics that hold at the time an A/B test is conducted. Changes in those metrics are particularly likely to invalidate the test results.
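
The Monitoring and Snapshots ideas can be sketched as two simple checks. The first compares the current traffic mix (e.g., by device) against a snapshot recorded when the test ran. The second flags days when a metric jumps far from its recent baseline. This is a minimal sketch under stated assumptions. The function names, the 0.1 drift threshold, the 14-day window, the 3-standard-deviation rule, and the data are all hypothetical illustrations, not recommended settings.

```python
import statistics

def total_variation_distance(p, q):
    """Half the L1 distance between two categorical distributions (0 = identical, 1 = disjoint)."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

def snapshot_drifted(snapshot_mix, current_mix, threshold=0.1):
    """Snapshots: flag when the mix has moved too far from the one recorded at test time.

    The 0.1 threshold is an arbitrary, hypothetical choice.
    """
    return total_variation_distance(snapshot_mix, current_mix) > threshold

def metric_alerts(daily_values, window=14, z_threshold=3.0):
    """Monitoring: indices of days whose value lies more than z_threshold standard
    deviations from the mean of the preceding `window` days (hypothetical rule)."""
    alerts = []
    for i in range(window, len(daily_values)):
        history = daily_values[i - window:i]
        mean = statistics.fmean(history)
        sd = statistics.stdev(history)
        if sd > 0 and abs(daily_values[i] - mean) / sd > z_threshold:
            alerts.append(i)
    return alerts

# Hypothetical snapshot of the device mix recorded when the A/B test ran.
snapshot = {"desktop": 0.6, "mobile": 0.4}
current = {"desktop": 0.45, "mobile": 0.55}
print(snapshot_drifted(snapshot, current))  # True: the mix moved by 0.15

# Hypothetical daily conversion rates; the last day drops sharply.
daily_conversion = [0.040, 0.041, 0.039, 0.040, 0.042, 0.038, 0.040, 0.041,
                    0.039, 0.040, 0.041, 0.039, 0.040, 0.040, 0.040, 0.041, 0.030]
print(metric_alerts(daily_conversion))  # [16]
```

Total variation distance is one simple way to compare distributions, and a trailing z-score is one simple anomaly rule. Either could be swapped for a method better suited to a given metric.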

This list is not exhaustive, but I hope it helps you think about how to keep track of conditions outside the explicit scope of your experiments.

The Only Constant is Change

As Heraclitus said, the only constant is change. When we perform A/B tests, we need to bear in mind that the results assume present conditions that are subject to change. As Ferris Bueller warned us, “Life moves pretty fast. If you don’t stop and look around once in a while, you could miss it”.
