All Else Equal
In The Three-Body Problem, Liu Cixin describes how an alien species drives scientists to suicide by making it impossible for them to produce consistent experimental results. Some might find it difficult to relate to the scientists’ existential despair, but I found the premise compelling and chilling.
In this post, I do not tackle anything so sinister or abstract. Rather, I challenge a key assumption of A/B testing — namely, that all else is equal. I hope to inspire curiosity and reflection rather than existential despair.
A/B Testing: A Simple Example
A/B testing is the most popular method for online experimentation. It compares two versions of an application to determine which one performs better. Typically, one version is the “treatment”, the change we are considering, and the other is the “control”, which represents the current state of the application.
For example, consider a simple A/B test to determine whether increasing the page size for search results from 10 (control) to 20 (treatment) leads to an increased conversion rate. This is about as simple an A/B test as it gets.
Let us imagine that the test is successful, delivering a statistically significant increase in the conversion rate. What does this tell us?
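To make “statistically significant” concrete, here is a minimal sketch of a two-sided, two-proportion z-test, which is one common way to compare conversion rates between control and treatment. The post does not say which test was used, and the visitor and conversion counts in the usage example are hypothetical.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates (pooled variance).

    conv_a / n_a: conversions and visitors in control (A).
    conv_b / n_b: conversions and visitors in treatment (B).
    Returns (absolute lift, z statistic, two-sided p-value).
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Under the null hypothesis both groups share one conversion rate.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_b - p_a, z, p_value


# Hypothetical counts: 10 results per page (control) vs. 20 (treatment).
lift, z, p = two_proportion_z_test(500, 10_000, 580, 10_000)
print(f"lift={lift:.4f} z={z:.2f} p={p:.4f}")
```

Note that the test only tells us the difference was unlikely to arise by chance under the conditions that held while the experiment ran; it says nothing about whether those conditions will persist.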
The World is Not So Simple
The answer may seem obvious: doubling the page size increases the conversion rate. More precisely, this result holds only if all else is equal, since the change in page size might interact with other changes, such as a new page design. However, we have to be even more pedantic: the result holds only given the current state of the world.
Consider the factors of screen size and network latency, both of which are determined by the searchers’ devices and locations. Both of these factors interact with page size to affect the experience. Increasing the page size may increase conversion in one set of conditions but decrease it in others.
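A toy calculation shows how this can play out. Suppose, hypothetically, that larger pages help on desktop but hurt on mobile, where slower networks make a 20-result page sluggish. The two segments and their conversion rates below are made up for illustration and stand in for combinations of screen size and network latency. Even with fixed per-segment effects, the overall result depends on the traffic mix.

```python
# Hypothetical per-segment conversion rates: (control, treatment).
RATES = {
    "desktop": (0.050, 0.060),  # larger pages help here
    "mobile": (0.040, 0.035),   # larger pages hurt here
}


def overall_lift(mix):
    """Absolute lift in overall conversion rate for a given traffic mix.

    `mix` maps each segment to its share of traffic; shares should sum to 1.
    """
    control = sum(share * RATES[seg][0] for seg, share in mix.items())
    treatment = sum(share * RATES[seg][1] for seg, share in mix.items())
    return treatment - control


# Same per-segment effects, two different traffic mixes.
print(overall_lift({"desktop": 0.7, "mobile": 0.3}))
print(overall_lift({"desktop": 0.3, "mobile": 0.7}))
```

With a 70/30 desktop/mobile mix the treatment wins by about half a percentage point; with a 30/70 mix the same per-segment effects yield a small loss.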
In the physical world, we do not generally worry about the laws of nature being time-dependent. We treat the law of gravity and the speed of light as constants. However, the digital world changes far more rapidly than the physical one, as does user behavior. That makes it dangerous to assume that the conditions for an experiment hold indefinitely.
Do Not Despair!
If you rely on A/B testing as part of your day job, you might find this state of affairs disheartening. But please do not despair! We have it much better than the scientists in The Three-Body Problem. No aliens are out to get us!
Fortunately, there are things we can do to detect changes likely to invalidate our experiments over time. Here are a few:
This list is not exhaustive. Hopefully, it prompts you to keep an eye on conditions outside the explicit scope of your experiments.
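As one possible check along these lines, offered purely as an illustration, we can compare the mix of conditions (for example, device segments) during the experiment with the mix today, and flag the result for re-testing when the mix has moved too far. The segment names, counts, and the 0.10 threshold are hypothetical, and total variation distance is just one simple way to measure the shift.

```python
def segment_shares(counts):
    """Convert raw per-segment counts into shares of total traffic."""
    total = sum(counts.values())
    return {seg: n / total for seg, n in counts.items()}


def mix_shift(baseline_counts, current_counts):
    """Total variation distance between two segment mixes.

    0 means identical mixes; 1 means no overlap at all.
    """
    base = segment_shares(baseline_counts)
    curr = segment_shares(current_counts)
    segments = set(base) | set(curr)
    return 0.5 * sum(abs(base.get(s, 0.0) - curr.get(s, 0.0)) for s in segments)


# Hypothetical alert threshold; a sensible value depends on how sensitive
# the experiment's result is to the mix.
SHIFT_THRESHOLD = 0.10


def should_revisit(baseline_counts, current_counts, threshold=SHIFT_THRESHOLD):
    """Flag the experiment for re-testing if the traffic mix has drifted."""
    return mix_shift(baseline_counts, current_counts) > threshold


# Hypothetical traffic during the experiment vs. this month.
during_test = {"desktop": 7000, "mobile": 3000}
this_month = {"desktop": 5000, "mobile": 5000}
print(mix_shift(during_test, this_month), should_revisit(during_test, this_month))
```

A check like this only catches shifts in factors we thought to measure; it cannot detect changes in conditions we never tracked.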
The Only Constant is Change
As Heraclitus said, the only constant is change. When we perform A/B tests, we need to bear in mind that the results depend on present conditions, which are subject to change. As Ferris Bueller warned us, “Life moves pretty fast. If you don’t stop and look around once in a while, you could miss it.”