Here’s an interesting example of A/B testing leading you astray. Suppose you’re working on a language learning app, and one of the metrics that you’re trying to optimize is the number of lessons completed.
It seems straightforward: the number of lessons completed is a good proxy for the volume of learning that takes place.
And so you start A/B testing a lot of things, beginning with the simple stuff. You find that changing the color of the buttons helps. You find that allowing the user to replay the audio helps. You find that showing a statistics dashboard after the lesson helps. You find that making the questions easier helps…
But wait a minute. Wasn’t the entire point to optimize for learning? The “making the questions easier” experiment showed a clear improvement in the number of lessons completed, but does it really increase the total volume of learning? Users could be blasting through a bunch of easier lessons, inflating the metric, without actually learning any more.
If you’re addicted to A/B testing, you might go down the rabbit hole of making the lessons easier and easier, until there’s basically no learning value anymore. But hey, more lessons are completed, so of course there’s more learning!
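To make this concrete, here’s a toy calculation with made-up numbers (the variant names and values are purely hypothetical): an “easier lessons” variant can win the A/B test on lessons completed while delivering less total learning.

```python
# Hypothetical numbers: easier lessons can raise the "lessons completed"
# metric while lowering the total volume of learning.

variants = {
    # avg lessons completed per user, avg learning value per lesson
    "A (current difficulty)": {"lessons_per_user": 4.0, "learning_per_lesson": 1.0},
    "B (easier lessons)":     {"lessons_per_user": 6.0, "learning_per_lesson": 0.5},
}

for name, v in variants.items():
    total_learning = v["lessons_per_user"] * v["learning_per_lesson"]
    print(f"{name}: lessons completed = {v['lessons_per_user']:.1f}, "
          f"total learning = {total_learning:.1f}")

# Variant B "wins" on lessons completed (6.0 vs 4.0),
# yet delivers less total learning (3.0 vs 4.0).
```

The A/B test itself isn’t lying here; it correctly reports that variant B moves the metric. The problem is that the metric no longer measures what we care about.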
In these experiments, there’s some abstract variable we’re actually trying to improve, in this case the total volume of learning. Since we can’t measure it directly, we pick an observable, quantifiable proxy: the number of lessons completed. But the proxy only works if the change you’re A/B testing moves the metric through the thing you care about, and not through some other channel. Making the questions easier moves the metric without moving the learning, so the proxy breaks down, and it’s easy to end up optimizing the wrong thing.