How to Measure the Success of Your Overall A/B Testing Strategy
If your A/B tests work, make sure they’re working for the whole business. Without a strategy and a North Star metric, it’s possible to stray from the ideal product roadmap and optimize for the wrong things—like when Blockbuster became so reliant on late fees that it couldn’t pivot to streaming. Measure carefully, measure often, and always ask why.
Your overall, organization-wide A/B testing strategy isn’t so different from your tactical A/B testing strategy. With both, you validate iteratively and test to the funnel. Think of the overall strategy like a murmuration of starlings—millions of A/B tests form one giant A/B testing strategy.
[Image: A murmuration of starlings side by side with a set of small A/B tests building up to a strategy]
[Photo credit: Daniel Biber]
What is A/B Testing in Analytics?
A/B testing in analytics is about validating your assumptions. A cohort of new users may seem to prefer reading long articles over short ones. But is that always true? And is it true for all users? Before codifying your assumptions into rules that everyone on the team relies upon, use A/B tests to establish facts.
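One common way to check whether a difference between two variants is more than noise is a two-proportion z-test. The sketch below is a minimal illustration, not a full statistics pipeline: the function name and the visitor and conversion counts are hypothetical, and it assumes samples large enough for the normal approximation to hold.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference between two conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        # Every user (or no user) converted in both groups: no measurable difference.
        return 0.0, 1.0
    z = (p_b - p_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts: users who finished a short article (A) vs. a long one (B).
z, p = two_proportion_z_test(conv_a=480, n_a=4000, conv_b=560, n_b=4000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value suggests the gap is unlikely to be chance alone. It does not tell you whether the result holds for every cohort, so it is worth rerunning the check per segment before turning the assumption into a rule.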
How to Measure A/B Test Results For Your Entire Organization
You can measure the organization-wide impact of your A/B testing efforts by looking at quality, quantity, and downstream effects: Are you running good tests? Are you running enough of them? Can you see a net impact on key metrics for the business, like quarterly revenue?
At Airbnb, we are constantly iterating on the user experience and product features. This can include changes to the look and feel of the website or native apps, optimizations for our smart pricing and search ranking algorithms, or even targeting the right content and timing for our email campaigns.
– Jonathan Parks, Data Engineering Manager, Airbnb
Quality of experiments
Good experiments have definitive outcomes. It doesn’t matter if you proved or disproved the hypothesis, so long as the outcome was certain. You can lift conversions, sentiment, and sign-ups by ceasing to do the wrong things just as well as by beginning to do more of the right things.
If too many of a team’s or the company’s experiments are inconclusive, it might be a sign that they’re setting experiments up incorrectly. Perhaps they’re writing ambiguous hypotheses like ‘This change will make the app better’ rather than explicit ones like ‘This change will increase monthly sign-ups by five percent.’
If there are vastly more negative results (disproven hypotheses) than positive results across the organization, it may suggest that your team’s intuition isn’t yet refined enough. Companies don’t arrive at product perfection through endless blind tests—they get there, like a scientist, with highly educated guesses. Over time, you should see positive results increase. For reference, at Google and Bing, about 10-20% of tests have positive results.
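To keep an eye on these ratios, you could tally the outcomes of concluded tests. The following is a rough sketch that assumes each test is logged with one of three labels; the labels, the sample log, and the function name are made up for illustration.

```python
from collections import Counter

# Hypothetical experiment log: one outcome label per concluded test.
outcomes = ["positive", "negative", "inconclusive", "negative", "positive",
            "negative", "inconclusive", "negative", "negative", "negative"]

def outcome_rates(outcomes):
    """Return the share of tests that were positive, negative, and inconclusive."""
    if not outcomes:
        raise ValueError("need at least one concluded test")
    counts = Counter(outcomes)
    total = len(outcomes)
    return {label: counts[label] / total
            for label in ("positive", "negative", "inconclusive")}

rates = outcome_rates(outcomes)
# A test is conclusive if it clearly proved or disproved its hypothesis.
conclusive_rate = rates["positive"] + rates["negative"]
print(f"conclusive: {conclusive_rate:.0%}, positive: {rates['positive']:.0%}")
```

Tracking these two numbers per team and per quarter shows whether experiment setup (the conclusive rate) and intuition (the positive rate) are improving over time.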
The observer’s choice of what he shall look for has an inescapable consequence for what he shall find.
– John Archibald Wheeler
Quantity of experiments
Running too few experiments can be a bad thing, but so can running too many. If leadership mandates top-down product recommendations, the tail is wagging the dog and you probably need to test to uncover and document users’ needs and preferences. But if the team is testing everything, it can grind the entire organization to a halt.
For reference, Airbnb tests a lot: In 2018, it was running 500 experiments and using 2,500 distinct metrics across its platforms at any given time. But it has a sizeable team.
One danger with learning to A/B test is that it can lead to over-reliance, and nobody will want to make decisions without testing. Don’t get lazy. Tests are costly. Intuition and judgement point you to the target, A/B tests simply fire the arrow.
– Josh Decker, UX Researcher
Speed to change
This measures how long it takes the team to implement something they learned from a test. For example, if the test results showed that a “Skip the intro” button on a streaming video site makes users more likely to renew, how long did it take the company to roll it out? Generally, the shorter the time to change, the better: Teams that implement their lessons quickly can run more tests. Their product evolves and improves at a more competitive pace.
Turning big ships takes time. Generally, the larger the organization, the slower its speed to change. If you’re a big organization that implements changes quickly, you have a serious competitive advantage.
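Speed to change is easy to quantify if you record two dates per test: when it concluded and when the resulting change shipped. Here is a minimal sketch, with hypothetical test names and dates, that reports the lag per test and the median across tests.

```python
from datetime import date
from statistics import median

# Hypothetical records: when each test concluded and when its change shipped.
changes = [
    {"test": "skip-intro-button", "concluded": date(2024, 3, 4), "shipped": date(2024, 3, 18)},
    {"test": "shorter-signup-form", "concluded": date(2024, 4, 1), "shipped": date(2024, 4, 8)},
    {"test": "new-pricing-page", "concluded": date(2024, 5, 6), "shipped": date(2024, 6, 17)},
]

def days_to_change(record):
    """Days between a test concluding and its lesson reaching all users."""
    return (record["shipped"] - record["concluded"]).days

lags = [days_to_change(r) for r in changes]
median_lag = median(lags)

for record, lag in zip(changes, lags):
    print(f"{record['test']}: {lag} days")
print(f"median speed to change: {median_lag} days")
```

The median is used here rather than the mean so that one slow rollout doesn’t dominate the picture; that is a design choice, not a rule.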
We know more about our customers, statistically, than anyone else in our market. It also means that we can run more experiments with statistical significance faster than businesses with less user data. It’s one of our most important competitive advantages.
– Wyatt Jenkins, VP of Product, Shutterstock
Downstream effects
The overall revenue impact of your A/B testing is what’s known as a lagging indicator: You often can’t measure it until after the fact. For instance, when Microsoft’s Bing made a feature change that resulted in $100 million in additional revenue, that wasn’t clear until the following quarter. Neither was the loss of 100,000 monthly visitors they experienced after making an SEO change. Lagging metrics are, however, the truest measures: They’re the only sure way to know that your A/B tests are having a positive impact.
Hard A/B test result metrics you can measure:
- Revenue: Does testing increase purchases? Sign-ups? Upsell or cross-sell? Look at revenue, revenue per user, and average order value.
- Support costs: Does testing decrease complaints or questions about how to use the app or site? Look at ticket response times, handling time, resolution time, and sentiment.
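The revenue metrics above come down to simple ratios. As a rough sketch, assuming you can total users, orders, and revenue per variant (the figures and field names below are invented), you might compare them like this:

```python
# Hypothetical per-variant totals from a single test.
variants = {
    "control":   {"users": 5000, "orders": 400, "revenue": 18000.0},
    "treatment": {"users": 5000, "orders": 450, "revenue": 21150.0},
}

def revenue_metrics(v):
    """Compute the hard revenue metrics for one variant."""
    return {
        "revenue_per_user": v["revenue"] / v["users"],
        "average_order_value": v["revenue"] / v["orders"],
        "conversion_rate": v["orders"] / v["users"],
    }

control = revenue_metrics(variants["control"])
treatment = revenue_metrics(variants["treatment"])

# Relative lift in revenue per user, treatment vs. control.
lift = treatment["revenue_per_user"] / control["revenue_per_user"] - 1
print(f"revenue per user lift: {lift:.1%}")
```

Looking at all three ratios together helps separate "more people bought" from "people spent more per order," which call for different follow-up tests.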
Soft A/B test result metrics you can measure:
- Product team feedback: Does the product team have more user insights and data than before? Are their product launches growing more successful?
- Satisfaction: How does testing affect your users’ loyalty and satisfaction? Look at NPS, CSAT, referrals, and sharing.
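Of the satisfaction measures, NPS has the most standard formula: the percentage of promoters (scores of 9 or 10) minus the percentage of detractors (scores of 0 to 6). A minimal sketch, using made-up survey responses for two variants:

```python
def net_promoter_score(scores):
    """NPS = % promoters (9-10) minus % detractors (0-6), from -100 to 100."""
    if not scores:
        raise ValueError("need at least one survey response")
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100 * (promoters - detractors) / len(scores)

# Hypothetical 0-10 survey responses collected from each variant.
control_scores = [10, 9, 8, 7, 6, 9, 3, 10, 8, 5]
treatment_scores = [10, 9, 9, 8, 7, 9, 6, 10, 8, 9]

nps_control = net_promoter_score(control_scores)
nps_treatment = net_promoter_score(treatment_scores)
print(f"control NPS: {nps_control:.0f}, treatment NPS: {nps_treatment:.0f}")
```

With samples this small the gap means little; in practice you would want far more responses per variant before reading anything into it.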
As organizations grow, so do their metrics. Give yours a clear hierarchy: A North Star metric (a bit of a misnomer, since it can be more than one metric) as the primary focus, followed by core metrics, target metrics, and certified metrics, all of which roll up to the North Star.
Make an A/B testing strategy core to your culture
Once each business unit understands how A/B testing can benefit them, it becomes second nature. But that benefit isn’t always obvious. People are busy, they’re accustomed to how they already do things, and there’s a switching cost. Learning to A/B test takes mental effort and competes with their other priorities.
To ensure that an A/B testing culture takes hold within your organization, make the benefits clear to all:
Marketing benefits of A/B testing:
- Gather valuable messaging feedback
- Gather detailed user data
- Increase conversions and engagement
- Increase user trust
- Limit negative impact of projects: Avoid “featuritis”, where the product accumulates so many features that it collapses and needs a full redesign
- Learn why successful marketing campaigns worked
Product benefits of A/B testing:
- Ship better designs faster with less guessing
- Quantify impact, measure investment of design and research resources
- Limit effort on bad ideas
- Validate prototypes
Engineering benefits of A/B testing:
- Validate before building
- Shorter development cycles
- Fewer redesigns
- Testing creates data and data settles debates
- Work on interesting, new projects
Business / Analyst / Finance team benefits of A/B testing:
- Reduce fraud
- Increase profit margin
- Identify demand for new features and products