PM Spotlight: How a Strong Culture of Experimentation Helps Grubhub Iterate and Excel
Taplytics caught up with Clara Li, a product leader on the Diner team at Grubhub, who is working to support local businesses and help Grubhub users find their next delicious meal. Clara is an SEO expert who is now focused on search optimization to help Grubhub diners receive the most tailored results.
We chatted about her thoughts on the importance of personalization and having a culture of experimentation to create experiences that drive both user engagement and retention. Enjoy!
How have the last few months impacted your strategy and how you’re approaching the situation to continue optimizing the diner experience?
Like most of us, COVID put a lot of our previously planned initiatives on hold, but it also allowed us to take on initiatives we really wanted to prioritize – like Supper for Support, in which we’re proud to have invested a lot of effort. It wasn’t a problem to re-prioritize and allocate resources toward these more meaningful initiatives. We’re focused on supporting local businesses and making sure these restaurants receive the necessary lift and exposure during this time. Alongside these initiatives, I’m focused on driving search optimization to better customize the diner’s search experience and make sure they have tailored search results based on factors like dining preferences.
The last few months have definitely been hectic with COVID, which forced us to try to get everything out the door in a very timely manner. That said, we have a very strong culture of experimentation and we test everything.
Even though we needed to make the decision to ship as fast as possible, we still used Taplytics to give us control over each feature’s distribution. For example, we could still roll a feature back to, say, 10% and measure the impact of the release retroactively. A lot of things were happening all at once, but for everything we’re building to support these local restaurants, we want to ensure that it is sustainable. We want to make these features reusable for future feature development so we can continue building on this foundation.
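Percentage-based rollout control like Clara describes is typically implemented by hashing each user into a stable bucket, so dialing the percentage up or down moves a consistent slice of users in or out of the feature. As a purely illustrative sketch of the underlying idea – the function, feature name, and user IDs below are hypothetical, not Grubhub’s or Taplytics’ actual API – a minimal version might look like:

```python
import hashlib

def in_rollout(user_id: str, feature: str, percent: float) -> bool:
    """Deterministically bucket a user into a feature rollout.

    Hashing the (feature, user) pair gives each user a stable bucket
    in [0, 100), so changing `percent` moves a consistent slice of
    users in or out of the feature.
    """
    digest = hashlib.sha256(f"{feature}:{user_id}".encode()).hexdigest()
    bucket = (int(digest, 16) % 10000) / 100.0  # 0.00 .. 99.99
    return bucket < percent

# Ship broadly, then dial back to 10% to hold out a measurement group.
fully_on = in_rollout("diner-42", "supper-for-support", 100.0)  # always True
held_back = in_rollout("diner-42", "supper-for-support", 10.0)
```

Because the bucketing is deterministic, the same diners stay in the same group across sessions, which is what makes a retroactive impact measurement possible.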
How has having such a strong culture of experimentation at Grubhub influenced the way you work?
I think it’s easier to work this way. Especially when it comes to features for our diners, everyone has a different opinion on what they should look like. Product might have a different opinion than Design, which might have a different opinion than Finance, so experimentation helps us keep an open mind and let the data speak for itself. We are always open to learning from A/B tests and will never implement something purely based on one person’s opinion.
We are able to adapt to what needs to happen for the short term but are committed to measuring the impact of those decisions to plan for the long term.
In this climate where speed to market is so critical, what’s your approach to prioritizing A/B tests while also having to ship as fast as possible?
The first thing I like to think about is: what’s the goal for developing and shipping this feature? We always want to focus on the long-term benefit to diners. We don’t want to ship something that will give us quick conversions but hurt loyalty in the long term. This has always been our primary goal, but it was important for us to be able to shift our perspective and adapt our acceptance criteria due to COVID.
For each initiative we rolled out, we broke it down into individual phases and looked at what was immediately achievable in each phase. For example, in the first phase of Supper for Support, we wanted to promote local restaurants to the top of the funnel as fast as possible, so we shipped this using a combination of our existing features. As we shipped the first phase, we were able to start planning optimizations and looking into the best ways to execute.
We then scoped out new features and frameworks for A/B testing these features. Later on, this allowed us to test these new features and placements against what we shipped in the first phase in addition to testing against what we had even prior to COVID. This enabled us to gain a better understanding of the opportunity cost associated with shipping phase one without the optimizations that were added in the later phases. To summarize, we have both short term and long term factors we take into account. We are able to adapt to what needs to happen for the short term but are committed to measuring the impact of those decisions to plan for the long term.
You work with so many teams and stakeholders, what are some best practices to make sure that each test and each team is coordinated?
When I run experiments, I make sure to align on overarching success metrics like conversions and AOV. I work very closely with Finance to look at the short-term and long-term impact when evaluating hypotheses. For example, we started looking at call center metrics associated with rolling out a new feature, because although it may increase conversions, those conversions may come at the expense of increased volume to the call centers.
We want to make sure we’re taking into account all of the relevant revenue and cost metrics for each feature. We also work with the Logistics team in running experiments to measure the impact on the driver network based on the search results that are presented and ultimately chosen by the diner. Depending on which restaurants are presented, the drivers would potentially need to drive further to complete their deliveries. It’s a very collaborative process across teams to measure the overall impact to the business.
It’s so important to make sure that each stakeholder is always kept in the loop about the performance of a given test and to define the success metrics together, especially if the test is cross functional.
Our workflow to launching an experiment involves regular communication across different teams to review hypotheses and propose experiments as well as the corresponding success metrics. It’s so important to make sure that each stakeholder is always kept in the loop about the performance of a given test and to define the success metrics together especially if the test is cross functional. For example, recently we ran a test to present different types of messaging to diners in order to promote pick-up options during peak ordering hours to ease the load on the drivers in the delivery network. In this case, we need to measure diner experience (owned by the Diner team) as well as the impact of this new messaging on the driver network (owned by the Driver team).
We use a central document which includes the scope of all the proposed and live experiments. From here, I’ll set up the test in Taplytics. On a weekly basis, the product managers and Finance team meet to review the tests which are due for a recommendation. We normally let a test run for about two weeks to get a good reading on LTV before we make a decision on its performance and learnings we can take away.
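The two-week review Clara describes typically ends with a significance check on the agreed success metrics. As a hedged sketch of the kind of analysis involved – the numbers are made up, and Grubhub’s actual evaluation runs through Taplytics and its own tooling – a two-proportion z-test on conversion counts could look like:

```python
from math import sqrt, erf

def conversion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Two-proportion z-test comparing conversion rates of two variants.

    Returns (lift, p_value), where lift is the absolute difference in
    conversion rate and p_value is the two-sided significance level.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)           # pooled rate under H0
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))  # two-sided
    return p_b - p_a, p_value

# Hypothetical two-week totals: 4.8% vs 5.4% conversion.
lift, p = conversion_z_test(conv_a=480, n_a=10000, conv_b=540, n_b=10000)
print(f"lift={lift:.4f}, p={p:.3f}")
```

In practice the recommendation would also weigh the secondary metrics each stakeholder owns (AOV, call center volume, driver-network impact), not a single p-value.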
It’s fundamentally very difficult to categorize people into cohorts and personas because naturally, preferences vary so much. A better way to approach this challenge is to look at the diner as an individual.
With personalization being almost an expectation in a user’s digital experience, what is your approach to personalization?
I can’t emphasize enough how important personalization is. Open any app – Netflix, Amazon, etc. – every app has a customized interface for its users depending on what they like, so we’re trying to learn as much from our users as possible.
Everyone is different and there are just so many different types of food available to cater to these unique preferences. It’s fundamentally very difficult to categorize people into cohorts and personas because naturally, preferences vary so much. A better way to approach this challenge is to look at the diner as an individual and present them with recommendations based on their specific preferences.
The first step is to allow for some level of randomization to learn what the diner likes. Combining both implicit and explicit signals, the goal is to strike a balance between these cues and really listen to our users.
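Mixing randomization with learned preferences is, in spirit, an explore/exploit problem. As a purely illustrative epsilon-greedy sketch – the cuisine names, preference scores, and function are hypothetical, not Grubhub’s recommendation system – it might look like:

```python
import random

def recommend(diner_prefs: dict, all_cuisines: list, epsilon: float = 0.1,
              k: int = 5, rng=random) -> list:
    """Epsilon-greedy recommendations: mostly exploit learned preferences,
    but explore at random a fraction of the time to keep learning."""
    # Rank cuisines by the diner's learned preference score (default 0).
    ranked = sorted(all_cuisines, key=lambda c: diner_prefs.get(c, 0.0),
                    reverse=True)
    recs = []
    for _ in range(k):
        if rng.random() < epsilon:
            recs.append(rng.choice(all_cuisines))  # explore: random pick
        else:
            # Exploit: highest-scoring cuisine not already shown.
            recs.append(next(c for c in ranked if c not in recs))
    return recs

prefs = {"thai": 0.9, "pizza": 0.7, "sushi": 0.5}
cuisines = ["pizza", "thai", "sushi", "bbq", "vegan"]
print(recommend(prefs, cuisines, epsilon=0.1, k=3))
```

With epsilon at 0 this degenerates to pure exploitation of known preferences; a small epsilon occasionally surfaces something new, which is one way to implement the randomization Clara mentions.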
Thanks for the insight, Clara!