The product messaging team at Netflix has a variety of high-level goals that guide its strategy. These include objectives like:
- Keeping our customers informed and delighted about the best content available to them on the service.
- Building awareness and buzz around our new original movies and shows.
- Making the case for subscription to potential customers.
An essential component of being a data-driven team is the ability to measure the connection between choices we make in messaging (how many messages customers get, what kinds of messages they get, when they get them, etc.) and the downstream behaviors we aim to encourage (signup, retention, activity on the service).
Traditional Messaging Effectiveness Measurement
Historically we’ve relied on two broad measurement strategies:
- Frequent A/B tests to measure the causal impact that specific tactics have on downstream behavior, at a specific point in time.
- Ongoing measurement of message interaction signals (email opens, push notification clicks, etc.) as a proxy for message performance.
Can we measure better?
There are no perfect measurement systems. While our A/B tests measure precise causal impacts on the exact behaviors we are interested in, they typically compare a small subset of strategy variations for a short period of time. As such, they are less efficient at tracking broad messaging performance over time. Message interaction tracking mitigates this by measuring all messages on an ongoing basis, but it has its own problems: metrics like email open rates are not what we fundamentally care about, so we have to make assumptions about how well these metrics serve as proxies for the signals we do care about (like hours of streaming).
Introducing Message Level Holdback
To address these gaps, we’ve introduced a third messaging measurement strategy at Netflix: ongoing message-level holdback. The strategy itself is quite simple: every non-essential message we send has a small (2%) random chance of being withheld from delivery. While all of our downstream systems (e.g., personalization models that decide the optimal cadence of messages for each customer) behave as if these messages were sent, customers never receive them, and we can easily identify which messages were dropped. Comparing customers who did and did not receive a specific message (for example, the announcement that Velvet Buzzsaw is available to stream) allows us to measure how the decision to send the message impacts downstream behavior (viewing of Velvet Buzzsaw).
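As a rough illustration of the mechanism described above, here is a minimal Python sketch of a per-message holdback decision. It is not the production system: the function names, the `salt` value, and the hash-based bucketing are all assumptions made for the example. The text only says that the choice is random with a 2% chance; hashing the message ID is one common way to get a decision that behaves randomly but can be recomputed later when identifying which messages were dropped.

```python
import hashlib

HOLDBACK_RATE = 0.02  # 2% of non-essential messages are withheld (figure from the text)


def is_held_back(message_id: str, salt: str = "holdback-v1",
                 rate: float = HOLDBACK_RATE) -> bool:
    """Map a message ID to a reproducible, pseudo-random holdback decision.

    The salt and the hashing scheme are hypothetical; any unbiased random
    draw with probability `rate` would serve the same purpose.
    """
    digest = hashlib.sha256(f"{salt}:{message_id}".encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer bucket in [0, 2**64).
    bucket = int.from_bytes(digest[:8], "big")
    return bucket < rate * 2**64


def process_message(message_id: str, essential: bool, send) -> str:
    """Deliver the message unless it falls into the holdback group.

    Essential messages are never withheld. The returned status is what a
    logging layer could record so that held-back messages are easy to
    identify later, while downstream systems carry on as if it was sent.
    """
    if not essential and is_held_back(message_id):
        return "held_back"
    send(message_id)
    return "sent"


if __name__ == "__main__":
    outbox = []
    statuses = [process_message(f"msg-{i}", essential=False, send=outbox.append)
                for i in range(10_000)]
    print("held back:", statuses.count("held_back"), "of", len(statuses))
```

Because the decision depends only on the message ID and the salt, the same message always lands in the same group, which keeps the treatment and holdback populations stable across retries.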
The message-level holdback combines some of the benefits of A/B testing (measurement of the incremental impact of messaging on behaviors we most care about) and message engagement tracking (always-on measurement across all messages). It has revealed signals that we previously only speculated about, or hadn’t even considered:
- An always-on measurement system saves us from having to re-run A/B tests to verify that our existing messages still perform well. Our customer base and content catalog evolve over time, as does the broader messaging ecosystem. With the message-level holdback we’re able to track how these shifts impact message performance, and quickly respond when a message becomes less efficient.
- Message engagement metrics are often poor proxies for more important incremental behaviors. For example, two of our content recommendation messages (call them A and B) have similar open and click rates, yet message B is far more effective at driving incremental viewing.
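To make the idea of incremental impact concrete, here is a small Python sketch of how lift could be estimated from a holdback group. Everything in it is hypothetical: the `Group` type, the `incremental_lift` helper, and all of the counts are made up for illustration and are not Netflix data. The interval uses a standard normal approximation for the difference of two proportions, which is an assumption of this sketch rather than a method stated in the text.

```python
import math
from dataclasses import dataclass


@dataclass
class Group:
    customers: int  # customers in this group
    viewers: int    # of those, how many went on to watch the promoted title

    @property
    def rate(self) -> float:
        return self.viewers / self.customers


def incremental_lift(delivered: Group, held_back: Group, z: float = 1.96):
    """Return (lift, (low, high)): the difference in viewing rates between
    customers who received the message and those held back, with an
    approximate 95% interval based on the normal approximation."""
    lift = delivered.rate - held_back.rate
    se = math.sqrt(
        delivered.rate * (1 - delivered.rate) / delivered.customers
        + held_back.rate * (1 - held_back.rate) / held_back.customers
    )
    return lift, (lift - z * se, lift + z * se)


# Made-up counts: a 2% holdback out of one million recipients per message.
held_back = Group(customers=20_000, viewers=1_960)
message_a = Group(customers=980_000, viewers=98_000)
message_b = Group(customers=980_000, viewers=117_600)

lift_a, ci_a = incremental_lift(message_a, held_back)
lift_b, ci_b = incremental_lift(message_b, held_back)

if __name__ == "__main__":
    for name, lift, ci in (("A", lift_a, ci_a), ("B", lift_b, ci_b)):
        print(f"message {name}: lift={lift:.4f}, "
              f"interval=({ci[0]:.4f}, {ci[1]:.4f})")
```

With these invented numbers, the interval for message A includes zero while the one for message B does not, which is the kind of difference that open and click rates alone would not reveal. Note also that the interval width is driven mostly by the small holdback group, so a 2% holdback needs a reasonably large audience before small lifts become detectable.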
More important than any individual insight, these data have allowed us to shift away from open/click tracking and towards the core objectives that many teams across the business are working towards. We can more easily compare the impact of messaging with that of other tactics (say, paid advertisements on Facebook or personalization of titles within the Netflix app). Similarly, other teams can more easily understand how messaging fits into the product, since we are speaking the same “incrementality language” and are aligned on quantitative objectives.
Our message-level holdback system provides us with an incrementality-focused, always-on system to measure the broad impact of messaging at Netflix. No single measurement system is perfect; however, a portfolio of A/B testing, engagement tracking, and always-on message holdback gives better insight into message performance than any individual approach.
By: Chris Beaumont
Opinions expressed here are solely the author’s and do not represent the views of any employer or organization.