Measuring marketing incremental impacts beyond last click attribution
July 14, 2022
Co-authors: Maggie Zhang, Joyce Chen, and Ming Wu
What’s my ROI?
In every company, there’s a fundamental need to understand the impact of marketing campaigns. You want to measure how many incremental conversions different channels and touchpoints are driving. The best practice of A/B testing at the individual level is not applicable to traditional channels such as TV ads, radio, or billboards. Even in digital marketing channels, new regulations and growing public awareness of data privacy have made A/B testing on third-party platforms, which requires transferring user-level data, harder than ever. As a compromise, companies often rely on the last-click attribution model, which gives 100% of the credit for a conversion to the last marketing touchpoint or campaign in a user’s journey. Not only does this ignore everything (e.g., engagement, other media exposure) that happened before the final touchpoint, it also tends to over-credit the last touchpoint (usually a paid media exposure) for conversions that would have happened organically without it.
To accurately quantify the true incremental impact of marketing campaigns, we adopted a Bayesian Structural Time Series (BSTS) model to measure the causal effect of an intervention.
The basic idea is simple and intuitive: we design an experiment where the experimental units are targetable geographical areas. The planned marketing intervention is applied in the selected areas (the test areas), and the remaining areas are used as control. The BSTS model is built to predict how the test areas would have performed in an alternative scenario with no marketing intervention. The delta between the observed and the predicted performance of the test areas measures the true impact of the marketing intervention.
What is BSTS?
A BSTS model is a statistical technique designed for time series data; it is used for time series forecasting and for inferring causal impact. You can refer to this paper and Google’s open source R CausalImpact package for more details.
Let’s use geo-based marketing campaign measurement as an example. At a high level, three sources of information are needed to construct an adequate counterfactual for the test markets’ performance. The first is the time series performance of the test markets prior to the marketing campaign. The second is the time series performance of the control markets that are predictive of the test markets’ performance before the campaign (many considerations go into picking the most relevant subset to use as contemporaneous controls). The third is prior knowledge about the model parameters, for example from previous studies.
BSTS causal impact analysis steps
To infer the causal impact of a marketing campaign with the BSTS approach, the following steps need to take place.
A true north metric will be used to select comparable markets. Whether it’s the traffic, job views, or job applications, we have to be very clear about what we want to drive and what we want to measure.
One key assumption of a geo test is that control markets’ time series data are predictive of test markets’ time series data. We can form test and control groups by leveraging a sampling/matching algorithm to select comparable groups of markets based on historical time series data. There are two algorithms to form the comparable groups depending on the actual business needs:
MarketMatching is used to find matching markets when marketers already have a list of markets they want to run campaigns with. For example, a billboard campaign is set to launch in New York and the matching algorithm might find that San Francisco and Chicago are good markets to use as control.
The stratified sampling approach pre-divides the list of markets into homogeneous groups called strata based on characteristics they share (e.g., location, revenue share), then draws randomly from each stratum to form the test sample. It guards against an "unrepresentative" sample (e.g., all-coastal states in a nationwide Google search campaign) by ensuring each subgroup of the population is adequately represented. This allows marketers to properly infer the performance of a large-scale, non-local campaign.
Theoretically, a geo-split can be implemented at various levels (nation, state, county). In practice, a good geo-split level should fulfill these requirements:
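As a rough illustration of the two ideas above, here is a minimal Python sketch. It is not the MarketMatching package's algorithm and not the implementation used in the studies described here: the matching part simply ranks candidate control markets by Pearson correlation with the test market's history, and the sampling part randomly assigns markets to test or control within each stratum. The function names (`pearson`, `rank_control_candidates`, `stratified_assignment`) and all numbers are hypothetical toy values.

```python
import random
from collections import defaultdict


def pearson(x, y):
    """Pearson correlation between two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def rank_control_candidates(test_series, candidates):
    """Order candidate markets from most to least correlated with the test market."""
    return sorted(
        candidates,
        key=lambda name: pearson(test_series, candidates[name]),
        reverse=True,
    )


def stratified_assignment(markets, stratum_of, test_fraction=0.5, seed=0):
    """Randomly split markets into test/control separately within each stratum."""
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for market in markets:
        by_stratum[stratum_of[market]].append(market)
    test, control = [], []
    for stratum in sorted(by_stratum):
        members = sorted(by_stratum[stratum])
        rng.shuffle(members)
        # Keep at least one test market per stratum so no stratum is left out.
        n_test = max(1, round(len(members) * test_fraction))
        test.extend(members[:n_test])
        control.extend(members[n_test:])
    return sorted(test), sorted(control)


# Toy weekly metric values (made up for illustration only).
new_york = [100, 110, 105, 120, 130, 125]
candidates = {
    "San Francisco": [50, 56, 52, 61, 66, 62],
    "Chicago": [80, 86, 84, 95, 101, 99],
    "Austin": [40, 38, 45, 37, 41, 36],
}
print(rank_control_candidates(new_york, candidates))

# Toy strata (made up): four coastal and four inland states.
strata = {
    "CA": "coastal", "NY": "coastal", "FL": "coastal", "WA": "coastal",
    "CO": "inland", "UT": "inland", "KS": "inland", "NE": "inland",
}
print(stratified_assignment(list(strata), strata, test_fraction=0.5, seed=1))
```

In practice the similarity measure, the stratification variables, and the test fraction would all be chosen to fit the campaign and the true north metric.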
Targetable: it is possible to fully control the marketing activities at this level on the desired ad platforms. Geo-targeting capability and restrictions vary across platforms. It is important to understand them before planning your test.
Measurable: it is possible to observe the ad spend amount and accurately measure the response metric at this level.
Economical: For example, it is not a good idea to run a job promotion campaign with a state-level split, because some people reside in New Jersey while working in New York City. Instead, the campaign should be run in the entire New York metropolitan area, which covers key areas in both New Jersey and New York and therefore reduces the risk of cross-group contamination.
After decisions have been made on the geo group assignments and true north metrics, we can construct two time series (test and control) using historical data aggregated at the assigned geo group level. We recommend finding a period without major regional marketing activities. The period required for training the model depends on the availability of the data and the variance of the time series. If the training period is too short, there will not be enough data to learn the relationship between the test and control time series, which leads to high bias. If the training period is too long, the relationship may change over time and no longer apply. In practice, we find one to three months to be a good duration.
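A small Python sketch of this aggregation step is shown below, assuming the historical data arrives as (date, market, value) rows with ISO-formatted dates. The function name `aggregate_by_group`, the row layout, and the sample values are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def aggregate_by_group(rows, group_of):
    """Sum a daily metric over all markets in each geo group.

    rows: iterable of (date, market, value); dates are ISO strings so they sort correctly.
    group_of: maps market name to "test" or "control"; unmapped markets are ignored.
    Returns (sorted dates, {group: values aligned to those dates}).
    """
    totals = defaultdict(lambda: defaultdict(float))
    for date, market, value in rows:
        group = group_of.get(market)
        if group is None:
            continue  # market is not part of the experiment
        totals[group][date] += value
    dates = sorted({d for by_date in totals.values() for d in by_date})
    # Fill dates with no rows for a group with 0.0 so both series stay aligned.
    series = {g: [totals[g].get(d, 0.0) for d in dates] for g in totals}
    return dates, series


# Toy rows (made up for illustration only).
rows = [
    ("2022-01-01", "New York", 10.0),
    ("2022-01-01", "Newark", 5.0),
    ("2022-01-01", "Chicago", 7.0),
    ("2022-01-02", "New York", 12.0),
    ("2022-01-02", "Chicago", 8.0),
    ("2022-01-02", "Boise", 1.0),  # not assigned to any group, so it is skipped
]
group_of = {"New York": "test", "Newark": "test", "Chicago": "control"}
dates, series = aggregate_by_group(rows, group_of)
print(dates, series)
```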
The next step is to build a model that can accurately predict test time series based on the control time series.
A good time series model needs to be flexible and transparent, and should take into account seasonality, the macroeconomic trend, and the business drivers so that it can quantify the impact of each. BSTS lets you explicitly quantify the posterior uncertainty of each individual component (regression, seasonality, trend). You can also control the variance of each component and impose prior beliefs within its Bayesian framework. Mean Absolute Percentage Error (MAPE) is used to evaluate the goodness of fit of the model during the training period. A good MAPE score (usually <5%) is a strong signal that the selected control group can be used to accurately predict the counterfactual of the test market.
Prior to the campaign launch, we establish an A/A testing process to validate the model performance and rule out pre-existing bias that could undermine the causal inference. During the A/A test period, no marketing intervention is applied to either treatment or control. We expect the model to report no statistically significant difference between the predicted and the observed time series. If the A/A test fails, further deep dives and a redesign of the test are required.
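To make the fit-and-evaluate step concrete, here is a deliberately simplified Python sketch. It swaps the BSTS model for an ordinary least-squares line with a single control series (no trend, seasonality, or priors), so it is a stand-in for illustration, not the model described above. The helper names (`fit_linear`, `predict`, `mape`) and all numbers are hypothetical.

```python
def fit_linear(control, test):
    """Least-squares fit of test = intercept + slope * control."""
    n = len(control)
    mean_c = sum(control) / n
    mean_t = sum(test) / n
    sxx = sum((c - mean_c) ** 2 for c in control)
    sxy = sum((c - mean_c) * (t - mean_t) for c, t in zip(control, test))
    slope = sxy / sxx
    return mean_t - slope * mean_c, slope


def predict(params, control):
    """Predicted test series given fitted (intercept, slope) and a control series."""
    intercept, slope = params
    return [intercept + slope * c for c in control]


def mape(actual, predicted):
    """Mean absolute percentage error, expressed in percent."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("series must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Toy pre-campaign (training) data, made up for illustration only.
pre_control = [100, 110, 120, 130, 140]
pre_test = [52, 58, 61, 68, 71]

params = fit_linear(pre_control, pre_test)
train_mape = mape(pre_test, predict(params, pre_control))
print(f"training MAPE: {train_mape:.2f}%")

# Toy campaign-period data: the counterfactual is what the fitted model predicts
# from the control series, and the delta is observed minus counterfactual.
post_control = [150, 160]
post_test_observed = [85, 90]
counterfactual = predict(params, post_control)
delta = [obs - cf for obs, cf in zip(post_test_observed, counterfactual)]
print(delta)
```

A real analysis would fit a BSTS model (for example with the R bsts or CausalImpact packages) and evaluate MAPE over a much longer training window than five points.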
Power analysis and budget scenarios
Similar to an A/B test, a geo experiment calls for a power analysis at the design stage. If a given set of markets is used as control and treatment in the experiment and the true north metric is sessions, what is the probability of detecting an effect if a session lift of a particular magnitude is truly present? Unlike A/B tests, there is no theoretical approach to conducting this power analysis. Instead, we estimate the minimum detectable effect (MDE) and the required test duration through simulation, where a synthetic lift is added to the treatment group to approximate the effect of a marketing campaign. We can then work with marketing partners to create budget scenarios at different MDE levels to ensure incrementality can be detected with a reasonable chance and with a reasonable budget and pacing plan. A budget scenario usually takes into account several factors, including media cost, MDE, targeting plan (audience size and launch areas), and campaign duration.
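The simulation idea can be sketched in Python as follows. This is a simplified illustration under strong assumptions, not the procedure used in the studies described here: the BSTS model is replaced by a one-variable least-squares fit, residuals are treated as independent, and a lift counts as "detected" when a rough one-sided z-score on the cumulative delta exceeds 1.645. The history is randomly generated toy data, and all function names are hypothetical.

```python
import random
import statistics


def fit_linear(control, test):
    """Least-squares fit of test = intercept + slope * control."""
    n = len(control)
    mean_c, mean_t = sum(control) / n, sum(test) / n
    sxx = sum((c - mean_c) ** 2 for c in control)
    sxy = sum((c - mean_c) * (t - mean_t) for c, t in zip(control, test))
    slope = sxy / sxx
    return mean_t - slope * mean_c, slope


def detects_lift(control, test, start, train_len, test_len, lift, z_crit=1.645):
    """Train on one window, inject a synthetic lift into the next, and test for it."""
    train_c = control[start:start + train_len]
    train_t = test[start:start + train_len]
    intercept, slope = fit_linear(train_c, train_t)
    residuals = [t - (intercept + slope * c) for c, t in zip(train_c, train_t)]
    sigma = statistics.stdev(residuals)
    w0 = start + train_len
    cum_delta = 0.0
    for c, t in zip(control[w0:w0 + test_len], test[w0:w0 + test_len]):
        # Multiply the placebo-period observation by (1 + lift) to mimic a campaign.
        cum_delta += t * (1 + lift) - (intercept + slope * c)
    # Rough z-score: ignores autocorrelation and parameter uncertainty.
    return cum_delta / (sigma * test_len ** 0.5) > z_crit


def estimate_power(control, test, train_len, test_len, lift):
    """Share of rolling placebo windows in which the injected lift is detected."""
    starts = range(len(control) - train_len - test_len + 1)
    hits = sum(detects_lift(control, test, s, train_len, test_len, lift) for s in starts)
    return hits / len(starts)


# Toy history with no campaign (randomly generated for illustration only).
rng = random.Random(7)
control = [1000 + 50 * rng.gauss(0, 1) for _ in range(120)]
test = [0.5 * c + 10 * rng.gauss(0, 1) for c in control]

for lift in (0.0, 0.01, 0.03, 0.05):
    power = estimate_power(control, test, train_len=60, test_len=14, lift=lift)
    print(f"lift={lift:.0%} detected in {power:.0%} of windows")
```

Repeating this for several lift sizes and test durations gives a rough view of which MDE is realistic for a given set of markets, which can then be translated into budget scenarios.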
At the end of the campaign, we apply the previously trained BSTS model to the control time series from the post-intervention period to construct a synthetic control. We then compare the synthetic control (the prediction) with the observed time series of the test markets to measure the true impact of the marketing intervention. Similar to A/B tests, impacts are only considered statistically significant if the p-value for the lift (delta > 0) is below 0.05.
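One way to turn this comparison into numbers is sketched below in Python. It assumes the fitted model can supply posterior draws of the counterfactual series for the campaign period, and it interprets the p-value as the share of draws in which the cumulative effect is not positive, which is an assumption about the criterion rather than something stated above. The function name `summarize_effect` and the toy draws are hypothetical.

```python
import random


def summarize_effect(observed, counterfactual_draws, alpha=0.05):
    """Summarize the cumulative effect from posterior counterfactual draws.

    observed: observed test-market series during the campaign.
    counterfactual_draws: list of simulated no-campaign series, one per posterior draw.
    """
    observed_total = sum(observed)
    effects = sorted(observed_total - sum(draw) for draw in counterfactual_draws)
    n = len(effects)
    # Simple order-statistic interval; a crude percentile rule, fine for a sketch.
    lower = effects[int((alpha / 2) * (n - 1))]
    upper = effects[int((1 - alpha / 2) * (n - 1))]
    # One-sided tail probability: share of draws with no positive effect.
    p_value = sum(e <= 0 for e in effects) / n
    return {
        "mean_effect": sum(effects) / n,
        "interval": (lower, upper),
        "p_value": p_value,
        "significant": p_value < alpha,
    }


# Toy data (made up): 10 campaign days and 2000 simulated counterfactual series.
rng = random.Random(11)
observed = [120.0] * 10
draws = [[rng.gauss(100, 5) for _ in range(10)] for _ in range(2000)]

summary = summarize_effect(observed, draws)
print(summary)
```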
Successful use case of BSTS at LinkedIn
At LinkedIn, our Data Science team has successfully applied the BSTS approach to many unique business cases and answered questions that would otherwise have remained mysteries to our business.
In one of our full-funnel national brand marketing campaigns, which lasted two months with multi-channel deployment, including TV, billboard, audio, digital, and social, we applied our BSTS approach and concluded that the campaign drove an almost double-digit lift in targeted metrics.
In one of our paid job distribution programs, we successfully designed go-dark city selection using the aforementioned stratified approach. By applying BSTS, we successfully proved that the Return on Advertising Spend (ROAS) of the program has a healthy reading that’s well above 1.0.
In our paid app activation program, we were able to leverage BSTS to infer members’ incremental lifetime value (LTV) by country and by operating system (iOS, Android). The results guided our app activation program’s future investment.
In a recent Google Universal App campaign where we promoted LinkedIn Apps on Android, we again applied BSTS and concluded that in the tested geography, about half of the app installs reported through last click are incremental.
Understanding marketing campaign ROI is a crucial business challenge. When the gold standard of A/B testing or measurement at the individual unit level is not available, BSTS is a powerful alternative for measuring a marketing campaign’s causal impact at a geo-aggregated level. The LinkedIn Data Science team, by establishing a BSTS measurement framework and best practices, has successfully applied the approach to deliver insightful measurement results that led to improvements in our marketing channel efficiency and budget allocation.
We’d like to end this blog post by highlighting that, in addition to measuring past marketing campaign performance using BSTS, in our subsequent work we also feed the BSTS results into a marketing mix model (MMM) to optimally allocate spend on future investments. Marketing mix models can provide a high-level, cross-channel view of how marketing channels are performing. By triangulating modeling results with rigorous BSTS causal experimentation, one can improve the model's robustness and its capability to recover some of the lost signals.
We would like to acknowledge Rahul Todkar and Ya Xu for their leadership in this cross-team work and Minjae Ormes, Ryan McDougall, Tim Clancy, Kim Chitra, Ginger Cherny, for their business partnership and support. We would like to thank all of the collaborators, reviewers and users who assisted with BSTS Geo Causal Inference Studies from the Data Science Applied Research team (Rina Friedberg, Albert Chen, Shan Ba), the Go-to-Market Data team (Fangfang Tan, Jina Lin, Catherine Wang, Kelly Chang, Sylvana Yelda), the Consumer Product Marketing Team (Rajarshi Chatterjee, Shauna-kay Campbell, Emma Yu), the Paid Media team (Nicolette Song, Krizia Manzano, Sandy Shen), the Careers Engineering Team (Wenxuan Gao, Dheemanth Bykere Mallikarjun), and our wonderful partners, the DSPx team (Xiaofeng Wang, Daniel Antzelevitch, Kate Xiaonan Ding) who helped build the automated solution.