Building Conversion Tracking at LinkedIn
September 29, 2016
Advertisers want to know what happens after a LinkedIn member sees an ad from one of LinkedIn's marketing opportunities. Do they lose interest? Do they close the browser and come back a few days later to sign up? Do they actually convert? With Conversion Tracking, we can finally answer these questions. We can see how many members who view the advertiser's ads actually convert—that is, engage or perform relevant actions (like making an online purchase or signing up for a webinar), and thereby provide a measurable rate of return for advertising on LinkedIn.
Designing the system
Overview of system components
Conversion tracking can be broken down into three main components:
Insight Tag Endpoint: When a potential customer hits the advertiser's site, the page calls the Insight Tag Endpoint, which queries a cached datastore derived from MySQL. The endpoint determines whether the page the user lands on is eligible for a conversion: which campaigns are associated with the Insight Tag, whether those campaigns are active, whether the URL fits the conversion action match rule, and so on. Eligible conversion “fires” (events that trigger our attention) occur only if the user is found to be a LinkedIn member. Each fire emits a Kafka event, which is loaded into our offline HDFS (Hadoop Distributed File System) storage.
Data and reporting: An offline process reads the HDFS data and takes each individual conversion fire to determine the potential impressions (ad views) and clicks for that conversion event. Conversion fires without associated impressions or clicks are dropped, and the remaining ones are aggregated into meaningful metrics broken down by conversion type, campaign type, etc. The results are stored in Pinot, an in-house online analytical processing (OLAP) data store, which provides an interface for high-performance data queries and reporting.
The user interface (API partners and UI): This system includes the conversion action definitions, reading the Insight Tag, the reporting dashboard, and general management of conversion tracking.
We chose to use a dual AWS-LinkedIn stack in our architecture. The AWS stack is part of the previous LinkedIn Lead Accelerator product offering, and already contains various integrations to support Insight Tag management. The LinkedIn stack allows us to use the LinkedIn domain cookie to extract tracking information, and it includes widely adopted systems like Pinot and the Campaign Manager UI on which to build conversion tracking. More importantly, because of our members-first policy, it allows us to respect our members’ opt-out preferences.
There are three primary data entities in Conversion Tracking:
Domain: We do not want arbitrary conversion fires to occur for website domains not registered to the account owner of the Insight Tag. Therefore, each Insight Tag needs to be registered to a valid domain in order to function.
Conversion Action: The conversion action dictates at what point on the advertiser's website we should consider a page visit to be a conversion fire. The match rule is entirely URL-based, and can take forms such as "starts with" "business.linkedin.com" or a match on a specific subpage.
We only need one-time placement of the Insight Tag on an advertiser's web page for conversion tracking to work.
It lets us configure URL match rules, conversion value, conversion type, and other properties without changing the Insight Tag. This is key because account managers can edit the parameters of the conversion action independent of the people managing the web pages.
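As a rough illustration of how a URL-based match rule could be evaluated against a registered domain, here is a minimal Python sketch. The `ConversionAction` class, the rule names `starts_with` and `exact`, and the `matching_actions` function are all hypothetical names chosen for this example; they are not the actual LinkedIn data model or endpoint logic.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass(frozen=True)
class ConversionAction:
    """Hypothetical conversion action: a URL match rule plus reporting properties."""
    name: str
    rule: str             # "starts_with" or "exact" (illustrative rule names)
    pattern: str          # host plus path, e.g. "business.linkedin.com"
    conversion_type: str
    active: bool = True


def normalize(url: str) -> str:
    """Reduce a URL to lowercase host plus path, dropping scheme, query, and fragment."""
    parts = urlsplit(url if "://" in url else "//" + url)
    return (parts.netloc.lower() + parts.path).rstrip("/")


def matching_actions(page_url, registered_domains, actions):
    """Return the active conversion actions whose match rule fits the page URL."""
    page = normalize(page_url)
    host = page.split("/", 1)[0]
    # Fires from domains not registered to the Insight Tag are rejected outright.
    if host not in registered_domains:
        return []
    matched = []
    for action in actions:
        if not action.active:
            continue
        target = normalize(action.pattern)
        if action.rule == "starts_with" and page.startswith(target):
            matched.append(action)
        elif action.rule == "exact" and page == target:
            matched.append(action)
    return matched
```

Because the rules live in the conversion action rather than in the tag, changing a pattern in this sketch only means editing a `ConversionAction` record; nothing on the advertiser's page has to change.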
The Insight Tag during Conversion Tracking setup
When we register a conversion fire event, we filter out fires that are not associated with a valid LinkedIn member. Of the remaining fires, only a subset can actually be associated with an impression the member has seen and possibly a click the member has performed. To determine the true conversions, we use Scalding to run an offline attribution job that computes the total conversions a campaign recorded in various attribution windows (the default is a 30-day lookback).
The attribution job does the following:
Determine the last click, and if not found, the last impression for a single user prior to converting for each campaign type. If no impressions or clicks are found, the conversion fire is dropped.
For most conversion action types, we de-duplicate repeated conversions after a single ad is shown (impression) or clicked. That is, if multiple conversions are found, we count only the first one, unless more impressions occurred after the first conversion.
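The first step of the attribution job (last click, otherwise last impression, within the lookback window) can be sketched in Python as follows. This is an illustration only: the production job runs on Scalding, and the `AdEvent` class and `attribute` function are hypothetical names. The sketch assumes the events passed in already belong to one member and one campaign.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass(frozen=True)
class AdEvent:
    """Hypothetical ad event for a single member and campaign."""
    kind: str       # "click" or "impression"
    at: datetime


def attribute(conversion_at: datetime, events: List[AdEvent],
              window_days: int = 30) -> Optional[AdEvent]:
    """Pick the event a conversion fire is attributed to, or None to drop the fire."""
    window_start = conversion_at - timedelta(days=window_days)
    # Only events inside the lookback window and before the conversion count.
    eligible = [e for e in events if window_start <= e.at < conversion_at]

    # A click takes precedence over any impression, even a more recent one.
    clicks = [e for e in eligible if e.kind == "click"]
    if clicks:
        return max(clicks, key=lambda e: e.at)

    impressions = [e for e in eligible if e.kind == "impression"]
    if impressions:
        return max(impressions, key=lambda e: e.at)

    # No click or impression in the window: the conversion fire is dropped.
    return None
```

Returning the attributed event rather than a boolean lets a caller label the conversion as post-click or post-impression from `kind`.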
Example attribution timeline
For example, in the timeline above, a member views three impressions and one click event. We can make several observations:
Because our maximum attribution window is 30 days, Impression 1 is ignored.
Even though Impression 3 was the last event to occur prior to the conversions, we will consider the conversion "post-click" because there was a click present in the timeline, and clicks take precedence over impressions. We can reason that a click is a much stronger indicator of intent than an impression.
For certain conversion action types, such as "download," we de-duplicate on repeated conversions. In this case, we will discard Conversion 2 in our metrics because no events occurred between Conversion 1 and Conversion 2. This has the side effect of a lowered count of conversion events when some advertisers compare their conversion numbers with ours based on summing the LinkedIn visitors on their conversion page. This is because we want to count each series of conversions as a single instance of intent for this activity attributable to the LinkedIn ad.
Putting all of this together, we will report our metrics as “1 post-click conversion” for an attribution window of 30 days.
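The de-duplication rule in this example can be sketched in a few lines of Python. The function name `count_deduplicated` and the dates in the usage note are invented for illustration, since the timeline does not give exact timestamps. The sketch assumes every conversion passed in has already been attributed to the campaign.

```python
from datetime import datetime
from typing import List


def count_deduplicated(conversion_times: List[datetime],
                       ad_event_times: List[datetime]) -> List[datetime]:
    """Keep a conversion only if it is the first one, or if an ad event
    (impression or click) occurred since the previous conversion."""
    counted = []
    previous = None
    for conv in sorted(conversion_times):
        new_ad_event = previous is not None and any(
            previous < t < conv for t in ad_event_times
        )
        if previous is None or new_ad_event:
            counted.append(conv)
        previous = conv
    return counted
```

With hypothetical ad events on September 5, 10, and 20 and conversions on September 25 and 26, only the September 25 conversion is kept, which corresponds to the single reported conversion in the example.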
Challenges we overcame
Designing a system to meet the scale of the combined traffic of all of LinkedIn's advertisers’ websites was of utmost concern. This is because the Insight Tag is designed to be placed in every single advertiser's website, meaning we had to resolve all the traffic for every single user that visits every page, not just LinkedIn members.
We solved this issue in a number of ways:
Compartmentalizing the Insight Tag Endpoint separately from downstream services (as opposed to one monolithic service), so we can scale each service independently;
Prefiltering relevant traffic in upstream stages;
Using Kafka as a horizontally-scalable stream to inject data into HDFS;
Utilizing aggregate and precompute views for multi-dimensional storage and querying.
Because it's very common for a single advertiser to manage multiple accounts, and an Insight Tag is typically associated with individual accounts, we had to build a permission system into Conversion Tracking that allowed accounts to share their Insight Tag with other accounts.
In order to facilitate the member-to-account permissioning for the Insight Tag, we utilized a separate in-house ACL service to manage whether an account has read/write permission to an Insight Tag. These ACLs are modeled as separate objects apart from the campaign and Insight Tag level, and we can define the required permissions needed, such as "modify domain" or "read conversion." By having a set of permissions across several roles, we were able to define and implement a clean set of APIs for "revoke" and "grant" actions for each Insight Tag.
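To make the "grant" and "revoke" actions concrete, here is a minimal in-memory Python sketch of per-tag permissions. The `InsightTagAcl` class and its method names are hypothetical, and the in-house ACL service is not shown in this post. Only the permission strings "modify domain" and "read conversion" come from the text above.

```python
from collections import defaultdict


class InsightTagAcl:
    """Hypothetical in-memory ACL keyed by (Insight Tag, account)."""

    def __init__(self):
        # (tag_id, account_id) -> set of permission names
        self._grants = defaultdict(set)

    def grant(self, tag_id, account_id, permission):
        """Give an account one permission on an Insight Tag."""
        self._grants[(tag_id, account_id)].add(permission)

    def revoke(self, tag_id, account_id, permission):
        """Remove a permission; revoking one that was never granted is a no-op."""
        self._grants[(tag_id, account_id)].discard(permission)

    def is_allowed(self, tag_id, account_id, permission):
        """Check a permission without creating an empty entry for unknown keys."""
        return permission in self._grants.get((tag_id, account_id), set())


# Usage: the owning account shares read access to its tag with a second account.
acl = InsightTagAcl()
acl.grant("tag-1", "account-A", "modify domain")
acl.grant("tag-1", "account-B", "read conversion")
acl.revoke("tag-1", "account-B", "read conversion")
```

Keeping the grants in a structure separate from the tag and campaign records mirrors the idea of modeling ACLs as their own objects, so sharing a tag never requires touching the tag itself.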
Conversion Tracking allows advertisers to measure the effectiveness of their campaigns based on meaningful outcomes that matter to their businesses. With the help of Conversion Tracking, advertisers can identify a traffic pattern, influence their sales strategy, and obtain a measurable rate of return, such as "cost per lead" or "cost per download."
Tracking conversions and analyzing data won't increase sales overnight by itself, but it can lead to better creatives and smarter marketing decisions.