Less Is More: Optimizing Email Volume - Part 1

Rupesh Gupta

AI at LinkedIn

In July 2015, we announced that we were reducing email volume so that members receive only the most relevant email communication from us. We have since made a concentrated effort in this direction, and we hope the results are already noticeable. In this post, we explain how we achieved those results. In part two of this series, we will discuss a new, improved technique for email volume optimization that is being rolled out this year.

LinkedIn provides members with an easy way to connect with other professionals and exchange information. It facilitates this exchange of information by giving members the opportunity to send private messages to other members, upload and share rich media with other members, self-organize into interest groups where they can ask and answer questions about specific subject matter, subscribe to channels delivering news on topics of interest, etc. It helps members keep up with information through pull-model based in-app services like feed and in-app notifications, as well as push-model based content distribution services like email and push notifications. Members can subscribe to receive various types of email communications such as job opportunities, news, connection requests and event notifications.

Such communication is of special interest to LinkedIn because it can provide valuable information to members without requiring them to be actively logged in to LinkedIn's mobile or web application. This in turn helps drive mid- and long-term member engagement. However, excessive email can overload a member's inbox and hamper the effectiveness of the communication, particularly if the emails are not sufficiently relevant to the member's interests.

Cost-Benefit Analysis Of Email Communication

To conduct a cost-benefit analysis of email communication for LinkedIn, we set up an experimental user bucket in which each email generated for a user was dropped at random. We refer to this bucket as the random-drop bucket. The remaining users received all emails generated for them; we refer to this bucket as the send-all bucket.
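A bucketing scheme like this can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not LinkedIn's experimentation code: members are assigned to buckets by hashing their ID with a hypothetical salt, the bucket split (`RANDOM_DROP_FRACTION`) is hypothetical, and the per-email drop probability of 0.5 is an assumption consistent with members in the random-drop bucket receiving about half of their emails.

```python
import hashlib
import random

RANDOM_DROP_FRACTION = 0.5  # hypothetical share of members in the random-drop bucket
DROP_PROBABILITY = 0.5      # assumption: random-drop members got about half their emails


def assign_bucket(member_id: str, salt: str = "email-volume-experiment") -> str:
    """Deterministically assign a member to a bucket by hashing the salted ID."""
    digest = hashlib.sha256(f"{salt}:{member_id}".encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a roughly uniform number in [0, 1].
    u = int.from_bytes(digest[:8], "big") / 2**64
    return "random-drop" if u < RANDOM_DROP_FRACTION else "send-all"


def deliver_in_experiment(member_id: str, rng: random.Random) -> bool:
    """Return True if a newly generated email for this member should be sent."""
    if assign_bucket(member_id) == "send-all":
        return True
    # In the random-drop bucket, each email is dropped independently.
    return rng.random() >= DROP_PROBABILITY
```

Hashing keeps a member in the same bucket for every email, so per-member comparisons between buckets are meaningful, while the per-email coin flip randomizes which of that member's emails are dropped.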

We collected data from this experiment over a period of one week, during which users received various types of emails. Each type of email contained information about a particular product vertical and at least one link which, if clicked, would take the user to a page under that product vertical on the mobile or web application. The targeted user of each email could interact positively with the email by clicking a link. Alternatively, she could interact negatively by either clicking the unsubscribe option within the email or reporting it as spam to her email service provider.

As the table below shows, members in the random-drop bucket performed 2.6% fewer page views than members in the send-all bucket. In other words, dropping about half of the emails generated for members cost about 2.6% of their page views. We also observed substantial losses in page views for pages under individual product verticals.

Delta in page views for various product verticals (random-drop vs. send-all)

Product vertical               Delta in page views
Total                          -2.6%
Homepage                       -1.4%
Jobs                           -4%
Profile                        -4.5%
PYMK (People You May Know)     -4.5%
Search                         -4%
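The deltas in the table are relative changes of the random-drop bucket with respect to the send-all bucket. The sketch below shows that arithmetic on hypothetical page-view totals (not experiment data), normalized so that send-all equals 100. It also shows that the reverse comparison, send-all relative to random-drop, comes out slightly larger because the baseline changes.

```python
def relative_delta(treatment: float, control: float) -> float:
    """Relative change of treatment vs. control, e.g. -0.026 for -2.6%."""
    if control == 0:
        raise ValueError("control must be non-zero")
    return (treatment - control) / control


# Hypothetical totals: send-all normalized to 100 page views.
send_all = 100.0
random_drop = 97.4
print(f"{relative_delta(random_drop, send_all):+.2%}")  # -2.60% (random-drop vs. send-all)
print(f"{relative_delta(send_all, random_drop):+.2%}")  # +2.67% (send-all vs. random-drop)
```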

To understand this better, we performed a fine-grained analysis of members in the random-drop bucket. We divided the members into four segments based on the frequency of their visits to the mobile or web application before the start of the experiment. These four segments were: daily-active (visited every day), weekly-active (visited once a week), monthly-active (visited once a month) and dormant (visited less than once a month).
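A hypothetical sketch of this segmentation follows. It assumes a 28-day lookback window before the experiment starts, reads "visited once a week" as "at least one visit in each of the four weeks", and reads "visited once a month" as "at least one visit anywhere in the window". The window length and these readings are our assumptions, not the exact definitions used in the analysis.

```python
from datetime import date, timedelta

LOOKBACK_DAYS = 28  # hypothetical lookback window: four weeks before the experiment


def segment(visit_dates: set[date], start: date) -> str:
    """Classify a member by visit frequency in the LOOKBACK_DAYS before `start`."""
    window = [start - timedelta(days=d) for d in range(1, LOOKBACK_DAYS + 1)]
    visited = [day in visit_dates for day in window]
    if all(visited):
        return "daily-active"
    # Split the window into consecutive 7-day weeks.
    weeks = [visited[i:i + 7] for i in range(0, LOOKBACK_DAYS, 7)]
    if all(any(week) for week in weeks):
        return "weekly-active"
    if any(visited):
        return "monthly-active"
    return "dormant"
```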

As shown in the figure below, the number of active members (members who visited the application at least once during the experiment) increased with the number of emails sent to them. Members also performed more downstream page views (page views within sessions that start from an email click) as the number of emails sent increased.
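Both metrics can be computed from session-level tracking data. The sketch below assumes a hypothetical `Session` record with a flag marking whether the session started from an email click; the record layout is an assumption, not LinkedIn's tracking schema.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Session:
    """Hypothetical session record derived from tracking data."""
    member_id: str
    started_from_email_click: bool
    page_views: int


def downstream_page_views(sessions: list[Session]) -> dict[str, int]:
    """Page views per member, counting only sessions that start from an email click."""
    totals: dict[str, int] = defaultdict(int)
    for s in sessions:
        if s.started_from_email_click:
            totals[s.member_id] += s.page_views
    return dict(totals)


def active_members(sessions: list[Session]) -> set[str]:
    """Members with at least one session of any kind during the experiment."""
    return {s.member_id for s in sessions}
```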

We found that excessive email communication can have several negative consequences. We observed 45% more negative responses to emails in the send-all bucket than in the random-drop bucket. This was not surprising, since sending more emails is likely to produce more responses, both positive and negative. However, unlike a positive response, a negative response has consequences beyond that particular email. If a member clicks the unsubscribe option within an email, we lose the ability to send that member any emails of that type in the future. If a large number of members report emails from a particular sender (such as LinkedIn) as spam to their email service providers, a provider may block and filter all emails from that sender [1]. Such deliverability issues are generally not easy to resolve.

Optimizing The Email Communication Experience

These negative consequences show that email communication must be used judiciously. At the same time, a naive reduction in email volume can hurt LinkedIn's growth, since email is among the principal drivers of engagement. This calls for intelligent algorithms that identify less important emails which can be dropped.

LinkedIn allows members to personalize how they receive various types of email through the communication settings panel, under the account settings tab of the web or mobile application, as shown below.

In this panel, for each type of email, a member can choose to turn off the communication completely, or to receive it as events occur, daily, weekly, or at the recommended cadence. The recommended setting is designed so that the member receives timely email communication about only the most relevant information. LinkedIn optimizes the member's communication experience when the member opts in to this setting. This optimization spans several fronts, one of which is volume optimization.

The decision to send or drop an email needs to be based on the predicted reaction of the targeted user to that email. Ideally, we would like to cut down on email volume in such a way that the positive outcome is maximized and the negative outcome is minimized. This requires optimizing several objectives at once.

Our Volume Optimization Technique

We formulate email volume optimization as a Multi-Objective Optimization (MOO) problem. Let's consider a downstream session as the positive outcome of interest, and an unsubscribe or a spam report (we refer to either as a complaint) as the negative outcome of interest. On one hand, we would like to maximize the number of downstream sessions; on the other hand, we would like to minimize both the number of emails sent and the number of resulting complaints.

Consider any window of time starting from now, say, the week starting today. Suppose we have the entire set of emails that will be generated over this week, and denote it as E. Also assume that each email e ∈ E is of one of T different email types. Let t = 1, ..., T index the email types, and let E_t denote the set of emails of type t.

We now solve the following optimization problem: minimize the total number of emails sent, subject to the total number of downstream sessions being at least a fraction α_global of the number achievable if all emails in E were sent, the total number of complaints being at most a fraction β_global of the number that would result if all emails in E were sent, and analogous local constraints, with fractions α_t and β_t, on the sessions and complaints from emails of each type t.

To express this problem mathematically, we assign a decision variable z_e = Pr(sending e) to each email e. The set z = {z_e : e ∈ E} is referred to as the serving plan, which is to be optimized. For each email e, suppose we can predict

$$p_e^s = \Pr(\text{downstream session} \mid e \text{ is sent}), \qquad p_e^c = \Pr(\text{complaint} \mid e \text{ is sent})$$

by using the email features x_e in trained response prediction (logistic regression) models. Since each email e is targeted to a particular user, we also include features of the targeted user in x_e.

The above optimization problem can now be written as a linear program:

$$
\begin{aligned}
\min_{z} \quad & \sum_{e \in E} z_e \\
\text{s.t.} \quad & \sum_{e \in E} z_e \, p_e^s \ \ge\ \alpha_{\text{global}} \sum_{e \in E} p_e^s \\
& \sum_{e \in E_t} z_e \, p_e^s \ \ge\ \alpha_t \sum_{e \in E_t} p_e^s, \quad t = 1, \dots, T \\
& \sum_{e \in E} z_e \, p_e^c \ \le\ \beta_{\text{global}} \sum_{e \in E} p_e^c \\
& \sum_{e \in E_t} z_e \, p_e^c \ \le\ \beta_t \sum_{e \in E_t} p_e^c, \quad t = 1, \dots, T \\
& 0 \le z_e \le 1, \quad e \in E
\end{aligned}
$$

When writing down the constraints in the above formulation, we make an independence assumption between emails. Take the first constraint as an example. Here we approximate the total number of downstream sessions as the sum of the expected number of downstream sessions from each email, z_e p_e^s. On the right-hand side we specify the target as a fraction α_global ∈ [0, 1] of the maximum achievable session count, that is, the count if all emails in E are sent. We approximate this maximum achievable session count in the same way:

$$\text{sessions if all emails in } E \text{ are sent} \approx \sum_{e \in E} p_e^s$$

The other constraints are similar in nature. We typically specify the two global constraints, plus local constraints for only the few most important email types. We try large values of α_global (around 0.99) and small values of β_global (around 0.5) to obtain a feasible solution.

The linear program above, though simple, cannot be solved in practice as stated, because we do not know in advance the set of emails E that will be generated in the upcoming week. Fortunately, the distribution of our generated emails does not change significantly from week to week, so we use the set of emails generated in the past week as a forecast for E. However, similarity in distribution does not give us the solution z_e for each email that will actually be generated. To address this, we use the primal-dual trick introduced in [2], applied as follows.

We solve the dual of the above problem for the set of emails generated in the past week. Let u_global, u_t, v_global, and v_t be the dual solutions corresponding to the global session constraint, the local session constraints, the global complaint constraint, and the local complaint constraints, respectively. Then z_e for any newly generated email can be computed efficiently on the fly. For an email e of type t:

$$z_e = \Pi_{[0,1]}\left( q + \frac{u_t^* \, p_e^s - v_t^* \, p_e^c - 1}{\gamma} \right)$$

where u_t^* = u_global + u_t, v_t^* = v_global + v_t, Π_{[0,1]}(·) denotes projection onto [0, 1], q ∈ [0, 1] is a prior on the send probability, and γ > 0 is a regularization parameter.

Letting γ → 0 simplifies the decision into the following intuitive deterministic rule. For an email e of type t:

$$\text{send } e \iff u_t^* \, p_e^s - v_t^* \, p_e^c > 1$$

Intuitively, this rule says that email e should be sent only if its weighted expected positive outcome exceeds its weighted expected negative outcome by a threshold.
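A short derivation of the per-email send probability from the Lagrangian of the linear program. It assumes the regularization is a quadratic penalty (γ/2) Σ_e (z_e − q)² added to the objective of minimizing Σ_e z_e; this form is our assumption, chosen because it yields the projection Π_[0,1](·) with prior q and parameter γ described in the text. Here p_e^s and p_e^c are the predicted probabilities that email e, if sent, leads to a downstream session or a complaint, and u_global, u_t, v_global, v_t ≥ 0 are the multipliers of the session (≥) and complaint (≤) constraints.

```latex
\begin{aligned}
\mathcal{L}(z, u, v) ={}& \sum_{e \in E} z_e + \frac{\gamma}{2} \sum_{e \in E} (z_e - q)^2 \\
&- u_{\text{global}} \Big( \sum_{e \in E} z_e p_e^s - \alpha_{\text{global}} \sum_{e \in E} p_e^s \Big)
 - \sum_{t} u_t \Big( \sum_{e \in E_t} z_e p_e^s - \alpha_t \sum_{e \in E_t} p_e^s \Big) \\
&+ v_{\text{global}} \Big( \sum_{e \in E} z_e p_e^c - \beta_{\text{global}} \sum_{e \in E} p_e^c \Big)
 + \sum_{t} v_t \Big( \sum_{e \in E_t} z_e p_e^c - \beta_t \sum_{e \in E_t} p_e^c \Big)
\end{aligned}

% Terms involving z_e for one email e of type t,
% with u_t^* = u_global + u_t and v_t^* = v_global + v_t:
\big( 1 - u_t^* p_e^s + v_t^* p_e^c \big) \, z_e + \frac{\gamma}{2} (z_e - q)^2

% Minimizing this one-dimensional convex quadratic over z_e \in [0, 1]:
z_e = \Pi_{[0,1]}\!\left( q + \frac{u_t^* p_e^s - v_t^* p_e^c - 1}{\gamma} \right)

% As \gamma \to 0: z_e \to 1 if u_t^* p_e^s - v_t^* p_e^c > 1, and z_e \to 0 if it is < 1.
```

Because the coefficient of z_e depends only on the email's own predictions and its type's multipliers, each email's decision separates from all the others, which is why only a few coefficients per type need to be kept online.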

This decision rule requires us to maintain just a few coefficients {u_t^*, v_t^*} to make a send/drop decision for each individual email. We call these the MOO coefficients. We use this simplified rule in our current implementation, which keeps the system design very simple.
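A minimal Python sketch of the per-email decision, assuming the regularized rule z_e = Π_[0,1](q + (u_t^* p_e^s − v_t^* p_e^c − 1)/γ) and its deterministic γ → 0 limit, where p_e^s and p_e^c are the predicted session and complaint probabilities. `MooCoefficients`, the default q and γ, and the example coefficient values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MooCoefficients:
    """Hypothetical container for one email type's MOO coefficients."""
    u_star: float  # u_t^* = u_global + u_t (weight on predicted sessions)
    v_star: float  # v_t^* = v_global + v_t (weight on predicted complaints)


def score(p_session: float, p_complaint: float, coef: MooCoefficients) -> float:
    """Weighted positive minus weighted negative outcome, minus the threshold 1."""
    return coef.u_star * p_session - coef.v_star * p_complaint - 1.0


def send_probability(p_session: float, p_complaint: float, coef: MooCoefficients,
                     q: float = 0.5, gamma: float = 1.0) -> float:
    """Regularized rule: z_e = clip(q + score / gamma, 0, 1)."""
    if gamma <= 0:
        raise ValueError("gamma must be positive; use should_send for the gamma -> 0 limit")
    return min(1.0, max(0.0, q + score(p_session, p_complaint, coef) / gamma))


def should_send(p_session: float, p_complaint: float, coef: MooCoefficients) -> bool:
    """Deterministic gamma -> 0 limit: send iff u* p_s - v* p_c > 1."""
    return score(p_session, p_complaint, coef) > 0.0
```

For a hypothetical type with u_t^* = 20 and v_t^* = 200, an email with p_e^s = 0.08 and p_e^c = 0.001 scores 20 × 0.08 − 200 × 0.001 = 1.4 > 1 and is sent, while one with p_e^s = 0.05 scores 0.8 and is dropped. Smaller γ pushes `send_probability` toward the same 0/1 decisions.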

Our Volume Optimization System Design

Our volume optimization system consists of an online serving system and an offline training system as illustrated in the block diagram below:

Depending on a member's message subscription settings, the message generator produces a message for the member. For example, if the member subscribes to network updates, a message is generated once a week containing important updates about the member's connections, such as job changes, work anniversaries, and profile updates. This message is passed to the response prediction engine along with the ID of the member (the targeted user). The response prediction engine extracts features from the message to create a partial feature vector, and appends features of the targeted member from a tracking data store. The response prediction models use this feature vector to predict expected responses. These predictions are passed to the Volume Optimization (VO) decision engine, which makes the final send/drop decision based on the MOO coefficients.

For offline training, the member's interactions with received emails, along with member profile and activity data, are recorded in the tracking data store. Hourly snapshots of this database are loaded into Hadoop, and the response prediction model trainer uses the snapshot data to train models on Spark. The trained models are fed into both the response prediction engine and the MOO solver. The MOO solver applies these models to the training data to predict expected responses, which its optimizer uses to produce optimal MOO coefficients.
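The MOO solver solves the dual of the full linear program over the past week's emails. As a much-simplified, hypothetical illustration of how a coefficient falls out of that step, consider the case with only the global session constraint (no complaint or per-type constraints): minimize Σ_e z_e subject to Σ_e z_e p_e ≥ α Σ_e p_e and 0 ≤ z_e ≤ 1. That LP has a greedy solution, and its dual variable u plays the role of the single MOO coefficient in the rule "send iff u · p_e > 1". The function name and inputs are hypothetical; a production solver would handle all constraints with a general LP solver.

```python
def solve_session_only_lp(p_sessions: list[float], alpha: float) -> float:
    """Dual price u for: min sum(z) s.t. sum(z_e * p_e) >= alpha * sum(p_e), 0 <= z_e <= 1.

    With a single session constraint the LP is solved greedily: send the
    emails with the highest predicted session probability until the target
    is met. The last ("marginal") email satisfies u * p_e = 1, so u = 1 / p.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    ps = sorted((p for p in p_sessions if p > 0.0), reverse=True)
    if not ps:
        raise ValueError("need at least one email with a positive session probability")
    target = alpha * sum(ps)
    covered = 0.0
    for p in ps:
        covered += p
        if covered >= target:
            return 1.0 / p
    # Only reachable through floating-point rounding when alpha == 1.
    return 1.0 / ps[-1]
```

For hypothetical predictions [0.5, 0.25, 0.125, 0.125] and α = 0.75, the function returns u = 4. Emails with u · p_e > 1 are sent; an email with u · p_e exactly 1 sits on the boundary, where the LP solution can be fractional, so a deterministic implementation needs a tie-breaking convention.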

The strength of this design is that the send/drop decision for each individual email is made independently, based only on the learned response prediction models and a few MOO coefficients. This simple but effective design is what lets us scale online decision-making to handle millions of message requests every week.
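The per-email independence can be illustrated with a hypothetical sketch of the online path: concatenate the message features with the targeted member's features, score the vector with two logistic regression models (session and complaint), and apply the deterministic rule with the email type's MOO coefficients u_t^* and v_t^*. Feature layouts, weights, and names here are assumptions, not LinkedIn's serving code.

```python
import math
from dataclasses import dataclass
from typing import Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


@dataclass(frozen=True)
class LogisticModel:
    """Hypothetical trained logistic regression model: p = sigmoid(w . x + b)."""
    weights: tuple[float, ...]
    bias: float

    def predict(self, features: Sequence[float]) -> float:
        if len(features) != len(self.weights):
            raise ValueError("feature vector length does not match model")
        return sigmoid(self.bias + sum(w * x for w, x in zip(self.weights, features)))


def decide(message_features: Sequence[float], member_features: Sequence[float],
           session_model: LogisticModel, complaint_model: LogisticModel,
           u_star: float, v_star: float) -> bool:
    """Send/drop decision for one email, independent of every other email."""
    # Partial feature vector from the message, with the member's features appended.
    x = list(message_features) + list(member_features)
    p_session = session_model.predict(x)
    p_complaint = complaint_model.predict(x)
    return u_star * p_session - v_star * p_complaint > 1.0
```

Nothing in `decide` looks at other emails or other members, so requests can be served in parallel with only the models and the per-type coefficients loaded.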

References

[1] SpamCop Blocking List. https://www.spamcop.net/bl.shtml

[2] D. Agarwal, B.-C. Chen, P. Elango, and X. Wang. Personalized click shaping through Lagrangian duality for online recommendation. In ACM SIGIR, 2012.

Acknowledgements

Brought to you by the communication relevance group at LinkedIn: Xiaoyu Chen, Rupesh Gupta, Guanfeng Liang, Romer Rosales, Hsiao-Ping Tseng and Ravi Kiran Holur Vijay.