DuaLip: Solving extreme-scale linear programs for web applications

Kinjal Basu

Co-Founder, Aliveo AI

August 6, 2021

Co-authors: Kinjal Basu, Yao Pan, Rohan Ramanath, Konstantin Salomatin, Amol Ghoting, and S. Sathiya Keerthi

Building thriving internet or web products requires optimizing for multiple business metrics. In many applications, these business metrics can move in conflicting directions, and delicately balancing these tradeoffs can be vital for the success of a product. For example, emails and notifications are foundational components of many companies’ growth strategies today. At LinkedIn, when deciding which emails and notifications to send, we focus on the engagement generated, while remaining mindful of the volume sent. Similarly, when providing edge recommendations, total edges formed, and the engagement on those edges can be at odds: optimizing for total connections might lead to a large number of edges that have no engagement, bringing no additional value to the members. Other examples of conflicting objectives exist in newsfeeds, and search ranking—which are core applications for social media and e-commerce platforms. Balancing overall revenue, in-house revenue, and member satisfaction or retention, becomes the key problem in these spaces. Finally, in the various gig economy platforms, there are multiple sides of the marketplace, whose interests can be competing.

It is important but often difficult to strike the desired balance between the different business objectives in a recommender system. At LinkedIn, we refer to such problems as “multi-objective optimization (MOO).” We previously wrote about how we develop the multiple objectives that power the MOO models for feed ranking. In this post, we focus on how we strike the right balance between them. Outside of LinkedIn, MOO formulations are popular for optimizing an entire page, especially homepages like Netflix and Amazon, which direct traffic towards various parts of the platform and drive associated business metrics. Other companies, such as Facebook, adopt a “pareto optimality” approach for balancing objectives. A commonality of these approaches is that they balance the tradeoff between metrics through a choice of model hyperparameters that prioritize the objectives under consideration.

concept-illustrations-for-edge-building-email-and-e-commerce

Figure 1: Products and services such as edge building, email, and e-commerce that are routinely trying to optimize for multiple metrics.

Many of these objective balancing approaches provide the same experience for all members, missing an important opportunity for personalization to adapt these objectives to the individual preferences of the end-users. Personalized solutions to many of the aforementioned problems can be framed as a Linear Program (LP) (Agarwal et. al. 2011, 2012); however, it has been near impossible to solve them at the scale required by large-scale internet applications. Typically, such LP problems consist of hundreds of millions of members and hundreds of decision variables per member, making the overall problem range in the tens of billions to trillions of variables. As a result, many ad-hoc approximations and sampling techniques are often used in practice, resulting in sub-optimal solutions. To optimally solve this problem, we developed and are now open-sourcing DuaLip, a distributed Linear Program solver that can tackle such problems on an extreme scale.

Multiple objectives in AI-driven products

Over the past few years, machine learning models, especially deep networks, have become ubiquitous in various AI-driven products such as recommender systems and search. They are popular for estimating the likelihood of various objectives for each product. For example, in a growth application (focusing on email or notifications), the objectives can be the propensity of a member to click on an email or notification to visit the platform, or to complain about or disable the same. Once the likelihood of these multiple objectives are estimated, we need to use them to appropriately send or drop the email or notification.

One approach to combine the different objectives is to learn a mapping from them to a single, typically longer-term objective. Reinforcement learning is a great fit for this approach. For instance, if we only cared about the long-term daily retention of members, then we could learn a mapping to optimally combine the click and complain objectives for each member to retain them in the longer term. However, since emails or notifications are also driving sessions and, in turn, downstream impact, then that trade-off needs to be explicitly handled.

Another possibility is to combine the different objectives using a linear function with weights. The weights denote the relative importance of each objective, as can be seen in this simplified representation of a scoring function:

score-equals-click-minus-alpha-complaints-plus-beta-downstream-impact

Using this score, we can decide to send or drop the email or notification. So, if we wanted to obtain a certain balance between the objectives, how should we choose the weights to get us there? To answer this, let us consider a simplified version of this problem. Let us assume there are two business objectives: increase member sessions generated from the emails, and reduce member complaints from receiving them.

screenshot-of-email-notification-example

Figure 2: Each email can help members come back to the platform or cause members to unsubscribe.

Our solution form picks one of the objectives as the primary objective and the others as constraints. For example, we can write out this problem as:

max-sessions-such-that-complaints-are-less-than-or-equal-to-a-bound

Here Sessions is the primary objective and Complaints is converted into a constraint, where the bound value can be changed to obtain different balances between the two objectives. With the appropriate combination of weights and the two individual utility estimates, we can decide whether to send each of those emails.

Many other applications at LinkedIn and at other companies have a very similar form. They care about multiple critical objectives, which are often conflicting among themselves. On the newsfeed, we try to balance revenue and engagement; the People You May Know (or PYMK) application tries to optimize for both connections and downstream interactions once the connection is formed, and job recommendations try to balance hiring outcomes with revenue from paid jobs.

Multi-Objective Optimization (MOO) and Matching problems

In the examples described above, we care about a particular collection of business metrics, and balance these metrics through Multi-Objective Optimization. Irrespective of the type of metric each example is concerned with, a unifying property of these kinds of problems is that the number of metrics they care about is small. Since these metrics are represented as constraints of a LP, you can think of these problems as large-scale LPs with millions of variables but with a small number of constraints. The large number of variables in the LP are the decision variables corresponding to each member and each item. The small number of constraints are the encoding of the different metrics as a function of these decision variables.

Other than MOO, we also often have an additional class of problem in the marketplace setting, which is matching two (or more) sides in the marketplace, while balancing individual-level constraints. For example, in the network growth or edge recommendation problem, our overall objective is to maximize the number of connections formed. However, we would like to do this in a way that ensures that no member is overwhelmed by receiving too many invitations. That is,

max-connections-such-that-invites-sent-to-member-j-are-less-than-or-equal-to-b-sub-j

While the structure of the problem is similar to MOO problems, there is one key difference: the number of constraints is much larger. Each member can potentially have a constraint. We call these “matching problems,” since we are pairing items (j) with members (i) (or entities on different sides of the marketplace) while considering the interests and capacity or limitations of each entity, and then optimizing the holistic objective. This typically means hundreds of thousands to millions of constraints.

talent-solutions-has-more-than-14-million-active-job-listings-and-40-million-job-seekers-visit-linkedin-every-week-marketing-solutions-has-57-million-company-pages-and-members-interact-with-linkedin-pages-more-than-1-billion-times-per-month

Figure 3: Typical sizes of LinkedIn’s key marketplaces

Linear Programming formulations

We’ll now describe how we can use Linear Programs (LPs) to solve these two classes of problems, MOO and Matching. For MOO, we’ll run through the email example. We denote a member as i and an email by j. The optimization variable is x_ij which denotes the chance of sending email j to member i. That is,

x-sub-i-times-j-equals-p-parenthesis-send-email-j-to-member-i-parenthesis

Through quantities p_ij(estimate of downstream sessions) and q_ij(estimate of complaints) that come from deep models' predictions, the objective (downstream sessions) and the constraint (complaints) can be formed as linear functions of the x_ij. Using this, we can write our problem as

formula-to-maximize-total-downstream-sessions-subject-to-complaints-and-probability-constraints

Given the scale of members (for example, 100M) and set of candidate emails (for example, 10 per day to each member), the above problem becomes an LP with 1 billion variables x_ij, but just a single global constraint. Here, “global” refers to the fact that the constraint is across all members. In general, MOO problems have

A large number of variables;
A small number of global metric constraints; and
Are heavily decoupled with simple probability constraints.

Similarly, for the matching problem, we can also frame it as an LP. For the network growth example, i and j are both members, and x_ij is the chance of showing member j to member i. That is,

x-sub-i-times-j-equals-p-parenthesis-showing-member-j-to-member-i-parenthesis

Through deep model predictions of invitations (p_ij) and accepts (q_ij) we can express the objective and the constraints as linear functions of the x_ij.

formula-to-maximize-total-connections-subject-to-per-recommended-user-bound-and-simplex-constraints

The final set of recommendations for member i, are the list of members j, sorted according to x_ij. The key differences of matching from MOO are that

There are per-member or per-item constraints that make the number of constraints large.
The number of variables x_ij is typically much larger. For instance, with 100 million members and 10,000 per-member constraints, we have a 1 trillion variable LP.

That being said, we can still think about the matching problems as MOO problems with a large number of entity-specific constraints. Hence, both of these formulations can be written in a unified way:

Here c encapsulates the objective function, A encapsulates the global or per-member constraints, b is the bound on the constraints, and C_is are the probability or simplex constraints that are easily separable across members denoted by i.

We have developed and are now open-sourcing DuaLip to solve this problem on a massive scale. For more examples and the theory behind our algorithm, please see the full papers (Basu et. al. (2020), Ramanath et. al (2021)). For the rest of this post, we’ll provide a very high-level overview of various aspects of DuaLip’s design.

DuaLip: Dual-decomposition based Linear Program solver

We built DuaLip to solve LPs arising from web applications and to possess the following key features:

Extreme-Scale: DuaLip is specifically developed to tackle problems arising in web applications that usually have hundreds of millions of members and millions of items, pushing the number of optimization variables into the trillions range (if not more). It uses a dual decomposition technique to be able to scale to such large problems. For details and a wide range of applications, see Ramanath et. al. (2021) and Basu et. al (2020).
Efficient: Although we follow first-order gradient methods to solve the problem, we implement several highly efficient algorithms for each of the component steps. This allows us to scale up 20x over a naive implementation. Please see the recent paper for a comparative study.
Modular Design: In our implementation, any problem can be formulated through a highly modular approach.
- solver: We begin by choosing a first-order optimization solver. We currently support Proximal Gradient Ascent, Accelerated Gradient Ascent, and LBFGS-B.
- projectionType: We implement several very efficient projection algorithms to allow for the wide class of constraint sets Ci
New formulations can also be added by appropriately stitching together these different components.
Detects Infeasibility: We have incorporated simple checks on infeasibility (see Appendix D of our paper). This helps the end-user to appropriately tweak their problem space.
Extensive Logging: We have added extensive logging to help users understand whether the solver has converged to a good approximate solution.
Warm Start: We allow the user to input an initial estimate of the dual solution if she is familiar with the problem space. This allows for a very efficient solving of the overall problem.

The overall system architecture is shown in Figure 4.

Figure 4: The overall Spark-based architecture of DuaLip Solver

DuaLip has been open-sourced and is now available on GitHub. For more details on the library and its various usages, we invite you to take a look at our GitHub documentation for the most up-to-date information. We welcome contributions to improve the overall system.

Acknowledgments

It takes a lot of talent and dedication to build the AI products that drive our mission of connecting the world's professionals to make them more productive and successful. We would like to thank Lingjie Weng, Shipeng Yu, Hema Raghavan, Deepak Kumar, Romer Rosales, Suju Rajan, and Igor Perisic for their instrumental support, and Cyrus DiCiccio, Shaunak Chatterjee, Ankan Saha, Ajith Muralidharan, and Souvik Ghosh for the detailed and thoughtful discussions. We would also like to thank Rahul Mazumder for the great collaboration during the algorithmic development of the solver. Finally, we are grateful to our fellow team members from PYMK AI, Notifications AI, Jobs AI, and Ads AI for adoption and collaboration.

Topics: Artificial intelligence Open Source Scalability Machine Learning