Fairness

A closer look at how LinkedIn integrates fairness into its AI products

Co-authors: Heloise Logan, Preetam Nandy, Kinjal Basu, and Sakshi Jain

At LinkedIn, we work constantly to improve our platform with evolving AI models and systems. Delivering fair and equitable experiences for each of our nearly 800M members is paramount to this work, and we have designed our AI systems in ways that help us provide the right protections, mitigate unintended consequences, and ultimately better serve our members, customers, and society.

As part of our ongoing journey to build on our Responsible AI program, we wanted to share more insight into how we think about algorithmic fairness and explainability. We think of algorithmic fairness as following the principle that AI systems should provide equal opportunities to equally qualified members; and we think about algorithmic explainability as unraveling the AI mystery box, helping modelers and members to better understand the decisions that models make.
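
One common way to make this principle concrete is an equal-opportunity-style condition: among equally qualified members, the chance of receiving a positive prediction should not depend on group membership. This is an informal framing for this post, not necessarily the exact definition used in every product:

P(positive prediction | member is qualified, group = a) = P(positive prediction | member is qualified, group = b), for every pair of groups a and b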

In this post, we will discuss our progress by detailing an internal, platform-based solution that can support numerous AI models being trained across different product verticals to automatically detect and mitigate unfair bias with minimal developer touchpoints. This system helps ensure that AI modelers can easily iterate on their models knowing that if and when fairness issues are detected, they will be flagged. Ultimately, we hope to contribute many of the designs described in this post to the LinkedIn Fairness Toolkit (LiFT), which we announced last year.

Overall design

In this section, we want to share how we design a practical fairness measurement and mitigation solution that can be readily embedded into the existing machine learning development lifecycle. This gives our AI modelers visibility, control, and flexibility so they can easily act on identified fairness gaps.

The essential steps of our comprehensive solution include the following (Figure 1), with a simplified sketch of the loop after the list:

  1. Start by evaluating the current model with the fairness metric. 

  2. If the model passes the fairness evaluation, mitigation may not be required.

  3. If the model fails the fairness evaluation, our system learns and appends a post-processing layer in the form of a score transformation after the original model’s scoring step.

  4. The mitigated model is then launched as an A/B experiment.

  5. We collect online experimental data and validate the fairness metric.
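
To make the workflow concrete, here is a minimal sketch of the loop above. All of the names (evaluate_fairness, train_mitigation_reranker, launch_ab_test) are illustrative placeholders passed in as callables, not the actual internal platform APIs.

def fairness_workflow(model, test_set, evaluate_fairness,
                      train_mitigation_reranker, launch_ab_test,
                      gap_threshold=0.05):
    """Steps 1-5: evaluate, mitigate if needed, then validate online."""
    # Step 1: evaluate the current model with the chosen fairness metric.
    gap = evaluate_fairness(model, test_set)

    # Step 2: if the gap is within tolerance, mitigation may not be required.
    if gap <= gap_threshold:
        candidate = model
    else:
        # Step 3: learn a post-processing score transformation and append it
        # after the original model's scoring step.
        reranker = train_mitigation_reranker(model, test_set)
        candidate = lambda features: reranker(model(features))

    # Step 4: launch the (possibly mitigated) model as an A/B experiment.
    online_outcomes = launch_ab_test(candidate)

    # Step 5: validate the fairness metric on the realized online outcomes.
    return evaluate_fairness(candidate, online_outcomes)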


Fig 1. Overall measurement and mitigation system architecture that fits into every LinkedIn AI vertical

Steps 1-3 above assess the intent of not doing harm (yellow triangle in the diagram), while Steps 4 and 5 capture actually making a positive impact (blue triangle). We also want the solution to be delivered in a way that fits naturally into a user's existing AI workflow, requiring minimal effort while meeting the maintainability and extensibility design considerations detailed in the next section.

Figure 2 shows a typical ML lifecycle that consists of a model training phase and a serving phase.

Fig 2. A typical ML system lifecycle, consisting of a model training phase and a serving phase

Figure 3 illustrates the components you would add to apply the fairness measurement and mitigation solution directly in the model training stage.


Fig 3. Adding fairness measurement and mitigation by inserting two additional components, the fair model analyzer and the mitigation trainer, to directly correct for detected unfairness in models before experimental launch

In addition to pre-model-launch mitigation, we also want a way to measure post-model-launch fairness using realized outcomes. This is useful in two ways: it validates the effect of a mitigated model, and it measures the impact of an existing unmitigated model. Both scenarios are covered by applying the solution directly to realized outcome data collected from experimentation. This decouples the solution from the model training workflow, making it agnostic to the recommendation architecture, and it evaluates the direct impact of the recommendation as it operates in a highly complex ecosystem.
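
As an illustration, here is a minimal sketch of measuring a fairness gap directly on realized outcome data, assuming each record carries a protected-attribute group, a qualification label, and the realized outcome. The equal-opportunity-style gap below is a simplified stand-in for the actual metric.

from collections import defaultdict

def realized_outcome_gap(records):
    """Return the spread in positive-outcome rate among qualified members
    across groups, plus the per-group rates themselves."""
    positives = defaultdict(int)   # qualified members with a positive outcome
    qualified = defaultdict(int)   # qualified members per group
    for r in records:
        if r["label"] == 1:        # restrict to equally qualified members
            qualified[r["group"]] += 1
            positives[r["group"]] += r["outcome"]
    rates = {g: positives[g] / qualified[g] for g in qualified}
    return max(rates.values()) - min(rates.values()), rates

# Toy experiment data collected after launch:
gap, rates = realized_outcome_gap([
    {"group": "A", "label": 1, "outcome": 1},
    {"group": "A", "label": 1, "outcome": 1},
    {"group": "B", "label": 1, "outcome": 1},
    {"group": "B", "label": 1, "outcome": 0},
])
# gap == 0.5, rates == {"A": 1.0, "B": 0.5}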

Design considerations

The fairness solution needs to support the user workflow as depicted in Figure 1. Most importantly, we want to ensure the design inserts the mitigation trainer right after the fairness measurement step, so that mitigation follows detection naturally. 

Ease of use and maintainability
The overall goal is to provide a framework and tooling to measure unfair bias in a model and to support a mitigation reranker that sits on top of the model to correct for its fairness issues. This suite of provisions should meet the following criteria:

  1. Agnostic to clients' modeling architectures. It doesn't matter whether a client's ML system is a single model or a combination of multiple models in a multi-objective optimization setup, a logistic regression learner, or a deep neural network: we provide a post-processing reranking layer that goes on top of the original score regardless of how that score is produced (see the sketch after this list).

  2. Widely applicable across vertical teams with minimal maintenance cost. Since the development and delivery of AI models and recommender systems are powered by ProML, the core machine learning infrastructure at LinkedIn, it is natural for the fairness solution to be a part of the ProML offering and for it to highly leverage the existing framework.
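
The first criterion is easiest to see in code. The sketch below, with purely illustrative names and per-group additive shifts, wraps any scoring function with a post-processing transformation, so the mitigation layer never needs to know how the underlying score was produced.

def make_mitigated_scorer(base_scorer, group_shifts):
    """Wrap any scoring function with a post-processing score transformation."""
    def mitigated_scorer(features, group):
        raw_score = base_scorer(features)                 # original model, untouched
        return raw_score + group_shifts.get(group, 0.0)   # post-hoc correction
    return mitigated_scorer

# Works on top of any base model, e.g. a toy linear scorer:
toy_scorer = lambda features: 0.3 * features["feature_x"]
mitigated = make_mitigated_scorer(toy_scorer, group_shifts={"group_b": 0.02})
print(mitigated({"feature_x": 1.0}, group="group_b"))     # approximately 0.32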

Extensibility
The measurement component allows different fairness metrics to be plugged in, so we can explore and evaluate models against a range of fairness definitions. Similarly, the mitigation mechanism inside the mitigation trainer is also pluggable, facilitating experimentation with different mitigation strategies.

  1. Methodology improvements: It is important for us to understand the impact of a mitigation strategy on the business and fairness metrics, as well as the potential trade-offs that could result when it is applied to different vertical applications. Additionally, in a marketplace setup where we recommend candidates (e.g., Recruiter Search, People You May Know), it is important to support measurement and mitigation based on group attributes, which can be viewer-based, candidate-based, or a combination of both.

  2. Going beyond group fairness: Another aspect of extensibility is the ability to handle intersectionalities. Both measurement and mitigation are designed to take in group attributes that are not restricted to a single dimension (for example, instead of mitigating across the global male versus female population, we may want to investigate and mitigate gaps per geographical region, e.g., male members in the US vs. male members in Japan); a sketch of this follows the list.
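
Here is a minimal sketch of both extensibility points, with illustrative names: the fairness metric is a plain callable that can be swapped out, and groups are keyed by tuples of attributes rather than a single dimension.

from collections import defaultdict

def group_positive_rates(records, group_attrs):
    """Aggregate positive prediction rates per (possibly intersectional) group key."""
    totals, positives = defaultdict(int), defaultdict(int)
    for r in records:
        key = tuple(r[a] for a in group_attrs)        # e.g. ("male", "US")
        totals[key] += 1
        positives[key] += r["predicted_positive"]
    return {k: positives[k] / totals[k] for k in totals}

def max_gap(rates):
    """One simple pluggable metric: the spread of per-group rates."""
    return max(rates.values()) - min(rates.values())

records = [
    {"gender": "male", "country": "US", "predicted_positive": 1},
    {"gender": "male", "country": "JP", "predicted_positive": 0},
]
# Single-dimension measurement vs. an intersectional slice:
print(max_gap(group_positive_rates(records, ["gender"])))             # 0.0
print(max_gap(group_positive_rates(records, ["gender", "country"])))  # 1.0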

System components

Fair model analyzer
In a model training workflow, there is typically a step where a test set is used to evaluate the trained model. This test set includes predicted scores, labels, and member IDs, and we retrieve protected attributes through a privacy-preserving setup, as previously discussed. AI engineers building the models do not need to worry about access to such data. In fact, any dataset containing at least these three pieces of information can be evaluated by the fair model analyzer. This means we can also leverage the fair model analyzer for post-model-launch evaluation by composing the input from online experimental outcome data, thus measuring the realized impact of the launched model.

Based on the fair model analyzer's evaluation, we provide our clients with a visualization of how the model performs across different attribute groups, as well as a quantifying metric. The quantifying metric is the decision variable for applying a mitigation reranker, while the visualization provides additional insights.
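
Sketched below is what that analyzer contract might look like, with illustrative names and a per-group AUC comparison standing in for the actual quantifying metric: any dataset with scores, labels, and a joined protected attribute can be analyzed, and the resulting scalar gap gates mitigation.

from collections import defaultdict
from sklearn.metrics import roc_auc_score

def analyze_model(rows, gap_threshold=0.02):
    """rows: dicts with a predicted 'score', a 'label', and a joined protected 'group'."""
    by_group = defaultdict(lambda: ([], []))
    for r in rows:
        labels, scores = by_group[r["group"]]
        labels.append(r["label"])
        scores.append(r["score"])

    # Per-group model performance, surfaced to clients as a visualization
    # (assumes every group has both positive and negative labels).
    per_group_auc = {g: roc_auc_score(l, s) for g, (l, s) in by_group.items()}

    # A single quantifying metric acts as the decision variable for mitigation.
    gap = max(per_group_auc.values()) - min(per_group_auc.values())
    return {"per_group_auc": per_group_auc,
            "gap": gap,
            "needs_mitigation": gap > gap_threshold}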

Mitigation trainer
The mitigation trainer component learns the amount of correction to apply by taking into account the fair model analyzer’s result. Naturally, measurement and mitigation have to align. The trainer supports different mitigation algorithms, allowing us to easily evaluate the impact of each algorithm on various metrics (e.g., fairness metric, business metric) of different verticals. The core algorithms powering these are directly called from the LinkedIn Fairness Toolkit (LiFT), and we continue to add new algorithms to the open-source project and leverage them through this platform.
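
For intuition, here is a minimal sketch of a mitigation trainer in this spirit. The toy rule (align each group's mean score to the overall mean) is purely illustrative and stands in for the actual algorithms called from LiFT; it emits a reranker that can be stacked on the original scores, for example through a wrapper like the one sketched earlier.

from collections import defaultdict

def train_mitigation_reranker(rows):
    """rows: dicts with the original 'score' and the protected 'group'."""
    sums, counts = defaultdict(float), defaultdict(int)
    for r in rows:
        sums[r["group"]] += r["score"]
        counts[r["group"]] += 1

    overall_mean = sum(sums.values()) / sum(counts.values())
    # Learned correction per group: how far its scores sit from the overall mean.
    shifts = {g: overall_mean - sums[g] / counts[g] for g in sums}

    def reranker(score, group):
        # Post-processing layer appended after the original model's scoring step.
        return score + shifts.get(group, 0.0)
    return reranker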

By stacking the fair model analyzer and mitigation trainer together, this internal, platform-based solution aids the model development process by flagging and mitigating fairness issues when they are detected. Since all AI verticals at LinkedIn are onboarded to ProML, the solution fits naturally into every vertical's stack.

What’s next

While we have made a lot of progress in measuring and mitigating unfair bias at scale, this is just the beginning. We are continually ideating and experimenting with methodologies to more accurately understand unfair bias in AI models, increasing granularity by slicing our analysis across interesting dimensions. While most of this post discusses fairness mitigation based on members' protected attributes, we have also made significant progress in understanding and mitigating unfair bias in the subject of the content itself. For example, do our models score the phrase "women are bad drivers" differently from the phrase "men are bad drivers", undermining fairness in our content moderation tools? We are excited to share more about our work on this topic in upcoming blog posts.

While our goal is to ensure we have a solution at LinkedIn that provides fair recommendations, we would also like to share our learnings and tools to communities beyond LinkedIn, so that together, we can ensure all AI products are delivering fair recommendations to the users they serve. 

Acknowledgments

We would like to thank our current and former colleagues who have contributed to the realization of designing and building the overall measurement and mitigation solution. Special thanks to Sameer Indarapu, who worked closely with us on the design of the mitigation system, Cyrus Diciccio, who developed the initial mitigation algorithm, Dean Young, who contributed to the fair model analyzer development, and Xiaodan Sun and Yu Gong, who provided ProML technical support and guidance. We also thank our partners in various client teams who provided us feedback on usability, and Shaunak Chatterjee, Ram Swaminathan, Romer Rosales, and Ya Xu for their continued support of this fairness initiative.