Optimizing People You May Know (PYMK) for equity in network creation

Qiannan Y.

Engineering Manager @ LinkedIn | Machine Learning Expert

August 5, 2021

Co-authors: Qiannan Yin, Yan Wang, Divya Venugopalan, Cyrus Diciccio, Heloise Logan, Preetam Nandy, Kinjal Basu, and Albert Cui

LinkedIn’s mission is to be a place that is focused on creating economic opportunity and driving more equitable outcomes for every member of the global workforce. A key part of that mission is enabling people to connect with each other and build online professional networks. Networks matter because they can translate into tangible, significant professional opportunities. Because of the important role networks can play in our members’ lives, we’re continually looking at ways to improve the network experience for all members, such as our commitment to help close the network gap and our data-backed recommendations for how to boost your network. Our latest work to optimize member experiences and create more equity in connection opportunities involves one of the foundational features of network building on LinkedIn: our People You May Know (PYMK) recommendation system.

Introduction to People You May Know (PYMK)

One of the foundational features of network building on LinkedIn is a recommendation system called People You May Know (PYMK). PYMK has been a long-standing part of the LinkedIn platform, and is powered by some of our earliest machine learning (ML) algorithms. The goal of PYMK is to help members connect to people who may be relevant additions to their professional networks. There are different reasons why a member might request to be connected with another member on LinkedIn, such as staying in touch with past classmates or co-workers, or seeking job opportunities or career advice from a connection. Today, you can see this system in action on the LinkedIn “My Network” tab.

PYMK primarily uses data like the Economic Graph and platform interactions to mine features, and then applies ML algorithms to come up with relevant recommendations. Specifically, it uses a combination of linear and non-linear models to estimate the propensity of two members to connect. This estimate is the P(connect) score, and PYMK recommends a list of potential new connections ranked according to this score. However, as with any AI system, a significant challenge for the accuracy of this system is controlling for external sociological factors, like a member’s general visibility off-platform or the tendency for technologies (such as professional social networks or the internet) to be adopted gradually. This can lead to situations where AI-powered products reflect an existing bias towards some groups of people over others.

Over the last year, we made several changes to the underlying PYMK algorithms in order to improve the PYMK experience for all members. These changes had the practical effect of making PYMK a more equitable feature by making it more effective for members regardless of their existing network strength or frequency of platform usage—rather than disproportionately serving “power users” of the site. Furthermore, though we expected some traditional key engagement metrics for PYMK, such as invitations sent, to decrease as a result of these changes, we actually saw net engagement wins from our interventions. These results were similar to our previous experience with changing the LinkedIn Feed to also optimize for creators, and not just viewers; in that instance, too, moving away from strictly ranking feed updates based on, essentially, the potential for virality led to positive engagement wins. As more of the workforce has become remote over the last year, our intuition has been that the ability to network and make professional connections online will become a necessary skill for more and more people. AI-powered systems like PYMK are necessarily most valuable if they can address the needs of a broad cross-section of individuals.

In this post, we’ll walk through two of these changes, looking at the problems we needed to solve, how we implemented a solution, and the results we’ve seen so far from our work.

Holistically optimizing the sender-receiver PYMK experience

When a member wants to connect with someone on LinkedIn, she (the sender) will need to send an invitation to that person (the recipient). A connection is only formed after that invitation is accepted. In other words, a connection cannot be achieved unless the recipient/invitee has a chance to look at the invitation and wants to accept it. For better user experience and a healthier connection network ecosystem on LinkedIn, we need to ensure both members benefit from the connection.

Recently, the Growth Data Science team sought to tackle a problem that may initially seem disconnected from improving equality: the poor experience that PYMK provided to extremely popular LinkedIn members.

Impression discounting leads to member experience wins
There is a subset of members on LinkedIn who receive a large number of connection requests, e.g., an influencer in an industry, a high-profile senior executive, or a recruiter from a big company. At a high level, having a disproportionate number of connection requests may appear to simply run counter to our stated goal of closing the network gap. However, it can also lead to the member’s network becoming overrun with feed updates and notifications that may seem random or that come from members who are only tangentially relevant to their own career. A member inundated with invites may also simply miss relevant networking opportunities. This negative experience showed up both in the data about how these members were using PYMK and in the form of direct user feedback.

Figure: Example screenshot of a member having many pending invitations

In order to address this problem, we added a re-ranker on top of the P(connect) score by discounting/decaying the score of recipients with an excess number of invitations. In other words, as a recipient receives more and more invitations, the original score needs to be higher and higher in order for them to show up in PYMK results.

Figure: Illustration showing the effect of the re-ranker

To understand how the system works, let us take a simple example. In the above figure, we have a ranked list of recommendations, and the number on the top right shows the number of invitations already received by each member. Candidate A is ranked in the top position before we decay the score. To re-rank members, we compute newScore = score * df, where score is the P(connect) score and df is the decay factor in [0,1]. Suppose we want to deprioritize recipients who have received more than 10 invitations in the past week. After decay, A’s new score becomes smaller, so A is ranked below B, C, and D. In production, we use a piecewise linear function to compute the decay factor given the number of invitations received.
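The re-ranking step described above can be sketched in a few lines of Python. This is a minimal illustration, not the production implementation: the breakpoints of the piecewise linear function, the candidate scores, and the names decay_factor, Candidate, and rerank are all assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical breakpoints: (invitations received in the past week, decay factor).
# These values are illustrative assumptions, not production settings.
BREAKPOINTS = [(0, 1.0), (10, 1.0), (30, 0.5), (60, 0.1)]


def decay_factor(invites_received, breakpoints=BREAKPOINTS):
    """Piecewise linear decay factor in [0, 1] for a recipient."""
    if invites_received <= breakpoints[0][0]:
        return breakpoints[0][1]
    if invites_received >= breakpoints[-1][0]:
        return breakpoints[-1][1]
    # Linearly interpolate between the two surrounding breakpoints.
    for (x0, y0), (x1, y1) in zip(breakpoints, breakpoints[1:]):
        if x0 <= invites_received <= x1:
            t = (invites_received - x0) / (x1 - x0)
            return y0 + t * (y1 - y0)
    raise ValueError("breakpoints must be sorted by invitation count")


@dataclass
class Candidate:
    name: str
    p_connect: float       # original P(connect) score
    invites_received: int  # invitations received in the past week


def rerank(candidates):
    """Sort candidates by newScore = score * df, highest first."""
    scored = [(c.p_connect * decay_factor(c.invites_received), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(c.name, new_score) for new_score, c in scored]


# Made-up scores: A has the highest P(connect) but is overloaded with invitations.
candidates = [
    Candidate("A", 0.90, 30),
    Candidate("B", 0.80, 2),
    Candidate("C", 0.70, 5),
    Candidate("D", 0.60, 0),
]
ranking = rerank(candidates)
```

With these assumed numbers, A's score is halved and A drops below B, C, and D, mirroring the example in the figure. Keeping the decay factor at 1.0 up to the threshold means recipients below it are ranked exactly as before.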

Evaluating our approach using A/B testing
At LinkedIn, we make extensive use of A/B tests to evaluate the performance of most of our products and AI models. This approach extends to testing our PYMK models, in which members are randomly assigned to different treatment groups and see recommendations from different models. If a model has better recommendations, members will send more invitations. This is the impact on the sender side, which can be directly read out from the A/B testing results.

The impact of the PYMK model is not limited to the sender side, however. When members receive invitations, they will come to LinkedIn to view and accept the invitation. This is the impact on the receiver side, which is harder to measure because some recipients can receive invitations from multiple senders, so it is hard to attribute the impact to a particular sender.

Figure: Illustration showing the complexity of A/B testing for PYMK models

To overcome this challenge, we developed an attribution framework to attribute the sessions of recipients to the correct senders. For example, if the recipient gets a notification saying that Sender A wants to connect and then comes to LinkedIn to accept the invitation, the session will be attributed to Sender A. If the recipient comes to LinkedIn proactively but directly goes to accept the invitation from Sender B, then the session will be attributed to Sender B.
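The two attribution rules above can be sketched as follows. This is a simplified illustration under stated assumptions: the Session fields, the rule ordering, and the choice to leave other sessions unattributed are hypothetical and are not taken from the actual framework.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Session:
    recipient_id: str
    # Sender named in the notification that brought the recipient in, if any (assumed field).
    notification_sender_id: Optional[str] = None
    # Sender whose invitation the recipient went directly to accept, if any (assumed field).
    first_accepted_sender_id: Optional[str] = None


def attribute_session(session):
    """Return the sender credited with this recipient session, or None."""
    # Rule 1: a notification about a sender's invitation triggered the session.
    if session.notification_sender_id is not None:
        return session.notification_sender_id
    # Rule 2: the recipient came proactively and directly accepted an invitation.
    if session.first_accepted_sender_id is not None:
        return session.first_accepted_sender_id
    # Assumption: any other session is not credited to a sender.
    return None


def sessions_by_variant(sessions, sender_variant):
    """Count attributed recipient sessions per sender-side A/B variant."""
    counts = Counter()
    for session in sessions:
        sender = attribute_session(session)
        if sender in sender_variant:
            counts[sender_variant[sender]] += 1
    return counts


sessions = [
    Session("r1", notification_sender_id="A"),
    Session("r2", first_accepted_sender_id="B"),
    Session("r3"),  # no invitation involved, so unattributed
]
sender_variant = {"A": "treatment", "B": "control"}
counts = sessions_by_variant(sessions, sender_variant)
```

Once each recipient session is credited to a sender, it can be rolled up by that sender's experiment variant, which is what makes the receiver-side impact readable from the A/B test.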

Figure: Illustration of the attribution framework developed for testing

Experimental results
With the addition of PYMK decay, we expected the user experience to improve because we would prevent members from receiving too many invitations. On the other hand, we also expected a small decrease in member engagement: fewer invitations would be sent, and we assumed members would be less active as a result. However, we were still willing to move forward with the change despite that predicted decrease in engagement metrics in order to achieve a better user experience.

After implementation, we observed the alleviation of the member pain point as expected. We reduced the number of overloaded recipients (members who received too many invitations in the past week) on our platform by 50%, and noticed that the user experience of sending and receiving connection requests had improved, according to our survey results.

On the other hand, we actually achieved member engagement wins, which was counterintuitive. While connection requests sent did decrease by 1%, sessions from the recipient side increased by 1%. This is because invitations shifted toward members who had previously received fewer of them, and for those members each invitation is more effective at driving engagement. The session impact from the sender side was neutral. So, overall, there was a 1% session win.

From these experiments and changes to PYMK, it was clear that the distribution of recommendations, not just the structural makeup of the ties that resulted, matters greatly for overall platform health.

Interventions to assist less-engaged members

Another key area of consideration for LinkedIn is providing a good experience for members who are at varying stages of their career journey or who may not yet be familiar with some features of LinkedIn (or online social networks generally). For these members, PYMK is a feature that has historically had much potential upside: members can find coworkers and peers faster, as well as identifying other people who might be friends, potential mentors, or future collaborators. Intuitively, a member who has very few connections might receive greater incremental utility from adding one more connection than a member who has hundreds or even thousands.

Like many AI algorithms, the ones that underpin our PYMK recommendations learn from recommendations that result in successful matches. Frequent members (FMs; members who are more engaged on LinkedIn) tend to have greater representation in the data used to train these algorithms than their less active counterparts, infrequent members (IMs). In ML applications, from computer vision to recommendation systems, algorithms can become biased for some groups due to uneven representation in training data. In the case of PYMK, we observed that frequent members (who are more represented in training data due to their higher activity levels on the site) are typically recommended to other members at a higher rate. Subsequently, these members can make even more connections, giving them further representation in the training data.

So how can we ensure that PYMK is fairly representing members from both groups and avoid reinforcing existing biases in networking behavior?

As discussed in our prior blog post about the use of the LinkedIn Fairness Toolkit (LiFT) for PYMK fairness use cases, we have developed and tested methods to re-rank connection suggestions according to equality of opportunity and equalized odds. In this case, we have applied these fairness notions to the PYMK problem of fairly representing infrequent and frequent members by giving qualified IMs and FMs equal representation in recommendations. As a result, we saw a 5.44% increase in invitations sent to IMs and a 4.8% increase in connections made by IMs, while remaining neutral on the respective metrics for FMs. This is interesting because typically, when invites are shifted from the FM group to the IM group, we would expect to see a metric increase for the latter and a decrease for the former. However, we observed neutral metrics for FMs and positive metrics for IMs, which indicates that recommendation quality has improved overall.
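To make the fairness notion concrete, the sketch below measures equality of opportunity for a set of recommendations: among qualified candidates, the share that actually gets recommended should be similar for IMs and FMs. It only measures the gap and is not the re-ranking method itself; the function name, the tuple layout, and the sample data are assumptions made for illustration.

```python
from collections import defaultdict


def opportunity_rates(candidates):
    """Share of qualified candidates that were recommended, per group.

    candidates: iterable of (group, is_qualified, is_recommended) tuples.
    Unqualified candidates are ignored, as equality of opportunity
    only compares outcomes among qualified candidates.
    """
    qualified = defaultdict(int)
    recommended = defaultdict(int)
    for group, is_qualified, is_recommended in candidates:
        if is_qualified:
            qualified[group] += 1
            if is_recommended:
                recommended[group] += 1
    return {group: recommended[group] / qualified[group] for group in qualified}


# Made-up data: four qualified candidates per group, plus one unqualified each.
candidates = [
    ("IM", True, True), ("IM", True, False), ("IM", True, False), ("IM", True, False),
    ("IM", False, False),
    ("FM", True, True), ("FM", True, True), ("FM", True, True), ("FM", True, False),
    ("FM", False, True),
]
rates = opportunity_rates(candidates)
gap = abs(rates["FM"] - rates["IM"])
```

In this made-up data, qualified FMs are recommended three times as often as qualified IMs. A re-ranker that targets equality of opportunity would aim to shrink that gap toward zero.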

Conclusion

As stated previously, one goal of our responsible AI and responsible design efforts is to close the gap in economic opportunity that some experience due to where they grew up, where they went to school, or where they work. Another goal is to identify and mitigate systemic inequality and any unfair biases in our products, reflecting our position as a truly global platform.

Given the success of our work with frequent and infrequent members, the team is also working on short- and long-term testing to understand the effects of modifying exposure in PYMK recommendations to obtain equality of opportunity. Future efforts will compare groups using protected attributes like gender, for instance.

Editor’s note: Members who are interested in learning about the Follow button and other content sharing tools should visit LinkedIn for Creators.
