Keeping LinkedIn professional by detecting and removing inappropriate profiles
January 16, 2020
Our members place their trust in us, and expect and deserve a safe and trusted community where they can express themselves professionally. We are constantly investing in a variety of systems (described in our earlier posts “Automated Fake Account Detection at LinkedIn,” “Defending Against Abuse at LinkedIn’s Scale,” and “How We’re Protecting Members From Fake Profiles”) that help automatically detect and remediate behaviors that violate our Terms of Service, such as spam, harassment, scams, and account takeovers. In this post, we’ll detail our approach for handling one particular type of bad behavior: profiles that contain inappropriate content, ranging from profanity to advertisements for illegal services.
Finding and removing profiles with inappropriate content in an effective, scalable manner is one way we’re constantly working to provide a safe and professional platform. This blog post outlines how our team’s approach to this challenge has evolved and what we’ve learned along the way.
Scaling our approach
Our initial approach was to build a blocklist: a set of words and phrases that violate our Terms of Service or Community Guidelines. When an account was found to contain any of these words or phrases, it was marked as fraudulent and removed from LinkedIn. While this approach was successful at containing some of the most egregious content, it had a few drawbacks:
- Scalability. This approach is a fundamentally manual process, and significant care must be taken when evaluating words or phrases.
- Context. Many words may be used in both appropriate and inappropriate contexts. For example, the word “escort” is often associated with prostitution, but may also be used in contexts such as a “security escort” or “medical escort.”
- Maintainability. Blocklists only grow larger over time as more phrases are identified. Tracking the performance of the list as a whole is simple, but tracking it phrase by phrase is non-trivial, and keeping the system stable requires significant engineering effort.
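A minimal Python sketch of the blocklist rule described above, assuming a simple whole-word, case-insensitive match. The single-entry `BLOCKLIST` and the helper names are hypothetical, since the real list is not published. The sketch illustrates the context drawback: the rule flags a legitimate “security escort” profile just as readily as an advertisement.

```python
import re

# Hypothetical blocklist for illustration only; the real list is not public.
BLOCKLIST = ["escort"]

# Compile each phrase as a whole-word, case-insensitive pattern.
PATTERNS = {
    phrase: re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
    for phrase in BLOCKLIST
}

def matched_phrases(profile_text):
    """Return every blocklisted phrase that appears in the profile text."""
    return [phrase for phrase, pattern in PATTERNS.items() if pattern.search(profile_text)]

def is_flagged(profile_text):
    """Blocklist rule: any single match marks the profile as inappropriate."""
    return bool(matched_phrases(profile_text))

# The rule cannot see context, so both profiles are flagged.
print(is_flagged("Escort services available, message me"))  # True
print(is_flagged("Security escort for executive travel"))    # True
```

Because `matched_phrases` reports which entry fired, it is also the natural hook for the per-phrase tracking mentioned under Maintainability.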
Because of these challenges, especially the manual effort involved, this approach limits the total volume of accounts we can surface. In addition, we must be highly confident that an account is inappropriate before removing it from the site. Because of the context problem, meeting this standard means we can only include phrases we are highly confident would never be used in a legitimate profile. For example, we cannot use words like “escort” that also have legitimate uses.
New approach: A machine learned model
To improve our performance in this area, we decided to switch to a machine learning approach. Our model is a text classifier trained on public member profile content. To train this classifier, we first needed to build a training set consisting of accounts labeled as either “inappropriate” or “appropriate.” The “inappropriate” labels consist of accounts that have been removed from the platform due to inappropriate content. The bulk of these accounts were captured using the blocklist-based approach described above, while a smaller portion were surfaced by manual review of accounts reported by our members.
Only a very small portion of accounts on LinkedIn have ever been restricted for containing this type of content. If we trained our model on the entire public member base, this extreme imbalance between appropriate and inappropriate accounts would make it difficult for the model to learn the patterns that characterize inappropriate accounts. To mitigate this problem, we obtained our “appropriate” labeled accounts by downsampling the entire LinkedIn member base (660+ million members), drawing only a small subset of it.
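The labeling and downsampling steps above can be sketched in Python as follows. This is a minimal illustration, not LinkedIn’s pipeline: it assumes that members who were never removed can stand in for “appropriate” accounts, and the `negatives_per_positive` ratio and function name are hypothetical, since the post does not state the sampling ratio.

```python
import random

def build_training_set(removed_ids, member_ids, negatives_per_positive, seed=0):
    """Return shuffled (account_id, label) pairs: 1 = inappropriate, 0 = appropriate.

    Every removed account is kept as a positive; the appropriate class is a
    downsampled subset of members who were never removed (an assumption).
    """
    rng = random.Random(seed)
    positives = sorted(set(removed_ids))
    removed = set(positives)
    candidates = [m for m in member_ids if m not in removed]
    # Keep at most negatives_per_positive appropriate accounts per positive.
    k = min(len(candidates), negatives_per_positive * len(positives))
    negatives = rng.sample(candidates, k)
    examples = [(acct, 1) for acct in positives] + [(acct, 0) for acct in negatives]
    rng.shuffle(examples)
    return examples

members = [f"m{i}" for i in range(10_000)]
data = build_training_set(["m1", "m2", "m3"], members, negatives_per_positive=10)
print(len(data), sum(label for _, label in data))  # 33 3
```

Fixing a ratio of negatives to positives, rather than a sampling rate, keeps the class balance stable as the number of removed accounts grows.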
For our model, we used a Convolutional Neural Network (CNN), a specific type of deep learning architecture. CNNs are particularly useful for data with “spatial” properties, meaning that the fact that two feature values are adjacent to each other carries information. Because of this quality, they perform particularly well on image and text classification tasks. For example, a CNN can learn that while the word “escort” is often associated with inappropriate content, its meaning changes entirely in phrases like “security escort” or “medical escort.”
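To make the adjacency idea concrete, here is a toy text-CNN forward pass in plain Python: embed each token, slide width-2 filters over consecutive token pairs, apply ReLU, max-pool over positions, and feed the result to a logistic output. The post does not describe LinkedIn’s architecture, so everything here is an assumption: the 2-dimensional embeddings and all weights are hand-set for illustration rather than learned, and the names are hypothetical.

```python
import math

# Toy 2-d embeddings (hypothetical, hand-set rather than learned):
# dim 0 ~ "is the word escort", dim 1 ~ "is a protective-service context word".
EMBED = {
    "escort": [1.0, 0.0],
    "security": [0.0, 1.0],
    "medical": [0.0, 1.0],
}
UNKNOWN = [0.0, 0.0]  # every other token, including padding

# Each filter spans 2 consecutive tokens: (weights for previous token,
# weights for current token, bias). A real CNN would learn these.
FILTERS = [
    ([0.0, -1.0], [1.0, 0.0], 0.0),  # fires on "escort" NOT preceded by a context word
    ([0.0, 1.0], [1.0, 0.0], -1.0),  # fires on "security escort" / "medical escort"
]
OUT_WEIGHTS = [3.0, -3.0]
OUT_BIAS = -1.0

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def score(text):
    """Embed, convolve (width 2), ReLU, max-pool, then a logistic output."""
    tokens = ["<pad>"] + text.lower().split()  # left-pad so the first word gets a window
    vecs = [EMBED.get(t, UNKNOWN) for t in tokens]
    pooled = []
    for w_prev, w_curr, b in FILTERS:
        activations = [
            max(0.0, dot(w_prev, vecs[i - 1]) + dot(w_curr, vecs[i]) + b)  # ReLU
            for i in range(1, len(vecs))
        ]
        pooled.append(max(activations, default=0.0))  # max-over-time pooling
    logit = dot(OUT_WEIGHTS, pooled) + OUT_BIAS
    return 1.0 / (1.0 + math.exp(-logit))  # probability of "inappropriate"

print(round(score("escort services available"), 3))  # 0.881
print(round(score("security escort"), 3))            # 0.018
```

The same word receives opposite scores because the second filter only responds when “escort” directly follows a context word, which a bag-of-words blocklist cannot express.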
Challenge: Training set bias
One of the most significant challenges in any machine learning task is assembling a training set that contains enough breadth of information to be useful. To put it simply, the model will only learn what you tell it to. This is especially challenging in the anti-abuse space, where examples of bad behavior are scarce and often unreliable. In general, training labels come from accounts that have previously been restricted for a variety of reasons. When new models are trained on these labels, they are inherently biased toward re-learning the patterns of the existing systems that produced them.
In the case of this project, a large portion of the positive labels came from restrictions based on inappropriate words or phrases. This created a significant risk: without careful curation of the training set, the model might simply learn to mimic the previous approach. To understand why, consider the word “escort” from the earlier example. “Escort” is an extremely common word in profiles associated with prostitution services, so a significant portion of our inappropriate labels contain it. In comparison, profiles that use the word in a permitted way (e.g., “security escort”) make up only a tiny portion of LinkedIn’s 660M+ member base. The problem arises because appropriate labels are downsampled but inappropriate labels are not, so the few appropriate profiles that use “escort” are thinned out even further. The result is a training set in which inappropriate uses of the word far outnumber appropriate ones, even though the opposite is true across the global member base: permitted uses of “escort” are much more common than prohibited ones.
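The size of this skew can be estimated with a back-of-the-envelope calculation. The counts and the 1% sampling rate below are hypothetical (the post does not publish real figures), and the sketch assumes appropriate accounts are kept at a uniform rate while every inappropriate account is kept.

```python
def share_inappropriate(bad_with_word, good_with_word, good_sample_rate):
    """Fraction of profiles containing a word that are inappropriate, in the
    full member base and (in expectation) in the downsampled training set."""
    population = bad_with_word / (bad_with_word + good_with_word)
    # Only appropriate accounts are downsampled; inappropriate ones are all kept.
    sampled_good = good_with_word * good_sample_rate
    training = bad_with_word / (bad_with_word + sampled_good)
    return population, training

# Hypothetical counts for illustration, not LinkedIn data.
population, training = share_inappropriate(1_000, 10_000, 0.01)
print(f"{population:.3f} {training:.3f}")  # 0.091 0.909
```

Under these assumed numbers, a word that is inappropriate in about 9% of the profiles that use it appears inappropriate in about 91% of the training examples that contain it, which is exactly the blocklist-like pattern a classifier would learn to mimic.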
To address this issue, we had to build the training set systematically with these challenges in mind. We identified a variety of problematic words that were causing high levels of false positives, then sampled accounts from the member base that contained these words. These accounts were manually labeled and added to our training set.
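A minimal Python sketch of this targeted sampling step, assuming profiles are available as an id-to-text mapping and that already-removed accounts are excluded from the review queue. The function names, the `per_term` cap, and the toy profiles are hypothetical; the manual labeling itself happens outside the code.

```python
import random
import re

def sample_for_review(profiles, problem_terms, per_term, removed_ids=(), seed=0):
    """For each false-positive-prone term, sample up to per_term profiles that
    contain it and have not been removed; these go to manual labeling."""
    rng = random.Random(seed)
    removed = set(removed_ids)
    queue = {}
    for term in problem_terms:
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        hits = [pid for pid, text in profiles.items()
                if pid not in removed and pattern.search(text)]
        queue[term] = rng.sample(hits, min(per_term, len(hits)))
    return queue

def merge_reviewed(training_set, reviewed_labels):
    """Append manually labeled (account_id, label) pairs not already in the set."""
    seen = {acct for acct, _ in training_set}
    return training_set + [(acct, y) for acct, y in reviewed_labels if acct not in seen]

profiles = {
    "a": "Security escort for executive travel",
    "b": "Medical escort and registered nurse",
    "c": "Escort services, message me",
    "d": "Data scientist",
}
queue = sample_for_review(profiles, ["escort"], per_term=5, removed_ids={"c"})
print(sorted(queue["escort"]))  # ['a', 'b']
```

Sampling per term, rather than uniformly, guarantees that each problematic word contributes reviewed appropriate examples no matter how rare its legitimate uses are in the member base.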
Impact and next steps
We've leveraged this particular model as a part of our ongoing work to take down abusive accounts on our platform. This system scores new accounts in production daily, and was also run on the existing member base to identify old accounts containing inappropriate content. Moving forward, we will continue to refine and expand our training set to increase the scope of content we can identify with this model. In addition, we intend to leverage Microsoft translation services to ensure strong performance across all languages supported by the LinkedIn platform.
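The post notes that we must be highly confident before removing an account. One common way to encode that requirement in a daily scoring job is to pick the score threshold from a precision target on labeled validation data. The sketch below is an assumption, not a description of LinkedIn’s production system; the precision target, the threshold-selection rule, the function names, and the sample scores are all hypothetical.

```python
def threshold_for_precision(scores, labels, target_precision):
    """Lowest threshold whose precision on a labeled validation set
    reaches target_precision; None if no threshold does."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    best = None
    true_pos = 0
    for i, (score, label) in enumerate(ranked):
        true_pos += label
        # Only evaluate at the end of a run of tied scores, since a threshold
        # flags every account whose score is >= it.
        last_of_tie = i + 1 == len(ranked) or ranked[i + 1][0] != score
        if last_of_tie and true_pos / (i + 1) >= target_precision:
            best = score  # scores descend, so later hits are lower thresholds
    return best

def accounts_to_flag(daily_scores, threshold):
    """Return ids (sorted) of accounts scoring at or above the threshold."""
    if threshold is None:
        return []
    return sorted(acct for acct, s in daily_scores.items() if s >= threshold)

# Hypothetical validation scores and labels (1 = inappropriate).
t = threshold_for_precision([0.99, 0.95, 0.90, 0.60, 0.40], [1, 1, 0, 1, 0], 0.95)
print(t)  # 0.95
print(accounts_to_flag({"x": 0.97, "y": 0.50, "z": 0.95}, t))  # ['x', 'z']
```

Choosing the lowest threshold that still meets the precision target surfaces as many accounts as possible without relaxing the confidence bar.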
Detecting and preventing abuse on LinkedIn is an ongoing effort requiring extensive collaboration between multiple teams. I would like to acknowledge everyone involved for their ongoing efforts in tackling these difficult problems to help keep LinkedIn a safe and professional platform. I would also specifically like to thank Carlos Faham, Jenelle Bray, Grace Tang, Zhou Jin, Veronica Gerassimova, and Rohan Ramanath for their invaluable assistance in this ongoing work.