Bringing Salary Transparency to the World

This blog post was updated on June 15, 2017

Coauthor: Santosh Kumar Kancha

Many factors go into a decision about accepting a new job opportunity. What are the values of the company? Does the role fit my skillset? Is there a growth trajectory? How long is my commute going to be? What are the benefits? Who would I be working with? But one of the most basic factors we typically consider in any job is whether it meets our financial needs.

Today, we’re excited to introduce our new LinkedIn Salary product. This is a tool that will not only help our members understand their value in the marketplace, but also help LinkedIn as an organization better understand the economic aspects of the Economic Graph. This data will help us learn how various roles are valued in different regions of the world and how years of experience and seniority affect compensation, and it will help us do our part to bring salary transparency to the world. At LinkedIn, we believe that by combining our detailed understanding of our member base with an understanding of what employers value, we can empower people around the globe with insights. This includes how they can understand their earning potential today, and also how to continue to grow in their careers through the identification and acquisition of new, valuable, and in-demand skills.

Why LinkedIn?

With 460+ million members, a deep trove of structured information that helps us understand a person’s background and skills, and an engaged member base, LinkedIn may be the only organization in the world with the opportunity to collect and leverage this data at scale, turning it into rich and accurate insights for our members. Today, searching for salary insights on the web yields a number of different free services with wildly varying ranges of salary estimates. LinkedIn’s connections to all aspects of the Economic Graph (including employers, industries, regions, individual skillsets, years of experience, background, etc.) allow us, over time, to provide richer and more accurate insights than any of these organizations. We are also exploring a formal process for companies to directly verify salaries through our service, further increasing the accuracy of, and confidence in, the values we reflect.

Technical goals

Give to get
Employers have access to paid third-party enterprise solutions that provide salary information along many dimensions (title, location, company, seniority, etc.). To level the playing field, we want to bring the same depth of information to our members for free. This will encourage employers to be more transparent with the salaries they offer, job listings they post, and future employees they approach. When members submit their own salary information, we give them one year of full access to salary insights. After one year, members can extend full access by submitting their current salary information.

For our paying Premium Job Seeker subscription members, the features of LinkedIn Salary are provided without restriction, and with additional, premium salary-related features available throughout our job seeking experiences.

Privacy
One of the interesting dichotomies with compensation data is that many users want this information, but don’t want to have their individual data exposed or connected back to them. Salary information is personal to each of our members. With this in mind, and consistent with our Members First organizational philosophy, one of the first goals we established when we embarked on this project was to provide powerful salary insights in aggregate without risking an individual’s private information. In the end, we built a salary collection system that protects the identities of our members, which was no easy task. Parts of our approach are detailed later in this article.

Security
Security was a top priority in the creation of the compensation system backend, and goes hand-in-hand with our privacy goals. There are a number of factors that we considered when designing this experience, including:

Encryption
In the first step of the salary collection, members’ identities are separated from their submission data. Since we use a give-to-get approach, we do track that a member has submitted salary data, but proactively separate their Member ID from the compensation details of their submission. The salary information is subsequently encrypted and stored, with ACL restrictions on systems access and decryption key access. Only subsystems that process salary submission data have access to decrypt these data points. Even though this information is de-identified, one of the reasons we also encrypt this data is to prevent member re-identification prior to establishing a critical mass of data points for each of our cohorts, which we call “slices.”
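The separation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the production design: the class names, the field names, and the 365-day access window are all hypothetical, and the encryption step is deliberately left out (it appears only as a comment) because the post does not say which scheme is used.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class SubmissionLedger:
    """Tracks *that* a member submitted, never *what* they submitted."""
    last_submitted: dict = field(default_factory=dict)  # member_id -> date

    def record(self, member_id: str, on: date) -> None:
        self.last_submitted[member_id] = on

    def has_full_access(self, member_id: str, today: date) -> bool:
        # Give-to-get: access lasts one year (assumed here to be 365 days).
        submitted = self.last_submitted.get(member_id)
        return submitted is not None and today - submitted <= timedelta(days=365)


@dataclass
class CompensationStore:
    """Holds compensation details with no member identifier attached."""
    records: list = field(default_factory=list)

    def add(self, details: dict) -> None:
        # A real system would encrypt the record before storing it.
        self.records.append(dict(details))


def submit_salary(member_id, details, ledger, store, on):
    """Split one submission into an identity record and a de-identified record."""
    ledger.record(member_id, on)
    store.add(details)
```

The point of the split is that the two stores share no key: the ledger can answer "has this member contributed recently?" without being able to answer "what did this member contribute?".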

Slice generation
Each submission generates a new data entry into multiple “slices.” Each slice is a value tuple for predefined dimensions. Dimensions are the intersection of multiple data points that are interesting for the purpose of estimating salary insights. Examples might include your job title, years of experience range, company, and location. You can see an example below of how one salary submission can result in a number of slice entries across many dimensions.

[Figure: one salary submission generating entries in multiple slices]

After slice generation, insights are created from the slices rather than from the salary submission itself. A slice does not have any associated member identifiers. This design helps ensure that the insights cannot be definitively traced back to the member who submitted the salary.
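As a rough sketch of slice generation, the snippet below fans one de-identified submission out into several slice entries. The dimension sets, attribute names, and the use of base salary as the stored value are assumptions made for illustration; the post does not list the actual dimensions.

```python
# Hypothetical dimension sets: each tuple names the attributes that
# together define one kind of slice.
DIMENSION_SETS = [
    ("title", "region"),
    ("title", "company"),
    ("title", "region", "experience_range"),
]


def generate_slice_entries(submission: dict) -> list:
    """Return one (slice_key, salary) entry per applicable dimension set.

    The slice key is a tuple of (dimension, value) pairs and carries no
    member identifier.
    """
    entries = []
    for dims in DIMENSION_SETS:
        # Skip dimension sets the submission cannot fill.
        if not all(d in submission for d in dims):
            continue
        slice_key = tuple((d, submission[d]) for d in dims)
        entries.append((slice_key, submission["base_salary"]))
    return entries
```

For example, a submission with a title, region, company, and experience range would produce three entries here, one per dimension set, each of which can be aggregated independently of the others.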

Slice thresholds
Before a slice is processed and used to estimate salaries for our members, its entries are held in an encrypted temporary store until the slice reaches a defined number of entries. This again helps prevent re-identification. For example, if we processed a slice for salaries of CEOs at LinkedIn, and there was only one entry, that would be pretty darn identifiable. At the same time, this same submission would also have created an entry in the “CEOs in the San Francisco Bay Area” slice, and could still be used by, and useful for, our service, provided that other CEOs in the Bay Area also submitted salary details to the system.
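The thresholding idea can be illustrated with a small in-memory store. This is only a sketch: the class name and the threshold value used in the example are hypothetical (the post does not state the actual minimum), and a plain dictionary stands in for the encrypted temporary store.

```python
from collections import defaultdict


class ThresholdedSliceStore:
    """Holds slice entries back until a slice has enough of them."""

    def __init__(self, min_entries: int):
        self.min_entries = min_entries
        self._pending = defaultdict(list)   # stand-in for the encrypted temporary store
        self._released = defaultdict(list)  # slices available for estimation

    def add(self, slice_key, salary) -> None:
        if slice_key in self._released:
            # The slice already crossed the threshold; add directly.
            self._released[slice_key].append(salary)
            return
        self._pending[slice_key].append(salary)
        if len(self._pending[slice_key]) >= self.min_entries:
            # Threshold reached: release all held entries at once.
            self._released[slice_key] = self._pending.pop(slice_key)

    def salaries(self, slice_key) -> list:
        """Return the usable salaries for a slice, or [] if still held back."""
        return list(self._released.get(slice_key, []))
```

With a threshold of three, a single "CEO at LinkedIn" entry stays invisible, while the "CEOs in the San Francisco Bay Area" slice becomes usable once three such entries have arrived.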

Other security considerations
The above is just the beginning of the security considerations we have thought about in order to help lock down this system. However, there are many other potential attack vectors that we also wanted to guard against. A few other things we considered when designing this system include (but are not limited to):

  • Delayed processing: Preventing time-based inference attacks was also factored into our handling. Random delays in the processing of slices help make sure that salary information cannot be inferred from when a member submitted it.
  • Data center encryption: Not only do we encrypt submissions from client browsers to our data centers, but our data is also encrypted within our internal data centers, helping protect against wire sniffing-based attacks.
  • Internal network lockdown: Due to the way we encrypt and enforce internal ACL restrictions, even our own engineers working directly on the project are not able to use the information to uncover the identity behind a salary submission.
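To make the delayed-processing idea concrete, here is a minimal sketch in which each slice entry is assigned a randomized release time and only becomes eligible for processing after that time. The function names, the uniform distribution, and the 72-hour upper bound are assumptions for illustration; the post does not describe how delays are actually chosen.

```python
import random
from datetime import datetime, timedelta


def schedule_release(submitted_at: datetime, rng: random.Random,
                     max_delay_hours: float = 72.0) -> datetime:
    """Pick a release time a random interval after submission."""
    delay = timedelta(hours=rng.uniform(0.0, max_delay_hours))
    return submitted_at + delay


def ready_entries(queue, now: datetime) -> list:
    """Return the entries whose release time has passed.

    `queue` is an iterable of (release_at, entry) pairs.
    """
    return [entry for release_at, entry in queue if release_at <= now]
```

Because the release time is decoupled from the submission time, an observer who sees an insight change cannot line that change up with a specific member's submission. In practice a non-deterministic source such as `random.SystemRandom()` would be a more sensible choice than a seeded generator.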

Accuracy
The accuracy of the salary insights is very important to our members, who make important decisions based on this data. From time to time, we may receive inaccurate salary data points, whether due to mistakes at the time of submission or to malicious intent. Filtering out incorrect data points is therefore one of our key objectives. We have a wide variety of modeling-based accuracy detection systems in place to fine-tune the accuracy. Some of the approaches we consider for improving accuracy are outlier detection (people submitting invalid salaries that would interfere with the system), training/validation with additional public and private sources such as the Bureau of Labor Statistics, statistical inference for low-inventory cohorts, similar company groupings, cost of living adjustments (COLA), title standardization and synonyms, and a whole lot more. By adding more data to the system, whether from end users or directly from employers, we will be able to keep improving the reliability of the compensation projections we make.
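As one simple illustration of outlier detection, the sketch below drops values that fall outside Tukey's interquartile-range fences within a single slice. This is a generic textbook technique chosen for the example, not necessarily what LinkedIn Salary uses; the function name, the multiplier of 1.5, and the minimum sample size are assumptions.

```python
import statistics


def filter_outliers(salaries, k: float = 1.5) -> list:
    """Keep salaries within [Q1 - k*IQR, Q3 + k*IQR].

    Slices with fewer than four entries are returned unchanged, since
    quartiles are not meaningful on so little data.
    """
    if len(salaries) < 4:
        return list(salaries)
    q1, _, q3 = statistics.quantiles(salaries, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [s for s in salaries if low <= s <= high]
```

A submission with an accidental extra pair of zeros (say, 10,000,000 among values near 100,000) falls outside the upper fence and is excluded, while the plausible values are kept in their original order.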

What’s next?

Our goal is to facilitate a vibrant and sustainable ecosystem for collecting and estimating compensation information for our members. Beginning in early 2016, we started collecting salaries on an invitation basis, and prior to launch, one million people submitted their salaries across the U.K., Canada, and the U.S. LinkedIn Salary was built on top of more than a million salary data points that we collected in just a few short months. We have a lot of new ideas to explore going forward, such as integrating salary features with core pillars of LinkedIn, helping identify and guide a member’s career trajectory through the identification and acquisition of valuable skills, expanding to international markets, and more. We are just getting started in this space. Stay tuned for more exciting features to come!