Building member trust through a centralized and scalable settings platform
May 29, 2019
At LinkedIn, we are entrusted with protecting the privacy and data of our members. Storing members’ choices on how they intend their data to be used is vital to building and maintaining member trust, as well as ensuring compliance with legal and regulatory requirements. It is also essential that we respect members’ preferences for how they wish to interact with LinkedIn’s many products and features.
These choices and preferences are expressed in the form of settings. This post covers the architecture and design of a new backend system for storing all settings at LinkedIn to enable better support for complex use cases, a faster developer experience, and future ease in enforcing settings.
What are member settings?
LinkedIn stores hundreds of member settings, most of which can be viewed and modified on the Settings & Privacy page. These settings fall into four main categories:
- Controlling the visibility of your data (e.g., Is your primary email address visible to second-degree connections, first-degree connections, or only you?)
- Controlling how your data is used (e.g., Is LinkedIn allowed to use the companies you follow to show you more relevant advertisements?)
- Limiting the frequency of communications (e.g., Would you prefer to receive emails from LinkedIn about invitations to connect on a daily basis, weekly, or never?)
- Indicating your preferences on how LinkedIn features should behave (e.g., Would you like videos to auto-play when scrolling through your feed?)
Clearly, there is a working solution today for storing these settings. So, what’s the problem?
Problem #1: Lack of support for complex settings
The main legacy settings service was designed for a very specific purpose: storing simple setting values (usually a boolean or an enum) that are associated with a single LinkedIn member. However, as LinkedIn has grown, we have seen a need for more complex use cases that are not supported by this service:
Defining hierarchical relationships between settings
For example, let’s say we have several settings that control how often members receive certain types of email campaigns. The options for each email frequency setting are DAILY, WEEKLY, or NEVER. We might want to define an additional “All LinkedIn emails” setting that is the “parent” of all email frequency settings. This parent setting lets a member turn all LinkedIn email communications off or on globally, while preserving the stored value of each individual “child” setting.
The effective value of the “Invitations to connect email frequency” setting would be computed as follows:
|Actual value of “All LinkedIn emails” parent setting||Actual value of “Invitations to connect email frequency” setting||Effective value of “Invitations to connect email frequency” setting|
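The parent/child rule described above can be sketched in a few lines. This is only an illustration, not the platform's actual logic: the `Toggle` and `EmailFrequency` enums, the `effective_value` function, and the choice of NEVER as the value a child takes when its parent is off are all assumptions made for the example.

```python
from enum import Enum


class Toggle(Enum):
    """Hypothetical value type for the parent setting."""
    ON = "ON"
    OFF = "OFF"


class EmailFrequency(Enum):
    """The options named in the post for each email frequency setting."""
    DAILY = "DAILY"
    WEEKLY = "WEEKLY"
    NEVER = "NEVER"


# Assumption: a child email setting behaves as NEVER while its parent is OFF.
OFF_VALUE = EmailFrequency.NEVER


def effective_value(parent: Toggle, child: EmailFrequency) -> EmailFrequency:
    """Compute the child's effective value; the stored child value is untouched."""
    if parent is Toggle.OFF:
        return OFF_VALUE
    return child


stored_child = EmailFrequency.WEEKLY
# Parent OFF overrides the child, but the stored value is preserved ...
while_off = effective_value(Toggle.OFF, stored_child)
# ... so turning the parent back ON restores the member's original choice.
while_on = effective_value(Toggle.ON, stored_child)
```

Because the override is computed at read time, nothing has to be rewritten when a member flips the parent setting.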
Settings keyed on multiple entities
The legacy settings service supports only settings that are keyed on a single LinkedIn member ID. However, we also need to store settings that are keyed on multiple entities. For example, settings that control preferences for each group that a member belongs to would need to be keyed on a combination of a member ID and a group ID.
Non-member based settings
Although member settings account for the majority of LinkedIn settings, there is also the need to store settings for other types of LinkedIn entities—for example, settings that are associated with a group or a business account.
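One way to picture these two requirements together is a key made of one or two typed entities. The sketch below is hypothetical: `EntityKey`, `SettingKey`, the setting type ids, and the stored values are invented for illustration and do not come from the actual platform.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class EntityKey:
    """Any LinkedIn entity, not only a member (type names are illustrative)."""
    entity_type: str  # e.g. "member", "group", "business_account"
    entity_id: int


@dataclass(frozen=True)
class SettingKey:
    """A setting value is addressed by its type plus one or two entities."""
    setting_type_id: int
    primary: EntityKey
    secondary: Optional[EntityKey] = None

    @property
    def is_double_keyed(self) -> bool:
        return self.secondary is not None


member_a = EntityKey("member", 1)
group_b = EntityKey("group", 2)

# Frozen dataclasses are hashable, so they can be used directly as dict keys.
store: Dict[SettingKey, object] = {}
store[SettingKey(101, member_a, group_b)] = True   # double-keyed: (member, group)
store[SettingKey(202, group_b)] = "MEMBERS_ONLY"   # single-keyed, non-member entity
```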
Problem #2: Decentralized settings storage
Because of the lack of support for the complex use cases described above, various teams created custom settings services and databases that, overall, serve very similar purposes with minor tweaks. As a result, settings are now stored in various services and databases across the LinkedIn stack. The cost of building and maintaining each of these services has resulted in significant additional development effort across the company. In addition, the decentralized nature makes it harder to reason about or change how settings work across the whole LinkedIn ecosystem.
Problem #3: Time to create a new setting
Almost every major feature developed at LinkedIn requires the creation of one or more corresponding settings. To create a new setting today, a developer needs to make manual backend changes to enable the storage of the setting values, as well as frontend changes to display the new setting on the Settings & Privacy page. Across code development, testing, code reviews, approval processes, and service deployments, the entire process takes an average of about five weeks of elapsed time and two weeks of real developer time for each new setting. Approximately 50 new settings are added at LinkedIn every year—that’s a lot of developer time!
Although the process is slow, one advantage is that the manual code changes provide a history of setting metadata changes and a built-in approval process by way of code reviews. This is important due to the sensitive nature of member settings, so any new system should also provide this functionality.
Solution: New settings platform
To address these problems, we built a new backend platform to create and store settings at LinkedIn. The central component of the new system is a new mid-tier service that provides Rest.li APIs that other LinkedIn microservices will interact with in order to read, modify, and create settings. The Settings Mid Tier exposes two main services: one for serving setting values to external clients and one for storing setting metadata. The system also includes an internal developer tool that makes it easy for developers to create and modify setting metadata.
At a high level, here is how each of the components of the new system aims to solve the above problems:
Problem #1 (Lack of support for complex settings)
The setting values service supports hierarchical settings logic, both single-keyed and double-keyed settings, and settings that are keyed on any type of LinkedIn entity. The setting metadata service enables this functionality by storing the hierarchical relationships between settings and the key types for each setting.
Problem #2 (Decentralized settings storage)
The setting values service acts as a central access point for all settings across LinkedIn, and also provides a centralized database for storing this data. Eventually, all settings data will be migrated to this database.
Problem #3 (Time to create a new setting)
The setting metadata service allows settings to be created and modified dynamically without manual code changes. The developer tool makes this process even easier. These services reduce the elapsed time for creating a new setting from a matter of weeks to a matter of days, with minimal developer effort.
Setting values service
This service provides access to the setting values for each entity that settings are associated with. For example, LinkedIn’s email sending service might query the setting values service to fetch the value of Member A’s “Connections in the News” email frequency setting to determine whether or not to send Member A a new “Connections in the News” email. Or the member profile service might query for the value of the “group visibility” setting for (Member A, Group B) to decide whether to display Group B on Member A’s profile.
As implied in the above examples, this service actually exposes two APIs: one for single-keyed settings and one for double-keyed settings. The keys for both APIs are not restricted to member IDs, but rather can be any type of LinkedIn entity. This service will also handle the business logic of computing the effective values of settings that have hierarchical relationships to other settings.
Based on an analysis of traffic to the legacy settings services, the new setting values service will eventually need to support over 600,000 queries per second at low latency. Some of these will be batch queries, resulting in over two million setting values read per second. To support this kind of traffic, setting values are stored in Espresso, LinkedIn’s in-house distributed NoSQL database that can easily scale to accommodate traffic growth.
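As a rough mental model of the read path described above, the sketch below uses an in-memory dictionary in place of Espresso and plain Python methods in place of the Rest.li APIs. The class name, method names, and the fallback to a per-setting default value are assumptions for illustration only.

```python
from typing import Dict, Iterable, Tuple

Entity = Tuple[str, int]  # (entity type, entity id), e.g. ("member", 1)


class SettingValuesService:
    """Toy stand-in for the setting values service (not the real API)."""

    def __init__(self, defaults: Dict[int, object]):
        self._defaults = defaults  # setting_type_id -> default value
        self._single: Dict[Tuple[int, Entity], object] = {}
        self._double: Dict[Tuple[int, Entity, Entity], object] = {}

    def set_single(self, setting_type_id: int, key: Entity, value: object) -> None:
        self._single[(setting_type_id, key)] = value

    def set_double(self, setting_type_id: int, key1: Entity, key2: Entity,
                   value: object) -> None:
        self._double[(setting_type_id, key1, key2)] = value

    def get_single(self, setting_type_id: int, key: Entity) -> object:
        # Fall back to the default if the value was never explicitly set.
        return self._single.get((setting_type_id, key),
                                self._defaults[setting_type_id])

    def get_double(self, setting_type_id: int, key1: Entity, key2: Entity) -> object:
        return self._double.get((setting_type_id, key1, key2),
                                self._defaults[setting_type_id])

    def batch_get_single(self, setting_type_ids: Iterable[int],
                         key: Entity) -> Dict[int, object]:
        # One query that reads several setting values for the same entity.
        return {sid: self.get_single(sid, key) for sid in setting_type_ids}


service = SettingValuesService(defaults={1: "WEEKLY", 2: True, 3: False})
service.set_single(1, ("member", 1), "NEVER")
service.set_double(3, ("member", 1), ("group", 2), True)
```

A batch call such as `service.batch_get_single([1, 2], ("member", 1))` counts as one query but reads two values, which is why the number of values read per second exceeds the number of queries per second.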
Migration and centralization
Ultimately, the long-term goal is to centralize storage of all settings in this Espresso database. However, at a large company like LinkedIn with hundreds of different microservices that interact with each existing setting service, this can only be done through a carefully executed migration plan. This will be done in two main phases:
Federation: The setting values service acts as a pass-through to the existing legacy sources of settings, and routes each incoming call to the appropriate downstream service. This allows us to centralize client calls to the setting values service as soon as possible. The new settings system is currently in this phase.
Centralization: Once all client calls have been migrated to the setting values service, all existing settings data will be migrated to the dedicated setting values database. After the data migrations are complete, the legacy settings services can be shut down and the new Settings Platform will be the central source of truth for all LinkedIn settings.
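The two phases can be pictured as a routing table that shrinks over time. The sketch below is a simplified illustration under stated assumptions: backends are plain callables, routing is per setting type, and `mark_migrated` is a hypothetical helper. The post does not describe how routing or cut-over is actually implemented.

```python
from typing import Callable, Dict

# A backend takes (setting_type_id, member_id) and returns a value.
Backend = Callable[[int, int], object]


class FederatedSettingValues:
    """Single entry point that passes calls through to the right source."""

    def __init__(self, legacy_routes: Dict[int, Backend], central: Backend):
        self._legacy_routes = dict(legacy_routes)  # setting_type_id -> legacy service
        self._central = central                    # dedicated setting values database

    def get(self, setting_type_id: int, member_id: int) -> object:
        # Federation: route to the legacy source while one is registered.
        backend = self._legacy_routes.get(setting_type_id, self._central)
        return backend(setting_type_id, member_id)

    def mark_migrated(self, setting_type_id: int) -> None:
        # Centralization: after data migration, stop routing to the legacy source.
        self._legacy_routes.pop(setting_type_id, None)

    def legacy_services_in_use(self) -> int:
        return len(self._legacy_routes)


def legacy_email_service(setting_type_id: int, member_id: int) -> object:
    return "value-from-legacy"


def central_database(setting_type_id: int, member_id: int) -> object:
    return "value-from-central"


platform = FederatedSettingValues({7: legacy_email_service}, central_database)
before = platform.get(7, 1)   # served by the legacy service
platform.mark_migrated(7)
after = platform.get(7, 1)    # served by the central database
```

Clients call the same entry point in both phases, so the source of a setting can change without any client-side changes.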
Setting metadata service
The setting metadata service serves all of the data that defines each setting type, including functional data used to validate and compute setting values, as well as data that is used purely for documentation and ownership purposes.
Some examples of metadata that define a setting type include, but are not limited to:
|Setting type id||A unique numerical identifier for this particular setting type|
|Key count||Is the setting type single-keyed or double-keyed?|
|Key types||What type of LinkedIn entity or entities is this setting type associated with? (e.g., member, group, business account, etc.)|
|Value type||What type is the setting value? (e.g., boolean, enum, list of enums, etc.)|
|Default value||If the setting value has never been explicitly changed, what is the default value of this setting?|
|Off value||What is the value of this setting if any of its parent settings are turned off?|
|Parent setting type ids||Which settings are parents of this setting, if any?|
|Documentation||What is this setting type used for? Which LinkedIn team owns this setting type?|
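To make the fields above concrete, here is one possible shape for a metadata record. It is a sketch only: the `SettingMetadata` class, its field names, the use of a tuple of allowed strings to stand in for the value type, and the example values are assumptions rather than the service's real schema.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SettingMetadata:
    setting_type_id: int
    key_types: Tuple[str, ...]       # one entry if single-keyed, two if double-keyed
    allowed_values: Tuple[str, ...]  # stands in for the value type (an enum here)
    default_value: str               # used when the value was never explicitly set
    off_value: str                   # used when any parent setting is turned off
    parent_setting_type_ids: Tuple[int, ...] = ()
    documentation: str = ""

    def __post_init__(self) -> None:
        # Functional metadata must be self-consistent before it can be used.
        if len(self.key_types) not in (1, 2):
            raise ValueError("a setting type is single-keyed or double-keyed")
        for name in ("default_value", "off_value"):
            if getattr(self, name) not in self.allowed_values:
                raise ValueError(f"{name} is not an allowed value")

    @property
    def key_count(self) -> int:
        return len(self.key_types)

    def is_valid_value(self, value: str) -> bool:
        return value in self.allowed_values


# Hypothetical record; ids, default, and owner text are made up for the example.
invitation_emails = SettingMetadata(
    setting_type_id=2,
    key_types=("member",),
    allowed_values=("DAILY", "WEEKLY", "NEVER"),
    default_value="WEEKLY",
    off_value="NEVER",
    parent_setting_type_ids=(1,),
    documentation="Frequency of invitation emails; owned by a hypothetical team.",
)
```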
This type of data is more suited to a relational database. For example, we might want to do a query across all setting types to determine which settings are children of a given parent setting. Or we may simply want to retrieve all possible setting types. These kinds of queries are difficult to do with a key-value store like Espresso, so we decided to store setting metadata in an Oracle database, which is a well-supported relational database system at LinkedIn.
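The two queries mentioned above are straightforward in SQL. The sketch below uses Python's built-in `sqlite3` module purely as a stand-in for the Oracle database, and the table names, column names, and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE setting_type (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    -- One row per (child, parent) relationship between setting types.
    CREATE TABLE setting_parent (
        child_id  INTEGER NOT NULL REFERENCES setting_type(id),
        parent_id INTEGER NOT NULL REFERENCES setting_type(id),
        PRIMARY KEY (child_id, parent_id)
    );
    INSERT INTO setting_type (id, name) VALUES
        (1, 'all_emails'),
        (2, 'invitation_email_frequency'),
        (3, 'video_autoplay');
    INSERT INTO setting_parent (child_id, parent_id) VALUES (2, 1);
    """
)


def children_of(parent_id: int) -> list:
    """Which setting types are children of the given parent?"""
    rows = conn.execute(
        """
        SELECT s.name
        FROM setting_type AS s
        JOIN setting_parent AS p ON p.child_id = s.id
        WHERE p.parent_id = ?
        ORDER BY s.name
        """,
        (parent_id,),
    )
    return [row[0] for row in rows]


def all_setting_types() -> list:
    """Retrieve every setting type."""
    return [row[0] for row in conn.execute("SELECT name FROM setting_type ORDER BY id")]
```

In a pure key-value store, the first query would require either scanning every record or maintaining a separate reverse index by hand.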
Although this database serves as the source of truth, the fact that the data does not change frequently means that it isn’t necessary to query the database every time setting metadata is needed. As a result, we built a simple in-memory cache on top of the database.
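A read-through cache of this kind might look like the following. The post only says the cache is simple and in-memory, so the time-based expiry, the `MetadataCache` name, and the injectable `clock` parameter are assumptions added to make the sketch complete and testable.

```python
import time
from typing import Callable, Dict, Tuple


class MetadataCache:
    """Read-through cache in front of a slower metadata lookup."""

    def __init__(self, load: Callable[[int], object], ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic):
        self._load = load          # e.g. a function that queries the database
        self._ttl = ttl_seconds    # assumption: entries expire after a fixed time
        self._clock = clock
        self._entries: Dict[int, Tuple[float, object]] = {}

    def get(self, setting_type_id: int) -> object:
        now = self._clock()
        entry = self._entries.get(setting_type_id)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]                      # fresh hit: no database call
        value = self._load(setting_type_id)      # miss or stale: reload
        self._entries[setting_type_id] = (now, value)
        return value


database_calls = []


def load_from_database(setting_type_id: int) -> dict:
    database_calls.append(setting_type_id)
    return {"setting_type_id": setting_type_id}


fake_now = [0.0]
cache = MetadataCache(load_from_database, ttl_seconds=10.0, clock=lambda: fake_now[0])
cache.get(1)       # first read goes to the database
cache.get(1)       # second read is served from memory
fake_now[0] = 11.0
cache.get(1)       # entry has expired, so the database is queried again
```

Because metadata changes rarely, even a short expiry removes almost all database reads while bounding how long a stale record can be served.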
Versioning and review process
Each setting metadata record has a state associated with it. A new setting metadata record is always created in a DRAFT state, and must go through an approval process before it becomes ACTIVE.
The setting metadata service also serves as a history of all setting metadata changes. Each setting metadata record is marked with a version number. Changes are made by creating a new version rather than directly modifying the existing active metadata. Newly created versions must go through the same approval process outlined above.
Instead of outright deleting old metadata records, each setting metadata version is moved to a DEPRECATED state when a newer version replaces it as the active version or when the setting is simply no longer needed. This ensures we preserve a history of all past setting metadata.
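The lifecycle described in this section amounts to a small state machine over an append-only list of versions. The sketch below is illustrative only: the `SettingHistory` class and its `propose`, `approve`, and `retire` methods are hypothetical names, and the real approval process involves reviewers rather than a single method call.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class State(Enum):
    DRAFT = "DRAFT"
    ACTIVE = "ACTIVE"
    DEPRECATED = "DEPRECATED"


@dataclass
class MetadataVersion:
    version: int
    payload: dict
    state: State = State.DRAFT  # every new record starts as a draft


class SettingHistory:
    """All metadata versions of one setting type; nothing is ever deleted."""

    def __init__(self) -> None:
        self.versions: List[MetadataVersion] = []

    def propose(self, payload: dict) -> MetadataVersion:
        # A change is a new version, not an edit of the active record.
        record = MetadataVersion(version=len(self.versions) + 1, payload=payload)
        self.versions.append(record)
        return record

    def approve(self, version: int) -> None:
        candidate = self.versions[version - 1]
        if candidate.state is not State.DRAFT:
            raise ValueError("only DRAFT versions can be approved")
        self.retire()                     # the previous active version is replaced
        candidate.state = State.ACTIVE

    def retire(self) -> None:
        # Also covers the case where the setting is simply no longer needed.
        for record in self.versions:
            if record.state is State.ACTIVE:
                record.state = State.DEPRECATED

    def active(self) -> Optional[MetadataVersion]:
        return next((r for r in self.versions if r.state is State.ACTIVE), None)


history = SettingHistory()
history.propose({"default_value": "WEEKLY"})
history.approve(1)
history.propose({"default_value": "DAILY"})   # version 1 stays active for now
history.approve(2)                            # version 1 becomes DEPRECATED
```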
Settings developer tool
The introduction of the setting metadata service means that the creation of a new setting will no longer require code changes or service deployments. This will vastly speed up development time. To further improve the experience, we are designing a tool to help developers create the metadata that defines a new setting, get it through the approval process, and easily browse existing settings. This tool will interface directly with the setting metadata service.
Conclusion and next steps
Settings are central to the LinkedIn experience in that they empower our members to control how their data is used and customize how the platform behaves. Due to the increasing complexity of our products, driven by new features and the adoption of microservices, we have outgrown the original settings storage systems and need a new solution. With the completion of this new settings system, LinkedIn developers will be able to leverage better support for complex use cases and experience a more streamlined onboarding process, accelerating the development of new features for our members.
The current system we have built serves clients that make online Rest.li calls to read and write settings. Although this covers the majority of use cases, it is not the only way in which clients access this data. The next step is to extend the system to support clients that read settings data offline and clients that access settings data in near real time by consuming change events.
Just as important as the creation and storage of settings is the enforcement of the member choices they represent. Once the Settings Platform is complete, the team will work towards building a consistent framework for enforcing settings across the LinkedIn stack.
This project has been a collaborative effort across many teams at LinkedIn. Many thanks to Michael Ackerman, Qiru Liu, Sanmin Liu, Giridhar Kommisetty, Chun Zhang, Xiaofeng Wu, Max Wolffe, Kevin Fu, Armen Hamstra, and Alex Sung.