Feed Infrastructure owns multiple large-scale distributed systems that power the feeds and many of the search experiences core to our LinkedIn members’ experience. Our technology domain includes information retrieval, machine learning, and distributed datastores. The team runs multiple business-critical services: a source-of-truth store for feed data, timeline indexes, a federation layer for second-pass ranking, indexes for content and messaging search, and search history retrieval. Our tech stack includes Kafka, Rest.li, Espresso, RocksDB, Caffeine cache, Galene, and Apache Lucene.
FollowFeed - Activity Feed Indexing
FollowFeed is the backend indexing system for social activities at LinkedIn. It ingests and indexes social activities from the feed, such as posts, shares, reactions/likes, and comments, along with other items of interest to our members, such as a published article, a job change, or a work anniversary. These activities are represented as Actor-Verb-Object tuples. Activities from a member’s network make up a significant portion of that member’s main feed and of many other feeds across LinkedIn, so the serving layer must handle tens of thousands of QPS.
Since the feed favors recent activities, FollowFeed’s fundamental storage layer is a timeline database, which allows fast and efficient retrieval of recent activities. In addition to the timeline database, the index also leverages an activity graph that maintains relationships between objects and their related activities.
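As a rough illustration of the timeline idea, the sketch below stores Actor-Verb-Object tuples in per-actor lists kept newest-first and merges them at read time. It is a toy in-memory model, not FollowFeed's actual design: the names `Activity`, `TimelineIndex`, `ingest`, and `recent` are hypothetical, and a production system would use a persistent, sharded store rather than Python lists.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Activity:
    """Hypothetical Actor-Verb-Object tuple with an event timestamp."""
    actor: str      # who acted, e.g. "member:1"
    verb: str       # what they did, e.g. "SHARE"
    obj: str        # what they acted on, e.g. "article:42"
    timestamp: int  # event time in epoch seconds


class TimelineIndex:
    """Toy in-memory timeline: one newest-first list of activities per actor."""

    def __init__(self) -> None:
        self._timelines: Dict[str, List[Activity]] = defaultdict(list)

    def ingest(self, activity: Activity) -> None:
        timeline = self._timelines[activity.actor]
        timeline.append(activity)
        # Keep each timeline newest-first so a read is a simple prefix scan.
        timeline.sort(key=lambda a: a.timestamp, reverse=True)

    def recent(self, followed: List[str], limit: int) -> List[Activity]:
        """Merge the timelines of the followed actors; return the newest `limit` items."""
        candidates: List[Activity] = []
        for actor in followed:
            # At most `limit` items per actor can appear in the final result.
            candidates.extend(self._timelines.get(actor, [])[:limit])
        candidates.sort(key=lambda a: a.timestamp, reverse=True)
        return candidates[:limit]
```

Sorting on every write keeps the sketch short; the point is only that recency-ordered storage makes "latest N activities from my network" a cheap read.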
Artificial intelligence is central to how FollowFeed surfaces the most relevant activities in the feed. For every feed request, AI models score and rank tens of thousands of potential candidates to select the top-K items.
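The top-K selection step can be sketched generically as follows. This is a minimal example under stated assumptions: the `top_k` helper and the `score` callable are hypothetical stand-ins for a real ranking model, and nothing here reflects FollowFeed's actual scoring code.

```python
import heapq
from typing import Callable, Iterable, List, Tuple


def top_k(
    candidates: Iterable[str],
    score: Callable[[str], float],
    k: int,
) -> List[Tuple[float, str]]:
    """Return the k highest-scoring candidates as (score, candidate) pairs, best first."""
    # Score lazily so the full scored list is never materialized.
    scored = ((score(c), c) for c in candidates)
    # heapq.nlargest tracks only the k best items seen so far (O(n log k)).
    return heapq.nlargest(k, scored, key=lambda pair: pair[0])
```

A bounded heap avoids fully sorting tens of thousands of candidates when only a handful are returned.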
Community Feed Indexing and Messaging Search
The team owns the indexing of feed entities that help build global professional communities for LinkedIn members. It collaborates with partners to enrich product experiences such as hashtags, videos, and groups, and improves ranking and personalization through iterative machine learning models and diverse features.
The team also owns LinkedIn’s messaging search backend, a distributed system that handles data ingestion, encryption, decryption, persistence, and index generation.
Feed Serving Infra
This team is responsible for the entire run-time query-serving stack of Feed Infrastructure.
Services maintained by this team include Feed Mixer, which powers a large number of feed use cases at LinkedIn (including the main feed, hashtags, QnA, polls, events, and many others). As a platform, Feed Mixer provides a plugin framework for customizing the workflow that builds a feed and for running machine-learned ranking models. As a recommendation platform, Feed Mixer supports fast AI model iteration, with more than 200 new models deployed per quarter. It also makes it possible to onboard new use cases quickly and lets product engineers experiment with new ideas with minimal friction.
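To make the plugin idea concrete, here is a minimal sketch of a pipeline in which each plugin is a function that takes a list of feed items and returns a new list. It is illustrative only: `FeedPipeline`, `register`, `build`, `drop_seen`, and `rank_by_score` are hypothetical names, and Feed Mixer's real plugin API is not described in this text.

```python
from typing import Callable, Dict, List

FeedItem = Dict[str, object]
Plugin = Callable[[List[FeedItem]], List[FeedItem]]


class FeedPipeline:
    """Hypothetical plugin pipeline: each plugin transforms the item list in turn."""

    def __init__(self) -> None:
        self._plugins: List[Plugin] = []

    def register(self, plugin: Plugin) -> "FeedPipeline":
        self._plugins.append(plugin)
        return self  # allow chained registration

    def build(self, items: List[FeedItem]) -> List[FeedItem]:
        for plugin in self._plugins:
            items = plugin(items)
        return items


def drop_seen(items: List[FeedItem]) -> List[FeedItem]:
    """Example filter plugin: remove items the member has already seen."""
    return [item for item in items if not item.get("seen", False)]


def rank_by_score(items: List[FeedItem]) -> List[FeedItem]:
    """Example ranking plugin: order by a precomputed model score, best first."""
    return sorted(items, key=lambda item: item["score"], reverse=True)
```

A new use case would then be assembled by registering a different set of plugins, for example `FeedPipeline().register(drop_seen).register(rank_by_score)`, without changing the pipeline itself.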
The team is also responsible for building the Activities backend, which serves as the source of truth for all organic user actions on LinkedIn.
Other services include Activity Graph, a distributed graph index that is used for feature propagation, efficient decoration and impression discounting.