Search Infrastructure

Search is fundamental to virtually every LinkedIn experience. Our members use search to find people, jobs, companies, groups, and other professional content. To power these experiences, our search platform brings together information retrieval, machine learning, distributed systems, big data, and other fundamental areas of computer science.

Our goal is to provide deeply personalized search results. To do this, our distributed search platform scales seamlessly with both data volume and traffic while maintaining high availability and performance, and it enables engineers to rapidly innovate, experiment, and improve relevance. Any engineer at LinkedIn can use the platform to quickly build a search solution over their own document corpus, while the platform takes care of performance, stability, and scalability.
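
As a rough illustration of what onboarding onto such a platform could look like, the sketch below shows a team describing its corpus and streaming in documents. The SearchPlatformClient interface, the field types, and the job-posting example are hypothetical stand-ins for illustration, not the platform's real API.

```java
// Hypothetical sketch of how a product team might describe its corpus to a
// search platform: declare a document type with typed fields, then index documents.
// SearchPlatformClient is an illustrative stand-in, not LinkedIn's actual interface.
import java.util.List;
import java.util.Map;

public class JobSearchVertical {

    // A minimal document model for the corpus this vertical will search over.
    record JobPosting(String id, String title, String company, List<String> skills) {}

    // Stand-in for the client a team would be given; only the shape matters here.
    interface SearchPlatformClient {
        void registerCorpus(String corpusName, Map<String, String> fieldTypes);
        void indexDocument(String corpusName, String docId, Map<String, Object> fields);
    }

    static void onboard(SearchPlatformClient platform) {
        // Declare the searchable fields once; the platform handles distribution and scaling.
        platform.registerCorpus("job-postings", Map.of(
                "title", "text",
                "company", "keyword",
                "skills", "text[]"));

        // Documents are then streamed in and become searchable shortly afterwards.
        JobPosting posting = new JobPosting("42", "Search Engineer", "LinkedIn",
                List.of("Java", "information retrieval"));
        platform.indexDocument("job-postings", posting.id(), Map.of(
                "title", posting.title(),
                "company", posting.company(),
                "skills", posting.skills()));
    }
}
```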

At the top of the search stack is a federation layer that allows our members to find and discover content across many products. It is the gateway to search at LinkedIn as it is used by all customer-facing products across the company. It also powers critical search capabilities like typeahead, query understanding, spell checking, and results blending.
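
The sketch below illustrates the scatter-and-blend pattern a federation layer relies on: a query fans out to several vertical backends in parallel, and the partial results are merged into one ranked list. The VerticalBackend interface, the 150 ms time budget, and the score-based blending are assumptions chosen for illustration only.

```java
// A minimal, hypothetical sketch of federation: fan a member's query out to
// several vertical search backends in parallel, then blend the results.
import java.util.*;
import java.util.concurrent.*;

public class FederationSketch {

    record Result(String vertical, String title, double score) {}

    // Each vertical (people, jobs, companies, ...) exposes its own search call.
    interface VerticalBackend {
        List<Result> search(String query);
    }

    static List<Result> federate(String query, List<VerticalBackend> verticals)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(verticals.size());
        List<Callable<List<Result>>> calls = new ArrayList<>();
        for (VerticalBackend vertical : verticals) {
            calls.add(() -> vertical.search(query));
        }

        // Scatter the query; give slow verticals a budget rather than waiting forever.
        List<Future<List<Result>>> futures = pool.invokeAll(calls, 150, TimeUnit.MILLISECONDS);
        pool.shutdown();

        // Gather whatever came back and blend across verticals by score.
        List<Result> blended = new ArrayList<>();
        for (Future<List<Result>> future : futures) {
            try {
                blended.addAll(future.get());
            } catch (CancellationException | ExecutionException skipped) {
                // A timed-out or failed vertical simply contributes nothing.
            }
        }
        blended.sort(Comparator.comparingDouble(Result::score).reversed());
        return blended;
    }
}
```

In practice, blending would weigh far more than a raw score (query intent, member context, vertical-specific calibration), but the fan-out-then-merge shape stays the same.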

Domain-specific search queries are then handled by the serving platform, which provides a distributed systems and operability layer that integrates with LinkedIn’s internal cloud. This layer includes mechanisms for cluster management, data distribution, cross-component communication, deployment management, and real-time metrics and diagnostics.
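
As one concrete example of data distribution, the sketch below assigns each document to a partition by hashing its ID and answers a query by scattering it to every partition and gathering the partial results. The partition count and the Searcher interface are assumptions for the example, not details of LinkedIn's serving platform.

```java
// A hedged sketch of hash-based document partitioning with scatter-gather queries.
import java.util.*;

public class PartitionRoutingSketch {

    static final int NUM_PARTITIONS = 16;

    // Hash-based assignment keeps each document on exactly one partition,
    // so indexing load spreads evenly as the corpus grows.
    static int partitionFor(String docId) {
        return Math.floorMod(docId.hashCode(), NUM_PARTITIONS);
    }

    interface Searcher {
        List<String> search(String query);   // matching doc IDs from one partition
    }

    // A query cannot be routed to a single partition, so it is scattered to all
    // of them and the partial results are gathered and merged.
    static List<String> scatterGather(String query, Map<Integer, Searcher> partitions) {
        List<String> merged = new ArrayList<>();
        for (Searcher searcher : partitions.values()) {
            merged.addAll(searcher.search(query));
        }
        return merged;
    }
}
```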

At the core is a search engine that powers retrieval and ranking. Our architecture combines proprietary and open source technologies, allowing us to efficiently scale across thousands of machines while keeping the searchable data updated in real time. Because machine learning is central to returning the best results, we also partner closely with data scientists to build and serve performant online scoring models.
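
To make the retrieve-then-rank flow concrete, here is a self-contained sketch: an inverted index produces a candidate set, and a toy linear scoring function orders the candidates. The features and weights are illustrative assumptions; they stand in for the far richer learned models served in production.

```java
// A simplified retrieve-then-rank sketch: inverted-index retrieval followed by
// online scoring with a toy linear model over two hand-picked features.
import java.util.*;

public class RetrieveThenRankSketch {

    // Inverted index: term -> IDs of documents containing that term.
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    private final Map<Integer, String> documents = new HashMap<>();

    void index(int docId, String text) {
        documents.put(docId, text);
        for (String term : text.toLowerCase().split("\\s+")) {
            postings.computeIfAbsent(term, t -> new HashSet<>()).add(docId);
        }
    }

    // Retrieval: union of the postings lists for the query terms.
    private Set<Integer> retrieve(String query) {
        Set<Integer> candidates = new HashSet<>();
        for (String term : query.toLowerCase().split("\\s+")) {
            candidates.addAll(postings.getOrDefault(term, Set.of()));
        }
        return candidates;
    }

    // Online scoring: a linear model over term overlap and document length.
    // Real weights would come from offline training, not constants like these.
    private double score(String query, int docId) {
        String[] queryTerms = query.toLowerCase().split("\\s+");
        String doc = documents.get(docId).toLowerCase();
        double overlap = Arrays.stream(queryTerms).filter(doc::contains).count();
        double length = doc.split("\\s+").length;
        return 1.5 * overlap - 0.01 * length;
    }

    List<Integer> search(String query) {
        return retrieve(query).stream()
                .sorted(Comparator.comparingDouble((Integer id) -> score(query, id)).reversed())
                .toList();
    }
}
```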