Our data infrastructure team’s mission is to build a world-class data infrastructure that helps us ultimately deliver value to our members. Through highly operable, high-leverage, and easy-to-use online and nearline infrastructure for data storage, indexing, streaming, media, information retrieval, and derived data applications, we’re powering LinkedIn and the products and services our members use every day.
Developed at LinkedIn, Apache Kafka, Apache Samza, and Brooklin form a world-class data processing infrastructure that powers our community of more than 660 million members.
LinkedIn has a diverse set of needs for online storage. We store images such as member profile photos, documents such as resumes, a variety of message attachments, videos, and more. Ambry is LinkedIn’s source-of-truth distributed blob storage system.
Search & Discovery
Our members use search to find people, jobs, companies, groups, and other professional content. To power these solutions, our search platform brings together information retrieval, machine learning, distributed systems, big data, and other fundamental areas of computer science.
Datasets on the scale of the Economic Graph cannot fit in the storage of a single computer, so we designed a distributed system that can scale to support, both now and in the future, one of the world’s largest social network graphs.