Our data infrastructure team’s mission is to build world-class data infrastructure that ultimately helps us deliver value to our members. Through highly operable, high-leverage, and easy-to-use online and nearline infrastructure, we’re powering LinkedIn and the products and services our members use every day. That infrastructure spans data storage, indexing, streaming, media, information retrieval, and derived data applications.
Developed at LinkedIn, Apache Kafka, Apache Samza, and Brooklin form a world-class data processing infrastructure that powers our community of more than 660 million members.
LinkedIn has a diverse set of needs for online storage. We store images such as member profile photos, documents such as resumes, a variety of message attachments, videos, and more. Ambry is LinkedIn’s source-of-truth distributed blob storage system.
Feed Infrastructure owns multiple large-scale distributed systems that power the feed and many of the search experiences core to how our members use LinkedIn. Our technology domain includes information retrieval, machine learning, and distributed datastores.
Machine Learning Infrastructure
What is the point of learning if you don't apply it to change yourself and the world? In partnership with AI and our sister teams in AI Infrastructure, the Machine Learning Infrastructure team enables the robust, efficient, and straightforward application of machine-learned capabilities to LinkedIn's mission.
Our members use search to find people, jobs, companies, groups, and other professional content. To power these solutions, our search platform brings together information retrieval, machine learning, distributed systems, big data, and other fundamental areas of computer science.
Datasets on the scale of the Economic Graph cannot fit in the storage of a single computer. We therefore designed a distributed system that can scale to support one of the world’s largest social network graphs, both now and in the future.
Data Productivity’s mission is to enable high productivity through easy-to-use and powerful interfaces to data systems. The team focuses on the experience of creating, using, and reviewing data entities, with the goal of intelligently improving developer agility.
Media Infra (Vector)
Vector is LinkedIn’s media processing and serving infrastructure. Vector handles the creation, processing, storage, and presentation of all media content, including images, videos, documents, audio, and related metadata. Vector provides both internal and member-facing APIs, powering use cases like LinkedIn profile images, feed videos, messaging attachments, LinkedIn Live, and more. More than 100M images and videos are processed per day and served at more than a million QPS at peak.