Graph

The Economic Graph is a digital representation of the global economy, expressed as edges connecting over 675 million members, more than 36,000 skills, 50 million companies, millions of open jobs, and 90,000 schools. Querying this data, identifying relationships, and understanding the patterns that connect members with opportunity require a graph database. Our vision is to make any complex graph traversal, even one that would take hundreds of joins, expressible as a single declarative query over the Economic Graph.

The graph team builds and operates a novel distributed graph database that stores tens of terabytes of graph data and serves half a million queries per second (QPS). We do cutting-edge research and development on in-memory graph databases and graph computation, and we operate the distributed system that scales to the staggering size of the Economic Graph while supporting all of the queries that power LinkedIn’s many products and core member experience.

Graph databases aim to address the shortcomings of relational databases when dealing with graph patterns: joins yield expensive cross-products, and traditional indices are inadequate for matching large subgraphs or performing large numbers of joins.
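To make the cross-product point concrete, here is a minimal Python sketch that answers a two-hop question ("who is two connections away from alice?") in two ways over the same hypothetical toy edge list. The first mimics a naive relational self-join, which pairs every row with every other row before filtering; the second builds an adjacency index and follows edges directly. The data and function names are illustrative assumptions, not part of the system described here.

```python
from collections import defaultdict
from itertools import product

# Hypothetical toy edge table: (source, target) rows, as a relational
# database might store a "connection" relationship.
EDGES = [
    ("alice", "bob"),
    ("alice", "carol"),
    ("bob", "dave"),
    ("carol", "dave"),
    ("carol", "erin"),
]


def two_hop_via_join(edges, start):
    """Relational style: self-join the edge table on e1.target = e2.source.

    A naive nested-loop join inspects every pair of rows (the full
    cross-product of the table with itself) before filtering.
    """
    pairs_examined = 0
    result = set()
    for (s1, t1), (s2, t2) in product(edges, edges):
        pairs_examined += 1
        if s1 == start and t1 == s2:
            result.add(t2)
    return result, pairs_examined


def two_hop_via_adjacency(edges, start):
    """Graph style: index neighbors once, then follow edges directly."""
    adjacency = defaultdict(set)
    for src, dst in edges:
        adjacency[src].add(dst)
    result = set()
    edges_followed = 0
    for mid in adjacency[start]:
        for dst in adjacency[mid]:
            edges_followed += 1
            result.add(dst)
    return result, edges_followed


print(two_hop_via_join(EDGES, "alice"))       # 25 row pairs examined
print(two_hop_via_adjacency(EDGES, "alice"))  # 3 edges followed
```

Production relational engines use hash or index joins rather than a nested loop, so the counts are only illustrative; the underlying issue is that every additional hop adds another join, and intermediate results can grow multiplicatively.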

The fundamental principle behind our database is that the primary concept is a relationship between entities: an edge. The schema (a collection of edge labels) can be extended in constant time in the live graph. Our indexing data structures provide constant-time access to any relationship, and all query results are subgraphs. These principles have led our researchers and developers to design state-of-the-art solutions, captured in patents and technical publications.
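The edge-centric principles above can be sketched in a few lines of Python. This hypothetical `EdgeStore` (the class, its method names, and the `(subject, label, object)` triple layout are assumptions for illustration, not the production design) keeps the schema as a plain set of edge labels, so adding a label is one set insertion that touches no existing data. Hash indices keyed by `(subject, label)` and `(object, label)` give average-case constant-time access to a relationship from either end, and queries return sets of edges, that is, subgraphs.

```python
from collections import defaultdict
from typing import NamedTuple


class Edge(NamedTuple):
    subject: str
    label: str
    obj: str


class EdgeStore:
    """Hypothetical toy in-memory edge store; illustrative only."""

    def __init__(self):
        self.labels = set()                # the schema: just a set of edge labels
        self.out_index = defaultdict(set)  # (subject, label) -> objects
        self.in_index = defaultdict(set)   # (object, label) -> subjects

    def add_label(self, label):
        # Extending the schema is one set insertion (average O(1));
        # no existing edge or index entry has to be rewritten.
        self.labels.add(label)

    def add_edge(self, subject, label, obj):
        if label not in self.labels:
            raise KeyError(f"unknown edge label: {label}")
        self.out_index[(subject, label)].add(obj)
        self.in_index[(obj, label)].add(subject)

    def has_edge(self, subject, label, obj):
        # Two hash lookups: average constant time regardless of graph size.
        return obj in self.out_index.get((subject, label), ())

    def outgoing(self, subject, label):
        """Edges leaving `subject` with `label`, returned as a subgraph."""
        return {Edge(subject, label, o) for o in self.out_index.get((subject, label), ())}

    def incoming(self, obj, label):
        """Edges arriving at `obj` with `label`, returned as a subgraph."""
        return {Edge(s, label, obj) for s in self.in_index.get((obj, label), ())}


store = EdgeStore()
store.add_label("employs")
store.add_label("hasSkill")
store.add_edge("acme", "employs", "alice")
store.add_edge("alice", "hasSkill", "graph-databases")
print(store.has_edge("acme", "employs", "alice"))  # True
print(store.outgoing("alice", "hasSkill"))
```

Python dictionaries and sets give average-case, not worst-case, constant time, so this only approximates the guarantee described above; a purpose-built index can bound lookups more tightly.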

Building and Serving the Economic Graph of Record

Datasets on the scale of the Economic Graph cannot fit in the storage of a single computer, so we designed a distributed system that can scale to support, both now and in the future, one of the world’s largest social network graphs. The system’s horizontal scalability lets us keep up with the traffic that LinkedIn members generate. We operate this system to meet the strictest service level objectives for availability, scalability, repairability, and latency.
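The text does not say how the system partitions the graph across machines. One common approach, shown here only as a hypothetical sketch, is to hash-partition edges by subject, so that every outgoing edge of a node lands on the same shard and a single-node lookup touches one machine, while adding shards adds capacity. The `ShardedEdgeStore` name, the SHA-256-based shard function, and the in-process list of shards standing in for separate machines are all assumptions.

```python
import hashlib
from collections import defaultdict


class ShardedEdgeStore:
    """Hypothetical sketch: edges hash-partitioned by subject across shards.

    Each shard is an in-process dict standing in for a separate machine.
    """

    def __init__(self, num_shards):
        if num_shards < 1:
            raise ValueError("num_shards must be positive")
        self.num_shards = num_shards
        # One (subject, label) -> objects index per shard.
        self.shards = [defaultdict(set) for _ in range(num_shards)]

    def shard_for(self, subject):
        # Stable hash: Python's built-in hash() of str is salted per process,
        # so it cannot place data consistently across machines.
        digest = hashlib.sha256(subject.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.num_shards

    def add_edge(self, subject, label, obj):
        self.shards[self.shard_for(subject)][(subject, label)].add(obj)

    def out_edges(self, subject, label):
        # All edges for one subject live on one shard: a single-shard read.
        shard = self.shards[self.shard_for(subject)]
        return set(shard.get((subject, label), ()))

    def edges_per_shard(self):
        return [sum(len(objs) for objs in shard.values()) for shard in self.shards]


store = ShardedEdgeStore(num_shards=4)
for member in ["alice", "bob", "carol", "dave"]:
    store.add_edge(member, "hasSkill", "python")
print(store.out_edges("alice", "hasSkill"))  # {'python'}
print(store.edges_per_shard())
```

Plain modulo placement reassigns most keys whenever `num_shards` changes, which is why systems often hash into a fixed number of virtual partitions, or use consistent hashing, and then map partitions to machines. Queries that start from many nodes still fan out across shards.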