Rui Wang, Christopher Conrad, and Sam Shah
In the 5th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud 2013)
Social networks often require the ability to perform low latency graph computations in the user request path. For example, at LinkedIn, we show the graph distance and common connections when we show a profile in any context on the site. To do this, we have developed a distributed and partitioned graph system that scales to hundreds of millions of members and their connections, handling hundreds of thousands of queries per second.
To accomplish this scaling, real time distributed graph traversal is converted into set intersections that are accomplished in a scatter/gather manner. A network performance bottleneck forms on the gather node as it must merge partial results from many machines. In this paper, we present a modified greedy set cover algorithm that is used to locate the minimal set of machines that can serve the partial results. Our results indicate that we are able to save 25% in the 99th percentile latency of these graph distance calculations for LinkedIn’s social graph workloads.