  • Migrating to Espresso

    August 2, 2017

    Espresso is LinkedIn's strategic distributed, fault-tolerant NoSQL database that powers many LinkedIn services. Espresso has a large production footprint at LinkedIn, with close to a hundred clusters in use, storing about 420 terabytes of Source of Truth (SoT) data and handling more than two million queries per second at peak load. This post discusses our...

  • Open Sourcing Jaqen, A Tool For Developing DNS Rebinding PoCs

    July 27, 2017

    Editor’s note: Members of the information security team at LinkedIn have an opportunity to work on research topics under a well-defined framework that allows them to evaluate new products and technologies, as well as explore the related threat surface. The team strives to find new and innovative ways to help simplify and strengthen security and contribute back...

  • Powering Helix’s Auto Rebalancer with Topology-Aware Partition Placement

    July 26, 2017

    Typical distributed data systems are clusters composed of a set of machines. If the dataset does not fit on a single machine, we usually shard the data into partitions, and each partition can have multiple replicas for fault tolerance. Partition management needs to ensure that replicas are distributed among machines as evenly as possible. More crucially, when a...

  • LinkedIn Passes IPv6 Milestone

    July 25, 2017

    Earlier this month, and for the first time in our company’s history, more than 50% of pages on LinkedIn were accessed over IPv6 from...

  • Creating #DataScienceHappiness

    July 21, 2017

    In a previous post, I gave some advice for those who are interested in a career in data science. One of the suggestions I made was to...

  • Building the Activity Graph, Part 2

    July 19, 2017

    Co-authors: Vivek Nelamangala and Val Markovic. Editor’s note: In Part 1 of this series, we talked about how we built the Activity...