SRE Articles

  • Automating Your Oncall: Open Sourcing Fossor and Ascii Etch

    December 14, 2017

    One of our sayings in Site Reliability Engineering (SRE) is that the goal of your job is to “automate yourself out of the job.” While some may have concerns about being replaced by robots, SREs see the value of automating work. It opens up time, removes tedious or repetitive tasks from a workflow, and allows our engineers to spend their valuable time on more...

  • Couchbase Ecosystem at LinkedIn

    December 6, 2017

    Couchbase is a highly scalable, distributed data store that plays a critical role in LinkedIn’s caching systems. Couchbase was first adopted at LinkedIn in 2012, and it now handles over 10 million queries per second with over 200 clusters in our production, staging, and corporate environments. Couchbase’s replication mechanisms and high performance have enabled...

  • Resilience Engineering at LinkedIn with Project Waterbear

    November 10, 2017

    Coauthors: Bhaskaran Devaraj and Xiao Li

    Over the last several years, many companies have discussed ways to improve the resiliency of their services and infrastructure. Many projects, like Netflix’s Simian Army, have spawned open source projects that have been adopted by other companies. Other discussions about resilience engineering focus on cultural and...

  • Query Analyzer: A Tool for Analyzing MySQL Queries Without ...

    September 15, 2017

    LinkedIn uses MySQL heavily, as more than 500 internal applications rely on MySQL. For easy management and better...

  • Hiring SREs at LinkedIn

    July 17, 2017

    Hiring engineers is a challenging task. Doing it right can be difficult, and many companies struggle with it. I gave a talk at the...

  • Open Sourcing Iris and Oncall

    June 29, 2017

    At a company as large as LinkedIn, service degradation isn’t a question of “if” so much as “when,” and when things do break, we need...