SRE Articles

  • Hiring SREs at LinkedIn

    July 17, 2017

    Hiring engineers is a challenging task. Doing it right can be difficult, and many companies struggle with it. I gave a talk at the Velocity conference a while back discussing how LinkedIn’s SRE team solved this problem and how I designed a hiring process that’s fair, interesting, and gets results. If you don’t want to read the whole post, you can just watch the...

  • Open Sourcing Iris and Oncall

    June 29, 2017

    At a company as large as LinkedIn, service degradation isn’t a question of “if” so much as “when,” and when things do break, we need to escalate as quickly as possible to make sure the problem gets fixed. This usually takes the form of calling up an on-call engineer, but what if this person doesn’t answer the phone? In the past, LinkedIn addressed this question...

  • Building the SRE Culture at LinkedIn

    May 15, 2017

    Co-authors: Bruno Connelly and Bhaskaran Devaraj

    Being a Site Reliability Engineer (SRE) means having to talk about hard problems. Site outages, complex failure scenarios, and other technical emergencies are the things we have to be prepared to deal with every day. When we’re not dealing with problems, we’re discussing them. We regularly perform post-mortems...

  • TrafficShift: Load Testing at Scale

    May 11, 2017

    Co-authors: Anil Mallapur and Michael Kehoe

    LinkedIn started as a professional networking service in 2003, serving user requests out...

  • Failure is Not an Option

    January 16, 2017

    This is the final post of the series “Every Day Is Monday in Operations.” Throughout this series we’ve discussed our challenges,...

  • MTTD and MTTR Are Key

    December 12, 2016

    This post is part of the series “Every Day Is Monday in Operations.” Throughout this series we discuss our challenges, share our war...