In 2013, when LinkedIn moved to multiple data centers across the globe, we needed a way to redirect traffic from one data center to another in order to mitigate potential member impact in the event of a disturbance to our services. This need led to the birth of one of the most important pieces of engineering at LinkedIn, called TrafficShift. It...
SRE Articles
- Topics: data center, Automation, SRE
Co-authors: Todd Palino, Samir Jafferali, Kurt Andersen, and Carolyn Blood
LinkedIn hosted its 4th annual SRE[in]con conference in late October, bringing together over 700 LinkedIn site engineers, as well as partners from Microsoft, GitHub, Drawbridge, and Glint, for more than 60 talks, workshops, and main stage keynotes. The purpose? To provide engineers...
- Topics: engineering culture, events, SRE
At LinkedIn, our on-call incidents are managed using Iris and Oncall, two tools that we released as open source to the community about two years ago. Oncall allows our teams to manage their on-call shifts in a largely automated fashion, scheduling rotations without any human intervention. At the same time, it allows teams to be agile and adaptable when defining...
- Topics: Mobile, Open Source, SRE
Editor’s Note: This article originally appeared as a guest post on VentureBeat titled “What I learned by bringing down LinkedIn.com.”...
- Topics: engineering culture, SRE
LinkedIn has made significant investments in resilience engineering over the past few years. As Site Reliability Engineers (SREs),...
- Topics: Resilience, SRE
At LinkedIn, we ship hundreds of command-line utilities to every machine in our data centers and to all of our employees’ workstations...
- Topics: Open Source, SRE