Igeometry Podcast

The 2021 Slack Outage (Detailed analysis)

Informações:

Synopsis

On Jan 4th 2021, Slack experienced a global outage that prevented customers from using the service for nearly 5 hours. Slack has released the Root cause analysis incident report which I’m going to summarize in the first part of this video. After that Ill provide a lengthy deep dive of the incident so make sure to stick around for that. If you are new here, I make backend engineering videos and also cover software news, so make sure to Like comment and subscribe if you would like to see more plus it really helps the channel, lets jump into it. So This is an approximation of Slack’s architecture based on what was the described in the reports. Clients connects to load balancers, load balancers distribute requests to backend servers and backend servers finally make requests to database servers which is powered by mysql through vitess sharding. All of those are connected by routers in cross boundary network. Around 6AM jan 4 , the cross network boundary routers setting between LB and backend and backend to DB star