Watching your tail with latency histograms
by Luke Goyer
Using percentiles to observe the tail of your service's latency
This blog post explores:
On a frigid night in late February 2012, I received a call about an emergency production issue. One of the critical batch jobs that process data for the next day’s business had failed. By 9 PM, I was back in the office with 11 other State Farm developers. My then-product manager got us pizzas, coffee, and hot chocolate to cheer up the room, and the team spent the next couple of hours pushing an emergency fix to production. Over the past decade, system failure analysis and maintenance…
Want to learn about actuators and which ones may benefit you from a Site Reliability Engineering (SRE) perspective? This article is for you!