Why SREs are Protectors of the User Experience

Being a site reliability engineer isn’t easy. As described by Andrew Widdowson, “it’s like being a part of the world’s most intense pit crew. We change the tires of a race car as it’s going 100mph.”

Known as the “automators,” SREs are often asked to observe application environments and manage incidents at all hours of the day. After all, when your app is down, so is your business.

The SRE’s job is to secure a flawless user experience; in other words, to deliver site reliability. SREs bridge Dev and Ops, ensuring that new releases improve the product rather than break it.

The Challenge

The trouble with monitoring application environments is that they produce hundreds of thousands of monitoring data points. How do you decide which data points are useful and which can be ignored? Alarm storms aren’t helpful. They prompt panic instead of resolution.

And when a crucial incident does occur, how do you quickly mitigate it? The common SRE approach is to spend a great deal of time and energy manually sifting through data, often at the expense of other initiatives or, worse, personal time (e.g., responding to an incident alert at dinner time).

What if you could get to that Aha! moment faster? What if instead of the typical hair-on-fire response, you had a trusted guide that could quickly lead you to the source of the incident?

Automation.ai as Your Trusted Guide

What if you could empower SREs with the insights needed to drive improvements? What if instead of the typical war rooms and on-call burnout, SREs had a trusted guide to quickly fix problems?

Broadcom provides a “just add water” approach that can help your IT teams automate incident response through automation.ai, our AI-driven, self-healing platform. Leveraging our deep domain expertise, we can help your SRE teams prevent alert fatigue by continuously triaging alerting rules through a combination of notification rules, process changes, dashboards, and machine learning (ML). The goal is to proactively monitor the four golden signals of SRE (latency, traffic, errors, and saturation) and measure what really matters for customer experience.
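
To make the four golden signals concrete, here is a minimal Python sketch that computes them from a window of request records and reports which ones exceed a threshold. It is purely illustrative and does not represent how automation.ai works: the `Request` type, the `golden_signals` and `breached` functions, the use of CPU utilization as the saturation measure, and all threshold values are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Request:
    """One observed request (hypothetical record shape for this sketch)."""
    latency_ms: float
    ok: bool


def golden_signals(requests: List[Request], window_s: float,
                   cpu_utilization: float) -> Dict[str, float]:
    """Summarize latency, traffic, errors, and saturation for one time window."""
    n = len(requests)
    latencies = sorted(r.latency_ms for r in requests)
    # Nearest-rank 95th percentile, using integer math to avoid float rounding.
    rank = (95 * n + 99) // 100
    p95 = latencies[rank - 1] if n else 0.0
    errors = sum(1 for r in requests if not r.ok)
    return {
        "latency_p95_ms": p95,
        "traffic_rps": n / window_s,
        "error_rate": errors / n if n else 0.0,
        # Assumption: saturation is approximated by CPU utilization (0.0 to 1.0).
        "saturation": cpu_utilization,
    }


def breached(signals: Dict[str, float],
             thresholds: Dict[str, float]) -> List[str]:
    """Return the names of signals whose value exceeds its threshold."""
    return [name for name, limit in thresholds.items()
            if signals.get(name, 0.0) > limit]


# Illustrative thresholds only; real values depend on your service objectives.
THRESHOLDS = {"latency_p95_ms": 500.0, "error_rate": 0.01, "saturation": 0.8}

if __name__ == "__main__":
    window = [Request(100.0, True)] * 90 + [Request(900.0, False)] * 10
    signals = golden_signals(window, window_s=10.0, cpu_utilization=0.5)
    print(signals)
    print("Breached:", breached(signals, THRESHOLDS))
```

Alerting on a handful of user-facing signals like these, rather than on every raw data point, is one common way to cut down on alarm storms.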