Auto-Remediation with Sensu Go

In this testimonial, Sr. Software Engineer Doug Maske at T-Mobile talks about his organization’s infrastructure, including how they’re onboarding Kubernetes and migrating their old clients to Sensu Go. They’ve turned on auto-remediation with an API that sits in front of Sensu Go: when an alert comes in, it tries to restart the service and notifies the appropriate teams. He’s found that needing less configuration in Sensu Go is really helpful. Watch the video to learn more.

IT Environment

The T-Mobile team has been using Sensu for at least three years and runs a Sensu cluster for each of its environments. Besides onboarding Kubernetes, they are migrating their old clients with a single Ansible script that removes the old agent and installs the new one. They use Sensu alongside Ansible for automation and also to monitor their Kubernetes environments.

Why Sensu

Using Sensu’s API, they have started using auto-remediation to handle repetitive tasks. When an important alert comes in, Sensu tries to restart the affected service and, if that fails, notifies the correct teams by creating Jira or Rally service tickets. The automation gained by integrating with infrastructure-as-code platforms is very useful for simple auto-remediation tasks like these.
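
The restart-then-ticket flow described above can be sketched in a few lines of Python. This is only an illustration, not T-Mobile's implementation: the function name remediate and the two callbacks (restart_service, open_ticket) are hypothetical, and treating "important" as a critical check status is an assumption. The event fields follow the general shape of a Sensu Go event (entity and check metadata, check status and output).

```python
from typing import Any, Callable, Dict

CRITICAL = 2  # Sensu check status codes: 0 = OK, 1 = warning, 2 = critical


def remediate(
    event: Dict[str, Any],
    restart_service: Callable[[str, str], bool],  # hypothetical: returns True on success
    open_ticket: Callable[[str, str], None],      # hypothetical: e.g. a Jira or Rally client
) -> str:
    """Try to restart the failing service; open a ticket only if that fails."""
    check = event["check"]
    entity_name = event["entity"]["metadata"]["name"]
    check_name = check["metadata"]["name"]

    # Assumption: only critical alerts count as "important" enough to remediate.
    if check.get("status", 0) != CRITICAL:
        return "ignored"

    # Step 1: attempt the automatic fix.
    if restart_service(entity_name, check_name):
        return "restarted"

    # Step 2: the restart failed, so hand the problem to a human team.
    title = f"{check_name} on {entity_name}: automatic restart failed"
    open_ticket(title, check.get("output", ""))
    return "ticketed"
```

Passing the restart and ticketing steps in as callbacks keeps the decision logic separate from the tools that carry it out, so the same flow could sit behind an Ansible-driven restart or any ticketing system.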