Team Sensu recently released Sensu Core 1.1, which includes a powerful new feature called Check Hooks. Check Hooks are commands run by the Sensu client in response to the result of check command execution. Check Hooks allow Sensu users to automate actions routinely performed by operators in response to monitoring alerts, freeing precious operator time to be used elsewhere and to greater effect!
Why Check Hooks?
The practice of using Sensu for auto-remediation tasks and automated data gathering for incident triage continues to gain traction amongst Sensu users. Although these practices were achievable with previous Sensu releases, Sensu did not make it easy, as all actions had to be orchestrated by a Sensu server. Nick Stielau wrote a great blog post about auto-remediation with Sensu back in 2013! Now, these practices are simpler and more straightforward.
Check Hook Configuration
Check Hooks have names that associate them with one or more Sensu check status codes (e.g. 1) or severities (e.g. “critical”). Valid Hook names include (in order of precedence) “1”-“255”, “ok”, “warning”, “critical”, “unknown”, and “non-zero”. Following a check command execution, the Sensu client will execute the appropriate configured Hook command, depending on the check execution status (e.g. 1). A Check Hook must have a configured command to execute and can have an execution timeout (default is 60s). Optionally, a Hook command can expect and consume JSON serialized Sensu client and check data via STDIN.
The following example Check Hook (“non-zero”) captures the system process tree if the check command returns a non-zero status (e.g. 2), indicating that the Nginx process is unhealthy.
{
  "checks": {
    "nginx_process": {
      "command": "check-process.rb -p nginx",
      "subscribers": ["proxy"],
      "interval": 30,
      "hooks": {
        "non-zero": {
          "command": "ps aux",
          "timeout": 10
        }
      }
    }
  }
}
More Examples
Check Hook commands support check token substitution, just like check commands! The following example will capture disk utilization information for the database disk. Custom attributes in the Sensu client definition, e.g. db.disk.mount, allow client-specific values for the database disk mount point and alert thresholds. If a Sensu client definition does not have a given db.* custom attribute, the declared default value is used, e.g. /.
{
  "checks": {
    "db_disk_usage": {
      "command": "check-disk-usage.rb -I :::db.disk.mount|/::: -w :::db.disk.warn|90::: -c :::db.disk.crit|95:::",
      "subscribers": ["linux"],
      "interval": 30,
      "hooks": {
        "non-zero": {
          "command": "df -h :::db.disk.mount|/:::",
          "timeout": 10
        }
      }
    }
  }
}
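To make the token syntax a little more concrete, here is a rough Python sketch of how a :::attribute|default::: token could be resolved against a client definition. This is only an illustration and not Sensu's actual implementation; the function name substitute_tokens and the sample client data are my own assumptions, and real Sensu handles tokens without a value or default in its own way.

```python
import re

# Matches tokens of the form :::attribute.path::: or :::attribute.path|default:::
TOKEN_PATTERN = re.compile(r":::(.*?):::")


def substitute_tokens(command, client):
    """Replace each token in `command` with a value from the `client` dict.

    Hypothetical helper for illustration only; not part of Sensu.
    """

    def resolve(match):
        path, separator, default = match.group(1).partition("|")
        value = client
        # Walk the dotted path (e.g. db.disk.mount) through nested dicts.
        for key in path.split("."):
            if isinstance(value, dict) and key in value:
                value = value[key]
            else:
                value = None
                break
        if value is not None:
            return str(value)
        if separator:
            # The attribute is missing, so fall back to the declared default.
            return default
        # Assumption: this sketch simply raises when nothing can be substituted.
        raise KeyError("no value or default for token: " + path)

    return TOKEN_PATTERN.sub(resolve, command)


# A client with a custom mount point, and one without any db.* attributes.
db_client = {"name": "db-01", "db": {"disk": {"mount": "/var/lib/mysql"}}}
web_client = {"name": "web-01"}

print(substitute_tokens("df -h :::db.disk.mount|/:::", db_client))   # df -h /var/lib/mysql
print(substitute_tokens("df -h :::db.disk.mount|/:::", web_client))  # df -h /
```

The same Hook definition therefore produces a different command on each client, which is what keeps a single check definition reusable across a whole subscription.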
As mentioned previously, Check Hooks can be used for rudimentary auto-remediation tasks, such as (re)starting a process. The following example will start Nginx if the check command execution returns a status of 2 (critical, not running). If the check command execution returns any other non-zero status, the process tree will be captured, assisting operators by providing additional context. This example could be used on systems that lack proper service management.
“This example effectively replaces Monit.” — Justin Kolberg
{
  "checks": {
    "nginx_process": {
      "command": "check-process.rb -p nginx",
      "subscribers": ["proxy"],
      "interval": 30,
      "hooks": {
        "critical": {
          "command": "sudo /etc/init.d/nginx start",
          "timeout": 30
        },
        "non-zero": {
          "command": "ps aux",
          "timeout": 10
        }
      }
    }
  }
}
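The choice between the “critical” and “non-zero” Hooks in this example follows the precedence order listed under Check Hook Configuration: exact status code first, then severity, then “non-zero”. A minimal Python sketch of that lookup might look like the following. It is not the Sensu client's real code; select_hook_name is a hypothetical name, and treating every status above 2 as “unknown” is my assumption.

```python
# Severity names indexed by check exit status.
SEVERITIES = ["ok", "warning", "critical", "unknown"]


def select_hook_name(status, hooks):
    """Return the name of the first configured Hook matching `status`, or None.

    Hypothetical helper for illustration only; not part of Sensu.
    """
    candidates = []
    # 1. An exact status code ("1" through "255") has the highest precedence.
    if 1 <= status <= 255:
        candidates.append(str(status))
    # 2. Then the severity name. Assumption: statuses outside 0-2 are "unknown".
    severity = SEVERITIES[status] if 0 <= status < 3 else "unknown"
    candidates.append(severity)
    # 3. Finally, the catch-all for any failing status.
    if status != 0:
        candidates.append("non-zero")
    for name in candidates:
        if name in hooks:
            return name
    return None


# The hooks from the Nginx example, reduced to their commands.
hooks = {
    "critical": {"command": "sudo /etc/init.d/nginx start"},
    "non-zero": {"command": "ps aux"},
}

print(select_hook_name(2, hooks))  # critical
print(select_hook_name(1, hooks))  # non-zero
print(select_hook_name(0, hooks))  # None
```

Only one Hook runs per check execution in this sketch, so a status of 2 starts Nginx without also capturing the process tree.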
Check Hook commands can optionally receive JSON serialized Sensu client and check data passed via STDIN. This functionality allows Hook commands (scripts) to take different actions depending on the data they receive. I am excited to see what the Sensu community does with this! The following example will execute a custom Hook command when the local HTTP API is determined to be unhealthy.
{
  "checks": {
    "api_health": {
      "command": "check-http.rb -u http://localhost:8080/healthz",
      "subscribers": ["api"],
      "interval": 20,
      "hooks": {
        "non-zero": {
          "command": "custom-script.py",
          "stdin": true,
          "timeout": 10
        }
      }
    }
  }
}
NOTE: Because the Hook attribute stdin is set to true, the command script custom-script.py must expect the JSON data via STDIN, read it, and close STDIN, or its execution will hang until the timeout terminates it.
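For anyone wondering what a script like custom-script.py could look like, here is a minimal Python sketch. I am assuming the JSON document has top-level "client" and "check" objects, and that the check data carries "name" and "status"; treat those field names, along with the helper names describe_failure and run, as assumptions to verify against your own Hook input rather than a documented contract.

```python
import json
import sys


def describe_failure(payload):
    """Build a one-line summary from the Hook input.

    Assumption: the payload has "client" and "check" objects; missing
    fields fall back to placeholders instead of raising.
    """
    client = payload.get("client", {})
    check = payload.get("check", {})
    return "{} on {} exited with status {}".format(
        check.get("name", "unknown-check"),
        client.get("name", "unknown-client"),
        check.get("status", "?"),
    )


def run(stream=sys.stdin, out=sys.stdout):
    """Read the whole JSON document from `stream` and write a summary to `out`."""
    # read() consumes the input until EOF, so the script never sits
    # waiting on STDIN until the Hook timeout kills it.
    payload = json.loads(stream.read())
    out.write(describe_failure(payload) + "\n")
```

A real Hook script would end with a call to run(), and would likely branch on the check status or client attributes to decide what to do next, rather than only printing a summary.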
When Not To Use Them
So often when we are given a new tool or feature, it becomes our new hammer, and everything around us starts to look like a nail. Check Hooks are powerful and have many applications; however, it’s important to acknowledge their limitations.
Check Hooks are not Sensu Event Handlers. As Hooks are executed by the Sensu client, their local data is limited compared to the data available to a Sensu server. For example, the Sensu client is not aware of the check’s execution history, whether its status is flapping, the number of state occurrences, or if the check is silenced. I do not recommend using Hooks for alerting purposes due to the lack of this data.
Check Hooks are also not for complex auto-remediation tasks. While Hooks are more than capable of executing single-step remediation tasks, they lack any sort of built-in rules or workflow engine, which would enable multi-step or multi-stage tasks with conditional logic. If you must do complex auto-remediation tasks, I recommend you take a look at StackStorm; they even have a Sensu integration pack!
Give Check Hooks A Try!
I hope this post helped you learn what Sensu Check Hooks are and has you excited about them! I suspect that you already have Hook ideas of your own, ideas that are probably much better than the examples above. Upgrade your Sensu installations to 1.1 and give Hooks a whirl! Share your Hook ideas in the Sensu Community Slack!