Sensu Checks are monitoring jobs that are managed by the Sensu platform (control plane) and executed by Sensu Agents. A Sensu Check is a modern take on the traditional “service check” – a task (check) performed by the monitoring platform to determine the status of a system or service.
Example monitoring job (check) configuration template:
type: CheckConfig
api_version: core/v2
metadata:
  name: node_exporter
spec:
  command: wget -q -O- http://127.0.0.1:{{ .labels.node_exporter_port | default "9100" }}/metrics
  runtime_assets: []
  publish: true
  interval: 30
  subscriptions:
    - linux
  timeout: 10
  ttl: 0
  output_metric_format: prometheus_text
  output_metric_handlers:
    - elasticsearch
  output_metric_tags:
    - name: entity
      value: "{{ .name }}"
    - name: region
      value: "{{ .labels.region | default 'unknown' }}"
Although service checks were originally popularized by Nagios (circa 1999-2002), they continue to fill a critical role in the modern era of cloud computing. Sensu orchestrates service checks in a manner similar to cloud-native platforms like Kubernetes and Prometheus, which use “Jobs” as a central concept for scheduling and running tasks. Where Prometheus jobs are limited to HTTP GET requests (for good reason), a Sensu monitoring job (“check”) is a significantly more flexible tool: it can run any command.
A valid service check must satisfy the following requirements:
1. Communicate status via its exit code
2. Emit service status information and telemetry data via STDOUT
That’s the entire specification (more or less)! Because the specification is so simple, service checks are tremendously extensible, which is why they have provided sustained value. In fact, service checks can be written in any programming language (including simple Bash and MS-DOS scripts).
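As a minimal sketch of such a check, the shell function below prints a status line to STDOUT and returns an exit code. It assumes the conventional mapping used by Nagios-style checks (0 = OK, 1 = WARNING, 2 = CRITICAL, anything else = UNKNOWN). The function name `check_threshold` and its arguments are hypothetical and not part of Sensu.

```shell
# Sketch of a service check: status goes to STDOUT, severity to the exit code.
# check_threshold VALUE WARNING CRITICAL (hypothetical helper, whole numbers only)
check_threshold() {
  value=$1 warning=$2 critical=$3
  case "$value" in
    ''|*[!0-9]*)
      # Anything that is not a whole number cannot be evaluated.
      echo "UNKNOWN: value '$value' is not a whole number"
      return 3 ;;
  esac
  if [ "$value" -ge "$critical" ]; then
    echo "CRITICAL: usage is ${value}% (critical threshold ${critical}%)"
    return 2
  elif [ "$value" -ge "$warning" ]; then
    echo "WARNING: usage is ${value}% (warning threshold ${warning}%)"
    return 1
  fi
  echo "OK: usage is ${value}%"
  return 0
}

# Example: 85% usage against 80/90 thresholds reports WARNING (exit code 1).
check_threshold 85 80 90 || echo "exit code: $?"
```

In a standalone check script, the last step would be to `exit` with the function's return code so that the agent can read the status.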
The Sensu backend handles the scheduling of all monitoring jobs (checks). Check scheduling is configured using the following attributes:
- publish: enables or disables scheduling
- interval or cron: the actual schedule upon which check requests will be published to the corresponding subscriptions
- subscriptions: the subscriptions to publish check requests to
- round_robin: limits check scheduling to one execution per request (useful for configuring pollers when there are multiple agent members in a given subscription)
- timeout: instructs the agent to terminate check execution after the configured number of seconds
As discussed in Lesson 7, Sensu uses the publish/subscribe model of communication. Sensu schedules monitoring jobs (checks) at pre-set intervals, automatically “publishing” requests to the configured topics (subscriptions).
Because subscriptions are loosely coupled references, Sensu checks can be configured with subscriptions that have no agent members and the result is simply a “no-op” (no action is taken). This works especially well in ephemeral or elastic infrastructures where host-based monitoring configuration is ineffective. Instead of configuring monitoring on a per-host basis, monitoring configuration can be predefined following a service-based model (e.g. with one subscription per service, such as “postgres”), and agents on ephemeral compute instances simply register with a Sensu backend, subscribe to the relevant monitoring “topics” and begin reporting observability data.
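The loose coupling described above can be illustrated with a small simulation. This is only a sketch of the matching idea, not how Sensu implements its transport. The helper `receives_request` and the sample subscription names are hypothetical.

```shell
# receives_request CHECK_SUBS AGENT_SUBS
# Succeeds if the two space-separated subscription lists share at least one entry.
receives_request() {
  for check_sub in $1; do
    for agent_sub in $2; do
      if [ "$check_sub" = "$agent_sub" ]; then
        return 0
      fi
    done
  done
  return 1
}

# A check published to "postgres" and "linux", evaluated against two agents.
check_subs="postgres linux"
for agent_subs in "linux webserver" "windows"; do
  if receives_request "$check_subs" "$agent_subs"; then
    echo "agent [$agent_subs]: executes the check"
  else
    echo "agent [$agent_subs]: no-op"
  fi
done
```

The second agent shares no subscription with the check, so nothing happens for it. The same is true when a subscription has no members at all.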
Monitoring jobs (checks) can be templated using placeholders called “Sensu Tokens”. Token substitution is performed by the Sensu Agent[1], which replaces every token with the corresponding entity data before the check is executed. Sensu Tokens are available for Checks, Hooks (see Lesson 9: Introduction to Check Hooks), and Assets (see Lesson 10: Introduction to Assets).
Sensu Tokens are references to entity attributes and metadata, wrapped in double curly braces ({{ }}). Default values can also be provided as a fallback for unmatched tokens. Sensu Tokens can be used to configure dynamic monitoring jobs (e.g. enabling node-based configuration overrides for things like alerting thresholds).
Examples:
- {{ .name }}: replaced by the target entity name
- {{ .labels.url }}: replaced by the target entity “url” label
- {{ .labels.disk_warning | default "85%" }}: replaced by the target entity “disk_warning” label; if the label is not set then the default/fallback value of 85% will be used
One popular use case for monitoring jobs (checks) is to collect various system and service metrics (e.g. CPU, memory, or disk utilization; or API response times).
To learn more about Sensu metrics processing capabilities, please visit the Sensu Metrics reference documentation.
The Sensu Agent provides built-in support for normalizing metrics generated by service checks in the following output_metric_format values:
- prometheus_text: Prometheus exposition format
- influxdb_line: InfluxDB line protocol
- opentsdb_line: OpenTSDB line protocol
- graphite_plaintext: Graphite plaintext protocol
- nagios_perfdata: Nagios Performance Data
Configuring output_metric_format causes the agent to extract metrics at the edge – before sending event data to the observability pipeline – optimizing performance of the platform at scale.
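For illustration, a check whose output follows the Graphite plaintext protocol (one `<path> <value> <timestamp>` line per metric) could be sketched as below. With output_metric_format set to graphite_plaintext, lines in this form are what the agent would parse. The helper `emit_graphite`, the metric paths, and the sample values are hypothetical.

```shell
# emit_graphite PATH VALUE [TIMESTAMP]
# Prints one metric in Graphite plaintext form: "<path> <value> <unix timestamp>".
# The timestamp defaults to the current time when omitted.
emit_graphite() {
  printf '%s %s %s\n' "$1" "$2" "${3:-$(date +%s)}"
}

# Placeholder values; a real check would measure these before printing them.
emit_graphite "example.disk.used_percent" 42
emit_graphite "example.disk.free_bytes" 123456789
```

A check like this would still return an exit code as usual, so the same job can report both status and metrics.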
NOTE: Sensu also provides support for collecting StatsD metrics; however, these are consumed via the StatsD API – not collected as output of a monitoring job (check).
In addition to output_metric_format, Sensu checks also provide configuration for dedicated output_metric_handlers – event handlers that are specially optimized for processing metrics (only). If an event containing metrics is configured with one or more output_metric_handlers, a copy of the event is forwarded to the metric handler prior to Sensu’s own event persistence; this specialized handling is implemented as a performance optimization to prioritize metric processing.
NOTE: Sensu checks may be configured with one or more handlers and output_metric_handlers, enabling service health checking, alerting, and metrics collection in a single monitoring job.
Metrics extracted with output_metric_format can also be enriched using output_metric_tags. Metric sources vary in verbosity – some metric formats don’t support tags (e.g. Nagios Performance Data), and even those that do can be implemented in ways that simply don’t provide enough contextual data. In either case, Sensu’s output_metric_tags are great for enriching collected metrics using entity data/metadata. Sensu breathes new life into legacy monitoring plugins or other metric sources that generate the raw data you care about, but lack tags or other context to make sense of the data; simply configure output_metric_tags and Sensu will add the corresponding tag data to the resulting metrics/measurements.
Example:
output_metric_tags:
  - name: application
    value: "my-app"
  - name: entity
    value: "{{ .name }}"
  - name: region
    value: "{{ .labels.region | default 'unknown' }}"
  - name: store_id
    value: "store/{{ .labels.store_id | default 'none' }}"
Metric tag values can be provided as literal strings or as Sensu Tokens, which generate dynamic tag values from entity data.
The Sensu check scheduler can orchestrate monitoring jobs for entities that are not actively managed by a Sensu agent. These monitoring jobs are called “proxy checks”, or checks that target a proxy entity. Proxy checks are discussed in greater detail in Lesson 13: Introduction to Proxy Entities & Proxy Checks.
At a high level, a proxy check is a Sensu check with proxy_requests, which are effectively query parameters Sensu will use to look for matching entities that should be targeted by the check. Proxy requests are published to the configured subscription(s) once per matching entity. In the following example, we would expect Sensu to find two (2) entities with entity_class == "proxy" and a proxy_type label set to “website”; for each matching entity, the Sensu backend will first replace the configured tokens using the corresponding entity attributes and then publish the request (i.e. one request to execute the command nslookup sensu.io, and one request to execute the command nslookup google.com). To avoid redundant processing, we recommend using the round_robin attribute with proxy checks.
---
type: CheckConfig
api_version: core/v2
metadata:
  name: proxy-nslookup
spec:
  command: >-
    nslookup {{ .annotations.proxy_host }}
  runtime_assets: []
  publish: true
  subscriptions:
    - workshop
  interval: 30
  timeout: 10
  round_robin: true
  proxy_requests:
    entity_attributes:
      - entity.entity_class == "proxy"
      - entity.labels.proxy_type == "website"
---
type: Entity
api_version: core/v2
metadata:
  name: proxy-a
  labels:
    proxy_type: website
  annotations:
    proxy_host: sensu.io
spec:
  entity_class: proxy
---
type: Entity
api_version: core/v2
metadata:
  name: proxy-b
  labels:
    proxy_type: website
  annotations:
    proxy_host: google.com
spec:
  entity_class: proxy
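The expansion of one proxy check into one request per matching entity can be sketched as a small simulation. The pipe-delimited records are a made-up stand-in for the entity store, and `expand_proxy_requests` is a hypothetical helper. Sensu evaluates entity_attributes and substitutes tokens itself.

```shell
# Stand-in entity records: name|entity_class|proxy_type label|proxy_host annotation
entities='proxy-a|proxy|website|sensu.io
proxy-b|proxy|website|google.com
agent-1|agent||'

# Reads entity records on STDIN and prints the command that would be requested
# for every entity matching both entity_attributes expressions.
expand_proxy_requests() {
  while IFS='|' read -r name class proxy_type proxy_host; do
    if [ "$class" = "proxy" ] && [ "$proxy_type" = "website" ]; then
      # Equivalent of substituting {{ .annotations.proxy_host }} into the command.
      echo "$name: nslookup $proxy_host"
    fi
  done
}

printf '%s\n' "$entities" | expand_proxy_requests
```

The two proxy entities each yield one command, while the agent entity is skipped because it does not match the entity_attributes.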
Configure a Sensu Check for monitoring disk usage.
Copy and paste the following contents to a file named disk.yaml:
---
type: CheckConfig
api_version: core/v2
metadata:
  name: disk
spec:
  command: check-disk-usage --warning 80.0 --critical 90.0
  runtime_assets:
    - sensu/check-disk-usage:0.4.2
  publish: true
  interval: 30
  subscriptions:
    - system/macos
    - system/macos/disk
    - system/windows
    - system/windows/disk
    - system/linux
    - system/linux/disk
  timeout: 10
  check_hooks: []
Notice the values of subscriptions and interval – these instruct the Sensu platform to schedule (or “publish”) monitoring jobs every 30 seconds on any agent with the system/macos, system/windows, or system/linux subscriptions. Agents opt in (or “subscribe”) to monitoring jobs via their own subscriptions configuration.
Create the Check using the sensuctl create -f command.
sensuctl create -f disk.yaml
Verify that the Check was successfully created using the sensuctl check list command:
sensuctl check list
Example output:
Name Command Interval Cron Timeout TTL Subscriptions Handlers Assets Hooks Publish? Stdin? Metric Format Metric Handlers
────── ───────────────────────────────────────────────── ────────── ────── ───────── ───── ────────────────────────────────────────────────────────────────────────────────────────────────── ────────── ────────────────────────────── ─────── ────────── ─────────────────────── ─────────────────
disk check-disk-usage --warning 80.0 --critical 90.0 30 10 0 system/macos,system/macos/disk,system/windows,system/windows/disk,system/linux,system/linux/disk sensu/check-disk-usage:0.4.2 true false
NEXT: do you see the disk check in the output? If so, you’re ready to move on to the next exercise!
Sensu’s service-oriented configuration model (as opposed to traditional host-based models) makes monitoring configuration easier to manage at scale. A single check definition can be used to collect monitoring data from hundreds or thousands of endpoints! However, there are often cases when you need to override various monitoring job configuration parameters on a per-endpoint basis. For these situations, Sensu provides a templating feature called Tokens.
Let’s modify our check from the previous exercise using some Tokens.
Update the disk check configuration template.
Modify disk.yaml with the following contents:
---
type: CheckConfig
api_version: core/v2
metadata:
  name: disk-usage
spec:
  command: >-
    check-disk-usage
    --warning {{ .annotations.disk_usage_warning_threshold | default "80.0" }}
    --critical {{ .annotations.disk_usage_critical_threshold | default "90.0" }}
  runtime_assets:
    - sensu/check-disk-usage:0.4.2
  publish: true
  interval: 30
  subscriptions:
    - system/macos
    - system/macos/disk
    - system/windows
    - system/windows/disk
    - system/linux
    - system/linux/disk
  timeout: 10
  check_hooks: []
NOTE: this example uses a YAML multiline “block scalar” (>-) for improved readability of a longer check command (without the need to escape newlines).
Did you notice? We’re now making the disk usage warning and critical thresholds configurable via entity annotations (disk_usage_warning_threshold and disk_usage_critical_threshold)! Both of the tokens we’re using here offer default values, which will be used if the corresponding annotation is not set.
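The fallback behaviour can be approximated with shell parameter defaults. This is only an analogy for how the command resolves per entity, not Sensu's token engine. `render_command` is a hypothetical helper, and an empty argument stands in for an annotation that is not set.

```shell
# render_command WARNING_ANNOTATION CRITICAL_ANNOTATION
# Prints the check command, falling back to 80.0 / 90.0 when a value is empty.
render_command() {
  echo "check-disk-usage --warning ${1:-80.0} --critical ${2:-90.0}"
}

render_command "" ""       # entity with neither annotation set
render_command "75.0" ""   # entity that only overrides the warning threshold
```

The first call prints the command with both defaults (80.0 and 90.0). The second keeps the critical default but uses the entity-specific warning threshold of 75.0.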
Update the Check using sensuctl create -f.
sensuctl create -f disk.yaml
Verify the resulting Check configuration using the sensuctl check info command:
sensuctl check info disk-usage --format yaml
[1] Token substitution is performed by the Sensu Agent for standard checks only; for proxy checks, it is performed by the Sensu Backend.