Kubernetes Monitoring: Metrics, Tools & Best Practices

Monitoring any type of resource can be challenging. But Kubernetes monitoring is a special kind of challenge. Not only are there a variety of different Kubernetes layers and resource types to monitor, but collecting monitoring data from Kubernetes can be difficult if you use a managed Kubernetes service that limits your access to the underlying infrastructure.

For all of these reasons, Kubernetes monitoring requires a different approach. You need to use monitoring tools and processes that are designed to accommodate the unique requirements of Kubernetes monitoring. Luckily, Sensu gives you all the tools you need to successfully monitor every aspect of Kubernetes.

Keep reading for a look at what those requirements are, and what kind of tools are available to support them.

What Is Kubernetes?

As you likely know if you help to build or manage cloud-native applications, Kubernetes is an open source orchestration platform. Its main job is to automate and orchestrate the processes required to deploy and manage containers across a cluster of servers. With certain extensions, Kubernetes can also manage serverless functions and virtual machines.

It is, of course, possible to deploy containerized applications without Kubernetes. However, an orchestration solution like Kubernetes is essential if you want to run containers at scale. Deploying and managing dozens or hundreds of individual containers manually would be like herding cats – not as cute as it sounds, and likely to result in more than one pain point.

How to Monitor Kubernetes

The first thing to know about monitoring Kubernetes is that Kubernetes is not a singular thing. Instead, it’s a collection of distinct components. From a monitoring perspective, the two most important things to monitor are nodes (which are the servers on which workloads run) and Pods (which consist of a container or group of containers that host an application).

Because these components work in different ways, they need to be monitored separately.

Kubernetes Node Monitoring

There are three main ways to monitor Kubernetes nodes.

One is to track metrics from within the operating system running on each node. Under this approach, which requires you to have full control over the node, you would install the Sensu agent to run checks and collect metrics from the server and send that data to a monitoring and analytics platform like Sumo Logic.
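
For example, a minimal agent configuration might look something like the following sketch; the backend URL, subscriptions, and namespace are placeholders you would replace with your own values:

```yaml
# /etc/sensu/agent.yml -- minimal Sensu agent configuration (values are placeholders)
backend-url:
  - "ws://sensu-backend.example.com:8081"   # your Sensu backend WebSocket URL
subscriptions:
  - system            # generic OS-level checks
  - kubernetes-node   # node-specific checks
namespace: default
```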

A variation on this approach to node monitoring is to create a DaemonSet that runs a monitoring agent. This allows you to deploy a monitoring agent on each node in your cluster automatically, even if you don’t have OS-level access to the nodes. The monitoring agent would run within its own Pod but would focus on monitoring the underlying node or on running a log collection service.
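
As a rough sketch, a DaemonSet for this pattern might look like the example below; the namespace, image tag, and backend URL are assumptions you would adapt to your environment:

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: sensu-agent
  namespace: monitoring              # placeholder namespace
spec:
  selector:
    matchLabels:
      app: sensu-agent
  template:
    metadata:
      labels:
        app: sensu-agent
    spec:
      containers:
        - name: sensu-agent
          image: sensu/sensu:latest                # official Sensu Go image; pin a version in production
          command: ["sensu-agent", "start"]
          env:
            - name: SENSU_BACKEND_URL
              value: "ws://sensu-backend.monitoring.svc:8081"   # placeholder backend address
            - name: SENSU_SUBSCRIPTIONS
              value: "kubernetes-node"
```

Because the DaemonSet controller schedules one copy of this Pod on every node, each node that joins the cluster automatically gets its own agent.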

A third node monitoring method is to collect node metrics via the Kubernetes metrics API. The advantage of this approach is that it doesn’t require full control over the nodes. This is useful in managed Kubernetes environments, where a provider provisions the nodes for you and you don’t have sufficient access to install agents on them. The downside is that the Kubernetes metrics API supports only certain types of node metrics, so there are limits on exactly which types of monitoring data you can collect.
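
To give a sense of what that data looks like, a single node entry returned by the metrics API (served at /apis/metrics.k8s.io/v1beta1/nodes once an implementation such as metrics-server is installed) resembles the following; the node name and usage figures are purely illustrative:

```yaml
apiVersion: metrics.k8s.io/v1beta1
kind: NodeMetrics
metadata:
  name: worker-node-1          # illustrative node name
timestamp: "2024-01-01T12:00:00Z"
window: 20s                    # the sampling window the usage figures cover
usage:
  cpu: 250m                    # millicores currently in use
  memory: 2048Mi
```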

No matter which node monitoring strategy you adopt, your goal should be to collect the metrics that impact your business. A good place to start is to look at basics such as:

  • Total node count.
  • CPU and memory usage per node.
  • Available storage resources (if your storage pools are hosted on nodes).

This data helps you determine whether you have sufficient nodes – and sufficient resources available on each node – to meet workload requirements. With Sensu, you can easily collect these metrics, and store them long-term in Sumo Logic for later analysis. As your business grows, and with it your understanding of what metrics matter the most, you’ll find that Sensu makes it easy and painless to deploy new monitoring configurations to your Kubernetes environment.
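
For example, a check that gathers node CPU data and forwards it as metrics might be defined roughly like this. The check-cpu-usage asset, the nagios_perfdata output format, and the sumologic handler name are assumptions; substitute whatever plugins and handlers you actually use:

```yaml
type: CheckConfig
api_version: core/v2
metadata:
  name: node-cpu
  namespace: default
spec:
  command: check-cpu-usage -w 80 -c 90   # assumes the sensu/check-cpu-usage plugin
  runtime_assets:
    - sensu/check-cpu-usage
  subscriptions:
    - kubernetes-node
  interval: 30
  publish: true
  output_metric_format: nagios_perfdata  # depends on the format your plugin actually emits
  output_metric_handlers:
    - sumologic                          # placeholder metrics handler
```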

Kubernetes Pod Monitoring

But monitoring your Kubernetes node infrastructure is only part of the story. You will also need to monitor Kubernetes Pods, using a slightly different approach.

One method is to deploy a so-called sidecar container within each Pod. The sidecar pattern is a common way to add supporting features alongside your core application services: a sidecar container is an extra container in your Pod which hosts something like a Sensu monitoring agent. It runs alongside your primary application containers, collecting monitoring data from them and streaming that data to a monitoring data storage platform, such as Sumo Logic. This approach is easy to implement in most cases. The downside is that running a sidecar container within each of your Pods increases the resource utilization footprint of each Pod, so there may be an impact on overall cluster performance.
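
A sketch of the sidecar pattern is shown below; the names and backend address are placeholders, and in practice you would embed this Pod template in a Deployment rather than creating a bare Pod:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web-app                        # placeholder application Pod
  labels:
    app: web-app
spec:
  containers:
    - name: web-app
      image: registry.example.com/web-app:1.0   # your application container
      ports:
        - containerPort: 8080
    - name: sensu-agent                          # monitoring sidecar
      image: sensu/sensu:latest
      command: ["sensu-agent", "start"]
      env:
        - name: SENSU_BACKEND_URL
          value: "ws://sensu-backend.monitoring.svc:8081"
        - name: SENSU_SUBSCRIPTIONS
          value: "web-app"             # subscription used to target checks at this workload
```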

You can also use a DaemonSet for monitoring Pods. The DaemonSet would deploy a monitoring agent on each node in your cluster, but instead of monitoring the underlying node (as we discussed earlier), it would monitor the Pods running on the node. This approach is efficient because a single Sensu agent on each node can run different kinds of checks: one to monitor the node itself and another to monitor the Pods running on it.
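
In Sensu terms, that could look like two check definitions sharing the DaemonSet agent’s subscription. This is only a sketch: the check-memory-usage asset is an assumption, and check-pods-on-node is a hypothetical plugin name used to illustrate the pattern:

```yaml
# Check 1: monitor the node itself
type: CheckConfig
api_version: core/v2
metadata:
  name: node-memory
spec:
  command: check-memory-usage -w 80 -c 90    # assumes the sensu/check-memory-usage plugin
  runtime_assets:
    - sensu/check-memory-usage
  subscriptions:
    - kubernetes-node
  interval: 30
  publish: true
---
# Check 2: monitor the Pods scheduled on this node
type: CheckConfig
api_version: core/v2
metadata:
  name: pods-on-node
spec:
  command: check-pods-on-node                # hypothetical command that would query the local kubelet
  subscriptions:
    - kubernetes-node
  interval: 60
  publish: true
```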

Alternatively, you can instrument your containerized application itself. Doing so requires adding functions for collecting and exporting monitoring data within the application source code or container image. The most popular approach to cloud native instrumentation would be to add a /metrics endpoint to your app, allowing it to be monitored by any Prometheus-compatible monitoring tool, such as Sensu. The difficulty of this approach may vary depending on how complex your application is and how much control you have over its source code. However, when done well, this Kubernetes Pod monitoring method can reduce the overall operational burden placed on your cluster, because you avoid having to run extra containers.
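
If your application exposes a /metrics endpoint, the same Sensu agent can scrape it on a schedule. The sketch below assumes the sensu/sensu-prometheus-collector asset; the exact flag names and output format depend on the plugin version, so treat them as assumptions and check the plugin documentation:

```yaml
type: CheckConfig
api_version: core/v2
metadata:
  name: app-prometheus-metrics
spec:
  command: sensu-prometheus-collector -exporter-url http://localhost:8080/metrics  # flag name is an assumption
  runtime_assets:
    - sensu/sensu-prometheus-collector
  subscriptions:
    - web-app
  interval: 30
  publish: true
  output_metric_format: influxdb_line        # assumes the collector emits InfluxDB line protocol
  output_metric_handlers:
    - sumologic                              # placeholder metrics handler
```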

Finally, you could use the Kubernetes metrics API to collect basic monitoring data about Pods, but this is limited to information like CPU and memory usage statistics for each Pod. If you want more detailed, application-specific monitoring data, you’ll need to use one of the other approaches described above.
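
For reference, a Pod entry returned by the metrics API looks roughly like the following, which shows just how limited the data is: per-container CPU and memory usage, and not much else. Names and numbers here are purely illustrative:

```yaml
apiVersion: metrics.k8s.io/v1beta1
kind: PodMetrics
metadata:
  name: web-app-7d9c6b5f4-abcde   # illustrative Pod name
  namespace: default
timestamp: "2024-01-01T12:00:00Z"
window: 20s
containers:
  - name: web-app
    usage:
      cpu: 120m
      memory: 256Mi
  - name: sensu-agent
    usage:
      cpu: 5m
      memory: 32Mi
```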

When monitoring Pods, you’ll want to track metrics such as:

  • The location of each Pod within your cluster (in other words, which node each Pod is running on).
  • CPU and memory usage by each Pod.
  • Pod start time and restart count, which help measure how quickly your Pod was launched and how stable it is once running.

Data like this helps you understand the health of applications that you have deployed inside your Kubernetes Pods.

For Kubernetes Monitoring, Context Is Everything

It’s important to note that node and Pod monitoring data is not very useful on its own. What really matters for effective Kubernetes monitoring is the ability to correlate and contextualize data so that you can trace problems back to their root cause. This is why it’s helpful to use a platform like Sumo Logic in conjunction with Sensu. Sumo Logic can aggregate all your logs and metrics in a single platform, allowing you to query and perform analysis over both the current metrics and the historical performance information.
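
The glue between the two is a Sensu handler that forwards events and metrics to Sumo Logic. Here is a rough sketch; the sensu-sumologic-handler asset name, its --url flag, and the environment variable are assumptions, so verify them against the plugin documentation (and store the HTTP source URL as a secret rather than hard-coding it):

```yaml
type: Handler
api_version: core/v2
metadata:
  name: sumologic
spec:
  type: pipe
  command: sensu-sumologic-handler --url "${SUMOLOGIC_HTTP_SOURCE_URL}"  # flag and env var are assumptions
  runtime_assets:
    - sensu/sensu-sumologic-handler
  timeout: 10
```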

For example, if you notice that certain nodes are using an unusually high amount of CPU or memory, you’ll want to know whether those nodes are all hosting the same Pod or not. If so, it’s a pretty safe bet that the Pod is triggering the high usage statistics. You’ll know that the root cause of the problem lies in your Pod, and you can look at the Pod monitoring data to get even more insights, such as identifying which specific containers are consuming high rates of resources.

On the other hand, if node monitoring anomalies don’t correlate with Pod placement, it’s likely that the root cause of your problem is in the underlying infrastructure. Perhaps your hypervisor has an issue (if your nodes are VMs, for example), or there may be a problem with the way the node operating systems are configured.

This kind of analysis requires the ability to pull all these various metrics together into a single platform for data analysis. With Sensu in place, sending your real-time data to Sumo Logic for storage, you can understand relationships within the data and see historical trends that help you narrow in on the problem areas, making your remediation efforts faster and more efficient.

The point here is that, although you need to monitor Kubernetes nodes and Pods separately, the ability to bring the monitoring data together is critical for gaining a full-picture understanding of what is happening within your clusters. To get a comprehensive understanding of your Kubernetes infrastructure, you’ll need both an observability pipeline like Sensu, as well as a storage and analysis platform like Sumo Logic.

Kubernetes Monitoring Tools

There are a variety of tools available to help with Kubernetes monitoring. Generally speaking, they fall into three categories, each of which serves a different need:

  • Kubernetes-native monitoring tools, like the metrics API. These are built-in Kubernetes features that provide simple (but basic) metrics collection functionality. There are a variety of solutions out there for collecting and alerting on this kind of information, such as Sensu, Prometheus, and others.
  • Kubernetes monitoring tools that are provided as part of managed Kubernetes services. For example, if you use a managed Kubernetes service like AWS EKS, you can use CloudWatch to monitor your clusters. These tools are convenient because they come pre-integrated with your Kubernetes environment. The downside is that they are tied to specific managed Kubernetes platforms, so they are not vendor-agnostic. They also usually provide only basic monitoring and visualization features. However, with Sensu, you can pull data out of CloudWatch, alert on it in real time, or send it off to another platform like Sumo Logic for analysis and storage.
  • Third-party monitoring tools that can ingest Kubernetes monitoring data from a variety of sources in any type of Kubernetes environment. These tools provide the greatest flexibility and deepest level of monitoring data analytics and visibility. Sensu is unique here, in that it can operate in all of the modalities that you might need for Kubernetes monitoring, and in fact, can do all of those at the same time in a very efficient manner.

Because each of these tools addresses a different monitoring need, it’s common to use all of them in tandem. You might use Sensu to pull metrics from the Kubernetes API to collect basic data about node and Pod status, while also using Sensu to monitor CloudWatch to track basic cluster metrics in AWS EKS. You might deploy a custom application-specific monitoring check via the same Sensu agent, to gain the deepest level of visibility into both your application and your Kubernetes infrastructure, at the same time.

Kubernetes Monitoring Best Practices

Kubernetes clusters come in many shapes and sizes, and monitoring needs will vary depending on how your cluster is designed and which Kubernetes service or platform you use to host it. Kubernetes is at its best when the flexible nature of its infrastructure allows your organization to grow and change with the needs of your business. When choosing a monitoring solution, you’ll want one that is as flexible as Kubernetes is, allowing you to rapidly modify your monitoring configuration to respond to changing requirements.

That said, there is a core set of best practices to follow when devising a Kubernetes monitoring strategy, and any monitoring solution should at least be able to do these four things with ease:

  • Minimize monitoring agent burden: As noted above, you don’t want your monitoring agents to place such a heavy load on your clusters that they deprive your actual applications of important resources. Try to keep your monitoring footprint as light as possible. Sensu has strengths over competitors like Datadog here, because our agent is lightweight, and our asset delivery model ensures that only the pieces you are actually using are deployed into the environment. By contrast, Datadog packages everything it can do into one large, resource-intensive agent, shipping a lot of additional functionality that you will never actually use. You pay the cost of this overhead every time the agent runs, using significantly more memory and CPU than the task requires.
  • Contextualize, contextualize, contextualize: As we also noted above, you want to be able to correlate and contextualize every Kubernetes monitoring data point available to you. To do this, ingest your monitoring data into a platform that can centralize and standardize monitoring insights from across your cluster. Pairing Sensu with Sumo Logic gives you the best of both worlds: flexible collection of observability data, and rock-solid storage and analysis with easy-to-use querying and visualization tools. Sensu also has some secret superpowers here, with its ability to dynamically collect extra context from the entity via Check Hooks that only activate when the entity is operating outside of expected parameters (illustrated in the sketch after this list).
  • Keep historical data: Sometimes, you don’t become aware of a Kubernetes problem until after the fact. It’s useful for that reason to keep historical monitoring data on hand, so that you can research past issues if necessary. While Sensu doesn’t store data itself, it makes it very easy to send your data off to any number of different environments. You can simultaneously store information in a self-hosted time series database like InfluxDB, or send it off to a hosted platform like Sumo. You can do both at the same time, and set up smart rules within the Sensu observability pipeline to dynamically choose which data goes to which platform, and under what conditions that happens.
  • Define intelligent alerting thresholds: Because Kubernetes workloads tend to scale up and down constantly, creating static or generic alert thresholds typically doesn’t work well. Instead, you’ll need to define dynamic, granular alerting rules. For example, high node CPU usage may not merit an alert if it occurs at a time when the Pod running on that node is handling a high volume of requests. But you probably do want an alert if node CPU usage spikes suddenly without a corresponding change in Pod traffic. Sensu’s check templating makes it possible to define entity/tag-specific thresholds and to update those easily via the web app and command-line administration tools (see the sketch after this list). This allows Sensu to be much more flexible than other solutions and doesn’t rely on automated anomaly detection, which is notorious for being inaccurate and causing either false-negative alerts or a false-positive sense of security. It’s better to make it easy for humans to understand, define, and update these manually, with as few steps as possible. That’s where Sensu shines.
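
To make those last two points concrete, here is a rough sketch of a check with workload-specific thresholds plus a check hook that gathers extra context only when the check goes critical. The check-cpu-usage asset is an assumption, and the kubectl hook assumes the agent Pod has kubectl available and the RBAC permissions to use it:

```yaml
type: HookConfig
api_version: core/v2
metadata:
  name: capture-pod-state
spec:
  command: kubectl describe pods -l app=web-app   # assumes kubectl and suitable RBAC on the agent
  timeout: 20
---
type: CheckConfig
api_version: core/v2
metadata:
  name: web-app-cpu
spec:
  command: check-cpu-usage -w 75 -c 90     # thresholds tuned for this workload; asset name assumed
  runtime_assets:
    - sensu/check-cpu-usage
  subscriptions:
    - web-app
  interval: 30
  publish: true
  check_hooks:
    - critical:
        - capture-pod-state                # the hook only runs when this check reports critical
```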

Hopefully you now have a good understanding of the various concerns involved with monitoring Kubernetes successfully. Sensu, especially when combined with Sumo Logic, is a flexible tool that was designed with this kind of workload in mind. It’s not the only solution to solving these various problems, but we are confident that you will find it to be the easiest and most flexible way to monitor Kubernetes and to adapt to the constantly changing needs of that environment.

Get Started With Sensu

If you haven’t tried Sensu yet, we encourage you to give it a spin! You can download the latest version of Sensu Go here. Did we mention all commercial features are available to you for free up to 100 nodes? If you have more than 100 entities, you can contact our team for a free trial.

To make the most out of Sensu, be sure to check out the Sensu Go Workshop, a collection of learning resources designed to help new users learn Sensu.

If you ever get stuck or have questions, our team will be happy to help you out in Discourse.