Palantir Case Study

Palantir Technologies [PLTR] is a software company that specializes in big data analytics and visualization. Jared Ledvina, an engineer at Palantir, describes the Sensu deployment in their 12,000+ node AWS environment.

The Problem

Palantir was using Nagios to monitor over 12,000 AWS nodes. As their IT infrastructure grew, hardcoding hosts into a central provisioning system became increasingly difficult. After attempting a home-grown system that read host names and tests from files on disk, the team quickly realized they needed a modern system designed for auto-scaling cloud environments.

Sensu lets me get a full night’s sleep. 😴

— Jared Ledvina, Cloud Operations Engineer @ Palantir

The Solution

After evaluating a number of monitoring solutions, including Datadog, Nagios derivatives, and Prometheus, Palantir selected Sensu for their monitoring needs. Sensu offered them:

  • a highly scalable, fault-tolerant architecture that could grow with their IT infrastructure
  • auto registration/deregistration of hosts without the need for centralized provisioning
  • auto-remediation to resolve issues

These core Sensu features delivered direct ROI: Palantir lowered their opex by no longer provisioning devices manually and by letting Sensu's auto-remediation resolve issues.
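
To make the auto registration/deregistration idea concrete, here is a minimal Python sketch of the general pattern: a host joins the registry the first time it reports in, and is dropped once it stops reporting. This is an illustration only, not Sensu's code or API; the class name, method names, and the timeout value are all hypothetical.

```python
import time
from dataclasses import dataclass, field


@dataclass
class KeepaliveRegistry:
    """Toy registry: hosts appear on first keepalive and vanish when silent."""

    timeout_s: float = 120.0  # hypothetical staleness threshold, in seconds
    last_seen: dict = field(default_factory=dict)  # host name -> last keepalive time

    def keepalive(self, host, now=None):
        """Record a keepalive; return True if it registered a new host."""
        now = time.time() if now is None else now
        is_new = host not in self.last_seen
        self.last_seen[host] = now
        return is_new

    def reap(self, now=None):
        """Deregister hosts whose last keepalive is older than the timeout."""
        now = time.time() if now is None else now
        stale = sorted(
            host for host, seen in self.last_seen.items()
            if now - seen > self.timeout_s
        )
        for host in stale:
            del self.last_seen[host]
        return stale
```

Because hosts announce themselves, no central host list has to be edited when an auto-scaling group adds or removes instances.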