Visualize StreamSets Data Collector Metrics with Datadog

Posted in Data Integration, June 20, 2016

Back in January, Adam blogged about StreamSets Monitoring with Grafana, InfluxDB, and jmxtrans. While the Grafana/InfluxDB/jmxtrans open source stack works great, there’s quite a lot of setup and configuration to keep track of. Another tool that StreamSets customers are using to monitor their infrastructure is Datadog; in today’s blog post I’ll show you the basics of configuring Datadog with StreamSets Data Collector (SDC).

Sign up for Datadog

There are three tiers of Datadog service: Free, Pro, and Enterprise. The free tier covers up to 5 hosts, so it’s ideal for experimenting. You can also try out a higher tier free for 14 days, so you can see how it works for dozens, or even hundreds, of servers before handing over your credit card. Sign up for Datadog (free).

Download the Datadog Agent

I was prompted to download an agent after I signed up. There are agents for a wide range of environments; I was able to install and run it on my MacBook Air without any problems, and see my laptop’s CPU usage, memory usage, and so on:

Datadog System Overview

Enable JMX Remote Monitoring in StreamSets Data Collector

Follow the documentation to enable remote JMX access to SDC. I added the suggested SDC_JAVA_OPTS setting to the SDC environment file at libexec/, appending ${SDC_JAVA_OPTS} to the setting so that SDC also picks up any existing settings in the environment:

export SDC_JAVA_OPTS=" \ \ \ \ \
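
For reference, a setting of this kind is usually built from the standard JVM remote-JMX system properties. The following is a minimal sketch, not necessarily the exact values the SDC documentation suggests: it assumes port 3333 (to match the JConsole test that follows) and turns off authentication and SSL, which is only reasonable on a local test machine.

```shell
# Sketch only: standard JVM remote-JMX system properties. The exact values
# recommended by the SDC documentation may differ.
# Port 3333 is an assumption chosen to match the JConsole command used later.
# Authentication and SSL are disabled, so use this on a local test box only.
export SDC_JAVA_OPTS="-Dcom.sun.management.jmxremote \
 -Dcom.sun.management.jmxremote.port=3333 \
 -Dcom.sun.management.jmxremote.local.only=false \
 -Dcom.sun.management.jmxremote.authenticate=false \
 -Dcom.sun.management.jmxremote.ssl=false \
 ${SDC_JAVA_OPTS}"
```

Keeping ${SDC_JAVA_OPTS} at the end of the string means any options already set in the environment are preserved rather than overwritten.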

After restarting SDC, I tested JMX connectivity with JConsole:

$ jconsole localhost:3333

Success! I was able to see all of the MBeans exposed by SDC:


Enable the Java Integration

The Datadog agent’s configuration directory (/opt/datadog-agent/etc/conf.d on a Mac, but this varies by platform) contains sample YAML configuration files for a wide variety of integrations. I opened jmx.yaml.example and saved it as jmx.yaml, with the following content:

init_config:

instances:
  - host: localhost
    port: 3333
    name: streamsets
    conf:
      # Gather record count and rate metrics from StreamSets pipelines
      - include:
          domain: metrics
          attribute:
            - Count
            - OneMinuteRate
            - FiveMinuteRate
            - FifteenMinuteRate
            - MeanRate
      # Gather threading stats for the JVM
      - include:
          domain: java.lang
          bean: java.lang:type=Threading
          attribute:
            - DaemonThreadCount
            - PeakThreadCount
            - ThreadCount
            - TotalStartedThreadCount

The Datadog JMX documentation covers the YAML config file’s syntax; you should be able to add more metrics by browsing the available MBeans in jconsole and editing jmx.yaml. Note – you will need to restart the Datadog agent after each change to the config file.

Visualize SDC Metrics

I created a new dashboard in Datadog, and added time series graphs and query values for a pipeline’s record count and the SDC JVM’s thread count. It was pretty easy to add metrics just by typing part of the metric name and picking from the popup list. SDC pipeline metrics have the form:

sdc.pipeline.<pipeline name>.<pipeline revision>.<category: pipeline|stage|custom>.\
[<stage library>_<library revision>].<metric name>.<metric type>
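
To make the pattern concrete, here is a small sketch that assembles a pipeline-level metric name from its parts. The function name sdc_pipeline_metric is hypothetical, the pipeline name, revision, metric name, and metric type are placeholder values, and the optional stage-library segment is left out.

```shell
# Hypothetical helper: builds a pipeline-level metric name following the
# pattern above. The optional stage-library segment is omitted.
# Arguments: $1 = pipeline name, $2 = pipeline revision,
#            $3 = metric name,   $4 = metric type
sdc_pipeline_metric() {
  printf 'sdc.pipeline.%s.%s.pipeline.%s.%s\n' "$1" "$2" "$3" "$4"
}

# Placeholder values for illustration only.
sdc_pipeline_metric "MyPipeline" "0" "batchOutputRecords" "meter"
```

Typing any fragment of a name like this into the Datadog metric picker should be enough to narrow the list down.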

They’re pretty easy to find in the Datadog UI:

Datadog Config

The resulting visualization looks great! I can see at a glance the record count of my pipeline, and the number of threads that SDC is running:

Datadog with StreamSets

This blog post just scratches the surface of Datadog’s capabilities. As well as showing graphs and values, you can annotate dashboard graphs, share them across teams, and set alerts on critical issues.

Conclusion: Configuring Datadog with StreamSets

StreamSets Data Collector’s JMX implementation exposes a whole raft of metrics for collection and analysis, while Datadog provides an attractive, easy-to-use visualization service. Download SDC, configure a pipeline, hook it up to Datadog, and see for yourself!
