The DataOps Blog

Where Change Is Welcome

Getting Started with StreamSets Data Collector

March 14, 2016

Hi, I’m Pat Patterson, newly minted ‘community champion’ here at StreamSets. As I get up to speed with big data in general and StreamSets Data Collector (SDC) in particular, I’ll write up my exploits here on the StreamSets blog to help other novices get started with open source big data ingestion.

I’m going to assume you know the basics of what StreamSets Data Collector can do, and you want to get started actually using it. If you do need some background, the product page and FAQs are great places to start.

Now, let’s get hands-on!

StreamSets Monitoring with Grafana, InfluxDB, and jmxtrans

January 14, 2016

The ability to monitor your critical infrastructure is a must, and we designed the StreamSets Data Collector (SDC) with this in mind: metrics are exposed through both the REST API and JMX. While there are many approaches to monitoring these metrics, let’s walk through a specific end-to-end example using jmxtrans to collect metrics, InfluxDB to store them, and Grafana to visualize them.

Ingesting Streaming Data from JMS into HDFS and Solr using StreamSets

November 10, 2015

A step-by-step walkthrough of how Mac Noland implemented StreamSets to move away from hand-coded ETL and scale out an increasingly complex ingestion pipeline. Mac is a Solution Architect for phData, a Twin Cities services firm focused on Hadoop. He has spent 17 years as a software engineer and architect on projects in the legal, accounting, risk, and medical device industries.