March 2016

New Tutorial: Creating a Custom StreamSets Destination

By Pat Patterson March 23, 2016

One of the first things I hear after I explain the basics of StreamSets Data Collector is, “Cool, so can I ingest data from/send data to X?”, for varying values of X. The short answer is, “Yes, you can!” The longer answer involves checking the lists of origins (for ingesting data from X) and destinations (for sending data to X) included with the product, and writing custom code if X is not on the list.

“My X isn’t on the list! How do I get started writing that custom code?”, I hear you shout; well, I just wrote a detailed tutorial for creating your first custom StreamSets destination that explains all. Fire up your IDE, follow the steps, and you’ll build a sample destination that sends records to RequestBin, but could be adapted to send them pretty much anywhere.

How Trend Micro Uses StreamSets – An Interview with the Threat Research Team

By Kirit Basu, Head of Strategy March 21, 2016

The Forward-Looking Threat Research team at Trend Micro were early adopters of StreamSets Data Collector. They use StreamSets to ingest data from a wide variety of sources to create a Threat Assessment Dashboard in Elasticsearch. In this interview, we talk with members of their team about how they evaluated StreamSets and implemented it in their production environment in a short period of time.

Visualizing Apache Log Data… with Minecraft!

By Pat Patterson March 18, 2016

A key differentiator of StreamSets Data Collector (SDC) is that it operates in continuous mode – set a pipeline running and it will continue to read files from a directory or take messages from a queue. A Twitter conversation with Richard Tuttle, a solution architect at CRM Science, prompted me to wonder: would it be possible to ingest Apache Web Server log data, look up the geolocation from the client IP address, and plot the results on a map… in Minecraft?

What’s the Biggest Lot in the City of San Francisco?

By Pat Patterson March 16, 2016

After building my first pipeline with StreamSets Data Collector, I wanted to give the framework more of a workout. I've spent a lot of time working with JSON data over the past few years, and the biggest, baddest JSON data set I…

Getting Started with StreamSets Data Collector

By Pat Patterson March 14, 2016

Hi, I’m Pat Patterson, newly minted ‘community champion’ here at StreamSets. As I get up to speed with big data in general and StreamSets Data Collector (SDC) in particular, I’ll write up my exploits here on the StreamSets blog to help other novices as they get started with open source big data ingest.

I’m going to assume you know the basics of what StreamSets Data Collector can do, and you want to get started actually using it. If you do need some background, the product page and FAQs are great places to start.

Now, let’s get hands on!

Announcing StreamSets Data Collector ver 1.2.2.0

By Kirit Basu, Head of Strategy March 11, 2016

We’re happy to announce a new version of the StreamSets Data Collector.

Building a Real-Time Retail Analytics Solution with StreamSets, MapR Streams and MapR FS

By Kirit Basu, Head of Strategy March 10, 2016

Today’s complex retail applications have changed dramatically, and in order to compete, enterprises must adopt new strategies for working with data. Big data and Hadoop enable retailers to connect with customers through multiple channels at new levels by leveraging traditional…

Binlog Processing Using Maxwell, Kafka & StreamSets

By Rick Bilodeau March 2, 2016

This is a nice example of Kafka enablement using Maxwell (a MySQL-to-Kafka binlog processor) and StreamSets Data Collector from the folks at B23. It includes a schema change listener for handling data drift. Enjoy! Innovate on Your Data - Maxwell…

StreamSets Data Integration Blog