StreamSets Data Integration Blog
Where change is welcome.
AWS Reference Architecture Guide for StreamSets
Using StreamSets DataOps Platform To Integrate Data from PostgreSQL to AWS S3 and Redshift: A Reference Architecture. This document describes…
May the 4th Be With You – Analyzing Star Wars Twitter Mentions in Minecraft
A couple of weeks ago, as May the 4th approached, a lively Star Wars debate brewed at StreamSets: “Do new-school characters get as much play as old favorites like Darth Vader, Yoda and Han Solo?” “Does the Dark Side of the Force dominate the Light?” “Does Yoda prevail over Darth Vader?” It occurred to us that, with the Twitter Streaming…
Ingest Salesforce Data for Analysis Using StreamSets
UPDATE - Salesforce origin and destination stages, as well as a destination for Salesforce Wave Analytics, were released in StreamSets Data Collector 2.2.0.0. Use the supported, shipping Salesforce stages rather than the unsupported code mentioned below! As I've mentioned a couple of times, my previous gig was as a developer evangelist at Salesforce, with particular focus on integration. A few weeks…
Ingesting JSON Data Into Apache Kudu with StreamSets Data Collector
At the Hadoop Summit in Dublin this week, Ted Malaska, Principal Solutions Architect at Cloudera, and I presented Ingest and Stream Processing - What Will You Choose?, looking at the big data streaming landscape with a focus on ingest. The session closed with a demo of StreamSets Data Collector, the open source graphical IDE for building ingest pipelines. In the…
Announcing Data Collector ver 1.3.0.0
This release brings a number of exciting new features and integrations. And as usual, we've fixed a number of bugs. Integrations: Want to send data to Amazon Redshift? Use the new Kinesis Firehose destination to do it. If you deal with a lot of unstructured data, here's a MongoDB destination you can use. Testing Kudu within your Hadoop environment?…
Data in Motion: Simplifying Security & Building Custom Integrations
At the Strata+Hadoop World conference last week, I met with Pratik Verma, Chief Product Officer at BlueTalon, a Bay Area startup focused on big data security. As Pratik and I were talking, he explained some of the problems that arise when organizations collect more and more data and need to start thinking about exactly who should have access to that data. There's a…