SDC Edge Demo (ultra-lightweight data ingestion for IoT)

December 20, 2017

Open source SDC Edge provides ultra-lightweight data ingestion for memory-, CPU- and connectivity-constrained devices and sensors, such as those used in IoT applications. This demo walks you through a simple use case with sensor-enabled Raspberry Pis.

Create Apache Kafka Pipelines in Minutes

December 3, 2017

Ian Wrigley, Technology Evangelist at StreamSets, walks you through how to create and run an Apache Kafka pipeline that reads, enriches and writes data, all without requiring a line of code.

StreamSets Apache Sqoop Importer Demo

October 26, 2017

This is a short demonstration of a free tool that converts a Sqoop command line into a StreamSets Data Collector pipeline, allowing drag-and-drop manipulation along with runtime metrics and monitoring. The tool can be installed with the Python command “pip3 install StreamSets”.

Welcome to ‘Ask StreamSets’

September 6, 2017

‘Ask StreamSets’ is the new question-and-answer site for the StreamSets community. This short video explains how to register, search, ask and answer questions.

Cache Salesforce Data in Redis with StreamSets Data Collector

June 26, 2017

A look at how the open source Jedis library provides a small, sane, easy-to-use Java interface to Redis, and how a StreamSets Data Collector (SDC) pipeline can read data from a platform such as Salesforce, write it to Redis via Jedis, and keep Redis up to date by subscribing to Salesforce change notifications and writing new and updated data to Redis.

Triggering Databricks Notebook Jobs from StreamSets Data Collector

June 20, 2017

Learn how StreamSets Data Collector can write data to Amazon S3, triggering Databricks notebook jobs to run as S3 objects are written.

StreamSets Multitenant Support

March 9, 2017

See how StreamSets Data Collector and StreamSets Dataflow Performance Manager (DPM) let you create multitenant environments that isolate workflows and protect data integrity.

Drift Synchronization with StreamSets Data Collector and Azure Data Lake

March 2, 2017

StreamSets Data Collector can read data from a variety of sources, including relational databases, detect data drift, and automatically reconcile schema changes into destinations such as Hive on Azure Data Lake and HDInsight.

Bryan Duxbury, StreamSets – Spark Summit East 2017

February 9, 2017

Bryan Duxbury, Vice President of Engineering at StreamSets, sits down with Dave Vellante & George Gilbert at Spark Summit East 2017 at the Hynes Convention Center in Boston, Massachusetts.

Replicating Relational Databases with StreamSets Data Collector

Friday, February 3, 2017

The JDBC Multitable Consumer in StreamSets Data Collector allows you to ingest an entire database with a single pipeline. This video shows how to replicate a MySQL database in Apache Hive.

Ingesting Data into Apache Kudu Using StreamSets Data Collector

Monday, January 30, 2017

This demo walks you through a Customer 360 use case that ingests both batch and streaming data into Apache Kudu.

How to Use Dataflow Triggers

Thursday, January 12, 2017

In 2 minutes, learn how to trigger tasks in external systems based on events that occur in a StreamSets Data Collector pipeline.

Ingest in 60 Seconds – Microsoft Azure Data Lake

Monday, December 12, 2016

See how easy it is to build a data pipeline into Microsoft Azure Data Lake in this short video demonstration.

Building Data Pipelines with Apache Spark and StreamSets

Thursday, October 27, 2016

Big data tools such as Hadoop and Spark allow you to process data at unprecedented scale, but keeping your processing engine fed can be a challenge. Metadata in upstream sources can ‘drift’ due to infrastructure, OS and application changes, causing ETL tools and hand-coded solutions to fail. In this session we’ll look at how SDC’s “intent-driven” approach keeps the data flowing, with a particular focus on clustered deployment with Spark and other exciting Spark integrations in the works.

Adaptive Data Cleansing with StreamSets and Cassandra

Thursday, September 8, 2016

Cassandra is a perfect fit for consuming high volumes of time-series data directly from users, devices, and sensors. Sometimes, though, when we consume data from the real world, systematic and random errors creep in. In this session, we’ll see how to use open source tools like RabbitMQ and StreamSets Data Collector with Cassandra features such as User-Defined Aggregates to collect, cleanse and ingest variable-quality data at scale. Discover how to combine the power of Cassandra with the flexibility of StreamSets to implement adaptive data cleansing.


Why StreamSets?

Friday, October 21, 2016

Learn about StreamSets, its mission, heritage, products and value, in the words of its founders and employees.

Ingesting Drifting Data into Hive and Impala

Thursday, October 20, 2016

This video guides you through configuring StreamSets Data Collector to ingest data from MySQL into Apache Hive running on any of the Apache, Cloudera, MapR or Hortonworks distributions.


Demonstration of StreamSets DPM

Monday, October 8, 2016

Enjoy this walkthrough of the new StreamSets Dataflow Performance Manager, featuring StreamSets CTO Arvind Prabhakar and Director of Product Management Kirit Basu.

Connected Car (IoT) Demo with Cloudera

Monday, September 26, 2016

Watch this demo to see StreamSets Data Collector ingest IoT data from cars into Kafka and Solr, including enforcing business rules, handling data drift and scripting custom transformations.

Continuous IoT Ingestion Using StreamSets & Cloudera

Monday, June 27, 2016

See how to use StreamSets Data Collector to easily ingest IoT data into a Cloudera cluster in the face of data drift from changing versions of sensors in the field.

2016 Star Wars Tweet Off

Wednesday, May 4, 2016

Watch Yoda, Darth Vader and other Star Wars favorites grow out of the sand of Tatooine in Minecraft as their Twitter mentions rise.

How We Did It: 2016 Star Wars Tweet Off

Wednesday, May 4, 2016

How we used StreamSets Data Collector to ingest and analyze #StarWars tweets for display in Minecraft.

Ingest and Stream Processing: What will you choose?

Wednesday, April 13, 2016

A StreamSets and Cloudera presentation.

Integrating StreamSets with Salesforce Wave Analytics

Saturday, April 2, 2016

Uploading data to Wave Analytics from StreamSets Data Collector.

Visualizing Apache Log Data… with StreamSets & Minecraft!

Thursday, March 24, 2016

StreamSets Data Collector is the perfect tool for streaming data from Apache logs, processing it, and sending it to destinations such as Kafka. This video presents a fun visualization of the geographic origin of those requests.

Simple Kafka Enablement Using StreamSets

Monday, January 25, 2016

StreamSets Data Collector makes it very easy to create Kafka Producers and Consumers without writing a single line of code.

Installing StreamSets Using Cloudera Parcels

December 2, 2015

This short clip shows you how to install StreamSets using the CSD and Parcels features of Cloudera Manager.

Hands on: Open Source StreamSets Tackles Data Drift

October 27, 2015

Watch StreamSets Field Engineer Jonathan “Natty” Natkins demonstrate how you can use the open source StreamSets Data Collector to flexibly handle “data drift”: the inevitable and painful evolution of infrastructure, semantics and schema that can corrupt data and break pipelines.

Introduction to StreamSets Data Collector

October 7, 2015

StreamSets is an open source, continuous big data ingest infrastructure.

Introduction to StreamSets

September 2015

Getting Started

September 2015

Shona Davidson | Videos