  • Ultralight Data Movement for IoT with SDC Edge

    May 25, 2018

    Edge computing and the Internet of Things bring great promise, but often just getting data from the edge requires moving mountains. Let's learn how to make edge data ingestion and analytics easier using StreamSets Data Collector Edge, an ultralight, platform-independent, small-footprint open source solution written in Go for streaming data from resource-constrained sensors and personal devices (like medical equipment or smartphones) to Apache Kafka, Amazon Kinesis and many others. This talk includes an overview of the main SDC Edge features, supported protocols and available processors for data transformation; insights on how it solves some challenges of traditional approaches to data ingestion; pipeline design basics; a walk-through of some practical applications (Android devices and Raspberry Pi); and its integration with other technologies such as StreamSets Data Collector, Apache Kafka, Apache Hadoop, InfluxDB and Grafana. The goal here is to make attendees ready to quickly become IoT data intake and SDC Edge Ninjas.

    Speaker: Guglielmo Iozzia, Big Data Delivery Manager, Optum

  • How to make API requests to StreamSets Control Hub managed Data Collectors

    May 25, 2018

    In order to make API requests from a Control Hub managed Data Collector, it’s necessary to get a cookie from Control Hub and use that cookie to make requests from Data Collector instances.

  • RingCentral Scales Out with StreamSets

    May 18, 2018

  • StreamSets and Cloudera IoT Demo

    May 4, 2018

    See a demonstration of flowing IoT sensor data into HDFS for analysis using Impala and Hive while getting fine-grained metrics, anomaly detection and alerts.

  • Getting Started with StreamSets Data Collector Edge

    May 4, 2018

    Learn how to use StreamSets Control Hub to create a dataflow pipeline for StreamSets Data Collector Edge. More details at

  • Cloudera + StreamSets Demo

    April 10, 2018

    Watch this demo to see how to deploy StreamSets Data Collector using Cloudera Manager and then build, test, and execute pipelines into your Enterprise Data Hub. The video shows off features such as Data Drift Handling, pipeline preview and live metrics.

  • Introduction to StreamSets Data Protector (Discover & Secure Data in Motion)

    March 16, 2018

    Enjoy this short introduction to how StreamSets Data Protector discovers personal data as it is ingested and then automatically protects it via masking, hashing, scrambling and other obfuscation methods. The solution includes policy management to help you standardize policies by governance zone and comply with regulations on an enterprise-wide basis.

  • MapR StreamSets Factory IoT Demo

    March 14, 2018

    See a live demo of ingestion of IoT data from a factory into MapR including visualization.

  • StreamSets and GSK – Freeing Data for Drug Discovery

    March 2, 2018

    Enjoy this short video where GSK data executives discuss how they have consolidated all of their R&D data into a single location to improve drug discovery. Their architecture includes over a million dataflow pipelines created using StreamSets.

  • SDC Edge Demo (Ultra Lightweight Data Ingestion for IoT)

    December 20, 2017

    Open source SDC Edge provides ultra-lightweight ingestion for memory-, CPU- and connectivity-constrained devices and sensors such as those used in IoT applications. This demo walks you through a simple use case with sensor-enabled Raspberry Pis.

  • Create Apache Kafka Pipelines in Minutes

    December 3, 2017

    Ian Wrigley, Technology Evangelist at StreamSets, walks you through how to create and run an Apache Kafka pipeline that reads, enriches and writes data, all without requiring a line of code.

  • StreamSets Apache Sqoop Importer Demo

    October 26, 2017

    This is a short demonstration of a free tool that converts a Sqoop command line into a StreamSets Data Collector pipeline, allowing drag-and-drop manipulation plus runtime metrics and monitoring. The tool can be installed with the pip command “pip3 install StreamSets”.

  • Welcome to ‘Ask StreamSets’

    September 6, 2017

    ‘Ask StreamSets’ is the new question & answer site for the StreamSets community. This short video explains how to register, search, ask and answer questions.

  • Cache Salesforce Data in Redis with StreamSets Data Collector

    June 26, 2017

    A look at how the open source Jedis library provides a small, sane, easy-to-use Java interface to Redis, and how a StreamSets Data Collector (SDC) pipeline can read data from a platform such as Salesforce, write the data to Redis via Jedis, and keep Redis up to date by subscribing to notifications of changes in Salesforce and writing new and updated data to Redis.

  • Triggering Databricks Notebook Jobs from StreamSets Data Collector

    June 20, 2017

    Learn how StreamSets Data Collector can write data to Amazon S3, triggering Databricks notebook jobs to run as S3 objects are written.

  • StreamSets Multitenant Support

    March 9, 2017

    See how StreamSets Data Collector and DPM let you create multitenant environments that isolate workflows and protect data integrity.

  • Drift Synchronization with StreamSets Data Collector and Azure Data Lake

    March 2, 2017

    StreamSets Data Collector can read data from a variety of sources, including relational databases, detect data drift, and automatically reconcile schema changes into destinations such as Hive on Azure Data Lake and HDInsight.

  • Bryan Duxbury, StreamSets – Spark Summit East 2017

    February 9, 2017

    Bryan Duxbury, Vice President of Engineering at StreamSets, sits down with Dave Vellante & George Gilbert at Spark Summit East 2017 at the Hynes Convention Center in Boston, Massachusetts.

  • Replicating Relational Databases with StreamSets Data Collector

    February 3, 2017

    The JDBC Multitable Consumer in StreamSets Data Collector allows you to ingest an entire database with a single pipeline. This video shows how to replicate a MySQL database in Apache Hive.

  • Ingesting Data into Apache Kudu Using StreamSets Data Collector

    January 30, 2017

    This demo walks you through a Customer 360 use case ingesting batch and streaming data into Apache Kudu.

  • How to Use Dataflow Triggers

    January 12, 2017

    In 2 minutes, learn how to trigger tasks in external systems based on events that occur in a StreamSets Data Collector pipeline.

  • Ingest in 60 Seconds – Microsoft Azure Data Lake

    December 12, 2016

    See how easy it is to build a data pipeline into Microsoft Azure Data Lake in this short video demonstration.

  • Building Data Pipelines with Apache Spark and StreamSets

    October 27, 2016

    Big data tools such as Hadoop and Spark allow you to process data at unprecedented scale, but keeping your processing engine fed can be a challenge. Metadata in upstream sources can ‘drift’ due to infrastructure, OS and application changes, causing ETL tools and hand-coded solutions to fail. In this session we’ll look at how SDC’s “intent-driven” approach keeps the data flowing, with a particular focus on clustered deployment with Spark and other exciting Spark integrations in the works.


  • Why StreamSets? StreamSets Company Overview

    October 21, 2016

    Learn about StreamSets, its mission, heritage, products and value, in the words of its founders and employees.

  • Ingesting Drifting Data into Hive and Impala

    October 20, 2016

    This video guides you through configuring StreamSets Data Collector to ingest data from MySQL to Apache Hive running on any of the Apache, Cloudera, MapR or Hortonworks distributions.


  • Demonstration of StreamSets DPM

    October 8, 2016

    Enjoy this walk-through of the new StreamSets Dataflow Performance Manager featuring StreamSets CTO Arvind Prabhakar and Director of Product Management Kirit Basu.

  • Connected Car (IoT) Demo with Cloudera

    September 26, 2016

    Watch this demo to see StreamSets Data Collector ingest IoT data from cars into Kafka and Solr, including enforcing business rules, handling data drift and scripting custom transformations.

  • Adaptive Data Cleansing with StreamSets and Cassandra

    September 8, 2016

    Cassandra is a perfect fit for consuming high volumes of time-series data directly from users, devices, and sensors. Sometimes, though, when we consume data from the real world, systematic and random errors creep in. In this session, we'll see how to use open source tools like RabbitMQ and StreamSets Data Collector with Cassandra features such as User Defined Aggregates to collect, cleanse and ingest variable quality data at scale. Discover how to combine the power of Cassandra with the flexibility of StreamSets to implement adaptive data cleansing.

  • Continuous IoT Ingestion Using StreamSets & Cloudera

    June 10, 2016

    See how to use StreamSets Data Collector to easily ingest IoT data into a Cloudera cluster in the face of data drift from changing versions of sensors in the field.

  • 2016 Star Wars Tweet Off

    May 4, 2016

    Watch Yoda, Darth Vader and other Star Wars favorites grow out of the sand of Tatooine in Minecraft as their Twitter mentions rise.

  • How We Did It: 2016 Star Wars Tweet Off

    May 4, 2016

    How we used StreamSets Data Collector to ingest and analyze #StarWars tweets for display in Minecraft.

  • Ingest and Stream Processing: What will you choose?

    April 13, 2016

    StreamSets + Cloudera presentation

  • Integrating StreamSets with Salesforce Wave Analytics

    April 2, 2016

    Uploading data to Wave Analytics from StreamSets Data Collector.

  • Visualizing Apache Log Data… with StreamSets & Minecraft!

    March 24, 2016

    StreamSets Data Collector is the perfect tool for streaming data from Apache logs, processing it, and sending it to destinations such as Kafka. This video presents a fun visualization of the geographic origin of the requests recorded in those logs.

  • Simple Kafka Enablement Using StreamSets

    January 25, 2016

    StreamSets Data Collector makes it very easy to create Kafka Producers and Consumers without writing a single line of code.

  • Logfiles into Elastic with StreamSets Data Collector

    January 25, 2016

  • Installing StreamSets Using Cloudera Parcels

    December 2, 2015

    This short clip shows you how to install StreamSets using the CSD and Parcels features of Cloudera Manager.

  • Hands on: Open Source StreamSets Tackles Data Drift

    October 27, 2015

    Watch StreamSets Field Engineer Jonathan “Natty” Natkins demonstrate how you can use the open source StreamSets Data Collector to flexibly handle “data drift”: the inevitable and painful evolution of infrastructure, semantics and schema that can corrupt data and break pipelines.

  • Introduction to StreamSets Data Collector

    October 7, 2015

    StreamSets is an open source, continuous big data ingest infrastructure.

  • StreamSets Data Collector: Getting Started

    September 14, 2015

  • Introduction to StreamSets

    September 13, 2015
