
The DataOps Blog

Where Change Is Welcome

Announcing the Latest Data Collector Release

Posted in StreamSets News, December 2, 2016

And here it is, folks: the last release of 2016 – a new version of StreamSets Data Collector. We’ve put in a host of important new features and resolved more than 120 bugs.

We’re gearing up for a solid roadmap in 2017, enabling exciting new use cases and bringing in some great contributions from customers and our community.

Please take it out for a spin and let us know what you think. Without further ado, here are some of the top features in this release.

Origins and Destinations

  • Support for writing to Azure Data Lake Store
  • Support for writing to Google Cloud Bigtable
  • You can now read from and write to Salesforce, and also write to Wave Analytics
  • Ability to do Change Data Capture from MySQL. (Thanks to WgStreamSets for the contribution!)
  • The Kudu destination now supports various write operations: Insert, Update, Delete, and Upsert


Processors

  • Support for executing Spark jobs within the pipeline. As you develop applications in Spark, you no longer have to worry about writing plumbing code to read and write data from a multitude of origins and destinations. Just write your Spark code in Java or Scala and drop the jar file into the pipeline; the Spark Evaluator processor takes care of converting SDC data formats to RDDs and reading them back out again.

We currently support Spark in local mode; support for running in cluster mode is coming next.
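To make that concrete, a custom transformer for the Spark Evaluator might look roughly like the sketch below. Note that the class, method, and type names here are illustrative assumptions based on the description above, not a verbatim copy of the SDC API:

```java
// Illustrative sketch only – interface names are assumed, not exact SDC types.
public class SampleTransformer extends SparkTransformer {
  @Override
  public TransformResult transform(JavaRDD<Record> records) {
    // The Spark Evaluator hands pipeline records to this method as an RDD;
    // apply ordinary Spark transformations, then return the result so the
    // processor can convert it back into SDC records for downstream stages.
    JavaRDD<Record> transformed = records.filter(record -> record != null);
    return new TransformResult(transformed);
  }
}
```

You would compile this class into a jar, add the jar to the pipeline, and point the Spark Evaluator at the class name.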

Event Framework

  • We have a new framework for post-processing tasks. You can use this to do things like run a MapReduce job after writing a file to Hadoop, refresh Hive or Impala table statistics after depositing new data, and almost anything else you can imagine. We currently support the following executors:
    • MapReduce executor – Write your custom MapReduce job or use the built-in Avro-to-Parquet generator.
    • Hive Query executor and JDBC Query executor – Run custom queries on Hive/Impala or relational databases.
    • HDFS File Metadata executor – Change file metadata such as name, location, permissions, ACLs, etc.
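For example, after a destination closes a file you might configure the Hive Query executor to run statements like the following (the table name here is a placeholder, not from this release’s docs):

```sql
-- Make newly written files visible to Impala
-- (my_db.my_table is a placeholder for your own table)
REFRESH my_db.my_table;

-- Or recompute table statistics in Hive after depositing new data
ANALYZE TABLE my_db.my_table COMPUTE STATISTICS;
```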

Here’s a more complete description of all the capabilities of this system.

Soon we will be adding support for triggering REST API calls and executing Spark jobs – if you’d like to see another executor, let us know.

Other Changes

  • New Expression Language (EL) functions to get information about files and directories, retrieve pipeline information, and work with datetime fields.
  • We’ve cleaned up the data format options in the UI – they are now consolidated within a single tab for all origins and destinations.
  • The Whole File transfer option can now generate and verify checksums of the transferred files.
  • LDAP Authentication is now possible across multiple directory servers.
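Conceptually, a whole-file checksum is just a hash over the file’s entire contents, which a downstream consumer can recompute to verify the transfer. A minimal standalone sketch (file name and algorithm choice here are illustrative, not SDC’s own code):

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;

public class FileChecksum {
    // Compute an MD5 checksum over a whole file, similar in spirit to the
    // checksum the Whole File data format can generate for transferred files.
    static String md5(Path path) throws Exception {
        byte[] digest = MessageDigest.getInstance("MD5").digest(Files.readAllBytes(path));
        StringBuilder hex = new StringBuilder();
        for (byte b : digest) hex.append(String.format("%02x", b));
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        Path tmp = Files.createTempFile("sdc-demo", ".txt");
        Files.write(tmp, "hello".getBytes());
        System.out.println(md5(tmp)); // md5 of "hello" is 5d41402abc4b2a76b9719d911017c592
        Files.delete(tmp);
    }
}
```

Comparing the checksum computed at the origin with one computed at the destination confirms the file arrived intact.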

Download the Data Collector to get started now, and visit the Documentation for more details.

