Dataflow Performance Blog

Announcing Data Collector ver 2.6.0.0

We are excited to announce version 2.6 of StreamSets Data Collector. This release adds functionality for customers modernizing their enterprise data warehouses on Hadoop, along with new capabilities for cybersecurity, IoT and Spark.

You can download the latest open source release here.

This release has 6 new features, 20 improvements and 72 bug fixes. For a full list, see What's New. For a list of bug fixes and known issues, see the Release Notes.

No more manually managing schemas

Our Data Drift Synchronization feature now supports automatically updating Hive schemas when writing to Parquet-backed tables. Customers can get the best of both worlds and use StreamSets Data Collector to write to both Avro and Parquet files at the same time, and automatically update the Hive metastore if the incoming schema of the data changes.
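To make the idea concrete, here is a minimal Python sketch of the kind of check that drift handling involves: compare an incoming record with the columns already known for a table, then build a Hive ALTER TABLE ... ADD COLUMNS statement for anything new. This is an illustration only, not how Data Collector implements the feature. The function names, the type mapping and the web_logs table are all assumptions made for the example.

```python
from typing import Dict, Optional

# Hypothetical mapping from Python value types to Hive column types.
# bool is listed explicitly because type(True) is bool, not int.
HIVE_TYPES = {bool: "BOOLEAN", int: "BIGINT", float: "DOUBLE", str: "STRING"}


def new_columns(known: Dict[str, str], record: dict) -> Dict[str, str]:
    """Return the fields in `record` that are missing from `known`, with a guessed Hive type."""
    added = {}
    for name, value in record.items():
        if name not in known:
            # Fall back to STRING for any type not in the mapping.
            added[name] = HIVE_TYPES.get(type(value), "STRING")
    return added


def alter_statement(table: str, added: Dict[str, str]) -> Optional[str]:
    """Build an ALTER TABLE statement for the new columns, or None if nothing changed."""
    if not added:
        return None
    cols = ", ".join(f"`{name}` {hive_type}" for name, hive_type in added.items())
    return f"ALTER TABLE {table} ADD COLUMNS ({cols})"


# Example: the table only knows `id`, and a record arrives with two extra fields.
known_schema = {"id": "BIGINT"}
incoming = {"id": 1, "name": "a", "active": True}
statement = alter_statement("web_logs", new_columns(known_schema, incoming))
```

In the product this detection and metastore update happen inside the pipeline, so no such statement has to be written by hand.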

Orchestrate beyond data flows

The Dataflow Triggers feature now supports sending a custom email in response to events detected within the pipeline. For example, the Email executor can send an email when a file has been written.

Also in this release is a new Shell executor. You can now execute arbitrary shell commands based on events occurring within the pipeline.

Speed up ingest for cybersecurity and IoT

We have added a new multithreaded TCP Server origin, which greatly speeds up ingesting TCP payloads such as Syslog and NetFlow. Customers typically use this feature for cybersecurity use cases such as high-throughput ingest for real-time fraud or anomaly detection.
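If you want to try a TCP listener like this with test traffic, a small client is enough. The Python sketch below builds simplified, newline-terminated syslog-style lines (the timestamp is omitted for brevity) and sends them over one TCP connection. The helper names, the host and the port in the usage comment are assumptions for illustration, and the origin's actual framing and port settings depend on how you configure it.

```python
import socket
from typing import List


def syslog_priority(facility: int, severity: int) -> int:
    """Syslog PRI value: facility * 8 + severity."""
    return facility * 8 + severity


def format_syslog(priority: int, host: str, app: str, message: str) -> bytes:
    """Build one simplified, newline-terminated syslog-style line (no timestamp)."""
    return f"<{priority}>{host} {app}: {message}\n".encode("utf-8")


def send_lines(host: str, port: int, lines: List[bytes]) -> int:
    """Open one TCP connection, send every line, and return the number of bytes sent."""
    payload = b"".join(lines)
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall(payload)
    return len(payload)


# Usage (hypothetical host and port; point these at your own listener):
#   pri = syslog_priority(4, 5)  # facility 4 (security/auth), severity 5 (notice)
#   send_lines("sdc.example.com", 9999, [format_syslog(pri, "fw01", "sshd", "login failed")])
```

Running several copies of a client like this in parallel is a simple way to see how a multithreaded listener behaves under concurrent connections.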

We’ve also added an origin and destination for the CoAP protocol. CoAP is a protocol with extremely low overhead designed for IoT and M2M devices that have very little memory and resources.
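To give a sense of how small that overhead is: per RFC 7252, every CoAP message starts with a fixed 4-byte header holding the version, message type, token length, code and message ID. The Python sketch below packs that header for a confirmable GET request. The function name and the example message ID are assumptions for illustration, and the sketch is independent of how the Data Collector CoAP stages are implemented.

```python
import struct

COAP_VERSION = 1   # the only version defined by RFC 7252
TYPE_CON = 0       # confirmable message
CODE_GET = 0x01    # request code 0.01 (GET)


def coap_header(msg_type: int, code: int, message_id: int, token_length: int = 0) -> bytes:
    """Pack the fixed 4-byte CoAP header: Ver (2 bits), Type (2 bits), TKL (4 bits), Code, Message ID."""
    if not 0 <= msg_type <= 3:
        raise ValueError("message type must fit in 2 bits")
    if not 0 <= token_length <= 8:
        raise ValueError("token length must be between 0 and 8")
    first_byte = (COAP_VERSION << 6) | (msg_type << 4) | token_length
    # Network byte order: one byte, one byte, one unsigned 16-bit integer.
    return struct.pack("!BBH", first_byte, code, message_id)


header = coap_header(TYPE_CON, CODE_GET, 0x1234)
```

Here header is four bytes long; any token, options and payload follow it in the message.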

Support for Spark 2.0

You can now use StreamSets Data Collector with Spark 2.x in three ways: through the Spark processor, through the Spark executor, or by running it on a Hadoop cluster as a Spark Streaming application.

Improve time to support

SDC has a new one-click option to generate a support bundle zip file containing logs, environment variables, configs, pipelines and resource files. This helps customers quickly provide all relevant information about their environment to our support team when trying to debug problems.

Kirit Basu