Dataflow Performance Blog

Announcing StreamSets Data Collector ver 2.4.0.0

We are happy to announce that the newest version of StreamSets Data Collector is available for download. This short release has over 25 new features and improvements and over 50 bug fixes. It is an enterprise-focused release that addresses the needs of some of the world's largest organizations using StreamSets. Below is a short list of what's new; please check out the release notes for more details.

Multi-tenancy

To better enforce security standards for your data operations, we've introduced a set of features to enable multi-tenancy, including access control lists and support for groups, within both StreamSets Data Collector and StreamSets Dataflow Performance Manager (DPM), our operations management environment. Enterprises can use this functionality to restrict access to pipelines, jobs or topologies to specific groups of users.

Setting up groups and access control lists in StreamSets Data Collector is easy and seamless, as it is integrated with the process of registering pipelines within DPM.

Support for Cloudera's Apache Kafka 2.1, CDH 5.10, and Kudu 1.x

StreamSets Data Collector now supports the latest versions of Kudu and the Cloudera distributions.

UI for installing external libraries

You no longer have to dig through config and properties files to install database drivers. You can now install external libraries, such as database drivers or external Java libraries for the language processors, directly through the StreamSets Data Collector user interface. If you are automating installation through scripting, you can do the same using a REST API. In addition, when you run pipelines on a YARN cluster, the system automatically copies all the requisite resource files to the nodes on the cluster; you don't have to do this manually.

Sending metrics to DPM without a message queue

If you want to send metric data into StreamSets Dataflow Performance Manager to enable long-term statistics monitoring of your pipelines, you no longer need to use a message queue. You can use our built-in RPC stages to send this data. This reduces the overhead for getting started with DPM.

In environments where you may have network outages and cannot afford to lose any metrics, you may still want to implement a message queue.

NOTE: Please upgrade to Java 8

If you are not already using Java 8, please plan to upgrade ASAP. With the next major release, 2.5.0.0, SDC will no longer run on Java 7.

Please be sure to check out the Release Notes for detailed information about this release, and download the Data Collector now.

Kirit Basu