Dataflow Performance Blog

Announcing StreamSets Data Collector version 3.0

Version 3.0 marks an important new milestone for StreamSets. With close to a million downloads and a strong community and customer base, we are very excited to offer a host of powerful new capabilities within the product. This release has greater connectivity with cloud services, deeper integration with Hadoop distributions, new data aggregations, and an exciting new technology for running pipelines on resource-constrained devices.

For those keeping count, SDC 3.0 has 27 new features, close to 100 improvements and almost 200 bug fixes.

This release also extends the reach of SDC to devices on the edge. SDC Edge is a lightweight agent that can execute pipelines designed in SDC. These agents can run on Windows, Linux, Mac, Android and iOS. To learn more about SDC Edge, follow this link.

Please feel free to check the in-depth release notes and documentation, and download the software now.

Here’s what’s new in StreamSets Data Collector 3.0:

  • We continue to expand our support for cloud services. In this release we’ve added an AWS SQS origin and a Google Cloud Storage origin and destination.
  • We’ve worked closely with the MapR team and now support reading CDC messages from MapR-DB. We’ve also added a new multitopic MapR Streams origin that can spawn multiple threads to read multiple topics at once.
  • Several origins can now be scaled up. For example, the UDP Multithreaded origin can read high-volume network and syslog messages, and the Directory origin can now read multiple files simultaneously.
  • The Kafka multithreaded origin has been updated to support major new releases of Kafka, both from Apache and from the Hadoop distributions.
  • As SDC continues to gain popularity with the cybersecurity community, we’ve been adding several network-oriented capabilities. We’ve now added support for parsing NetFlow 9 messages. Support for IPFIX is on the roadmap; let us know which other network protocols you’d like to see supported.
  • SDC now supports a WebSocket Client origin that can read data from a WebSocket Server endpoint.
  • With the growing popularity of GPU databases, we now support writing data to a KineticaDB destination.
  • We’ve made improvements to the Oracle CDC Client origin, which is now up to 20% faster.
  • A very popular ask from customers was the ability to aggregate data, and we’re happy to report that Data Collector now includes an Aggregator processor. This brand-new feature lets you aggregate data that arrives within a window of time.
  • Continuing our commitment to the open source community, SDC now supports running on top of OpenJDK.
  • Enterprises using CyberArk as their credential store can now use the credential expression language to securely connect to target systems.
  • As a part of the StreamSets commercial subscription, we now offer support for writing lineage data to Cloudera Navigator and Apache Atlas.
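
To give a flavor of what aggregating data "within a window of time" means, here is a minimal Python sketch of a fixed (tumbling) window count. It is only an illustration of the general idea, not how the Aggregator processor is implemented or configured; the function name, the `(timestamp, key)` record shape and the sample values are all hypothetical.

```python
from collections import defaultdict


def tumbling_window_counts(records, window_seconds):
    """Count records per key within fixed, non-overlapping time windows.

    `records` is an iterable of (timestamp_in_seconds, key) pairs.
    This record shape is an assumption made for the example only.
    """
    windows = defaultdict(lambda: defaultdict(int))
    for timestamp, key in records:
        # Map each timestamp to the start of the window it falls into.
        window_start = int(timestamp // window_seconds) * window_seconds
        windows[window_start][key] += 1
    # Return plain dicts, ordered by window start time.
    return {start: dict(counts) for start, counts in sorted(windows.items())}


# Hypothetical sample data: five messages spread over two 10-second windows.
records = [
    (0.5, "syslog"),
    (3.2, "netflow"),
    (9.9, "syslog"),
    (10.0, "syslog"),
    (14.1, "netflow"),
]
result = tumbling_window_counts(records, 10)
print(result)
```

Each record is assigned to exactly one window based on its timestamp, and the counts are kept per window, so the first three sample messages are summarized separately from the last two.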

We trust you’ll agree that version 3.0 adds a strong set of capabilities to SDC. If you haven’t already done so, download SDC now and see for yourself. To learn more about any of the features referenced in this blog, contact Sales.

Kirit Basu