Today, StreamSets announced the immediate availability of StreamSets Data Collector 3.4.0 and StreamSets Control Hub 3.3.0. These releases aim to deliver a better, more connected cloud experience for users of StreamSets Data Collector and a more streamlined user experience in StreamSets Control Hub. In addition to the highlights below, we have added support for popular new origins and destinations, which are outlined in the release notes.
Let’s take a quick look at some of the areas of development for this release.
StreamSets Data Collector 3.4.0
You can now create microservice pipelines, using the new REST Service origin and the new Send to Origin Response destination to build a free-standing microservice.
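As a rough illustration of how a client might talk to such a microservice, the sketch below posts a JSON record over HTTP and decodes the JSON reply. It assumes the pipeline accepts and returns JSON; the URL, the payload fields, and the helper names `build_request` and `call_microservice` are hypothetical and are not taken from the StreamSets documentation.

```python
import json
import urllib.request


def build_request(base_url: str, record: dict) -> urllib.request.Request:
    """Build a JSON POST request for a microservice pipeline endpoint.

    base_url is a placeholder: use the host, port, and path that your own
    REST Service origin is configured to listen on.
    """
    body = json.dumps(record).encode("utf-8")
    return urllib.request.Request(
        base_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_microservice(base_url: str, record: dict, timeout: float = 10.0) -> dict:
    """Send the record and decode the JSON body that comes back.

    Assumes the pipeline replies with a JSON document; adjust the decoding
    if your pipeline responds in another format.
    """
    request = build_request(base_url, record)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a pipeline running, a call might look like `call_microservice("http://localhost:8000/", {"id": 1})`, where the address is again only a placeholder.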
Support for EMR
StreamSets Data Collector can now use the cluster EMR batch mode to run on an Amazon EMR cluster and process data from Amazon S3. StreamSets Data Collector can run on an existing EMR cluster or on a new EMR cluster that is provisioned when the cluster pipeline starts. At the time of configuration you can choose whether the cluster remains active or terminates when the pipeline stops.
We have added support for change data capture (CDC) on PostgreSQL (version 9.4 and above). As adoption of PostgreSQL-based managed services grows, we have added a new origin to support these common cloud offerings.
Users must install a plug-in on PostgreSQL to use this origin. This is a common practice, as PostgreSQL does not have a built-in CDC plug-in. This feature works on Amazon RDS Postgres.
Support for reading Microsoft Excel files
Microsoft Excel continues to be one of the most widely used business intelligence tools, with broad adoption in the enterprise. You can now use the following file-based origins to process Microsoft Excel files: Amazon S3 origin, Directory origin, Google Cloud Storage origin, and SFTP/FTP Client origin.
We have included several more enhancements, among them better Cloudera Navigator integration and support for writing Parquet files without a Hadoop cluster. For a full list, please consult the release documentation.
StreamSets Control Hub 3.3.0
In Control Hub Pipeline Designer, pipeline fragments now support preview using a Test Origin. Users can now preview incomplete pipelines, so they can see what fields are included before finalizing a pipeline. During the preview process they can also terminate a job if it is taking too long. This is key when performing iterative development and testing of new pipelines. Our aim is to decrease the number of review and troubleshooting cycles before a new pipeline is promoted to production.
Collect CPU Load and Memory Metrics
StreamSets Control Hub now collects CPU Load and Memory metrics from both Data Collector and Data Collector Edge. These metrics are rolled up and displayed in Control Hub for both latest and time series data.
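To make the distinction between the "latest" and "time series" views concrete, here is a minimal sketch of one way metric samples could be rolled up. It is not Control Hub's implementation: the function name `roll_up`, the 60-second bucket size, and the use of a simple average per bucket are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean


def roll_up(samples, bucket_seconds=60):
    """Roll (timestamp_seconds, value) samples into two views.

    Returns a tuple (latest, series):
      latest: the value of the most recent sample, or None if there are none
      series: a list of (bucket_start, average_value) pairs, oldest first

    The bucket size and the averaging rule are illustrative assumptions.
    """
    if not samples:
        return None, []

    ordered = sorted(samples)  # sort by timestamp
    buckets = defaultdict(list)
    for timestamp, value in ordered:
        # Align each sample to the start of its time bucket.
        bucket_start = timestamp - (timestamp % bucket_seconds)
        buckets[bucket_start].append(value)

    series = [(start, mean(values)) for start, values in sorted(buckets.items())]
    latest = ordered[-1][1]
    return latest, series
```

For example, CPU load samples taken at 0, 30, and 60 seconds would produce two one-minute buckets in the series, while the latest view would show only the sample from second 60.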
StreamSets is launching its first product portal, an interactive site for submitting product and feature requests and tracking their progress. We care about what our customers need to fulfill their use cases, so your input is extremely valuable to us. The portal can also be embedded in a web page, the Data Collector UI, or the Control Hub UI via an iframe.
We have also enhanced the online experience of finding and accessing our documentation. You will notice a new look and feel. All of the previous documentation remains exactly where you expect it, but it is now easier to view and navigate on smaller devices like your tablet or mobile phone.
We will continue to enhance our DataOps platform to support the most in-demand origins and destinations, along with more cloud interoperability. Share your ideas with us in our community pages and new product portal.
Check out our Documentation page for doc highlights, what’s new, and tutorials: streamsets.com/docs
Or you can go straight to our latest documentation here: https://streamsets.com/documentation/datacollector/latest/help