The DataOps Blog

StreamSets Announces Control Hub Version 3.4.0 and StreamSets Data Collector Version 3.5.0

Posted in StreamSets News, October 2, 2018

StreamSets is excited to announce the immediate availability of Control Hub version 3.4.0 and StreamSets Data Collector version 3.5.0.

StreamSets Data Collector is the platform’s open source execution engine. It moves data between any source and destination, performing transformations and pushdown analytics along the way.

StreamSets Control Hub offers a hosted environment for collaborative design and deployment of dataflows. It includes a pipeline repository for easy dataflow reuse and tight control over the production process. It displays a live data map with end-to-end metrics, and it can automatically scale dataflows on-premises or across multiple clouds.

Together, Control Hub 3.4.0 and Data Collector 3.5.0 bring notable enhancements for data protection, governance, guided pipeline completion, and pipeline validation. For the full list of enhancements, please visit our documentation. Let’s take a quick look at what’s new.

StreamSets Control Hub 3.4.0

Pipeline Completion

StreamSets Pipeline Designer now offers expression completion in stage and pipeline properties, listing the data types, runtime parameters, fields, and functions that you can use. Pipeline Designer also now manages pipeline and pipeline fragment versions: when configuring a pipeline or pipeline fragment, you can view a visualization of its version history.

When you expand the version history, you can manage the pipeline or fragment versions including comparing versions, creating tags for versions, and deleting versions. You can also expand and collapse individual pipeline fragments when used in a pipeline. Previously, expanding a fragment meant that all fragments in the pipeline were expanded.

Pipeline Validation

StreamSets users can now use Pipeline Designer to preview and validate edge pipelines. This is critical for designing and testing dataflows that originate at remote sensors or applications: developers can now validate and test these pipelines before deployment to prevent failures in production.

For more information about Control Hub version 3.4.0, please visit our documentation.

StreamSets Data Collector 3.5.0

Technology Preview Functionality

Data Collector now includes certain new features and stages with the Technology Preview designation. Technology Preview functionality is available for use in development and testing but is not yet recommended for production. Technology Preview stages display a Technology Preview icon in the upper-left corner of the stage.

Data Format Enhancements

When reading delimited data that contains headers with empty values, Data Collector now replaces the empty values with the string “empty-” plus the column number starting from zero. When reading Excel data, Data Collector now processes the underlying raw values for numeric columns in a spreadsheet, rather than the displayed values.
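To make the header rule concrete, here is a minimal Python sketch of the renaming behavior as described above. It is an illustration only, not Data Collector code: the `normalize_headers` function is hypothetical, and it assumes that only zero-length header cells count as empty (the post does not say how whitespace-only headers are handled).

```python
import csv
import io


def normalize_headers(headers: list[str]) -> list[str]:
    """Replace each empty header cell with 'empty-' plus its zero-based column index.

    Hypothetical sketch of the documented behavior; only zero-length strings
    are treated as empty here.
    """
    return [h if h != "" else f"empty-{i}" for i, h in enumerate(headers)]


# Sample delimited data whose header row has empty values in columns 1 and 3.
data = "id,,name,\n1,a,b,c\n"
reader = csv.reader(io.StringIO(data))
headers = normalize_headers(next(reader))
# Pair each data row with the normalized header names.
rows = [dict(zip(headers, row)) for row in reader]
print(headers)  # ['id', 'empty-1', 'name', 'empty-3']
```

Because the generated names are derived from column positions, every column keeps a unique, addressable field name even when the source file leaves several headers blank.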

Microservice Pipelines

The WebSocket Client and WebSocket Server origins can now send responses back to the originating REST API client when used with destinations that send records to the origin in the same microservice pipeline. The HTTP Client, Kafka Producer, and Kinesis Producer destinations can now send records to the origin in a microservice pipeline.

Data Governance Integration

Data Collector can now publish metadata to data governance tools for the following stages: Amazon S3 origin, Kafka Multitopic Consumer origin, SFTP/FTP Client origin, and Kafka Producer destination.

Data Collector can also publish metadata to Cloudera Navigator running on Cloudera Manager versions 5.10 to 5.15. If Cloudera Navigator is configured for TLS/SSL, Data Collector requires a local truststore file to verify the identity of the Cloudera Navigator Metadata Server.

For more information about StreamSets Data Collector version 3.5.0, please visit our documentation.

Data Protector

Rules and Policies

StreamSets Data Protector performs global in-stream discovery and protection of data in motion. Data Protector provides StreamSets classification rules and enables creating custom classification rules to identify sensitive data. Custom protection policies provide rules-based data protection for every job that you run. You can also use Data Protector stages in pipelines for localized protection needs.
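The split between classification rules (which identify sensitive data) and protection policies (which decide what to do with it) can be illustrated with a small Python sketch. This is not the Data Protector API or its rule format: the rule names, regular expressions, and the `classify` and `protect_record` functions are hypothetical, and simple regex matching stands in for whatever classification Data Protector actually performs.

```python
import re
from typing import Callable, Dict, List, Optional

# Hypothetical classification rules: category name -> pattern that identifies it.
CLASSIFICATION_RULES: Dict[str, re.Pattern] = {
    "US_SSN": re.compile(r"\d{3}-\d{2}-\d{4}"),
    "EMAIL": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
}


def mask_all_but_last4(value: str) -> str:
    """Replace every character except the last four with '*'."""
    return "*" * max(len(value) - 4, 0) + value[-4:]


def redact(value: str) -> str:
    """Replace the whole value with a fixed placeholder."""
    return "[REDACTED]"


# Hypothetical protection policy: category -> protection action.
PROTECTION_POLICY: Dict[str, Callable[[str], str]] = {
    "US_SSN": mask_all_but_last4,
    "EMAIL": redact,
}


def classify(value: str) -> List[str]:
    """Return every category whose pattern matches the entire value."""
    return [cat for cat, pattern in CLASSIFICATION_RULES.items() if pattern.fullmatch(value)]


def protect_record(record: Dict[str, str]) -> Dict[str, str]:
    """Apply the policy action of the first matching category; pass other values through."""
    protected: Dict[str, str] = {}
    for field, value in record.items():
        action: Optional[Callable[[str], str]] = next(
            (PROTECTION_POLICY[c] for c in classify(value) if c in PROTECTION_POLICY), None
        )
        protected[field] = action(value) if action else value
    return protected


record = {"name": "Ada", "ssn": "123-45-6789", "email": "ada@example.com"}
print(protect_record(record))
# {'name': 'Ada', 'ssn': '*******6789', 'email': '[REDACTED]'}
```

Keeping the rules and the policy in separate tables mirrors the separation the post describes: the same classification rules can serve several policies, and a policy can change its protection action without touching how data is identified.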

Data Protector is available as an add-on option with a StreamSets Enterprise subscription. For more information, contact us.