What's New

What's New in 3.16.0

StreamSets Control Hub version 3.16.0 includes the following new features and enhancements:

Pipelines and Pipeline Fragments
  • A microservice pipeline template is now available when creating a Data Collector pipeline from a template.
  • Pipeline Designer can now use field information from data preview in the following ways:
    • Some field properties have a Select Fields Using Preview Data icon that you can use to select fields from the last data preview.
    • As you type a configuration value, the list of valid values includes fields from the input and output schema extracted from the preview.
    • Fields in the Schema tab and in the data preview have a Copy Field Path to Clipboard icon that you can use to copy a field path, which you can then paste where needed.
  • The Pipelines view and the Pipeline Fragments view now display long lists over multiple pages.
  • The Pipelines view and the Pipeline Fragments view now offer additional support for filters:
    • Click the Keep Filter Persistent checkbox to retain the last applied filter when you return to the view.
    • Save or share the URL to reopen the view with the applied filter later or in a different browser.
Snapshots
You can now take snapshots of Data Collector pipelines during pipeline test runs or job runs. You can view a snapshot to see the pipeline records at a point in time, and you can download the snapshot file. The Data Collector instance used for the snapshot depends on where you take the snapshot:
  • Snapshots taken from a pipeline test run use the selected authoring Data Collector.
  • Snapshots taken while monitoring a job use the execution Data Collector for the job run. When there is more than one execution Data Collector, the snapshot uses the Data Collector selected in the monitoring detailed view.
Jobs
  • From the Jobs view, you can now duplicate a job or job template to create one or more exact copies of an existing job or job template. You can then change the configuration and runtime parameters of the copies.
  • The color of the job status in the Jobs view during deactivation depends on how the job was deactivated:
    • Jobs stopped automatically due to an error have a red deactivating status.
    • Jobs stopped as requested or as expected have a green deactivating status.
  • The Jobs view now offers additional support for filters:
    • Click the Keep Filter Persistent checkbox to retain the last applied filter when you return to the view.
    • Save or share the URL to reopen the view with the applied filter later or in a different browser.
  • The monitoring panel now shows additional information about job runs:
    • The Summary tab shows additional metrics, such as record throughput.
    • The History tab has a View Summary link that opens a Job Metrics Summary page for previous job runs.
System Data Collector
Administrators can now enable or disable the system Data Collector for use as the default authoring Data Collector in Pipeline Designer. By default, the system Data Collector is enabled for existing organizations, but disabled for new organizations.
Control Hub REST API
The Control Hub REST API includes a new Control Hub Metrics category that contains several RESTful APIs:
  • Job Runner Metrics APIs retrieve metrics on job runs and executor uptime, CPU, and memory usage.
  • Time Series APIs retrieve metrics on job runs and executor CPU and memory usage over time.
  • Security Metrics APIs retrieve login and action audit reports.
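For example, each of these APIs is an authenticated HTTP call against the Control Hub base URL. The following curl sketch shows only the general shape of such a request: the endpoint path is a placeholder, the $SCH_AUTH_TOKEN variable is assumed to hold a session token, and the exact paths, parameters, and authentication details are documented in the Control Hub REST API reference.

  curl -X GET "https://<control-hub-url>/jobrunner/rest/v1/metrics/<metrics endpoint>" \
    -H "X-SS-User-Auth-Token: $SCH_AUTH_TOKEN"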
Organization Configuration
System administrators can now configure the following organization properties at an organization level in addition to a global level:
  • System limit on the maximum number of job runs
  • System limit on the maximum number of days before job status history is purged
  • System limit on the maximum number of days before time series data is purged

When organization administrators save organization properties, Control Hub verifies that the system limits are not exceeded.

LDAP Authentication
To help synchronize Control Hub with the LDAP provider when using LDAP authentication, you can now configure Control Hub to automatically create and deactivate users to match users in LDAP groups. When enabled, Control Hub does the following:
  • Creates Control Hub users when necessary to match users in the LDAP groups.
  • Deactivates but does not delete users from Control Hub when removed from all LDAP groups linked to Control Hub.

By default, this process is disabled.

Updated Configuration Files
The following configuration files include new properties for this release:
  • $DPM_CONF/security-app.properties includes the following new properties to facilitate synchronization with LDAP:
    • ldap.automaticResolutionEnabled - A flag to enable Control Hub to automatically add and deactivate users to match users in LDAP groups.
    • ldap.resolutionFrequencyMillis - The number of milliseconds to wait between checks to add and deactivate users to match users in LDAP groups.
  • $DPM_CONF/timeseries-app.properties includes the following new properties to support deletion of the metrics of inactive jobs older than a threshold that administrators configure for an organization:
    • enable.metrics.history.purge - A flag to enable deletion of historical metrics.
    • metrics.history.purge.init.delay.minutes - The number of minutes after the application starts until the first check for historical metrics to delete.
    • metrics.history.purge.freq.minutes - The number of minutes between checks for historical metrics to delete.
    • metrics.history.purge.batch.size - The maximum number of historical metrics to delete in one batch.

    Do not change the values of these properties without guidance from StreamSets Support.
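For example, with guidance from StreamSets Support, enabling LDAP synchronization with an hourly check might look like the following entries in $DPM_CONF/security-app.properties. The values shown are illustrative, not the defaults:

  ldap.automaticResolutionEnabled=true
  ldap.resolutionFrequencyMillis=3600000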

What's New in 3.15.0

StreamSets Control Hub version 3.15.0 includes the following new features and enhancements:

Pipeline Fragments
When you reuse pipeline fragments in the same pipeline, you can now specify different values for the runtime parameters in each instance of the fragment. When adding a pipeline fragment to a pipeline, you specify a prefix for parameter names, and Pipeline Designer automatically adds that prefix to each runtime parameter in the pipeline fragment.
Jobs
To improve performance and scalability, this release introduces a process to manage job history. Control Hub now automatically deletes the history and metrics associated with a job on a regular basis. By default, Control Hub:
  • Retains the job history for the last 10 job runs. Administrators can increase the retention to at most 100 job runs.
  • Retains the job history for 15 days from each retained job run. The history for each job run can contain at most 1,000 entries. Administrators can increase the retention to at most 60 days of job history.
  • Retains job metrics only for jobs that have been active within the past 6 months.
Integration of Cloud-Native Transformer Application
You can use Helm charts to run Transformer as a cloud-native application in a Kubernetes cluster. Control Hub can now generate a Helm script that inserts the Transformer authentication token and Control Hub URL into a Helm chart.

You can use Transformer running inside a Kubernetes cluster to launch Transformer pipeline jobs outside the Kubernetes cluster, such as jobs that run in a Databricks, EMR, or Azure HDInsight cluster. You can also run a Transformer pipeline inside the Kubernetes cluster itself, which requires no additional Spark installation from another vendor.

Updated Configuration Files
The following updated configuration files include new properties for this release:
  • $DPM_CONF/jobrunner-app.properties now includes the following new properties to support purging of deleted jobs and to support automatic deletion of job history based on thresholds that administrators configure for an organization:
    • job.status.history.purge.batch.size - The number of job status history entries purged in a batch.
    • system.limit.job.status.history.records - The maximum number of records permitted in the job status history for each run.

    Do not change the values of these properties without guidance from StreamSets Support.

  • $DPM_CONF/timeseries-app.properties now includes the following new properties to support purging of metrics of deleted jobs and inactive jobs older than a threshold that administrators configure for an organization:
    • enable.metrics.purge - A flag to enable purging of metrics.
    • metrics.purge.init.delay.minutes - The number of minutes after the application starts until the first check for metrics to purge.
    • metrics.purge.freq.minutes - The number of minutes between checks for metrics to purge.
    • metrics.purge.batch.size - The maximum number of metrics to purge in one batch.
    • metrics.purge.batch.pause.millis - The number of milliseconds to pause between each batch.
    • metrics.fetch.batch.size - The maximum number of metrics to fetch for purging.

    Do not change the values of these properties without guidance from StreamSets Support.

What's New in 3.14.0

StreamSets Control Hub version 3.14.0 includes the following new features and enhancements:

Data Protector
  • Import and export policies - You can now import and export policies and their associated procedures. This enables you to share policies with different organizations, such as from a development organization to a production organization.

    Import or export policies from the Protection Policies view.

  • Category name assist in procedures - When you configure a procedure based on a category pattern, a list of matching category names displays when you begin typing the name. You can select the category name to use from the list of potential matches.
  • Policy enactment change - Policies are no longer restricted to being used only upon read or only upon write. A policy can now be used in either case. As a result, the following changes have occurred:
    • When previewing data or configuring a job, you can now select any policy for the read and for the write.
    • You can now select any policy as the default read or write policy for an organization. You can even use the same policy as the default read policy and the default write policy.
UI Improvements
To improve usability, some fields have been repositioned in the Pipelines, Pipeline Fragments, Reports, and Jobs views.
Data Collectors and Edge Data Collectors
You can now configure resource thresholds for any registered Data Collector or Data Collector Edge. When starting, synchronizing, or balancing jobs, Control Hub ensures that a Data Collector or Data Collector Edge does not exceed its resource thresholds for CPU load, percent memory used, and number of pipelines running.
Balancing Jobs
From the Registered Data Collectors list, you can now balance jobs that are enabled for failover and running on selected Data Collectors to distribute pipeline load evenly. When balancing jobs, Control Hub redistributes jobs based on assigned labels, which can distribute jobs to Data Collectors that you did not select.
Organization Security
When creating or editing a group, you can now click links to clear any assigned roles or select all available roles.
Updated Configuration Files
The following updated configuration files include new properties for this release:
  • $DPM_CONF/jobrunner-app.properties now includes the following new properties to facilitate purging of deleted jobs:
    • purge.job.immediate - A flag to purge jobs immediately upon delete rather than archiving them. The default value keeps existing functionality unchanged.
    • enable.job.purge - A flag to enable purging of jobs.
    • job.purge.age.days - The age, in days, of deleted jobs that the application purges.
    • enable.active.job.purge - A flag to enable purging of deleted jobs shown as active.
    • job.purge.init.delay.minutes - The number of minutes after the application starts until the first check for jobs to purge.
    • job.purge.freq.minutes - The number of minutes between checks of jobs to purge.
    • job.purge.batch.size - The maximum number of jobs to purge in one batch.
    • job.purge.batch.pause.millis - The number of milliseconds to pause between each batch.
    • enable.job.status.purge - A flag to enable purging of the job status.
    • job.status.purge.exec.limit - The maximum number of job status records to retain.
    • enable.job.status.history.purge - A flag to enable purging of the job status history.

    Do not change the values of these properties without guidance from StreamSets Support.

  • $DPM_CONF/messaging-app.properties now includes the following new properties to facilitate automatic deletion of excessive messages:
    • event.delete.threshold - The maximum number of events in the queue before the application begins automatically deleting excessive messages.
    • event.delete.chunk - The number of events deleted together as a chunk.
    • event.delete.frequency.millis - The number of milliseconds between checks for excessive messages to delete.
    • event.delete.initial.delay.millis - The number of milliseconds after the start of the messaging application before the first check for excessive messages to delete.
    • event.delete.apps.to.monitor - A comma-separated list of applications monitored for automatic deletion of excessive messages.
    • event.type.ids.to.monitor - A comma-separated list of event-type IDs monitored for automatic deletion of excessive messages.

    In most cases, the default values for these properties should work.
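To illustrate how these properties fit together, the following entries would delete excessive messages in chunks of 1,000 once a monitored queue exceeds 100,000 events, checking every minute after an initial five-minute delay. The values are examples only, not the shipped defaults:

  event.delete.threshold=100000
  event.delete.chunk=1000
  event.delete.frequency.millis=60000
  event.delete.initial.delay.millis=300000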

What's New in 3.13.0

StreamSets Control Hub version 3.13.0 includes the following new features and enhancements:

Data Protector
  • Classification preview - You can now preview how StreamSets and custom classification rules classify data. You can use the default JSON test data or supply your own test data for the preview.

    This is a Technology Preview feature that is meant for development and testing only.

  • Pipeline preview with policy selection - When you preview pipeline data, you can now configure the read and write protection policies to use during the preview.

    This is a Technology Preview feature that is meant for development and testing only.

  • Cluster mode support - Data Protector now supports protecting data in cluster mode pipelines.
  • Encrypt Data protection method - You can now use the Encrypt Data protection method to encrypt sensitive data.
  • UI enhancements:
    • The “Classification Rules” view has been renamed “Custom Classifications.”
    • The Custom Classifications view and Protection Policies view are now available under a new Data Protector link in the navigation panel.
    • Protection policies now list procedures horizontally instead of vertically.
Subscriptions
When configuring an email action for a job status change event, you can now use a parameter to have an email sent to the owner of the job.
Organization Security
This release includes the following improvements for managing users:
  • The Add User dialog box now has links to clear any assigned roles or select all available roles.
  • After you create a user account or reset the password of a user account, Control Hub now sends an email with a link to set a new password.
Accessing Metrics with the REST API

You can now use the Control Hub REST API to access the count of input records, output records, and error records, as well as the last reported metric time for a job run.

New Configuration File
This release includes a new configuration file, $DPM_CONF/dynamic_preview-app.properties, used to configure the Dynamic Preview (Data Protector) application.
Updated Configuration Files
The following updated configuration files include new properties for this release:
  • $DPM_CONF/common-to-all-apps.properties now includes a new property, db.slow.query.interval.millis, which defines the number of milliseconds that a query must run before Control Hub logs a warning. In most cases, the default for this property should work.
  • $DPM_CONF/jobrunner-app.properties now includes a new property, failover.initial.delay.millis, which defines the number of milliseconds to wait before Control Hub starts the failover thread. In most cases, the default for this property should work.
  • $DPM_CONF/security-app.properties now includes the property reset.token.expiry.in.days, which defines the number of days until the link to set a password expires. This property replaces the same property in the common-to-all-apps.properties file. In most cases, the default value for this property should work.

What's New in 3.12.0

StreamSets Control Hub version 3.12.0 includes the following new features and enhancements:

Uploading Offsets with the REST API
You can now use the Control Hub REST API to upload a valid offset for a job, passing the offset as JSON data.
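A sketch of such a request using curl follows. The endpoint path is a placeholder for the actual offset upload API documented in the REST API reference, and offset.json is assumed to contain the offset in the JSON format that Data Collector produces:

  curl -X POST "https://<control-hub-url>/jobrunner/rest/v1/<offset upload endpoint>" \
    -H "Content-Type: application/json" \
    -H "X-Requested-By: curl" \
    -H "X-SS-User-Auth-Token: $SCH_AUTH_TOKEN" \
    --data @offset.json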
Updated Configuration File
The $DPM_CONF/common-to-all-apps.properties configuration file now includes a new property, app.threads, which defines the number of threads each application uses. In most cases, the default for this property should work.

What's New in 3.11.0

StreamSets Control Hub version 3.11.0 includes the following new features and enhancements:

Transformer Integration
This release integrates StreamSets Transformer with Control Hub.
Just as Data Collector pipelines run on a Data Collector engine, Transformer pipelines run on a Transformer engine. Since the Transformer engine is built on Apache Spark, an open-source cluster-computing framework, Transformer pipelines can perform heavy processing on the entire data set in batch or streaming mode.
To use Transformer with Control Hub, install Transformer on a machine that is configured to submit Spark jobs to a cluster, such as a Hadoop edge or data node or a cloud virtual machine. Then register Transformer with Control Hub. Use Pipeline Designer to design Transformer pipelines and configure a job to run the pipeline, just as you would a Data Collector pipeline.
For a comparison of Transformer and Data Collector, see Transformer for Data Collector Users.
Pipeline Design
Pipeline Designer includes the following enhancements:
  • Delete a draft pipeline or fragment in Pipeline Designer - While editing a draft version of a pipeline or fragment, you can now delete that draft version to revert to the previous published version of the pipeline or fragment. Previously, you could not delete a draft pipeline or fragment that was open in Pipeline Designer. You had to view the pipeline history, and then select the draft version to delete.
  • View the input and output schema for each stage - After running preview for a pipeline, you can now view the input and output schema for each stage on the Schema tab in the pipeline properties panel. The schema includes each field name and data type.

    Use the Schema tab when you configure pipeline stages that require field names. For example, let’s say you are configuring a Field Type Converter processor to convert the data type of a field by name. You can run preview, copy the field name from the Schema tab, and then paste the field name into the processor configuration.

  • Bulk update pipelines to use a different fragment version - When viewing a published pipeline fragment in Pipeline Designer, you can now update multiple pipelines at once to use a different version of that fragment. For example, if you edit a fragment and then publish a new version of the fragment, you can easily update all pipelines using that fragment to use the latest version.
  • Import a new version of a published pipeline in Pipeline Designer - While viewing a published pipeline in Pipeline Designer, you can import a new version of the pipeline. You can import any pipeline exported from Data Collector for use in Control Hub or any pipeline exported from Control Hub as a new version of the current pipeline.
  • User-defined pipeline templates - You can now create a user-defined pipeline template by assigning the templates pipeline label to a published pipeline. Users with read permission on the published pipeline can select the pipeline as a user template when developing a new pipeline.
  • Test run of a draft pipeline - You can now perform a test run of a draft pipeline in Pipeline Designer. Perform a test run of a draft pipeline to quickly test the pipeline logic. You cannot perform a test run of a published pipeline. To run a published pipeline, you must first add the pipeline to a job and then start the job.
  • Shortcut keys to undo and redo actions - You can now use the following shortcut keys to easily undo and redo actions in Pipeline Designer:
    • Press Command+Z to undo an action.
    • Press Command+Shift+Z to redo an action.
Jobs
Jobs include the following enhancements:
  • Monitoring errors - When you monitor an active job with a pipeline stage that encounters errors, you can now view details about each error record on the Errors tab in the Monitor panel.
  • Export and import job templates - When you export and import a job template, the template is now imported as a job template. You can then create job instances from that template in the new organization. You cannot export and import a job instance. Previously, when you exported and imported a job template or a job instance, the imported job template or instance functioned as a regular job in the new organization.
Subscriptions
You can now configure a subscription action for a changed job status color. For example, you might create a subscription that sends an email when a job status changes from active green to active red.
Roles
All Data Collector roles have been renamed to Engine roles and now enable users to perform tasks in registered Data Collectors and registered Transformers.

For example, the Data Collector Administrator role has been renamed to the Engine Administrator role. The Engine Administrator role now allows users to perform all tasks in registered Data Collectors and registered Transformers.

Provisioned Data Collectors
Provisioned Data Collectors include the following enhancements:
  • Upload the deployment YAML specification file - When you create a deployment, you can now upload the deployment YAML specification file instead of pasting the contents of the file into the YAML Specification property.
  • View YAML specification file for active deployments - You can now view the contents of the YAML specification file when you view the details of an active deployment.
  • Configurable Kerberos principal user name - When you define a deployment YAML specification file to provision Data Collector containers enabled for Kerberos authentication, you can now optionally define the Kerberos principal user name to use for the deployment. If you do not define a Kerberos user name, the Provisioning Agent uses sdc as the user name.
Updated Configuration Files
The following updated configuration files include new properties for this release:
  • common-to-all-apps.properties

    The $DPM_CONF/common-to-all-apps.properties file includes a new property that defines a timeout for all queries to the Control Hub databases. It also includes a new property that defines the number of objects to check for permissions in a single query when a user views objects of that type. In most cases, the defaults for these properties should work.

  • notification-app.properties

    The $DPM_CONF/notification-app.properties file includes a new should.use.check.frequency.millis property that is set to false so that the Notification application calculates the value of the check.frequency.millis property. Do not change the default value of false.

What's New in 3.10.0

StreamSets Control Hub version 3.10.0 includes the following new features and enhancements:

Installation Requirements
Control Hub includes the following enhancements to the installation requirements:
Pipeline Design
Pipeline Designer includes the following enhancements:
  • Preview time zone - You can now select the time zone to use for the preview of date, datetime, and time data. Previously, preview always displayed data using the browser time zone.
  • Compare pipeline versions - When you compare pipeline versions, you can now click the name of either pipeline version to open that version in the pipeline canvas.

    Previously, you had to return to the Navigation panel and then select the pipeline version from the Pipeline Repository to open one of the versions in the pipeline canvas.

Jobs
You can now upload an initial offset file for a job. Upload an initial offset file when you first run a pipeline in Data Collector, publish the pipeline to Control Hub, and then want to continue running the pipeline from the Control Hub job using the last-saved offset maintained by Data Collector.
SAML Authentication
When SAML authentication is enabled, users with the new Control Hub Authentication role can complete the following tasks that require users to be authenticated by Control Hub:
  • Use the Data Collector command line interface.
  • Log into a Data Collector running in disconnected mode.
  • Use the Control Hub REST API.

Previously, only users with the Organization Administrator role could complete these tasks when SAML authentication was enabled.

Provisioned Data Collectors
The Control Agent Docker image version 3.10.0 now requires that each YAML specification file that defines a deployment use the Kubernetes API version apps/v1. Previously, the Control Agent Docker image required that each YAML specification file use the API version extensions/v1beta1 for a deployment. Kubernetes has deprecated the extensions/v1beta1 version for deployments.
If you upgrade a Provisioning Agent to use the Control Agent Docker image version 3.10.0 or later or if a Provisioning Agent uses latest as the Control Agent Docker image version, you must update all deployment YAML specification files before you redeploy the Provisioning Agent. For more information, see Update Deployments for Provisioned Data Collectors.
Updated Configuration File
The $DPM_CONF/security-app.properties file includes a new property that defines the welcome email sent to new users when SAML authentication is enabled for the organization.

What's New in 3.9.0

StreamSets Control Hub version 3.9.0 includes the following new features and enhancements:
Sensitive Data in Configuration Files
You can now protect sensitive data in Control Hub configuration files by storing the data in an external location and then using the exec function to call a script or executable that retrieves the data. For example, you can develop a script that decrypts an encrypted file containing a password. Or you can develop a script that calls an external REST API to retrieve a password from a remote vault system.
After developing the script, use the exec function in the Control Hub configuration files to call the script or executable as follows:
${exec("<script name>")}
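As a minimal sketch, assuming a hypothetical script named get_password.sh that decrypts an encrypted password file with openssl and prints the result to standard output:

  #!/bin/bash
  # get_password.sh - illustrative only; prints the decrypted password to stdout
  openssl enc -d -aes-256-cbc -in /secure/db-password.enc -pass file:/secure/key.bin

A configuration property can then retrieve the password at startup with ${exec("get_password.sh")}.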
Updated Configuration Files
The following updated configuration files include new properties for this release:
  • dpm.properties

    The $DPM_CONF/dpm.properties file includes a new ui.pendo.enabled property that determines whether Control Hub enables tracking tools that send UI usage data to StreamSets and that receive product announcements and onboarding guides from StreamSets.

  • jobrunner-app.properties

    The $DPM_CONF/jobrunner-app.properties file includes a new should.use.check.frequency.millis property that is set to false so that the Job Runner application calculates the value of the check.frequency.millis property. Do not change the default value of false.

What's New in 3.8.0

StreamSets Control Hub version 3.8.0 includes the following new features and enhancements:

Provisioned Data Collectors
You can now create a Provisioning Agent that provisions Data Collector containers enabled for Kerberos authentication.
StreamSets recommends using Helm to create a Provisioning Agent that can provision Data Collectors enabled for Kerberos. Helm is a tool that streamlines installing and managing Kubernetes applications.
Jobs
You can now upgrade active jobs to use the latest pipeline version. When you upgrade an active job, Control Hub stops the job, updates the job to use the latest pipeline version, and then restarts the job.
You can manually upgrade jobs or you can schedule the upgrade of jobs on a regular basis. For example, you might use the Control Hub scheduler to create a scheduled task that runs every Saturday at 12:00 am to check if an active job has a later pipeline version. If a later pipeline version exists, the scheduled task stops the job, updates the job to use the latest pipeline version, and then restarts the job.
Pipeline Design
When you use the Control Hub Pipeline Designer to design pipelines, you can now call pipeline parameters for properties that display as checkboxes and drop-down menus. The parameters must evaluate to a valid option for the property.
Permissions
You can now share and grant permissions on multiple objects at the same time.
Subscriptions
When you create a subscription, you now configure the subscription in a single dialog box instead of clicking through multiple pages.
LDAP Authentication
When Control Hub uses LDAP authentication, organization administrators that have configured a disconnected mode password for their user account can now log into registered Data Collectors that are running in disconnected mode.

A registered Data Collector uses the Control Hub disconnected mode when the Data Collector cannot connect to Control Hub, due to a network or system outage.

Updated Configuration File
The $DPM_CONF/jobrunner-app.properties file includes the following new properties:
  • always.migrate.offsets - Determines whether, when you stop and restart a job, Control Hub always migrates the job offsets to another available Data Collector that is assigned all labels specified for the job. Default is false.
  • failover.check.freq.millis - The time in milliseconds between Control Hub checks for pipelines that should be failed over to another Data Collector. Default is 60,000 milliseconds.
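For example, to always migrate offsets when a job restarts and to check for failover candidates every 30 seconds, the entries in $DPM_CONF/jobrunner-app.properties might look like the following. The values shown are illustrative:

  always.migrate.offsets=true
  failover.check.freq.millis=30000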

What's New in 3.7.1

StreamSets Control Hub version 3.7.1 includes the following new features and enhancements:

Upgrade

When you update schemas in the relational database, you now run the same database initialization script for all upgrades. Previously, you had to run different commands based on the version that you were upgrading from.

Preview in Pipeline Designer

Pipeline Designer can now display preview data in table view.

Subscriptions
Subscriptions include the following enhancements:
  • Pipeline status change event - You can now configure a subscription action for a changed pipeline status. For example, you might create a subscription that sends an email when a pipeline status changes to RUN_ERROR.
  • Expression completion to filter events - You can now use expression completion to determine the functions and parameters that you can use for each subscription filter.
Scheduler

The Control Hub scheduler can now stop a job at a specified frequency. For example, you might want to run a streaming job every day of the week except for Sunday. You create one scheduled task that starts the job every Monday at 12:00 am. Then, you create another scheduled task that stops the same job every Sunday at 12:00 am.

SAML Authentication

When you map a Control Hub user account to a SAML IdP user account, the SAML User Name property now defaults to the email address associated with the Control Hub user account. Previously, the default value was the user ID associated with the Control Hub user account.

What's New in 3.6.0

StreamSets Control Hub version 3.6.0 includes the following new features and enhancements:

Installation
The Control Hub installation process includes the following enhancements:
  • MariaDB support - Control Hub now supports MariaDB in addition to MySQL and PostgreSQL for the relational database that stores metadata written by Control Hub applications.
  • Setting up Control Hub - During the Control Hub installation process, when you run the Control Hub setup script to configure Control Hub properties, you no longer need to enter both a Control Hub Base URL and a Load Balancer URL. The duplicate Load Balancer URL property has been removed. If you are installing multiple instances of Control Hub for high availability, you simply enter the load balancer URL for the Control Hub Base URL.
  • Control Hub license temporarily activated - When you generate a unique system ID after installing Control Hub, your license is now temporarily activated for seven days. This way, you can start and log in to Control Hub while you wait for your permanent activation key from StreamSets.
Data Protector

This release supports the latest version of Data Protector, Data Protector 1.4.0.

Preview in Pipeline Designer
You can now preview multiple stages in Pipeline Designer. When you preview multiple stages, you select the first stage and the last stage in the group. The Preview panel then displays the output data of the first stage in the group and the input data of the last stage in the group.
Job Templates
When you create a job for a pipeline that uses runtime parameters, you can now enable the job to work as a job template. A job template lets you run multiple job instances with different runtime parameter values from a single job definition.
For example, you create a pipeline that uses a runtime parameter to read from different Google Cloud Storage buckets. You create a single job template for that pipeline, and then start multiple instances of the job, specifying a unique value for the bucket parameter for each job instance.
Subscribe to Unresponsive Data Collector or Data Collector Edge Events
You can now configure a subscription action for a Data Collector or Data Collector Edge not responding event. For example, you might create a subscription that sends an alert to a Slack channel when a registered Data Collector stops responding.
Updated Configuration File
The $DPM_CONF/common-to-all-apps.properties file no longer includes the duplicate http.load.balancer.url property. If you installed multiple instances of Control Hub for high availability, you simply enter the load balancer URL for the dpm.base.url property.

What's New in 3.5.0

StreamSets Control Hub version 3.5.0 includes the following new features and enhancements:

StreamSets Data Protector
You can now use StreamSets Data Protector to perform global in-stream discovery and protection of data in motion with Control Hub.
Data Protector provides StreamSets classification rules and enables creating custom classification rules to identify sensitive data. Custom protection policies provide rules-based data protection for every job that you run. You can also use Data Protector stages in pipelines for localized protection needs.
Data Protector is available as an add-on option with a StreamSets Enterprise subscription. For more information, contact us.
Pipeline Designer
Pipeline Designer includes the following enhancements:
  • Expression completion - Pipeline Designer now completes expressions in stage and pipeline properties to provide a list of data types, runtime parameters, fields, and functions that you can use.
  • Manage pipeline and fragment versions - When configuring a pipeline or pipeline fragment in Pipeline Designer, you can now view a visualization of the pipeline or fragment version history.

    When you expand the version history, you can manage the pipeline or fragment versions including comparing versions, creating tags for versions, and deleting versions.

  • Pipeline fragment expansion in pipelines - You can now expand and collapse individual pipeline fragments when used in a pipeline. Previously, expanding a fragment meant that all fragments in the pipeline were expanded.

    When a fragment is expanded, the pipeline enters read-only mode allowing no changes. Collapse all fragments to make changes to the pipeline.

  • Preview and validate edge pipelines - You can now use Pipeline Designer to preview and validate edge pipelines.
  • Shortcut menu for stages - When you select a stage in the canvas, a shortcut menu now displays with a set of options:
    • For a pipeline fragment stage, you can copy, expand, or delete the fragment.
    • For all other stages, you can copy or delete the stage, or create a pipeline fragment using the selected stage or set of stages.
Failover Retries for Jobs
When a job is enabled for failover, Control Hub by default retries the pipeline failover an infinite number of times. If you want the pipeline failover to stop after a given number of retries, you can now define the maximum number of retries to perform. Control Hub maintains the failover retry count for each available Data Collector.
Starting Jobs with the REST API
You can now define runtime parameter values for a job when you start the job using the Control Hub REST API.
Data Collectors
You can now use an automation tool such as Ansible, Chef, or Puppet to automate the registering and unregistering of Data Collectors using the following commands:
streamsets sch register
streamsets sch unregister
Enabling HTTPS for Control Hub
If working with a Control Hub on-premises installation enabled for HTTPS in a test or development environment, you can now configure Data Collector Edge (SDC Edge) to skip verifying the Control Hub trusted certificates. StreamSets highly recommends that you configure SDC Edge to verify trusted certificates in a production environment.
New Configuration Files
This release includes the following new configuration files located in the $DPM_CONF directory:
  • policy-app.properties - Used to configure the Policy (Data Protector) application.
  • sdp_classification-app.properties - Used to configure the Classification (Data Protector) application.
Updated Configuration File
The $DPM_CONF/dpm.properties file includes a new ui.doc.help.url property that determines whether Control Hub uses the help project installed with Control Hub or uses the help project hosted on the StreamSets website. Hosted help contains the latest available documentation and requires an internet connection.

What's New in 3.3.0

StreamSets Control Hub version 3.3.0 includes the following new features and enhancements:

Pipelines and Pipeline Fragments
  • Data preview enhancements:
    • Data preview support for pipeline fragments - You can now use data preview with pipeline fragments. When using Data Collector 3.4.0 for the authoring Data Collector, you can also use a test origin to provide data for the preview. This can be especially useful when the fragment does not contain an origin.
    • Edit data and stage properties - You can now edit preview data and stage properties, then run the preview with your changes. You can also revert data changes and refresh the preview to view additional data.
  • Select multiple stages - When you design pipelines and pipeline fragments, you can now select multiple stages in the canvas by holding the Shift key and clicking each stage. You can then move or delete the selected stages.
  • Export enhancement - When you export a single pipeline or a single fragment, the pipeline or fragment is now saved in a zip file of the same name, as follows: <pipeline or fragment name>.zip. Exporting multiple pipelines or fragments still results in the following file name: <pipelines|fragments>.zip.
  • View where fragments are used - When you view the details of a fragment, Pipeline Designer now displays the list of pipelines that use the fragment.
Jobs
  • Runtime parameters enhancements - When you edit a job, you can now use the Get Default Parameters option to retrieve all parameters and their default values as defined in the pipeline. You can also use simple edit mode, in addition to bulk edit mode, to define parameter values.
  • Pipeline failover enhancement - When determining which available Data Collector restarts a failed pipeline, Control Hub now prioritizes Data Collectors that have not previously failed the pipeline.
Data Collectors
  • Monitor Data Collector performance - When you view registered Data Collectors version 3.4.0 from the Execute view, you can now monitor the CPU load and memory usage of each Data Collector.
Edge Data Collectors (SDC Edge)
  • Monitor SDC Edge performance - When you view registered Edge Data Collectors version 3.4.0 from the Execute view, you can now monitor the CPU load and memory usage of each SDC Edge.
Data Delivery Reports
  • Destination statistics - Data delivery reports for jobs and topologies now contain statistics for destinations.
Documentation
  • Documentation enhancement - The online help has a new look and feel. All of the previous documentation remains exactly where you expect it, but it is now easier to view and navigate on smaller devices like your tablet or mobile phone.

What's New in 3.2.1

StreamSets Control Hub version 3.2.1 fixes the following known issues:

  • Viewing pipeline details from the Topology view causes an error to occur.
  • Time series charts for jobs cannot be viewed from the Topology view even though time series analysis is enabled.
  • When a Kubernetes pod is restarted, the Provisioning Agent fails to register the Data Collector containers with Control Hub.

What's New in 3.2.0

StreamSets Control Hub version 3.2.0 includes the following new features:

Activate the Control Hub License

Each Control Hub system now requires an active license. During the installation process, you use the Control Hub security command line program to generate a unique system ID and then request an activation key for that system ID from the StreamSets support team. After you receive the activation key, you use the security command line program to activate the license.

Each activation key is generated for a specific Control Hub system ID. If you install multiple Control Hub instances for a highly available system, you only need to activate the license once.

Pipeline Fragments
Control Hub now includes pipeline fragments. A pipeline fragment is a stage or set of connected stages that you can reuse in Data Collector or SDC Edge pipelines. Use pipeline fragments to easily add the same processing logic to multiple pipelines and to ensure that the logic is used as designed.

Pipeline fragments can only be created in the Control Hub Pipeline Designer. You can use any stage available in the authoring Data Collector in a fragment. Pipeline fragments cannot be designed within the Data Collector user interface.

Scheduler

Control Hub now includes a scheduler that manages long-running scheduled tasks. A scheduled task periodically triggers the execution of a job or a data delivery report at the specified frequency. For example, a scheduled task can start a job or generate a data delivery report on a weekly or monthly basis.

Before you can schedule jobs and data delivery reports, the Scheduler Operator role must be assigned to your user account.

Data Delivery Reports

Control Hub now includes data delivery reports that show how much data was processed by a job or topology over a given period of time. You can create periodic reports with the scheduler, or create an on-demand report.

Before you can manage data delivery reports, the Reporting Operator role must be assigned to your user account.

Jobs
  • Edit a pipeline version directly from a job - When viewing the details of a job or monitoring a job, you can now edit the latest version of the pipeline directly from the job. Previously, you had to locate the pipeline in the Pipeline Repository view before you could edit the pipeline.
  • Enable time series analysis - You can now enable time series analysis for a job. When enabled, you can view historical time series data when you monitor the job or a topology that includes the job.

    When time series analysis is disabled, you can still view the total record count and throughput for a job or topology, but you cannot view the data over a period of time. For example, you can’t view the record count for the last five minutes or for the last hour.

    By default, all existing jobs have time series analysis enabled. All new jobs have time series analysis disabled. You might want to enable time series analysis for new jobs for debugging purposes or to analyze dataflow performance.

  • Pipeline force stop timeout - In some situations when you stop a job, a remote pipeline instance can remain in a Stopping state for a long time. When you configure a job, you can now configure the number of milliseconds to wait before forcing remote pipeline instances to stop. The default time to force a pipeline to stop is 2 minutes.
  • View logs - While monitoring an active job, the top toolbar now includes a View Logs icon that displays the logs for any remote pipeline instance run from the job.
Subscriptions
  • Email action - You can now create a subscription that listens for Control Hub events and then sends an email when those events occur. For example, you might send an email each time a job status changes.
  • Pipeline committed event - You can now configure a subscription action for a pipeline committed event. For example, you might send a message that includes the name of the committing user each time a pipeline is committed.
  • Filter the events to subscribe to - You can now use the StreamSets expression language to create an expression that filters the events that you want to subscribe to. You can include subscription parameters and StreamSets string functions in the expression.
    For example, you might enter the following expression for a Job Status Change event so that the subscription is triggered only when the specified job ID encounters a status change:
    ${JOB_ID == '99efe399-7fb5-4383-9e27-e4c56b53db31:MyCompany'}

    If you do not filter the events, then the subscription is triggered each time an event occurs for all objects that you have at least read permission on.

  • Permissions - When permission enforcement is enabled for your organization, you can now share and grant permissions on subscriptions.
Provisioned Data Collectors

When you define a deployment YAML specification file for provisioned Data Collectors, you can now optionally associate a Kubernetes Horizontal Pod Autoscaler, service, or Ingress with the deployment.

Define a deployment and Horizontal Pod Autoscaler in the specification file for a deployment of one or more execution Data Collectors that must automatically scale during times of peak performance. The Kubernetes Horizontal Pod Autoscaler automatically scales the deployment based on CPU utilization.

Define a deployment and service in the specification file for a deployment of a single development Data Collector that must be exposed outside the cluster using a Kubernetes service. Optionally associate an Ingress with the service to provide load balancing, SSL termination, and virtual hosting to the service in the Kubernetes cluster.
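To show the general shape of such a specification, the following minimal sketch pairs a deployment with a Horizontal Pod Autoscaler that scales on CPU utilization. It is illustrative only: the names and image tag are placeholders, an actual Control Hub deployment specification requires additional Control Hub-specific settings, and this release expects the extensions/v1beta1 API version for deployments (later releases require apps/v1, as described in the 3.10.0 release notes above):

  apiVersion: extensions/v1beta1
  kind: Deployment
  metadata:
    name: sdc-deployment
  spec:
    replicas: 1
    template:
      metadata:
        labels:
          app: sdc
      spec:
        containers:
          - name: datacollector
            image: streamsets/datacollector:latest
  ---
  apiVersion: autoscaling/v1
  kind: HorizontalPodAutoscaler
  metadata:
    name: sdc-hpa
  spec:
    scaleTargetRef:
      apiVersion: extensions/v1beta1
      kind: Deployment
      name: sdc-deployment
    minReplicas: 1
    maxReplicas: 5
    targetCPUUtilizationPercentage: 75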

New Configuration Files
This release includes the following new configuration files located in the $DPM_CONF directory:
  • reporting-app.properties - Used to configure the Reporting application.
  • scheduler-app.properties - Used to configure the Scheduler application.

What's New in 3.1.1

StreamSets Control Hub version 3.1.1 includes the following new feature:
View logs for an active job
When monitoring an active job, you can now view the logs for a remote pipeline instance from the Data Collectors tab.

What's New in 3.1.0

StreamSets Control Hub version 3.1.0 includes the following new features:

System Data Collector

You can now configure the system Data Collector connection properties when you run the Control Hub setup script. Previously, you had to modify the $DPM_CONF/common-to-all-apps.properties file to configure the system Data Collector properties.

Pipelines
Pipelines include the following enhancements:
  • Duplicate pipelines - You can now select a pipeline in the Pipeline Repository view and then duplicate the pipeline. A duplicate is an exact copy of the original pipeline.
  • Commit message when publishing pipelines - You can now enter commit messages when you publish pipelines from Pipeline Designer. Previously, you could only enter commit messages when you published pipelines from a registered Data Collector.
Export and Import
You can now use Control Hub to export and import the following objects:
  • Jobs and topologies - You can now export and import jobs and topologies to migrate the objects from one organization to another. You can export a single job or topology or you can export a set of jobs and topologies.

    When you export and import jobs and topologies, you also export and import dependent objects. For jobs, you also export and import the pipelines included in the jobs. For topologies, you also export and import the jobs and pipelines included in the topologies.

  • Sets of pipelines - You can now select multiple pipelines in the Pipeline Repository view and export the pipelines as a set to a ZIP file. You can also now import pipelines from a ZIP file containing multiple pipeline files.
Alerts

The Notifications view has now been renamed the Alerts view.

Subscriptions

You can now create a subscription that listens for Control Hub events and then completes an action when those events occur. For example, you might create a subscription that sends a message to a Slack channel each time a job status changes.

When you create a subscription, you select the Control Hub events to subscribe to - such as a changed job status or a triggered data SLA. You then configure the action to take when the events occur - such as using a webhook to send an HTTP request to an external system.
Important: By default, an organization is not enabled to send events that trigger subscriptions. Before Control Hub can trigger subscriptions for your organization, your organization administrator must enable events for the organization.
Scale Out Active Jobs
When the Number of Instances property for a job is set to -1, Control Hub can now automatically scale out pipeline processing for the active job.
When Number of Instances is set to any other value, you must synchronize the active job to start additional pipeline instances on newly available Data Collectors or Edge Data Collectors.
For example, if Number of Instances is set to -1 and three Data Collectors have all of the specified labels for the job, Control Hub runs three pipeline instances, one on each Data Collector. If you register another Data Collector with the same labels as the active job, Control Hub automatically starts a fourth pipeline instance on that newly available Data Collector.
Previously, you had to synchronize all active jobs - regardless of the Number of Instances value - to start additional pipeline instances on a newly registered Data Collector.

What's New in 3.0.1

StreamSets Control Hub version 3.0.1 includes the following new feature:
PostgreSQL Support
Control Hub now supports PostgreSQL in addition to MySQL for the relational database that stores metadata written by Control Hub applications.

What's New in 3.0.0

Control Hub version 3.0.0 includes the following new features and enhancements:

Product Rename

With this release, we have created a new product called StreamSets Control Hub™ that includes a number of new dataflow design, deployment, and scale-up features. Since Control Hub is now our core service for controlling dataflows, we have renamed "Dataflow Performance Manager (DPM)" to "StreamSets Control Hub".

DPM now refers to the performance management functions that reside in the cloud such as live metrics and data SLAs. Customers who have purchased the StreamSets Enterprise Edition will gain access to all Control Hub functionality and continue to have access to all DPM functionality as before.

To understand the end-to-end StreamSets Data Operations Platform and how the products fit together, visit https://streamsets.com/products/.

Installation
StreamSets Control Hub is now supported on the CentOS 6.x, Oracle Linux 6.x, and Red Hat Enterprise Linux 6.x operating systems. StreamSets provides the following RPM packages for Control Hub:
  • EL6 - Use to install Control Hub on CentOS 6.x, Oracle Linux 6.x, or Red Hat Enterprise Linux 6.x.
  • EL7 - Use to install Control Hub on CentOS 7.x, Oracle Linux 7.x, or Red Hat Enterprise Linux 7.x.
LDAP Authentication

If your company uses Lightweight Directory Access Protocol (LDAP), you can use the LDAP provider to authenticate Control Hub users and to retrieve group membership. LDAP authenticates a user using the credentials stored in the LDAP server.

LDAP authentication is configured by the system administrator for the complete Control Hub installation. After LDAP authentication is enabled, all organizations must use LDAP authentication. Users log in to Control Hub using their Control Hub user ID and their LDAP password.

To use LDAP authentication, the system administrator configures LDAP connection information for Control Hub and then maps organization administrator accounts to LDAP users. Organization administrators then create Control Hub user accounts and groups for their organization, mapping these to LDAP users and groups.

Pipeline Designer
You can now create and design pipelines directly in the Control Hub Pipeline Designer after you select an authoring Data Collector for Pipeline Designer to use. You select one of the following types of Data Collectors to use as the authoring Data Collector:
  • System Data Collector - Use to design pipelines only - cannot be used to preview or explicitly validate pipelines. The system Data Collector is provided with Control Hub for exploration and light development. Includes the latest version of all stage libraries available with the latest version of Data Collector.
  • Registered Data Collector using the HTTPS protocol - Use to design, preview, and explicitly validate pipelines. Includes the stage libraries and custom stage libraries installed in the registered Data Collector.

When you create pipelines in Pipeline Designer, you can create a blank pipeline or you can create a pipeline from a template. Use pipeline templates to quickly design pipelines for typical use cases.

Provisioning Data Collectors

You can now automatically provision Data Collectors on a Kubernetes container orchestration framework. Provisioning includes deploying, registering, starting, scaling, and stopping Data Collector Docker containers in the Kubernetes cluster.

Use provisioning to reduce the overhead of managing a large number of Data Collector instances. Instead, you can manage a central Kubernetes cluster used to run multiple Data Collector containers.

Integration with Data Collector Edge

Control Hub now works with Data Collector Edge (SDC Edge) to execute edge pipelines. SDC Edge is a lightweight agent without a UI that runs pipelines on edge devices with limited resources. Edge pipelines read data from the edge device or receive data from another pipeline and then act on that data to control the edge device.

You install SDC Edge on edge devices, then register each SDC Edge with Control Hub. You assign labels to each SDC Edge to determine which jobs are run on that SDC Edge.

You either design edge pipelines in the Control Hub Pipeline Designer or in a development Data Collector. After designing edge pipelines, you publish the pipelines to Control Hub and then add the pipelines to jobs that run on a registered SDC Edge.

Pipeline comparison

When you compare two pipeline versions, Control Hub now highlights the differences between the versions in the pipeline canvas. Previously, you had to visually compare the two versions to discover the differences between them.

Aggregated statistics

You can now configure a pipeline to write aggregated statistics to MapR Streams.

Balancing jobs

When a job is enabled for pipeline failover, you can now balance the job to redistribute the pipeline load across available Data Collectors that are running the fewest number of pipelines. For example, let’s say that a failed pipeline restarts on another Data Collector due to the original Data Collector shutting down. When the original Data Collector restarts, you can balance the job so that Control Hub redistributes the pipeline to the restarted Data Collector not currently running any pipelines.

Roles

You can now assign provisioning roles to user accounts, which enable users to view and work with Provisioning Agents and deployments to automatically provision Data Collectors.

You must assign the appropriate provisioning roles to users before they can access the Provisioning Agents and Deployments views in the Navigation panel.

Navigation panel

The Navigation panel now groups the Data Collectors view under an Execute menu, along with the new Edge Data Collectors, Provisioning Agents, and Deployments views.

Dashboards

The default dashboard now includes the number of users in your organization when your user account has the Organization Administrator role.

New Configuration File
This release includes a new $DPM_CONF/provisioning-app.properties file used to configure the Provisioning application.
Updated Configuration Files
The following updated configuration files include new properties for this release:
  • common-to-all-apps.properties
    The $DPM_CONF/common-to-all-apps.properties file includes the following new properties used to configure the system Data Collector:
    • pipeline.designer.system.sdc.url
    • pipeline.designer.system.sdc.username
    • pipeline.designer.system.sdc.password

    The file also includes the following new property that is reserved for future use: ui.signup.enabled

  • pipelinestore-app.properties
    The $DPM_CONF/pipelinestore-app.properties file includes the following new properties used to configure the organization that manages system pipeline templates:
    • pipeline.templates.organization
    • pipeline.templates.organizationUser
  • security-app.properties

    The $DPM_CONF/security-app.properties file includes new properties to configure LDAP authentication.

    The file also includes the following new properties that are reserved for future use:
    • trial.days
    • trial.maxJobs
    • trial.maxTopologies
    • trial.providerEmailAddress