Update Notes

What's New in the November 28, 2018 Update

The update for November 28, 2018 fixes the following known issue:
  • When the scheduler starts a job that includes a pipeline with a Hadoop-related stage configured to impersonate the Hadoop user as the currently logged-in Data Collector user, Control Hub incorrectly interprets the user who starts the pipeline to be scheduler000, which causes the pipeline to fail.

What's New in the November 19, 2018 Update

The update for November 19, 2018 includes the following new features and enhancements:

Preview in Pipeline Designer

Pipeline Designer can now display preview data in table view.

Subscriptions
Subscriptions include the following enhancements:
  • Pipeline status change event - You can now configure a subscription action for a changed pipeline status. For example, you might create a subscription that sends an email when a pipeline status changes to RUN_ERROR.
  • Expression completion to filter events - You can now use expression completion to determine the functions and parameters that you can use for each subscription filter.
Scheduler

The Control Hub scheduler can now stop a job at a specified frequency. For example, you might want to run a streaming job every day of the week except for Sunday. You create one scheduled task that starts the job every Monday at 12:00 am. Then, you create another scheduled task that stops the same job every Sunday at 12:00 am.
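
As a rough illustration, assuming the scheduled tasks use standard five-field cron expressions to define the frequency (the exact cron format that the scheduler accepts may differ), the two tasks in this example might be configured as follows:

    0 0 * * MON     start the job every Monday at 12:00 am
    0 0 * * SUN     stop the job every Sunday at 12:00 am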

SAML Authentication

When you map a Control Hub user account to a SAML IdP user account, the SAML User Name property now defaults to the email address associated with the Control Hub user account. Previously, the default value was the user ID associated with the Control Hub user account.

This update also fixes the following known issue:
  • The Control Hub UI takes a long time to display users and groups.

What's New in the October 27, 2018 Update

The update for October 27, 2018 includes the following new features and enhancements:

Data Protector

This release supports the latest version of Data Protector, version 1.4.0.

For more information about Data Protector 1.4.0, see the Data Protector Release Notes.

Preview in Pipeline Designer
You can now preview multiple stages in Pipeline Designer. When you preview multiple stages, you select the first stage and the last stage in the group. The Preview panel then displays the output data of the first stage in the group and the input data of the last stage in the group.
Job Templates
When you create a job for a pipeline that uses runtime parameters, you can now enable the job to work as a job template. A job template lets you run multiple job instances with different runtime parameter values from a single job definition.
For example, you create a pipeline that uses a runtime parameter to read from different Google Cloud Storage buckets. You create a single job template for that pipeline, and then start multiple instances of the job, specifying a unique value for the bucket parameter for each job instance.
Subscribe to Unresponsive Data Collector or Data Collector Edge Events
You can now configure a subscription action for a Data Collector or Data Collector Edge not responding event. For example, you might create a subscription that sends an alert to a Slack channel when a registered Data Collector stops responding.
This update also fixes the following known issues:
  • The POST method for the /pipelinestore/rest/v1/pipelines/exportPipelineCommits REST API endpoint has the wrong content type response header.
  • When special characters such as colons (:) and square brackets ( [ ] ) are included in a pipeline name, the remotely running pipeline cannot communicate with Control Hub.

What's New in the October 12, 2018 Update

The update for October 12, 2018 includes the following new features and enhancements:

Failover Retries for Jobs
When a job is enabled for failover, Control Hub by default retries the pipeline failover an infinite number of times. If you want the pipeline failover to stop after a given number of retries, you can now define the maximum number of retries to perform. Control Hub maintains the failover retry count for each available Data Collector.
Starting Jobs with the REST API
You can now define runtime parameter values for a job when you start the job using the Control Hub REST API.
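
As a minimal sketch, the call might look like the following Python snippet. The endpoint path, header names, and payload format shown here are assumptions for illustration only; refer to the Control Hub REST API documentation for the exact request.

    import json
    import requests

    # Placeholder values -- the real endpoint path, headers, and payload format
    # are defined by the Control Hub REST API, not by this sketch.
    CONTROL_HUB_URL = "https://cloud.streamsets.com"
    JOB_ID = "99efe399-7fb5-4383-9e27-e4c56b53db31:MyCompany"
    AUTH_TOKEN = "<user auth token>"

    # Runtime parameter values to apply to this run of the job.
    runtime_parameters = {"BUCKET": "my-gcs-bucket", "BATCH_SIZE": 1000}

    response = requests.post(
        f"{CONTROL_HUB_URL}/jobrunner/rest/v1/job/{JOB_ID}/start",  # assumed path
        headers={
            "Content-Type": "application/json",
            "X-Requested-By": "example-script",
            "X-SS-User-Auth-Token": AUTH_TOKEN,  # assumed header name
        },
        data=json.dumps(runtime_parameters),
    )
    response.raise_for_status()
    print(response.json())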

What's New in the October 4, 2018 Update

The update for October 4, 2018 fixes the following known issue:
  • A job encounters system pipeline failures when the job includes a pipeline published from Data Collector 3.5.0 and configured to write aggregated statistics to a Kafka cluster.

What's New in the September 28, 2018 Update

The update for September 28, 2018 includes the following new features and enhancements:

StreamSets Data Protector
You can now use StreamSets Data Protector to perform global in-stream discovery and protection of data in motion with Control Hub.
Data Protector provides StreamSets classification rules and enables creating custom classification rules to identify sensitive data. Custom protection policies provide rules-based data protection for every job that you run. You can also use Data Protector stages in pipelines for localized protection needs.
Data Protector is available as an add-on option with a StreamSets Enterprise subscription. For more information, contact us.
Pipeline Designer
Pipeline Designer includes the following enhancements:
  • Expression completion - Pipeline Designer now completes expressions in stage and pipeline properties to provide a list of data types, runtime parameters, fields, and functions that you can use.
  • Manage pipeline and fragment versions - When configuring a pipeline or pipeline fragment in Pipeline Designer, you can now view a visualization of the pipeline or fragment version history.

    When you expand the version history, you can manage the pipeline or fragment versions including comparing versions, creating tags for versions, and deleting versions.

  • Pipeline fragment expansion in pipelines - You can now expand and collapse individual pipeline fragments when used in a pipeline. Previously, expanding a fragment meant that all fragments in the pipeline were expanded.

    When a fragment is expanded, the pipeline enters read-only mode allowing no changes. Collapse all fragments to make changes to the pipeline.

  • Preview and validate edge pipelines - You can now use Pipeline Designer to preview and validate edge pipelines.
  • Shortcut menu for stages - When you select a stage in the canvas, a shortcut menu now displays with a set of options:
    • For a pipeline fragment stage, you can copy, expand, or delete the fragment.
    • For all other stages, you can copy or delete the stage, or create a pipeline fragment using the selected stage or set of stages.
Data Collectors
You can now use an automation tool such as Ansible, Chef, or Puppet to automate registering and unregistering Data Collectors using the following commands:
    streamsets sch register
    streamsets sch unregister
This update also fixes the following known issues:
  • Scheduling a job in any time zone except UTC does not work as expected.

  • Stopping a job that contains a pipeline with a Directory origin causes intermittent SPOOLDIR_35 errors to occur.

What's New in the August 29, 2018 Update

The update for August 29, 2018 fixes the following known issue:
  • Control Hub uses multiple versions of the jackson-databind JAR file.

What's New in the August 4, 2018 Update

The update for August 4, 2018 includes the following new features and enhancements:

Pipelines and Pipeline Fragments
  • Data preview enhancements:
    • Data preview support for pipeline fragments - You can now use data preview with pipeline fragments. When using Data Collector 3.4.0 for the authoring Data Collector, you can also use a test origin to provide data for the preview. This can be especially useful when the fragment does not contain an origin.
    • Edit data and stage properties - You can now edit preview data and stage properties, then run the preview with your changes. You can also revert data changes and refresh the preview to view additional data.
  • Select multiple stages - When you design pipelines and pipeline fragments, you can now select multiple stages in the canvas by pressing the Shift key and clicking each stage. You can then move or delete the selected stages.
  • Export enhancement - When you export a single pipeline or a single fragment, the pipeline or fragment is now saved in a zip file of the same name, as follows: <pipeline or fragment name>.zip. Exporting multiple pipelines or fragments still results in the following file name: <pipelines|fragments>.zip.
  • View where fragments are used - When you view the details of a fragment, Pipeline Designer now displays the list of pipelines that use the fragment.
Jobs
  • Runtime parameters enhancements - When you edit a job, you can now use the Get Default Parameters option to retrieve all parameters and their default values as defined in the pipeline. You can also use simple edit mode, in addition to bulk edit mode, to define parameter values.
  • Pipeline failover enhancement - When determining which available Data Collector restarts a failed pipeline, Control Hub now prioritizes Data Collectors that have not previously failed the pipeline.
Data Collectors
  • Monitor Data Collector performance - When you view registered Data Collectors version 3.4.0 from the Execute view, you can now monitor the CPU load and memory usage of each Data Collector.
Edge Data Collectors (SDC Edge)
  • Monitor SDC Edge performance - When you view registered Edge Data Collectors version 3.4.0 from the Execute view, you can now monitor the CPU load and memory usage of each SDC Edge.
Data Delivery Reports
  • Destination statistics - Data delivery reports for jobs and topologies now contain statistics for destinations.
Documentation
  • Documentation enhancement - The online help has a new look and feel. All of the previous documentation remains exactly where you expect it, but it is now easier to view and navigate on smaller devices like your tablet or mobile phone.
This update also fixes the following known issues:
  • Re-importing a deleted job does not update all relevant information.
  • When load balancing jobs to other Data Collectors, offsets are not retained.
  • A job status changes to Inactive before the status of the system pipeline becomes Inactive.
  • When you configure a pipeline in Pipeline Designer to use Write to Kafka for both the Error Records and Statistics tabs, changes you make to the Kafka settings on one tab are automatically copied to the other tab.
  • When the same job is executed by different Data Collectors, a topology can display metrics from a previous run of the job.
  • Data Collectors provisioned with a deployment might not inherit permissions assigned to the deployment.

What's New in the May 25, 2018 Update

The update for May 25, 2018 fixes the following known issues:

  • Viewing pipeline details from the Topology view causes an error to occur.
  • Time series charts for jobs cannot be viewed from the Topology view even though time series analysis is enabled.
  • When a Kubernetes pod is restarted, the Provisioning Agent fails to register the Data Collector containers with Control Hub.

What's New in the May 11, 2018 Update

The update for May 11, 2018 includes the following new features and enhancements:

Pipeline Fragments
Control Hub now includes pipeline fragments. A pipeline fragment is a stage or set of connected stages that you can reuse in Data Collector or SDC Edge pipelines. Use pipeline fragments to easily add the same processing logic to multiple pipelines and to ensure that the logic is used as designed.

Pipeline fragments can be created only in the Control Hub Pipeline Designer; they cannot be designed within the Data Collector user interface. You can use any stage available in the authoring Data Collector in a fragment.

Scheduler

Control Hub now includes a scheduler that manages long-running scheduled tasks. A scheduled task periodically triggers the execution of a job or a data delivery report at the specified frequency. For example, a scheduled task can start a job or generate a data delivery report on a weekly or monthly basis.

Before you can schedule jobs and data delivery reports, the Scheduler Operator role must be assigned to your user account.

Data Delivery Reports

Control Hub now includes data delivery reports that show how much data was processed by a job or topology over a given period of time. You can create periodic reports with the scheduler, or create an on-demand report.

Before you can manage data delivery reports, the Reporting Operator role must be assigned to your user account.

Jobs
  • Edit a pipeline version directly from a job - When viewing the details of a job or monitoring a job, you can now edit the latest version of the pipeline directly from the job. Previously, you had to locate the pipeline in the Pipeline Repository view before you could edit the pipeline.
  • Enable time series analysis - You can now enable time series analysis for a job. When enabled, you can view historical time series data when you monitor the job or a topology that includes the job.

    When time series analysis is disabled, you can still view the total record count and throughput for a job or topology, but you cannot view the data over a period of time. For example, you can’t view the record count for the last five minutes or for the last hour.

    By default, all existing jobs have time series analysis enabled. All new jobs have time series analysis disabled. You might want to enable time series analysis for new jobs for debugging purposes or to analyze dataflow performance.

  • Pipeline force stop timeout - In some situations when you stop a job, a remote pipeline instance can remain in a Stopping state for a long time. When you configure a job, you can now configure the number of milliseconds to wait before forcing remote pipeline instances to stop. The default is 2 minutes (120,000 milliseconds).
  • View logs - While monitoring an active job, the top toolbar now includes a View Logs icon that displays the logs for any remote pipeline instance run from the job.
Subscriptions
  • Email action - You can now create a subscription that listens for Control Hub events and then sends an email when those events occur. For example, you might send an email each time a job status changes.
  • Pipeline committed event - You can configure an action for a pipeline committed event. For example, you might send a message when a pipeline is committed with the name of the user who committed it.
  • Filter the events to subscribe to - You can now use the StreamSets expression language to create an expression that filters the events that you want to subscribe to. You can include subscription parameters and StreamSets string functions in the expression.
    For example, you might enter the following expression for a Job Status Change event so that the subscription is triggered only when the specified job ID encounters a status change:
    ${JOB_ID == '99efe399-7fb5-4383-9e27-e4c56b53db31:MyCompany'}

    If you do not filter the events, then the subscription is triggered each time an event occurs for all objects that you have at least read permission on.

  • Permissions - When permission enforcement is enabled for your organization, you can now share and grant permissions on subscriptions.
Provisioned Data Collectors

When you define a deployment YAML specification file for provisioned Data Collectors, you can now optionally associate a Kubernetes Horizontal Pod Autoscaler, service, or Ingress with the deployment.

Define a deployment and Horizontal Pod Autoscaler in the specification file for a deployment of one or more execution Data Collectors that must automatically scale during times of peak performance. The Kubernetes Horizontal Pod Autoscaler automatically scales the deployment based on CPU utilization.
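
For example, assuming the specification file accepts standard Kubernetes manifests, the Horizontal Pod Autoscaler portion of such a specification might look like the following sketch. The names and thresholds are placeholders, not values required by Control Hub:

    apiVersion: autoscaling/v1
    kind: HorizontalPodAutoscaler
    metadata:
      name: sdc-deployment-hpa           # placeholder name
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: sdc-deployment             # placeholder: the provisioned Data Collector deployment
      minReplicas: 1
      maxReplicas: 5
      targetCPUUtilizationPercentage: 75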

Define a deployment and service in the specification file for a deployment of a single development Data Collector that must be exposed outside the cluster using a Kubernetes service. Optionally associate an Ingress with the service to provide load balancing, SSL termination, and virtual hosting to the service in the Kubernetes cluster.

This update also fixes the following known issue:
  • Importing a pipeline with a null label causes a null pointer exception.

What's New in the March 30, 2018 Update

The update for March 30, 2018 includes the following new features and enhancements:

Pipelines
Pipelines include the following enhancements:
  • Duplicate pipelines - You can now select a pipeline in the Pipeline Repository view and then duplicate the pipeline. A duplicate is an exact copy of the original pipeline.
  • Commit message when publishing pipelines - You can now enter commit messages when you publish pipelines from Pipeline Designer. Previously, you could only enter commit messages when you published pipelines from a registered Data Collector.
Export and Import
You can now use Control Hub to export and import the following objects:
  • Jobs and topologies - You can now export and import jobs and topologies to migrate the objects from one organization to another. You can export a single job or topology or you can export a set of jobs and topologies.

    When you export and import jobs and topologies, you also export and import dependent objects. For jobs, you also export and import the pipelines included in the jobs. For topologies, you also export and import the jobs and pipelines included in the topologies.

  • Sets of pipelines - You can now select multiple pipelines in the Pipeline Repository view and export the pipelines as a set to a ZIP file. You can also now import pipelines from a ZIP file containing multiple pipeline files.
Alerts

The Notifications view has now been renamed the Alerts view.

Subscriptions

You can now create a subscription that listens for Control Hub events and then completes an action when those events occur. For example, you might create a subscription that sends a message to a Slack channel each time a job status changes.

When you create a subscription, you select the Control Hub events to subscribe to - such as a changed job status or a triggered data SLA. You then configure the action to take when the events occur - such as using a webhook to send an HTTP request to an external system.
Important: By default, an organization is not enabled to send events that trigger subscriptions. Before Control Hub can trigger subscriptions for your organization, your organization administrator must enable events for the organization.
Jobs
  • Scale out active jobs - When the Number of Instances property for a job is set to -1, Control Hub can now automatically scale out pipeline processing for the active job.

    When Number of Instances is set to any other value, you must synchronize the active job to start additional pipeline instances on newly available Data Collectors or Edge Data Collectors.

    For example, if Number of Instances is set to -1 and three Data Collectors have all of the specified labels for the job, Control Hub runs three pipeline instances, one on each Data Collector. If you register another Data Collector with the same labels as the active job, Control Hub automatically starts a fourth pipeline instance on that newly available Data Collector.

    Previously, you had to synchronize all active jobs - regardless of the Number of Instances value - to start additional pipeline instances on a newly registered Data Collector.

  • View logs for an active job - When monitoring an active job, you can now view the logs for a remote pipeline instance from the Data Collectors tab.
This update also fixes the following known issues:
  • Control Hub does not update the job status after automatically scaling out an active job.
  • The topology auto fix method throws an error when an updated pipeline version includes changes made to an error handling stage.
  • After deleting a registered Data Collector, the Data Collector heartbeats back into Control Hub, but without a Data Collector URL.
  • Users and groups are not hard deleted.

What's New in the March 6, 2018 Update

The update for March 6, 2018 fixes the following known issues:
  • The Pipeline Designer preview mode does not correctly display no output.
  • The Pipeline Designer deletes the incorrect row from a list of expressions.
  • The browser crashes when a topology contains an infinite loop.

What's New in the January 14, 2018 Update

The update for January 14, 2018 fixes the following known issues:
  • Pipeline Designer does not yet include the ability to configure rules.
  • You cannot acknowledge errors or force stop system jobs that run system pipelines.
  • Runtime parameters are not propagated to the system pipeline - causing the system pipeline to fail.

What's New in the December 15, 2017 Update

The update for December 15, 2017 includes the following new features and enhancements:

Product Rename

With this update, we have created a new product called StreamSets Control Hub™ that includes a number of new cloud-based dataflow design, deployment, and scale-up features. Because this product is now our core service for controlling dataflows, we have renamed the StreamSets cloud experience from "Dataflow Performance Manager (DPM)" to "StreamSets Control Hub".

DPM now refers to the performance management functions that reside in the cloud such as live metrics and data SLAs. Customers who have purchased the StreamSets Enterprise Edition will gain access to all Control Hub functionality and continue to have access to all DPM functionality as before.

To understand the end-to-end StreamSets Data Operations Platform and how the products fit together, visit https://streamsets.com/products/.

Pipeline Designer
You can now create and design pipelines directly in the Control Hub Pipeline Designer after you select an authoring Data Collector for Pipeline Designer to use. You select one of the following types of Data Collectors to use as the authoring Data Collector:
  • System Data Collector - Use to design pipelines only - cannot be used to preview or explicitly validate pipelines. The system Data Collector is provided with Control Hub for exploration and light development. Includes the latest version of all stage libraries available with the latest version of Data Collector.
  • Registered Data Collector using the HTTPS protocol - Use to design, preview, and explicitly validate pipelines. Includes the stage libraries and custom stage libraries installed in the registered Data Collector.

When you create pipelines in Pipeline Designer, you can create a blank pipeline or you can create a pipeline from a template. Use pipeline templates to quickly design pipelines for typical use cases.

Provisioning Data Collectors

You can now automatically provision Data Collectors on a Kubernetes container orchestration framework. Provisioning includes deploying, registering, starting, scaling, and stopping Data Collector Docker containers in the Kubernetes cluster.

Use provisioning to reduce the overhead of managing a large number of Data Collector instances. Instead, you can manage a central Kubernetes cluster used to run multiple Data Collector containers.

Integration with Data Collector Edge

Control Hub now works with Data Collector Edge (SDC Edge) to execute edge pipelines. SDC Edge is a lightweight agent without a UI that runs pipelines on edge devices with limited resources. Edge pipelines read data from the edge device or receive data from another pipeline and then act on that data to control the edge device.

You install SDC Edge on edge devices, then register each SDC Edge with Control Hub. You assign labels to each SDC Edge to determine which jobs are run on that SDC Edge.

You design edge pipelines either in the Control Hub Pipeline Designer or in a development Data Collector. After designing edge pipelines, you publish the pipelines to Control Hub and then add the pipelines to jobs that run on a registered SDC Edge.

Pipeline comparison

When you compare two pipeline versions, Control Hub now highlights the differences between the versions in the pipeline canvas. Previously, you had to visually compare the two versions to discover the differences between them.

Aggregated statistics

You can now configure a pipeline to write aggregated statistics to MapR Streams.

Balancing jobs

When a job is enabled for pipeline failover, you can now balance the job to redistribute the pipeline load across available Data Collectors that are running the fewest number of pipelines. For example, let’s say that a failed pipeline restarts on another Data Collector due to the original Data Collector shutting down. When the original Data Collector restarts, you can balance the job so that Control Hub redistributes the pipeline to the restarted Data Collector not currently running any pipelines.

Roles

You can now assign provisioning roles to user accounts, which enable users to view and work with Provisioning Agents and deployments to automatically provision Data Collectors.

You must assign the appropriate provisioning roles to users before they can access the Provisioning Agents and Deployments views in the Navigation panel.

Navigation panel

The Navigation panel now groups the Data Collectors view under an Execute menu, along with the new Edge Data Collectors, Provisioning Agents, and Deployments views.

Dashboards

The default dashboard now includes the number of users in your organization when your user account has the Organization Administrator role.

What's New in the September 22, 2017 Update

The update for September 22, 2017 fixes the following known issues:
  • When the pipeline repository contains more than 50 pipelines, creating a job from the Pipeline Repository view might fail.
  • Data Collector version 2.7.0.0 cannot report remote pipeline status to DPM.
  • If a job fails over to another Data Collector, DPM continues to store acknowledgement messages from the previous Data Collector that is no longer running a remote pipeline for the job. This can cause performance issues when you try to view a large number of jobs in DPM.

What's New in the August 9, 2017 Update

The update for August 9, 2017 includes the following new features and enhancements:

Jobs
  • Number of pipeline instances - The default value for the number of pipeline instances for a job is now 1. This runs one pipeline instance on an available Data Collector running the fewest number of pipelines.

    Previously, the default value for the number of pipeline instances was -1, which ran one pipeline instance on each available Data Collector. For example, if three Data Collectors had all of the specified labels for the job, by default DPM ran three pipeline instances, one on each Data Collector.

  • Job history - When you monitor a job, the History tab now includes the following additional information:
    • All user actions completed on the job - such as when a user starts, stops, resets the offset, or acknowledges an error for the job.
    • The progress of all Data Collectors running a remote pipeline instance for the job - such as when each Data Collector starts and stops the remote pipeline instance.
  • Inactive job status when pipelines finish - When all pipelines run from an active job reach a finished state, the job now transitions to an inactive status. Previously, the job remained in the active status.
Data Collectors
  • Data Collector versions - The Data Collectors view now displays the version of each registered Data Collector. You can filter the list of registered Data Collectors by version.
  • Registering Data Collectors from DPM - After you generate an authentication token to register a Data Collector from DPM, you can now simply click Copy Token to copy the token from the Authentication Tokens window. Previously, you had to select the entire token string, right-click, and then select Copy to copy the token.
Roles
You can now assign the Auth Token Administrator role to user accounts, which enables users to complete the following tasks:
  • Register, unregister, and deactivate Data Collectors using DPM.
  • Regenerate authentication tokens and delete unregistered authentication tokens.

Previously, only users assigned the Organization Administrator role could perform these tasks. Users assigned the Organization Administrator role can still perform these tasks.

What's New in the June 17, 2017 Update

The update for June 17, 2017 includes the following new features and enhancements:

SAML authentication
If your company uses a Security Assertion Markup Language (SAML) identity provider (IdP), you can use the IdP to authenticate DPM users.
SAML provides single sign on for web applications. SAML single sign on transfers the user’s identity from one place (the IdP) to another (the service provider). DPM acts as the SAML service provider that works with the SAML IdP that you specify.
To use SAML authentication, you must register DPM as a service provider with the IdP of your choice. Then within DPM, you enable SAML authentication for your organization. You also must create a DPM user account for each user that needs to access DPM or a registered Data Collector. When you create the user accounts, you map each DPM user account to an IdP user account.
Send pipeline statistics directly to DPM
You can now use Data Collector to configure a pipeline to write statistics directly to DPM. Write statistics directly to DPM when you run a job for the pipeline on a single Data Collector.
When you run a job on multiple Data Collectors, a remote pipeline instance runs on each of the Data Collectors. To view aggregated statistics for the job within DPM, you must configure the pipeline to write the statistics to a Kafka cluster, Amazon Kinesis Streams, or SDC RPC.
Jobs
  • Runtime parameters - You can now specify the values to use for runtime parameters when you create or edit a job that includes a pipeline with runtime parameters.

    You configure runtime parameters for a pipeline in Data Collector. Use runtime parameters to represent any stage or pipeline property with a value that must change for each pipeline run - such as batch sizes and timeouts, directories, or URIs.

    After you publish the pipeline to DPM, you can change the parameter values for each job that runs the pipeline without having to edit the pipeline.

  • Use latest pipeline version - DPM now notifies you when a job includes a pipeline that has a later version by displaying the New Pipeline Version icon next to the job. When the job is inactive, you can simply click the icon to update the job to use the latest pipeline version.
  • Filter jobs by label - You can now filter jobs by label in the Jobs view.
  • Create jobs for multiple pipelines - You can now use the Pipeline Repository view to select multiple pipelines and then create jobs for each of the pipelines.
  • Create multiple jobs for a single pipeline - In the Add Job window, you can now choose to create multiple jobs for the selected pipeline. For example, if you use runtime parameters, you can quickly create multiple jobs for the same pipeline, defining different values for the runtime parameters for each job.
  • Add to a topology during job creation - You can now add a job to an existing topology when you create the job.
  • Create a topology from the Jobs view - You can now select multiple jobs in the Jobs view and create a topology that includes those jobs.
Topologies
  • Manage jobs from a topology - You can now perform the following actions for jobs from a topology:
    • Acknowledge errors for a job.
    • Force stop a job.
    • Start and stop all jobs.
  • Auto discover connecting systems - DPM can now automatically discover connecting systems between jobs in a topology. DPM discovers possible connecting systems and then offers you suggestions of how you might want to connect the systems, which you can accept or reject.
  • Display of topology details - Topology details now display on the right side of the canvas instead of on the bottom. Double-click the canvas or click the Open Detail Pane arrow to display the topology detail pane. You can close the detail pane to view the canvas only, or you can resize the detail pane.

Notifications
When you click the Notifications icon in the top toolbar, you can now view the following notifications:
  • Triggered alerts - Displays all triggered alerts that have not been acknowledged.
  • History of error messages - Displays recent error messages that were briefly displayed in the UI.

What's New in the April 15, 2017 Update

The update for April 15, 2017 includes the following new feature:
Pipeline Failover
DPM now supports pipeline failover for jobs. Enable pipeline failover for jobs to minimize downtime due to unexpected pipeline failures and to help you achieve high availability. By default, pipeline failover is disabled for all jobs.
DPM can restart a failed pipeline on another available Data Collector in the following situations:
  • The Data Collector running the pipeline shuts down.
  • The pipeline encounters an error, such as inadequate resources on the Data Collector machine.
An available Data Collector includes any Data Collector in the group of Data Collectors for the job. When multiple Data Collectors are available, DPM restarts the pipeline on the Data Collector that is running the fewest number of pipelines.
To enable pipeline failover for a job, complete the following tasks when you create or edit the job:
  1. Select the Enable Failover property.
  2. Set the Number of Instances property to a value less than the number of available Data Collectors. This reserves available Data Collectors for pipeline failover. The number of instances determines the number of pipeline instances that DPM runs from the job.

    For example, you want to run a job on the group of four Data Collectors assigned the WesternRegion label, and want to reserve two of the Data Collectors for pipeline failover. You assign the WesternRegion label to the job and set the Number of Instances property to two.

    When you start the job, DPM identifies two available Data Collectors and starts pipeline instances on both. The third and fourth Data Collectors serve as backups and are available to continue processing pipelines if another Data Collector shuts down or a pipeline encounters an error.

What's New in the March 4, 2017 Update

The update for March 4, 2017 includes the following new features and enhancements:

Groups
You can now create groups of users to more efficiently manage user accounts. You can assign roles and permissions to individual user accounts or to groups.
DPM provides a default all@<organization ID> group that includes every user in the organization.
Permissions
You can now share and grant permissions on Data Collectors, pipelines, jobs, topologies, and data SLAs. Permissions determine the access level that users and groups have on objects belonging to the organization.
To create a multitenant environment within your organization, create groups of users and then share objects with the groups to grant different levels of access.
When you create an object within DPM, you become the owner of that object and have full access to the object. You can share the object with other groups or user accounts within your organization. When you share the object, you grant others permission to the object - granting read, write, or execute access to the object. Any user with the Organization Administrator role has full access to all objects in the organization, and can grant other users and groups permission to access each object.
To perform DPM tasks, you must have the appropriate object permissions as well as the role associated with the task. For example, if you have the Pipeline Editor role, you can delete pipeline versions from the repository only when granted write permission on the pipeline.
By default, permission enforcement is not enabled for existing organizations. You can still assign permissions. However, DPM does not enforce the permissions until you enable enforcement. To enable permission enforcement, click Administration > Organizations, and then click the Organization Configurations icon. Select the Enforce permissions during object access property.
Data SLAs for Topologies
You can now configure data SLAs (service level agreements) for topologies. Data SLAs trigger an alert when a specified threshold has been reached. You configure data SLAs on the jobs included in the topology. Data SLAs enable you to monitor incoming data to ensure that it meets business requirements for availability and accuracy.
For example, you can configure a data SLA for a topology to trigger an alert when the throughput rate on a job reaches a minimum value. When the alert triggers, DPM notifies you in the top toolbar and in the new Notifications view.
The tasks you can perform for data SLAs and notifications are determined by the following new roles:
  • Data SLA Editor and Data SLA User
  • Notification User
By default, these new roles are not assigned to existing users. A user with the Organization Administrator role must assign these roles to other users and groups.
Job Offsets
The job History view now displays the last-saved job offset sent by each Data Collector running a remote pipeline instance for the job.
Aggregated Statistics
You can now configure a pipeline to write aggregated statistics to SDC RPC. Write statistics to SDC RPC for development purposes only. For a production environment, use a Kafka cluster or Amazon Kinesis Streams to aggregate statistics.
Register Data Collectors with DPM
If Data Collector uses file-based authentication and if you register the Data Collector from the Data Collector UI, you can now create DPM user accounts and groups during the registration process.
Organization Configuration
You can now configure the following information for your organization:
  • Maximum number of minutes that a user session can remain inactive before timing out.
  • Maximum number of days that a user password is valid.

Known Issues

Please note the following known issues:
  • Pipeline Designer displays credential values in stage properties when the pipeline is viewed in read only mode.
  • The number of pipelines displayed on the Dashboards view does not match the number of pipelines in the pipeline repository.
  • Cluster pipelines that include an HDFS destination or that run on Spark 2.3 fail to start when the pipelines are run from Control Hub jobs.
  • Pipeline Designer does not detect changes to the pipeline when you change a metric, data, or data drift rule.
  • A job owner can stop the job when it is running on a Data Collector that he doesn't have access to.
  • The Scheduler view can display only 50 scheduled jobs or reports.
  • The Control Hub scheduler cannot schedule job templates to create and start job instances on a regular basis.
  • A topology displays metrics from a single pipeline instance instead of aggregated metrics from all pipeline instances after a job has been updated to write aggregated statistics to another system.

    Workaround: Delete the updated job from the topology and then add the job again.

  • When a user schedules a job that he does not have execute permission on, the scheduled task is successfully created. However, the scheduled task fails to trigger and does not display an error message indicating that the user must have execute permission on the job.

    Workaround: Grant the user execute permission on the job being scheduled.

  • When permission enforcement is enabled, pagination logic on the Jobs view might prevent you from viewing and creating jobs.
  • When Control Hub uses SAML or LDAP authentication, you cannot log into registered Data Collectors using the disconnected mode.
  • Updating a job to use the latest version of a pipeline removes any existing data SLAs configured for the topology that includes the job.
  • Topology metrics and aggregated metrics for a job which runs on multiple Data Collectors might not accurately reflect the metrics for individual jobs or pipelines.

    Workaround: For topology metrics, select a job in the topology canvas to monitor the statistics for the job.

  • If you scale up a deployment and then delete unregistered authentication tokens for Data Collectors while waiting for the deployment to scale, the newly provisioned Data Collectors might start but not successfully register with Control Hub.