Post Upgrade Tasks

In some situations, you must complete tasks within Data Collector after you upgrade.

Configure JDBC Producer Schema Names

With Data Collector version 2.5.0.0, you can use a Schema Name property to specify the database or schema name. In previous releases, you specified the database or schema name in the Table Name property.

Upgrading from a previous release does not currently require changes to existing configurations. However, we recommend using the new Schema Name property, because the ability to specify a database or schema name in the Table Name property might be deprecated in a future release.
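
When you review existing JDBC Producer stages, moving to the new property amounts to splitting a qualified table name into its two parts. The following is a minimal Python sketch of that split. It assumes the Table Name value is unquoted and uses a single dot as the separator. The function name split_table_name is hypothetical and is not part of Data Collector.

```python
def split_table_name(table_name):
    """Split 'schema.table' into (schema, table).

    Assumption: the name is unquoted and contains at most one dot.
    An unqualified name yields an empty schema.
    """
    schema, _, table = table_name.rpartition(".")
    return schema, table


# Hypothetical values, for illustration only.
schema_name, table = split_table_name("sales.orders")
```

The first element would go in the Schema Name property and the second in the Table Name property.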

Evaluate Precondition Error Handling

With Data Collector version 2.5.0.0, precondition error handling has changed.

The Precondition stage property allows you to define conditions that must be met for a record to enter the stage. Previously, records that did not meet all specified preconditions were passed to the pipeline for error handling. That is, the records were processed based on the Error Records pipeline property.

With version 2.5.0.0, records that do not meet the specified preconditions are handled by the error handling configured for the stage. Stage error handling occurs based on the On Record Error property on the General tab of the stage.

Review pipelines that use preconditions to verify that this change does not adversely affect the behavior of the pipelines.
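
One way to approach this review is to scan exported pipeline files for stages that define preconditions and list their On Record Error setting. The Python sketch below does that under several assumptions: the export is JSON, stages sit under a "stages" list (optionally wrapped in "pipelineConfig"), and the configuration names stageRecordPreconditions and stageOnRecordError match your exports. Verify these names against one of your own files. The sample pipeline and its values are made up.

```python
# Assumed configuration names in an exported pipeline JSON file.
PRECONDITIONS_KEY = "stageRecordPreconditions"
ON_ERROR_KEY = "stageOnRecordError"


def stages_with_preconditions(pipeline):
    """Return (stage name, On Record Error value) for stages that define preconditions."""
    # Assumption: an export may wrap the pipeline in a "pipelineConfig" object.
    config = pipeline.get("pipelineConfig", pipeline)
    found = []
    for stage in config.get("stages", []):
        settings = {c["name"]: c.get("value") for c in stage.get("configuration", [])}
        if settings.get(PRECONDITIONS_KEY):  # skips missing or empty lists
            found.append((stage.get("instanceName"), settings.get(ON_ERROR_KEY)))
    return found


# Made-up sample with one stage that has a precondition and one that does not.
example = {
    "pipelineConfig": {
        "stages": [
            {"instanceName": "FieldRemover_01", "configuration": [
                {"name": PRECONDITIONS_KEY, "value": ["${record:value('/id') > 0}"]},
                {"name": ON_ERROR_KEY, "value": "TO_ERROR"},
            ]},
            {"instanceName": "Trash_01", "configuration": [
                {"name": PRECONDITIONS_KEY, "value": []},
                {"name": ON_ERROR_KEY, "value": "TO_ERROR"},
            ]},
        ]
    }
}
report = stages_with_preconditions(example)
```

For real files, load each export with json.load and pass the result to stages_with_preconditions. Each reported stage now handles failed preconditions according to the listed On Record Error value rather than the pipeline-level Error Records property.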

Authentication for Docker Image

With Data Collector version 2.4.1.0, the Docker image uses the form type of file-based authentication by default. As a result, you must use a Data Collector user account to log in to Data Collector. If you haven't set up custom user accounts, you can use the admin account shipped with Data Collector. The default login is user name admin with password admin.

Earlier versions of the Docker image used no authentication.

Configure Pipeline Permissions

Data Collector version 2.4.0.0 is designed for multitenancy and enables you to share and grant permissions on pipelines. Permissions determine the access level that users and groups have on pipelines.

In earlier versions of Data Collector, which had no pipeline permissions, pipeline access was determined by roles alone. For example, any user with the Creator role could edit any pipeline.

In version 2.4.0.0, roles are augmented with pipeline permissions. In addition to having the necessary role, users must also have the appropriate permissions to perform pipeline tasks.

For example, to edit a pipeline in 2.4.0.0, a user with the Creator role must also have read and write permission on the pipeline. Without write permission, the user cannot edit the pipeline. Without read permission, the user cannot see the pipeline at all: it does not appear in the list of available pipelines.
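
The combined role and permission check can be pictured as a pair of small predicates. This Python sketch is only an illustrative model, not how Data Collector implements access control. The function names and the lowercase role and permission strings are hypothetical, and special handling for the Admin role and the pipeline owner is omitted.

```python
def can_see_pipeline(permissions):
    """A pipeline is listed for a user only if the user has read permission on it."""
    return "read" in permissions


def can_edit_pipeline(roles, permissions):
    """Editing needs the Creator role plus both read and write permission."""
    return "creator" in roles and {"read", "write"} <= set(permissions)
```

For instance, can_edit_pipeline({"creator"}, {"read"}) is false because write permission is missing, even though the role alone would have been enough in earlier versions.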

Note: With pipeline permissions enabled, all upgraded pipelines are initially visible only to users with the Admin role and to the pipeline owner (the user who created the pipeline). To enable other users to work with pipelines, have an Admin user configure the appropriate permissions for each pipeline.

In Data Collector version 2.5.0.0, pipeline permissions are disabled by default. To enable pipeline permissions, set the pipeline.access.control.enabled property to true in the Data Collector configuration file.
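
If you manage several installations, the property change can be scripted. The Python sketch below sets a key in properties-style text, replacing an existing uncommented entry or appending a new one. It assumes simple key=value lines with no line continuations. The helper name set_property is hypothetical. Back up the configuration file before rewriting it.

```python
def set_property(text, key, value):
    """Return properties text with key set to value.

    Replaces the first uncommented 'key=...' line, or appends one if none exists.
    Assumption: entries are plain 'key=value' lines without continuations.
    """
    lines = text.splitlines()
    for i, line in enumerate(lines):
        stripped = line.strip()
        if stripped.startswith("#") or "=" not in stripped:
            continue  # skip comments and lines that are not key=value entries
        if stripped.split("=", 1)[0].strip() == key:
            lines[i] = f"{key}={value}"
            break
    else:
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"


# Illustrative input; a real sdc.properties file contains many more entries.
updated = set_property(
    "pipeline.access.control.enabled=false\n",
    "pipeline.access.control.enabled",
    "true",
)
```

Apply the function to the contents of the Data Collector configuration file and write the result back. Restarting Data Collector afterward is likely required for the change to take effect.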

Tip: You can configure pipeline permissions while permissions are disabled, and then enable the pipeline.access.control.enabled property after the permissions are properly configured.

For more information about roles and permissions, see Roles and Permissions. For details about configuring pipeline permissions, see Sharing Pipelines.

Update Elasticsearch Pipelines

Data Collector version 2.3.0.0 includes an enhanced Elasticsearch destination that uses the Elasticsearch HTTP API. To upgrade pipelines that use the Elasticsearch destination from Data Collector versions earlier than 2.3.0.0, you must review the value of the Default Operation property.

Review all upgraded Elasticsearch destinations to ensure that the Default Operation property is set to the correct operation. Upgraded Elasticsearch destinations have the Default Operation property set based on the configuration for the Enable Upsert property:

  • With upsert enabled, the default operation is set to INDEX.
  • With upsert not enabled, the default operation is set to CREATE, which requires a DocumentId.

Note: The Elasticsearch version 5 stage library is compatible with all versions of Elasticsearch. Earlier stage library versions have been removed.
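
The upgrade rule above can be written as a small check that flags destinations likely to need attention. This Python sketch is an illustration only. The function names are hypothetical, and the inputs are values you would read from each destination by hand or from an export.

```python
def upgraded_default_operation(upsert_enabled):
    """Default Operation that an upgraded destination receives, per the rule above."""
    return "INDEX" if upsert_enabled else "CREATE"


def needs_attention(upsert_enabled, document_id):
    """True when the upgraded operation is CREATE but no document ID is configured."""
    return upgraded_default_operation(upsert_enabled) == "CREATE" and not document_id
```

A destination that had upsert disabled and no document ID expression is the case to look at first, since CREATE requires a DocumentId.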

Update Kudu Pipelines

Data Collector version 2.2.0.0 provides support for Apache Kudu version 1.0.x and no longer supports earlier Kudu versions. To upgrade pipelines that contain a Kudu destination from Data Collector versions earlier than 2.2.0.0, upgrade your Kudu cluster and then add a stage alias for the earlier Kudu version to the Data Collector configuration file, $SDC_CONF/sdc.properties.

The configuration file includes stage aliases to enable backward compatibility for pipelines created with earlier versions of Data Collector.

To update Kudu pipelines:

  1. Upgrade your Kudu cluster to version 1.0.x.

    For instructions, see the Apache Kudu documentation.

  2. Open the $SDC_CONF/sdc.properties file and locate the following comment:
    # Stage aliases for mapping to keep backward compatibility on pipelines when stages move libraries
  3. Below the comment, add a stage alias for the earlier Kudu version as follows:
    stage.alias.streamsets-datacollector-apache-kudu-<version>-lib, com_streamsets_pipeline_stage_destination_kudu_KuduDTarget = streamsets-datacollector-apache-kudu_1_0-lib, com_streamsets_pipeline_stage_destination_kudu_KuduDTarget
    Where <version> is the earlier Kudu version: 0_7, 0_8, or 0_9. For example, if you previously used Kudu version 0.9, add the following stage alias:
    stage.alias.streamsets-datacollector-apache-kudu-0_9-lib, com_streamsets_pipeline_stage_destination_kudu_KuduDTarget = streamsets-datacollector-apache-kudu_1_0-lib, com_streamsets_pipeline_stage_destination_kudu_KuduDTarget
  4. Restart Data Collector to enable the changes.
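
To avoid typing errors in the long alias line from step 3, you can generate it. The Python sketch below builds the line from the template shown above for one of the three earlier Kudu versions. The function name kudu_stage_alias is hypothetical. The library and stage names are copied from the steps above.

```python
KUDU_TARGET = "com_streamsets_pipeline_stage_destination_kudu_KuduDTarget"
EARLIER_VERSIONS = ("0_7", "0_8", "0_9")


def kudu_stage_alias(version):
    """Build the stage alias line for an earlier Kudu version such as '0.9' or '0_9'."""
    normalized = version.replace(".", "_")
    if normalized not in EARLIER_VERSIONS:
        raise ValueError(f"unsupported earlier Kudu version: {version}")
    return (
        f"stage.alias.streamsets-datacollector-apache-kudu-{normalized}-lib, {KUDU_TARGET} = "
        f"streamsets-datacollector-apache-kudu_1_0-lib, {KUDU_TARGET}"
    )


alias_line = kudu_stage_alias("0.9")
```

Paste the resulting line below the stage alias comment in $SDC_CONF/sdc.properties, and then restart Data Collector.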

Update Vault Pipelines

A known issue in Data Collector version 1.5.0.0 allowed Vault functions to retrieve Vault secrets from within any pipeline or stage property. If you are upgrading from version 1.5.0.0, update Vault pipelines as needed.

To protect the security of sensitive information, calling Vault is now restricted to the following properties:

  • Usernames, passwords, and similar properties such as AWS Access Key ID and Secret Access Key.
  • HTTP headers and bodies when using HTTPS.

After upgrading from version 1.5.0.0, update any pipeline that uses Vault functions in other properties. Remove Vault functions from unsupported properties. Otherwise, the pipeline fails validation when you validate or start it.
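
To find where Vault functions are used before you upgrade, you can search exported pipeline files. The Python sketch below walks any parsed JSON structure and reports every string that appears to contain a Vault call, along with its location. It assumes that Vault functions appear in expressions as vault:<name>( and makes no attempt to decide whether a property is on the supported list, so compare each hit against the list above yourself. The sample data is made up.

```python
import re

# Assumption: Vault functions appear in expressions as "vault:<name>(".
VAULT_CALL = re.compile(r"\bvault:\w+\(")


def find_vault_calls(value, path=""):
    """Recursively collect (path, string) pairs for strings containing a Vault call."""
    hits = []
    if isinstance(value, str):
        if VAULT_CALL.search(value):
            hits.append((path, value))
    elif isinstance(value, dict):
        for key, item in value.items():
            hits.extend(find_vault_calls(item, f"{path}/{key}"))
    elif isinstance(value, list):
        for index, item in enumerate(value):
            hits.extend(find_vault_calls(item, f"{path}/{index}"))
    return hits


# Made-up fragment of a parsed pipeline export.
sample = {
    "stages": [
        {"configuration": [
            {"name": "password", "value": "${vault:read('secret/db', 'password')}"},
            {"name": "tableName", "value": "orders"},
        ]}
    ]
}
vault_hits = find_vault_calls(sample)
```

Each hit gives the location of the value inside the file, which you can trace back to the stage and property in the pipeline.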