Working with Upgraded External Systems

When an external system is upgraded to a new version, you can continue to use existing Data Collector pipelines that connected to the previous version of the external system. You simply configure the stages in those pipelines to use the stage library for the upgraded version.

For example, let's say that you have pipelines that read from Apache Kafka version 0.9. You upgrade Apache Kafka to version 0.10. You can continue to use the existing pipelines after you configure the Kafka stages to use the Kafka version 0.10 stage library.

Or, let's say that you develop a pipeline that writes to the Cloudera CDH version 5.8 distribution of Hadoop. Then you export the pipeline and import it into a Data Collector that has the Cloudera CDH version 5.9 stage library installed. You can continue to use the imported pipeline after you configure the appropriate stages to use the Cloudera CDH version 5.9 stage library.

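To continue using existing pipelines with an upgraded external system, complete the following steps:

  1. Verify that the new stage library version is installed in Data Collector.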
    For a tarball installation, you can use the Package Manager or the command line to view or install stage libraries. See Installing for Tarball Using the Package Manager or Installing for Tarball Using the Command Line.

    For an RPM installation, you must use the command line to view or install stage libraries, as described in Installing for RPM.

  2. Open each pipeline that connects to the upgraded external system.
  3. On the General tab for each stage that connects to the external system, select the new stage library version.
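If you have many pipelines, it can help to find the affected stages before you open each one. The following Python sketch scans the text of an exported pipeline JSON file and lists the stages that still reference the old stage library. It assumes that the export holds the pipeline under a pipelineConfig key (or is the pipeline configuration itself) and that each stage has instanceName and library fields; verify these field names against your own exported files. The function stages_using_library and the library names in the example are hypothetical.

```python
import json


def stages_using_library(pipeline_json: str, old_library: str) -> list[str]:
    """Return the instance names of stages whose 'library' equals old_library.

    Assumption: the export is either {"pipelineConfig": {...}} or the pipeline
    configuration itself, and each entry in "stages" has "instanceName" and
    "library" fields.
    """
    data = json.loads(pipeline_json)
    config = data.get("pipelineConfig", data)
    return [
        stage.get("instanceName", "<unnamed>")
        for stage in config.get("stages", [])
        if stage.get("library") == old_library
    ]


# Hypothetical export with one Kafka stage and one unrelated stage.
example = json.dumps({
    "pipelineConfig": {
        "stages": [
            {"instanceName": "KafkaConsumer_01",
             "library": "streamsets-datacollector-apache-kafka_0_9-lib"},
            {"instanceName": "Trash_01",
             "library": "streamsets-datacollector-basic-lib"},
        ]
    }
})

print(stages_using_library(example, "streamsets-datacollector-apache-kafka_0_9-lib"))
```

The sketch only reports stages; you still select the new stage library on the General tab of each reported stage.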

Working with a Cloudera CDH 5.11 System

When you upgrade to Cloudera CDH version 5.11 from a previous version, you must update pipelines that set permissions on HDFS or Hive by modifying file mode bits with the minus or equals operators.

Pipelines can modify file mode bits on HDFS or Hive with the following stage properties:
  • The HDFS File Metadata executor Set Permissions property
  • The Hadoop FS destination Permissions Expression whole file property
CDH 5.11 changes how the minus and equals operators are evaluated as follows:
  • In previous CDH releases, the minus operator (-) grants the specified permissions. In CDH 5.11, it removes the specified permissions.

    For example, in previous releases, a-rw grants read and write permissions to all users. With CDH 5.11, it removes read and write permissions from all users.

  • In previous CDH releases, the equals operator (=) removes the specified permissions. In CDH 5.11, it grants the specified permissions.

    For example, in previous releases, a=wx removes write and execute permissions from all users. With CDH 5.11, it grants write and execute permissions to all users.
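To check what a symbolic expression does under the new evaluation, you can compute the resulting mode bits before you edit a property. The following Python sketch assumes that CDH 5.11 follows standard POSIX chmod semantics for symbolic modes. That is an assumption, so confirm it against the Cloudera notes for HADOOP-13508. Under those semantics, = also clears any bits in the class that are not listed, so a=wx leaves no read permission. The apply_symbolic function is hypothetical and handles only the r, w, and x bits.

```python
import re

# Bit offsets for the user, group, and other permission classes.
_SHIFT = {"u": 6, "g": 3, "o": 0}
# Values of the read, write, and execute bits within one class.
_BITS = {"r": 4, "w": 2, "x": 1}


def apply_symbolic(mode: int, expr: str) -> int:
    """Apply a symbolic expression such as 'a-rw' or 'u=rx,g+w' to mode.

    Assumes POSIX chmod semantics: '+' adds the listed bits, '-' removes
    them, and '=' sets the class to exactly the listed bits. Only r, w,
    and x are handled; an empty class list is treated as 'a'.
    """
    for clause in expr.split(","):
        match = re.fullmatch(r"([ugoa]*)([-+=])([rwx]*)", clause.strip())
        if match is None:
            raise ValueError(f"unsupported clause: {clause!r}")
        who, op, perms = match.groups()
        classes = "ugo" if (not who or "a" in who) else who
        bits = sum(_BITS[p] for p in set(perms))
        for cls in set(classes):
            shift = _SHIFT[cls]
            if op == "+":
                mode |= bits << shift
            elif op == "-":
                mode &= ~(bits << shift)
            else:  # '=' replaces all bits of this class
                mode = (mode & ~(0o7 << shift)) | (bits << shift)
    return mode & 0o777


print(oct(apply_symbolic(0o777, "a-rw")))  # read and write removed from all classes
print(oct(apply_symbolic(0o644, "a=wx")))  # every class set to exactly write and execute
```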

To ensure that file permissions are set as expected, update all properties in upgraded pipelines that modify file mode bits with the minus or equals operators.
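To find the properties to update, you can check each Set Permissions or Permissions Expression value for clauses that use the minus or equals operator. The following Python sketch is a hypothetical helper that flags such values. It treats the value as literal text, so it does not evaluate expression language functions, and it does not flag octal modes such as 0755 or clauses that only use the plus operator.

```python
import re

# A symbolic clause that uses the minus or equals operator, such as a-rw or u=rx.
_MINUS_OR_EQUALS = re.compile(r"[ugoa]*[-=][rwxXst]*")


def needs_review(permissions: str) -> bool:
    """Return True if any comma-separated clause uses '-' or '='."""
    return any(
        _MINUS_OR_EQUALS.fullmatch(clause.strip())
        for clause in permissions.split(",")
    )


for value in ["a-rw", "u+x,g=r", "0755", "u+x"]:
    print(value, needs_review(value))
```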

This behavior change is noted in the Cloudera documentation regarding the fix for HADOOP-13508.

Working with an Upgraded MapR System

If you upgrade MapR, you must complete additional steps to continue using existing pipelines that connected to the previous MapR version.

  1. Stop Data Collector.
  2. In the Data Collector configuration file, $SDC_CONF/sdc.properties, add the previous MapR version stage library to the system.stagelibs.blacklist property.
    For example, if you upgraded MapR version 5.1 to 5.2, add the MapR version 5.1 stage library to the blacklist property so that the property lists all supported MapR versions, as follows:
    system.stagelibs.blacklist=\
      streamsets-datacollector-mapr_5_0-lib,\
      streamsets-datacollector-mapr_5_1-lib,\
      streamsets-datacollector-mapr_5_2-lib
  3. If the MapR cluster uses username/password login authentication, uncomment the following line in the Data Collector environment configuration file:
    #export SDC_JAVA_OPTS="-Dmaprlogin.password.enabled=true"

    If you start Data Collector as a service, modify the $SDC_DIST/libexec/sdcd-env.sh file. If you start Data Collector manually, modify the $SDC_DIST/libexec/sdc-env.sh file.

  4. Run the setup-mapr command, as described in "Step 2. Run the Command to Set Up MapR."

    The command modifies configuration files and creates the required symbolic links. You can run the command in interactive or non-interactive mode.
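If you manage several Data Collector installations, you might script step 2. The following Python sketch is a hypothetical helper, not part of Data Collector, that adds a stage library to the system.stagelibs.blacklist property in the text of sdc.properties. It assumes that the property value is a comma-separated list that may be continued across lines with trailing backslashes, and it rewrites the property one library per line in sorted order. Run it only while Data Collector is stopped, and keep a backup of the original file.

```python
KEY = "system.stagelibs.blacklist"


def _format_property(libs: list[str]) -> list[str]:
    """Render the property one library per line, joined by backslash continuations."""
    body = [f"  {lib},\\" for lib in libs[:-1]] + [f"  {libs[-1]}"]
    return [f"{KEY}=\\"] + body


def add_to_blacklist(properties_text: str, stage_lib: str) -> str:
    """Return properties_text with stage_lib added to the blacklist property.

    Other lines are kept unchanged. If the property is missing, it is appended.
    """
    lines = properties_text.splitlines()
    out: list[str] = []
    found = False
    i = 0
    while i < len(lines):
        stripped = lines[i].strip()
        key, sep, value = stripped.partition("=")
        if not found and sep and key.strip() == KEY:
            found = True
            # Join continuation lines that end with a backslash.
            while value.endswith("\\") and i + 1 < len(lines):
                i += 1
                value = value[:-1] + lines[i].strip()
            value = value.rstrip("\\")
            libs = {lib.strip() for lib in value.split(",") if lib.strip()}
            out.extend(_format_property(sorted(libs | {stage_lib})))
        else:
            out.append(lines[i])
        i += 1
    if not found:
        out.extend(_format_property([stage_lib]))
    return "\n".join(out) + "\n"
```

For example, you might read $SDC_CONF/sdc.properties, pass its contents and streamsets-datacollector-mapr_5_1-lib to add_to_blacklist, and write the result back to the same file before you run the setup-mapr command.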