Upgrade an Installation from the Tarball

When you upgrade a full or core installation from the tarball, you configure the new version to use a new configuration directory outside of the base Data Collector runtime directory. You then configure the new version to use the same data, log, and resource directories as the previous version. As a result, the new version has access to the files created in the previous version.
Note: If you installed additional drivers or developed custom stages, verify that those libraries are stored in a local directory external to the Data Collector runtime directory before you upgrade. That way, Data Collector can still use the libraries after the upgrade.

To upgrade a full or core installation from the tarball, perform the following steps:

Step 1. Shut Down the Previous Version

Step 2. Back Up the Previous Version

Step 3. Install the New Version

Step 4. Update the Environment Configuration File

Step 5. Update the Configuration Files

Step 6. Install Additional Libraries for the Core Installation

Step 7. Start the New Version of Data Collector

Step 1. Shut Down the Previous Version

Stop all pipelines and then shut down the previous version of Data Collector.

  1. From the Home page, select all running pipelines in the list and then click the Stop icon.
    When the confirmation dialog appears, click Yes.
  2. Use one of the following methods to shut down Data Collector:
    • To use the command line for shutdown when Data Collector is started as a service, use the following command:
      service sdc stop
    • To use the Data Collector console for shutdown, click Administration > Shut Down. When the confirmation dialog box appears, click Yes.

Step 2. Back Up the Previous Version

Before you install the new version, create a backup of the files in the previous version by copying and renaming the configuration, data, log, and resource directories. That way, you can continue to run the previous version if needed.

Copy and rename the following directories:
  • SDC_CONF - The Data Collector configuration directory.
  • SDC_DATA - The Data Collector directory for pipeline state and configuration information.
  • SDC_LOG - The Data Collector directory for logs.
  • SDC_RESOURCES - The Data Collector directory for runtime resource files.

For example, if you are upgrading from version 2.6.0.0, copy the Data Collector configuration directory and rename the copy as follows: /etc/sdc2600.
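
The backup can be scripted. The following is a minimal sketch, not part of the product: backup_sdc_dirs is a hypothetical helper, and it assumes that the four environment variables hold the paths used by the previous version.

```shell
# backup_sdc_dirs SUFFIX DIR...
# Copies each DIR to DIR+SUFFIX (for example /etc/sdc -> /etc/sdc2600),
# preserving permissions and timestamps where the platform allows it.
# Hypothetical helper for illustration only.
backup_sdc_dirs() {
  suffix=$1
  shift
  for dir in "$@"; do
    dir=${dir%/}                      # drop a trailing slash, if any
    if [ -e "$dir$suffix" ]; then
      echo "refusing to overwrite $dir$suffix" >&2
      return 1
    fi
    cp -Rp "$dir" "$dir$suffix" || return 1
  done
}

# Example call, assuming the variables point at the previous version's directories:
# backup_sdc_dirs 2600 "$SDC_CONF" "$SDC_DATA" "$SDC_LOG" "$SDC_RESOURCES"
```

The helper stops rather than overwriting an existing backup, so running it twice does not replace the first copy.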

Step 3. Install the New Version

The instructions that you use to install the new version depend on whether you start Data Collector manually or as a service.

Installing from the Tarball (Manual Start)

Install the new version of the tarball.

  1. Use the following URL to download the full or core Data Collector tarball from the StreamSets website: https://streamsets.com/opensource.
  2. Extract the tarball to a directory other than the one that contains the previous version.
  3. Use the following command to set the $SDC_DIST environment variable to the location where you extracted the tarball:
    export SDC_DIST=<extraction directory>
    For example:
    export SDC_DIST=/sdc/streamsets-datacollector-2.6.0.0
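
As a rough sketch, the extraction step might look like the following. The extract_sdc helper, the tarball file name, and the /opt/sdc-new target directory are all hypothetical. The -z and -C options are supported by both GNU tar and BSD tar.

```shell
# extract_sdc TARBALL TARGET_DIR
# Creates TARGET_DIR if needed and extracts the gzip-compressed TARBALL into it.
# Hypothetical helper for illustration only.
extract_sdc() {
  tarball=$1
  target=$2
  mkdir -p "$target" || return 1
  tar -xzf "$tarball" -C "$target"
}

# Example with placeholder names; substitute your own file and directories:
# extract_sdc <downloaded tarball> /opt/sdc-new
# export SDC_DIST=/opt/sdc-new/<extracted directory>
```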

Installing from the Tarball (Service Start)

Install the new version of the tarball. Installing the full Data Collector as a service requires sudo privileges on the root directory.

  1. Use the following URL to download the full or core Data Collector tarball from the StreamSets website: https://streamsets.com/opensource.
  2. Extract the tarball to a directory other than the one that contains the previous version.
  3. Create a backup of the /etc/init.d/sdc file that was used in the previous version.
  4. Use the following commands from the directory where you extracted the tarball to copy the initd/_sdcinitd_prototype file to the /etc/init.d directory and then change ownership of the file to sdc:
    cp initd/_sdcinitd_prototype /etc/init.d/sdc
    chown sdc:sdc /etc/init.d/sdc
  5. Edit the /etc/init.d/sdc file and set the $SDC_DIST and $SDC_HOME environment variables to the location where you extracted the tarball.
  6. Use the following command to make the sdc file executable:
    chmod 755 /etc/init.d/sdc
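
After editing the service script, you can confirm that both variables point to the new location. The following is a minimal sketch under one assumption: the script assigns each variable on its own line in the form VAR=value, optionally preceded by export and optionally double-quoted. The check_init_vars helper is hypothetical.

```shell
# check_init_vars FILE DIR
# Succeeds only if FILE assigns DIR to both SDC_DIST and SDC_HOME.
# DIR is used inside a regular expression, so a "." in the path matches
# any character; treat this as a quick sanity check, not a strict parser.
check_init_vars() {
  file=$1
  dir=$2
  for var in SDC_DIST SDC_HOME; do
    if ! grep -Eq "^[[:space:]]*(export[[:space:]]+)?$var=\"?$dir\"?[[:space:]]*$" "$file"; then
      echo "$var is not set to $dir in $file" >&2
      return 1
    fi
  done
}

# Example with a placeholder extraction directory:
# check_init_vars /etc/init.d/sdc <extraction directory>
```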

Step 4. Update the Environment Configuration File

Update the environment configuration file so that the new version of Data Collector uses a new configuration directory but the same working data, log, and resource directories as the previous version.

For example, let's assume that the previous version of Data Collector used the directory /var/lib/sdc to store the data files for pipeline configuration and run details. When you upgrade, you configure the new version of Data Collector to use the same working directory /var/lib/sdc for the data files. As a result, the new version has access to the pipelines created in the previous version.

You also must update the environment configuration file with any other customized property values that you defined in the previous version.

  1. In the new version of Data Collector, open the appropriate environment configuration file for your installation:
    • $SDC_DIST/libexec/sdc-env.sh - Used when you start Data Collector manually from the command line.
    • $SDC_DIST/libexec/sdcd-env.sh - Used when you start Data Collector as a service.
  2. Update the directory environment variables to use the following values:
    • SDC_CONF - New location outside of the base Data Collector runtime directory and unique from the previous renamed directory. For example, if you renamed the previous configuration directory to /etc/sdc2600, use the value /etc/sdc.
    • SDC_DATA - Same directory that the previous version used.
    • SDC_LOG - Same directory that the previous version used.
    • SDC_RESOURCES - Same directory that the previous version used.
  3. If you installed external libraries or developed custom stages, add the STREAMSETS_LIBRARIES_EXTRA_DIR or USER_LIBRARIES_DIR environment variable to the environment configuration file, and set it to the same directory used in the previous version.
  4. Manually update the environment configuration file with any other customized property values that you defined in the previous version.
  5. Use the following command to create the Data Collector configuration directory at /etc/sdc:
    mkdir /etc/sdc
  6. Use the following command from the directory where you extracted the tarball to copy all files from etc into the Data Collector configuration directory that you just created:
    cp -R etc/. /etc/sdc
  7. To run Data Collector as a service, change the owner of the /etc/sdc directory and all files in the directory to the system user and group that starts Data Collector.
    By default, Data Collector uses a system user and group named sdc.
  8. Use the following command to set owner-only permissions on the form-realm.properties file in the /etc/sdc directory:
    chmod go-rwx /etc/sdc/form-realm.properties
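
Taken together, the settings and directory preparation in this step might look like the following sketch. Only /etc/sdc and /var/lib/sdc come from the examples above. The log and resource paths are assumed values, and populate_conf is a hypothetical helper. Passing an owner such as sdc:sdc requires sufficient privileges.

```shell
# Values for the environment configuration file. Replace each path with the
# one that your previous version actually used.
export SDC_CONF=/etc/sdc                      # new configuration directory
export SDC_DATA=/var/lib/sdc                  # same as the previous version
export SDC_LOG=/var/log/sdc                   # assumed previous log directory
export SDC_RESOURCES=/var/lib/sdc-resources   # assumed previous resource directory

# populate_conf SRC DEST [OWNER]
# Creates DEST, copies the contents of SRC into it, optionally changes the
# owner recursively, and restricts form-realm.properties to the owner.
# Hypothetical helper for illustration only.
populate_conf() {
  src=$1
  dest=$2
  owner=${3:-}
  mkdir -p "$dest" || return 1
  cp -R "$src"/. "$dest"/ || return 1         # "/." copies contents, not the directory itself
  if [ -n "$owner" ]; then
    chown -R "$owner" "$dest" || return 1
  fi
  if [ -f "$dest/form-realm.properties" ]; then
    chmod go-rwx "$dest/form-realm.properties"
  fi
}

# Example, run from the directory where you extracted the tarball:
# populate_conf etc "$SDC_CONF" sdc:sdc
```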

Step 5. Update the Configuration Files

A new Data Collector version can include new properties and configuration files required for Data Collector to start or function properly. In the previous step, we updated the environment configuration file so that the new version of Data Collector uses the new configuration files stored in the $SDC_CONF directory. In this step, we’ll compare the previous and new versions of the configuration files, and update the new files as needed with the same customized property values.

For example, we'll compare the files in the /etc/sdc2600 directory with the files in the /etc/sdc directory. We'll update the new files in the /etc/sdc directory with any customizations made in the previous files in the /etc/sdc2600 directory.

  1. Compare the previous and new versions of the sdc.properties file, and update the new file as needed with the same customized property values.
  2. If you registered the previous Data Collector to work with Dataflow Performance Manager (DPM), complete the following steps to update the configuration files used by DPM:
    1. Compare the previous and new versions of the dpm.properties file, and update the new file as needed with the same customized property values.
    2. Replace the new version of the application-token.txt file with the previous version of the file.
      The previous version of the file includes the authentication token that this Data Collector instance requires to issue authenticated requests to DPM, so the new version of Data Collector must use it.
  3. Compare the previous and new versions of the remaining files, and update the new files as needed with the same customized property values:
    • The appropriate realm.properties file, based on the authentication type that you use.
    • email-password.txt
    • keystore files
    • LDAP files
    • log4j properties file
    • security policy file
    • Vault properties file
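
One way to make the comparison is with diff. The following sketch uses two hypothetical helpers, compare_conf and restore_dpm_token, and assumes the /etc/sdc2600 and /etc/sdc directories from the example above.

```shell
# compare_conf OLD_DIR NEW_DIR
# Prints a unified diff of every file that differs between the directories.
# diff exits with status 1 when it finds differences, which is expected here.
compare_conf() {
  diff -r -u "$1" "$2"
}

# restore_dpm_token OLD_DIR NEW_DIR
# Copies the previous application-token.txt over the new one, keeping its mode.
restore_dpm_token() {
  cp -p "$1/application-token.txt" "$2/application-token.txt"
}

# Example with the directories used in this step:
# compare_conf /etc/sdc2600 /etc/sdc
# restore_dpm_token /etc/sdc2600 /etc/sdc
```

In the diff output, lines that start with - come from the previous file and lines that start with + come from the new file. Review each difference rather than copying whole files, because the new files can contain properties that the previous version did not have.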

Step 6. Install Additional Libraries for the Core Installation

If you upgraded a core installation of Data Collector, install the individual stage libraries that the upgraded pipelines require. Use the Package Manager or the stagelibs command to install additional stage libraries.
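
As a sketch of the command-line route, the following hypothetical helper joins library names with commas and passes them to the stagelibs command. The -install option is written here as an assumption about the command's syntax, so verify it against the documentation for your version before relying on it. The library names in the example are placeholders.

```shell
# install_stagelibs LIBRARY...
# Joins the library names with commas and passes them to the stagelibs command.
# Assumes $SDC_DIST points to the new installation and that stagelibs accepts
# -install=<comma-separated list>. Hypothetical helper for illustration only.
install_stagelibs() {
  libs=$(IFS=,; printf '%s' "$*")
  "$SDC_DIST/bin/streamsets" stagelibs -install="$libs"
}

# Example with placeholder library names:
# install_stagelibs <first stage library> <second stage library>
```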

Step 7. Start the New Version of Data Collector

Start the new version of Data Collector.

To start Data Collector manually
  Use the following command:
    $SDC_DIST/bin/streamsets dc
  Or, use the following command to launch Data Collector and run it in the background:
    nohup $SDC_DIST/bin/streamsets dc &
To start Data Collector as a service
  Use the following command:
    service sdc start
To add the Data Collector service to the system startup, use the required command for your operating system:
  • For CentOS, use the following command:
    chkconfig --add sdc
  • For Ubuntu, use the following command:
    update-rc.d sdc defaults 97 03