Install External Libraries

Install external libraries to make them available to Data Collector stages. If you use multiple stage libraries for a particular stage, install the external libraries for each stage library so that they are available to all versions of the stage.

You can install external libraries using the Package Manager in the Data Collector user interface, or you can install them manually.

You can install external libraries for the following stages:
  • Before you use the following stages, install the drivers for the implementation that you want to use:
    • JDBC Multitable Consumer origin
    • JDBC Query Consumer origin
    • JMS Consumer origin
    • MySQL Binary Log origin
    • Oracle CDC Client origin
    • JDBC Lookup processor
    • JDBC Tee processor
    • JDBC Producer destination
    • JDBC Query executor

    For example, to use the JDBC Query Consumer or the JDBC Producer with Oracle, install the Oracle JDBC drivers.

  • Before you use the Spark Evaluator processor, install the Spark application JAR file and any dependencies other than the streamsets-datacollector-api, streamsets-datacollector-spark-api, and spark-core libraries.
  • You can install external Java libraries to call external Java code from the scripting processors: the Groovy Evaluator, JavaScript Evaluator, and Jython Evaluator.

Install Using the Package Manager

To install external libraries using the Package Manager, complete the following general steps:

  1. Set up an external directory to store the libraries.
  2. Use the Package Manager within Data Collector to install the external libraries.

Step 1. Set Up an External Directory

Before you install external libraries, set up a local directory for the libraries outside the Data Collector installation directory. Storing the libraries in an external directory keeps them available after you upgrade Data Collector. Use the required procedure for your installation type.

Setting Up for RPM and Tarball

Before you install external libraries for an RPM or tarball installation, set up an external directory to store the libraries.

  1. Create a local directory external to the Data Collector installation directory.
    For example, if you installed Data Collector in the following directory:
    /opt/sdc/
    you might create the external directory at:
    /opt/sdc-extras
  2. Grant the user who starts Data Collector ownership of the external directory.
    For example, if you use the default system user and group named sdc to run Data Collector as a service, use the following command to change the owner of the external directory and all files in the directory to sdc:sdc:
    chown -R sdc:sdc /opt/sdc-extras
  3. In the Data Collector environment configuration file, add the STREAMSETS_LIBRARIES_EXTRA_DIR environment variable and point it to the external directory.

    If you start Data Collector as a service, set the environment variable in the $SDC_DIST/libexec/sdcd-env.sh file. If you start Data Collector manually, set the variable in the $SDC_DIST/libexec/sdc-env.sh file.

    Set the environment variable as follows:

    export STREAMSETS_LIBRARIES_EXTRA_DIR="<external directory>"

    For example:

    export STREAMSETS_LIBRARIES_EXTRA_DIR="/opt/sdc-extras/"
  4. When using the Java Security Manager, which is enabled by default, update the Data Collector security policy to include the external directory as follows:
    1. In the Data Collector configuration directory, open the security policy file, $SDC_CONF/sdc-security.policy.
    2. Add the following lines to the file:
      // user-defined external directory
      grant codebase "file://<external directory>-" {
        permission java.security.AllPermission;
      };
      For example:
      // user-defined external directory
      grant codebase "file:///opt/sdc-extras/-" {
        permission java.security.AllPermission;
      };
  5. Restart Data Collector.
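Steps 1 through 3 above can be scripted. The following is a minimal sketch, assuming a POSIX shell. The function name setup_extras_dir is hypothetical and is not part of Data Collector. The chown command is left as a comment because it requires root privileges and an existing sdc user and group. The sketch does not edit the security policy or restart Data Collector.

```shell
#!/bin/sh
# Sketch only: setup_extras_dir is a hypothetical helper, not a Data Collector command.
# Usage: setup_extras_dir <external directory> <environment configuration file>
setup_extras_dir() {
    extras_dir="$1"
    env_file="$2"

    # Step 1: create the directory outside the Data Collector installation directory.
    mkdir -p "$extras_dir" || return 1

    # Step 2 (run as root, and only if the sdc user and group exist on this host):
    # chown -R sdc:sdc "$extras_dir"

    # Step 3: add the variable to the environment file unless it is already set there.
    if ! grep -q '^export STREAMSETS_LIBRARIES_EXTRA_DIR=' "$env_file" 2>/dev/null; then
        printf 'export STREAMSETS_LIBRARIES_EXTRA_DIR="%s"\n' "$extras_dir" >> "$env_file"
    fi
}
```

For a manual start, you might call it as: setup_extras_dir /opt/sdc-extras/ "$SDC_DIST/libexec/sdc-env.sh". Calling it again leaves the existing export line unchanged, so the helper does not update a value that is already set.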

Setting Up for Cloudera Manager

Before you install external libraries for a Cloudera Manager installation, set up an external directory to store the libraries.

  1. In Cloudera Manager, select the StreamSets service and then click Configuration.
  2. On the Configuration page, in the Data Collector Advanced Configuration Snippet (Safety Valve) for sdc-env.sh field, add the STREAMSETS_LIBRARIES_EXTRA_DIR environment variable and point it to the external directory, as follows:
    export STREAMSETS_LIBRARIES_EXTRA_DIR="<external directory>"

    For example:

    export STREAMSETS_LIBRARIES_EXTRA_DIR="/opt/sdc-extras/"
    By default, the path is /var/lib/sdc.
  3. Create the external directory, for example /opt/sdc-extras/, on every node that runs Data Collector.
  4. On every node, grant the user who starts Data Collector ownership of the external directory.
    For example, if you use the default system user and group named sdc to run Data Collector as a service, use the following command to change the owner of the external directory and all files in the directory to sdc:sdc:
    chown -R sdc:sdc /opt/sdc-extras
  5. When using the Java Security Manager, which is enabled by default, update the Data Collector Advanced Configuration Snippet (Safety Valve) for sdc-security.policy property to include the external directory as follows:
    // user-defined external directory
    grant codebase "file://<external directory>-" {
      permission java.security.AllPermission;
    };
    For example:
    // user-defined external directory
    grant codebase "file:///opt/sdc-extras/-" {
      permission java.security.AllPermission;
    };
  6. Restart Data Collector.
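The grant entry in step 5 depends on the exact form of the codebase URL: in a Java policy file, a codebase that ends in "/-" matches all files in the directory and, recursively, in its subdirectories. The following is a minimal sketch, assuming a POSIX shell, that prints the entry for a given directory so that it can be pasted into the sdc-security.policy field. The function name print_policy_grant is hypothetical and is not part of Data Collector.

```shell
#!/bin/sh
# Sketch only: print_policy_grant is a hypothetical helper, not a Data Collector command.
# Usage: print_policy_grant <absolute path of the external directory>
print_policy_grant() {
    # Strip trailing slashes so that exactly one "/" precedes the "-" wildcard.
    # (Assumes the directory is not "/".)
    dir=$(printf '%s' "$1" | sed 's:/*$::')
    printf '// user-defined external directory\n'
    printf 'grant codebase "file://%s/-" {\n' "$dir"
    printf '  permission java.security.AllPermission;\n'
    printf '};\n'
}

# Prints the same entry as the /opt/sdc-extras/ example in step 5.
print_policy_grant /opt/sdc-extras/
```

Normalizing the trailing slash means that /opt/sdc-extras and /opt/sdc-extras/ produce the same entry.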

Step 2. Install External Libraries

After you've set up the external directory, use the Package Manager within Data Collector to install external libraries.

  1. In Data Collector, in the top right toolbar, click the Package Manager icon.
  2. In the navigation panel, click External Libraries.
    Data Collector lists any currently installed external libraries.
  3. Immediately under the top right toolbar, click the Install External Libraries icon.
  4. In the Install External Libraries dialog box, select the stage library that needs to access the external library.
    For example, if you are installing a JDBC driver for the JDBC Multitable Consumer origin, select the JDBC stage library. If you are installing an external Java library for the Groovy Evaluator processor, select the Groovy stage library.
  5. Browse to select the external library to install and click Open.
  6. To install the external library to the specified stage library, click Upload.
    Data Collector installs the external library and displays a message offering to restart Data Collector.
  7. To install additional external libraries, click Cancel, then repeat steps 3 through 6 for every stage library that needs access to the external library.
    For example, say you want to use an external library with the Spark Evaluator processor, but you use two versions of the processor, each from a different stage library. To make the external library available to both processor versions, you must upload the external library to both stage libraries.
  8. After installing all of the external libraries that you want, restart Data Collector in one of the following ways:
    • If you started Data Collector manually from the command line, click Restart Data Collector in the Install External Libraries dialog box.
    • If you started Data Collector as a service, you must restart it from the command line. Click Cancel in the Install External Libraries dialog box, and then run the following command:
      service sdc restart

Install Manually

To manually install external libraries, use the required procedure for your installation type.

Installing Manually for RPM and Tarball

To manually install external libraries for an RPM or tarball installation, perform the following steps:

  1. Create a local directory external to the Data Collector installation directory.
    For example, if you installed Data Collector in the following directory:
    /opt/sdc/
    you might create the external directory at:
    /opt/sdc-extras
  2. Create subdirectories for each set of external libraries based on the stage library name as follows:
    /opt/sdc-extras/<stage library name>/lib/
    For example, to install drivers for stages included with the JDBC stage library, create the following subdirectory:
    /opt/sdc-extras/streamsets-datacollector-jdbc-lib/lib/

    To also install drivers for stages included with the JMS stage library, create the following subdirectory:

    /opt/sdc-extras/streamsets-datacollector-jms-lib/lib/
    Note: If you use multiple stage libraries for a particular stage, and you want to use an external library with all stage libraries, you must install the external library for each stage library.

    For example, say you want to use an external library with the Spark Evaluator processor, but you use two versions of the processor, each from a different stage library. To make the external library available to both processor versions, you must copy the external library to the subdirectory for each of the two stage libraries.

  3. Copy the external libraries to the appropriate subdirectories.
  4. In the Data Collector environment configuration file, add the STREAMSETS_LIBRARIES_EXTRA_DIR environment variable and point it to the external directory.

    If you start Data Collector as a service, set the environment variable in the $SDC_DIST/libexec/sdcd-env.sh file. If you start Data Collector manually, set the variable in the $SDC_DIST/libexec/sdc-env.sh file.

    Set the environment variable as follows:

    export STREAMSETS_LIBRARIES_EXTRA_DIR="<external directory>"

    For example:

    export STREAMSETS_LIBRARIES_EXTRA_DIR="/opt/sdc-extras/"
  5. When using the Java Security Manager, which is enabled by default, update the Data Collector security policy to include the external directory as follows:
    1. In the Data Collector configuration directory, open the security policy file, $SDC_CONF/sdc-security.policy.
    2. Add the following lines to the file:
      // user-defined external directory
      grant codebase "file://<external directory>-" {
        permission java.security.AllPermission;
      };
      For example:
      // user-defined external directory
      grant codebase "file:///opt/sdc-extras/-" {
        permission java.security.AllPermission;
      };
  6. Restart Data Collector.
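Steps 2 and 3 above, creating the stage library subdirectory and copying the libraries into it, can be sketched as a small shell function. This is a minimal sketch, assuming a POSIX shell. The function name install_external_lib is hypothetical and is not part of Data Collector.

```shell
#!/bin/sh
# Sketch only: install_external_lib is a hypothetical helper, not a Data Collector command.
# Usage: install_external_lib <external directory> <stage library name> <file>...
install_external_lib() {
    if [ "$#" -lt 3 ]; then
        echo "usage: install_external_lib <external directory> <stage library name> <file>..." >&2
        return 2
    fi
    extras_dir="${1%/}"   # drop one trailing slash, if present
    stage_lib="$2"
    shift 2

    # Step 2: <external directory>/<stage library name>/lib/
    target="$extras_dir/$stage_lib/lib"
    mkdir -p "$target" || return 1

    # Step 3: copy each external library into the subdirectory.
    for lib_file in "$@"; do
        cp "$lib_file" "$target/" || return 1
    done
}
```

For example, you might run install_external_lib /opt/sdc-extras streamsets-datacollector-jdbc-lib driver.jar, where driver.jar is a placeholder for the driver file. To make a library available to more than one stage library, call the function once per stage library name.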

Installing Manually for Cloudera Manager

To manually install external libraries for an installation with Cloudera Manager, perform the following steps:

  1. In Cloudera Manager, select the StreamSets service and then click Configuration.
  2. On the Configuration page, in the Data Collector Advanced Configuration Snippet (Safety Valve) for sdc-env.sh field, add the STREAMSETS_LIBRARIES_EXTRA_DIR environment variable and point it to the external directory, as follows:
    export STREAMSETS_LIBRARIES_EXTRA_DIR="<external directory>"

    For example:

    export STREAMSETS_LIBRARIES_EXTRA_DIR="/opt/sdc-extras/"
    By default, the path is /var/lib/sdc.
  3. On every node that runs Data Collector, create subdirectories for each set of external libraries based on the stage library name as follows:
    $STREAMSETS_LIBRARIES_EXTRA_DIR/<stage library name>/lib/
    For example, to install drivers for JDBC, create the following subdirectory on every node:
    /opt/sdc-extras/streamsets-datacollector-jdbc-lib/lib/
    To also install drivers for JMS, create the following subdirectory on every node:
    /opt/sdc-extras/streamsets-datacollector-jms-lib/lib/
    Note: If you use multiple stage libraries for a particular stage, and you want to use an external library with all stage libraries, you must install the external library for each stage library.

    For example, say you want to use an external library with the Spark Evaluator processor, but you use two versions of the processor, each from a different stage library. To make the external library available to both processor versions, you must copy the external library to the subdirectory for each of the two stage libraries.

  4. Copy the external libraries to the appropriate subdirectories on every node.
  5. When using the Java Security Manager, which is enabled by default, update the Data Collector Advanced Configuration Snippet (Safety Valve) for sdc-security.policy property to include the external directory as follows:
    // user-defined external directory
    grant codebase "file://<external directory>-" {
      permission java.security.AllPermission;
    };
    For example:
    // user-defined external directory
    grant codebase "file:///opt/sdc-extras/-" {
      permission java.security.AllPermission;
    };
  6. Restart Data Collector.
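Because the subdirectories and files must exist on every node, it can help to verify each node before you restart. The following is a minimal sketch, assuming a POSIX shell, that reports whether a given file is present under each listed stage library. The function name check_external_lib is hypothetical and is not part of Data Collector.

```shell
#!/bin/sh
# Sketch only: check_external_lib is a hypothetical helper, not a Data Collector command.
# Usage: check_external_lib <external directory> <file name> <stage library name>...
# Returns 0 only if the file exists under every listed stage library.
check_external_lib() {
    if [ "$#" -lt 3 ]; then
        echo "usage: check_external_lib <external directory> <file name> <stage library name>..." >&2
        return 2
    fi
    extras_dir="${1%/}"   # drop one trailing slash, if present
    file_name="$2"
    shift 2

    missing=0
    for stage_lib in "$@"; do
        path="$extras_dir/$stage_lib/lib/$file_name"
        if [ -f "$path" ]; then
            echo "ok       $path"
        else
            echo "missing  $path"
            missing=1
        fi
    done
    return "$missing"
}
```

On each node, you might run check_external_lib /opt/sdc-extras driver.jar streamsets-datacollector-jdbc-lib streamsets-datacollector-jms-lib, where driver.jar is a placeholder for the file that you copied. A nonzero exit status means that at least one stage library subdirectory is missing the file.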