MapR Prerequisites

Due to licensing restrictions, StreamSets cannot distribute MapR libraries with Data Collector. As a result, you must perform additional steps to enable the Data Collector machine to connect to MapR.

MapR prerequisites include installing the required client libraries and then running the command to set up MapR.

Note: The MapR FS destination supports MapR versions 5.0.0, 5.1.0, and 5.2.0. All other stages that use the MapR library currently support only MapR versions 5.1.0 and 5.2.0.
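
The support matrix in the note can be expressed as a small shell check, for example in a provisioning script. This is only a sketch: the function name mapr_version_supported and the stage labels "mapr-fs-destination" and "other" are invented for illustration and are not part of Data Collector.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Data Collector).
# Returns 0 if the MapR version is supported for the kind of stage, 1 otherwise.
#   $1 - "mapr-fs-destination" or "other" (labels invented for this sketch)
#   $2 - MapR version string, for example 5.1.0
mapr_version_supported() {
  case "$1:$2" in
    # The MapR FS destination supports 5.0.0, 5.1.0, and 5.2.0.
    mapr-fs-destination:5.0.0|mapr-fs-destination:5.1.0|mapr-fs-destination:5.2.0)
      return 0 ;;
    # All other stages that use the MapR library support 5.1.0 and 5.2.0.
    other:5.1.0|other:5.2.0)
      return 0 ;;
    *)
      return 1 ;;
  esac
}
```

For example, mapr_version_supported other 5.0.0 returns a non-zero status, because only the MapR FS destination supports MapR 5.0.0.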

Step 1. Install Client Libraries

Install the required client libraries. If the MapR cluster uses username/password login authentication, you must enable Data Collector to use username/password authentication.

  1. Install Data Collector on a node in the MapR cluster or on a client machine.
    To run Data Collector on a client machine (outside the cluster or on your local machine), the MapR client package must be installed and configured on that machine. If the MapR client package is not installed, download and install the following files:
    • MapR client library - Typically named mapr-client_<version>.<ext>.
      You can download the files for your operating system here:
      http://package.mapr.com/releases/<version>/
    • Kafka client library - Typically named mapr-kafka-<version>.<ext>.
      For MapR versions 5.0.0 or 5.1.0, you can download the files for your operating system here:
      http://package.mapr.com/releases/ecosystem-<version>/
      For MapR version 5.2.0, you can download the files for your operating system here:
      http://package.mapr.com/releases/MEP/MEP-<version>/
  2. If the MapR cluster uses username/password login authentication, uncomment the following line in the Data Collector environment configuration file:
    #export SDC_JAVA_OPTS="-Dmaprlogin.password.enabled=true"

    If you start Data Collector as a service, modify the $SDC_DIST/libexec/sdcd-env.sh file. If you start Data Collector manually, modify the $SDC_DIST/libexec/sdc-env.sh file.
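
If you prefer to script the change in step 2 rather than edit the file by hand, something like the following may work. It is a minimal sketch that assumes the commented line appears in the environment file as shown above; enable_mapr_password_login is a hypothetical helper name, and the function keeps a .bak copy of the original file.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Data Collector).
# Uncomments the maprlogin SDC_JAVA_OPTS line in an environment file.
#   $1 - path to sdc-env.sh or sdcd-env.sh
enable_mapr_password_login() {
  env_file="$1"
  # Keep a backup copy, then remove the leading '#' from the matching line only.
  # All other lines, including other comments, are written back unchanged.
  cp "$env_file" "$env_file.bak" &&
  sed 's/^#\(export SDC_JAVA_OPTS=.*-Dmaprlogin\.password\.enabled=true\)/\1/' \
    "$env_file.bak" > "$env_file"
}
```

For a manual start, the call would be enable_mapr_password_login "$SDC_DIST/libexec/sdc-env.sh". For a service start, pass the sdcd-env.sh file instead.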

Step 2. Run the Command to Set Up MapR

After installing the required client libraries, run the setup-mapr command. The command modifies configuration files and creates the required symbolic links. You can run the command in interactive or non-interactive mode.

In interactive mode, the command prompts you for the MapR version and home directory. In non-interactive mode, you define the MapR version and home directory in environment variables before running the command.

Running the Command in Interactive Mode

When you run the setup-mapr command in interactive mode, the command prompts you for the MapR version and home directory.

  1. Set the following environment variables:
    Environment Variable   Description
    SDC_HOME               Data Collector home directory.
                           Note: The default home directory for an RPM installation is /opt/streamsets-datacollector.
                           The tarball home directory is the location where you extracted the file.
    SDC_CONF               Data Collector configuration directory.
    Use the following command to set an environment variable:
    export <environment variable>=<value>
    For example, use the following commands if you used the default home and configuration directories for an RPM installation:
    export SDC_HOME=/opt/streamsets-datacollector
    export SDC_CONF=/etc/sdc
  2. Use the following command to set up MapR:
    $SDC_DIST/bin/streamsets setup-mapr
  3. When prompted, enter 5.0.0, 5.1.0, or 5.2.0 for the MapR version.
  4. When prompted, enter the absolute path to the MapR home directory, usually /opt/mapr.
  5. Restart Data Collector and verify that the MapR origins and destinations appear in the stage library.
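
Before step 2, it can help to confirm that the variables from step 1 are actually set and point to existing directories. The following is a sketch of such a pre-flight check; check_sdc_env is a hypothetical helper name, and the directory test is an extra assumption of this sketch rather than something the setup-mapr command is documented to require.

```shell
#!/bin/sh
# Hypothetical pre-flight check (not shipped with Data Collector).
# Returns 0 only if SDC_HOME and SDC_CONF are both set to existing directories.
check_sdc_env() {
  status=0
  for name in SDC_HOME SDC_CONF; do
    # Read the variable whose name is stored in $name; empty if it is unset.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "$name is not set" >&2
      status=1
    elif [ ! -d "$value" ]; then
      echo "$name=$value is not a directory" >&2
      status=1
    fi
  done
  return "$status"
}
```

A typical use would be check_sdc_env && $SDC_DIST/bin/streamsets setup-mapr, so that the interactive prompts appear only when both variables are in place.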

Running the Command in Non-Interactive Mode

When you run the setup-mapr command in non-interactive mode, you define the MapR version and home directory in environment variables before running the command.

  1. Set the following environment variables:
    Environment Variable   Description
    SDC_HOME               Data Collector home directory.
                           Note: The default home directory for an RPM installation is /opt/streamsets-datacollector.
                           The tarball home directory is the location where you extracted the file.
    SDC_CONF               Data Collector configuration directory.
    MAPR_HOME              MapR home directory, usually /opt/mapr.
    MAPR_VERSION           MapR version: 5.0.0, 5.1.0, or 5.2.0.
    Use the following command to set an environment variable:
    export <environment variable>=<value>
    For example, use the following commands if you used the default home and configuration directories for an RPM installation, the default MapR home directory, and MapR version 5.2.0:
    export SDC_HOME=/opt/streamsets-datacollector
    export SDC_CONF=/etc/sdc
    export MAPR_HOME=/opt/mapr
    export MAPR_VERSION=5.2.0
  2. Use the following command to set up MapR:
    $SDC_DIST/bin/streamsets setup-mapr
  3. Restart Data Collector and verify that the MapR origins and destinations appear in the stage library.
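
For automated installs, steps 1 and 2 can be wrapped in a small function that validates the values before calling the command. This is a sketch under stated assumptions: setup_mapr_noninteractive is a hypothetical name, SDC_DIST is assumed to be set already, and the validation mirrors the versions listed above rather than any check performed by the command itself.

```shell
#!/bin/sh
# Hypothetical wrapper (not shipped with Data Collector).
# Expects SDC_DIST, SDC_HOME, SDC_CONF, MAPR_HOME, and MAPR_VERSION to be set.
setup_mapr_noninteractive() {
  # Accept only the MapR versions listed in this section.
  case "${MAPR_VERSION:-}" in
    5.0.0|5.1.0|5.2.0) ;;
    *)
      echo "MAPR_VERSION must be 5.0.0, 5.1.0, or 5.2.0" >&2
      return 1 ;;
  esac
  # MAPR_HOME must be an existing directory, usually /opt/mapr.
  if [ ! -d "${MAPR_HOME:-}" ]; then
    echo "MAPR_HOME must point to the MapR home directory" >&2
    return 1
  fi
  # Export the variables so that the setup-mapr command can read them.
  export SDC_HOME SDC_CONF MAPR_HOME MAPR_VERSION
  "$SDC_DIST/bin/streamsets" setup-mapr
}
```

The function returns the exit status of the setup-mapr command, so a calling script can stop before restarting Data Collector if the setup fails.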