Full Installation and Launch (Manual Start)

You can install the full Data Collector tarball and start it manually on all supported operating systems.

When you start Data Collector manually, it runs as the system user account that is logged into the command prompt when you run the launch command. Alternatively, you can run the launch command as another user account.

  1. Use the following URL to download the full StreamSets Data Collector tarball from the StreamSets website: https://streamsets.com/opensource.
  2. Extract the tarball to the desired location.
  3. For a production environment, configure the directories that store configuration, data, log, and resource files so that they are outside of $SDC_DIST. $SDC_DIST is the location where you extracted the tarball, and it is also the base Data Collector runtime directory.

    Directories outside of the runtime directory remain available after you upgrade Data Collector.

    For a development or test environment, you can use the default locations within the $SDC_DIST runtime directory. However, StreamSets recommends that you use directories outside of the runtime directory for all environments. If you use the default values for a development or test environment, make sure the user who starts Data Collector has write permission on the base Data Collector runtime directory.

    1. Create directories outside of the $SDC_DIST runtime directory for the configuration, data, log, and resource files.
    2. In the $SDC_DIST/libexec/sdc-env.sh file, set the following environment variables to the newly created directories:
      • SDC_CONF - The Data Collector configuration directory.
      • SDC_DATA - The Data Collector directory for pipeline state and configuration information.
      • SDC_LOG - The Data Collector directory for logs.
      • SDC_RESOURCES - The Data Collector directory for runtime resource files.
    3. Copy all files from $SDC_DIST/etc to the newly created $SDC_CONF directory.
  4. Use the following command from the $SDC_DIST directory to run Data Collector as the system user account logged into the command prompt:
    bin/streamsets dc

    Or, use the following command to run Data Collector in the background:

    nohup bin/streamsets dc &

    To run Data Collector as another system user account, use the following command, replacing <user> with the account name:

    sudo -u <user> bin/streamsets dc
  5. To access the Data Collector UI, enter the following URL in the address bar of your browser:
    http://<hostname>:18630/
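
The directory setup in step 3 can be sketched as a few shell functions. This is a minimal illustration, not an official StreamSets script: the function names (sdc_make_dirs, sdc_env_lines, sdc_copy_conf) and the layout of one base directory with conf, data, log, and resources subdirectories are assumptions made for the example. It also assumes that sdc-env.sh is a shell file in which export lines take effect, so the second function only prints the lines for you to add to that file by hand.

```shell
#!/bin/sh
# Sketch of step 3. Function names and the directory layout are hypothetical.

# Step 3.1: create the four external directories under one base directory.
sdc_make_dirs() {
    base=$1
    mkdir -p "$base/conf" "$base/data" "$base/log" "$base/resources"
}

# Step 3.2: print the export lines to add to $SDC_DIST/libexec/sdc-env.sh.
# Nothing is written to sdc-env.sh here; review the lines and add them yourself.
sdc_env_lines() {
    base=$1
    printf 'export SDC_CONF="%s"\n' "$base/conf"
    printf 'export SDC_DATA="%s"\n' "$base/data"
    printf 'export SDC_LOG="%s"\n' "$base/log"
    printf 'export SDC_RESOURCES="%s"\n' "$base/resources"
}

# Step 3.3: copy everything in $SDC_DIST/etc (dotfiles included) into the
# new configuration directory.
sdc_copy_conf() {
    dist=$1
    base=$2
    cp -R "$dist/etc/." "$base/conf/"
}
```

To use the sketch, source the functions, call sdc_make_dirs with a base directory of your choice outside of $SDC_DIST, call sdc_env_lines with the same base directory and add the printed lines to sdc-env.sh, and then call sdc_copy_conf with $SDC_DIST and the base directory. Run the functions as a user who has write permission on the base directory, and make sure the user who starts Data Collector can write to the new directories.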