Full Installation and Launch (Service Start)

To install the full Data Collector as a service, you can download the Data Collector RPM package or the Data Collector tarball from the StreamSets website.

Installing from the RPM Package

When you install from the RPM package, Data Collector uses the default directories and the default system user and group.

The default system user and group are named sdc. If an sdc user and an sdc group do not exist on the machine, the installation creates the user and group for you and assigns them the next available user ID and group ID.

Tip: To use specific IDs for the sdc user and group, create the user and group before installation and specify the IDs that you want to use. For example, if you’re installing Data Collector on multiple machines, you might want to create the system user and group before installation to ensure that the user ID and group ID are consistent across the machines.
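
As a minimal sketch of the tip above, the following script prints the commands that would create the sdc group and user with fixed IDs. The value 1500 and the SDC_UID and SDC_GID variables are hypothetical placeholders, not StreamSets defaults, and the sketch assumes that groupadd and useradd are available on the machine. The script changes nothing by itself. Review the output, then run the printed commands with root privileges.

```shell
#!/bin/sh
# Sketch: print the commands that create the sdc group and user with fixed IDs.
# The default ID 1500 is a hypothetical placeholder, not a StreamSets default.

sdc_account_commands() {
    uid="$1"
    gid="$2"
    # Create the group first so that the user can be assigned to it.
    echo "groupadd -r -g $gid sdc"
    echo "useradd -r -u $uid -g sdc sdc"
}

# Use the same values on every machine to keep the IDs consistent.
sdc_account_commands "${SDC_UID:-1500}" "${SDC_GID:-1500}"
```

Running the same script with the same values on each machine before installation keeps the IDs aligned across the machines.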

Installing the full Data Collector as a service requires root or sudo privileges.

  1. Use the following URL to download the Data Collector RPM package from the StreamSets website: https://streamsets.com/opensource.
  2. Use the following command to extract the file to the desired location:
    tar -xzf streamsets-datacollector-<version>-all-rpms.tgz
    For example, to extract version 2.6.0.0, use the following command:
    tar -xzf streamsets-datacollector-2.6.0.0-all-rpms.tgz
  3. Use the following command to install the full Data Collector RPM package:
    yum localinstall streamsets*
  4. To start Data Collector as a service, use the following command:
    service sdc start
  5. To access the Data Collector console, enter the following URL in the address bar of your browser:
    http://<system-ip>:18630/
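
After the service starts, you might want to confirm that the console answers before opening a browser. The following sketch builds the console URL from step 5 and probes it. It assumes that curl is installed, and the SDC_HOST variable and the function names are hypothetical helpers introduced for illustration.

```shell
#!/bin/sh
# Sketch: check whether the Data Collector console answers on port 18630.
# SDC_HOST and the function names are hypothetical, for illustration only.

console_url() {
    # Build the console URL from a host name or IP address.
    echo "http://$1:18630/"
}

console_is_up() {
    # -f treats HTTP error responses as failures; --max-time bounds the wait.
    curl -fs -o /dev/null --max-time 5 "$(console_url "$1")"
}

host="${SDC_HOST:-localhost}"
if console_is_up "$host"; then
    echo "Console reachable at $(console_url "$host")"
else
    echo "Console not reachable at $(console_url "$host")"
fi
```

A failed check right after startup does not necessarily indicate a problem, because the service can take some time to begin listening.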

Installing from the Tarball

This procedure walks through setting the default directories and the default system user and group used to start Data Collector as a service.

Alternatively, before you install, you can edit the $SDC_DIST/libexec/sdcd-env.sh file to change the environment variables that define the directories and the system user and group.

Installing the full Data Collector as a service requires root or sudo privileges.

  1. Use the following URL to download the Data Collector tarball from the StreamSets website: https://streamsets.com/opensource.
  2. From the desired location, typically /opt/local/, use the following command to extract the tarball:
    tar -xzf streamsets-datacollector-all-<version>.tgz
    For example, to extract version 2.6.0.0, use the following command:
    tar -xzf streamsets-datacollector-all-2.6.0.0.tgz
  3. Create a system user and group named sdc.
    The sdc user and group are used to start Data Collector as a service.
  4. Use the following command to create the /etc/init.d directory if it does not already exist:
    mkdir -p /etc/init.d
  5. Use the following commands to copy initd/_sdcinitd_prototype to the /etc/init.d directory and then change ownership of the file to sdc:
    cp initd/_sdcinitd_prototype /etc/init.d/sdc
    chown sdc:sdc /etc/init.d/sdc
  6. Edit the /etc/init.d/sdc file and set the $SDC_DIST and $SDC_HOME environment variables to the location where you extracted the tarball.
  7. Use the following command to make the sdc file executable:
    chmod 755 /etc/init.d/sdc
  8. Use the following command to create the Data Collector configuration directory at /etc/sdc:
    mkdir /etc/sdc
  9. Use the following command from the directory where you extracted the tarball to copy all files from etc into the Data Collector configuration directory that you just created:
    cp -R etc/. /etc/sdc
  10. Use the following command to change the owner of the /etc/sdc directory and all files in the directory to sdc:sdc:
    chown -R sdc:sdc /etc/sdc
  11. Use the following command to set owner-only permissions on the form-realm.properties file in the /etc/sdc directory:
    chmod go-rwx /etc/sdc/form-realm.properties
  12. Use the following commands to create the Data Collector log directory at /var/log/sdc and change the owner to sdc:sdc:
    mkdir /var/log/sdc
    chown sdc:sdc /var/log/sdc
  13. Use the following commands to create the Data Collector data directory at /var/lib/sdc and change the owner to sdc:sdc:
    mkdir /var/lib/sdc
    chown sdc:sdc /var/lib/sdc
  14. Use the following commands to create the Data Collector resources directory at /var/lib/sdc-resources and change the owner to sdc:sdc:
    mkdir /var/lib/sdc-resources
    chown sdc:sdc /var/lib/sdc-resources
  15. Use the following command to start Data Collector as a service:
    service sdc start
  16. To add the Data Collector service to the system startup, use the required command for your operating system.
    • For CentOS, use the following command:
      chkconfig --add sdc
    • For Ubuntu, use the following command:
      update-rc.d sdc defaults 97 03
  17. To access the Data Collector console, enter the following URL in the address bar of your browser:
    http://<system-ip>:18630/
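
Because the tarball procedure creates several directories by hand, a quick ownership check can catch a missed step. The following sketch reports any of the directories from steps 8 through 14 that are missing or not owned by sdc:sdc. The function names are hypothetical, and the check parses ls output, so it assumes user and group names without spaces.

```shell
#!/bin/sh
# Sketch: verify the directories created in steps 8 through 14.
# The function names are hypothetical, for illustration only.

owner_of() {
    # Print "user:group" for a path, or fail if the path does not exist.
    [ -e "$1" ] || return 1
    ls -ld "$1" | awk '{ print $3 ":" $4 }'
}

check_sdc_dirs() {
    status=0
    for dir in /etc/sdc /var/log/sdc /var/lib/sdc /var/lib/sdc-resources; do
        if owner=$(owner_of "$dir"); then
            if [ "$owner" != "sdc:sdc" ]; then
                echo "$dir: owned by $owner, expected sdc:sdc"
                status=1
            fi
        else
            echo "$dir: missing"
            status=1
        fi
    done
    return "$status"
}

if check_sdc_dirs; then
    echo "All Data Collector directories are owned by sdc:sdc"
fi
```

The check only reads directory metadata, so it can run without root privileges.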