Core Installation

You can download and install a core version of Data Collector, and then install individual stage libraries as needed. Use the core installation when you want to install only the stage libraries that you plan to use. Because unused stage libraries are never installed, Data Collector uses less disk space.

To install and launch a core version of Data Collector, you can download the RPM package or the core tarball.

The core installation includes Data Collector, development stages, most processors, and the origins and destinations in the basic stage library, such as Directory and Local FS. You then use the Data Collector console or the command line interface to install additional stage libraries.

The core installation includes the following origins:
  • CoAP Server
  • Directory
  • File Tail
  • HTTP Client
  • HTTP Server
  • MQTT Subscriber
  • SDC RPC
  • SFTP/FTP Client
  • TCP Server
  • UDP Source
  • WebSocket Server

The core installation includes all processors except the Groovy Evaluator, Jython Evaluator, HBase Lookup, Redis Lookup, and Spark Evaluator.

The core installation includes the following destinations:
  • CoAP Client
  • HTTP Client
  • Local FS
  • MQTT Publisher
  • SDC RPC
  • To Error
  • Trash
  • WebSocket Client

The core installation includes the following executors:
  • Email
  • Pipeline Finisher
  • Shell

Installing the Core RPM Package

To install the core version of Data Collector, download the RPM package. After you install and launch the core Data Collector, install individual stage libraries as needed.

When you install from the RPM package, Data Collector uses the default directories and the default system user and group.

The default system user and group are named sdc. If an sdc user and an sdc group do not exist on the machine, the installation creates the user and group for you and assigns them the next available user ID and group ID.
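Before you install, you might check whether an sdc user already exists on a machine and which IDs it has. The following is a minimal POSIX shell sketch that relies only on the standard `id` command; the function name `show_account_ids` is hypothetical and not part of Data Collector. It reports the user ID and the user's primary group ID.

```shell
#!/bin/sh
# Hypothetical helper, not part of Data Collector.
# show_account_ids <user>
#   Prints "<uid> <primary gid>" for an existing user.
#   Returns 1 and prints nothing if the user does not exist.
show_account_ids() {
  acct="$1"
  # `id <user>` exits non-zero when the account is unknown.
  if ! id "$acct" >/dev/null 2>&1; then
    return 1
  fi
  echo "$(id -u "$acct") $(id -g "$acct")"
}
```

For example, `show_account_ids sdc || echo "no sdc user"` on each machine shows whether the installation will need to create the sdc user.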

Tip: To use specific IDs for the sdc user and group, create the user and group before installation and specify the IDs that you want to use. For example, if you’re installing Data Collector on multiple machines, you might want to create the system user and group before installation to ensure that the user ID and group ID are consistent across the machines.
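A minimal sketch of this tip, assuming you run it as root on each machine before installation and that the ID 20159 (a hypothetical value) is unused on all of the machines. It uses the standard `groupadd` and `useradd` commands and skips any account that already exists. The `create_sdc_account` function, the `run` wrapper, and the `DRY_RUN` variable are illustrative names, not part of Data Collector.

```shell
#!/bin/sh
# Print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# create_sdc_account [uid] [gid]
# Creates the sdc group and the sdc user (with sdc as its primary group)
# using fixed IDs, unless they already exist. 20159 is a hypothetical default.
create_sdc_account() {
  sdc_uid="${1:-20159}"
  sdc_gid="${2:-20159}"
  if ! getent group sdc >/dev/null 2>&1; then
    run groupadd --gid "$sdc_gid" sdc || return 1
  fi
  if ! getent passwd sdc >/dev/null 2>&1; then
    # Reference the group by name so the user joins the sdc group.
    run useradd --uid "$sdc_uid" --gid sdc sdc || return 1
  fi
}
```

Running `create_sdc_account 20159 20159` on every machine gives the sdc user and group the same IDs everywhere; running it with `DRY_RUN=1` first prints the commands so that you can review them.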

To install the core RPM package, complete the following steps:
  1. Download the Data Collector RPM package from the StreamSets website at the following URL:
    https://streamsets.com/opensource
  2. Move the downloaded file to the location where you want to extract it, change to that directory, and then use the following command:
    tar -xzf streamsets-datacollector-<version>-all-rpms.tgz
    For example, to extract version 2.6.0.0, use the following command:
    tar -xzf streamsets-datacollector-2.6.0.0-all-rpms.tgz
  3. Use the following command to install the core Data Collector RPM package:
    yum localinstall streamsets-datacollector-<version>-1.noarch.rpm
    For example, to install version 2.6.0.0, use the following command:
    yum localinstall streamsets-datacollector-2.6.0.0-1.noarch.rpm
  4. To start Data Collector as a service, use the following command:
    service sdc start
  5. To access the Data Collector console, enter the following URL in the address bar of your browser:
    http://<system-ip>:18630/
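If you install on several machines, you might script steps 2 through 4. The following is a minimal sketch, not a StreamSets-provided script. It assumes that you run it as root from the directory that contains the downloaded tarball, and that extracting the tarball places the core RPM file in that same directory, as step 3 implies; if the tarball extracts to a subdirectory, change to it before the `yum` command. The `install_core_rpm` function, the `run` wrapper, and `DRY_RUN` are hypothetical names.

```shell
#!/bin/sh
# Print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# install_core_rpm <version>
# Runs steps 2-4: extract the tarball, install the core RPM, start the service.
install_core_rpm() {
  version="$1"
  if [ -z "$version" ]; then
    echo "usage: install_core_rpm <version>" >&2
    return 2
  fi
  # Step 2: extract the downloaded tarball into the current directory.
  run tar -xzf "streamsets-datacollector-${version}-all-rpms.tgz" || return 1
  # Step 3: install the core Data Collector RPM package.
  run yum localinstall "streamsets-datacollector-${version}-1.noarch.rpm" || return 1
  # Step 4: start Data Collector as a service.
  run service sdc start
}
```

For example, `DRY_RUN=1 install_core_rpm 2.6.0.0` prints the three commands for version 2.6.0.0 without running them. After the service starts, open the console URL from step 5.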

Installing the Core Tarball

To install the core version of Data Collector, download the core tarball. After you install and launch the core Data Collector, install individual stage libraries as needed.

Download the core installation tarball from the StreamSets website, and then use one of the following installation methods to install the core Data Collector: