Core Installation

You can download and install a core version of Data Collector, and then install only the stage libraries that you want to use. Because the core installation omits most stage libraries, Data Collector uses less disk space.

To install and launch a core version of Data Collector, you can download the RPM package or the core tarball.

The core installation includes Data Collector and the following stage libraries:
  • Basic stage library
  • Data formats stage library
  • Development stage library
  • Statistics stage library
  • Windows stage library

After you install the core version, use the Data Collector UI or the command line interface to install additional stage libraries.
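For the command line route, the sketch below builds an install command for a set of stage libraries. It rests on assumptions that this section does not confirm: that the launcher script is located at bin/streamsets under the installation directory, and that it accepts a stagelibs subcommand with an -install option that takes a comma-separated list. The function name stagelibs_install_command and its arguments are hypothetical. Check the command syntax against the documentation for your version before running the output.

```shell
#!/bin/sh
# Sketch only: prints a stage library install command instead of running it.
# Assumptions (not confirmed by this section): the launcher is at
# <install_dir>/bin/streamsets and supports "stagelibs -install=<lib1>,<lib2>".

# Usage: stagelibs_install_command <install_dir> <library> [<library> ...]
stagelibs_install_command() {
  sdc_dist="$1"
  shift
  # Join the remaining arguments with commas inside a subshell,
  # so the IFS change does not leak into the caller.
  libs="$(IFS=,; echo "$*")"
  echo "$sdc_dist/bin/streamsets stagelibs -install=$libs"
}
```

Printing the command rather than executing it lets you review the library names first. The library names that you pass are your own choice; none are assumed here.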

The core installation includes all development stages and the following origins:

  • CoAP Server
  • Directory
  • File Tail
  • HTTP Client
  • HTTP Server
  • MQTT Subscriber
  • OPC UA Client
  • SDC RPC
  • SFTP/FTP Client
  • TCP Server
  • UDP Multithreaded Source
  • UDP Source
  • WebSocket Client
  • WebSocket Server
  • Windows Event Log

The core installation includes all processors except the following: Databricks ML Evaluator, Encrypt and Decrypt Fields, Groovy Evaluator, HBase Lookup, Hive Metadata, Jython Evaluator, Kudu Lookup, MLeap Evaluator, PMML Evaluator, PostgreSQL Metadata, Redis Lookup, Spark Evaluator, SQL Parser, TensorFlow Evaluator, and Whole File Transformer.

The core installation includes the following destinations:
  • CoAP Client
  • HTTP Client
  • Local FS
  • MQTT Publisher
  • Named Pipe
  • SDC RPC
  • Splunk
  • Syslog
  • To Error
  • Trash
  • WebSocket Client

The core installation includes the following executors:
  • Databricks
  • Email
  • Pipeline Finisher
  • Shell

Installing the Core RPM Package

You can install the Data Collector RPM package and start Data Collector as a service on CentOS, Oracle Linux, or Red Hat Enterprise Linux. To install the core version of Data Collector, download the RPM package. After you install and launch the core version, install individual stage libraries as needed.

When you install from the RPM package, Data Collector uses the default directories and the default system user and group.

The default system user and group are named sdc. If an sdc user and an sdc group do not exist on the machine, the installation creates the user and group for you and assigns them the next available user ID and group ID.

Tip: To use specific IDs for the sdc user and group, create the user and group before installation and specify the IDs that you want to use. For example, if you’re installing Data Collector on multiple machines, you might want to create the system user and group before installation to ensure that the user ID and group ID are consistent across the machines.
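As a minimal sketch of the tip above, the following function prints the standard groupadd and useradd commands for creating the sdc group and user with fixed IDs. The function name sdc_account_commands is hypothetical, and the sketch does not set a home directory, shell, or system-account flag, because this section does not say which settings the RPM installation uses. Choose IDs that are free on every machine.

```shell
#!/bin/sh
# Sketch only: prints the account-creation commands so they can be reviewed
# before being run as root. Nothing on the system is changed by this function.

# Usage: sdc_account_commands <user_id> <group_id>
sdc_account_commands() {
  sdc_uid="$1"
  sdc_gid="$2"
  # Create the group first so the user can be assigned to it.
  echo "groupadd --gid $sdc_gid sdc"
  echo "useradd --uid $sdc_uid --gid sdc sdc"
}
```

For example, calling sdc_account_commands with the illustrative IDs 1500 and 1500 prints the two commands to run on each machine before you install the RPM package.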

  1. From the StreamSets website at https://streamsets.com/opensource, download the Data Collector RPM package for your operating system:
    • For CentOS 6, Oracle Linux 6, or Red Hat Enterprise Linux 6, download the RPM EL6 package.
    • For CentOS 7, Oracle Linux 7, or Red Hat Enterprise Linux 7, download the RPM EL7 package.
  2. Use the following command to extract the file to the desired location:
    tar xf streamsets-datacollector-<version>-<operating_system>-all-rpms.tar
    For example, to extract version 3.6.0 on CentOS 7, use the following command:
    tar xf streamsets-datacollector-3.6.0-el7-all-rpms.tar
  3. Use the following command to install the core Data Collector RPM package:
    yum localinstall streamsets-datacollector-<version>-1.noarch.rpm
    For example, to install version 3.6.0, use the following command:
    yum localinstall streamsets-datacollector-3.6.0-1.noarch.rpm
  4. To start Data Collector as a service, use the required command for your operating system:
    • For CentOS 6, Oracle Linux 6, or Red Hat Enterprise Linux 6, use:
      service sdc start
    • For CentOS 7, Oracle Linux 7, or Red Hat Enterprise Linux 7, use:
      systemctl start sdc
  5. To access the Data Collector UI, enter the following URL in the address bar of your browser:
    http://<hostname>:18630/
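
Steps 2 through 5 can be sketched as a few shell functions. The file names, commands, and port come from the steps above. The function names and the el6 and el7 tags used as arguments are illustrative. The sketch assumes that the RPM file is in the current directory after extraction, as the commands in the steps are written. If the tarball extracts into a subdirectory, adjust the path. Run install_core_rpm as a user with permission to install packages and start services.

```shell
#!/bin/sh
# Sketch only: helper functions for the RPM installation steps.
# Defining them changes nothing; the installation runs only when you
# call install_core_rpm yourself.

# Tarball name for a version and OS tag (el6 or el7).
rpm_tarball_name() {
  echo "streamsets-datacollector-$1-$2-all-rpms.tar"
}

# Name of the core RPM package for a version.
core_rpm_name() {
  echo "streamsets-datacollector-$1-1.noarch.rpm"
}

# Service start command for an OS tag.
start_command() {
  case "$1" in
    el6) echo "service sdc start" ;;
    el7) echo "systemctl start sdc" ;;
    *)   echo "unsupported OS tag: $1" >&2; return 1 ;;
  esac
}

# URL of the Data Collector UI on a host.
ui_url() {
  echo "http://$1:18630/"
}

# Usage: install_core_rpm <version> <os_tag>
install_core_rpm() {
  version="$1"
  os_tag="$2"
  # Resolve the start command first so an unsupported tag fails early.
  start_cmd="$(start_command "$os_tag")" || return 1
  tar xf "$(rpm_tarball_name "$version" "$os_tag")" || return 1
  yum localinstall "$(core_rpm_name "$version")" || return 1
  $start_cmd || return 1
  echo "Data Collector UI: $(ui_url "$(hostname)")"
}
```

For example, install_core_rpm 3.6.0 el7 extracts streamsets-datacollector-3.6.0-el7-all-rpms.tar, installs the core package, and starts the service with systemctl.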

Installing the Core Tarball

To install the core version of Data Collector, download the core tarball. After you install and launch the core version, install individual stage libraries as needed.

Download the core installation tarball from the StreamSets website, and then use one of the following installation methods to install the core Data Collector: