Installation Requirements

Install StreamSets Control Hub on a machine that meets the following minimum requirements:

Component          Minimum Requirement
Operating system   Use one of the following operating systems and versions:
                     • CentOS 6.x or 7.x
                     • Oracle Linux 6.x or 7.x
                     • Red Hat Enterprise Linux 6.x or 7.x
                     • Ubuntu 14.04 LTS
Java               Oracle Java 8 or OpenJDK 8.
                   Also requires Java Cryptography Extension (JCE) Unlimited Strength Jurisdiction Policy Files 8.

The remaining requirements depend on whether you are installing a single Control Hub instance for a development environment or multiple Control Hub instances for a highly available production environment:

Component    Single Installation   Multiple Installations for High Availability
CPU          8                     4
RAM          15 GB                 7.5 GB
Disk space   50 GB                 30 GB

If installing on Amazon EC2 instances, install Control Hub on a separate volume instead of the root volume. Use the following instance types:
  • Single installation - c4.2xlarge
  • Multiple installations for high availability - c4.xlarge
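
The CPU, RAM, and disk minimums above lend themselves to a quick preflight check. The following is a minimal Python sketch, not part of Control Hub. The `MINIMUMS` values are transcribed from the requirements above. The function names, the `profile` keys, and the choice to measure free space at a caller-supplied install path are assumptions made for illustration. `probe_host` relies on `os.sysconf` names that are available on Linux.

```python
import os
import shutil

# Minimums transcribed from the table above: (CPU count, RAM in GB, disk in GB).
# The profile keys "single" and "ha" are illustrative names, not product terms.
MINIMUMS = {
    "single": (8, 15.0, 50.0),
    "ha": (4, 7.5, 30.0),
}


def shortfalls(cpus, ram_gb, disk_gb, profile="single"):
    """Return a list of shortfall messages; an empty list means the host qualifies."""
    min_cpus, min_ram, min_disk = MINIMUMS[profile]
    problems = []
    if cpus < min_cpus:
        problems.append(f"CPU: {cpus} < {min_cpus}")
    if ram_gb < min_ram:
        problems.append(f"RAM: {ram_gb:.1f} GB < {min_ram} GB")
    if disk_gb < min_disk:
        problems.append(f"Disk: {disk_gb:.1f} GB < {min_disk} GB")
    return problems


def probe_host(install_path="/"):
    """Measure the local Linux host; install_path is a hypothetical install location."""
    gib = 1024 ** 3
    cpus = os.cpu_count() or 0
    # Physical memory = page size * number of physical pages (Linux sysconf names).
    ram_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    # Assumption: the disk requirement is compared against free space on the volume.
    disk_bytes = shutil.disk_usage(install_path).free
    return cpus, ram_bytes / gib, disk_bytes / gib
```

For example, `shortfalls(*probe_host("/opt"), profile="ha")` returns an empty list when the host qualifies, where `/opt` stands in for whatever volume holds the installation. The operating system typically reports slightly less memory than the nominal instance size, so a small RAM shortfall is worth checking by hand.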

General Access Requirements

After installation, Control Hub requires access to the following components. These components can be local or remote to the Control Hub installations:

Component                                   Minimum Requirement
SMTP account                                SMTP account to send emails.
Load balancer                               Load balancer to set up a highly available Control Hub system. We recommend using a Layer 7 load balancer such as HAProxy, NGINX, or F5.
                                            Required for a production environment, optional for a development environment.
Browser                                     Use the latest version of one of the following browsers:
                                              • Chrome
                                              • Firefox
                                              • Safari
StreamSets Data Collector                   StreamSets recommends using the latest version of Data Collector.
                                            The minimum supported Data Collector version depends on how you use Data Collector:
                                              • Version 2.1.0.0 or later is required to design pipelines in Data Collector and to run standalone and cluster pipelines from jobs.
                                              • Version 3.0.0.0 or later is required as the authoring Data Collector used to design pipelines in the Control Hub Pipeline Designer.
                                              • Version 3.2.0.0 or later is required as the authoring Data Collector used to design pipeline fragments.
                                              • Version 3.4.0 or later is required to monitor the CPU load and memory usage of each Data Collector from within Control Hub.
                                              • Version 3.5.0 or later with the Data Protector stage library installed is required to use Data Protector.
                                            If needed, you can customize the supported Data Collector version range.
StreamSets Transformer                      StreamSets recommends using the latest version of Transformer to design and execute Transformer pipelines from Control Hub.
StreamSets Data Collector Edge (SDC Edge)   StreamSets recommends using the latest version of SDC Edge to run edge pipelines from jobs.
                                            The minimum supported SDC Edge version is 3.0.0.0. Version 3.4.0 or later is required to monitor the CPU load and memory usage of each SDC Edge from within Control Hub.
Statistics aggregator                       Use one of the following systems to aggregate pipeline statistics when jobs run on multiple Data Collectors:
                                              • Amazon Kinesis Streams
                                              • Kafka version supported by Data Collector
                                              • MapR Streams version supported by Data Collector
                                            Note: In a development environment, you can also use SDC RPC to aggregate pipeline statistics. Using SDC RPC to aggregate statistics is not highly available and might cause the loss of some data. It should be used for development purposes only.
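
The Data Collector version thresholds listed in the table above can be expressed as a small lookup, which may help when auditing a fleet of registered Data Collectors. This is an illustrative Python sketch only. The minimum versions come from the list above, but the function names and the short feature labels are assumptions, and the parser expects purely numeric versions such as `3.4.0` (a suffix such as `-SNAPSHOT` would need to be stripped first).

```python
def parse_version(text):
    """Turn '3.4.0' or '3.0.0.0' into a 4-part tuple so versions compare numerically."""
    parts = [int(p) for p in text.split(".")]
    # Pad short versions with zeros so that 3.4.0 and 3.4.0.0 compare as equal.
    return tuple(parts + [0] * (4 - len(parts)))


# Minimum versions taken from the list above; the labels are illustrative summaries.
FEATURE_MINIMUMS = [
    ("2.1.0.0", "design pipelines and run standalone and cluster pipelines from jobs"),
    ("3.0.0.0", "authoring Data Collector for Pipeline Designer"),
    ("3.2.0.0", "authoring Data Collector for pipeline fragments"),
    ("3.4.0", "CPU load and memory usage monitoring"),
    ("3.5.0", "Data Protector (also needs the Data Protector stage library)"),
]


def supported_features(sdc_version):
    """List the capabilities whose minimum version is met by sdc_version."""
    current = parse_version(sdc_version)
    return [
        label
        for minimum, label in FEATURE_MINIMUMS
        if current >= parse_version(minimum)
    ]
```

Comparing tuples of integers rather than raw strings avoids the common mistake of ranking `3.10.0` below `3.9.0`.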

Relational Database Requirements

Control Hub supports MariaDB, MySQL, or PostgreSQL for the relational database.

MariaDB Requirements

The relational database for a single Control Hub instance supports MariaDB 10.0 through 10.3.

The relational database for a highly available Control Hub system supports MariaDB Galera Cluster 10.0 through 10.3.

MariaDB installations must meet the following minimum requirements:

Component    Single Installation   Multiple Installations for High Availability
CPU          4                     4
RAM          30.5 GB               30.5 GB
Disk space   50 GB                 100 GB

If installing on Amazon EC2 instances, use the following instance types:
  • Single installation - db.r3.xlarge
  • Multiple installations for high availability - db.r3.xlarge

MySQL Requirements

The relational database for a single Control Hub instance supports MySQL 5.6 or 5.7.

The relational database for a highly available Control Hub system supports MySQL Enterprise High Availability 5.6 or 5.7.

MySQL installations must meet the following minimum requirements:

Component    Single Installation   Multiple Installations for High Availability
CPU          4                     4
RAM          30.5 GB               30.5 GB
Disk space   50 GB                 100 GB

If installing on Amazon EC2 instances, use the following instance types:
  • Single installation - db.r3.xlarge
  • Multiple installations for high availability - db.r3.xlarge

PostgreSQL Requirements

The relational database for a single Control Hub instance supports PostgreSQL 9.4 or 9.6.

The relational database for a highly available Control Hub system supports PostgreSQL 9.4 or 9.6 with high availability enabled.

PostgreSQL installations must meet the following minimum requirements:

Component    Single Installation   Multiple Installations for High Availability
CPU          4                     4
RAM          30.5 GB               30.5 GB
Disk space   50 GB                 100 GB

If installing on Amazon EC2 instances, use the following instance types:
  • Single installation - db.r3.xlarge
  • Multiple installations for high availability - db.r3.xlarge
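
The supported relational database versions from the three sections above can be captured in a small lookup, for example to validate a version string before pointing Control Hub at a database. The following Python sketch is illustrative and not part of the product. The `SUPPORTED` table and the `is_supported` function are hypothetical names, and reading "10.0 through 10.3" as the inclusive set 10.0, 10.1, 10.2, and 10.3 is an interpretation of the text above.

```python
# Supported major.minor releases, transcribed from the sections above.
# Assumption: "MariaDB 10.0 through 10.3" means every minor release in that range.
SUPPORTED = {
    "mariadb": {(10, 0), (10, 1), (10, 2), (10, 3)},
    "mysql": {(5, 6), (5, 7)},
    "postgresql": {(9, 4), (9, 6)},
}


def is_supported(engine, version):
    """Check a version string such as '10.2.14' against the supported releases."""
    # Only the first two numeric components decide support; patch levels are ignored.
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) in SUPPORTED[engine.lower()]
```

Note that PostgreSQL 9.5 is rejected by this check because the text above names only 9.4 and 9.6.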

Time Series Database Requirements

The time series database for a single Control Hub instance supports InfluxDB 1.3 or 1.7.

The time series database for a highly available Control Hub system supports InfluxDB Enterprise 1.3 or 1.7, with a minimum of 2 data nodes and 3 meta nodes.

InfluxDB installations must meet the following minimum requirements:

Component    Single Installation   Multiple Installations for High Availability
CPU          4                     8
RAM          30.5 GB               61 GB
Disk space   250 GB                500 GB

If installing on Amazon EC2 instances, use the following instance types:
  • Single installation - r4.xlarge
  • Multiple installations for high availability - r3.2xlarge

Default Ports

The following table lists the default ports exposed to Control Hub clients and how they are used. Note that the default port numbers can be changed during installation.

In a development environment, configure network routes and firewalls so that web UI clients and registered Data Collectors, Edge Data Collectors, and Provisioning Agents can reach the Control Hub IP addresses.

In a highly available production environment, configure network routes and firewalls so that the Control Hub instances, web UI clients, and registered Data Collectors, Edge Data Collectors, and Provisioning Agents can reach the load balancer.

System                   Default Port                                        Protocol   Usage
Control Hub              • HTTP - 18631                                      TCP        Access to the Control Hub web-based UI and API for a single Control Hub instance in a development environment.
                         • HTTPS - depends on dpm.properties configuration              Used by developers and administrators to access the UI. Used by registered Data Collectors, Edge Data Collectors, and Provisioning Agents to access the API.
Control Hub Admin tool   • HTTP - 18632                                      TCP        Access to the Control Hub Admin tool web-based UI for a single Control Hub instance in a development environment.
                         • HTTPS - depends on dpm.properties configuration              Used by administrators to access the UI.
Load balancer            Depends on the chosen load balancer                 TCP        When using multiple Control Hub instances in a highly available production environment, both Control Hub and the Control Hub Admin tool are accessed through a load balancer.

The following table lists the default ports of the external systems that Control Hub depends on and how they are used. These systems can be configured to use other ports, so confirm the actual numbers with your systems administrator.

External System   Default Port   Protocol   Usage
MariaDB           3306           TCP        Relational database that stores Control Hub application data.
MySQL             3306           TCP        Relational database that stores Control Hub application data.
PostgreSQL        5432           TCP        Relational database that stores Control Hub application data.
InfluxDB          8086           TCP        Time series database that stores metrics.
LDAP or LDAPS     389, 636       TCP        Used when Control Hub is configured for LDAP or LDAPS authentication.
SMTP              465            TCP        Used by Control Hub to send email notifications.
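
Because every port in the tables above uses TCP, a plain socket connection attempt is enough to confirm that firewalls and routes allow traffic. The following is a minimal Python sketch under stated assumptions. The port numbers are the defaults listed above (with 389 for LDAP and 636 for LDAPS, per their standard assignments), the dictionary and function names are hypothetical, and any host names you pass in are your own. A successful connection shows only that the port is reachable, not that the expected service is healthy.

```python
import socket

# Default ports from the tables above; actual deployments may use different ones.
DEFAULT_PORTS = {
    "Control Hub (HTTP)": 18631,
    "Control Hub Admin tool (HTTP)": 18632,
    "MariaDB / MySQL": 3306,
    "PostgreSQL": 5432,
    "InfluxDB": 8086,
    "LDAP": 389,
    "LDAPS": 636,
    "SMTP": 465,
}


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

For example, `can_connect("influx.example.com", DEFAULT_PORTS["InfluxDB"])` checks the time series database from the Control Hub machine, where `influx.example.com` is a placeholder host name.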