Architecture
StreamSets Control Hub architecture includes applications that manage user requests and databases that store application metadata and time series data.
System Data Collector
Data Collector Communication
StreamSets Control Hub works with Data Collector to design pipelines and to execute standalone and cluster pipelines.
SDC Edge Communication
StreamSets Control Hub works with Data Collector Edge (SDC Edge) to execute edge pipelines. SDC Edge is a lightweight agent that runs pipelines on edge devices with limited resources.
Provisioning Agent Communication
A Provisioning Agent is a containerized application that runs in a container orchestration framework, such as Kubernetes. The agent automatically provisions Data Collector containers in the Kubernetes cluster on which it runs.
High Availability
Authentication
Installation Requirements
Overview
Creating the Databases
Install the required database software and create the databases before installing StreamSets Control Hub.
Installing Control Hub
You can install Control Hub on the same machine as the required databases or on a remote machine. For best performance, we recommend installing Control Hub on a remote machine, separate from the databases.
Enabling LDAP Authentication
If your company uses Lightweight Directory Access Protocol (LDAP), you can use the LDAP provider to authenticate Control Hub users. LDAP authenticates a user using the credentials stored in the LDAP server.
Enabling HTTPS
To secure communications to the Control Hub UI and REST API, enable both Control Hub and the separate Admin tool to use HTTPS. HTTPS requires an SSL/TLS certificate.
Enabling Data Protector
StreamSets Data Protector enables in-stream discovery of data in motion and provides a range of capabilities to implement complex data protection policies.
Setting Up a Highly Available Environment
In a production environment, we recommend using multiple Control Hub instances and a load balancer to ensure that Control Hub is highly available.
Uninstalling Control Hub
Overview
Organizations
An organization is a secure space provided to a set of Control Hub users. All Data Collectors, pipelines, jobs, topologies, and other objects added by any user in the organization belong to that organization. A user logs in to Control Hub as a member of an organization and can access only the data that belongs to that organization.
Dashboards
Messaging View
Pipeline Templates
Data Collector Version Range
Administer Control Hub Applications
Logs
Control Hub Configuration
You can edit StreamSets Control Hub configuration files to configure properties such as the host name, port number, and SMTP account information for emails. You can also customize Control Hub to display your company logo instead of the StreamSets logo in the user interface.
Control Hub Environment Configuration
Starting Control Hub
Shutting Down Control Hub
Renewing the Control Hub License
Control Hub Admin Tool
StreamSets Control Hub includes a separate Admin tool. Use the Control Hub Admin tool to monitor and troubleshoot Control Hub issues. For example, if Control Hub becomes inaccessible, the Admin tool remains running, so you can still log in to the Admin tool to troubleshoot the Control Hub issues.
Overview
Preparing for the Upgrade
Before upgrading Control Hub, shut down and back up the previous Control Hub version, and then back up the Control Hub databases. Depending on the version that you are upgrading from, you might also need to create new required databases.
Upgrading the Initial Control Hub Instance
If you are upgrading a development environment, follow these instructions to upgrade the single Control Hub instance.
Upgrading a Highly Available Environment
For a highly available production environment, upgrade the additional Control Hub instances and update the load balancer used by Control Hub.
Starting and Logging Into the Upgraded Control Hub
Completing Post Upgrade Tasks
In some situations, you must complete tasks within Control Hub after you upgrade.
© 2018 StreamSets, Inc.