High Availability

You can run a single StreamSets Control Hub instance in a development environment. In a production environment, however, we recommend running multiple Control Hub instances behind a load balancer to ensure that Control Hub is highly available.

To set up Control Hub as a highly available system, complete the following tasks:
Use highly available database clusters
Use highly available database clusters:
  • For the relational database, use MariaDB Galera Cluster, MySQL Enterprise High Availability, or PostgreSQL with high availability enabled.
  • For the time series database, use InfluxEnterprise.
Install Control Hub on multiple machines
Install Control Hub on multiple machines, ensuring that each Control Hub instance uses the same relational database, the same time series database, and the same SMTP account for emails.
Set up a load balancer for Control Hub
Set up a load balancer to distribute requests from users and from registered Data Collectors, Data Collector Edge instances, and Provisioning Agents across the instances in the Control Hub system. These Control Hub clients use the load balancer URL to communicate with the Control Hub system. In addition, each Control Hub instance accesses the front end of the load balancer to communicate with the other Control Hub instances.
We recommend using a Layer 7 load balancer such as HAProxy, NGINX, or F5. As a best practice, use multiple instances of the load balancer to ensure that the load balancer is also highly available.
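To make the load balancer's role concrete, the following Python sketch illustrates the general idea of distributing requests across several Control Hub instances while skipping instances that do not respond. It is only an illustration, not a replacement for HAProxy, NGINX, or F5, and it does not reflect how those products are implemented. The class name, the function names, and the example.com host names are hypothetical, and the health check is a plain HTTP GET against the base URL rather than any documented Control Hub endpoint.

```python
import urllib.error
import urllib.request
from typing import Callable, List, Optional


def http_health_check(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if an HTTP GET to base_url answers with a status below 500.

    Assumption: any non-5xx answer means the instance is up. A real load
    balancer would probe a dedicated health endpoint instead.
    """
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return resp.status < 500
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status code.
        return err.code < 500
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, and similar.
        return False


class RoundRobinPool:
    """Pick backends in round-robin order, skipping unhealthy ones."""

    def __init__(
        self,
        backends: List[str],
        is_healthy: Callable[[str], bool] = http_health_check,
    ) -> None:
        if not backends:
            raise ValueError("at least one backend is required")
        self._backends = list(backends)
        self._is_healthy = is_healthy
        self._next = 0  # index of the backend to try first on the next pick

    def pick(self) -> Optional[str]:
        """Return the next healthy backend, or None if every backend is down."""
        count = len(self._backends)
        for offset in range(count):
            index = (self._next + offset) % count
            candidate = self._backends[index]
            if self._is_healthy(candidate):
                # Resume after this backend so that load is spread evenly.
                self._next = (index + 1) % count
                return candidate
        return None
```

For example, a pool created with the hypothetical URLs https://sch-1.example.com, https://sch-2.example.com, and https://sch-3.example.com returns them in turn from pick(), and skips any instance whose health check fails. Running several load balancer instances, as recommended above, addresses the remaining single point of failure, which this sketch does not attempt to model.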

The following image displays the components of a highly available Control Hub system: