StreamSets solutions architect Alex Woolford is a data engineer with deep experience building robust and scalable solutions using technologies such as the StreamSets DataOps Platform, Apache Kafka, and the Cloudera and Hortonworks Hadoop distributions. In his role at StreamSets, Alex provides our customers with expertise including architecture design, demonstration systems, prototypes, presentations, and product configurations. Alex also loves to share his experience via his YouTube channel. So far, he has created about 30 videos on StreamSets products – here’s a selection of six that provide a great introduction to the features and capabilities of StreamSets Control Hub. If these videos pique your interest in Control Hub, sign up here for a free 30-day trial!
A Quick Introduction to StreamSets Control Hub
In this first video, Alex covers the basics of Control Hub: how to register a StreamSets Data Collector instance with Control Hub, import a pipeline from Data Collector into Control Hub, and create a job that runs the pipeline on a Data Collector instance.
Pipeline Management and Version Control with StreamSets Control Hub
Alex shows how to externalize configuration values such as usernames, hostnames, and passwords from pipelines as runtime parameters, how to import a pipeline into Control Hub, and how to override the default parameter values when creating a Control Hub job. Moving on to Control Hub’s pipeline repository, Alex explains how to view the differences between pipeline versions and how to configure a webhook that sends a notification when a new pipeline version is committed. Finally, Alex shows how the StreamSets SDK for Python can create pipelines programmatically, which makes automated testing possible.
- Runtime Parameters
- Importing Pipelines
- Comparing Pipeline Versions
- Configuring Webhooks
- StreamSets SDK for Python
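The idea of overriding default runtime parameters at job creation can be illustrated with a small plain-Python sketch. This is not the StreamSets SDK for Python: the function `resolve_parameters` and the parameter names `DB_HOST` and `DB_USER` are hypothetical, and rejecting unknown keys is a choice made for this sketch rather than documented Control Hub behavior.

```python
from typing import Dict


def resolve_parameters(defaults: Dict[str, str], overrides: Dict[str, str]) -> Dict[str, str]:
    """Merge job-level overrides onto pipeline-level default parameters.

    Hypothetical helper for illustration only; it does not call any
    StreamSets API.
    """
    # Assumption of this sketch: a job may only override parameters
    # that the pipeline already declares.
    unknown = set(overrides) - set(defaults)
    if unknown:
        raise KeyError(f"unknown runtime parameters: {sorted(unknown)}")
    # Later keys win, so job overrides replace the pipeline defaults.
    return {**defaults, **overrides}


# Defaults declared on the pipeline (hypothetical names and values).
pipeline_defaults = {"DB_HOST": "localhost", "DB_USER": "dev_user"}

# Values supplied when the job is created.
job_overrides = {"DB_HOST": "db.prod.example.com"}

resolved = resolve_parameters(pipeline_defaults, job_overrides)
print(resolved)
```

The same pipeline definition can then be reused across environments, with each job supplying only the values that differ.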
Jobs and Topologies in StreamSets Control Hub
Control Hub allows you to assign pipelines to Data Collector instances via jobs, and group multiple jobs into a topology that provides an end-to-end view of data moving across the enterprise. In this video, Alex explains how pipelines, jobs and topologies are related, and walks you through the process of creating them.
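One way to picture how the three concepts relate is a minimal data model. The sketch below is only an illustration in plain Python; the class and field names are assumptions made for this example and do not reflect Control Hub's internal schema or the StreamSets SDK.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Pipeline:
    """A versioned dataflow definition (hypothetical model)."""
    name: str
    version: int


@dataclass
class Job:
    """Assigns one pipeline to one or more Data Collector instances."""
    name: str
    pipeline: Pipeline
    data_collectors: List[str]


@dataclass
class Topology:
    """Groups related jobs into an end-to-end view."""
    name: str
    jobs: List[Job] = field(default_factory=list)

    def pipelines(self) -> List[str]:
        # Pipeline names in the order their jobs were added.
        return [job.pipeline.name for job in self.jobs]


# Hypothetical example: two jobs that together form one data flow.
ingest = Job("ingest-job", Pipeline("files_to_kafka", 1), ["sdc-1", "sdc-2"])
load = Job("load-job", Pipeline("kafka_to_warehouse", 3), ["sdc-3"])
topology = Topology("orders-end-to-end", [ingest, load])

print(topology.pipelines())
```

Here a pipeline is the design, a job is that design running on specific Data Collector instances, and a topology is the collection of jobs that make up a complete flow.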
A Quick Introduction to LDAP and SAML with StreamSets Control Hub
Alex demonstrates Control Hub’s LDAP integration, showing how user identities can be centrally managed in an LDAP directory such as Active Directory or Red Hat FreeIPA. Control Hub can use LDAP group membership when assigning permissions to users, and Alex uses this to control which pipelines are visible to particular users. Alex wraps up this video by taking a quick look at single sign-on via SAML.
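Group-based visibility of this kind can be sketched as a simple set intersection. The example below is a conceptual illustration only: the function `visible_pipelines`, the group names, and the pipeline names are all hypothetical, and Control Hub's actual permission model is richer than this.

```python
from typing import Dict, List, Set


def visible_pipelines(user_groups: Set[str], pipeline_acl: Dict[str, Set[str]]) -> List[str]:
    """Return the pipelines shared with at least one of the user's groups.

    Hypothetical helper; it does not query LDAP or Control Hub.
    """
    return sorted(
        name
        for name, allowed_groups in pipeline_acl.items()
        if user_groups & allowed_groups  # non-empty intersection means access
    )


# Hypothetical mapping of pipeline name to the groups it is shared with.
acl = {
    "orders_ingest": {"data-eng"},
    "hr_export": {"hr-admins"},
    "clickstream": {"data-eng", "analytics"},
}

print(visible_pipelines({"analytics"}, acl))
print(visible_pipelines({"data-eng"}, acl))
```

Because access is tied to groups rather than individuals, adding or removing a user from a group in the directory changes what that user can see without touching the pipelines themselves.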
Configure StreamSets Control Hub to Authenticate using Google’s SAML
Alex explains how to set up single sign-on between Google and Control Hub using the SAML protocol, walking you through the process in detail.
How to Deploy Data Collector pipelines on Kubernetes with Control Hub
Control Hub can use Kubernetes to provision Data Collector instances. Alex shows how to configure the StreamSets Control Agent with Minikube, which runs a single-node Kubernetes cluster, and then creates a Control Hub job that runs on multiple Data Collector instances simultaneously, writing IoT data to a Kafka topic.
StreamSets Control Hub Free Trial
As I mentioned in the opening, if you’re running StreamSets Data Collector, but not already using StreamSets Control Hub, you might be interested to know that we are currently running a free Control Hub trial. Sign up here to gain free access to Control Hub for 30 days, and learn how you can better implement DataOps in your enterprise.