Control. We always want it, regularly don’t get it, yet in business it’s a must-have to ensure things run as expected. Control is particularly critical when it comes to moving data around your company. Without it, it’s difficult to know where data is coming from, where it’s going and how it’s been manipulated (and by whom!) along the way. At StreamSets, we specialize in helping customers effectively move data around their business, from any source to any destination. Over time, we’ve observed that most organizations lack the controls needed to ensure that dataflow pipelines are built, executed and operated in a manner that meets the needs of the application or business process they support.
Specifically, we see customers facing the following challenges:
- Data engineering teams that are growing in number and working in silos need tools to help them collaborate more effectively on dataflow pipeline development.
- Organizations need to scale up their development and architecture efforts by using toolkits, templates and APIs.
- As the number and complexity of dataflow pipelines grow, keeping tabs on them all becomes an onerous task, especially as they evolve over their lifecycle.
- Understanding how pipelines are intertwined and interact becomes difficult. This is particularly important for situations requiring full visibility into end-to-end data movement for compliance reasons.
To address these challenges, we’re excited to announce StreamSets Control Hub (SCH) as a way for teams to gain better control over all their dataflow pipelines. SCH extends the value of the StreamSets Data Operations Platform by delivering capabilities that help teams streamline how they develop, deploy and execute pipelines. Combined with StreamSets Data Collector and StreamSets Dataflow Performance Manager, SCH helps customers mature their approach to data movement and data operations.
StreamSets Control Hub addresses the challenges noted above by bringing the following capabilities to the StreamSets platform:
- Cloud-based design tool & shared pipeline repository: A cloud-based design tool with the same UI functionality as StreamSets Data Collector. SCH adds a shared pipeline repository for team collaboration and pipeline lifecycle management.
- Automated deployment and provisioning: Automatically deploy pipelines created in StreamSets Data Collector (SDC) or StreamSets Data Collector Edge, and elastically scale them via Kubernetes.
- Architecture-wide visibility and control: Get a complete view of your architecture by visualizing multiple pipelines as a single dataflow topology. SCH integrates directly with StreamSets Dataflow Performance Manager™ for up-to-the-moment dataflow insights and the ability to set Data SLAs.
- Data governance support: Flow metadata throughout dataflow pipelines, with built-in processors to store metadata at any point in a dataflow. Metadata pushdown integrates with Cloudera Navigator™ and Apache Atlas™.
We couldn’t be more excited about the introduction of StreamSets Control Hub. As we strive to help customers along their path to success with building, executing and operating dataflow pipelines, SCH is an ideal solution to help teams collaborate better and scale their efforts more effectively. We encourage you to check out our product page to learn more.