You can’t manage what you can’t see. StreamSets Control Hub gives you a live data map based on metadata emitted by StreamSets Data Collectors. For the first time, you can visualize connections between sources, stores and other infrastructure. You can measure throughput, latency and errors across any path on the map, be it end to end, point to point, or simply into and out of an individual system.
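To make the idea of measuring "any path on the map" concrete, here is a minimal sketch of how per-hop metrics could roll up into path-level figures. It is not the Control Hub API: the `Hop` class, the `path_metrics` function and the sample numbers are all hypothetical. It also assumes that a path's throughput is limited by its slowest hop, while latency and errors add up along the path.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hop:
    """One hypothetical edge of a data map: a flow from one system to the next."""
    source: str
    target: str
    records_per_sec: float
    latency_ms: float
    error_count: int


def path_metrics(hops):
    """Summarize a chain of connected hops into path-level metrics.

    Assumptions (illustrative only): throughput is bounded by the slowest
    hop, while latency and error counts accumulate along the path.
    """
    if not hops:
        raise ValueError("a path needs at least one hop")
    # Each hop must start where the previous one ended.
    for prev, nxt in zip(hops, hops[1:]):
        if prev.target != nxt.source:
            raise ValueError(f"{prev.target!r} does not connect to {nxt.source!r}")
    return {
        "throughput": min(h.records_per_sec for h in hops),
        "latency_ms": sum(h.latency_ms for h in hops),
        "errors": sum(h.error_count for h in hops),
    }


# Made-up sample data: a two-hop path from a message queue to a store.
hops = [
    Hop("kafka", "collector", 1200.0, 15.0, 2),
    Hop("collector", "hdfs", 900.0, 40.0, 1),
]

end_to_end = path_metrics(hops)          # the whole path
point_to_point = path_metrics(hops[:1])  # a single hop
```

The same function covers both the end-to-end and the point-to-point view; traffic into and out of a single system would be the hops whose `target` or `source` is that system.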
As data volumes grow, your dataflows must not become bottlenecks when you funnel real-time data to critical applications. StreamSets Data Collector, our data movement execution engine, runs natively and automatically on YARN or Mesos clusters or within Kubernetes environments. It also supports multithreaded operation and high availability.
Developers and data scientists across your business are building data movement pipelines. StreamSets helps you centralize the design of dataflow logic to facilitate collaboration, reuse and the establishment of best practices and corporate standards. It includes a pipeline repository and version control for dataflows to support a disciplined production process.