Edge Data Collectors Overview

Control Hub uses StreamSets Data Collector Edge (SDC Edge) to execute edge pipelines. SDC Edge is a lightweight agent that runs pipelines on edge devices with limited resources.

You install Edge Data Collectors on edge devices and then register them to work with Control Hub. When you register an SDC Edge, you assign it labels. The labels determine which jobs run on that SDC Edge.
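As a rough illustration of how labels can steer jobs to particular agents, the following Python sketch models the matching. It assumes that a job runs on every registered SDC Edge that carries all of the job's labels. That rule and the names `EdgeAgent`, `Job`, and `eligible_agents` are hypothetical stand-ins for illustration, not Control Hub APIs.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class EdgeAgent:
    """Hypothetical stand-in for a registered SDC Edge."""
    name: str
    labels: frozenset = field(default_factory=frozenset)


@dataclass(frozen=True)
class Job:
    """Hypothetical stand-in for a Control Hub job."""
    name: str
    labels: frozenset = field(default_factory=frozenset)


def eligible_agents(job, agents):
    """Return the agents whose labels include every label on the job.

    The "all job labels must be present" rule is an assumption of this sketch.
    """
    return [agent for agent in agents if job.labels <= agent.labels]


agents = [
    EdgeAgent("edge-1", frozenset({"factory", "west"})),
    EdgeAgent("edge-2", frozenset({"factory", "east"})),
]
job = Job("sensor-job", frozenset({"factory", "west"}))

# Names of the agents that would be selected under the assumed rule.
selected = [agent.name for agent in eligible_agents(job, agents)]
print(selected)
```

In this sketch only `edge-1` carries both labels on the job, so it is the only agent selected.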

You use an authoring Data Collector to design edge pipelines. You can design edge pipelines in the Control Hub Pipeline Designer after selecting an available authoring Data Collector to use. Alternatively, you can log in directly to an authoring Data Collector to design edge pipelines.

Edge pipelines run in edge execution mode. Edge pipelines work in tandem with pipelines running in standalone execution mode on Data Collector. Edge pipelines are bidirectional: they can both send data to other pipelines and receive data from other pipelines. To use edge pipelines, you'll work with the following types of pipelines:
Edge sending pipeline
An edge sending pipeline runs on SDC Edge. It uses an origin specific to the edge device to read local data residing on the device. The pipeline can perform minimal processing on the data before sending the data to a Data Collector receiving pipeline.
Data Collector receiving pipeline
A Data Collector receiving pipeline runs on Data Collector. It reads the data written by the destination of the edge sending pipeline. Depending on the system, an intermediary message broker might be required between the two pipelines. The Data Collector receiving pipeline performs more complex processing on the data as needed, and then writes the data to the final destinations.
Edge receiving pipeline
An edge receiving pipeline runs on SDC Edge. It listens for data sent by another pipeline running on Data Collector or on SDC Edge and then acts on that data to control the edge device.
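
The division of work among the three pipeline types can be pictured with a toy Python model. This is only a conceptual sketch: real pipelines are built from origins, processors, and destinations and exchange data over a network, possibly through a message broker, whereas here each pipeline is a plain function and data moves as in-memory lists. The function names, the `temp_c` field, the threshold, and the fan commands are all invented for illustration.

```python
def edge_sending_pipeline(local_readings):
    """Models the pipeline on SDC Edge: read local data, do minimal processing.

    Here the minimal processing is dropping readings with no temperature.
    """
    return [r for r in local_readings if r.get("temp_c") is not None]


def data_collector_receiving_pipeline(records, threshold_c=80.0):
    """Models the pipeline on Data Collector: heavier processing.

    Here each record is enriched with a hypothetical 'overheated' flag.
    """
    return [{**r, "overheated": r["temp_c"] > threshold_c} for r in records]


def edge_receiving_pipeline(records):
    """Models the pipeline on SDC Edge that acts on incoming data.

    Here it turns each record into a hypothetical device command.
    """
    return ["fan_on" if r["overheated"] else "fan_off" for r in records]


readings = [{"temp_c": 85.0}, {"temp_c": None}, {"temp_c": 40.0}]

sent = edge_sending_pipeline(readings)
processed = data_collector_receiving_pipeline(sent)
commands = edge_receiving_pipeline(processed)
print(commands)
```

The point of the sketch is the split in responsibility: filtering stays on the constrained device, enrichment happens on Data Collector, and the edge receiving side only reacts to what it is sent.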

For more information about designing edge pipelines, including the supported stages, see Edge Pipelines.

After designing edge pipelines, you publish them to Control Hub. Within Control Hub, you add edge pipelines to jobs that run on an SDC Edge. You add the Data Collector receiving pipelines to jobs that run on an execution Data Collector.
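
The placement rule above (edge pipelines go into jobs that run on an SDC Edge, Data Collector receiving pipelines go into jobs that run on an execution Data Collector) can be summarized in a short Python sketch. The `PipelineType` enum and the `job_target` function are hypothetical names used only to restate the rule, not part of any Control Hub interface.

```python
from enum import Enum


class PipelineType(Enum):
    """The three pipeline types described in this overview."""
    EDGE_SENDING = "edge sending"
    EDGE_RECEIVING = "edge receiving"
    DATA_COLLECTOR_RECEIVING = "Data Collector receiving"


def job_target(pipeline_type):
    """Return the kind of engine that runs a job containing this pipeline type."""
    if pipeline_type in (PipelineType.EDGE_SENDING, PipelineType.EDGE_RECEIVING):
        return "SDC Edge"
    return "execution Data Collector"


# Print the target engine for each pipeline type.
for pipeline_type in PipelineType:
    print(f"{pipeline_type.value} pipeline -> {job_target(pipeline_type)}")
```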

For more details about how Control Hub and SDC Edge work together, see SDC Edge Communication.