SDC Edge Communication

StreamSets Control Hub works with Data Collector Edge (SDC Edge) to execute edge pipelines. SDC Edge is a lightweight agent that runs pipelines on edge devices with limited resources.

You install each SDC Edge on an edge device in your corporate network, and then register it to work with Control Hub.

You use an authoring Data Collector to design edge pipelines. You can design edge pipelines in the Control Hub Pipeline Designer after selecting an available authoring Data Collector to use. Alternatively, you can log in directly to an authoring Data Collector and design edge pipelines in the Data Collector UI.

To preview and validate edge pipelines as you design them, the authoring Data Collector must connect to a registered SDC Edge. The SDC Edge accepts inbound connections from the authoring Data Collector over HTTP or HTTPS on the port number configured for the SDC Edge.

Registered Edge Data Collectors use encrypted REST APIs to communicate with Control Hub. Edge Data Collectors initiate outbound connections to Control Hub over HTTPS on port number 443.
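
As a planning aid for firewall rules, the two connection paths described above can be written down as data. The following is a minimal sketch, not part of the product: the `Connection` class, the `required_connections` function, and the example port value are hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Connection:
    initiator: str  # component that opens the connection
    target: str     # component that accepts the connection
    scheme: str     # "http" or "https"
    port: int


def required_connections(edge_port: int, edge_uses_https: bool = False) -> list[Connection]:
    """List the connections described in the text for one registered SDC Edge."""
    edge_scheme = "https" if edge_uses_https else "http"
    return [
        # Preview and validation: the authoring Data Collector connects in to the SDC Edge.
        Connection("authoring Data Collector", "SDC Edge", edge_scheme, edge_port),
        # Registered operation: the SDC Edge connects out to Control Hub on port 443.
        Connection("SDC Edge", "Control Hub", "https", 443),
    ]


if __name__ == "__main__":
    # 18633 is only an example value; use the port configured for your SDC Edge.
    for c in required_connections(edge_port=18633):
        print(f"{c.initiator} -> {c.target}: {c.scheme} on port {c.port}")
```

Note that only the first connection is inbound to the edge device. The second is initiated by the SDC Edge, so Control Hub never needs to open a connection into your network.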

The following image shows how each SDC Edge communicates with Control Hub and with the authoring Data Collector:

SDC Edge Requests

Just like Data Collector, a registered SDC Edge sends requests and information to Control Hub.

Control Hub does not send requests directly to an SDC Edge. Instead, Control Hub uses encrypted REST APIs to send requests to a messaging queue that Control Hub manages. Each SDC Edge periodically checks the queue to retrieve the requests that Control Hub sent.
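
The indirect request model can be illustrated with a small in-memory stand-in for the queue. This is a simplified sketch under stated assumptions: the `MessagingQueue` class, its method names, and the edge identifiers are hypothetical, and the real queue is a service managed by Control Hub and reached over encrypted REST APIs.

```python
from collections import defaultdict, deque


class MessagingQueue:
    """Toy in-memory stand-in for the Control Hub messaging queue."""

    def __init__(self) -> None:
        # One FIFO of pending requests per SDC Edge identifier.
        self._pending = defaultdict(deque)

    def send_request(self, edge_id: str, request: str) -> None:
        """Control Hub side: queue a request instead of calling the SDC Edge."""
        self._pending[edge_id].append(request)

    def retrieve_requests(self, edge_id: str) -> list[str]:
        """SDC Edge side: collect and clear everything waiting for this edge."""
        waiting = self._pending[edge_id]
        requests = list(waiting)
        waiting.clear()
        return requests


if __name__ == "__main__":
    queue = MessagingQueue()
    queue.send_request("edge-1", "start pipeline")
    queue.send_request("edge-2", "stop pipeline")

    print(queue.retrieve_requests("edge-1"))  # ['start pipeline']
    print(queue.retrieve_requests("edge-1"))  # [] (nothing new since the last check)
    print(queue.retrieve_requests("edge-2"))  # ['stop pipeline']
```

In this model the SDC Edge is always the caller, which is why a request addressed to one SDC Edge simply waits in the queue until that SDC Edge next checks in.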

SDC Edge communicates with Control Hub in the following areas:

Metrics
Every minute, an SDC Edge sends metrics for remotely running edge pipelines directly to Control Hub.
Messaging queue
Edge Data Collectors send the following information to the messaging queue:
  • At startup, an SDC Edge sends its version, its HTTP URL, and the labels configured in the SDC Edge configuration file, edge.conf.
  • Every five seconds, an SDC Edge sends a heartbeat and any status changes for remote edge pipelines.
  • Every minute, an SDC Edge sends the last-saved offsets of remotely running edge pipelines and the status of all running edge pipelines.
Every three seconds, Control Hub checks the messaging queue to retrieve pipeline status changes and last-saved offsets sent by each SDC Edge.
Every five seconds, each SDC Edge checks with the messaging queue to retrieve requests sent by Control Hub. When you start, stop, or delete a job, Control Hub sends a pipeline request for a specific SDC Edge to the messaging queue. The messaging queue retains the request until the receiving SDC Edge retrieves the request.
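
The intervals listed above can be collected into a small schedule to reason about timing. The following sketch is illustrative only: the task labels, `due_tasks`, and `worst_case_wait` are hypothetical names, all timers are assumed to start together at second 0, and the wait estimates ignore network and processing time.

```python
# Intervals in seconds, taken from the description above.
EDGE_TASKS = {
    "send heartbeat and status changes to queue": 5,
    "retrieve Control Hub requests from queue": 5,
    "send offsets and pipeline status to queue": 60,
    "send metrics to Control Hub": 60,
}
CONTROL_HUB_TASKS = {
    "retrieve status changes and offsets from queue": 3,
}


def due_tasks(tasks: dict[str, int], second: int) -> list[str]:
    """Return the tasks whose interval evenly divides the elapsed second."""
    return [name for name, interval in tasks.items() if second % interval == 0]


def worst_case_wait(*intervals: int) -> int:
    """Upper bound, in seconds, on the wait across consecutive polling hops."""
    return sum(intervals)


if __name__ == "__main__":
    for second in (3, 5, 15, 60):
        print(second, due_tasks(EDGE_TASKS, second), due_tasks(CONTROL_HUB_TASKS, second))

    # A job request waits for at most one SDC Edge polling interval.
    print(worst_case_wait(5))     # 5
    # A status change waits for the next SDC Edge send, then the next Control Hub check.
    print(worst_case_wait(5, 3))  # 8
```

Under these assumptions, a start, stop, or delete request should reach its SDC Edge within about five seconds, and a pipeline status change should become visible to Control Hub within about eight seconds. Treat both figures as rough estimates rather than guarantees.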