Dataflow Performance Blog

Managing Data Operations on the Edge

Together, StreamSets Control Hub (SCH) and StreamSets Data Collector Edge (SDC Edge) allow you to create, deploy and run dataflow pipelines in an unprecedented variety of environments. In this short series of videos, I'll show you how to install SDC Edge on a Raspberry Pi, how to get started building edge pipelines with SCH's Pipeline Designer, and how SDC Edge and its big brother, StreamSets Data Collector (SDC), work together to move data all the way from IoT sensors to the heart of your data infrastructure.


Last November, StreamSets announced StreamSets Data Collector Edge, or SDC Edge for short. SDC Edge is an ultra-lightweight agent that runs dataflow pipelines created with StreamSets Data Collector. Written in Go, SDC Edge can run natively on Linux, Windows, Mac, Android and iOS with an executable less than 5 MB in size.

Just a couple of weeks later, we announced StreamSets Control Hub (SCH), the basis of the StreamSets Data Operations Platform. SCH includes a web-based design tool and a shared pipeline repository, allowing you to create, deploy and monitor pipelines on both SDC and SDC Edge.

Together, SCH and SDC Edge form a seamless combination, streamlining data operations for even the most constrained environments. Let's get started by installing SDC Edge on a Raspberry Pi. Here's my Pi with its prototyping shield and BMP280 sensor.

Raspberry Pi with Shield

Getting Started with StreamSets Data Collector Edge

Clicking Download a new SDC Edge in SCH creates a tarball that includes the SDC Edge binary and all the configuration needed for SDC Edge to register with SCH, ready to run pipelines. Let's take a look, and build the simplest possible pipeline to test SDC Edge.

Sending Data to StreamSets Data Collector

Our simple pipeline works, but it's not very useful – it's just generating random numbers and sending them to the Trash destination. Let's extend it to send data to SDC via HTTP.
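To make the data path concrete, here is a minimal Python sketch of what the extended pipeline does conceptually: generate random numbers and package them as JSON records in an HTTP POST. It is not how SDC Edge itself works (SDC Edge is written in Go and its pipelines are configured in SCH, not coded by hand). The URL, port, application ID value and the `X-SDC-APPLICATION-ID` header name are assumptions for illustration; check them against the HTTP origin configured in your own SDC pipeline.

```python
import json
import random
import urllib.request

# Hypothetical endpoint and application ID; substitute the values
# configured on the receiving SDC pipeline.
SDC_URL = "http://sdc.example.com:8000/"
APP_ID = "edge-demo"


def random_records(count, seed=None):
    """Generate `count` records, each holding one random integer."""
    rng = random.Random(seed)
    return [{"value": rng.randint(0, 999)} for _ in range(count)]


def build_request(records, url=SDC_URL, app_id=APP_ID):
    """Package records as newline-delimited JSON in an HTTP POST request."""
    body = "\n".join(json.dumps(r) for r in records).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumed header name for identifying the sender to SDC.
            "X-SDC-APPLICATION-ID": app_id,
        },
    )


def send(request, timeout=5):
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Nothing is transmitted until `send` is called, so `build_request(random_records(10))` can be inspected safely before pointing it at a real endpoint.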

Reading and Processing Sensor Data with SDC Edge

Now that we have data flowing, let's look at a more sophisticated pair of pipelines that use SDC Edge’s Sensor origin to read temperature data from the environment, and the SDC Aggregator processor to generate alerts when the temperature crosses a threshold of 80°F.
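The alerting logic can be sketched in a few lines of Python. This is only an illustration of threshold crossing, not the Aggregator processor's implementation. It assumes readings arrive in degrees Celsius and that "crossing" means moving from below the threshold to at or above it; the function names are hypothetical.

```python
def celsius_to_fahrenheit(celsius):
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return celsius * 9.0 / 5.0 + 32.0


def threshold_alerts(readings_c, threshold_f=80.0):
    """Return (index, temperature_f) for each upward threshold crossing.

    An alert fires only when a reading reaches or exceeds the threshold
    after the previous reading was below it. A first reading that is
    already at or above the threshold also counts as a crossing.
    """
    alerts = []
    above = False
    for index, celsius in enumerate(readings_c):
        fahrenheit = celsius_to_fahrenheit(celsius)
        now_above = fahrenheit >= threshold_f
        if now_above and not above:
            alerts.append((index, fahrenheit))
        above = now_above
    return alerts
```

Alerting on the crossing rather than on every reading above 80°F avoids a flood of repeated alerts while the temperature stays high.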


StreamSets Data Collector Edge (SDC Edge) is an ultra-lightweight agent that lets you run dataflow pipelines almost anywhere. SDC Edge is open source, released under the Apache 2.0 license, and available for download or as source code.

SDC Edge works seamlessly with StreamSets Control Hub (SCH), the core of the StreamSets Data Operations Platform, allowing you to create, deploy and monitor dataflow pipelines across your enterprise. SCH is available as an online service or to deploy in your enterprise; contact StreamSets for more information.

Pat Patterson