StreamSets Delivers Ultralight Open Source Ingestion for Edge Devices

New StreamSets Data Collector Edge™ Brings Data Operations to Internet of Things and Cybersecurity Applications

SAN FRANCISCO, Calif. – November 28, 2017 – StreamSets Inc., provider of the industry’s first enterprise data operations platform, today debuted StreamSets Data Collector Edge (SDC Edge), enabling the industry’s first end-to-end data ingestion solution for resource- and connectivity-constrained systems such as Internet of Things (IoT) devices and the network infrastructure and personal devices that inform cybersecurity applications.

Available today as open source software, SDC Edge packs the core functionality of the widely adopted StreamSets Data Collector into a footprint of less than 5MB, an order of magnitude smaller than alternatives. This makes it ideal for IoT use cases, where today ingestion logic is often hand-coded and tightly coupled to the specific device. As a result, dataflows are difficult to maintain as devices are upgraded, are poorly instrumented for operational dataflow management, and often require a gateway that adds cost, complexity and latency. The benefits of a small footprint also apply to cybersecurity initiatives, where its low CPU consumption and limited attack surface allow deployment of SDC Edge across large populations of mobile endpoints and networking systems.

Key characteristics of SDC Edge include:

  • Ultralight — Requires less than 5MB and does not need additional software (e.g. Java) to operate.
  • Platform-independent — Based on Go, SDC Edge runs on a broad range of operating systems, including Linux, OS X, Windows and Android.
  • Drag-and-drop dataflow design — As in StreamSets Data Collector, pipelines are built using origin, destination and transformation objects, with the option to plug in scripts and trigger custom code execution.
  • Edge analytics — SDC Edge performs computations such as data normalization, redaction and aggregation, and is architected to support full-featured edge analytics, including machine and deep-learning models.
  • Multiple bidirectional pipelines — SDC Edge can run multiple pipelines on the same edge device, and pipelines can both send and receive data.
  • No IoT gateway cost — Data can now be ingested directly to storage/compute systems without the added cost, complexity and latency of a separate IoT gateway system.
  • Performance management — Using StreamSets Dataflow Performance Manager™, SDC Edge can be deployed at scale, and metadata drives Live Data Map visualization and enforcement of source-to-consumption data SLAs.

IoT and cybersecurity are both red-hot spaces for big data innovation. Applying machine learning and other analytic techniques to data aggregated from IoT sensors and devices can help in areas as diverse as factory equipment, construction, oil and gas, and medical devices. Cybersecurity applications benefit from applying advanced analytics to the vast quantities of data collected across a corporate network in order to detect imminent threats or attacks in progress.

“The massive volume of data created by the explosion of digital devices presents an invaluable opportunity for analytics and insight. However, harnessing this data for important efforts such as IoT and cybersecurity has been a challenge due to the lack of end-to-end data ingestion frameworks,” said Arvind Prabhakar, co-founder and CTO, StreamSets. “We built SDC Edge to bring disciplined, well-managed data movement to huge populations of IoT sensors and personal devices so that the promised benefits of these critical initiatives are realized.”

SDC Edge can be downloaded for free, directly from the StreamSets website. Source code is available on GitHub.

About StreamSets

StreamSets provides an innovative platform for data in motion that reinvents how enterprises deliver timely and trustworthy data to their critical applications. StreamSets Data Collector™ is award-winning, open source software for the development of any-to-any dataflows. StreamSets Dataflow Performance Manager (DPM™) provides a comprehensive control panel for managing the day-to-day operation of complex dataflow topologies. Founded by Girish Pancha, former chief product officer of Informatica, and Arvind Prabhakar, a former engineering leader at Cloudera, StreamSets is backed by top-tier Silicon Valley venture capital firms, including Accel Partners, Battery Ventures and New Enterprise Associates (NEA). For more information, visit

Media Contact:

Brittney Timmins
BOCA Communications