StreamSets Data Collector™

Build batch & streaming pipelines in minutes.

Thousands of companies use the open source StreamSets Data Collector (SDC) to efficiently build, test and execute dataflow pipelines for data lake and multi-cloud data movement, as well as for cybersecurity, IoT and customer 360 applications.

Data Collection Challenges with Building Robust Dataflow Pipelines

  • Custom Coding

    Not everyone can write custom code, and hand-coding ingest pipelines fails at scale.

  • Lengthy Development

    Moving from build to test to production takes far too long, delaying data delivery to your stakeholders.

  • Brittle Pipelines

    Pipelines are hard-wired but require frequent change, causing many iterations and failures.

In a world marked by data drift, hand-coded pipelines take too long to build and require constant maintenance. With SDC you build and deploy in a fraction of the time and, since these smart pipelines are self-healing when data drifts, you greatly reduce downtime, downstream data pollution and maintenance costs.

Smart, Self-Healing Dataflow Pipelines

Simplify development cycles and build dataflow pipelines in minutes, not weeks or months.

  • Drag-and-drop connectors for batch and streaming sources and destinations.

  • Minimal schema specification speeds development.

  • Smart sensors detect and correct data drift automatically.
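
To make "data drift" concrete: it refers to unexpected changes in the structure of incoming data, such as fields that appear, disappear or change type. The Python sketch below is only an illustration of that idea, not SDC's implementation or API; the function name `detect_drift` and the sample schema and record are hypothetical.

```python
from typing import Any


def detect_drift(expected: dict[str, type], record: dict[str, Any]) -> dict[str, list[str]]:
    """Compare one record against the expected field names and types.

    Hypothetical helper for illustration only; not part of SDC.
    """
    # Fields present in the record that the schema does not know about.
    added = sorted(k for k in record if k not in expected)
    # Fields the schema expects that the record no longer carries.
    missing = sorted(k for k in expected if k not in record)
    # Fields that are present but whose value has an unexpected type.
    retyped = sorted(
        k for k, t in expected.items()
        if k in record and not isinstance(record[k], t)
    )
    return {"added": added, "missing": missing, "retyped": retyped}


# Assumed example data: the upstream system started sending user_id as a
# string, dropped "amount" and introduced "currency".
expected_schema = {"user_id": int, "email": str, "amount": float}
drifted_record = {"user_id": "1042", "email": "a@example.com", "currency": "USD"}

report = detect_drift(expected_schema, drifted_record)
print(report)
```

A hand-coded pipeline would typically fail or silently pass bad values on such a record; a drift-aware pipeline detects the change and handles it.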

Lightweight Transformation for Consumption-Ready Data

Transform data at any point in the pipeline.

  • Leverage dozens of built-in processors or design your own.

  • Trigger custom code when needed.

  • Identify and handle personal data/PII as it arrives.
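
As a rough illustration of handling PII in flight, the Python sketch below masks email addresses in a record's string fields before the record moves downstream. It is a generic example, not an SDC processor or API; the names `EMAIL_PATTERN` and `mask_emails` are hypothetical, and the regular expression is a simplified pattern that will not match every valid address.

```python
import re

# Simplified email pattern, assumed for illustration; real PII detection
# usually needs broader rules (phone numbers, national IDs, and so on).
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def mask_emails(record: dict) -> dict:
    """Return a copy of the record with email addresses in string fields redacted."""
    return {
        key: EMAIL_PATTERN.sub("[REDACTED]", value) if isinstance(value, str) else value
        for key, value in record.items()
    }


masked = mask_emails({"id": 7, "note": "contact jane.doe@example.com today"})
print(masked)
```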

Intelligent Monitoring and Error Detection

Ensure continual data delivery with built-in measuring and monitoring.

  • Fine-grained metrics to pinpoint problems.

  • Set triggers and alerts for error cases.

  • Ad hoc data introspection at any point along the pipeline.

Disciplined Dataflow Operations

Continuous integration and deployment, even in the face of constant change.

  • Performance alerts and pipeline snapshots to simplify troubleshooting.

  • Zero downtime when you upgrade underlying systems.

  • Deploy and manage anywhere—in your cluster, across multiple clouds, and even on edge devices (using SDC Edge).

Data Collector Common Use Cases

Apache Kafka Enablement

With StreamSets, you connect applications to Kafka without writing a single line of code!

Hadoop Ingest

StreamSets makes it easy to continuously ingest data into Hadoop and the surrounding ecosystem.

Cloud Migration

Migrate data onto or across cloud providers, including Amazon, Microsoft and Google.

Search Enablement

StreamSets makes it incredibly easy to populate your search solution of choice with data from any source.

Get started today

StreamSets runs on Linux or Mac OS X.

Download Open Source