StreamSets Data Collector

Award-winning open source software for building any-to-any batch and streaming dataflows.

Create your next pipeline in minutes, not days!

Adaptable Flows for Efficiency

Design batch and streaming data flows with minimal coding and maximum flexibility.

  • Intent-driven data flows – specify only the schema required for the job.

  • Smart pipelines that automatically handle data drift (schema and semantic changes).

  • Built-in sanitization performed at the edge or natively in your cluster.

Data KPIs for Real-Time Visibility

Monitor and act on data flow performance and data quality.

  • Real-time data flow statistics, metrics for each flow stage, and data fidelity measurements.

  • Automated handling and alerting for data drift (schema evolution, semantic shifts) via sampling, threshold rules and alerts.

  • Full capture of fine-grained metadata for lineage and impact analysis.
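As a rough illustration of the sampling and threshold rules mentioned in the list above, the following Python sketch flags schema drift when too many sampled records deviate from an expected set of fields. It is not StreamSets code or configuration; the field names, the `drift_rate` and `check_drift` helpers, and the 5% threshold are all assumptions made for the example.

```python
import random

# Hypothetical expected schema; a real pipeline would define its own.
EXPECTED_FIELDS = {"id", "timestamp", "amount"}


def drift_rate(records, sample_size=100, seed=0):
    """Return the fraction of sampled records whose field names differ
    from EXPECTED_FIELDS (0.0 for an empty input)."""
    rng = random.Random(seed)
    # Inspect every record when the batch is small, otherwise a random sample.
    if len(records) <= sample_size:
        sample = records
    else:
        sample = rng.sample(records, sample_size)
    if not sample:
        return 0.0
    drifted = sum(1 for record in sample if set(record) != EXPECTED_FIELDS)
    return drifted / len(sample)


def check_drift(records, threshold=0.05):
    """Return (alert, rate); alert is True when the drift rate exceeds
    the threshold."""
    rate = drift_rate(records)
    return rate > threshold, rate
```

A caller would pass each batch to `check_drift` and notify an administrator whenever the first element of the result is true.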

Architected for Agility

Operate continuously in the face of constant change.

  • DevOps-friendly IDE for agile handling of exceptions and new business requirements.

  • Flexible deployment: embeds in an app stack, runs on your existing cluster and integrates via a REST API.

  • Zero downtime during infrastructure upgrades, thanks to the logical isolation of each flow stage.

Use Cases

StreamSets simplifies ingest for numerous applications.

Log Shipping

StreamSets is an effective tool for retrieving and transporting log messages from files and syslog, and for gathering collectd metrics.

StreamSets can monitor and introspect the data as it is being ingested, alerting administrators to potential errors, anomalies, or outliers in real time.

Kafka Enablement

Reading from and writing to Kafka typically means writing your own custom integration code. With StreamSets, you connect your applications to Kafka without writing a single line of code!

StreamSets Data Collector includes out-of-the-box connectors for Kafka and many other sources and destinations.

Search Ingest

Search can be an excellent tool for both analysts and engineers.

StreamSets makes it incredibly easy to populate your search solution of choice with data from any source. Data can even be routed to the appropriate search indices based on the values in the data.
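The idea of routing records to search indices by value can be sketched in a few lines of Python. This is a conceptual illustration only, not how StreamSets is configured; the `ROUTES` rules, the index names, and the `route` and `route_batch` helpers are hypothetical.

```python
from collections import defaultdict

# Hypothetical routing rules, checked in order: (predicate, index name).
ROUTES = [
    (lambda record: record.get("level") == "ERROR", "errors"),
    (lambda record: record.get("source") == "web", "web-logs"),
]
DEFAULT_INDEX = "misc"  # used when no rule matches


def route(record):
    """Return the index name of the first rule that matches the record."""
    for predicate, index in ROUTES:
        if predicate(record):
            return index
    return DEFAULT_INDEX


def route_batch(records):
    """Group a batch of records by their destination index."""
    batches = defaultdict(list)
    for record in records:
        batches[route(record)].append(record)
    return dict(batches)
```

Because the rules are evaluated in order, a record that matches several rules goes to the first matching index, so more specific rules belong at the top of the list.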

Hadoop Ingest

StreamSets makes it easy to continuously ingest data into Hadoop and the surrounding ecosystem.

Using the StreamSets configuration-driven UI, it takes minutes to design pipelines that ingest data from origins such as relational databases, flat files, AWS, and many other locations, and write to systems such as HDFS, HBase, Solr, and more. By routing, transforming, and enriching data during ingestion, StreamSets delivers it to downstream systems ready for consumption.

Get started today

StreamSets runs on Linux or Mac OS X.

Download Open Source