Download and Install StreamSets Data Collector

How to easily move data between any source and destination

Quick Start Guide and Data Collector Installation Video

The StreamSets Data Collector download is available only to existing users. If you are new to StreamSets, we encourage you to try our cloud-native platform for free.

  1. Use your StreamSets Account and download the tarball.
  2. Download and install Java 8 JDK or OpenJDK 8. (You must have Java 8 JDK, not Java 8 JRE.)
  3. Open a terminal window and set the file descriptor limit to at least 32768.
  4. Extract the tarball by entering this command in the terminal window: tar xvzf streamsets-datacollector-all-<VERSION>.tgz
  5. After the tarball is extracted, change to the root directory of the installation. For example, cd streamsets-datacollector-<VERSION>.
  6. Start StreamSets Data Collector by entering this command in the terminal window: bin/streamsets dc
  7. In your browser, enter the URL shown in the terminal window. For example,
  8. To start using StreamSets Data Collector, log in with your StreamSets Account credentials.

Note: Replace <VERSION> with the current version number, omitting the angle brackets.
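
Steps 2 through 6 can be combined into a short shell script. The following is only a minimal sketch, not an official installer: the function names (fd_limit_ok, tarball_name, install_dir, install_and_run) are hypothetical, the version is assumed to be passed as the first argument, and the tarball is assumed to be in the current directory.

```shell
#!/bin/sh
# Sketch of installation steps 2-6. All function names are illustrative
# assumptions, not part of StreamSets Data Collector.

REQUIRED_FD_LIMIT=32768

# Succeeds if the given open-file limit meets the minimum from step 3.
fd_limit_ok() {
  [ "$1" = "unlimited" ] && return 0
  [ "$1" -ge "$REQUIRED_FD_LIMIT" ]
}

# File and directory names from steps 4 and 5 for a given version.
tarball_name() { printf 'streamsets-datacollector-all-%s.tgz\n' "$1"; }
install_dir()  { printf 'streamsets-datacollector-%s\n' "$1"; }

install_and_run() {
  # Step 2: a JDK ships javac, a JRE does not.
  # This checks only that a JDK is present, not that it is version 8.
  command -v javac >/dev/null 2>&1 || {
    echo "javac not found: install a Java 8 JDK" >&2
    return 1
  }

  # Step 3: raise the file descriptor limit for this shell if needed.
  if ! fd_limit_ok "$(ulimit -n)"; then
    ulimit -n "$REQUIRED_FD_LIMIT" || {
      echo "could not raise the file descriptor limit" >&2
      return 1
    }
  fi

  # Steps 4-6: extract, change to the installation root, and start.
  tar xvzf "$(tarball_name "$1")" || return 1
  cd "$(install_dir "$1")" || return 1
  bin/streamsets dc
}

# Run only when a version number is supplied on the command line.
if [ "$#" -gt 0 ]; then
  install_and_run "$1"
fi
```

Saved under a name of your choice, the script would be run with the version number as its only argument. Raising the limit with ulimit -n affects only the current shell session and can fail if the hard limit set by the system is lower than 32768.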

The StreamSets DataOps Platform

Build smart data pipelines in minutes and deploy across hybrid and multi-cloud platforms from a single login.

Data Engineering For DataOps On AWS
Data Engineering For DataOps On Azure
Data Engineering For DataOps On Google Cloud
Data Engineering For DataOps On Snowflake
Data Engineering For DataOps On Databricks

Getting Started Videos

Easy as 1-2-3: select an origin, select a processor, select a destination and run.

How to ingest Twitter data in real time and send transformed tweets to Apache Kafka.

How to access sample data pipelines and sample data to jump start your project.

Gain insight into each stage before running your data pipeline.

Build, run, monitor, and manage data pipelines for any design pattern with one login.

What Is a Data Collector?

StreamSets Data Collector Engine is a powerful execution engine used to route and process data in batch, streaming, or CDC pipelines. The Data Collector Engine processes data when it arrives at the origin and waits quietly when not needed. You can view real-time statistics about your data, inspect data as it passes through the pipeline, or take a closer look at a snapshot of your data. 

You can use Data Collector Engines anywhere you need to ingest data by configuring data pipelines to run automatically. Whether your data moves on-premises, cloud-to-cloud, or on-premises-to-cloud, you can use the pre-built connectors and native integrations to configure your pipeline without coding. With smart data pipelines, you can spend more time building new data pipelines and less time rewriting and fixing old ones.
