Download and Install StreamSets Data Collector
How to easily move data between any source and destination
Quick Start Guide and Data Collector Installation Video
The StreamSets Data Collector download is available only to existing users. If you are new to StreamSets, we encourage you to try our cloud-native platform for free.
- Use your StreamSets Account and download the tarball.
- Download and install Java 8 JDK or OpenJDK 8. (You must have Java 8 JDK, not Java 8 JRE.)
- Open a terminal window and set your file descriptor limit to at least 32768.
- Extract the tarball by entering this command in the terminal window: tar xvzf streamsets-datacollector-all-<VERSION>.tgz
- After the tarball is extracted, change to the root directory of the installation. For example: cd streamsets-datacollector-<VERSION>
- Start StreamSets Data Collector by entering this command in the terminal window: bin/streamsets dc
- In your browser, enter the URL shown in the terminal window. For example, http://10.0.0.100:18630
- To start using StreamSets Data Collector, log in with your StreamSets Account credentials.
Note: Replace <VERSION> with the current version number and remove the angle brackets.
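The prerequisites in the steps above can be checked from the terminal before you extract the tarball. The following is a minimal sketch, not an official StreamSets script: the helper names (fd_limit_ok, is_java8, tarball_name, install_dir) are hypothetical and exist only for illustration. It assumes a POSIX-style shell with the standard ulimit built-in, and it relies on the fact that a JDK ships the javac compiler while a JRE does not.

```shell
#!/usr/bin/env bash
# Preflight sketch for the installation steps above.
# All function names are illustrative helpers, not part of the
# StreamSets distribution.

REQUIRED_FD_LIMIT=32768   # minimum file descriptor limit from the steps

# Succeeds if the given limit (a number or "unlimited") meets the minimum.
fd_limit_ok() {
  local limit="$1"
  [ "$limit" = "unlimited" ] && return 0
  [ "$limit" -ge "$REQUIRED_FD_LIMIT" ]
}

# Succeeds if a version string such as "1.8.0_292" denotes Java 8.
is_java8() {
  case "$1" in
    1.8|1.8.*) return 0 ;;
    *) return 1 ;;
  esac
}

# Builds the tarball name used in the extract step for a given version.
tarball_name() {
  printf 'streamsets-datacollector-all-%s.tgz\n' "$1"
}

# Builds the installation directory name used in the cd step.
install_dir() {
  printf 'streamsets-datacollector-%s\n' "$1"
}

# Report on the current shell's limit without changing it.
if fd_limit_ok "$(ulimit -n)"; then
  echo "File descriptor limit is sufficient."
else
  echo "Raise the limit first, for example: ulimit -n $REQUIRED_FD_LIMIT"
fi

# A JDK ships javac; a JRE does not.
if command -v javac >/dev/null 2>&1; then
  echo "javac found; a JDK appears to be installed."
else
  echo "javac not found; install a Java 8 JDK (a JRE is not enough)."
fi
```

With a shell variable VERSION set to your version number, the extract and change-directory steps could then be written as tar xvzf "$(tarball_name "$VERSION")" followed by cd "$(install_dir "$VERSION")". Note that ulimit -n changes the limit only for the current shell session, so start Data Collector from the same terminal window.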
Getting Started Videos
Easy as 1-2-3: select an origin, select a processor, select a destination and run.
How to ingest Twitter data in real time and send transformed tweets to Apache Kafka.
How to access sample data pipelines and sample data to jump-start your project.
Gain insight into each stage before running your data pipeline.
Build, run, monitor, and manage data pipelines for any design pattern with a single login.
What Is a Data Collector?
StreamSets Data Collector Engine is a powerful execution engine used to route and process data in batch, streaming, or CDC pipelines. The Data Collector Engine processes data when it arrives at the origin and waits quietly when not needed. You can view real-time statistics about your data, inspect data as it passes through the pipeline, or take a closer look at a snapshot of your data.
You can use Data Collector Engines anywhere you need to ingest data by configuring data pipelines to run automatically. Whether your data moves on-premises, cloud-to-cloud, or on-premises-to-cloud, you can use the pre-built connectors and native integrations to configure your pipeline without coding. With smart data pipelines, you can spend more time building new data pipelines and less time rewriting and fixing old ones.