Quick Start Guide and Data Collector Installation Video
- Download the tarball from your StreamSets Account. (You can create an account for free.)
- Download and install Java 8 JDK or OpenJDK 8. (You must have Java 8 JDK, not Java 8 JRE.)
- Open a terminal window and set your file descriptor limit to at least 32768.
- Extract the tarball by entering this command in the terminal window: tar xvzf streamsets-datacollector-all-<VERSION>.tgz
- After the tarball is extracted, change directories to the installation root. For example, cd streamsets-datacollector-<VERSION>.
- Start StreamSets Data Collector by entering this command in the terminal window: bin/streamsets dc
- In your browser, enter the URL shown in the terminal window. For example, http://10.0.0.100:18630
- To start using StreamSets Data Collector, log in with your StreamSets Account credentials.
Note: Replace <VERSION> with the version number you downloaded, and remove the angle brackets.
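The steps above can be sketched as a single shell session. This is a minimal, hedged example: the script name and argument handling are illustrative, and it assumes you run it from the directory containing the downloaded tarball, passing the release number in place of <VERSION>.

```shell
#!/usr/bin/env bash
# Sketch of the quick-start steps. Pass the release number
# (the <VERSION> placeholder) as the first argument.
set -e
VERSION="${1:-}"
TARBALL="streamsets-datacollector-all-${VERSION}.tgz"

# Raise the open-file limit for this shell session; Data Collector
# needs at least 32768 file descriptors. This may fail if the hard
# limit is lower, in which case raise it as root first.
ulimit -n 32768 2>/dev/null || echo "warning: could not raise file descriptor limit"

if [ -n "$VERSION" ] && [ -f "$TARBALL" ]; then
  tar xvzf "$TARBALL"                       # extract the tarball
  cd "streamsets-datacollector-${VERSION}"  # installation root
  bin/streamsets dc                         # start Data Collector; the UI URL is printed
else
  echo "usage: $0 <version> (run in the directory holding the tarball)"
fi
```

After the last command, watch the terminal output for the UI URL to open in your browser.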
Getting Started Videos
Easy as 1-2-3: select an origin, select a processor, select a destination and run.
How to ingest Twitter data in real time and send transformed tweets to Apache Kafka.
How to access sample data pipelines and sample data to jump start your project.
Gain insight into each stage before running your data pipeline.
What Is a Data Collector?
StreamSets Data Collector is a powerful execution engine used to route and process data in batch, streaming, or CDC pipelines. Data Collector processes data when it arrives at the origin and waits quietly when not needed. You can view real-time statistics about your data, inspect data as it passes through the pipeline, or take a closer look at a snapshot of your data.
You can use Data Collector anywhere you need to ingest data by configuring data pipelines to run automatically. Whether your data moves on-premises, cloud-to-cloud, or on-premises-to-cloud, you can use the pre-built connectors and native integrations to configure your pipeline without coding. With smart data pipelines, you can spend more time building new data pipelines and less time rewriting and fixing old ones.