
Download and Install StreamSets Transformer

How to download and install ETL software to simplify Spark data pipelines

Quick Start Guide and ETL Software Installation Video

  1. Download the tarball from your StreamSets Account.
  2. Download and install Apache Spark 2.4.7.
  3. Extract the Apache Spark tarball by entering this command in the terminal window: tar xvzf spark-2.4.7-bin-without-hadoop.tgz
  4. Extract the Transformer tarball by entering this command in the terminal window: tar xvzf streamsets-transformer-all-<VERSION>.tgz
  5. Change to the root directory of the Transformer installation. For example: cd streamsets-transformer-<VERSION>
  6. Change to the libexec directory under the root of the Transformer installation and edit the transformer-env.sh file. Add the following line to the file to set the SPARK_HOME environment variable: export SPARK_HOME=<SPARK_PATH>
  7. Return to the root of the Transformer installation and run this command in the terminal window: bin/streamsets transformer
  8. In your browser, enter the URL shown in the terminal window. For example, http://10.0.0.100:19360
  9. Log in to start using StreamSets Transformer.

Note: Replace <VERSION> with the current Transformer version and <SPARK_PATH> with the full path to your Apache Spark installation.
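
Steps 3 through 7 above can be sketched as a small shell script. This is a minimal sketch, not an official installer. The function names set_spark_home and install_transformer are hypothetical, and the script assumes that both tarballs sit in the same working directory and that the Spark tarball extracts to a folder named spark-2.4.7-bin-without-hadoop. Adjust the paths if your layout differs.

```shell
#!/usr/bin/env bash
# Sketch of quick start steps 3-7. Function names are illustrative only.

# Step 6: append the SPARK_HOME export to transformer-env.sh,
# unless the exact same line is already present.
set_spark_home() {
  local env_file="$1"
  local spark_path="$2"
  local line="export SPARK_HOME=${spark_path}"
  grep -qxF "$line" "$env_file" 2>/dev/null || printf '%s\n' "$line" >> "$env_file"
}

# Steps 3-7: extract both tarballs, set SPARK_HOME, and start Transformer.
# Assumes both tarballs are located in work_dir (an assumption of this sketch).
install_transformer() {
  local version="$1"
  local work_dir="$2"
  # Assumed name of the folder created by the Spark tarball.
  local spark_dir="${work_dir}/spark-2.4.7-bin-without-hadoop"
  local transformer_dir="${work_dir}/streamsets-transformer-${version}"

  tar xvzf "${work_dir}/spark-2.4.7-bin-without-hadoop.tgz" -C "$work_dir" || return 1
  tar xvzf "${work_dir}/streamsets-transformer-all-${version}.tgz" -C "$work_dir" || return 1

  set_spark_home "${transformer_dir}/libexec/transformer-env.sh" "$spark_dir"

  # Run from the installation root so the relative bin/ path resolves.
  (cd "$transformer_dir" && bin/streamsets transformer)
}

# Example call (replace <VERSION> and the directory with your own values):
# install_transformer "<VERSION>" "$HOME/downloads"
```

Calling set_spark_home more than once with the same path leaves a single export line in the file, so the script can be rerun without duplicating the setting.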


Getting Started with ETL Videos

Learn how to build your first data pipeline using StreamSets Transformer in a few easy steps.

Learn how to build, preview, and run your data pipeline in a few easy steps on a Spark cluster or wherever you have Spark installed.

Build a data pipeline in StreamSets Transformer for clickstream analysis on Amazon EMR, Amazon Redshift and Elasticsearch.

Pipeline preview helps ensure data integrity and data quality, and makes debugging easier.

Looking for more demos? Subscribe to Demos with Dash! In these monthly 45-minute sessions, you will see live demos of the StreamSets DataOps Platform.

What Is a Transformer?

StreamSets Transformer is an execution engine that runs data processing pipelines on Apache Spark. It is an ETL software tool for building Spark ETL data pipelines that perform transformations requiring heavy processing on the entire data set, in batch or streaming mode. You can install Transformer in any environment running Apache Spark.

Whether your data sources are on-prem, cloud-to-cloud, or on-prem-to-cloud, you can use the pre-built connectors and native integrations to configure your Spark ETL pipeline without hand coding.
