Modern ETL Pipelines without the Complexity
Turn unlimited data into insights in minutes with StreamSets Transformer for Spark. Transformer runs on any Apache Spark environment, including Databricks, AWS EMR, Google Cloud Dataproc, and Hadoop YARN, whether on premises or across clouds. It is a data pipeline engine designed for any developer or data engineer to build and manage ETL and ML pipelines that execute on Spark.
Runs On
Run Apache Spark anywhere, now and in the future, as your needs evolve.

Operationalize Your Data Transformations

Build and Manage ETL and ML Pipelines That Execute on Spark
Put powerful, native ETL at the fingertips of any data engineer. Use a simple drag-and-drop UI to create highly instrumented pipelines for ETL, stream processing, and machine learning operations. The StreamSets DataOps Platform helps your team accelerate its data projects: operationalize code and automate critical Spark operations through a central platform.
Run on Multiple Spark Platforms
Transformer engines are designed to run on all major Spark distributions for maximum flexibility. They execute natively on Amazon EMR, Azure HDInsight, and Databricks. Run development and production projects on multiple Spark platforms, or support the needs of different business units, from a single tool without rework.


See What Changed and Respond Easily
Full visibility and unmatched resiliency in your pipelines mean you can stop hunting through log files for errors when something changes. Transformer pipelines are instrumented to provide deep visibility into Spark execution, so you can troubleshoot both at the pipeline level and at each individual stage. Transformer offers the enterprise features and productivity of legacy ETL tools while exposing the full power and flexibility of Apache Spark.
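To make stage-level visibility concrete, here is a minimal, hypothetical sketch in plain Python. It is not Transformer's API and does not use Spark; Transformer instruments its pipelines itself. The idea it illustrates is that each stage counts the records it receives and emits, so a drop in records can be traced to the stage that caused it. The `Stage` class, `run_pipeline`, `report`, and the sample records are invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical illustration only: NOT the Transformer API and not Spark.
# Each stage counts the records it receives and the records it emits.


@dataclass
class Stage:
    name: str
    fn: Callable[[list[dict]], list[dict]]  # transformation applied to a batch
    records_in: int = 0
    records_out: int = 0

    def run(self, batch: list[dict]) -> list[dict]:
        self.records_in += len(batch)
        out = self.fn(batch)
        self.records_out += len(out)
        return out


def run_pipeline(stages: list[Stage], batch: list[dict]) -> list[dict]:
    """Pass a batch through each stage in order."""
    for stage in stages:
        batch = stage.run(batch)
    return batch


def report(stages: list[Stage]) -> dict:
    """Per-stage (records_in, records_out) counts."""
    return {s.name: (s.records_in, s.records_out) for s in stages}


stages = [
    Stage("parse", lambda b: [r for r in b if "amount" in r]),
    Stage("filter_positive", lambda b: [r for r in b if r["amount"] > 0]),
]
result = run_pipeline(stages, [{"amount": 5}, {"amount": -1}, {"id": 3}])
print(report(stages))  # {'parse': (3, 2), 'filter_positive': (2, 1)}
```

Because each stage keeps its own counters, the report shows that one record was dropped at `parse` and another at `filter_positive`, which is the kind of per-stage view that replaces digging through log files.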
Frequently Asked Questions
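What is StreamSets Transformer for Spark?
StreamSets Transformer for Spark is a data pipeline engine that lets developers and data engineers build and manage ETL and ML pipelines that execute on Apache Spark.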
What is a StreamSets Transformer for Spark pipeline?
All data pipelines for all of our engines, including Transformer for Spark, are essentially data flows: they move data from a source to a destination, often applying transformations along the way. Data pipelines can power machine learning, advanced analytics, business intelligence, and other key insights.
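As a rough illustration of that source-to-destination flow, the following plain-Python sketch reads CSV records from a source, transforms them (casting types, dropping malformed rows, deriving a total), and writes them to a destination. It is a hypothetical, single-machine stand-in, not Transformer's API: in Transformer you would build equivalent stages in the UI and they would execute on Spark. The function names and sample data are invented for illustration.

```python
import csv
import io

# Hypothetical sketch of a data flow: source -> transformation -> destination.
# Plain Python stand-in; a Transformer pipeline would run equivalent stages on Spark.


def read_source(text: str) -> list[dict]:
    """Source: parse CSV text into records (one dict per row)."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(records: list[dict]) -> list[dict]:
    """Transformation: cast types, drop malformed rows, derive a total."""
    out = []
    for r in records:
        try:
            qty = int(r["qty"])
            price = float(r["price"])
        except (KeyError, TypeError, ValueError):
            continue  # skip rows with missing or non-numeric fields
        out.append({"sku": r["sku"], "total": qty * price})
    return out


def write_destination(records: list[dict]) -> str:
    """Destination: serialize records back to CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["sku", "total"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


raw = "sku,qty,price\nA1,2,1.50\nB2,x,3.00\nC3,1,4.25\n"
output = write_destination(transform(read_source(raw)))
print(output)
```

Keeping each step as a separate function mirrors the stage-by-stage structure of a pipeline: the malformed `B2` row is dropped in the transformation step alone, and each step can be tested or replaced on its own.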
How do I install StreamSets Transformer for Spark?
Installation information can be found in the Transformer for Spark documentation.