StreamSets Transformer for Spark

Build and Manage ETL and Data Transformation Pipelines on Apache Spark

Modern ETL Pipelines without the Complexity

Turn unlimited data into insights in minutes with StreamSets Transformer for Spark. StreamSets Transformer runs on any Apache Spark environment (Databricks, AWS EMR, Google Cloud Dataproc, and YARN), on premises and across clouds. StreamSets Transformer for Spark is a data pipeline engine designed for any developer or data engineer to build and manage ETL and ML pipelines that execute on Spark.

Create pipelines for performing ETL and machine learning operations using an intent-driven visual design tool

Troubleshoot with unparalleled visibility into the execution of Spark applications

Run on any major Spark distribution and switch platforms without redesign

Runs On

Run Apache Spark anywhere now and in the future as your needs evolve.

Hadoop HDFS Apache Spark For ETL Processing
MapR Apache Spark For ETL Processing
Microsoft SQL Server Big Data Clusters Apache Spark For ETL Processing
StreamSets For Databricks

Operationalize Your Data Transformations

Simplify Apache Spark For ETL For Everyone

Build and Manage ETL and ML Pipelines That Execute on Spark

Put powerful and native ETL at the fingertips of any data engineer. Use a simple, drag-and-drop UI to create highly instrumented pipelines for performing ETL, stream processing, and machine learning operations. StreamSets DataOps Platform helps your team accelerate your data projects. Easily operationalize code and automate critical Spark operations through a central platform.

Unlock the Power of Apache Spark for Everyone

Run on Multiple Spark Platforms

Transformer Engines are designed to run on all major Spark distributions for maximum flexibility. You can natively execute on EMR, HDInsight, and Databricks platforms. Run your development and production projects on multiple Spark platforms or support different business unit needs from a single tool without rework.

Download: Design Considerations for Apache Spark
Visibility Into Apache Spark Executions
Adopt Apache Spark For ETL And Machine Learning

See What Changed and Respond Easily

Full visibility and unmatched resiliency in your pipelines mean you can stop hunting through log files for errors when change happens. Transformer pipelines are instrumented to provide deep visibility into Spark execution so you can troubleshoot at the pipeline level and at each stage in the pipeline. Transformer offers the enterprise features and productivity of legacy ETL tools, while revealing the full power and flexibility of Apache Spark.

Watch Demo: Changing Dimensions and StreamSets Transformer

The StreamSets DataOps Platform

Build smart data pipelines in minutes and deploy across hybrid and multi-cloud platforms from a single login.

Data Engineering For DataOps On AWS
Data Engineering For DataOps On Azure
Data Engineering For DataOps On Google Cloud
Data Engineering For DataOps On Snowflake
Data Engineering For DataOps On Databricks