
StreamSets Transformer Engine

Leverage the power of Apache Spark for ETL and machine learning pipelines

Modern ETL Pipelines without the Complexity

Turn big data into insights throughout your organization with the power of Apache Spark on Databricks, AWS EMR, Google Cloud Dataproc, SQL Server 2019 Big Data Cluster, and other Spark clusters. StreamSets Transformer Engine is a data pipeline engine designed for any developer or data engineer (with or without Scala or Python skills) to build ETL and ML pipelines that execute on Apache Spark.

Reduce time to design and operate pipelines on Spark at all skill levels

Troubleshoot with unparalleled visibility into the execution of Spark applications 

Run any major Spark distribution and switch platforms without redesign


Run Apache Spark anywhere now and in the future as your needs evolve.

Hadoop HDFS
MapR
Databricks
Amazon EMR
Microsoft SQL Server Big Data Clusters
Oracle

Operationalize Your Data Transformations

Simplify Apache Spark For ETL For Everyone

No Code ETL for Apache Spark

Put Apache Spark at the fingertips of any data engineer. Use a simple, drag-and-drop UI to create highly instrumented pipelines for performing ETL, stream processing, and machine learning operations. StreamSets DataOps Platform helps your team leverage Apache Spark to accelerate your data projects without deep Scala expertise. Advanced Spark developers can easily operationalize their code with custom processors and automate critical Spark operations.

Download: Design Considerations for Apache Spark

Run on Multiple Spark Platforms

Transformer Engine is designed to run ETL operations on all major Spark distributions for maximum flexibility. You can natively execute on AWS EMR, HDInsight, and Databricks platforms, and in containerized Spark environments such as Microsoft SQL Server 2019 Big Data Cluster. Run your development and production projects on multiple Apache Spark platforms or support different business needs from a single tool. 

Watch: Pipeline Design for Critical Cloud Workloads

See What Changed and Respond Easily

Full visibility and unmatched resiliency in your pipelines mean you can stop hunting through log files for errors when something changes. Transformer pipelines are instrumented to provide deep visibility into Spark execution, so you can troubleshoot at the pipeline level and at each stage in the pipeline. Transformer Engine offers the enterprise features and agility of legacy ETL tools while revealing the full power and opportunity of Apache Spark.

Watch: DataOps for Apache Spark

Introducing StreamSets Summer '21

Build smart data pipelines in minutes and deploy across hybrid and multi-cloud platforms from a single login.
