StreamSets Transformer

Leverage the power of Apache Spark for ETL and machine learning

Massive Processing Power without the Complexity

Turn big data into insights throughout your organization with the power of Apache Spark on Databricks, Amazon EMR, Azure HDInsight, and other Spark clusters. StreamSets Transformer is a modern transformation engine that lets developers and data engineers build data transformations that execute on Apache Spark, without requiring Scala or Python skills.

Speed adoption with a single interface to design, test, and deploy Spark applications

Gain deep visibility into Spark execution and monitor for data drift

Run on your platform of choice or switch platforms without redesign

Integrations

Run Apache Spark anywhere, now and in the future, as your needs evolve.

Hadoop HDFS
MapR
Databricks
Amazon EMR
Microsoft SQL Server Big Data Clusters
Oracle

Operationalize Your Data Transformations

Speed Apache Spark Adoption

Put Apache Spark at the fingertips of any data engineer. Use a simple drag-and-drop UI to create highly instrumented pipelines for ETL, stream processing, and machine learning. StreamSets Transformer helps your team use Apache Spark to accelerate data projects without deep Scala expertise, while advanced Spark developers can easily operationalize their own code and automate critical Spark operations.

Run on Multiple Spark Platforms

Transformer is designed to run on all major Spark distributions for maximum flexibility. Pipelines execute natively on Hadoop YARN, Amazon EMR, Azure HDInsight, and Databricks, as well as in containerized Spark environments such as Microsoft SQL Server 2019 Big Data Clusters. Run development and production projects on different Spark platforms, or serve the needs of different business units, from a single tool without rework.

Visibility Into Apache Spark Executions

See What Changed and Respond Easily

Full visibility and unmatched resiliency in your pipelines mean you can stop hunting through log files for errors when something changes. Transformer pipelines are instrumented to provide deep visibility into Spark execution, so you can troubleshoot at the pipeline level and at each stage in the pipeline. Transformer offers the enterprise features and agility of legacy ETL tools while unlocking the full power of Apache Spark.
