Announcing StreamSets Transformer Engine 4.0.0

By Dash Desai Posted in Data Integration June 24, 2021

StreamSets is excited to announce the immediate availability of StreamSets Transformer Engine 4.0.0. It is a modern ETL engine that enables developers and data engineers to build data pipelines and transformations that execute on Apache Spark.

Highlights

This is our biggest release ever, and it includes some great new features and enhancements. Below I’ve reviewed some of the highlights. For a detailed and complete list of enhancements, new features, bug fixes, and upgrade instructions, please refer to the Release Notes.

Let’s take a closer look at some of my favorite highlights.

StreamSets Summer ’21

Data engineers can now deploy StreamSets Transformer Engine 4.0.0 in the newly released StreamSets Summer ’21 beta. This gives them the power of the StreamSets DataOps Platform to handle the breadth of enterprise workloads while getting up and running quickly in the cloud.

NOTE: As an existing customer with an enterprise license, you can download the latest version through our StreamSets Support portal.

Spark 3.0 and Scala 2.12

StreamSets Transformer Engine 4.0.0 supports using Spark 3.0 and Scala 2.12. For information about the clusters that support Spark 3.0, see Cluster Compatibility Matrix. For information about the features available in different versions of Spark, see Spark Versions and Available Features.

Amazon Redshift

The new Amazon Redshift origin enables users to ingest data from Amazon Redshift tables without having to use a generic JDBC origin.

Amazon EMR Cluster Enhancements

- Users can now run data pipelines on EMR 6.1.x or later 6.x.x clusters. For all supported versions, see Cluster Compatibility Matrix.
- Bootstrap Actions: Users can use this new property to bootstrap executable files located on Amazon S3 or to bootstrap scripts defined in the pipeline.
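To make the EMR version range concrete, here is a minimal sketch of a check for "6.1.x or later 6.x.x". The function name and the accepted string formats are assumptions made for illustration and are not part of StreamSets Transformer or the EMR API. The Cluster Compatibility Matrix remains the authoritative list of supported versions.

```python
import re


def is_supported_emr_version(version: str) -> bool:
    """Return True for EMR 6.1.x or any later 6.x.x release label.

    Hypothetical helper: accepts strings such as "6.1.0" or "emr-6.3.0".
    """
    match = re.fullmatch(r"(?:emr-)?(\d+)\.(\d+)(?:\.(\d+))?", version.strip())
    if match is None:
        return False  # not a recognizable version string
    major, minor = int(match.group(1)), int(match.group(2))
    # Only the 6.x line is covered, starting at 6.1.
    return major == 6 and minor >= 1
```

A pre-flight check like this could reject an unsupported cluster version before a pipeline is submitted.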

Databricks Cluster Enhancements

- Users can now run pipelines on Databricks 7.x and 8.x clusters. For all supported versions, see Cluster Compatibility Matrix.
- Init script: Users can now use Databricks cluster-scoped init scripts when provisioning a cluster on AWS, either by defining a DBFS script in the stage or by specifying a script location, including one on Amazon S3.
- Job failover: Jobs running on Databricks clusters can now be configured for failover.

Connection Catalog

A connection defines the information required to connect to an external system. The benefits of using connections are increased security and reusability: you can create a connection once and then reuse it in multiple pipelines. This also reduces maintenance effort and the possibility of errors.
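The reuse idea can be sketched in a few lines of Python. This is only a conceptual illustration, not the Control Hub API: the `Connection` and `Pipeline` classes, the `resolve` helper, and the example host names are all hypothetical. The point is that pipelines refer to a connection by name, so a change made once is picked up everywhere.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Connection:
    """Hypothetical record of the details needed to reach an external system."""
    jdbc_url: str
    username: str


@dataclass(frozen=True)
class Pipeline:
    """Hypothetical pipeline that stores only the name of its connection."""
    title: str
    connection_name: str


def resolve(pipeline: Pipeline, catalog: Dict[str, Connection]) -> Connection:
    """Look up the shared connection that a pipeline refers to."""
    return catalog[pipeline.connection_name]


# One connection, defined once and referenced by two pipelines.
catalog = {
    "warehouse": Connection("jdbc:redshift://old-host.example.com:5439/dev", "etl_user")
}
pipelines = [
    Pipeline("daily_orders", "warehouse"),
    Pipeline("weekly_rollup", "warehouse"),
]

# Updating the connection in one place changes what every pipeline resolves to.
catalog["warehouse"] = Connection("jdbc:redshift://new-host.example.com:5439/dev", "etl_user")
urls = [resolve(p, catalog).jdbc_url for p in pipelines]
```

Because neither pipeline holds its own copy of the host or user name, there is no second place to forget when the details change.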

With the release of StreamSets Transformer Engine 4.0.0, the following origins and destinations now support Control Hub connections:

Origins
- MySQL JDBC Table
- Oracle JDBC Table
- PostgreSQL JDBC Table
- SQL Server JDBC Table
- Amazon Redshift

Destination
- Amazon Redshift

For detailed, technical information about StreamSets Transformer Engine, visit our documentation.

If you would like to see live demos of recently released features and enhancements, subscribe to StreamSets Live: Demos with Dash!
