What's New

What's New in 3.13.0

StreamSets Transformer version 3.13.0 includes the following new features and enhancements:
New Cluster Types
  • Amazon EMR support - You can now run pipelines on an Amazon EMR cluster, version 5.13.0 or later.
  • Kubernetes support - You can now run pipelines on a Kubernetes cluster.
  • MapR support - You can now run pipelines on a MapR version 6.1.0 Hadoop YARN cluster.
New Stages
  • Amazon Redshift destination - You can now write to an Amazon Redshift table.
  • Azure Event Hubs origin and destination - You can now read from and write to Azure Event Hubs.
  • MapR FS origin and destination - You can now read from and write to MapR FS.
  • MapR Hive origin and destination - You can now read from and write to Hive on a MapR cluster.
  • MapR Event Store origin and destination - You can now read from and write to MapR Event Store.
  • MySQL JDBC Table origin - You can now read from a MySQL database table.
General Enhancements
  • Azure credential store - You can now use an Azure credential store with Transformer.
  • Reverse proxy support - You can now register Transformer instances deployed behind a reverse proxy with Control Hub.
Pipeline Enhancements
  • Callback URL - You can now specify a custom callback URL that the cluster uses to communicate with Transformer. This property can be configured on the Advanced tab of the pipeline properties.
  • Cluster preview - You can now preview pipelines using the cluster configured for the pipeline, in addition to previewing data using the local Transformer machine.
Stage Enhancements
  • Amazon stages - Amazon stages can now connect using Amazon Web Services libraries for Hadoop 2.7.7.
  • Delta Lake destination - When using the Upsert using Merge write mode, you can now specify a timestamp column to ensure that the latest record is upserted when a batch contains multiple versions of a record.
  • Glob patterns allowed in origin directory paths - You can now use glob patterns to specify the read directory for the following origins: ADLS Gen1, ADLS Gen2, Amazon, and File, as shown in the example after this list.
  • Snowflake origin - You can now use the SELECT command as well as COPY UNLOAD to read from Snowflake.
  • Snowflake stages - You can now configure advanced Snowflake properties to pass to Snowflake. These properties can be configured on the Advanced tab of the stage properties.
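
  For example, with glob patterns a File origin can read from a set of dated subdirectories in a single pipeline. The directory layout below is hypothetical:

      /sales/2019-*/transactions

  This path matches directories such as /sales/2019-01/transactions and /sales/2019-12/transactions. The same glob syntax applies to the read directory for the ADLS Gen1, ADLS Gen2, and Amazon origins.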

What's New in 3.12.0

StreamSets Transformer version 3.12.0 includes the following new features and enhancements:
  • SQL Server 2019 Big Data Cluster support - You can now run pipelines using SQL Server 2019 Big Data Cluster as a cluster manager.

  • Delta Lake additional storage support - You can now use ADLS Gen2 as a storage system for Delta Lake stages.

  • Enhanced file path validation - File path validation has been enhanced for HDFS-based stages, such as the Amazon S3, ADLS, Delta Lake, and File origins and destinations.

  • Hive external table creation - You can configure the Hive destination to create an external table at a specified location.

  • HTTPS self-signed certificates - You can more easily use self-signed certificates when enabling Transformer to use HTTPS.

  • JDBC stage quote character - You can now specify the quote character that the JDBC origin, JDBC destination, and JDBC Lookup processor use.

  • Kafka schema registry authentication - When a Kafka origin or destination uses Avro schemas in Confluent Schema Registry, you can specify basic authentication user information when required.

  • Scala processor empty batch handling - You can now configure the Scala processor to skip processing for empty batches.

  • Configuration property rename - In the Transformer configuration file, transformer.properties, the sdc.base.http.url property has been renamed to transformer.base.http.url.

    If you configured the sdc.base.http.url property in a previous version of Transformer, configure the new transformer.base.http.url property to use the same value when you upgrade, as shown in the example after this list.

  • Configuration property removal - In the Transformer configuration file, transformer.properties, the kerberos.client.enabled property has been removed since it is no longer used.

    When upgrading from a previous version of Transformer, do not add this property to the Transformer 3.12.0 configuration file.
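
  For example, if an earlier installation set sdc.base.http.url, an upgraded transformer.properties file defines the same value under the new property name. The URL below is a placeholder; use the value from your existing configuration:

      # Renamed in Transformer 3.12.0; earlier versions used sdc.base.http.url.
      transformer.base.http.url=http://transformer.example.com:19630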

What's New in 3.11.0

StreamSets Transformer version 3.11.0 includes the following new features and enhancements:
  • Field Existence Selector processor - Use this new processor to route records to a downstream stage only when the specified fields exist in those records. Records that do not meet the criteria pass to a separate output for error handling.

  • Avro advanced read properties - When reading Avro data using Spark 2.4 or later, you can now specify the Avro schema to use in JSON format. You can also configure the origin to process all files, including those that do not use the .avro extension.

  • Delta Lake destination enhancement - You can now use the Delta Lake destination to perform updates, upserts, and deletes in addition to inserts.

  • JDBC Lookup processor enhancement - You can now configure the processor to return the first matching row or multiple matching rows, determine whether any matching rows exist, or count the matching rows. You can also specify how to order multiple matches.

  • Pipeline preprocessing script - You can define a pipeline preprocessing script that registers a user-defined function (UDF) before the pipeline starts, enabling the UDF to be called by a pipeline stage. For more information, see the sample script on the Advanced tab of the pipeline configuration properties and the sketch after this list.

  • Standalone Spark cluster - You can now configure a pipeline to run on a standalone Spark cluster.
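
  For example, a preprocessing script might register a UDF with the Spark session so that downstream stages evaluating Spark SQL expressions can call it. The following is a minimal sketch in Scala, assuming the script can access the pipeline's SparkSession as spark, as in the sample script; the maskEmail function is hypothetical:

      // Register a UDF named maskEmail before the pipeline starts.
      // It hides the local part of an email address.
      spark.udf.register("maskEmail", (email: String) =>
        if (email == null) null else email.replaceAll("^[^@]+", "****")
      )

  A downstream stage could then call maskEmail(email) in a Spark SQL expression like any built-in function.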