skip to Main Content

Prebuilt Transformation Processors

Key Features

For both data practitioners and business users

UI-driven actions increase speed and consistency

Comprehensive suite of processors including SCDs and Pivots

Quicker time to insights with drag and drop capabilities

Stage:
Aggregate
Function:

Performs aggregate calculations within a data set

Aggregate

Benefit of Using / Best Practice
Allows users to calculate averages, sums, counts, or other statistical measures to provide quick insights.
Function:

Applies a function to selected columns, enabling users to define multiple columns at once or conveniently select columns within the tool.

Apply Functions

Benefit of Using / Best Practice
Allows user to apply a function based on a dynamically (run-time) selected set of columns; as new columns matching the criteria are added to the source tables, the function will be applied to them. This feature offers flexibility and ease-of-use, empowering users to streamline data processing without being limited to regular expressions.
Differentiates Us? (compared to SQL or competitor)
StreamSets can apply functions to a set of columns chosen by a regular expression. For example, you could apply a user defined function to columns with name ending "_DATE". New columns would automatically get the function applied.

When it comes to dates, never again get your interval wrong. How many times in the world of code do you sit and wonder, is 'month' a capital 'M', or does that mean minute? StreamSets reduces the need to search elsewhere like on Stack Overflow or read through documentation.
Productivity Impact (H/M/L)
H
Stage:
Column Order
Function:

Reorders columns to a specified order

Column Order

Benefit of Using / Best Practice
Allows users to rearrange the order of columns in a dataset to improve data organization and readability.
Stage:
Column Remover
Function:

Removes columns from a data set

Column Remover

Benefit of Using / Best Practice
Allows users to remove columns in a dataset to improve data organization and readability.
Differentiates Us? (compared to SQL or competitor)
Selecting columns to keep is easy, but selecting columns to remove requires you to specify the exact columns you need, driving brittleness if a new column is added, requires manual intervention to add that column to the dataset. Pattern-based inclusion/exclusion is not achievable in traditional SQL (can be done but requires a heck of a lot of additional code, and becomes a maintenance nightmare!)
Productivity Impact (H/M/L)
L
Stage:
Column Rename
Function:

Renames columns in a data set

Column Rename

Benefit of Using / Best Practice
Allows users to easily rename columns.
Differentiates Us? (compared to SQL or competitor)
To rename a column in SQL, you would have to call each individual column into your script and then manually apply the change. You cannot do this based on a regular expression, also traditional approaches drive brittleness or slower adoption to changing columns in the source tables (as you have to manually bring them in).
Productivity Impact (H/M/L)
M
Stage:
Column Type Converter
Function:

Converts data types for columns that match a regular expression. Regex can also be used to specify multiple columns.

Column Type Converter

Benefit of Using / Best Practice
Enables users to seamlessly convert data from one type to another, such as converting float to integer, enhancing data quality and improving downstream processing.
Differentiates Us? (compared to SQL or competitor)
StreamSets can convert a set of columns chosen by a regular expression. For example, you could convert string columns with name ending "_ID" to integers. New columns would automatically get the function applied.
Stage:
Cube
Function:

Performs group by cube and aggregates data in specified columns

Cube

Benefit of Using / Best Practice
Simplifies complex data aggregations, providing a powerful analytical capability for multidimensional analysis.
Stage:
Duplicates
Function:

Removes duplicates rows from a data set

Duplicates

Benefit of Using / Best Practice
Enables users to efficiently remove duplicate records from datasets, which ensures data quality and reduce storage footprint by eliminating redundant information.
Stage:
Filter
Function:

Passes only rows that match all defined conditions

Filter

Benefit of Using / Best Practice
Allows users to selectively extract and manipulate data based on specific conditions or criteria.
Stage:
Join
Function:

Joins data from two tables

Join

Benefit of Using / Best Practice
Simplifies the process of merging datasets based on common attributes, which enables non-technical users to efficiently integrate and analyze data. Saves time while reducing risk of error.
Differentiates Us? (compared to SQL or competitor)
The concept of joining is common across SQL and Transformer for Snowflake. However, the benefit lies in the fact you can quickly understand the join. In traditional SQL, bad naming practices creep in (how many derived tables called A or B?) and understanding the impact of the join requires assessing the where statement in conjunction. StreamSets simplifies the view of this showing clearly what records are going into the join and what the join means, helping with debugging and audit compliance in regulated markets.
Stage:
JSON Parser
Function:

Parses JSON data and extracts specific elements

JSON Parser

Benefit of Using / Best Practice
Parsing JSON in SQL often requires writing complex queries. Using a dedicated JSON Parser function eliminates the need for manual JSON manipulation while reducing the complexity of SQL queries.
Differentiates Us? (compared to SQL or competitor)
With StreamSets, you don't need to know the structure of the JSON data in order to parse it.
Stage:
Null Processor
Function:

There are many reasons null values happen: this is a processor that helps the analysts automatically clean up the data (eliminate the null values) for use in a pipeline.

Null Processor

Benefit of Using / Best Practice
Without this processor, they often have to do manual, iterative updates (it's takes time, it's low value work but high visibility when it's not done). It is a particularly useful processor to use in conjunction with the Join processor because doing Joins with Nulls is problematic (technical and business).
Stage:
Pivot
Function:

Rotates a column by turning unique values into new columns and aggregates the results

Pivot

Benefit of Using / Best Practice
Reshapes the data to provide a summarized and structured view of the dataset, making it easier to analyze and present information. Significantly reduces time required by manual data manipulation.
Differentiates Us? (compared to SQL or competitor)
Have you tried pivoting in SQL? One way to do it is to put all the results in a derived table, then write a set of sums and a group by. And if a new item comes to sum... it will not appear in the dataset.

With Transformer for Snowflake, you can select the field you want as a column, what you want to group together, and what math function to apply and it reacts to change.
Stage:
Rollup
Function:

Performs Group by Rollup and aggregates data into specific columns

Rollup

Benefit of Using / Best Practice
Simplifies the process of producing subtotal results for various levels of grouping. It enables users to quickly obtain summaries and subtotals for multiple dimensions within their data, making it easier to identify patterns and trends across different levels of granularity.
Stage:
Sample
Function:

Generates sample of incoming data

Sample

Benefit of Using / Best Practice
Allows users to generate a representative sample of incoming data for analysis or testing purposes.
Function:

Generates updates for a slowly changing dimension based on master and change data

Slowly Changing Dimension

Benefit of Using / Best Practice
Simplifies the management of changing data over time. This allows users to track and analyze historical changes in data attributes without requiring extensive coding. Reduces time required to update changing data by manual coding.
Differentiates Us? (compared to SQL or competitor)
With StreamSets, history tables are made easy, but is a really complex logic to write by hand.
Function:

Applies Snowflake SQL expressions to a data set

Snowflake SQL Evaluator

Benefit of Using / Best Practice
Enables users to leverage the full power and flexibility of Snowflake's SQL language for data manipulation. The SQL Evaluator function simplifies the process of incorporating complex SQL logic into the transformation pipeline without the need for separate SQL scripts or manual coding.
Stage:
Sort
Function:

Sorts data based on specified columns

Sort

Benefit of Using / Best Practice
Enables efficient downstream processes such as joins or aggregations that require ordered data. Using SQL would require complex queries, which can take up a lot of time.
Function:

Passes rows downstream based on conditions

Stream Selector

Differentiates Us? (compared to SQL or competitor)
Isolating particular records to process is very messy in traditional SQL, especially to meet the requirements of auditing and collaborative working. StreamSets reduces time to inspect and evaluate what a dataflow is doing.
Stage:
Union
Function:

Merges data from multiple inputs into a single output

Union

Benefit of Using / Best Practice
Allows users to combine multiple datasets. As a best practice, ensure the datasets have compatible columns types and order. Saves time while reducing risk of error.
Differentiates Us? (compared to SQL or competitor)
Transformer for Snowflake innovatively streamlines data integration by automatically aligning columns, regardless of their inbound order. It eliminates time-consuming tasks like schema matching and null-value backfilling. Additionally, it offers control over column selection and duplicates, improving data management efficiency. We're transforming data integration - you gain valuable time for more critical tasks.
Stage:
Unpivot
Function:

Rotates a table by transforming columns into rows

Unpivot

Benefit of Using / Best Practice
Makes it easier for users to work with multi-dimensional datasets. Using SQL to accomplish this requires complex queries and manual transformations, which can take up a significant amount of time.
Function:

Applies window functions

Windows Functions

Benefit of Using / Best Practice
Window functions allow for advanced calculations and analysis, such as ranking, running totals, and moving averages, by operating on a specific subset of data based on defined window specifications.
Helpful Resources
Webinars

End-to-End Data Integration for Snowflake with Snowpark

Solution Briefs & Infographics

Transformer for Snowflake: Healthcare and Life Sciences

Back To Top