Convergence is a beautiful thing: two forces that were traditionally at odds begin to coexist and complement each other, creating something unique. For many years, the rift between the reliable and useful confines of SQL and the complete control of advanced coding forced a design choice. Companies for which standard SQL was working hesitated to invest in data science and data engineering. Forward-leaning companies that had bought into the panacea of distributed systems and advanced analytics kept investigating why their new technologies could not deliver the order and predictability of standard SQL modeling.
With the introduction of Snowpark, the newest component of the Snowflake Data Cloud, which provides powerful extensibility beyond SQL, and StreamSets’ newest addition, “Transformer for Snowflake”, which leverages Snowpark, companies no longer need to make a forced choice between the two paths. Data teams can now leverage both the simplicity of data warehouse operations traditionally limited to SQL and the flexibility of Snowpark’s code-based interface to reuse advanced analytic functions, becoming more productive by drawing on skills across their teams.
Transformer for Snowflake, a component of the StreamSets Platform, will now work alongside Data Collector to offer end-to-end data ingestion from on-premises and cloud data sources into Snowflake, and to make it easy to execute complex transformations natively on Snowflake. Data teams will also benefit from the StreamSets Platform’s built-in monitoring and operation of smart data pipelines at scale.
Transformer for Snowflake: a peek under the hood
StreamSets Transformer for Snowflake is a hosted engine in the StreamSets Platform that delivers advanced data transformation functionality natively on Snowflake. This is the industry’s first enterprise-grade transformation engine built on Snowpark, combining the power and extensibility of the Snowflake Data Cloud with full lifecycle data integration capabilities.
Data engineers can use the intuitive design canvas and choose a no-code approach, or drop into code whenever they want. There’s nothing to deploy or manage, so you can get started in minutes.
Simple to complex transformations
The engine offers a range of built-in transformations, from the simple to the complex. Simple built-in transformations include joins, aggregates, and unions. The engine also offers built-in processors for common data warehousing patterns, such as de-duplication and slowly changing dimensions, that encapsulate complex logic. Users can drag and drop these processors into a pipeline, and the engine automatically generates SQL on the fly to execute natively inside Snowflake. This allows an enterprise to put data engineering into the hands of actual users without overburdening IT with repetitive tasks that take time away from strategic business initiatives.
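To make the de-duplication pattern concrete, here is a minimal sketch of the kind of SQL such a processor could produce. It is an illustration only, not the SQL that Transformer for Snowflake actually generates; the function name `build_dedup_sql` and the table and column names are hypothetical.

```python
def build_dedup_sql(table, key_columns, order_column):
    """Return a Snowflake query that keeps the newest row for each key.

    Hypothetical helper for illustration; it is not part of any StreamSets API.
    """
    if not key_columns:
        raise ValueError("at least one key column is required")
    partition_by = ", ".join(key_columns)
    # QUALIFY filters on the window function result, so no subquery is needed.
    return (
        f"SELECT * FROM {table} "
        f"QUALIFY ROW_NUMBER() OVER "
        f"(PARTITION BY {partition_by} ORDER BY {order_column} DESC) = 1"
    )


# Assumed example table: one row per customer update, newest row wins.
sql = build_dedup_sql("customers", ["customer_id"], "updated_at")
print(sql)
```

The point of the built-in processor is that users configure the key and ordering fields on the canvas rather than writing and maintaining a query like this by hand.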
Apply common best practices
Apply Functions is another important capability: it frees users from maintaining repetitive and brittle SQL code in order to apply common best practices across all their data elements. The Apply Functions processor applies a Snowflake function to every field whose name matches a regular expression. A variety of functions are available, including date and time, numeric, string, and user-defined functions.
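The idea behind Apply Functions can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the processor's implementation: the helper `apply_function_sql`, the `orders` table, and its column names are all hypothetical, and the sketch assumes the column list is already known.

```python
import re


def apply_function_sql(table, columns, pattern, function):
    """Build a SELECT that wraps every column matching `pattern` in `function`.

    Hypothetical helper for illustration; it is not part of any StreamSets API.
    """
    regex = re.compile(pattern)
    select_items = []
    for column in columns:
        if regex.fullmatch(column):
            # Matching fields are transformed and keep their original name.
            select_items.append(f"{function}({column}) AS {column}")
        else:
            # Non-matching fields pass through unchanged.
            select_items.append(column)
    return f"SELECT {', '.join(select_items)} FROM {table}"


# Upper-case every field whose name ends in "_name".
sql = apply_function_sql(
    "orders",
    ["order_id", "first_name", "last_name", "city"],
    r".*_name",
    "UPPER",
)
print(sql)
```

Because the fields are selected by regular expression, a newly added `middle_name` column would be picked up without anyone editing the SQL, which is the maintenance burden the processor is meant to remove.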
Team collaboration and re-use
Transformer for Snowflake is the only graphical data transformation engine that supports leveraging existing Python, Java, and Scala code. Users can reuse their own custom-developed logic for advanced analytics and data science through Snowpark UDFs written in any of these languages, natively as part of the StreamSets pipeline design experience.
Transformer for Snowflake allows data engineers to go beyond SQL to express powerful data transformation logic with the StreamSets Data Integration Platform to conform and cleanse data. All the data and processing stays on Snowflake, greatly increasing performance, eliminating added costs, and reducing complexity.
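As an example of the kind of custom logic a team might package for reuse, the sketch below shows a small cleansing function written in plain Python. The function `normalize_phone` and its rules are hypothetical. In Snowpark for Python, a function like this could be registered as a UDF through `session.udf.register`, which requires a live Snowflake session and is therefore omitted here; how the registered UDF is then referenced from a pipeline is not covered in this post.

```python
def normalize_phone(raw):
    """Strip formatting from a phone number, keeping digits only.

    Returns None for missing input or when fewer than 10 digits remain.
    Hypothetical cleansing rule, used purely for illustration.
    """
    if raw is None:
        return None
    digits = "".join(ch for ch in raw if ch.isdigit())
    return digits if len(digits) >= 10 else None


# Local check of the logic before it would be registered as a UDF.
cleaned = normalize_phone("(555) 123-4567")
print(cleaned)
```

Keeping the logic in an ordinary function means it can be unit tested locally, and the same code can then be shared across the team once it is registered.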
How to Access Transformer for Snowflake
After a successful Beta period with significant focus on user experience and feedback, the entire team here at StreamSets is looking forward to learning from more of our users as we launch into GA. If you are unable to make Office Hours and have feedback you’d like to share, please email email@example.com.