
Snowflake + StreamSets: DataOps Accelerating Cloud Adoption

Posted in Cloud Data Migration, January 23, 2019

StreamSets is proud to announce its new partnership with Snowflake and the general availability release of StreamSets for Snowflake. As enterprises move more of their big data workloads to the cloud, it becomes imperative that Data Operations become more resilient and adaptive so they can continue to serve the business’s needs. This is why StreamSets has partnered with Snowflake: to extend a cloud-native data operations platform, with an enterprise-grade DataOps service, to a cloud-native data warehouse, bringing the best of both worlds together.

Cloud Data Warehouse Migration

Take, for example, a common use case such as a data warehouse migration, which offers potential Snowflake customers a 604% ROI with a payback period of under three months. Sounds tempting, right? Don’t take my word for it; read the report for yourself and see what becomes possible when you combine unlimited concurrency and fast elasticity with built-in high availability and security. Moving data from one place to another just once is a relatively simple problem that even a Hooked on Phonics alumnus like myself can solve.

The real problem lies in maintaining that connection and all the others you have made. In fact, at this initial stage you will most likely build a host of new connections to the Snowflake data warehouse, because why not stream events, JSON files, and graph-related information as well? It only increases the ROI on the project, right? But keep in mind that these new integrations will be in addition to all the other independently managed connections built over the years between existing critical systems. In the effort to consolidate the hardware and assorted accoutrements of a data warehouse, users have unwittingly added a series of new integration points, ultimately leaving the organization’s transformation incomplete. So how can we consolidate streaming and batch integration jobs, or operationalize data movement with SLAs, failover, and security controls?

Making the Connection

StreamSets has a rich tradition of performing complex data ingestion routines for some of the world’s largest commercial enterprises and most demanding national governments. StreamSets is one of the few Snowflake partners able to send data directly to Snowpipe. Combined with other enterprise-grade capabilities of the StreamSets Data Operations Platform, such as data drift processors, intelligent pipeline sensors, Kubernetes provisioning agents, and built-in data protection, this makes it the only Snowflake partner that can provide policy-driven data protection while infinitely scaling pipelines in Kubernetes and continuing to process data in the face of ever-changing schemas. And that doesn’t even include the ability to run edge pipelines, use a pre-made testing framework, or implement machine learning in the data stream itself. But that is a discussion for another day.

How does this data operations concept extend a cloud-native data warehouse and complete a digital transformation? If the project is a success (and given Snowflake’s track record, it most likely will be), business users will inevitably ask for more data, more quickly, and in a wider variety. That will require more integrations, more security controls, and better resiliency as the new integrations become “business critical.” If Snowflake is the white whale of enterprise data warehouses, what do you use to conquer the oceans of data in which we currently swim?

As part of the general availability release, StreamSets for Snowflake supports both synchronous data ingestion (SQL bulk operations, COPY INTO and MERGE) and asynchronous data ingestion (via Snowpipe). Note that the StreamSets destination supports two types of data: new data (pure INSERTs) and change data capture (CDC) data from any of the CDC sources Data Collector supports. StreamSets and Snowflake support both multi-threaded and single-threaded workloads. Best of all, StreamSets for Snowflake handles data drift out of the box: it can automatically create the target table, and add new columns to it, when new fields show up in the pipeline. This goes a long way toward helping users with streaming analytics use cases in their data warehouse, where business analysts often ask to incorporate data into the EDW over which the data operations team has no control!
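To make the data drift idea concrete, here is a minimal sketch of what "add new columns when new fields show up" involves. It assumes incoming records arrive as Python dictionaries and that the target table's current column names are already known. The helpers `drift_ddl` and `infer_snowflake_type` are hypothetical illustrations, not the StreamSets destination's actual implementation, and the type mapping is an assumption. The sketch emits standard Snowflake `ALTER TABLE ... ADD COLUMN` statements and does not quote identifiers, so field names must already be valid unquoted Snowflake identifiers.

```python
from typing import Any, Dict, List, Set


def infer_snowflake_type(value: Any) -> str:
    """Map a Python value to a Snowflake column type (illustrative assumption only)."""
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "BOOLEAN"
    if isinstance(value, int):
        return "NUMBER"
    if isinstance(value, float):
        return "FLOAT"
    if isinstance(value, (dict, list)):
        return "VARIANT"  # semi-structured values go into a VARIANT column
    return "VARCHAR"  # strings, None, and anything else fall back to VARCHAR


def drift_ddl(table: str, known_columns: Set[str], record: Dict[str, Any]) -> List[str]:
    """Return ALTER TABLE statements for fields in `record` missing from `known_columns`.

    `known_columns` is updated in place so later records do not re-add the same column.
    """
    statements: List[str] = []
    for field, value in record.items():
        column = field.upper()  # Snowflake resolves unquoted identifiers in upper case
        if column not in known_columns:
            statements.append(
                f"ALTER TABLE {table} ADD COLUMN {column} {infer_snowflake_type(value)}"
            )
            known_columns.add(column)
    return statements


if __name__ == "__main__":
    columns = {"ID", "NAME"}  # hypothetical existing columns of table EVENTS
    event = {"id": 7, "name": "sensor-a", "temp_c": 21.5, "tags": ["edge"]}
    for stmt in drift_ddl("EVENTS", columns, event):
        print(stmt)
```

Running the script prints two statements, one adding `TEMP_C` as `FLOAT` and one adding `TAGS` as `VARIANT`, because `ID` and `NAME` already exist. A production pipeline would also need to handle type conflicts and quoted identifiers, which this sketch deliberately leaves out.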

StreamSets further strengthens the cloud data warehouse by adding data flow sensors, failover/high availability, CI/CD, and SLAs so that the data is constant, fresh, and accurate. Combine those capabilities with the ability to infinitely scale our native streaming engine via Kubernetes and to handle data drift and errors, and an organization will be positioned to take advantage of a truly modernized data warehouse, one that lets it land analytics-ready data from sources both inside and outside its sphere of control. Put more simply, to take advantage of the best that Snowflake has to offer as a best-in-class data warehouse, an equally elastic, concurrent, and secure data operations platform is required to complete the transformation.

Cloud Data Warehouse Adoption

As with most cloud products, adopting a cloud-based data warehouse like Snowflake is a long journey to a promising destination, and along the way, current and legacy systems must be maintained or migrated. That is what makes this partnership so remarkable: while Snowflake offers arguably the most competitive per-TB storage costs in the industry, with unlimited elasticity and a pay-per-use model, StreamSets can improve existing integrations and speed up your journey to the cloud.

This is accomplished in several ways. Basic examples include more than 100 pre-made stages that free up developer time and focus and standardize what would otherwise be hand-written integrations; the ability to template pipelines within a larger shared pipeline repository; and the ability to set alerts and routing when data flows change. These capabilities not only save time and money when connecting existing systems to Snowflake, but also reduce the overall pain of current integration projects, freeing developers and administrators to innovate on Snowflake’s unique platform and provide real value to the organization.

Conclusion

What good is a destination without any roads to travel upon, and what good are roads without a destination? If the roads to your destination are long and bumpy, doesn’t that delay, and ultimately limit, your ability to take advantage of what the new destination has to offer? Does heavy industry not rely on a standardized rail network to ship in the raw materials it processes into finished goods? To put it in a more modern context: if you are going to use a cloud-native data warehouse at scale, a cloud-native data operations platform must operate in parallel. Otherwise, your cloud journey… might be a road to nowhere.

To learn more about StreamSets for Snowflake, visit our website.
