StreamSets is excited to announce the immediate availability of StreamSets for Snowflake, the first DataOps platform for Snowflake. Users can now extend their DataOps environments to the popular Snowflake service. StreamSets makes it simple to copy data from databases, streams, and event processing systems directly into your cloud EDW, without complex schema design or hand coding. Users get high-performance ingest, full monitoring and alerts, data drift detection, and CDC, all delivered through an easy-to-use UI. StreamSets for Snowflake also helps keep your data safe with access and policy enforcement, governance, and PII detection.
With an ever-increasing list of cloud ingest tools, it is often hard to tell which one best fits your data movement workloads. Let’s take a look at some of the advantages StreamSets offers over other approaches to streaming data into Snowflake.
DataOps for Snowflake
Snowflake is a great service for quickly ingesting all data, engineering departmental data marts, and sharing them easily with stakeholders. However, data ingestion and data movement often encompass far more than these specific workloads. For instance, many StreamSets users take a centralized approach to the deployment, management, and monitoring of their data pipelines, one that spans workloads and requirements. We call this DataOps. DataOps helps you remain collaborative and agile when working with on-premises and cloud services. Using point solutions built only for Snowflake limits your visibility and your potential for end-to-end control. StreamSets provides a visual, interactive architecture map for problem detection, as well as data SLAs for operational excellence.
StreamSets will automatically create a table, or multiple tables, while ingesting into the Snowflake platform. This makes ingestion resilient to changes in the table structure. When new tables are created, StreamSets can simply identify the new table and adjust the destination to accept the data. StreamSets also supports multi-table creation, which is handy when the database structure is unknown, for instance when uploading an entire database in bulk. StreamSets lets you focus on getting the data into Snowflake rather than laboring over the schema and structure.
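To make the idea of automatic table creation concrete, here is a minimal Python sketch of how an ingest step might derive the DDL a destination needs from an incoming record. It is an illustration only, not the StreamSets implementation: the function names (`column_type`, `ddl_for_record`), the type mapping, and the way known columns are tracked are all assumptions made for this example.

```python
from typing import Any, Optional

# Hypothetical mapping from Python value types to Snowflake column types.
TYPE_MAP = {bool: "BOOLEAN", int: "NUMBER", float: "FLOAT", str: "VARCHAR"}


def column_type(value: Any) -> str:
    # Anything else (lists, dicts, None) falls back to VARIANT,
    # Snowflake's semi-structured type.
    return TYPE_MAP.get(type(value), "VARIANT")


def ddl_for_record(table: str, record: dict, known_columns: Optional[dict]) -> list:
    """Return the DDL statements needed before `record` can be loaded.

    `known_columns` is None when the table does not exist yet; otherwise it
    maps existing column names to their types.
    """
    if known_columns is None:
        # New table: create it with one column per field in the record.
        cols = ", ".join(f"{name} {column_type(v)}" for name, v in record.items())
        return [f"CREATE TABLE IF NOT EXISTS {table} ({cols})"]
    # Existing table: add only the columns the record introduces.
    return [
        f"ALTER TABLE {table} ADD COLUMN {name} {column_type(v)}"
        for name, v in record.items()
        if name not in known_columns
    ]
```

Calling `ddl_for_record("orders", {"id": 1, "note": "x"}, None)` yields a single `CREATE TABLE` statement, while passing the existing columns yields only the `ALTER TABLE` statements for fields that are new. A real pipeline would also need to quote identifiers and reconcile type changes, which this sketch leaves out.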
Data Drift and Snowflake
In modern data environments, the only constant is change. Companies are seeing these changes in three areas: the structure of the data, the meaning of the data (semantics), and the location where the data lives (infrastructure). This unending change is called “Data Drift.” Data drift breaks pipelines and pollutes data. It is accelerating in the modern world of big data, unstructured data, and streaming data.
Change Data Capture for Snowflake
Snowflake is not your only data platform. It is one of a growing number of systems that constitute a modern architecture, including big data, search engines, streaming analytics platforms, and more. StreamSets for Snowflake comes with dynamic tools for designing CDC (change data capture) to ensure your source systems are in sync with your cloud analytics environments. CDC sources include popular EDW platforms (Oracle, Teradata), relational databases (SQL Server, MySQL), and big data stores (HDFS, HBase, Kudu). Because the StreamSets platform supports real-time operation, users can specify how frequently changes are captured rather than managing to pre-set windows.
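As a rough illustration of what keeping a target in sync through CDC involves, the Python sketch below applies an ordered stream of change events to an in-memory copy of a table keyed by primary key. The event shape (`op`, `key`, `row`) and the function name `apply_changes` are hypothetical and chosen for this example; they are not a StreamSets or Snowflake format.

```python
def apply_changes(target: dict, changes: list) -> dict:
    """Apply ordered change events to `target`, a dict of primary key -> row."""
    for change in changes:
        op, key = change["op"], change["key"]
        if op in ("INSERT", "UPDATE"):
            # Upsert: merge the changed fields over any existing row.
            target[key] = {**target.get(key, {}), **change["row"]}
        elif op == "DELETE":
            # Deleting a key that is already gone is treated as a no-op.
            target.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {op}")
    return target


replica = apply_changes(
    {},
    [
        {"op": "INSERT", "key": 1, "row": {"name": "Ada", "city": "London"}},
        {"op": "UPDATE", "key": 1, "row": {"city": "Paris"}},
        {"op": "INSERT", "key": 2, "row": {"name": "Bob"}},
        {"op": "DELETE", "key": 2},
    ],
)
```

After these four events, `replica` holds only row 1 with its updated city. Against a warehouse, the same upsert-and-delete logic is typically expressed as a `MERGE` statement run over a staged batch of changes.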
High Performance Ingest
StreamSets allows companies to quickly stream all types of data into the Snowflake service. StreamSets users will see increased performance for both synchronous and asynchronous workloads, whether ingesting directly or through the Snowpipe service. Many file types that are considered “complex” for Snowflake, like JSON blobs and semi-structured data, will ingest faster because they will not be hindered by formatting checks. Performance using StreamSets and Snowflake is more consistent and can be achieved with a smaller architecture footprint.
Security concerns about critical data, applications, and systems in the cloud continue to slow the adoption of cloud services. While the promise of self-service analytics and data at scale is alluring, many companies simply cannot expose sensitive data to potential misuse. StreamSets addresses these concerns with enforcement of policy and access controls, PII detection, and in-pipeline data masking. This allows you to expose more data for analytics without sacrificing security requirements.
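For a sense of what in-pipeline masking can look like, here is a small Python sketch that detects two common PII patterns in string fields and masks them before a record is written to the destination. The patterns, the placeholder values, and the function names (`mask_value`, `mask_record`) are assumptions for illustration; production PII detection covers far more cases than these two regular expressions.

```python
import re

# Simplified, illustrative patterns: email addresses and US Social Security numbers.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def mask_value(value):
    # Only string fields are scanned; other types pass through unchanged.
    if not isinstance(value, str):
        return value
    value = EMAIL.sub("<email>", value)
    return SSN.sub("XXX-XX-XXXX", value)


def mask_record(record: dict) -> dict:
    """Return a copy of `record` with detected PII masked in every field."""
    return {name: mask_value(value) for name, value in record.items()}


masked = mask_record({"id": 7, "note": "Contact jane@example.com, SSN 123-45-6789"})
```

Here `masked["note"]` becomes `"Contact <email>, SSN XXX-XX-XXXX"` while the numeric `id` is left as-is, so the record stays usable for analytics without exposing the sensitive values.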