The DataOps Blog

Where Change Is Welcome

How to Simplify Data Loading into Snowflake

By January 28, 2019


Mike Fuller, a consultant at Red Pill Analytics, has been working on ingesting data into Snowflake’s cloud data warehouse using StreamSets for Snowflake. In this guest blog post, Mike explains how he was able to replicate an Oracle database to Snowflake using the new functionality, both for initial data loading into Snowflake and with change data capture.

Encrypt and Decrypt Data in Dataflow Pipelines

By January 22, 2019

The Encrypt and Decrypt processor, introduced in StreamSets Data Collector 3.5.0, uses the Amazon AWS Encryption SDK to encrypt and decrypt data within a dataflow pipeline, and a variety of mechanisms, including the Amazon AWS Key Management Service, to manage encryption keys. In this blog post, I’ll walk through the basics of working with encryption on AWS, and show you how to build pipelines that encrypt and decrypt data, from individual fields to entire records.

Five Ways to Scale Kafka with StreamSets

By December 18, 2018

The StreamSets DataOps Platform was architected to scale to the largest workloads, particularly when working with continuous streams of data from systems such as Apache Kafka or Apache Pulsar. As well as the ability to scale Kafka, the platform offers…

StreamSets Data Collector – the First Four Years

By August 16, 2018

June 2018 marked the fourth anniversary of StreamSets’ founding; here’s a look back at the past four years of StreamSets and the Data Collector product, from the early days in stealth-startup mode, to the recent release of StreamSets Data Collector 3.4.0.

Girish Pancha and Arvind Prabhakar founded StreamSets on June 27th, 2014. Girish had been at Informatica for many years, rising to the level of Chief Product Officer; Arvind had held technical roles at both Informatica and Cloudera. Between them, they had realized that the needs of enterprises were not being met by existing data integration products – for instance, the best practice for many customers ingesting data into Hadoop was laborious manual coding of data processing logic and orchestration using low-level frameworks like Apache Sqoop. They also realized that this would be true for other proliferating data platforms such as Kafka (Kafka Connect), Elastic (Logstash) and cloud infrastructure (AWS Glue, etc.). The main obstacle they identified was ‘data drift’, the inescapable evolution of data structure, semantics and infrastructure, which makes data integration difficult and solutions brittle. Data drift exists in the traditional data integration world, but grows and accelerates when dealing with modern data sources and platforms. The founding vision for StreamSets was to “solve the data drift problem for all modern data architectures”.

Create a Microservice Data Pipeline with StreamSets Data Collector Engine (Tutorial)

By August 8, 2018

How to Create a Microservice Data Pipeline with StreamSets DataOps Platform

A microservice data pipeline is a lightweight component that implements a relatively small part of a larger system – for example, providing access to user data. A microservice architecture comprises a set of independent microservices, often implemented as RESTful web services communicating via JSON over HTTP, that together implement a system’s functionality, rather than a single monolithic application. Think of an e-commerce web site: we might have separate microservices for searching for inventory, managing the shopping cart, and recommending items based on the shopping cart’s content. Compared to monolithic applications, the microservice approach promotes fine-grained modularity, allowing agile implementation of components by independent teams, which may even be using different technologies. Now, one of those technologies can be StreamSets Data Collector Engine. Data Collector 3.4.0, released earlier this week, introduces microservice data pipelines, with a new REST Service origin and Send Response to Origin destination allowing you to implement RESTful web services completely within the Data Collector Engine.
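To make the "RESTful web service communicating via JSON over HTTP" idea concrete, here is a minimal sketch of a user-data microservice written with only the Python standard library. It is not a Data Collector pipeline and does not use the REST Service origin; the `USERS` store, the `/users/<id>` route, and the helper names `start_service` and `fetch_user` are all hypothetical, chosen purely for illustration.

```python
import json
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical in-memory "user data" store standing in for a real backend.
USERS = {
    "1": {"id": "1", "name": "Ada"},
    "2": {"id": "2", "name": "Grace"},
}


class UserHandler(BaseHTTPRequestHandler):
    """Handles GET /users/<id> and replies with a JSON document."""

    def do_GET(self):
        parts = self.path.strip("/").split("/")
        if len(parts) == 2 and parts[0] == "users" and parts[1] in USERS:
            self._send(200, USERS[parts[1]])
        else:
            self._send(404, {"error": "not found"})

    def _send(self, status, payload):
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # suppress per-request logging to keep the example quiet


def start_service(port=0):
    """Start the service on a background thread; port 0 picks a free port."""
    server = ThreadingHTTPServer(("127.0.0.1", port), UserHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch_user(server, user_id):
    """Client-side helper: call the service and decode its JSON reply."""
    host, port = server.server_address
    conn = HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", f"/users/{user_id}")
        resp = conn.getresponse()
        return resp.status, json.loads(resp.read())
    finally:
        conn.close()
```

Calling `start_service()` and then `fetch_user(server, "1")` exercises one request and response round trip. In the e-commerce example above, the inventory search, shopping cart, and recommendation services would each be a separate small service of this shape, possibly built by different teams with different technologies.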
