
StreamSets Data Integration Blog

Where change is welcome.

Running Scala Code in StreamSets Data Collector

February 27, 2017

The Spark Evaluator, introduced in StreamSets Data Collector (SDC) version 2.2.0.0, lets you run an Apache Spark application, termed a Spark Transformer, as part of an SDC pipeline. Back in December, we released a tutorial walking you through the process of building a Transformer in Java. Since then, Maurin Lenglart, of Cuberon Labs, has contributed skeleton code for a Scala Transformer, paving the way for a new tutorial, Creating a StreamSets Spark Transformer in Scala.
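To give a flavor of what the tutorial covers, here's a minimal pass-through Transformer in Scala, based on that skeleton. The class name is a placeholder and the transform logic simply forwards records unchanged; see the tutorial for the full project setup and deployment steps.

```scala
import com.streamsets.pipeline.api.Record
import com.streamsets.pipeline.spark.api.{SparkTransformer, TransformResult}
import org.apache.spark.api.java.{JavaPairRDD, JavaRDD}

// Placeholder class name; a real Transformer would apply business logic in transform()
class PassThroughTransformer extends SparkTransformer with Serializable {

  override def transform(recordRDD: JavaRDD[Record]): TransformResult = {
    val rdd = recordRDD.rdd

    // No records fail validation in this sketch, so the error RDD stays empty
    val errors = rdd.sparkContext.emptyRDD[(Record, String)]

    // Pass every record through unchanged; real logic would map to modified records
    val result = rdd.map(record => record)

    new TransformResult(result.toJavaRDD(), JavaPairRDD.fromRDD(errors))
  }
}
```

The compiled class is packaged as a JAR and referenced from the Spark Evaluator's stage configuration, as described in the tutorial.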

Ingest Data into Azure Data Lake Store with StreamSets Data Collector

February 20, 2017

Azure Data Lake Store (ADLS) is Microsoft’s cloud repository for big data analytic workloads, designed to capture data for operational and exploratory analytics. StreamSets Data Collector (SDC) version 2.3.0.0 included an Azure Data Lake Store destination, so you can create pipelines to read data from any supported data source and write it to ADLS.

Since configuring the ADLS destination is a multi-step process, our new tutorial, Ingesting Local Data into Azure Data Lake Store, walks you through the process of adding SDC as an application in Azure Active Directory, creating a Data Lake Store, building a simple data ingest pipeline, and then configuring the ADLS destination with credentials to write to an ADLS directory.
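The credentials you configure on the destination come from registering SDC as an Azure Active Directory application. As a rough sketch of what those credentials unlock, rather than anything SDC does internally, here's how the same service principal could write to ADLS directly via Microsoft's azure-data-lake-store-sdk; every endpoint and credential value below is a placeholder.

```scala
import java.io.PrintStream

import com.microsoft.azure.datalake.store.{ADLStoreClient, IfExists}
import com.microsoft.azure.datalake.store.oauth2.ClientCredsTokenProvider

object AdlsWriteSketch {
  def main(args: Array[String]): Unit = {
    // Placeholder values, obtained from the Azure AD application registration --
    // the same registration step the tutorial walks through for SDC
    val authTokenEndpoint = "https://login.microsoftonline.com/<tenant-id>/oauth2/token"
    val clientId          = "<application-id>"
    val clientKey         = "<application-key>"
    val accountFQDN       = "<account>.azuredatalakestore.net"

    // Authenticate as the service principal and connect to the store
    val provider = new ClientCredsTokenProvider(authTokenEndpoint, clientId, clientKey)
    val client   = ADLStoreClient.createClient(accountFQDN, provider)

    // Write a small file to an ADLS directory
    val stream = new PrintStream(client.createFile("/sdc/out/hello.txt", IfExists.OVERWRITE))
    stream.println("written with the same credentials the SDC destination uses")
    stream.close()
  }
}
```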

Replicating Relational Databases with StreamSets Data Collector

February 3, 2017

StreamSets Data Collector Engine has long supported reading data from, and writing data to, relational databases via Java Database Connectivity (JDBC). While it was straightforward to configure pipelines to read data from individual tables, ingesting records from an entire database was cumbersome, requiring a pipeline per table. StreamSets Data Collector Engine now introduces the JDBC Multitable Consumer, a new pipeline origin that can read data from multiple tables through a single database connection. In this blog entry, I’ll explain how the JDBC Multitable Consumer can implement a typical use case: replicating an entire relational database into Hadoop.
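To make the idea concrete, here's what reading multiple tables over one connection looks like in plain JDBC; this is just a conceptual Scala sketch with placeholder connection details, not the origin's implementation, which adds offset tracking, batching, and configurable table name patterns on top.

```scala
import java.sql.DriverManager
import scala.collection.mutable.ListBuffer

object MultiTableReadSketch {
  def main(args: Array[String]): Unit = {
    // Placeholder JDBC URL and credentials
    val conn = DriverManager.getConnection(
      "jdbc:mysql://localhost:3306/mydb", "user", "password")
    try {
      // Discover every table in the database through the connection's metadata,
      // analogous to the origin matching tables by name pattern
      val tables = ListBuffer[String]()
      val rs = conn.getMetaData.getTables(null, null, "%", Array("TABLE"))
      while (rs.next()) tables += rs.getString("TABLE_NAME")
      rs.close()

      // Read each table in turn over the same connection
      for (table <- tables) {
        val stmt = conn.createStatement()
        val rows = stmt.executeQuery(s"SELECT * FROM $table")
        val cols = rows.getMetaData.getColumnCount
        while (rows.next()) {
          val record = (1 to cols).map(i => rows.getString(i)).mkString(", ")
          println(s"$table: $record")
        }
        rows.close()
        stmt.close()
      }
    } finally conn.close()
  }
}
```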
