Configuring the ADLS destination is a multi-step process, so our new tutorial, Ingesting Local Data into Azure Data Lake Store, walks you through adding SDC as an application in Azure Active Directory, creating a Data Lake Store, building a simple data ingest pipeline, and then configuring the ADLS destination with credentials to write to an ADLS directory.
Pat Patterson: Ingest Data into Azure Data Lake Store with StreamSets Data Collector
StreamSets Data Collector has long supported both reading and writing data from and to relational databases via Java Database Connectivity (JDBC). While it was straightforward to configure pipelines to read data from individual tables, ingesting records from an entire database was cumbersome, requiring a pipeline per table. StreamSets Data Collector (SDC) 18.104.22.168 introduces the JDBC Multitable Consumer, a new pipeline origin that can read data from multiple tables through a single database connection. In this blog entry, I'll explain how the JDBC Multitable Consumer can implement a typical use case – replicating an entire relational database into Hadoop.
Pat Patterson: Replicating Relational Databases with StreamSets Data Collector
Pat Patterson: Calling External Java Code from Script Evaluators
Since writing tutorials for creating custom destinations and processors for StreamSets Data Collector (SDC), I've been looking for a good use case for a custom origin tutorial. It's been trickier than I expected, partly because the list of out-of-the-box origins is so extensive, and partly because the HTTP Client origin can access most web service APIs, rendering a custom origin redundant. Then, last week, StreamSets software engineer Jeff Evans suggested Git. Creating a custom origin to read the Git commit log turned out to be the perfect tutorial.
Pat Patterson: Creating a Custom Origin for StreamSets Data Collector
It’s been a little over a year (9/24/15) since we launched StreamSets Data Collector as an open source project. For those of you unfamiliar with the product, it’s any-to-any big data ingestion software through which you can build and place into production complex batch and streaming pipelines using built-in processors for all sorts of data transformations. The product features, plus video demos, tutorials, etc. can all be “ingested” through the SDC product page.
We’re thrilled to announce that as of last month StreamSets Data Collector had been downloaded by over ⅓ of the Fortune 100! That's several dozen of the largest companies in the U.S. And downloads of this award-winning software have been accelerating, with over 500% growth in the quarter ending in October versus the previous quarter.
Rick Bilodeau: More Than One Third of the Fortune 100 Have Downloaded StreamSets Data Collector
As well as being part of our engineering culture, open source gives us a number of business advantages. Prospective users can freely download, install, evaluate and even put SDC into production; customers have access to the source code without a costly escrow process; and, perhaps most importantly, our users can contribute fixes and enhancements to improve the product for the benefit of the whole community. In this post, I'd like to acknowledge some of those contributions, and invite you to contribute, too.
Pat Patterson: Contributing to the StreamSets Data Collector Community