We’re thrilled to announce version 2.5 of StreamSets Data Collector, a major release which includes important functionality related to the Internet of Things (IoT), high-performance database ingest, integration with Apache Spark and integration into your enterprise infrastructure. You can download the latest open source release here.
MapR-DB is an enterprise-grade, high performance, NoSQL database management system. As a multi-model NoSQL database, it supports both JSON document models and wide column data models. MapR-DB stores JSON documents in tables; documents within a table in MapR-DB can have different structures. StreamSets Data Collector enables working with MapR-DB documents with its powerful schema-on-read and ingestion capability.
With StreamSets Data Collector, I’ll show you how easy it is to stream data from MongoDB into a MapR-DB table as well as stream data out of the MapR-DB table into MapR Streams.
Rupal ShahRead and Write JSON to MapR DB with StreamSets Data Collector
We are happy to announce the newest version of StreamSets Data Collector is available for download. This short release has over 25 new features and improvements and over 50 bug fixes. This is an enterprise-focused release that addresses the needs of some of the world's largest organizations using StreamSets. Below is a short list of what's new, please check out the release notes for more details.
Kirit BasuAnnouncing StreamSets Data Collector ver 18.104.22.168
Since configuring the ADLS destination is a multi-step process; our new tutorial, Ingesting Local Data into Azure Data Lake Store, walks you through the process of adding SDC an an application in Azure Active Directory, creating a Data Lake Store, building a simple data ingest pipeline, and then configuring the ADLS destination with credentials to write to an ADLS directory.
Pat PattersonIngest Data into Azure Data Lake Store with StreamSets Data Collector
StreamSets Data Collector has long supported both reading and writing data from and to relational databases via Java Database Connectivity (JDBC). While it was straightforward to configure pipelines to read data from individual tables, ingesting records from an entire database was cumbersome, requiring a pipeline per table. StreamSets Data Collector (SDC) 22.214.171.124 introduces the JDBC Multitable Consumer, a new pipeline origin that can read data from multiple tables through a single database connection. In this blog entry, I'll explain how the JDBC Multitable Consumer can implement a typical use case – replicating an entire relational database into Hadoop.
Pat PattersonReplicating Relational Databases with StreamSets Data Collector