Because configuring the ADLS destination is a multi-step process, our new tutorial, Ingesting Local Data into Azure Data Lake Store, walks you through adding SDC as an application in Azure Active Directory, creating a Data Lake Store, building a simple data ingest pipeline, and then configuring the ADLS destination with credentials to write to an ADLS directory.
Pat Patterson | Ingest Data into Azure Data Lake Store with StreamSets Data Collector
Splunk indexes and correlates log and machine data, providing a rich set of search, analysis and visualization capabilities. In this blog post, I'll explain how to efficiently send high volumes of data to Splunk's HTTP Event Collector via the StreamSets Data Collector Jython Evaluator. I'll present a Jython script with which you'll be able to build pipelines to read records from just about anywhere and send them to Splunk for indexing, analysis and visualization.
Pat Patterson | Ingest Data into Splunk with StreamSets Data Collector
I'm frequently asked, "How does StreamSets Data Collector (SDC) integrate with Spark Streaming? How about on Databricks?" In this blog entry, I'll explain how to use SDC to ingest data into a Spark Streaming app running on Databricks, but the principles apply to Spark apps running anywhere.
Pat Patterson | Continuous Data Integration with StreamSets Data Collector and Spark Streaming on Databricks
Since writing tutorials for creating custom destinations and processors for StreamSets Data Collector (SDC), I've been looking for a good use case for a custom origin tutorial. It's been trickier than I expected, partly because the list of out-of-the-box origins is so extensive, and partly because the HTTP Client origin can access most web service APIs, rendering a custom origin redundant. Then, last week, StreamSets software engineer Jeff Evans suggested Git. Creating a custom origin to read the Git commit log turned out to be the perfect tutorial.
Pat Patterson | Creating a Custom Origin for StreamSets Data Collector
Apache Flume “is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data”. The typical use case is collecting log data and pushing it to a destination such as the Hadoop Distributed File System. In this blog entry, we'll look at a couple of Flume use cases and see how they can be implemented with StreamSets Data Collector.
Pat Patterson | Upgrading From Apache Flume to StreamSets Data Collector