StreamSets Data Integration Blog
Where change is welcome.
AWS Reference Architecture Guide for StreamSets
Using StreamSets DataOps Platform To Integrate Data from PostgreSQL to AWS S3 and Redshift: A Reference Architecture. This document describes…
Visualizing Apache Log Data… with Minecraft!
A key differentiator of StreamSets Data Collector (SDC) is that it operates in continuous mode: set a pipeline running and it will continue to read files from a directory or take messages from a queue. A Twitter conversation with Richard Tuttle, a solution architect at CRM Science, prompted me to wonder, would it be possible to ingest Apache Web Server log data, lookup…
What’s the Biggest Lot in the City of San Francisco?
After building my first pipeline with StreamSets Data Collector, I wanted to give the framework more of a workout. I've spent a lot of time working with JSON data over the past few years, and the biggest, baddest JSON data set I can easily get hold of is a 181MB file containing the address and coordinates of all 206,560 city lots in San Francisco. Not…
Getting Started with StreamSets Data Collector
Hi, I'm Pat Patterson, newly minted 'community champion' here at StreamSets. As I get up to speed with big data in general and StreamSets Data Collector (SDC) in particular, I'll write up my exploits here on the StreamSets blog to help other novices as they get started with open source big data ingest. I'm going to assume you know the…
Announcing StreamSets Data Collector ver 1.2.2.0
We’re happy to announce a new version of the StreamSets Data Collector.
Building a Real-Time Retail Analytics Solution with StreamSets, MapR Streams and MapR FS
Today’s complex retail applications have changed dramatically, and to compete, enterprises must adopt new strategies for working with data. Big data and Hadoop enable retailers to connect with customers through multiple channels at new levels by leveraging traditional and real-time data sources for stream processing and analytics. These data sources often have the characteristics of varying volumes, velocity…