Apache Flume“is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data”. The typical use case is collecting log data and pushing it to a destination such as the Hadoop Distributed File System. In this blog entry we'll look at a couple of Flume use cases, and see how they can be implemented with StreamSets Data Collector.
Pat PattersonUpgrading From Apache Flume to StreamSets Data Collector
It’s been a little over a year (9/24/15) since we launched StreamSets Data Collector as an open source project. For those of you unfamiliar with the product, it’s any-to-any big data ingestion software through which you can build and place into production complex batch and streaming pipelines using built-in processors for all sorts of data transformations. The product features, plus video demos, tutorials, etc. can all be “ingested” through the SDC product page.
We’re thrilled to announce that as of last month StreamSets Data Collector had been downloaded by over ⅓ of the Fortune 100! That's several dozen of the largest companies in the U.S. And downloads of this award-winning software have been accelerating, with over 500% growth in the quarter ending in October versus the previous quarter.
Rick BilodeauMore Than One Third of the Fortune 100 Have Downloaded StreamSets Data Collector
As well as being part of our engineering culture, open source gives us a number of business advantages. Prospective users can freely download, install, evaluate and even put SDC into production, customers have access to the source code without a costly escrow process and, perhaps most importantly, our users can contribute fixes and enhancements to improve the product for the benefit of the whole community. In this post, I'd like to acknowledge some of those contributions, and invite you to contribute, too.
Pat PattersonContributing to the StreamSets Data Collector Community
What do Sony, Target and the Democratic Party have in common?
Besides being well-respected brands, they’ve all been subject to some very public and embarrassing hacks over the past 24 months. Because cybercrime is no longer driven by angst-ridden teenagers but rather professional criminal organizations and state-sponsored hacker groups, the halcyon days of looking for a threat signatures are well behind us.
Rick BilodeauThe Challenge of Fetching Data for Apache Spot (incubating)
Sandish Kumar, a Solutions Engineer at phData, builds and manages solutions for phData customers. In this article, reposted from the phData blog, he explains how to generate simulated NetFlow data, read it into StreamSets Data Collector via the UDP origin, then buffer it in Apache Kafka before sending it to Apache Kudu. A true big data enthusiast, Sandish spends his spare time working to understand Kudu internals.
Pat PattersonVisualizing NetFlow Data with StreamSets Data Collector, Kudu, Impala and D3
Last October, we publicly announced StreamSets Data Collector version 1.0. Over the last 12 months we have seen an awesome (a word we don't use lightly) amount of adoption of our first product – from individual developers simplifying their day-to-day work, to small startups building the next big thing, to the very largest companies building global scale enterprise architectures with StreamSets Data Collector at its core.
Kirit BasuAnnouncing StreamSets Data Collector version 2.0
Apache Kudu and Open Source StreamSets Data Collector Simplify Batch and Real-Time Processing
As originally posted on the Cloudera VISION Blog.
At StreamSets, we come across dataflow challenges for a variety of applications. Our product, StreamSets Data Collector is an open-source any-to-any dataflow system that ensures that all your data is safely delivered in the various systems of your choice. At its core is the ability to handle data drift that allows these dataflow pipelines to evolve with your changing data landscape without incurring redesign costs.
This position at the front of the data pipeline has given us visibility into various use cases, and we have found that many applications rely on patched-together architectures to achieve their objective.
Arvind PrabhakarCreating a Post-Lambda World with Apache Kudu