As fall approaches and there’s a chill in the morning air, it inevitably comes time for the annual Cloudera Data Impact awards. We are thrilled to have three finalists in the hunt this year: Vodafone, one of the world’s largest mobile operators, in the Business Impact category; Voya Financial, a Forbes 1000 financial services firm, […]
Getting Started with Cloudera’s Cybersecurity Solution (feat. StreamSets, Arcadia Data and Centrify)
This post was originally published on the Cloudera VISION blog by Sam Heywood. StreamSets configurations and images of Apache Spot Open Data Model ingest pipelines can be found here on GitHub. A quick conversation with most Chief Information Security Officers (CISOs) reveals they understand they need to modernize their security architecture and the correct answer […]
It’s been a little over a year (9/24/15) since we launched StreamSets Data Collector as an open source project. For those of you unfamiliar with the product, it’s any-to-any big data ingestion software through which you can build and place into production complex batch and streaming pipelines using built-in processors for all sorts of data […]
Reposted from the Cloudera Vision blog. What do Sony, Target and the Democratic Party have in common? Besides being well-respected brands, they’ve all been subject to some very public and embarrassing hacks over the past 24 months. Because cybercrime is no longer driven by angst-ridden teenagers but rather professional criminal organizations and state-sponsored hacker groups, the […]
Last week we announced the results of a survey of over 300 enterprise data professionals conducted by Dimensional Research and sponsored by StreamSets. We wanted to understand how enterprises currently manage their big data flows. What we discovered is alarming: companies are struggling to […]
I am always eager to learn about new architectures and big data best practices. Recently I came across a paper from Trifacta discussing the role of data preparation, and it got me thinking about the complementary nature of data ingestion and data preparation. Data preparation, more colorfully known as data wrangling, is the activity performed […]
A step-by-step walkthrough of how Mac Noland implemented StreamSets to move away from hand-coded ETL and scale out an increasingly complex ingestion pipeline. Mac is a Solution Architect for phData, a Twin Cities services firm focused on Hadoop. He has spent 17 years as a software engineer and architect for projects in the legal, accounting, risk and medical device industries.
Watch StreamSets Field Engineer Jonathan “Natty” Natkins demonstrate how you can use the open source StreamSets Data Collector to flexibly handle painful “data drift” – the inevitable evolution of infrastructure, semantics and schema that leads to corrupted data and broken pipelines. Download Open Source StreamSets Data Collector at www.streamsets.com/opensource.