Rick is StreamSets' VP of Marketing, where he manages and directs all of the company's marketing efforts. He has delivered results for Fortune 500 companies for decades, specializing in marketing for the technology sector.
A quick conversation with most Chief Information Security Officers (CISOs) reveals that they know they need to modernize their security architecture, and that the right answer is to adopt a machine learning and analytics platform as a fundamental and durable part of their data strategy. However, many CISOs fear that deploying an initial use case will be daunting. Cloudera has partnered with Arcadia Data and StreamSets to make it easier than ever for CISOs to take the first step and deploy basic use cases leveraging data sources common to many environments.
Rick Bilodeau | Getting Started with Cloudera’s Cybersecurity Solution (feat. StreamSets, Arcadia Data and Centrify)
It’s been a little over a year (9/24/15) since we launched StreamSets Data Collector as an open source project. For those of you unfamiliar with the product, it’s any-to-any big data ingestion software through which you can build and place into production complex batch and streaming pipelines using built-in processors for all sorts of data transformations. The product features, plus video demos, tutorials, etc. can all be “ingested” through the SDC product page.
We’re thrilled to announce that as of last month StreamSets Data Collector had been downloaded by over ⅓ of the Fortune 100! That's several dozen of the largest companies in the U.S. And downloads of this award-winning software have been accelerating, with over 500% growth in the quarter ending in October versus the previous quarter.
Rick Bilodeau | More Than One Third of the Fortune 100 Have Downloaded StreamSets Data Collector
What do Sony, Target and the Democratic Party have in common?
Besides being well-respected brands, they’ve all been subject to some very public and embarrassing hacks over the past 24 months. Because cybercrime is no longer driven by angst-ridden teenagers but by professional criminal organizations and state-sponsored hacker groups, the halcyon days of looking for threat signatures are well behind us.
Rick Bilodeau | The Challenge of Fetching Data for Apache Spot (incubating)
Last week we announced the results of a survey of over 300 enterprise data professionals conducted by Dimensional Research and sponsored by StreamSets. We wanted to understand how the market manages its big data flows today. What we discovered is an alarming issue: companies are struggling to detect bad data and keep it out of their stores.
There is a bad data problem within big data
When we asked data pros about their challenges with big data flows, the most-cited issue was ensuring the quality of the data in terms of accuracy, completeness and consistency, which was named by over ⅔ of respondents. Security and operations were each listed by more than half. The fact that quality was a more common challenge than even security and compliance is quite telling, as you can usually count on security to be voted the #1 challenge in most IT domains.
Rick Bilodeau | Survey Shows Enterprises Struggling with Bad Data
I am always eager to learn about new architectures and big data best practices. Recently I came across a paper from Trifacta discussing the role of data preparation, and it got me thinking about the complementary nature of data ingestion and data preparation.
Data preparation, more colorfully known as data wrangling, is the activity performed by data-driven professionals, such as data or business analysts, to explore, clean, transform and blend data of all varieties to make it trustworthy for analysis or predictive modeling. It is a form of data manipulation that has traditionally been done in Excel or, for more technically advanced end users, in languages such as R, SAS or Python. But with the rise of enormous and dynamic data sets in Hadoop, these approaches are no longer feasible. Trifacta took the lead in creating a self-service, web-based solution that enables business users to access and manipulate data stored in Hadoop without needing programming skills.
Rick Bilodeau | The Complementary Nature of Data Ingestion and Data Preparation
This is a nice example of Kafka enablement using Maxwell (a MySQL-to-Kafka binlog processor) and StreamSets Data Collector from the folks at B23. It includes a schema change listener for handling data drift. Enjoy!
A step-by-step walkthrough of how Mac Noland implemented StreamSets to move away from hand-coded ETL and scale out an increasingly complex ingestion pipeline. Mac is a Solution Architect for phData, a Twin Cities services firm focused on Hadoop. He has spent 17 years as a software engineer and architect for projects in the legal, accounting, risk and medical device industries.
Rick Bilodeau | Ingesting Streaming Data from JMS into HDFS and Solr using StreamSets
Watch StreamSets Field Engineer Jonathan “Natty” Natkins demonstrate how you can use the open source StreamSets Data Collector to flexibly handle painful “data drift” – the inevitable evolution of infrastructure, semantics and schema that leads to corrupted data and broken pipelines.