
The DataOps Blog

Where Change Is Welcome

The Complementary Nature of Data Ingestion and Data Preparation

By May 25, 2016

I am always eager to learn about new architectures and big data best practices. Recently I came across a paper from Trifacta discussing the role of data preparation, and it got me thinking about the complementary nature of data ingestion and data preparation.

Data preparation, more colorfully known as data wrangling, is the work that data-driven professionals, such as data or business analysts, perform to explore, clean, transform and blend data of all varieties so that it can be trusted for analysis or predictive modeling. This kind of data manipulation has traditionally been done in Excel or, by more technically advanced users, in languages such as R, SAS or Python. But with the rise of enormous, dynamic data sets in Hadoop, these approaches are no longer feasible. Trifacta took the lead in creating a self-service, web-based solution that lets business users access and manipulate data stored in Hadoop without needing programming skills.

How Trend Micro Uses StreamSets – An Interview with the Threat Research Team

By March 21, 2016
The Forward-Looking Threat Research team at Trend Micro was an early adopter of StreamSets Data Collector. The team uses StreamSets to ingest data from a wide variety of sources and build a Threat Assessment Dashboard in Elasticsearch. In this interview, we talk with team members about how they evaluated StreamSets and how they put it into production in a short period of time.

Continuous Ingest in the Face of Data Drift (from the Cloudera Vision Blog)

By Arvind Prabhakar, February 1, 2016

Big data has come a long way, with adoption accelerating as CIOs recognize the business value of extracting insights from the troves of data collected by their companies and business partners. But, as is often the case with innovations, mainstream adoption of big data has exposed a new challenge: how to ingest data continuously, from any source, and with high quality. Indeed, we have found environmental causes, such as data drift, that make it next to impossible to scale ingestion using current approaches, and this has serious implications for scaling big data projects.
