StreamSets Data Integration Blog

Where change is welcome.

The Complementary Nature of Data Ingestion and Data Preparation

By May 25, 2016

I am always eager to learn about new architectures and big data best practices. Recently I came across a paper from Trifacta discussing the role of data preparation, and it got me thinking about the complementary nature of data ingestion and data preparation.

Data preparation, more colorfully known as data wrangling, is the activity performed by data-driven professionals, such as data or business analysts, to explore, clean, transform and blend data of all varieties to make it trustworthy for analysis or predictive modeling. It is a form of data manipulation that has traditionally been done in Excel or, for more technically advanced end users, in languages such as R, SAS or Python. But with the rise of enormous and dynamic data sets in Hadoop, these approaches are no longer feasible. Trifacta took the lead in creating a self-service, web-based solution that enables business users to access and manipulate data stored in Hadoop without needing programming skills.

May the 4th Be With You – Analyzing Star Wars Twitter Mentions in Minecraft

By May 4, 2016

A couple of weeks ago, as May the 4th approached, a lively Star Wars debate brewed at StreamSets:

  • “Do new school characters get as much play as old favorites like Darth Vader, Yoda and Han Solo?”
  • “Does the Dark Side of the Force dominate the Light?”
  • “Does Yoda prevail over Darth Vader?”

It occurred to us that, with the Twitter Streaming API and StreamSets Data Collector, we didn’t have to guess or debate. We built a data flow that ingested and analyzed tweets and then displayed them in… Minecraft!
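
As a rough illustration of the kind of analysis such a flow might perform, here is a minimal Python sketch that counts how many tweets mention each character. It is not the StreamSets Data Collector pipeline itself: the character list (including the "new school" names), the sample tweets and the `count_mentions` function are all assumptions made up for this example.

```python
import re
from collections import Counter

# Hypothetical character list, for illustration only.
CHARACTERS = ["Darth Vader", "Yoda", "Han Solo", "Rey", "Kylo Ren"]


def count_mentions(tweets, characters=CHARACTERS):
    """Count how many tweets mention each character (case-insensitive)."""
    counts = Counter({name: 0 for name in characters})
    for text in tweets:
        for name in characters:
            # Word boundaries keep "Rey" from matching inside words like "Grey".
            pattern = r"\b" + re.escape(name) + r"\b"
            if re.search(pattern, text, re.IGNORECASE):
                counts[name] += 1
    return counts


# Made-up sample tweets standing in for data from a streaming source.
sample_tweets = [
    "Yoda is the best. May the 4th be with you!",
    "Darth Vader vs Yoda, who wins?",
    "Han Solo shot first.",
]
mention_counts = count_mentions(sample_tweets)
```

In a real flow the tweets would arrive continuously rather than as a fixed list, so the counts would be updated record by record instead of in a single pass.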