The DataOps Blog

Where Change Is Welcome

StreamSets Transformer: Your Questions Answered

By October 22, 2019

StreamSets Transformer, a powerful tool for creating highly instrumented Apache Spark applications for modern ETL, is the newest addition to the StreamSets DataOps Platform. With Transformer, StreamSets enables next-generation ETL. The product provides enterprises with the flexibility…

A Fun Example of Streaming Data into Minecraft

By March 27, 2018

Angel Alvarado is a senior software engineer at One Degree, a San Francisco-based non-profit, and also helps run the Molanco data engineering community. In his spare time, Angel enjoys playing Minecraft with his 11-year-old cousin. Recently, Angel found a fun way to combine his gaming with data engineering. This blog entry, reposted from the original with Angel’s kind permission, picks up the story…

Data engineering can get really complex really quickly, and keeping track of the hundreds of tools and data platforms in the industry can be overwhelming. The following project shows how to use three data engineering tools to visualize data in a video game. It aims to solve a common data engineering problem with a twist that makes it fun and entertaining.

May the 4th Be With You – Analyzing Star Wars Twitter Mentions in Minecraft

By May 4, 2016

A couple of weeks ago, as May the 4th approached, a lively Star Wars debate brewed at StreamSets:

  • “Do new school characters get as much play as old favorites like Darth Vader, Yoda and Han Solo?”
  • “Does the Dark Side of the Force dominate the Light?”
  • “Does Yoda prevail over Darth Vader?”

It occurred to us that, with the Twitter Streaming API and StreamSets Data Collector, we didn’t have to guess or debate. We built a data flow that ingested and analyzed tweets and then displayed them in … Minecraft!

Visualizing Apache Log Data… with Minecraft!

By March 18, 2016

A key differentiator of StreamSets Data Collector (SDC) is that it operates in continuous mode – set a pipeline running and it will continue to read files from a directory or take messages from a queue. A Twitter conversation with Richard Tuttle, a solution architect at CRM Science, prompted me to wonder: would it be possible to ingest Apache Web Server log data, look up the geolocation from the client IP address, and plot the results on a map… in Minecraft?

Using Open Source StreamSets to Tackle Data Drift (video)

By October 27, 2015

Watch StreamSets Field Engineer Jonathan “Natty” Natkins demonstrate how you can use the open source StreamSets Data Collector to flexibly handle painful “data drift” – the inevitable evolution of infrastructure, semantics and schema that leads to corrupted data and broken pipelines.

Download Open Source StreamSets Data Collector at