The DataOps Blog

Where Change Is Welcome

Ingest Game-Streaming Data from the Twitch API

By May 25, 2018

Nikolay Petrachkov (Nik for short) is a BI developer in Amsterdam by day, but in his spare time, he combines his passion for games and data engineering by building a project to analyze game-streaming data from Twitch. Nik discovered StreamSets Data Collector when he was looking for a way to build data pipelines to deliver insights from gaming data without having to write a ton of code. In this guest post, reposted from the original with his kind permission, Nik explains how he used StreamSets Data Collector to extract data about streams and games via the Twitch API. It’s a great example of applying enterprise dataops principles to a fun use case. Over to you, Nik…

Automating Pipeline Development with the StreamSets SDK for Python

By May 15, 2018

When it comes to creating and managing your dataflow pipelines, the graphical user interfaces of StreamSets Control Hub and StreamSets Data Collector put the complete power of our robust Data Operations Platform at your fingertips. There are times, however, when a more programmatic approach may be needed, and those times will be significantly more enjoyable with the release of version 3.2.0 of the StreamSets SDK for Python. In this post, I’ll describe some of the SDK’s new functionality and show examples of how you can use it to enable your own data use cases.

Using StreamSets Control Hub with Minikube

By April 26, 2018

Hari Nayak's recent blog post provides a quickstart for using StreamSets Control Hub to deploy multiple instances of StreamSets Data Collector on Google's Kubernetes Engine (GKE).  This post modifies the core scripts from that project in order to run on Minikube rather than GKE. As Minikube can run…

Kafka + TLS/Kerberos in Cluster Streaming Mode is here!

By March 29, 2018

Spark Streaming + Data Collector + Secure Kafka

When we first introduced cluster streaming mode with Apache Spark Streaming 1.3 and Apache Kafka 0.8 several years ago, Kafka didn’t support security features such as TLS (transport encryption, authentication) and Kerberos (authentication). In Spark 2.1, an updated Kafka connector was introduced with support for these features when used with Kafka 0.10 or newer.

A Fun Example of Streaming Data into Minecraft

By March 27, 2018

Angel Alvarado is a senior software engineer at One Degree, a San Francisco-based non-profit, and also helps run the Molanco data engineering community. In his spare time, Angel enjoys playing Minecraft with his 11-year-old cousin. Recently, Angel found a fun way to combine his gaming with data engineering. This blog entry, reposted from the original with Angel’s kind permission, picks up the story…

Data engineering can get really complex really quickly, and keeping track of the hundreds of tools and data platforms in the industry can be overwhelming. The following project shows how to use three data engineering tools to visualize data in a video game. It aims to solve a common data engineering problem with a twist that makes it fun and entertaining.
