
StreamSets Data Integration Blog

Where change is welcome.

Ingest Game-Streaming Data from the Twitch API

By May 25, 2018

Nikolay Petrachkov (Nik for short) is a BI developer in Amsterdam by day, but in his spare time he combines his passion for games and data engineering by building a project to analyze game-streaming data from Twitch. Nik discovered StreamSets Data Collector while looking for a way to build data pipelines that deliver insights from gaming data without writing a ton of code. In this guest post, reposted from the original with his kind permission, Nik explains how he used StreamSets Data Collector to extract data about streams and games via the Twitch API. It’s a great example of applying enterprise DataOps principles to a fun use case. Over to you, Nik…

DataOps: Applying DevOps to Data

By May 18, 2018

The term DataOps is a contraction of ‘Data Operations’ and comes from applying DevOps to data. It seems to have been coined in a 2015 blog post by Tamr co-founder and CEO Andy Palmer. In this blog post, I’ll dive into what DataOps means today and how enterprises can adopt its practices to create reliable, always-on dataflows using smart data pipelines to unlock the value of their data.

In his 2015 post, Palmer argued that the democratization of analytics and the implementation of “built-for-purpose” database engines created the need for DataOps. In addition to the two dynamics Palmer identified, a third has emerged: the need for analysis at the “speed of need”, which, depending on the use case, can be real-time, near-real-time, or with some acceptable latency. Data must be made available broadly, via a more diverse set of data stores and analytic methods, and as quickly as the consuming user or application requires.

What drives these three dynamics is a strategic imperative: enterprises must wield their data as a competitive weapon by making it available and consumable across numerous points of use. In short, their data must enable pervasive intelligence. The centralized discipline of SQL-driven business intelligence has been subsumed into a decentralized world of advanced analytics and machine learning. Pervasive intelligence lets “a thousand flowers bloom” to maximize the business benefits of a company’s data, whether that means speeding product innovation, lowering costs through operational excellence, or reducing corporate risk.

Automating Pipeline Development with the StreamSets SDK for Python

By May 15, 2018

When it comes to creating and managing your smart data pipelines, the graphical user interfaces of StreamSets Control Hub and StreamSets Data Collector Engine put the complete power of our robust Data Operations Platform at your fingertips. There are times, however, when a more programmatic approach may be needed, and those times will be significantly more enjoyable with the release of version 3.2.0 of the StreamSets SDK for Python. In this post, I’ll describe some of the SDK’s new functionality and show examples of how you can use it to enable your own data use cases.

StreamSets Announces Control Hub version 3.2

By May 14, 2018

Today we are pleased to announce the general availability of StreamSets Control Hub version 3.2. StreamSets has built the industry’s only DataOps platform. We call it DataOps because our platform makes it easy to iteratively update dataflows when technology changes.…
