Pat Patterson, Author at StreamSets

Using Docker Wrong: My Journey to a Better Container

By Pat Patterson July 3, 2018

Following on from last week's guest post from MapR's Ian Downard on integrating StreamSets Data Collector with MapR Persistent Application Client Container (PACC), MapR Distinguished Technologist John Omernik offers a cautionary tale on examining your assumptions before jumping into the world of…

Using StreamSets and MapR Together in Docker

Data Integration

By Pat Patterson June 26, 2018

Today's guest blogger is Ian Downard, a Senior Developer Evangelist at MapR Technologies. Ian focuses on machine learning and data engineering, and recently documented how he brought together the MapR Persistent Application Client Container (PACC) with StreamSets Data Collector and Docker to build pipelines…

Streaming Extreme Data Made Simple with Kinetica and StreamSets

By Pat Patterson June 21, 2018

Kinetica, just one of dozens of origins and destinations supported by StreamSets Data Collector, is a distributed, in-memory, GPU database designed for geospatial analysis, machine learning, predictive analytics, and other workloads requiring high performance parallel processing. Mathew Hawkins, a Principal Solutions Architect at Kinetica, recently…

Extract Data from Google Analytics using StreamSets Data Collector

Operational Analytics

By Pat Patterson June 19, 2018

Angel Alvarado is a senior software engineer at One Degree, a San Francisco-based non-profit, and also helps run the Molanco data engineering community. Angel previously contributed a Fun Example of Streaming Data into Minecraft; this time he gets serious with the Google Analytics API. Many…

RingCentral Scales Out Big Data Streaming with StreamSets

Stream Data Processing

By Pat Patterson June 14, 2018

RingCentral is an award-winning global provider of cloud-unified communications and collaboration solutions. RingCentral solutions empower today’s mobile and distributed workforces to be connected anywhere and on any device through voice, video, team messaging, collaboration, SMS, conferencing, online meetings, contact center,…

Change Data Capture from Oracle with StreamSets Data Collector

By Pat Patterson June 12, 2018

Editor's Note: StreamSets no longer relies on the continuous miner function in Oracle. Here is an update on Oracle 19c Bulk Ingest and CDC. Today's guest post is by Franck Pachot, an Oracle Consultant at dbi services in Switzerland. Franck has over…

Ingest Game-Streaming Data from the Twitch API

Stream Data Processing

Data Integration

By Pat Patterson May 25, 2018

Nikolay Petrachkov (Nik for short) is a BI developer in Amsterdam by day, but in his spare time, he combines his passion for games and data engineering by building a project to analyze game-streaming data from Twitch. Nik discovered StreamSets Data Collector when he was looking for a way to build data pipelines to deliver insights from gaming data without having to write a ton of code. In this guest post, reposted from the original with his kind permission, Nik explains how he used StreamSets Data Collector to extract data about streams and games via the Twitch API. It’s a great example of applying enterprise dataops principles to a fun use case. Over to you, Nik…

DataOps: Applying DevOps to Data

Data Integration

By Pat Patterson May 18, 2018

The term DataOps is a contraction of ‘Data Operations’ and comes from applying DevOps to data. It seems to have been coined in a 2015 blog post by Tamr co-founder and CEO Andy Palmer. In this blog post, I’ll dive into what DataOps means today, and how enterprises can adopt its practice to create reliable, always-on dataflows using smart data pipelines to unlock the value of their data.

In his 2015 post, Palmer argued that the democratization of analytics and the implementation of “built-for-purpose” database engines created the need for DataOps. In addition to the two dynamics Palmer identified, a third has emerged: the need for analysis at the “speed of need”, which, depending on the use, can be real-time, near-real-time or with some acceptable latency. Data must be made available broadly, via a more diverse set of data stores and analytic methods, and as quickly as required by the consuming user or application.

What’s driving these three dynamics is the strategic imperative that enterprises wield their data as a competitive weapon by making it available and consumable across numerous points of use; in short, that their data enables pervasive intelligence. The centralized discipline of SQL-driven business intelligence has been subsumed into a decentralized world of advanced analytics and machine learning. Pervasive intelligence lets “a thousand flowers bloom” in order to maximize business benefits from a company’s data, whether it be speeding product innovation, lowering costs through operational excellence or reducing corporate risk.

Mini MapR Academy: How the ACT Government Uses Data Collector w/ MapR (videos)

Data Integration

Data Transformation

By Pat Patterson April 23, 2018

Selvaraaju (‘Selva’) Murugesan is Senior Manager for Innovation and Data Analytics in the Australian Capital Territory (ACT) Government. Selva focuses on data management practices and data analytics, using StreamSets Data Collector to extract data from different databases, perform data cleansing on the fly and…

Efficient Splunk Ingest for Cybersecurity

Data Transformation

Stream Data Processing

By Pat Patterson April 17, 2018

Many StreamSets customers use Splunk to mine insights from machine-generated data such as server logs, but one problem they encounter with the default tools is that they have no way to filter the data that they are forwarding. While Splunk is a great tool for searching and analyzing machine-generated data, particularly in cybersecurity use cases, it’s easy to fill it with redundant or irrelevant data, driving up costs without adding value. In addition, Splunk may not natively offer the types of analytics you prefer, so you might also need to send that data elsewhere.

In this blog entry I’ll explain how, with StreamSets Control Hub, we can build a topology of pipelines for efficient Splunk data ingestion to support cybersecurity and other domains, by sending only necessary and unique data to Splunk and routing other data to less expensive and/or more analytics-rich platforms.
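The filter-and-route idea can be sketched in plain Python. This is a hypothetical illustration only, not StreamSets pipeline configuration or Splunk API code: the record fields, the `is_security_relevant` rule, and the two output lists standing in for "Splunk" and "a cheaper platform" are all assumptions made for the example. Dropping exact duplicates outright is also a choice of this sketch.

```python
import hashlib
import json


def record_fingerprint(record):
    """Return a stable hash of a record so exact duplicates can be detected."""
    canonical = json.dumps(record, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def is_security_relevant(record):
    """Hypothetical filter rule: only warning-or-worse events go to Splunk."""
    return record.get("severity") in {"WARN", "ERROR", "CRITICAL"}


def route_records(records):
    """Split records into a Splunk-bound list and an archive-bound list.

    Exact duplicates are dropped; relevant records go to the first list,
    everything else to the second (standing in for a cheaper store).
    """
    seen = set()
    to_splunk, to_archive = [], []
    for record in records:
        fingerprint = record_fingerprint(record)
        if fingerprint in seen:
            continue  # already handled an identical record
        seen.add(fingerprint)
        if is_security_relevant(record):
            to_splunk.append(record)
        else:
            to_archive.append(record)
    return to_splunk, to_archive
```

In a real deployment the in-memory `seen` set would need a bound or expiry, since an unbounded stream would otherwise grow it without limit.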
