
StreamSets Data Integration Blog

Where change is welcome.

Enhanced Error Diagnostics in StreamSets Data Collector 3.9.0

By July 13, 2019

StreamSets Data Collector reads from and writes to a wide variety of data stores and messaging platforms. Any interaction with an external system brings with it the risk of an error, and error messages are often less than helpful at pinpointing the root cause of the problem. Version 3.9.0 of Data Collector, released a few weeks ago, includes an extensible rule-based system, codenamed ‘Antenna Doctor’, that makes dataflow pipeline development easier than ever.
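The post excerpt doesn't show Antenna Doctor's internals, but the general shape of a rule-based diagnostic layer can be sketched in a few lines. The patterns and hint strings below are purely illustrative assumptions, not Data Collector's actual rules:

```python
import re

# Illustrative only: each rule pairs an error-message pattern with a
# human-readable hint about the likely root cause.
RULES = [
    (re.compile(r"connection refused", re.I),
     "The host is reachable but nothing is listening on that port; "
     "check that the service is running and the port is correct."),
    (re.compile(r"unknownhostexception", re.I),
     "The hostname could not be resolved; check DNS and the URL spelling."),
    (re.compile(r"access denied|401|403", re.I),
     "Authentication or authorization failed; verify credentials and permissions."),
]

def diagnose(error_message: str) -> str:
    """Return the first matching hint, or a generic fallback."""
    for pattern, hint in RULES:
        if pattern.search(error_message):
            return hint
    return "No diagnostic rule matched; inspect the full stack trace."

print(diagnose("java.net.ConnectException: Connection refused"))
```

An extensible design like this lets new rules be shipped independently of the engine itself, which is what makes the system easy to grow over time.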

From Zero to Production ETL in 30 minutes with StreamSets

By June 14, 2019

Jeff Schmitz has been working with big data for over a decade: at Shell, Sanchez Energy, MapR and, currently, as a senior solutions architect at MongoDB. Here, in a guest post reposted with permission from the original, Jeff shares his early experience with StreamSets Data Collector.

Now that I work for MongoDB, I work with StreamSets quite a bit. A while back, however, I was on a different journey at a customer site. We were struggling to pick an ETL tool: many of those we looked at were very pricey and required significant admin and resource time to manage. Luckily, we found StreamSets.

Ingesting Data from Apache Kafka to TimescaleDB

By May 28, 2019

The Glue Conference (better known as GlueCon) is always a treat for me. I’ve been speaking there since 2012, and this year I presented a session explaining how I use StreamSets Data Collector to ingest content delivery network (CDN) data from compressed CSV files in S3 to MySQL for analysis, using the Kickfire API to turn IP addresses into company data. The slides are here, and I’ll write it up in a future blog post.

As well as speaking, I always enjoy the keynotes (shout out to Leah McGowen-Hare for an excellent presentation on inclusion!) and breakouts. In one of this year’s breakouts, Diana Hsieh, director of product management at Timescale, focused on the TimescaleDB time series database.

Oracle Replication to MySQL and JSON

By May 10, 2019

Yannick Jaquier is a Database Technical Architect at STMicroelectronics in Geneva, Switzerland. Recently, Yannick started experimenting with StreamSets Data Collector Engine‘s Oracle CDC Client origin, building pipelines for Oracle replication: replicating data to a second Oracle database, a JSON file, and a MySQL database. Yannick documented his journey very thoroughly, and kindly gave us permission to repost his findings from his original blog entry.

Creating the OmniSci F1 Demo: Real-Time Data Ingestion With StreamSets

By May 8, 2019

Randy Zwitch is a Senior Director of Developer Advocacy at OmniSci, enabling customers and community users alike to utilize OmniSci to its fullest potential. With broad industry experience in energy, digital analytics, banking, telecommunications and media, Randy brings a wealth of knowledge across verticals as well as an in-depth knowledge of open-source tools for analytics. In this guest blog post, reposted from the original with permission, Randy explains the Formula 1 demo he built with StreamSets Data Collector to show real-time telemetry ingestion into OmniSci’s GPU-accelerated analytics platform.

Building a Slack Slash Command as a Microservice Pipeline

By March 26, 2019

One of the drivers behind Slack‘s rise as an enterprise collaboration tool is its rich set of integration mechanisms. Bots can monitor channels for messages and reply as if they were users, apps can send and receive messages and more via a wide range of APIs, and slash commands allow users to interact with external systems from the Slack message box. In this blog post, I’ll explain how I implemented a sample slash command as a microservice pipeline in StreamSets Data Collector 3.8.0, allowing users to look up stock item names and URLs from stock item numbers. Use this as the basis for creating your own slash command!
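The full pipeline is in the linked post, but the core request/response contract is simple: Slack POSTs a form-encoded body to your endpoint, and you reply with a JSON message payload. Here is a minimal sketch of that exchange; the stock catalog and item numbers are hypothetical stand-ins for whatever store the pipeline actually queries:

```python
from urllib.parse import parse_qs
import json

# Hypothetical stock catalog; the real pipeline looks items up in an
# external data store rather than an in-memory dict.
STOCK_ITEMS = {
    "SKU-1001": ("Widget, large", "https://example.com/items/SKU-1001"),
    "SKU-1002": ("Widget, small", "https://example.com/items/SKU-1002"),
}

def handle_slash_command(body: str) -> str:
    """Take the form-encoded POST body Slack sends for a slash command
    and return the JSON message payload Slack expects back."""
    params = parse_qs(body)
    item_number = params.get("text", [""])[0].strip()
    item = STOCK_ITEMS.get(item_number)
    if item is None:
        text = f"No stock item found for '{item_number}'."
    else:
        name, url = item
        text = f"{item_number}: {name} - {url}"
    # An "ephemeral" response is shown only to the user who ran the command.
    return json.dumps({"response_type": "ephemeral", "text": text})

print(handle_slash_command("token=abc&text=SKU-1001"))
```

In the Data Collector version, an HTTP Server origin receives the POST, processors do the lookup, and a send-response destination returns the JSON, but the payload shapes are the same as above.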

Solving Data Quality in Smart Data Pipelines

By March 13, 2019

Vinu Kumar is Chief Technologist at HorizonX, based in Sydney, Australia. Vinu helps businesses unify their data, focusing on a centralized data architecture. In this guest post, reposted from the original here, he explains how to automate data quality checks using open source…
