Yannick Jaquier is a Database Technical Architect at STMicroelectronics in Geneva, Switzerland. Recently, Yannick started experimenting with StreamSets Data Collector’s Oracle CDC Client origin, building pipelines to replicate data to a second Oracle database, a JSON file, and a MySQL database. Yannick documented his journey very thoroughly, and kindly gave us permission to repost his findings […]
Randy Zwitch is a Senior Director of Developer Advocacy at OmniSci, enabling customers and community users alike to utilize OmniSci to its fullest potential. With broad industry experience in energy, digital analytics, banking, telecommunications and media, Randy brings a wealth of knowledge across verticals as well as an in-depth knowledge of open-source tools for analytics. […]
One of the drivers behind Slack’s rise as an enterprise collaboration tool is its rich set of integration mechanisms. Bots can monitor channels for messages and reply as if they were users, apps can send and receive messages and more via a wide range of APIs, and slash commands allow users to interact with external […]
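The mechanics behind a slash command are straightforward: Slack POSTs the invoked command and its arguments to your app’s endpoint as form fields, and the app replies with a small JSON payload whose `response_type` controls visibility. A minimal sketch in Python (the helper name and reply text are illustrative, not taken from the post):

```python
import json


def slash_command_response(command: str, text: str, in_channel: bool = False) -> str:
    """Build the JSON body a Slack slash-command endpoint returns.

    Slack sends the command (e.g. "/weather") and its argument string;
    the app answers with JSON whose "response_type" is "in_channel"
    (visible to everyone in the channel) or "ephemeral" (visible only
    to the invoking user).
    """
    payload = {
        "response_type": "in_channel" if in_channel else "ephemeral",
        "text": f"You ran {command} with arguments: {text or '(none)'}",
    }
    return json.dumps(payload)
```

For example, `slash_command_response("/weather", "geneva")` yields an ephemeral reply that only the invoking user sees.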
Vinu Kumar is Chief Technologist at HorizonX, based in Sydney, Australia. Vinu helps businesses in unifying data, focusing on a centralized data architecture. In this guest post, reposted from the original here, he explains how to automate data quality using open source tools such as StreamSets Data Collector, Apache Griffin and Apache Kafka. “Data is the new oil. […]
Although the recent public preview of Amazon Managed Streaming for Kafka (MSK) certainly made headlines, Kinesis remains Amazon’s fully supported, production-grade real-time streaming service. In this blog post, I’ll show you how to get started using StreamSets Data Collector to build dataflow pipelines that send data to and receive data from Amazon Kinesis Data Streams.
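A useful piece of background when building Kinesis pipelines: a stream is split into shards, and Kinesis routes each record to a shard by taking the MD5 hash of its partition key and mapping it into the shard’s 128-bit hash-key range. A stdlib-only sketch of that mapping for evenly split shards (the function name is illustrative, and real shards can have arbitrary hash-key ranges after splits and merges):

```python
import hashlib


def shard_for_key(partition_key: str, num_shards: int) -> int:
    """Mimic Kinesis shard routing for a stream whose shards evenly
    split the 128-bit hash-key space.

    Kinesis hashes the partition key with MD5 and places the record
    in whichever shard owns that point of the hash space, so records
    with the same partition key always land on the same shard.
    """
    digest = int.from_bytes(hashlib.md5(partition_key.encode("utf-8")).digest(), "big")
    space = 2 ** 128  # size of the MD5 hash-key space
    return digest * num_shards // space
```

This is why a low-cardinality partition key (say, a handful of device types) can leave most shards idle: throughput scales with how evenly the keys spread across the hash space.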
Mike Fuller, a consultant at Red Pill Analytics, has been working on ingesting data into Snowflake’s cloud data warehouse using StreamSets for Snowflake. In this guest blog post, Mike explains how he was able to replicate an Oracle database to Snowflake using the new functionality, both for initial load and with change data capture.
The Encrypt and Decrypt processor, introduced in StreamSets Data Collector 3.5.0, uses the AWS Encryption SDK to encrypt and decrypt data within a dataflow pipeline, and a variety of mechanisms, including the AWS Key Management Service (KMS), to manage encryption keys. In this blog post, I’ll walk through the basics of working with encryption […]
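The Encryption SDK is built on envelope encryption: each message is encrypted under a fresh data key, and only that data key is encrypted (“wrapped”) under the long-lived master key that KMS manages. A toy stdlib-only sketch of the pattern, in which an XOR keystream stands in for real AES purely to keep the example self-contained; it must never be used for actual security:

```python
import hashlib
import secrets


def _keystream_xor(key: bytes, data: bytes) -> bytes:
    # Toy stream cipher: XOR the data with a SHA-256-derived keystream.
    # XOR is its own inverse, so the same call encrypts and decrypts.
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(b ^ k for b, k in zip(data, stream))


def envelope_encrypt(master_key: bytes, plaintext: bytes):
    data_key = secrets.token_bytes(32)                   # fresh key per message
    ciphertext = _keystream_xor(data_key, plaintext)     # data under the data key
    wrapped_key = _keystream_xor(master_key, data_key)   # data key under the master key
    return wrapped_key, ciphertext                       # both travel together


def envelope_decrypt(master_key: bytes, wrapped_key: bytes, ciphertext: bytes) -> bytes:
    data_key = _keystream_xor(master_key, wrapped_key)   # unwrap the data key first
    return _keystream_xor(data_key, ciphertext)
```

The payoff of the pattern is that the master key never touches the bulk data: rotating or revoking it only involves re-wrapping small data keys, which is why KMS-style services manage master keys while the SDK handles the per-message data keys.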
The StreamSets DataOps Platform was architected to scale to the largest workloads, particularly when working with continuous streams of data from systems such as Apache Kafka or Apache Pulsar. As well as the ability to scale, the platform offers a number of deployment options, allowing you to trade off complexity, performance, and cost. This blog […]
Apache Pulsar is an open-source distributed pub-sub messaging system originally created at Yahoo, and a top-level Apache project since September 2018. StreamSets Data Collector 3.5.0, released soon after, introduced the Pulsar Consumer and Producer pipeline stages. In this blog entry I’ll explain how to get started creating dataflow pipelines for Pulsar.
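Pulsar’s core abstraction, topics with independent named subscriptions, is what the Consumer stage attaches to: producers append to a topic, and each subscription keeps its own cursor into that topic, so separate subscriptions each see every message. An in-process sketch of that model (class and method names are illustrative; the real client exposes Producer and Consumer objects over the network):

```python
from collections import defaultdict


class ToyPulsar:
    """In-process sketch of Pulsar's topic/subscription model.

    Producers append to a per-topic log; each (topic, subscription)
    pair keeps its own cursor, so independent subscriptions replay
    the full stream at their own pace.
    """

    def __init__(self):
        self._topics = defaultdict(list)   # topic -> ordered message log
        self._cursors = defaultdict(int)   # (topic, subscription) -> next offset

    def send(self, topic: str, message: str) -> None:
        self._topics[topic].append(message)

    def receive(self, topic: str, subscription: str):
        cursor = self._cursors[(topic, subscription)]
        log = self._topics[topic]
        if cursor >= len(log):
            return None                    # nothing new for this subscription
        self._cursors[(topic, subscription)] = cursor + 1
        return log[cursor]
```

Two pipelines reading the same topic under different subscription names each get the whole stream, while instances sharing one subscription name split the work, which is the behavior to keep in mind when naming the subscription in the Pulsar Consumer stage.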
Minneapolis-based phData has long been a StreamSets partner, deploying the StreamSets DataOps Platform at customers across the US. It’s not surprising, then, that when phData principal solutions architect Keith Smith wanted to integrate the Ethereum blockchain platform with the Apache Hadoop filesystem and Apache Kudu, he reached for StreamSets Data Collector.