Although the recent public preview of Amazon Managed Streaming for Kafka (MSK) certainly made headlines, Kinesis remains Amazon’s supported, production, real-time streaming service. In this blog post, I’ll show you how to get started using StreamSets Data Collector to build dataflow pipelines to send data to and receive data from Amazon Kinesis Data Streams.
This post summarizes a full tutorial at https://academy.datastax.com/content/ingesting-data-relational-databases-cassandra-streamsets. How do you ingest data from an existing relational database (RDBMS) into an Apache Cassandra or DataStax Enterprise cluster? What about one-time batch loading of historical data vs. streaming changes? I know what some of you are […]
Mike Fuller, a consultant at Red Pill Analytics, has been working on ingesting data into Snowflake’s cloud data warehouse using StreamSets for Snowflake. In this guest blog post, Mike explains how he was able to replicate an Oracle database to Snowflake using the new functionality, both for initial load and with change data capture.
StreamSets is proud to announce its new partnership with Snowflake and the general availability release of StreamSets for Snowflake. As enterprises move more of their big data workloads to the cloud, it becomes imperative that Data Operations be more resilient and adaptive so they can continue to serve the business’s needs. This is why StreamSets has partnered with […]
Overview: You have options when bulk loading data into Redshift from relational database (RDBMS) sources. These options include manual processes and the numerous hosted as-a-service offerings. But if you have broader requirements than simply importing data, you need another option. Your company may have requirements such as adhering to enterprise security policies which […]
The Encrypt and Decrypt processor, introduced in StreamSets Data Collector 3.5.0, uses the AWS Encryption SDK to encrypt and decrypt data within a dataflow pipeline, and a variety of mechanisms, including the AWS Key Management Service, to manage encryption keys. In this blog post, I’ll walk through the basics of working with encryption […]
StreamSets is excited to announce the immediate availability of StreamSets for Snowflake, the first DataOps platform for Snowflake. Now users can extend their DataOps environments to the popular Snowflake service. StreamSets makes it simple to copy data from databases, streams, and event processing directly into your cloud EDW, without complex schema design or hand-coding. Users get high […]
Introduction: Data warehouses are a critical component of modern data architecture in enterprises that leverage massive amounts of data to drive the quality of their products and services. A data warehouse is an OLAP (Online Analytical Processing) database that collects data from transactional databases such as Billing, CRM, ERP, etc. and provides a layer on top […]
The StreamSets DataOps Platform was architected to scale to the largest workloads, particularly when working with continuous streams of data from systems such as Apache Kafka or Apache Pulsar. As well as the ability to scale, the platform offers a number of deployment options, allowing you to trade off complexity, performance, and cost. This blog […]
Apache Pulsar is an open-source distributed pub-sub messaging system originally created at Yahoo, and a top-level Apache project since September 2018. StreamSets Data Collector 3.5.0, released soon after, introduced the Pulsar Consumer and Producer pipeline stages. In this blog entry, I’ll explain how to get started creating dataflow pipelines for Pulsar.