Ingesting Data from Relational Databases to Cassandra with StreamSets
This post summarizes a full tutorial available at https://academy.datastax.com/content/ingesting-data-relational-databases-cassandra-streamsets
How do you ingest data from an existing relational database (RDBMS) into an Apache Cassandra or DataStax Enterprise cluster?
And what about a one-time batch load of historical data versus streaming ongoing changes?
I know what some of you are thinking: write and deploy some code. Maybe that code even uses a framework like Apache Spark. That's what I would have thought a few years ago. But it often turns out to be harder than expected.
Don't get me wrong: writing and deploying code makes sense for some teams. But for many others, custom code can demand significant time and resources.
Are there any alternatives to custom code for ingesting data from an RDBMS into Cassandra?
For example, are there third-party tools that focus on data ingestion? And if so, do they support moving data from an RDBMS into Cassandra or DataStax Enterprise?
Yes and yes, with StreamSets.
In this tutorial, we'll explore how you can use the open-source StreamSets Data Collector to migrate from an existing RDBMS to DataStax Enterprise or Cassandra.
We'll cover both batch and streaming data ingestion from an RDBMS to Cassandra:
- Use Case 1: Initial bulk load of historical RDBMS data into Cassandra (batch)
- Use Case 2: Change Data Capture (CDC) trickle feed from the RDBMS to Cassandra, keeping Cassandra updated in near real time (streaming)
Why does this matter?
- Migrate to Cassandra more quickly than you could with a custom-coded solution
- Build confidence in your Cassandra data models and operations using real-world data
- Switch over from an RDBMS-based environment to Cassandra with minimal downtime (in fact, zero downtime is possible; keep reading)
- Use a tool built for data ingestion, so you can focus on the business objectives that rely on Cassandra. You're not in the data ingestion business, right? So why build something when you don't have to? Prioritize.
Read the full post at https://academy.datastax.com/
In this tutorial, you saw how to batch load and stream changes from an RDBMS to Cassandra using StreamSets. If you want to learn about all of our connectors, please visit our website.