A successful data migration strategy involves moving data from one source to another with as little friction as possible. That friction usually comes in cost, data loss, or downtime accessing the target or destination data sources. A good migration design and a well-picked data migration tool can help limit these common sources of friction and make migrating data much smoother.
A very common data migration design involves moving from an on-prem database, like Oracle, Postgresql, or MySQL, to a cloud data warehouse like Snowflake or Redshift. However, data migration can encompass a broad range of patterns, and the following advice is intended to apply to as many of these patterns as possible.
Sources of Friction
All of the most troublesome friction points in data migration really come down to poor planning. There are a few key questions to ask and establish the answers to before your data starts moving. Here are a few to get you started:
- How much data do you need to move?
- What business requirements will the new data source be supporting? (i.e., new BI tool, advanced analytics, etc.).
- What transforms need to be applied to the data before it can be considered valid and complete?
The aim of these questions is to understand what the success metrics look like for your data migration. Draft out and answer as many questions as it requires to understand this success criteria fully.
There are two main methods of migrating data from one destination to another: all at once or over time. You might choose an all at once or bulk data migration if you’re moving from a well-designed, centralized legacy system and you plan on retiring your legacy system soon.
If your legacy system is instead decentralized into many different data marts and you plan on making a lot of changes to your data before it’s in a fit state for travel you may consider an over time or trickle in approach to data migration. This approach can involve a Change Data Capture (CDC) strategy that detects operations like UPDATES, DELETES, and INSERTS in the source database and delivers them to your target data source as they occur. The choice between the two depends entirely on your business requirements.
These patterns are also not mutually exclusive as a common macro pattern is an initial bulk migration and reciprocal incremental updates (CDC) over time. In fact, modern data integration tools like StreamSets can allow you to pivot or hybridize as your organization requires.
Regardless of which strategy you prefer, the first stage of actually migrating data is to copy over objects like tables, views, and functions to your new location. One way to do this is to create a DDL file from your old data source to your new one.
But, the easiest way to go about migrating a schema is to choose a migration tool that doesn’t require you to do it at all. StreamSets for example, is schema agnostic, allowing you to skip over this step completelyIn addition, if you’re moving into a cloud data warehouse, you most likely don’t have to move indexes over due to how these new data sources handle compute. Proving that choosing both the right migration tool and destination might save you a lot of time during this stage of the migration.
The second stage of migration is actually moving the data. There are a few different methods of accomplishing this.
- You could perform a manual data migration by extracting the data files from your source by hand and uploading them one by one into your target destination. This is only recommended for data migrations with a very small amount of data.
- You could use an ETL tool like StreamSets. Tools like this add a layer of monitoring, reusability and efficiency that can streamline your data migration. Using these tools also has the benefit of allowing your data migration strategy to include transforming your data in flight so that only clean and complete data ends up in your destination. This can enable your organization to start fresh without being held back by bad data from your legacy system.
There are some tricky things to consider with data migration. One of the most frequently occurring is data type issues. It may be that your source system is using data types that aren’t supported by your target system. These little gotchas can be corrected in stage one when you begin to move objects over if caught early.
Target data sources often have recommended file sizes for optimum performance. Snowflake, for example, recommends that files be between 100 MB and 1 GB. It would be wise to research these recommendations before beginning your data migration project and conform them as much as possible.
Ease Your Data Migration
A successful data migration strategy can be challenging to execute. To be successful, plan carefully, decide early on if you want to bulk import or trickle in your data, and lean on tools like StreamSets to ease your data migration into the finish line. Modern integration tools like Streamsets can soften the hard edges of a data migration strategy and give you one less thing to worry about.