Replicating Oracle to MySQL and JSON

Yannick Jaquier is a Database Technical Architect at STMicroelectronics in Geneva, Switzerland. Recently, Yannick started experimenting with StreamSets Data Collector's Oracle CDC Client origin, building pipelines to replicate data to a second Oracle database, a JSON file, and a MySQL database. Yannick documented his journey very thoroughly, and kindly gave us permission to repost his findings from his original blog entry.

Preamble

I came across a nice overview of Oracle CDC with StreamSets by Franck Pachot and shared it with a few teammates; they were all interested in the StreamSets Data Collector product. One of the main reasons is the obvious cost saving versus Oracle GoldenGate, which we have implemented in a project deployed worldwide. The Data Collector product is free and can handle INSERT, UPDATE, SELECT_FOR_UPDATE, and DELETE DML operations for one or more tables in a database. To handle DDL you have to check the “Produce Events” checkbox and handle the generated events in an appropriate target; this is a little more complex and outside the scope of this blog post.

I decided to try the product out and implement it for what we currently do with GoldenGate: building a reporting environment of our production database. The target is also an Oracle database but might in the future be MySQL.

My testing environment comprises three servers:

  • server1.domain.com (192.168.56.101) is the primary database server.
  • server2.domain.com (192.168.56.102) is the secondary database server, hosting the target databases (Oracle and MySQL) where the data should land.
  • server4.domain.com (192.168.56.104) is the StreamSets Data Collector server.

The Oracle database release is 18c Enterprise Edition Release 18.3.0.0.0. Data Collector is release 3.4.1. MySQL is release 8.0.12 MySQL Community Server.

The three servers are in fact VirtualBox guests running Oracle Linux Server release 7.5.

StreamSets Data Collector installation

I first created a Linux streamsets account (in users group) with a /streamsets dedicated filesystem:
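Something along these lines (the device name /dev/sdb1 is an example; adapt to your environment):

    # dedicated account and filesystem for StreamSets Data Collector
    useradd -g users streamsets
    mkfs.xfs /dev/sdb1          # example device backing the /streamsets filesystem
    mkdir /streamsets
    mount /dev/sdb1 /streamsets
    chown streamsets:users /streamsets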

I installed Java SE Development Kit 8 (jdk-8u181-linux-x64.rpm); only release 8 is supported so far.
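For example:

    rpm -Uvh jdk-8u181-linux-x64.rpm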

At the end of /etc/security/limits.conf I added:
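Data Collector needs a large number of file descriptors (the documentation asks for at least 32768), so entries along these lines:

    streamsets  soft  nofile  32768
    streamsets  hard  nofile  32768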

Now I can launch the process with:
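Assuming a tarball installation unpacked under /streamsets, something like:

    su - streamsets
    cd /streamsets/streamsets-datacollector-3.4.1
    bin/streamsets dc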

Then from any browser (at this URL for me: http://server4.domain.com:18630/) you should get this login window (admin/admin is the default username/password):

StreamSets login

Once logged in you see:

StreamSets get started

The Oracle CDC (Change Data Capture) origin requires the Oracle JDBC thin driver. Click on Package Manager in the top right toolbar (third icon). If you go into the JDBC category you see that nothing is there (the error message is because I’m behind a company proxy):

When I tried to import my JDBC thin driver file (ojdbc8.jar; the file name for 18c (18.3) is the same as for 12.2.0.1, but the size differs) I saw that the JDBC category was not there:

Install external libraries

I spent a bit of time before discovering that all of the instructions were on the StreamSets Data Collector download page:

Relaunching StreamSets Data Collector and returning to the web interface, I now saw JDBC as a possible library:

Install JDBC driver

Next I imported the JDBC driver:

You should be prompted to restart StreamSets Data Collector and see the below screen:

Package manager

Oracle Database LogMiner Configuration

This source multi-tenant Oracle database is common for the three scenarios I decided to test, so I had to configure it only once. On this source Oracle database I created a global account able to manage LogMiner (global because LogMiner is accessible only from the root container in a multi-tenant architecture):
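A sketch following the Oracle CDC Client documentation (the account name c##sdc and its password are placeholders):

    alter session set container=cdb$root;
    create user c##sdc identified by sdc container=all;
    grant create session, alter session, set container, select any dictionary,
          logmining, execute_catalog_role to c##sdc container=all;
    -- once the test table (created below) exists, let the account read it:
    alter session set container=pdb1;
    grant select on test01.test01 to c##sdc;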

Change the source Oracle database to archivelog mode and activate the default supplemental log:
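From SQL*Plus connected as SYSDBA:

    shutdown immediate;
    startup mount;
    alter database archivelog;
    alter database open;
    alter database add supplemental log data;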

I also created a test schema on my pluggable database (pdb1) to handle my test table:
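Something like (the password and tablespace are placeholders):

    alter session set container=pdb1;
    create user test01 identified by test01
      default tablespace users quota unlimited on users;
    grant connect, resource to test01;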

I created a test table and added the supplemental log:
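A minimal two-column table such as this (the column layout is my assumption, used consistently in the examples below):

    create table test01.test01 (
      id    number primary key,
      descr varchar2(50)
    );
    alter table test01.test01 add supplemental log data (primary key) columns;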

And inserted a few rows in it to simulate an already existing environment:
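For example:

    insert into test01.test01 values (1, 'One');
    insert into test01.test01 values (2, 'Two');
    commit;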

On the source database I generated a dictionary in the redo log. If you choose to set “Dictionary Source” to Online Catalog then this step is not mandatory. It is also much faster on small deployments to use the Online Catalog, so it's really up to you:
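That is, as SYSDBA on the root container:

    execute dbms_logmnr_d.build(options => dbms_logmnr_d.store_in_redo_logs);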

StreamSets Data Collector Oracle to Oracle Replication

Oracle Prerequisites

On the target Oracle database I created an account in my target pluggable database (pdb1):
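Same idea as on the source:

    alter session set container=pdb1;
    create user test01 identified by test01
      default tablespace users quota unlimited on users;
    grant connect, resource to test01;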

On the source database create an export directory and grant read and write on it to the test01 account:
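For example (the directory name and path are placeholders):

    create directory expdir as '/tmp';
    grant read, write on directory expdir to test01;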

Get the current System Change Number (SCN) on the source database with:
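In SQL*Plus:

    select current_scn from v$database;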

Finally export the schema with:
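Along these lines, plugging in the SCN from the previous step (dump and log file names are placeholders):

    expdp test01/test01@//server1.domain.com:1521/pdb1 schemas=test01 \
      directory=expdir dumpfile=test01.dmp logfile=expdp_test01.log \
      flashback_scn=<SCN>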

Import the schema on the target database with something like (in the pdb1 pluggable database):
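After transferring the dump file to the target server (assuming a matching directory object exists there):

    impdp test01/test01@//server2.domain.com:1521/pdb1 schemas=test01 \
      directory=expdir dumpfile=test01.dmp logfile=impdp_test01.log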

StreamSets Data Collector Configuration

Create the pipeline:

Create pipeline

Choose Oracle CDC Client as the origin (for replicating from Oracle this is the de facto option to choose):

Oracle CDC origin

Configure all parameters. I have chosen the most reliable solution, getting the dictionary from the redo logs, in case we want to test schema changes (DDL). As the simplest test I have chosen ‘From Latest Change’, which processes changes that arrive after you start the pipeline. I will reconfigure it after the database settings:

Configure Oracle CDC client

The query below can help you choose the correct database time zone:
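For example:

    select dbtimezone from dual;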

Set the JDBC connection string (I am in a multi-tenant configuration):
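Something like this; the origin connects to the container database (orcl is a placeholder for whatever service name your CDB registers with the listener), while the pluggable database name (pdb1) is set separately in the origin configuration:

    jdbc:oracle:thin:@//server1.domain.com:1521/orcl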

JDBC Configuration

Use the global account we defined earlier:

Credentials configuration

I didn't add a processor to modify data between the source and target but obviously this is possible:

Processor list

Now I set the JDBC connection string in the JDBC Producer destination. I am in a multi-tenant configuration but here I just connected to the pluggable destination database directly:
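Something like (pdb1 here being the service name of the pluggable database):

    jdbc:oracle:thin:@//server2.domain.com:1521/pdb1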

Local pluggable database user to connect with:

Destination credentials

In the StreamSets configuration I configured the Oracle CDC Client to start at a particular SCN (the one I extracted above):

Configure SCN

StreamSets Data Collector Oracle to Oracle replication testing

Then on the source database I started inserting new data:
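For example:

    insert into test01.test01 values (3, 'Three');
    commit;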

The data appeared on the target database, visible via a nice monitoring screen for the pipeline:

Running pipeline

Then, to generate a bit of traffic, I used the PL/SQL script below (the number of inserted rows is up to you; I have personally done multiple tests):
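A sketch of such a script, assuming the two-column test table from above:

    declare
      max_id number;
    begin
      -- continue numbering after any rows already present
      select nvl(max(id), 0) into max_id from test01.test01;
      for i in max_id + 1 .. max_id + 10000 loop
        insert into test01.test01 values (i, 'Generated row ' || to_char(i));
        commit;  -- commit per row to generate steady redo traffic
      end loop;
    end;
    /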

If you capture the monitoring screen while it’s running you should be able to see transfer rate figures:

Transfer rate figures

StreamSets Data Collector Oracle to JSON file generation

As well as JDBC insertion into a target Oracle database, I also wanted to test generation of a JSON file. I started by adding a new destination called Local FS and then used the mouse in the GUI to draw a new line between the Oracle CDC Client origin and the Local FS destination. The only parameter I changed was the generated Data Format, which I set to JSON:

Generate JSON file

For each row in the source table (I restarted from an empty table) the record was duplicated in:

  • The same target Oracle database as above.
  • A text file, on the StreamSets Data Collector server, in JSON format.

We can see that the number of output generated records is equal to two:

Writing to JSON

The output file (located on the server where StreamSets Data Collector is running, i.e. server4.domain.com):
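With the assumed two-column table, each captured row comes out as a one-line JSON document, something like:

    {"ID":3,"DESCR":"Three"}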

StreamSets Data Collector Oracle to MySQL replication

I created a small MySQL 8 instance, used my personal account to connect to it and created a test01 database to map the schema name of the source Oracle pluggable database:
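For example:

    create database test01;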

I created the target table to match one of the source Oracle tables:
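Mirroring the assumed Oracle test table:

    use test01;
    create table test01 (
      id    int not null primary key,
      descr varchar(50)
    );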

Remark:
I started from an empty table, but if that's not the case for you then you should export and import pre-existing data.

The JDBC connect string for MySQL is (3322 is my MySQL listening port):
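That is:

    jdbc:mysql://server2.domain.com:3322/test01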

I also set two JDBC parameters (see Errors encountered section):
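They address the two MySQL errors described in that section; the time zone value is an assumption based on my location:

    useSSL=false
    serverTimezone=Europe/Zurich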

MySQL JDBC configuration

The Schema Name parameter for MySQL must be entered in lower case (test01 in my case), and the Table Name must also be in lower case, so use the formula below (the str:toLower function) to convert the uppercase Oracle table name to lower case:
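The expression looks like this (oracle.cdc.table is the record header attribute set by the Oracle CDC Client origin):

    ${str:toLower(record:attribute('oracle.cdc.table'))}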

Table name mapping

We can see that rows have been inserted into the MySQL target table:

Errors encountered

JDBC_52 – Error starting LogMiner

In sdc.log or in View Logs of the user interface you might see something like:

It was a mistake on my side: I had deleted the archived log files directly from disk. I recovered the situation with:
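That is, re-synchronizing the RMAN repository with the files actually on disk, along these lines:

    rman target /
    RMAN> crosscheck archivelog all;
    RMAN> delete expired archivelog all;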

I encountered another similar error:

It was simply because no archived log file contained a dictionary. This can happen when you purge archived log files. Generate one with:
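That is, the same dictionary build as during the initial setup:

    execute dbms_logmnr_d.build(options => dbms_logmnr_d.store_in_redo_logs);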

JDBC_44 – Error while getting changes due to error: java.sql.SQLRecoverableException: Closed Connection: getBigDecimal

When validating the pipeline I got:

In sdc.log:

From what I have seen, it looks like my test server is too slow; if you get this you might need to increase the timeout parameters in the Advanced tab of the Oracle CDC Client origin.

ORA-01291: missing logfile

This one kept me busy for a while:

One reason I have identified is that StreamSets starts LogMiner two hours in the past even when you choose ‘From Latest Change’:

I suspect this occurs because the Oracle CDC Client LogMiner Session Window parameter is set to 2 hours: you must have the archived log files from the last 2 hours available when starting the pipeline. So never purge archived log files with:
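    RMAN> delete archivelog all;   # removes everything, including the window LogMiner still needs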

But use:
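    RMAN> delete archivelog all completed before 'sysdate - 2/24';   # keep at least the 2-hour session window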

Even when applying this, I also noticed StreamSets was always starting LogMiner from the time when the pipeline failed or was stopped. This is saved in the offset.json file:

If this is expected and you know what you are doing (first setup, testing, …) you can reset the pipeline with the graphical interface:

Reset origin

Confirm that you know what you are doing:

Reset origin 2

This results in an empty offset.json file:

JDBC_16 – Table '' does not exist or PDB is incorrect. Make sure the correct PDB was specified

In JDBC Producer I replaced the Table Name field:

with

Then it failed with:

In JDBC Producer Errors section I noticed:

Because I had inserted the record on the master database with the SYS account, I tried with the TEST01 account, but it failed with the exact same error…

I finally found the solution: setting the Schema Name field to TEST01, in uppercase, because, as suggested in the Oracle CDC Client documentation, Oracle uses all caps for schema, table, and column names by default.

JDBC_00 – Cannot connect to specified database: com.zaxxer.hikari.pool.PoolInitializationException: Exception during pool initialization: The server time zone value ‘CEST’ is unrecognized or represents more than one time zone.

The full error message also contains:

I had to add an additional JDBC property to set the server time zone:
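In my case (Geneva; the exact value is whatever time zone your server actually runs in):

    serverTimezone=Europe/Zurich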

Establishing SSL connection without server’s identity verification is not recommended

To solve this set in JDBC driver parameters:
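That is:

    useSSL=false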

JdbcGenericRecordWriter – No parameters found for record with ID

Complete error message is:

For Oracle to MySQL replication I had to do explicit column mapping like this:

Field column mapping
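With the assumed two-column table, the mapping boils down to (SDC field path on the left, MySQL column on the right):

    /ID    -> id
    /DESCR -> descr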
