Oracle CDC Client

The Oracle CDC Client processes change data capture (CDC) information that Oracle LogMiner provides from redo logs. Use Oracle CDC Client to process data from Oracle version 12c.

When needed, you can use a separate pipeline with the JDBC Query Consumer or JDBC Multitable Consumer origin to read existing data before you start a pipeline with Oracle CDC Client.

Oracle CDC Client processes data based on the commit number, in ascending order.

To read the redo logs, Oracle CDC Client requires the LogMiner dictionary. The origin can use the dictionary in redo logs or in an online catalog. When using the dictionary in redo logs, the origin captures and adjusts to schema changes. The origin can also generate events when using redo log dictionaries.

The origin can create records for the INSERT, UPDATE, SELECT_FOR_UPDATE, and DELETE operations for one or more tables in a database. You can select the operations that you want to use. The origin also includes CDC and CRUD information in record header attributes so generated records can be easily processed by CRUD-enabled destinations. For an overview of Data Collector changed data processing and a list of CRUD-enabled destinations, see Processing Changed Data.

Note: To use Oracle CDC Client, you must enable LogMiner for the database that you want to use and complete the necessary prerequisite tasks. The origin uses JDBC to access the database.

When you configure Oracle CDC Client, you configure change data capture details, such as the schema and tables to read from, the initial change to start from, the dictionary source location, and the operations to include.

You also specify JDBC connection information and user credentials. If the schema was created in a pluggable database, specify the pluggable database name. You can also configure custom properties that the driver requires.

To use a JDBC version older than 4.0, you can specify the driver class name and define a health check query.

LogMiner Dictionary Source

LogMiner provides dictionaries to help process redo logs. LogMiner can store dictionaries in several locations.

The Oracle CDC Client can use the following dictionary source locations:
  • Online catalog - Use the online catalog when table structures are not expected to change.
  • Redo logs - Use redo logs when table structures are expected to change. When reading the dictionary from redo logs, the Oracle CDC Client origin determines when schema changes occur and refreshes the schema that it uses to create records. The origin can also generate events for each DDL statement that it reads in the redo logs.
    Important: When using the dictionary in redo logs, make sure to extract the latest dictionary to the redo logs each time table structures change. For more information, see Task 4. Extract a LogMiner Dictionary (Redo Logs).

Note that using the dictionary in redo logs can result in significantly higher latency than using the dictionary in the online catalog. However, the online catalog cannot adapt to schema changes.

For more information about dictionary options and configuring LogMiner, see the Oracle LogMiner documentation.
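
For background, each dictionary source maps to a standard option of Oracle's LogMiner start call. The origin issues the equivalent calls internally, so you never run them yourself; the following sketch only illustrates the difference between the two options (other required parameters, such as the redo log files or SCN range, are omitted):

-- Online catalog: fast, but reflects only the current table structures.
EXECUTE DBMS_LOGMNR.START_LOGMNR(OPTIONS => DBMS_LOGMNR.DICT_FROM_ONLINE_CATALOG);

-- Dictionary previously extracted to the redo logs: higher latency, but
-- consistent with the table structures in effect at the time of each change.
EXECUTE DBMS_LOGMNR.START_LOGMNR(OPTIONS => DBMS_LOGMNR.DICT_FROM_REDO_LOGS);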

Oracle CDC Client Prerequisites

Before using the Oracle CDC Client origin, complete the following tasks:
  1. Enable LogMiner.
  2. Enable supplemental logging for the database or tables.
  3. Create a user account with the required roles and privileges.
  4. To use the dictionary in redo logs, extract the LogMiner dictionary.
  5. Install the Oracle JDBC driver.

Task 1. Enable LogMiner

LogMiner provides access to the redo logs, which record database activity. The origin uses these logs to generate records.

LogMiner requires an open database in ARCHIVELOG mode with archiving enabled. To determine the status of the database and enable LogMiner, use the following steps:
  1. Log into the database as a user with DBA privileges.
  2. Check the database logging mode:
    select log_mode from v$database;

    If the command returns ARCHIVELOG, you can skip to Task 2.

    If the command returns NOARCHIVELOG, continue with the following steps:

  3. Shut down the database:
    shutdown immediate;
  4. Start up and mount the database:
    startup mount;
  5. Enable archiving and open the database:
    alter database archivelog;
    alter database open;
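
  6. To verify, check the logging mode again:
    select log_mode from v$database;

    The command should now return ARCHIVELOG.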

Task 2. Enable Supplemental Logging

To retrieve data from redo logs, LogMiner requires supplemental logging for the database or tables.

Enable at least primary key or "identification key" logging at a table level for each table that you want to use. With identification key logging, records include only the primary key and changed fields.

Due to a known Oracle issue, to enable supplemental logging for a table, you must first enable minimal supplemental logging for the database.

To include all fields in the records the origin generates, enable full supplemental logging at a table or database level. Full supplemental logging provides data from all columns, those with unchanged data as well as the primary key and changed columns. For details on the data included in records based on the supplemental logging type, see Generated Records.
  1. To check whether supplemental logging is enabled for the database, run the following command:
    SELECT supplemental_log_data_min, supplemental_log_data_pk, supplemental_log_data_all FROM v$database;

    If the command returns Yes or Implicit for all three columns, then supplemental logging is enabled with both identification key and full supplemental logging. You can skip to Task 3.

    If the command returns Yes or Implicit for the first two columns, then supplemental logging is enabled with identification key logging. If this is what you want, you can skip to Task 3.

  2. Enable identification key or full supplemental logging. You do not need to enable both.
    For multitenant databases, best practice is to enable logging for the container that holds the tables, rather than for the entire database. To apply the changes to just that container, first run the following command:
    ALTER SESSION SET CONTAINER=<pdb>;
    To enable identification key logging
    You can enable identification key logging for individual tables or for all tables in the database:
    • For individual tables

      Use the following commands to enable minimal supplemental logging for the database, and then enable identification key logging for each table that you want to use:

      ALTER DATABASE ADD SUPPLEMENTAL LOG DATA;
      ALTER TABLE <schema name>.<table name> ADD SUPPLEMENTAL LOG DATA (PRIMARY KEY) COLUMNS;
    • For all tables

      Use the following command to enable identification key logging for the entire database:

      ALTER DATABASE ADD SUPPLEMENTAL LOG DATA (PRIMARY KEY) COLUMNS;
    To enable full supplemental logging
    You can enable full supplemental logging for individual tables or all tables in the database:
    • For individual tables

      Use the following commands to enable minimal supplemental logging for the database, and then enable full supplemental logging for each table that you want to use:
      ALTER DATABASE ADD SUPPLEMENTAL LOG DATA;
      ALTER TABLE <schema name>.<table name> ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;
    • For all tables

      Use the following command to enable full supplemental logging for the entire database:
      ALTER DATABASE ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;
  3. To apply the changes, switch the log file:
    ALTER SYSTEM SWITCH LOGFILE;
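
To verify the result, you can query the same v$database columns from step 1 and, for table-level logging, the standard DBA_LOG_GROUPS view. The schema and table names below are placeholders:

SELECT supplemental_log_data_min, supplemental_log_data_pk, supplemental_log_data_all FROM v$database;

-- LOG_GROUP_TYPE returns values such as 'PRIMARY KEY LOGGING'
-- or 'ALL COLUMN LOGGING' for each supplemental log group:
SELECT log_group_type FROM dba_log_groups
WHERE owner = '<schema name>' AND table_name = '<table name>';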

Task 3. Create a User Account

Create a user account to use with the Oracle CDC Client origin. You need the account to access the database through JDBC.

You create the account differently based on the Oracle configuration that you use:
Oracle 12c multitenant databases
For multitenant Oracle 12c databases, create a common user account. Common user accounts are created in cdb$root and must use the convention: c##<name>.
  1. Log into the database as a user with DBA privileges.
  2. Create the common user account:
    ALTER SESSION SET CONTAINER=cdb$root;
    CREATE USER <user name> IDENTIFIED BY <password> CONTAINER=all;
    GRANT create session, alter session, set container, select any dictionary, logmining, execute_catalog_role TO <user name> CONTAINER=all;
    ALTER SESSION SET CONTAINER=<pdb>;
    GRANT select on <db>.<table> TO <user name>;

    Repeat the final command for each table that you want to use.

When you configure the origin, use this user account for the JDBC credentials. Use the entire user name, including the "c##", as the JDBC user name.

Oracle 12c standard databases
For standard Oracle 12c databases, create a user account with the necessary privileges:
  1. Log into the database as a user with DBA privileges.
  2. Create the user account:
    CREATE USER <user name> IDENTIFIED BY <password>;
    GRANT create session, alter session, select any dictionary, logmining, execute_catalog_role TO <user name>;
    GRANT select on <db>.<table> TO <user name>;

    Repeat the final command for each table that you want to use.

When you configure the origin, use this user account for the JDBC credentials.
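
To double-check the grants for either account type, you can query Oracle's standard privilege views as a DBA. A minimal sketch, using the placeholder convention from the commands above (grantee names are stored in uppercase, so UPPER() normalizes the input):

SELECT privilege FROM dba_sys_privs WHERE grantee = UPPER('<user name>');
SELECT granted_role FROM dba_role_privs WHERE grantee = UPPER('<user name>');
SELECT owner, table_name, privilege FROM dba_tab_privs WHERE grantee = UPPER('<user name>');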

Task 4. Extract a LogMiner Dictionary (Redo Logs)

When using redo logs as the dictionary source, you must extract the LogMiner dictionary to the redo logs before you start the pipeline. Repeat this step periodically to ensure that the redo logs that contain the dictionary are still available.

Oracle recommends that you extract the dictionary only at off-peak hours since the extraction can consume database resources.

To extract the dictionary for Oracle 12c databases, run the following command:
EXECUTE DBMS_LOGMNR_D.BUILD(OPTIONS=> DBMS_LOGMNR_D.STORE_IN_REDO_LOGS);
To extract the dictionary for Oracle 12c multitenant databases, run the following commands:
ALTER SESSION SET CONTAINER=cdb$root;
EXECUTE DBMS_LOGMNR_D.BUILD(OPTIONS=> DBMS_LOGMNR_D.STORE_IN_REDO_LOGS);
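
To confirm that the extraction reached the archived redo logs, you can check the dictionary flags in the standard V$ARCHIVED_LOG view:

SELECT name, first_change#
FROM v$archived_log
WHERE dictionary_begin = 'YES' OR dictionary_end = 'YES'
ORDER BY first_change#;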

Task 5. Install the Driver

The Oracle CDC Client origin connects to Oracle through JDBC. You cannot access the database until you install the required driver.

Install the Oracle JDBC driver for the Oracle database version that you use. For more information, see Install External Libraries.

Initial Change

The initial change is the point in the LogMiner redo logs where you want to start processing. When you start the pipeline, Oracle CDC Client starts processing from the specified initial change and continues until you stop the pipeline.

Note that Oracle CDC Client processes only change capture data. If you need existing data, you might use a JDBC Query Consumer or a JDBC Multitable Consumer in a separate pipeline to read table data before you start an Oracle CDC Client pipeline.

Oracle CDC Client provides several ways to configure the initial change:
From the latest change
The origin processes all changes that occur after you start the pipeline.
From a specified datetime
The origin processes all changes that occurred at the specified datetime and later. Use the following format: DD-MM-YYYY HH24:MI:SS.
From a specified system change number (SCN)
The origin processes all changes that occurred at the specified SCN and later. Typically, a database administrator can provide the SCN to use.
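
When using an SCN-based initial change, the current SCN, or an approximate SCN for a point in time, can be retrieved with standard Oracle queries, for example:

-- Current SCN of the database:
SELECT current_scn FROM v$database;

-- Approximate SCN for a point in time (limited by how far back
-- the database retains its SCN-to-time mapping):
SELECT timestamp_to_scn(TO_TIMESTAMP('21-03-2017 10:00:00', 'DD-MM-YYYY HH24:MI:SS')) FROM dual;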

Example

You want to process all existing data in the Orders table and then capture changed data, writing all data to Amazon S3. To read the existing data, you use a pipeline with the JDBC Query Consumer origin and the Amazon S3 destination.

Once all existing data is read, you stop the JDBC Query Consumer pipeline and start the Oracle CDC Client pipeline. You configure the Oracle CDC Client pipeline to pick up changes that occur after you start the pipeline, but to prevent any chance of data loss, you could instead configure the initial change to an exact datetime or an earlier SCN.

Generated Records

The origin generates records differently based on the operation type and the logging enabled for the database and tables:
  • INSERT - With identification/primary key logging only: all fields that contain data, ignoring fields with null values. With full supplemental logging: all fields.
  • UPDATE - With identification/primary key logging only: the primary key field and fields with updated values. With full supplemental logging: all fields.
  • SELECT_FOR_UPDATE - With identification/primary key logging only: the primary key field and fields with updated values. With full supplemental logging: all fields.
  • DELETE - With identification/primary key logging only: the primary key field. With full supplemental logging: all fields.

CRUD Operation Header Attributes

When generating records, the Oracle CDC Client origin specifies the operation type in both of the following record header attributes:
sdc.operation.type
The Oracle CDC Client evaluates the operation type associated with each entry that it processes and, when appropriate, writes the operation type to the sdc.operation.type record header attribute.
The origin uses the following values in the sdc.operation.type record header attribute to represent the operation type:
  • 1 for INSERT
  • 2 for DELETE
  • 3 for UPDATE and SELECT_FOR_UPDATE

If you use a CRUD-enabled destination in the pipeline such as JDBC Producer or Kudu, the destination can use the operation type when writing to destination systems. When necessary, you can use an Expression Evaluator or scripting processors to manipulate the value in the sdc.operation.type header attribute. For an overview of Data Collector changed data processing and a list of CRUD-enabled destinations, see Processing Changed Data.

When using CRUD-enabled destinations, the destination looks for the operation type in the sdc.operation.type attribute before checking the oracle.cdc.operation attribute.
oracle.cdc.operation
The Oracle CDC Client also writes the CRUD operation type to the oracle.cdc.operation record header attribute. This attribute was implemented in an earlier release and is supported for backward compatibility.
The origin writes the operation type to the oracle.cdc.operation attribute as one of the following strings:
  • INSERT
  • UPDATE
  • SELECT_FOR_UPDATE
  • DELETE
CRUD-enabled destinations check for this attribute for the operation type after checking the sdc.operation.type attribute.

CDC Header Attributes

The Oracle CDC Client origin provides the following CDC record header attributes for each record:
  • oracle.cdc.user
  • oracle.cdc.operation
  • oracle.cdc.timestamp
  • oracle.cdc.table
  • oracle.cdc.scn
The origin also includes the following record header attributes for each decimal field in the record:
  • oracle.cdc.<fieldname>.precision
  • oracle.cdc.<fieldname>.scale

Event Generation

The Oracle CDC Client origin can generate events that you can use in an event stream when the origin uses redo logs as the dictionary source. The origin does not generate events when using the online catalog as the dictionary source.

When you use redo logs as the dictionary source and enable event generation, the Oracle CDC Client generates events when it reads DDL statements. It generates events for ALTER, CREATE, DROP, and TRUNCATE statements.

When you start the pipeline, the origin queries the database and caches the schemas for all tables listed in the origin, and then generates an initial event record for each table. Each event record describes the current schema of its table. The origin uses the cached schemas to generate records when processing data-related redo log entries.

The origin then generates an event record for each DDL statement it encounters in the redo logs. Each event record includes the DDL statement and the related table in record header attributes.

The origin includes table schema information in the event record for new and updated tables. When the origin encounters an ALTER or CREATE statement, it queries the database for the latest schema for the table.

If the ALTER statement is an update to a cached schema, the origin updates the cache and includes the updated table schema in the event record. If the ALTER statement is older than the cached schema, the origin does not include the table schema in the event record. Similarly, if the CREATE statement is for a "new" table, the origin caches the new table and includes the table schema in the event record. Because the origin verifies that all specified tables exist when the pipeline starts, this can occur only when the table is dropped and created after the pipeline starts. If the CREATE statement is for a table that is already cached, the origin does not include the table schema in the event record.

Oracle CDC Client events can be used in any logical way.
For more information about dataflow triggers and the event framework, see Dataflow Triggers Overview.

Event Records

Event records generated by the Oracle CDC Client origin have the following event-related record header attributes. Record header attributes are stored as strings:
  • sdc.event.type - Event type. Uses one of the following types:
    • STARTUP
    • ALTER
    • CREATE
    • DROP
    • TRUNCATE
  • sdc.event.version - Integer that indicates the version of the event record type.
  • sdc.event.creation_timestamp - Epoch timestamp when the stage created the event.
  • oracle.cdc.table - Name of the Oracle database table that changed.
  • oracle.cdc.ddl - DDL statement that triggered the event.

STARTUP event records are generated when the pipeline starts. The origin creates one event record for each table. The event record includes the current table schema.

DROP and TRUNCATE event records include just the record header attributes listed above.

CREATE event records include the schema for the new table when the table has been dropped and recreated. ALTER event records include the table schema when the statement updates a cached schema. For more information about the behavior for CREATE and ALTER statements, see Event Generation.

For example, consider an ALTER event record for an updated schema that contains three fields: NICK, ID, and NAME. The record body includes the three fields, and the record header attributes include the DDL statement that added the NICK field, the name of the updated table, and the ALTER event type.
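
As a concrete illustration (the schema, table, and column definition here are invented), a DDL statement like the following, read from the redo logs, would produce that event, with the statement text in the oracle.cdc.ddl header attribute and the table name in oracle.cdc.table:

-- Hypothetical DDL that adds the NICK field to an existing table
-- that already contains ID and NAME columns:
ALTER TABLE <schema name>.<table name> ADD (NICK VARCHAR2(30));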

Working with the Drift Synchronization Solution

If you use the Oracle CDC Client origin as part of a Drift Synchronization Solution for Hive pipeline, make sure to pass only records flagged for Insert to the Hive Metadata processor.

The Drift Synchronization Solution for Hive enables automatic updates of Hive tables based on incoming data. The Hive Metadata processor, the first stage in the solution, expects Insert records only. Below are some recommended methods to ensure that the processor receives only Insert records:
  • Configure the Oracle CDC Client to process only Insert records.
  • If you want to process additional record types in the pipeline, use a Stream Selector to route only Insert records to the Hive Metadata processor.

Data Preview with Oracle CDC Client

When using data preview with the Oracle CDC Client origin, you might need to increase the Preview Timeout data preview property.

By default, data preview waits 10,000 milliseconds (10 seconds) to establish a connection to the origin system before timing out. Due to the complex nature of this origin, the initial startup can take longer than the default.

If data preview times out, try increasing the timeout property to 120,000 milliseconds. If preview continues to time out, increase the timeout incrementally.

Configuring an Oracle CDC Client

Configure an Oracle CDC Client origin to process LogMiner change data capture information from an Oracle database.

Before you use the origin, complete the prerequisite tasks. For more information, see Oracle CDC Client Prerequisites.

  1. In the Properties panel, on the General tab, configure the following properties:
    • Name - Stage name.
    • Description - Optional description.
    • Produce Events - Generates event records when the origin reads DDL statements. Available only when the origin uses redo logs as the dictionary source. Use for event handling.
    • On Record Error - Error record handling for the stage:
      • Discard - Discards the record.
      • Send to Error - Sends the record to the pipeline for error handling.
      • Stop Pipeline - Stops the pipeline.
  2. On the Oracle CDC tab, configure the following change data capture properties:
    • Schema Name - Name of the schema to use.
    • Tables - Tables to use.
    • Case-Sensitive Names - Enables case-sensitive schema, table, and column names. Use when the schema, tables, or columns were created with quotation marks surrounding the names. By default, the origin uses all caps for the names.
    • Initial Change - The starting point for the read. Use one of the following options:
      • From Latest Change - Processes changes that arrive after you start the pipeline.
      • From Date - Processes changes starting from the specified date.
      • From SCN - Processes changes starting from the specified system change number (SCN).
    • Start Date - The datetime to read from when you start the pipeline. For a date-based initial change. Use the following format: DD-MM-YYYY HH24:MI:SS.
    • Start SCN - The system change number to read from when you start the pipeline. For an SCN-based initial change.
    • Operations - Operations to include when creating records. All unlisted operations are ignored.
    • Dictionary Source - Location of the LogMiner dictionary:
      • Redo logs - Use when schemas can change. Allows the origin to adapt to schema changes and to generate events for DDL statements.
      • Online catalog - Use for better performance when schemas are not expected to change.
  3. On the JDBC tab, configure the following JDBC properties:
    • JDBC Connection String - Connection string to use to connect to the database; see the example formats at the end of this step.
      Note: If you include the JDBC credentials in the connection string, use the user account created for the origin. Common user accounts for Oracle 12c multitenant databases start with "c##". For more information, see Task 3. Create a User Account.
    • Max Batch Size (records) - Maximum number of records processed at one time. Honors values up to the Data Collector maximum batch size. Default is 1000. The Data Collector default for the maximum batch size is also 1000.
    • PDB - Name of the pluggable database that contains the schema you want to use. Required when the schema was created in a pluggable database; otherwise, leave blank.
    • Use Credentials - Enables entering credentials on the Credentials tab. Use when you do not include credentials in the JDBC connection string.
    • Additional JDBC Configuration Properties - Additional JDBC configuration properties to use. To add properties, click Add and define the JDBC property name and value. Use the property names and values as expected by JDBC.
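
    For reference, Oracle JDBC connection strings commonly use the thin driver formats below; the host, port, and database identifiers are placeholders:

      jdbc:oracle:thin:@<host>:<port>:<SID>
      jdbc:oracle:thin:@<host>:<port>/<service name>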

  4. To enter JDBC credentials separately from the JDBC connection string, on the Credentials tab, configure the following properties:
    • Username - User name for the JDBC connection. Use the user account created for the origin. Common user accounts for Oracle 12c multitenant databases start with "c##". For more information, see Task 3. Create a User Account.
    • Password - Password for the account.
    Tip: To secure sensitive information such as usernames and passwords, you can use runtime resources or Hashicorp Vault secrets. For more information, see Using Runtime Resources or Accessing Hashicorp Vault Secrets.
  5. When using JDBC versions older than 4.0, on the Legacy Drivers tab, optionally configure the following properties:
    • JDBC Class Driver Name - Class name for the JDBC driver. Required for JDBC versions older than version 4.0.
    • Connection Health Test Query - Optional query to test the health of a connection. Recommended only when the JDBC version is older than 4.0.
  6. On the Advanced tab, optionally configure advanced properties.
    The defaults for these properties should work in most cases:
    • Maximum Pool Size - Maximum number of connections to create. Default is 1. The recommended value is 1.
    • Minimum Idle Connections - Minimum number of connections to create and maintain. To define a fixed connection pool, set to the same value as Maximum Pool Size. Default is 1.
    • Connection Timeout - Maximum time to wait for a connection. Use a time constant in an expression to define the time increment. Default is 30 seconds, defined as follows: ${30 * SECONDS}
    • Idle Timeout - Maximum time to allow a connection to idle. Use a time constant in an expression to define the time increment. Use 0 to avoid removing any idle connections. Default is 30 minutes, defined as follows: ${30 * MINUTES}
    • Max Connection Lifetime - Maximum lifetime for a connection. Use a time constant in an expression to define the time increment. Use 0 to set no maximum lifetime. Default is 30 seconds, defined as follows: ${30 * SECONDS}
    • Enforce Read-only Connection - Creates read-only connections to avoid any type of write. Default is enabled. Disabling this property is not recommended.
    • Transaction Isolation - Transaction isolation level used to connect to the database. Default is the default transaction isolation level set for the database. You can override the database default by setting the level to one of the following:
      • Read committed
      • Read uncommitted
      • Repeatable read
      • Serializable