Oracle CDC Client
Supported pipeline types:
- Data Collector
The origin supports the following Oracle database versions:
- Oracle 11g, 12c, 18c, and 19c
- Oracle Real Application Clusters (RAC) 12c, 18c, and 19c
The origin does not support other engineered systems built on top of Oracle databases, such as Oracle Exadata.
You might use this origin to perform database replication. You can use a separate pipeline with the JDBC Query Consumer or JDBC Multitable Consumer origin to read existing data. Then start a pipeline with the Oracle CDC Client origin to process subsequent changes.
Oracle CDC Client processes data based on the commit number, in ascending order.
The origin includes CDC and CRUD information in record header attributes so generated records can be easily processed by CRUD-enabled destinations. For an overview of Data Collector changed data processing and a list of CRUD-enabled destinations, see Processing Changed Data.
When you configure Oracle CDC Client, you configure change data capture details, such as the schema and tables to read from, how to read the initial change, the dictionary source location, and the operations to include. You also specify the transaction window and LogMiner session windows to use.
You can configure the origin to buffer records locally or to use database buffers. Before using local buffers, verify that the required resources are available and specify the action to take for uncommitted transactions.
You can specify the behavior when the origin encounters an unsupported data type, and you can configure the origin to pass null values when it receives them from supplemental logging data. When the source database has high-precision timestamps, you can configure the origin to write string values rather than datetime values to maintain the precision.
You also specify JDBC connection information and user credentials. If the schema was created in a pluggable database, specify the pluggable database name. You can also configure custom properties that the driver requires.
You can configure advanced connection properties. To use a JDBC version older than 4.0, you specify the driver class name and define a health check query.
The origin can generate events for an event stream. For more information about dataflow triggers and the event framework, see Dataflow Triggers Overview.
If you are concerned about performance, such as when processing very wide tables, you can consider several alternatives to the default origin behavior. You can use the alternate PEG parser or use multiple threads for parsing. Or, you can configure the origin to skip parsing SQL query statements and pass the queries to the SQL Parser processor to be parsed.
For a list of supported operations and a description of generated records, see Parse SQL, Supported Operations, and Generated Records.
LogMiner Dictionary Source
LogMiner provides dictionaries to help process redo logs. LogMiner can store dictionaries in several locations.
- Online catalog - Use the online catalog when table structures are not expected to change.
- Redo logs - Use redo logs when table structures are expected to change. When reading the dictionary from redo logs, the Oracle CDC Client origin determines when schema changes occur and refreshes the schema that it uses to create records. The origin can also generate events for each DDL statement it reads in the redo logs.
Important: When using the dictionary in redo logs, make sure to extract the latest dictionary to the redo logs each time table structures change. For more information, see Task 4. Extract a LogMiner Dictionary (Redo Logs).
Note that using the dictionary in redo logs can have significantly higher latency than using the dictionary in the online catalog. But using the online catalog does not allow for schema changes.
For more information about dictionary options and configuring LogMiner, see the Oracle LogMiner documentation.
Oracle CDC Client Prerequisites
- Enable LogMiner.
- Enable supplemental logging for the database or tables.
- Create a user account with the required roles and privileges.
- To use the dictionary in redo logs, extract the LogMiner dictionary.
- Install the Oracle JDBC driver.
Task 1. Enable LogMiner
LogMiner reads the redo logs that record database activity. The origin uses LogMiner to process the redo logs and generate records.
LogMiner requires that the database be open, writable, and in ARCHIVELOG mode with archiving enabled.
- Oracle databases
- Use the following steps to determine the status of the database and enable LogMiner:
- In a SQL shell, log into the database as a user with DBA privileges and run the following commands.
- Check the database logging mode:
select log_mode from v$database;
If the command returns ARCHIVELOG, you can skip to Task 2.
If the command returns NOARCHIVELOG, continue with the following steps:
- Shut down the database:
shutdown immediate;
- Start up and mount the database:
startup mount;
- Enable archiving, open the database, and make it writable:
alter database archivelog;
alter database open read write;
- Oracle RAC databases
- Use the following steps to enable LogMiner and verify the status of the database:
- In a Terminal session on one of the database nodes, stop and mount the database:
srvctl stop database -d <database name>
srvctl start database -d <database name> -o mount
- In a SQL shell, log into the database as a user with DBA privileges and configure the database:
alter database archivelog;
- In the Terminal session, restart the database:
srvctl stop database -d <database name>
srvctl start database -d <database name>
- In the SQL shell, verify that the database is now in ARCHIVELOG mode:
select log_mode from v$database;
Task 2. Enable Supplemental Logging
To retrieve data from redo logs, LogMiner requires supplemental logging for the database or tables.
Enable at least primary key or "identification key" logging at a table level for each table that you want to use. With identification key logging, records include only the primary key and changed fields.
Due to an Oracle known issue, to enable supplemental logging for a table, you must first enable minimum supplemental logging for the database.
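For example, the following statement (standard Oracle syntax, shown here as a sketch; run it as a user with the required privileges) enables minimum supplemental logging for the database:
ALTER DATABASE ADD SUPPLEMENTAL LOG DATA;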
- To verify if supplemental logging is enabled for the database, run the following command in a SQL shell:
SELECT supplemental_log_data_min, supplemental_log_data_pk, supplemental_log_data_all FROM v$database;
If the command returns Yes or Implicit for all three columns, then supplemental logging is enabled with both identification key and full supplemental logging. You can skip to Task 3.
If the command returns Yes or Implicit for the first two columns, then supplemental logging is enabled with identification key logging. If this is what you want, you can skip to Task 3.
- Enable identification key or full supplemental logging to retrieve data from redo logs. You do not need to enable both.
For Oracle and Oracle RAC 12c, 18c, or 19c multitenant databases, best practice is to enable logging for the container that holds the tables, rather than for the entire database. You can use the following command in a SQL shell to first apply the changes to just the container:
ALTER SESSION SET CONTAINER=<pdb>;
- To enable identification key logging
- You can enable identification key logging for individual tables or for all tables in the database. Example statements appear after these steps.
- To enable full supplemental logging
- You can enable full supplemental logging for individual tables or for all tables in the database. Example statements appear after these steps.
- To submit the changes:
ALTER SYSTEM ARCHIVE LOG CURRENT;
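The exact statements depend on your tables and logging choice; as a minimal sketch using standard Oracle syntax with placeholder schema and table names, identification key (primary key) logging can be enabled for a single table or for all tables in the database with:
ALTER TABLE <schema name>.<table name> ADD SUPPLEMENTAL LOG DATA (PRIMARY KEY) COLUMNS;
ALTER DATABASE ADD SUPPLEMENTAL LOG DATA (PRIMARY KEY) COLUMNS;
Full supplemental logging can be enabled in the same way:
ALTER TABLE <schema name>.<table name> ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;
ALTER DATABASE ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;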
Task 3. Create a User Account
Create a user account to use with the Oracle CDC Client origin. You need the account to access the database through JDBC.
- 12c, 18c, or 19c multitenant databases
- For multitenant Oracle and Oracle RAC 12c, 18c, and 19c databases, create a common user account. Common user accounts are created in cdb$root and must use the naming convention c##<name>. A sketch of the statements involved follows.
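The exact privileges depend on your database version; as a minimal sketch (the user name c##sdc_user and the CONTAINER=all clauses are illustrative assumptions, modeled on the grants listed for standard databases below), a common user might be created as follows:
ALTER SESSION SET CONTAINER=cdb$root;
CREATE USER c##sdc_user IDENTIFIED BY <password> CONTAINER=all;
GRANT create session, alter session, set container, logmining, execute_catalog_role TO c##sdc_user CONTAINER=all;
GRANT select on GV_$DATABASE to c##sdc_user CONTAINER=all;
GRANT select on V_$LOGMNR_CONTENTS to c##sdc_user CONTAINER=all;
GRANT select on V_$ARCHIVED_LOG to c##sdc_user CONTAINER=all;
GRANT select on <db>.<table> TO c##sdc_user CONTAINER=all;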
- 12c, 18c, or 19c standard databases
- For standard Oracle and Oracle RAC 12c, 18c, and 19c databases, create a user account with the necessary privileges:
- In a SQL shell, log into the database as a user with DBA privileges.
- Create the user account.
Use the following set of commands for 12c and 18c:
CREATE USER <user name> IDENTIFIED BY <password>;
GRANT create session, alter session, logmining, execute_catalog_role TO <user name>;
GRANT select on GV_$DATABASE to <user name>;
GRANT select on V_$LOGMNR_CONTENTS to <user name>;
GRANT select on V_$ARCHIVED_LOG to <user name>;
GRANT select on <db>.<table> TO <user name>;
Repeat the final command for each table that you want to use.
Use the following set of commands for 19c:
CREATE USER <user name> IDENTIFIED BY <password>;
GRANT create session, alter session, logmining, execute_catalog_role TO <user name>;
GRANT select on GV_$DATABASE to <user name>;
GRANT select on V_$LOGMNR_CONTENTS to <user name>;
GRANT select on V_$ARCHIVED_LOG to <user name>;
GRANT select on V_$LOG to <user name>;
GRANT select on V_$LOGFILE to <user name>;
GRANT select on V_$LOGMNR_LOGS to <user name>;
GRANT select on <db>.<table> TO <user name>;
Repeat the final command for each table that you want to use.
- 11g databases
- For Oracle 11g databases, create a user account with the necessary privileges:
- In a SQL shell, log into the database as a user with DBA privileges.
- Create the user account using the following commands:
CREATE USER <user name> IDENTIFIED BY <password>;
GRANT create session, alter session, execute_catalog_role, select any transaction, select any table to <user name>;
GRANT select on GV_$DATABASE to <user name>;
GRANT select on V_$ARCHIVED_LOG to <user name>;
GRANT select on V_$LOGMNR_CONTENTS to <user name>;
GRANT select on <db>.<table> TO <user name>;
Repeat the final command for each table that you want to use.
Task 4. Extract a LogMiner Dictionary (Redo Logs)
When using redo logs as the dictionary source, you must extract the LogMiner dictionary to the redo logs before you start the pipeline. Repeat this step periodically to ensure that the redo logs that contain the dictionary are still available.
Oracle recommends that you extract the dictionary only at off-peak hours since the extraction can consume database resources.
To extract the dictionary for standard databases, run the following command:
EXECUTE DBMS_LOGMNR_D.BUILD(OPTIONS=> DBMS_LOGMNR_D.STORE_IN_REDO_LOGS);
To extract the dictionary for multitenant databases, switch to the root container and then run the build command:
ALTER SESSION SET CONTAINER=cdb$root;
EXECUTE DBMS_LOGMNR_D.BUILD(OPTIONS=> DBMS_LOGMNR_D.STORE_IN_REDO_LOGS);
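To confirm that a dictionary build has been captured in the archived logs, you can query the dictionary flags in v$archived_log. This query is offered as an illustrative check, not a required step:
SELECT name, dictionary_begin, dictionary_end FROM v$archived_log WHERE dictionary_begin = 'YES';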
Task 5. Install the Driver
The Oracle CDC Client origin connects to Oracle through JDBC. You cannot access the database until you install the required driver.
Install the Oracle JDBC driver for the Oracle database version that you use. For information about installing additional drivers, see Install External Libraries in the Data Collector documentation.
Table Configuration
When you configure the Oracle CDC Client origin, you specify the tables with the change capture data that you want to process.
A table configuration defines a group of tables that share a table name pattern and that belong to one or more schemas with the same name pattern.
You can define one or more table configurations.
Schema, Table Name, and Exclusion Patterns
To specify the tables, you define the schema, a table name pattern, and an optional table exclusion pattern.
When defining the schema and table name pattern, you can use SQL LIKE syntax to define a set of tables within a schema or across multiple schemas. For more information about valid patterns for the SQL LIKE syntax, see the Microsoft documentation.
When needed, you can also use a regular expression as an exclusion pattern to exclude a subset of tables from the larger set.
For example, say you want to process all tables in the sales schema whose names start with SALES, while excluding those that end with a dash (-) and a single-character suffix. You can use the following configuration to specify the tables to process:
- Schema: sales
- Table Name Pattern: SALES%
- Exclusion Pattern: SALES.*-.
Initial Change
The initial change is the point in the LogMiner redo logs where you want to start processing. When you start the pipeline for the first time, the origin starts processing from the specified initial change. The origin only uses the specified initial change again when you reset the origin.
Note that Oracle CDC Client processes only change capture data. If you need existing data, you might use a JDBC Query Consumer or a JDBC Multitable Consumer in a separate pipeline to read table data before you start an Oracle CDC Client pipeline.
- From the latest change
- The origin processes all changes that occur after you start the pipeline.
- From a specified datetime
- The origin processes all changes that occurred at the specified datetime and later. Use the following format: DD-MM-YYYY HH24:MI:SS.
- From a specified system change number (SCN)
- The origin processes all changes that occurred in the specified SCN and later. When using the specified SCN, the origin starts processing with the timestamp associated with the SCN. If the SCN cannot be found in the redo logs, the origin continues reading from the next higher SCN that is available in the redo logs.
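If you plan to start from a specified SCN, you can look up the current SCN with a standard Oracle query such as the following, shown here as an illustrative example:
SELECT CURRENT_SCN FROM V$DATABASE;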
Example
You want to process all existing data in the Orders table and then capture changed data, writing all data to Amazon S3. To read the existing data, you use a pipeline with the JDBC Query Consumer and Amazon S3 destination as follows:
Once all existing data is read, you stop the JDBC Query Consumer pipeline and start the following Oracle CDC Client pipeline. This pipeline is configured to pick up changes that occur after you start the pipeline, but if you wanted to prevent any chance of data loss, you could configure the initial change for an exact datetime or earlier SCN:
Choosing Buffers
When processing data, the Oracle CDC Client can buffer data locally on the Data Collector machine or use Oracle LogMiner buffers:
- Local buffers
- When using local buffers, the origin requests the transactions for the relevant tables and time period. The origin buffers the resulting LogMiner redo SQL statements until it verifies a commit for a transaction. After seeing a commit, it parses and processes the committed data.
The origin can buffer the redo SQL statements completely in memory or write them primarily to disk while using a small amount of memory for tracking.
By default, the origin uses local buffers. In general, using local buffers provides better performance than using Oracle LogMiner buffers.
Use local buffers to process large transactions or to avoid monopolizing the Oracle PGA. Buffer information in memory for better performance when Data Collector resources allow. Buffer information to disk to avoid monopolizing Data Collector resources.
- Oracle LogMiner buffers
- When using Oracle LogMiner buffers, the origin requests data from Oracle LogMiner for a particular time period. LogMiner then buffers all transactions for that time period for all tables in the database, rather than only the tables needed by the origin.
Local Buffer Resource Requirements
Before using local buffers, you should verify that the allocated resources are sufficient for the needs of the pipeline.
- In memory
- When buffering in memory, the origin buffers the LogMiner redo SQL statements returned by Oracle. It processes the data after receiving a commit for the statement.
- To disk
- When buffering to disk, the origin stores only the statement ID for each SQL query in memory. Then it saves the queries to disk.
For information about configuring the Data Collector heap size, see Java Heap Size in the Data Collector documentation.
Uncommitted Transaction Handling
You can configure how the Oracle CDC Client origin handles old uncommitted transactions when the origin uses local buffers.
When using local buffers, the Oracle CDC Client origin clears the buffers of old uncommitted transactions after each LogMiner session. Old uncommitted transactions are those that are older than the time specified for the Maximum Transaction Length property.
By default, the origin generates error records for the old uncommitted transactions. The origin converts each LogMiner redo SQL statement to a record and passes the record to the stage for error handling.
If you don't need error records for old uncommitted transactions, you can configure the origin to discard uncommitted transactions. This also reduces the overhead used to generate the error records.
For example, say the Maximum Transaction Length and the LogMiner Session Window properties are both set to one hour, and a transaction arrives in the last ten minutes of a LogMiner session. If the transaction remains uncommitted when the LogMiner session closes and the next session begins, the Oracle CDC origin retains the transaction for the next session. This is because the transaction is only ten minutes old and the maximum transaction length is an hour.
Now say the transaction is still uncommitted an hour later, when the next LogMiner session closes. Then, the transaction is considered an old uncommitted transaction because it is older than the maximum transaction length. By default, the origin generates an error record for the transaction. However, if you configure the origin to discard old uncommitted transactions, the origin simply discards the transaction instead.
To discard old uncommitted transactions, select the Discard Old Uncommitted Transactions property. This property is only available when you buffer changes locally.
Include Nulls
When the Oracle LogMiner performs full supplemental logging, the resulting data includes all columns in the table with null values where no changes occurred. When the Oracle CDC Client processes this data, by default, it ignores null values when generating records.
You can configure the origin to include the null values in the record. You might need to include the null values when the destination system has required fields. To include null values, enable the Include Nulls property on the Oracle CDC tab.
Unsupported Data Types
You can configure how the origin handles records that contain unsupported data types. The origin can perform the following actions:
- Pass the record to the pipeline without the unsupported data types.
- Pass the record to error without the unsupported data types.
- Discard the record.
You can configure the origin to include the unsupported data types in the record. When you include unsupported types, the origin includes the field names and passes the data as unparsed strings, when possible. The origin does not support the following data types:
- Array
- Blob
- Clob
- Datalink
- Distinct
- Interval
- Java Object
- Nclob
- Other
- Ref
- Ref Cursor
- SQLXML
- Struct
- Time with Timezone
Conditional Data Type Support
- The Oracle Raw data type is treated as a Data Collector Byte Array.
- The Oracle Timestamp with Timezone data type is converted to the Data Collector Zoned Datetime data type. To maximize efficiency while providing more precision, the origin includes only the UTC offset with the data. It omits the time zone ID.
Parse SQL, Supported Operations, and Generated Records
- Parsing SQL Queries
- When the Oracle CDC Client origin parses SQL queries, it can create records for the following operations:
- INSERT
- DELETE
- UPDATE
- Not parsing SQL queries
- When the Oracle CDC Client origin does not parse the SQL query, it writes each LogMiner SQL statement to a field named sql. The origin also generates field attributes that provide additional information about each field. You might write LogMiner SQL statements to generated records to pass the queries to the SQL Parser processor to be parsed, or to archive that information.
When not parsing the SQL query, the origin can create records for the following operations:
- INSERT
- DELETE
- UPDATE
- SELECT_FOR_UPDATE
CRUD Operation Header Attributes
- sdc.operation.type
- The Oracle CDC Client evaluates the Oplog operation type associated with each entry that it processes and, when appropriate, writes the operation type to the sdc.operation.type record header attribute.
record header attribute. - oracle.cdc.operation
- The Oracle CDC Client also writes the Oplog CRUD operation type to the oracle.cdc.operation record header attribute. This attribute was implemented in an earlier release and is supported for backward compatibility.
CDC Header Attributes
- oracle.cdc.operation
- oracle.cdc.query
- oracle.cdc.rowId
- oracle.cdc.scn
- oracle.cdc.timestamp
- oracle.cdc.table
- oracle.cdc.user
- jdbc.<fieldname>.precision
- jdbc.<fieldname>.scale
You can use the record:attribute or record:attributeOrDefault functions to access the information in the attributes. For more information about working with record header attributes, see Working with Header Attributes.
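For example, the following expression (a hypothetical Stream Selector condition; the ORDERS table name is an illustrative assumption) routes records that originate from a specific table:
${record:attribute('oracle.cdc.table') == 'ORDERS'}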
Field Attributes
The Oracle CDC Client origin generates field attributes for columns converted to the Decimal or Datetime data types in Data Collector. The attributes provide additional information about each field.
- The Oracle Number data type is converted to the Data Collector Decimal data type, which does not store scale and precision.
- The Oracle Timestamp data type is converted to the Data Collector Datetime data type, which does not store nanoseconds.
Data Collector Data Type | Generated Field Attribute | Description |
---|---|---|
Decimal | precision | Provides the original precision for every number column. |
Decimal | scale | Provides the original scale for every number column. |
Datetime | nanoSeconds | Provides the original nanoseconds for every timestamp column. |
You can use the record:fieldAttribute or record:fieldAttributeOrDefault functions to access the information in the attributes. For more information about working with field attributes, see Field Attributes.
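For example, the following expression (the /PRICE field name and the default value are illustrative assumptions) returns the original precision of a Decimal field, falling back to 10 when the attribute is not present:
${record:fieldAttributeOrDefault('/PRICE', 'precision', '10')}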
Event Generation
The Oracle CDC Client origin can generate events that you can use in an event stream when the origin uses redo logs as the dictionary source. The origin does not generate events when using the online catalog as the dictionary source.
When you use redo logs as the dictionary source and enable event generation, the Oracle CDC Client generates events when it reads DDL statements. It generates events for ALTER, CREATE, DROP, and TRUNCATE statements.
When you start the pipeline, the origin queries the database and caches the schemas for all tables listed in the origin, and then generates an initial event record for each table. Each event record describes the current schema for each table. The origin uses the cached schemas to generate records when processing data-related redo log entries.
The origin then generates an event record for each DDL statement it encounters in the redo logs. Each event record includes the DDL statement and the related table in record header attributes.
The origin includes table schema information in the event record for new and updated tables. When the origin encounters an ALTER or CREATE statement, it queries the database for the latest schema for the table.
If the ALTER statement is an update to a cached schema, the origin updates the cache and includes the updated table schema in the event record. If the ALTER statement is older than the cached schema, the origin does not include the table schema in the event record. Similarly, if the CREATE statement is for a "new" table, the origin caches the new table and includes the table schema in the event record. Because the origin verifies that all specified tables exist when the pipeline starts, this can occur only when the table is dropped and created after the pipeline starts. If the CREATE statement is for a table that is already cached, the origin does not include the table schema in the event record.
You can use the generated events in any logical way. For example:
- With the Email executor to send a custom email after receiving an event. For an example, see Sending Email During Pipeline Processing.
- With a destination to store event information. For an example, see Preserving an Audit Trail of Events.
For more information about dataflow triggers and the event framework, see Dataflow Triggers Overview.
Event Records
Event records generated by the Oracle CDC Client origin include the following event-related record header attributes:
Record Header Attribute | Description |
---|---|
sdc.event.type | Event type. Uses one of the following types: CREATE, ALTER, DROP, or TRUNCATE for DDL statements read from the redo logs, or STARTUP for the initial event record generated for each table when the pipeline starts. |
sdc.event.version | Integer that indicates the version of the event record type. |
sdc.event.creation_timestamp | Epoch timestamp when the stage created the event. |
oracle.cdc.table | Name of the Oracle database table that changed. |
oracle.cdc.ddl | The DDL statement that triggered the event. |
DROP and TRUNCATE event records include just the record header attributes listed above.
CREATE event records include the schema for the new table when the table has been dropped and recreated. ALTER event records include the table schema when the statement updates a cached schema. For more information about the behavior for CREATE and ALTER statements, see Event Generation.
For example, the following ALTER event record displays the three fields in an updated schema - NICK, ID, and NAME:
In the list of record header attributes, notice the DDL statement that added the NICK field, the name of the updated table, and the ALTER event type.
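For example, in the event stream you might use a condition like the following (an illustrative Stream Selector sketch) to act only on ALTER events:
${record:attribute('sdc.event.type') == 'ALTER'}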
PEG Parser (Beta)
The Oracle CDC Client origin provides a PEG parser as an alternative to the default parser. You can try the PEG parser when performance with the default parser is inadequate, such as when processing very wide tables. Note that the PEG parser is in beta at this time and should be carefully tested before use in production.
For more information about PEG processing, see https://en.wikipedia.org/wiki/Parsing_expression_grammar.
To use the PEG parser, enable the Parse SQL Query property on the Oracle CDC tab, and then enable the Use PEG Parser property on the Advanced tab.
Multithreaded Parsing
When you configure the origin to use local buffering and to parse the SQL query, you can configure the Oracle CDC Client origin to use multiple threads to parse transactions. You can use multithreaded parsing with both the default Oracle CDC Client parser and the alternate PEG parser.
When performing multithreaded parsing, the origin uses multiple threads to generate records from committed SQL statements in a transaction. It does not perform multithreaded processing of the resulting records.
The Oracle CDC Client origin uses multiple threads for parsing based on the Parse Thread Pool Size property. When you start the pipeline, the origin creates the number of threads specified in the property. The origin connects to Oracle, creates a LogMiner session, and processes a single transaction at a time.
When the origin processes a transaction, it reads and buffers all SQL statements in the transaction to an in-memory queue and waits for statements to be committed before processing them. Once committed, the SQL statements are parsed using all available threads and the original order of the SQL statements is retained.
The resulting records are passed to the rest of the pipeline. Note that enabling multithreaded parsing does not enable multithreaded processing – the pipeline uses a single thread for reading data.
Working with the SQL Parser Processor
For very wide tables, those with hundreds of columns, reading the redo logs and parsing the SQL query using the Oracle CDC Client origin may take longer than expected and can cause the Oracle redo logs to be rotated out before they are read. When this happens, data is lost.
To avoid this, you can use multiple pipelines and the SQL Parser processor. The first pipeline contains the Oracle CDC Client and an intermediate endpoint. Configure the origin to not parse the SQL query. The second pipeline passes records from the intermediate endpoint to the SQL Parser processor to both parse the SQL query and to update the fields. Using this approach, the origin can read the redo logs without waiting for the SQL Parser to finish and therefore no data is lost.
In the first pipeline, the Oracle CDC Client origin:
- Reads the change data logs.
- Generates records that contain only the SQL query.
- Generates events.
In the second pipeline, the SQL Parser processor:
- Parses the SQL query.
- Generates CDC and CRUD record header attributes.
For more information about the SQL Parser processor, see SQL Parser.
Working with the Drift Synchronization Solution
If you use the Oracle CDC Client origin as part of a Drift Synchronization Solution for Hive pipeline, make sure to pass only records flagged for Insert to the Hive Metadata processor. Use one of the following methods:
- Configure the Oracle CDC Client origin to process only Insert records.
- If you want to process additional record types in the pipeline, use a Stream Selector processor to route only Insert records to the Hive Metadata processor, as in the sketch below.
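For example, a Stream Selector condition like the following (an illustrative sketch that assumes the Data Collector convention of sdc.operation.type value 1 for Insert records) routes only Insert records to the Hive Metadata processor:
${record:attribute('sdc.operation.type') == '1'}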
Data Preview with Oracle CDC Client
When using data preview with the Oracle CDC Client origin, you might need to increase the Preview Timeout data preview property.
By default, data preview waits 10,000 milliseconds (10 seconds) to establish a connection to the origin system before timing out. Due to the complex nature of this origin, the initial startup can take longer than the default.
If data preview times out, try increasing the timeout property to 120,000 milliseconds. If preview continues to time out, increase the timeout incrementally.
Configuring an Oracle CDC Client
Configure an Oracle CDC Client origin to process LogMiner change data capture information from an Oracle database.
Before you use the origin, complete the prerequisite tasks. For more information, see Oracle CDC Client Prerequisites.