Oracle CDC Client

The Oracle CDC Client processes change data capture (CDC) information provided by Oracle LogMiner redo logs. Use the origin to process data from Oracle 11g or 12c.

You might use this origin to perform database replication. You can use a separate pipeline with the Oracle Query Consumer or Oracle Multitable Consumer origin to read existing data. Then start a pipeline with the Oracle CDC Client origin to process subsequent changes.

The Oracle CDC Client origin processes data based on the commit number, in ascending order.

To read the redo logs, the Oracle CDC Client origin requires the LogMiner dictionary. The origin can use the dictionary in redo logs or in an online catalog. When using the dictionary in redo logs, the origin captures and adjusts to schema changes. The origin can also generate events when using redo log dictionaries.

The origin can create records for the INSERT, UPDATE, SELECT_FOR_UPDATE, and DELETE operations for one or more tables in a database. You can select the operations that you want to use. The origin also includes CDC and CRUD information in record header attributes so generated records can be easily processed by CRUD-enabled destinations. For an overview of changed data processing and a list of CRUD-enabled destinations, see Processing Changed Data.

Note: To use the Oracle CDC Client origin, you must enable LogMiner for the database that you want to use and complete the necessary prerequisite tasks. The origin uses JDBC to access the database.

The origin can generate events for an event stream.

When you configure the origin, you configure change data capture details, such as the schema and tables to read from, how to read the initial change, the dictionary source location, and the operations to include. You also specify the transaction and LogMiner session windows to use.

You specify JDBC connection information and user credentials. The origin requires a secure connection to the database server using an SSH tunnel or SSL/TLS encryption.

Note: The origin includes advanced options with default values that should work in most cases. By default, the origin hides the advanced options. To configure the advanced options, select Show Advanced Options at the top of the properties pane.

LogMiner Dictionary Source

LogMiner provides dictionaries to help process redo logs. LogMiner can store dictionaries in several locations.

The Oracle CDC Client origin can use the following dictionary source locations:
  • Online catalog - Use the online catalog when table structures are not expected to change.
  • Redo logs - Use redo logs when table structures are expected to change. When reading the dictionary from redo logs, the Oracle CDC Client origin determines when schema changes occur and refreshes the schema that it uses to create records. The origin can also generate events for each DDL it reads in the redo logs.
    Important: When using the dictionary in redo logs, make sure to extract the latest dictionary to the redo logs each time table structures change. For more information, see Task 5. Extract a Log Miner Dictionary (Redo Logs).

Note that using the dictionary in redo logs can result in significantly higher latency than using the dictionary in the online catalog. However, using the online catalog does not allow for schema changes.

For more information about dictionary options and configuring LogMiner, see the Oracle LogMiner documentation.

Prerequisites

Before using the Oracle CDC Client origin, complete the following tasks:
  1. Set up the database to accept incoming connections from StreamSets Cloud.
  2. Enable LogMiner.
  3. Enable supplemental logging for the database or tables.
  4. Create a user account with the required roles and privileges.
  5. To use the dictionary in redo logs, extract the Log Miner dictionary.

Task 1. Accept Connections

Before you can read from Oracle LogMiner redo logs, you must set up the database server or an intermediary SSH server to accept incoming connections from StreamSets Cloud.

For more information, see Database Connections.

Task 2. Enable LogMiner

LogMiner provides redo logs that summarize database activity. The origin uses these logs to generate records.

LogMiner requires an open database in ARCHIVELOG mode with archiving enabled. To determine the status of the database and enable LogMiner, use the following steps:
  1. Log into the database as a user with DBA privileges.
  2. Check the database logging mode:
    select log_mode from v$database;

    If the command returns ARCHIVELOG, you can skip to Task 3.

    If the command returns NOARCHIVELOG, continue with the following steps:

  3. Shut down the database:
    shutdown immediate;
  4. Start up and mount the database:
    startup mount;
  5. Enable archiving and open the database:
    alter database archivelog;
    alter database open;
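
To confirm that archiving is now enabled, you can rerun the logging mode query from step 2; it should now return ARCHIVELOG:
select log_mode from v$database;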

Task 3. Enable Supplemental Logging

To retrieve data from redo logs, LogMiner requires supplemental logging for the database or tables.

Enable at least primary key or "identification key" logging at a table level for each table that you want to use. With identification key logging, records include only the primary key and changed fields.

Due to a known Oracle issue, to enable supplemental logging for a table, you must first enable minimal supplemental logging for the database.

To include all fields in the records the origin generates, enable full supplemental logging at a table or database level. Full supplemental logging provides data from all columns, those with unchanged data as well as the primary key and changed columns. For details on the data included in records based on the supplemental logging type, see Generated Records.
  1. To verify if supplemental logging is enabled for the database, run the following command:
    SELECT supplemental_log_data_min, supplemental_log_data_pk, supplemental_log_data_all FROM v$database;

    If the command returns Yes or Implicit for all three columns, then supplemental logging is enabled with both identification key and full supplemental logging. You can skip to Task 4.

    If the command returns Yes or Implicit for the first two columns, then supplemental logging is enabled with identification key logging. If this is what you want, you can skip to Task 4.

  2. Enable identification key or full supplemental logging.
    For 12c multitenant databases, best practice is to enable logging for the container for the tables, rather than the entire database. You can use the following command first to apply the changes to just the container:
    ALTER SESSION SET CONTAINER=<pdb>;
    You can enable identification key or full supplemental logging to retrieve data from redo logs. You do not need to enable both:
    To enable identification key logging
    You can enable identification key logging for individual tables or all tables in the database:
    • For individual tables

      Use the following commands to enable minimal supplemental logging for the database, and then enable identification key logging for each table that you want to use:

      ALTER DATABASE ADD SUPPLEMENTAL LOG DATA;
      ALTER TABLE <schema name>.<table name> ADD SUPPLEMENTAL LOG DATA (PRIMARY KEY) COLUMNS;
    • For all tables

      Use the following command to enable identification key logging for the entire database:

      ALTER DATABASE ADD SUPPLEMENTAL LOG DATA (PRIMARY KEY) COLUMNS;
    To enable full supplemental logging
    You can enable full supplemental logging for individual tables or all tables in the database:
    • For individual tables

      Use the following commands to enable minimal supplemental logging for the database, and then enable full supplemental logging for each table that you want to use:
      ALTER DATABASE ADD SUPPLEMENTAL LOG DATA;
      ALTER TABLE <schema name>.<table name> ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;
    • For all tables

      Use the following command to enable full supplemental logging for the entire database:
      ALTER DATABASE ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;
  3. To submit the changes:
    ALTER SYSTEM SWITCH LOGFILE;
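
To confirm database-level supplemental logging, you can rerun the query from step 1. Table-level log groups do not appear in v$database; to confirm them, you can query the DBA_LOG_GROUPS view instead, as in the following sketch. It assumes a user with access to DBA views, and the schema name is a placeholder:
SELECT owner, table_name, log_group_type FROM dba_log_groups WHERE owner = '<schema name>';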

Task 4. Create a User Account

Create a user account to use with the Oracle CDC Client origin. You need the account to access the database through JDBC.

You create accounts differently based on the Oracle version that you use:
Oracle 12c multitenant databases
For multitenant Oracle 12c databases, create a common user account. Common user accounts are created in cdb$root and must use the convention: c##<name>.
  1. Log into the database as a user with DBA privileges.
  2. Create the common user account:
    ALTER SESSION SET CONTAINER=cdb$root;
    CREATE USER <user name> IDENTIFIED BY <password> CONTAINER=all;
    GRANT create session, alter session, set container, select any dictionary, logmining, execute_catalog_role TO <user name> CONTAINER=all;
    ALTER SESSION SET CONTAINER=<pdb>;
    GRANT select on <db>.<table> TO <user name>;

    Repeat the final command for each table that you want to use.

When you configure the origin, use this user account for the JDBC credentials. Use the entire user name, including the "c##", as the JDBC user name.

Oracle 12c standard databases
For standard Oracle 12c databases, create a user account with the necessary privileges:
  1. Log into the database as a user with DBA privileges.
  2. Create the user account:
    CREATE USER <user name> IDENTIFIED BY <password>;
    GRANT create session, alter session, select any dictionary, logmining, execute_catalog_role TO <user name>;
    GRANT select on <db>.<table> TO <user name>;

    Repeat the final command for each table that you want to use.

When you configure the origin, use this user account for the JDBC credentials.

Oracle 11g databases
For Oracle 11g databases, create a user account with the necessary privileges:
  1. Log into the database as a user with DBA privileges.
  2. Create the user account:
    CREATE USER <user name> IDENTIFIED BY <password>;
    GRANT create session, alter session, execute_catalog_role, select any dictionary, select any transaction, select any table to <user name>;
    GRANT select on v_$logmnr_parameters to <user name>;
    GRANT select on v_$logmnr_logs to <user name>;
    GRANT select on v_$archived_log to <user name>;
    GRANT select on <db>.<table> TO <user name>;

    Repeat the final command for each table that you want to use.

When you configure the origin, use this user account for the JDBC credentials.

Task 5. Extract a Log Miner Dictionary (Redo Logs)

When using redo logs as the dictionary source, you must extract the Log Miner dictionary to the redo logs before you start the pipeline. Repeat this step periodically to ensure that the redo logs that contain the dictionary are still available.

Oracle recommends that you extract the dictionary only at off-peak hours since the extraction can consume database resources.

To extract the dictionary for Oracle 11g or 12c databases, run the following command:
EXECUTE DBMS_LOGMNR_D.BUILD(OPTIONS=> DBMS_LOGMNR_D.STORE_IN_REDO_LOGS);
To extract the dictionary for Oracle 12c multitenant databases, run the following commands:
ALTER SESSION SET CONTAINER=cdb$root;
EXECUTE DBMS_LOGMNR_D.BUILD(OPTIONS=> DBMS_LOGMNR_D.STORE_IN_REDO_LOGS);
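
To confirm that the dictionary was written to the redo logs, you can query the V$ARCHIVED_LOG view, which flags the archived logs that contain the start and end of a dictionary build. This is a quick sketch and assumes the user can select from the view:
SELECT name, dictionary_begin, dictionary_end FROM v$archived_log WHERE dictionary_begin = 'YES' OR dictionary_end = 'YES';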

Secure Connections

The Oracle CDC Client origin requires a secure connection to the Oracle database server.

You must configure the origin to use an SSH tunnel or SSL/TLS encryption to securely connect to the database.

SSH Tunnel

You can secure the connection to the Oracle server by configuring the origin to use an SSH tunnel.

When you configure the stage to connect to the database through an SSH tunnel, the stage uses an encrypted connection to communicate with the SSH server. The SSH server then uses an unencrypted connection to communicate with the database server.

For additional information including instructions to set up the SSH server to accept incoming connections from StreamSets Cloud, see Use an SSH Tunnel.

Tip: If you also configure SSL/TLS encryption, then the stage uses SSL/TLS encryption from the SSH server to the database server.

SSL/TLS Encryption

You can secure the connection to the Oracle server by configuring the origin to use SSL/TLS encryption.

Before configuring the origin to use SSL/TLS encryption, verify that the Oracle server is correctly configured to use SSL/TLS. For more information, see the Oracle documentation.

When you enable encryption for the origin, the origin verifies the following information before it establishes an SSL/TLS connection to the database server:

Server certificate
The stage verifies the certificate of the database server provided in the stage properties. You must paste the full contents of the PEM encoded server certificate into the Server Certificate PEM property, including the header and footer in the file. Use a text editor to open the PEM encoded certificate, and then copy and paste the full contents of the file into the origin property, as follows:
-----BEGIN CERTIFICATE-----
MIIB9TCCAWACAQAwgbgxGTAXBgNVHJoMEFF1b1ZlZGwzIEkpbWl0ZWLxHLAaBgNV
......
98TwDIK/39WEB/V607As+KoYajQL9drorw==
-----END CERTIFICATE-----
You also specify the cipher suites used for encryption and data integrity of the certificate. For example, you might enter TLS_RSA_WITH_AES_256_CBC_SHA as the cipher suite. You can enter multiple cipher suites separated by commas. For a full list of available cipher suites, see Cipher Suites in the Oracle documentation.
Server host name
The stage can optionally verify the host name of the database server provided in the stage properties.
To verify the host name, select the Verify Hostname property and then specify the SSL/TLS distinguished name (DN) of the server. The stage verifies that the specified distinguished name matches the DN specified in the server certificate.

Schema, Table Name and Exclusion Patterns

When you configure the Oracle CDC Client origin, you specify the tables with the change capture data that you want to process. To specify the tables, you define the schema, a table name pattern, and an optional exclusion pattern.

When defining the schema and table name pattern, you can use SQL LIKE syntax to define a set of tables within a schema or across multiple schemas. When needed, you can also use a regular expression as an exclusion pattern to exclude a subset of tables from the larger set.

For example, say you want to process change data capture data for all tables in the sales schema that start with SALES while excluding those that end with a dash (-) and single-character suffix. You can use the following configuration to specify the tables to process:
  • Schema: sales
  • Table Name Pattern: SALES%
  • Exclusion Pattern: SALES.*-.
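
To preview which tables such a configuration selects, you might run a similar query against the Oracle data dictionary. This is only a sketch: the origin evaluates the table name pattern with SQL LIKE syntax and the exclusion pattern as a regular expression, which the query approximates with REGEXP_LIKE. Unquoted Oracle names are stored in uppercase, so the schema appears as SALES in the dictionary:
SELECT owner, table_name FROM all_tables
WHERE owner LIKE 'SALES' AND table_name LIKE 'SALES%'
AND NOT REGEXP_LIKE(table_name, '^SALES.*-.$');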

Initial Change

The initial change is the point in the LogMiner redo logs where you want to start processing. When you start the pipeline, the Oracle CDC Client origin starts processing from the specified initial change and continues until you stop the pipeline.

Note that the Oracle CDC Client origin processes only change capture data. If you need existing data, you might use an Oracle Query Consumer or Oracle Multitable Consumer in a separate pipeline to read table data before you start an Oracle CDC Client pipeline.

Oracle CDC Client provides several ways to configure the initial change:
From the latest change
The origin processes all changes that occur after you start the pipeline.
From a specified datetime
The origin processes all changes that occurred at the specified datetime and later. Use the following format: DD-MM-YYYY HH24:MI:SS.
From a specified system change number (SCN)
The origin processes all changes that occurred in the specified SCN and later. When using the specified SCN, the origin starts processing with the timestamp associated with the SCN. If the SCN cannot be found in the redo logs, the origin continues reading from the next higher SCN that is available in the redo logs.
Typically, a database admin can provide the SCN to use.
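
For reference, a database admin can find the current SCN, or check the timestamp associated with a given SCN, with queries like the following (a sketch; replace <scn> with the actual number):
SELECT current_scn FROM v$database;
SELECT scn_to_timestamp(<scn>) FROM dual;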

Example

You want to process all existing data in the Orders table and then capture changed data, writing all data to Amazon S3. To read the existing data, you use a pipeline with the Oracle Query Consumer origin and the Amazon S3 destination.

Once all existing data is read, you stop the Oracle Query Consumer pipeline and start an Oracle CDC Client pipeline that writes to the same destination. You can configure this pipeline to pick up changes that occur after you start the pipeline. To prevent any chance of data loss, you could instead configure the initial change to an exact datetime or earlier SCN.

Include Nulls

When Oracle LogMiner performs full supplemental logging, the resulting data includes all columns in the table, with null values where no changes occurred. When the Oracle CDC Client origin processes this data, it ignores the null values by default when generating records.

You can configure the origin to include the null values in the record. You might need to include the null values when the destination system has required fields. To include null values, enable the Include Nulls property on the Oracle CDC tab.

Oracle Data Types

The Oracle CDC Client origin converts Oracle data types into StreamSets Cloud data types.

The origin supports the following Oracle data types:
Oracle Data Type StreamSets Cloud Data Type
Binary_double Double
Binary_float Float
Char String
Date Datetime
Long String
Nchar String
Number Decimal
Nvarchar2 String
Raw Byte_array
Timestamp Datetime
Timestamp with time zone, Timestamp with local time zone Zoned_datetime
Varchar, Varchar2 String
XMLType String

For timestamp with time zone data, the origin includes only the UTC offset with the data to maximize efficiency while providing more precision. It omits the time zone ID.

Unsupported Data Types

You can configure how the Oracle CDC Client origin handles records that contain unsupported data types. The origin can perform the following actions:

  • Pass the record to the pipeline without the unsupported data types.
  • Pass the record to error without the unsupported data types.
  • Discard the record.

You can configure the origin to include the unsupported data types in the record. When you include unsupported types, the origin includes the field names and passes the data as unparsed strings.

The origin does not support the following Oracle data types:
  • Array
  • Blob
  • Clob
  • Datalink
  • Distinct
  • Interval
  • Java Object
  • Nclob
  • Other
  • Ref
  • Ref Cursor
  • SQLXML
  • Struct
  • Time with Timezone
  • UriType

Generated Records

The origin generates records differently based on the operation type and the logging enabled for the database and tables. It also includes CDC and CRUD information in record header attributes.

The following table describes how the origin generates record data:
Operation Identification/Primary Key Logging Only Full Supplemental Logging
INSERT All fields that contain data, ignoring fields with null values. All fields.
UPDATE Primary key field and fields with updated values. All fields.
SELECT_FOR_UPDATE Primary key field and fields with updated values. All fields.
DELETE Primary key field. All fields.
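
For example, with identification key logging only, an UPDATE statement like the following (a hypothetical CUSTOMERS table with primary key ID) produces a record that contains only the ID and NAME fields. With full supplemental logging, the generated record contains every field in the row:
UPDATE <schema name>.CUSTOMERS SET NAME = 'Ann' WHERE ID = 42;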

CRUD Operation Header Attributes

The Oracle CDC Client origin specifies the CRUD operation type for a record in the sdc.operation.type record header attribute.

The Oracle CDC Client origin evaluates the operation type associated with each entry that it processes and, when appropriate, writes the operation type to the sdc.operation.type record header attribute.

The origin uses the following values in the sdc.operation.type record header attribute to represent the operation type:

  • 1 for INSERT
  • 2 for DELETE
  • 3 for UPDATE and SELECT_FOR_UPDATE

If you use a CRUD-enabled destination in the pipeline such as Oracle Producer or Salesforce, the destination can use the operation type when writing to destination systems. When necessary, you can use an Expression Evaluator or scripting processors to manipulate the value in the sdc.operation.type header attribute. For an overview of changed data processing and a list of CRUD-enabled destinations, see Processing Changed Data.

CDC Header Attributes

The Oracle CDC Client origin provides the following CDC record header attributes for each record:
  • oracle.cdc.query
  • oracle.cdc.rowId
  • oracle.cdc.scn
  • oracle.cdc.timestamp
  • oracle.cdc.table
  • oracle.cdc.user
The origin also includes the following record header attributes for each decimal field in the record:
  • jdbc.<fieldname>.precision
  • jdbc.<fieldname>.scale

You can use the record:attribute or record:attributeOrDefault functions to access the information in the attributes. For more information about working with record header attributes, see Working with Header Attributes.

Event Generation

The Oracle CDC Client origin can generate events that you can use in an event stream when the origin uses redo logs as the dictionary source. The origin does not generate events when using the online catalog as the dictionary source.

When you use redo logs as the dictionary source and enable event generation, the Oracle CDC Client origin generates events when it reads DDL statements. It generates events for ALTER, CREATE, DROP, and TRUNCATE statements.

When you start the pipeline, the origin queries the database and caches the schemas for all tables listed in the origin, and then generates an initial event record for each table. Each event record describes the current schema for each table. The origin uses the cached schemas to generate records when processing data-related redo log entries.

The origin then generates an event record for each DDL statement it encounters in the redo logs. Each event record includes the DDL statement and the related table in record header attributes.

The origin includes table schema information in the event record for new and updated tables. When the origin encounters an ALTER or CREATE statement, it queries the database for the latest schema for the table.

If the ALTER statement is an update to a cached schema, the origin updates the cache and includes the updated table schema in the event record. If the ALTER statement is older than the cached schema, the origin does not include the table schema in the event record. Similarly, if the CREATE statement is for a "new" table, the origin caches the new table and includes the table schema in the event record. Because the origin verifies that all specified tables exist when the pipeline starts, this can occur only when the table is dropped and created after the pipeline starts. If the CREATE statement is for a table that is already cached, the origin does not include the table schema in the event record.

Oracle CDC Client events can be used with a destination to store event information. For an example, see Preserving an Audit Trail of Events.

For more information about dataflow triggers and the event framework, see Overview.

Event Records

Event records generated by the Oracle CDC Client origin have the following event-related record header attributes. Record header attributes are stored as strings:
Record Header Attribute Description
sdc.event.type Event type. Uses one of the following types:
  • STARTUP
  • ALTER
  • CREATE
  • DROP
  • TRUNCATE
sdc.event.version Integer that indicates the version of the event record type.
sdc.event.creation_timestamp Epoch timestamp when the stage created the event.
oracle.cdc.table Name of the Oracle database table that changed.
oracle.cdc.ddl The DDL statement that triggered the event.

STARTUP event records are generated when the pipeline starts. The origin creates one event record for each table. The event record includes the current table schema.

DROP and TRUNCATE event records include just the record header attributes listed above.

CREATE event records include the schema for the new table when the table has been dropped and recreated. ALTER event records include the table schema when the statement updates a cached schema. For more information about the behavior for CREATE and ALTER statements, see Event Generation.

For example, an ALTER event record for an updated schema might display three fields: NICK, ID, and NAME. In the list of record header attributes, you would find the DDL statement that added the NICK field, the name of the updated table, and the ALTER event type.
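
A DDL statement like the following, shown with hypothetical schema and table names, would produce such an ALTER event record:
ALTER TABLE SALES.CUSTOMERS ADD (NICK VARCHAR2(50));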

PEG Parser (Beta)

The Oracle CDC Client origin provides an alternate PEG parser that you can try when concerned about pipeline performance.

The Oracle CDC Client origin provides a PEG parser as an alternative to the default parser. You can try the PEG parser when performance with the default parser is inadequate, such as when processing very wide tables. Note that the PEG parser is in beta at this time and should be carefully tested before use in production.

For more information about PEG processing, see https://en.wikipedia.org/wiki/Parsing_expression_grammar.

To use the PEG parser, enable the Use PEG Parser property on the Advanced tab.

Multithreaded Parsing

You can configure the Oracle CDC Client origin to use multiple threads to parse transactions. You can use multithreaded parsing with both the default Oracle CDC Client parser and the alternate PEG parser.

When performing multithreaded parsing, the origin uses multiple threads to generate records from committed SQL statements in a transaction. It does not perform multithreaded processing of the resulting records.

The Oracle CDC Client origin uses multiple threads for parsing based on the Parsing Thread Pool Size property. When you start the pipeline, the origin creates the number of threads specified in the property. The origin connects to Oracle, creates a LogMiner session, and processes a single transaction at a time.

When the origin processes a transaction, it reads and buffers all SQL statements in the transaction to an in-memory queue and waits for statements to be committed before processing them. Once committed, the SQL statements are parsed using all available threads and the original order of the SQL statements is retained.

The resulting records are passed to the rest of the pipeline. Note that enabling multithreaded parsing does not enable multithreaded processing – the pipeline uses a single thread for reading data.

Data Preview with Oracle CDC Client

When using data preview with the Oracle CDC Client origin, you might need to increase the Preview Timeout data preview property.

By default, data preview waits 30,000 milliseconds (30 seconds) to establish a connection to the origin system before timing out. Due to the complex nature of this origin, the initial startup can take longer than the default.

If data preview times out, try increasing the timeout property to 120,000 milliseconds. If preview continues to time out, increase the timeout incrementally.

Configuring an Oracle CDC Client

Configure an Oracle CDC Client origin to process LogMiner change data capture information from an Oracle database.

Before you use the origin, complete the prerequisite tasks. For more information, see Prerequisites.

Note: Some of these properties are advanced options, which the origin hides by default. To configure the advanced options, select Show Advanced Options at the top of the properties pane.
  1. In the properties pane, on the General tab, configure the following properties:
    General Property Description
    Name Stage name.
    Description Optional description.
    Produce Events Generates event records when the origin uses redo logs as the dictionary source and reads DDL statements. Use for event handling.
    On Record Error Error record handling for the stage:
    • Discard - Discards the record.
    • Send to Error - Sends the record to the pipeline for error handling.
    • Stop Pipeline - Stops the pipeline.
  2. To securely connect to the database server through an SSH tunnel, configure the following properties on the SSH Tunnel tab:
    SSH Tunnel Property Description
    Use SSH Tunneling Enables the stage to connect to the database server through an SSH tunnel.

    When cleared, the stage must connect to the database server using SSL/TLS encryption.

    SSH Tunnel Host Host name for the SSH server.
    SSH Tunnel Port Port number for the SSH server.

    For example: 22.

    SSH Tunnel Host Fingerprint Host key fingerprint used to verify the identity of the SSH server. For example: 97:3c:ae:76:73:f3:ef:a7:18:02:6a:c6:57:43:82:f6.

    You can enter multiple fingerprints separated by commas. Be sure to properly generate the fingerprint.

    Enter the host key fingerprint when you want the stage to verify the identity of the SSH server. Leave blank when you want the stage to establish an SSH connection without any verification.

    SSH Tunnel Username SSH user account used to install the StreamSets Cloud public SSH key on the SSH server.
    SSH Public Key Public SSH key for your StreamSets Cloud account.

    You download and install the key on the SSH server when you set up the SSH server.

  3. On the Oracle CDC tab, configure the following change data capture properties:
    Oracle CDC Property Description
    Tables Tables to track. Specify related properties as needed.

    Using simple or bulk edit mode, click the Add icon to define another table configuration.

    Schema Name Schema to use. You can enter a schema name or use SQL LIKE syntax to specify a set of schemas.

    The origin submits the schema name in all caps by default. To use a lower or mixed-case name, enable the Case-Sensitive Names property.

    Table Name Pattern A table name pattern that specifies the tables to track. You can enter a table name or use SQL LIKE syntax to specify a set of tables.

    The origin submits table names in all caps by default. To use lower or mixed-case names, enable the Case-Sensitive Names property.
    Exclusion Pattern An optional table exclusion pattern to define a subset of tables to exclude. You can enter a table name or use a regular expression to specify a subset of tables to exclude.
    Case-Sensitive Names Enables using case-sensitive schema, table, and column names. When not enabled, the origin submits names in all caps.

    Oracle uses all caps for schema, table, and column names by default. Names can be lower- or mixed-case only if the schema, table, or column was created with quotation marks around the name.

    Initial Change The starting point for the read. Use one of the following options:
    • From Latest Change - Processes changes that arrive after you start the pipeline.
    • From Date - Processes changes starting from the specified date.
    • From SCN - Processes changes starting from the specified system change number (SCN).
    Start Date The datetime to read from when you start the pipeline. For a date-based initial change.

    Use the following format: DD-MM-YYYY HH24:MI:SS.

    Start SCN The system change number to start reading from when you start the pipeline. If the SCN cannot be found in the redo logs, the origin continues reading from the next higher SCN that is available in the redo logs.

    For an SCN-based initial change.

    Operations Operations to include when creating records. All unlisted operations are ignored.
    Dictionary Source Location of the LogMiner dictionary:
    • Redo logs - Use when schemas can change. Allows the origin to adapt to schema changes and to generate events for DDL statements.
    • Online catalog - Use for better performance when schemas are not expected to change.
    Unsupported Field Type Determines the behavior when the origin encounters unsupported data types in the record:
    • Ignore and Send Record to Pipeline - The origin ignores unsupported data types and passes the record with only supported data types to the pipeline.
    • Send Record to Error - The origin handles the record based on the error record handling configured for the stage. The error record contains only the supported data types.
    • Discard Record - The origin discards the record.

    For a list of unsupported data types, see Unsupported Data Types.

    Add Unsupported Fields to Records Includes fields with unsupported data types in the record. Includes the field names and the unparsed string values of the unsupported fields.
    Include Nulls Includes null values in records generated from full supplemental logging. By default, the origin generates records without null values.
    Convert Timestamp To String Enables the origin to write timestamps as string values rather than datetime values. Strings maintain the precision stored in the source database.

    When writing timestamps to StreamSets Cloud date or time data types that do not store nanoseconds, the origin stores any nanoseconds from the timestamp in a field attribute.

    Maximum Transaction Length Time in seconds to wait for changes for a transaction. Enter the longest period of time that you expect a transaction to require.

    Default is ${ 1 * HOURS } which is 3600 seconds.

    LogMiner Session Window Time in seconds to keep a LogMiner session open. Set to a value larger than the maximum transaction length. Decrease the time to reduce LogMiner resource use.

    Default is ${ 2 * HOURS } which is 7200 seconds.

    DB Time Zone Time zone of the database.
  4. On the JDBC tab, configure the following JDBC properties:
    JDBC Property Description
    Database Host Host name of the machine where the Oracle database server is installed.
    Database Secure Port Secure port number of the Oracle database server.

    The stage requires a secure connection to the database server using an SSH tunnel or SSL/TLS encryption.

    Database SID Name of the Oracle database on the server, also known as the service name or SID.
    Max Batch Size (records) Maximum number of records processed at one time. Honors values up to the maximum batch size of 1000.

    Default is 1000.

    PDB Name of the pluggable database that contains the schema you want to use. Use only when the schema was created in a pluggable database.

    Required for schemas created in pluggable databases.

    Additional JDBC Configuration Properties Additional JDBC configuration properties to use. To add properties, click Add and define the JDBC property name and value.

    Use the property names and values as expected by JDBC.

  5. On the Credentials tab, configure the following properties:
    Credentials Property Description
    Username User name for the JDBC connection. Use the user account created for the origin. Common user accounts for Oracle 12c multitenant databases start with "c##".

    For more information, see Task 4. Create a User Account.

    Password Password for the account.
    Note: When you enter secrets such as user names and passwords, the stage encrypts the secret values.
  6. To use SSL/TLS encryption to securely connect to the database server, configure the following properties on the SSL/TLS Encryption tab:
    SSL/TLS Encryption Property Description
    Enable Encryption Uses SSL/TLS encryption to securely connect to the database server.

    When cleared, the stage must use an SSH tunnel to connect to the database server.

    Server Certificate PEM Server certificate in PEM format used to verify the SSL/TLS certificate of the database server.

    Use a text editor to open the PEM encoded certificate, and then copy and paste the full contents of the file into the property, including the header and footer.

    Cipher Suites Cipher suites used for encryption and data integrity of the certificate. For example, you might enter TLS_RSA_WITH_AES_256_CBC_SHA as the cipher suite.

    You can enter multiple cipher suites separated by commas. For a full list of available cipher suites, see Cipher Suites in the Oracle documentation.

    Verify Hostname Verifies the host name of the database server.
    SSL Distinguished Name SSL/TLS distinguished name (DN) of the database server that must match the DN specified in the server certificate.
  7. On the Advanced tab, optionally configure advanced properties.
    The defaults for these properties should work in most cases:
    Advanced Property Description
    JDBC Fetch Size for Current Window Number of redo logs to fetch and process together when the LogMiner session includes the current time.

    At the end of the session, the origin fetches and processes any remaining redo logs.

    Set to 1 to make data available as soon as the origin receives a redo log.

    Default is 1.

    JDBC Fetch Size for Past Windows Number of redo logs to fetch and process together when the LogMiner session is entirely in the past.

    When the fetch size exceeds available redo logs, the origin fetches and processes the remaining redo logs. Lower fetch sizes can increase throughput when writes to the destination system are slow.

    Default is 1.

    Use PEG Parser (beta) Enables use of the beta PEG parser instead of the default Oracle CDC Client origin parser. Can improve performance, but should be carefully tested before use in production.
    Maximum Pool Size Maximum number of connections to create.

    Default is 1. The recommended value is 1.

    Minimum Idle Connections Minimum number of connections to create and maintain. To define a fixed connection pool, set to the same value as Maximum Pool Size.

    Default is 1.

    Parsing Thread Pool Size Number of threads that the origin uses for multithreaded parsing.

    Can be used with both the default parser and PEG parser.

    Connection Timeout Maximum time to wait for a connection. Use a time constant in an expression to define the time increment.
    Default is 30 seconds, defined as follows:
    ${30 * SECONDS}
    Idle Timeout Maximum time to allow a connection to idle. Use a time constant in an expression to define the time increment.

    Use 0 to avoid removing any idle connections.

    When the entered value is close to or more than the maximum lifetime for a connection, the origin ignores the idle timeout.

    Default is 10 minutes, defined as follows:
    ${10 * MINUTES}
    Max Connection Lifetime Maximum lifetime for a connection. Use a time constant in an expression to define the time increment.

    Use 0 to set no maximum lifetime.

    When a maximum lifetime is set, the minimum valid value is 30 minutes.

    Default is 30 minutes, defined as follows:
    ${30 * MINUTES}
    Enforce Read-only Connection Creates read-only connections to avoid any type of write.

    Default is enabled. Disabling this property is not recommended.

    Transaction Isolation Transaction isolation level used to connect to the database.
    Default is the default transaction isolation level set for the database. You can override the database default by setting the level to any of the following:
    • Read committed
    • Read uncommitted
    • Repeatable read
    • Serializable
    Init Query SQL query to perform immediately after the stage connects to the database. Use to set up the database session as needed.
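
    For example, an init query might set session options such as the date format, as in the following hypothetical statement:
    ALTER SESSION SET NLS_DATE_FORMAT = 'YYYY-MM-DD HH24:MI:SS';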