JDBC Lookup

The JDBC Lookup processor uses a JDBC connection to perform lookups in a database table and pass the lookup values to fields. Use the JDBC Lookup to enrich records with additional data.

For example, you can configure the processor to use a department_ID field as the column to look up department name values in a database table, and pass the values to a new department_name output field.

When a lookup results in multiple matches, the JDBC Lookup processor can return the first matching value or return all matching values in separate records.

When you configure JDBC Lookup, you specify connection information and custom JDBC configuration properties to determine how the processor connects to the database. You configure the SQL query to define the data to look up in the database, specify the output fields to write the lookup values to, and choose the multiple match behavior. You can optionally define a default value to use for fields with missing values.

You can configure the processor to locally cache the lookup values to improve performance.

To use a JDBC version older than 4.0, you can specify the driver class name and define a health check query.

When you monitor a pipeline that includes the JDBC Lookup processor, you can view stage statistics about the number of queries the processor makes and the average time of the queries.

Installing the JDBC Driver

Before you use the JDBC Lookup, install the JDBC driver for the database. You cannot access the database until you install the required driver.

For more information about installing drivers, see Install External Libraries.

Lookup Cache

To improve pipeline performance, you can configure the JDBC Lookup processor to locally cache the values returned from a database table.

The processor caches values until the cache reaches the maximum size or the expiration time. When the first limit is reached, the processor evicts values from the cache.

You can configure the following ways to evict values from the cache:
Size-based eviction
Configure the maximum number of values that the processor caches. When the maximum number is reached, the processor evicts the oldest values from the cache.
Time-based eviction
Configure the amount of time that a value can remain in the cache without being written to or accessed. When the expiration time is reached, the processor evicts the value from the cache. The eviction policy determines whether the processor measures the expiration time since the last write of the value or since the last access of the value.
For example, you set the eviction policy to expire after the last access and set the expiration time to 60 seconds. After the processor does not access a value for 60 seconds, the processor evicts the value from the cache.

When you stop the pipeline, the processor clears the cache.

Monitoring a JDBC Lookup

When you monitor a pipeline that includes the JDBC Lookup processor, the Summary tab displays statistics about the queries that the JDBC Lookup processor performs. Use the statistics to help identify any performance bottlenecks encountered by the pipeline.

When you select the JDBC Lookup processor in the canvas while monitoring a running pipeline, the Summary tab displays the following stage statistics:

The Select Queries Meter displays the number of queries that the processor makes per second. The Select Queries Timer displays the average amount of time that the queries take to complete.

Configuring a JDBC Lookup

Configure a JDBC Lookup processor to perform lookups in a database table.

  1. In the Properties panel, on the General tab, configure the following properties:
    General Property Description
    Name Stage name.
    Description Optional description.
    Required Fields Fields that must include data for the record to be passed into the stage.
    Tip: You might include fields that the stage uses.

    Records that do not include all required fields are processed based on the error handling configured for the pipeline.

    Preconditions Conditions that must evaluate to TRUE to allow a record to enter the stage for processing. Click Add to create additional preconditions.

    Records that do not meet all preconditions are processed based on the error handling configured for the stage.

    On Record Error Error record handling for the stage:
    • Discard - Discards the record.
    • Send to Error - Sends the record to the pipeline for error handling.
    • Stop Pipeline - Stops the pipeline. Not valid for cluster pipelines.
  2. On the JDBC tab, configure the following properties:
    JDBC Property Description
    JDBC Connection String Connection string to use to connect to the database.

    Some databases, such as Postgres, require the schema in the connection string. Use the connection string format required by the database.

    Use Credentials Enables entering credentials on the Credentials tab. Use when you do not include credentials in the JDBC connection string.
    SQL Query SQL query to use to look up data in the database. Use the following syntax for the query:
    SELECT <column1 name>, <column2 name> FROM <table name> WHERE <column3 name> =
        '${record:value(<field path>)}'
    For example, to use the department ID field to look up the department name column, use the following query:
    SELECT DeptName FROM Departments WHERE DeptID = '${record:value('/dept_ID')}'
    Column Mappings Use to override the default column to field mappings. By default, columns are written to fields of the same name.
    Enter the following:
    • Column Name - Name of the database column that contains the lookup value. Enter a column name or enter an expression that defines the column name.
    • SDC Field - Name of the field in the record that receives the lookup value. You can specify an existing field or a new field. If the field does not exist, JDBC Lookup creates the field.
    • Default Value - Optional default value to use when the query does not return a value for the field. If the query returns no value and this property is not defined, the processor sends the record to error.

      To enter a default value for the Date data type, use the following format: yyyy/MM/dd. To enter a default value for the Datetime data type, use the following format: yyyy/MM/dd HH:mm:ss.

    • Data Type - Data type to use for the SDC Field. Required when you specify a default value. The processor uses the database column data type by default.

    Use the Add icon to create additional column mappings.

    Multiple Values Behavior Action to take upon finding multiple matching values:
    • First value only - Returns the first value.
    • Split into Multiple Records - Returns every matching value in a separate record.
    Max Clob Size (characters) Maximum number of characters to be read in a Clob field. Larger data is truncated.
    Max Blob Size (bytes) Maximum number of bytes to be read in a Blob field.
    Enable Local Caching Specifies whether to locally cache the returned values.
    Maximum Entries to Cache Maximum number of values to cache. When the maximum number is reached, the processor evicts the oldest values from the cache.

    Default is -1, which means unlimited.

    Eviction Policy Type Policy used to evict values from the local cache when the expiration time has passed:
    • Expire After Last Access - Measures the expiration time since the value was last accessed by a read or a write.
    • Expire After Last Write - Measures the expiration time since the value was created, or since the value was last replaced.
    Expiration Time Amount of time that a value can remain in the local cache without being accessed or written to.

    Default is 1 second.

    Time Unit Unit of time for the expiration time.

    Default is seconds.

    Additional JDBC Configuration Properties Additional JDBC configuration properties to use. To add properties, click Add and define the JDBC property name and value.

    Use the property names and values as expected by JDBC.

  3. To enter JDBC credentials separately from the JDBC connection string, on the Credentials tab, configure the following properties:
    Credentials Property Description
    Username User name for the JDBC connection.
    Password Password for the JDBC account.
    Tip: To secure sensitive information such as usernames and passwords, you can use runtime resources or Hashicorp Vault secrets. For more information, see Using Runtime Resources or Accessing Hashicorp Vault Secrets.
  4. When using JDBC versions older than 4.0, on the Legacy Drivers tab, optionally configure the following properties:
    Legacy Driver Property Description
    JDBC Class Driver Name Class name for the JDBC driver. Required for JDBC versions older than version 4.0.
    Connection Health Test Query Optional query to test the health of a connection. Recommended only when the JDBC version is older than 4.0.
  5. On the Advanced tab, optionally configure advanced properties.
    The defaults for these properties should work in most cases:
    Advanced Property Description
    Maximum Pool Size The maximum number of connections to create.

    Default is 1. The recommended value is 1.

    Minimum Idle Connections The minimum number of connections to create and maintain. To define a fixed connection pool, set to the same value as Maximum Pool Size.

    Default is 1.

    Connection Timeout Maximum time to wait for a connection. Use a time constant in an expression to define the time increment.
    Default is 30 seconds, defined as follows:
    ${30 * SECONDS}
    Idle Timeout Maximum time to allow a connection to idle. Use a time constant in an expression to define the time increment.

    Use 0 to avoid removing any idle connections.

    Default is 30 minutes, defined as follows:
    ${30 * MINUTES}
    Max Connection Lifetime Maximum lifetime for a connection. Use a time constant in an expression to define the time increment.

    Use 0 to avoid removing any idle connections.

    Default is 30 seconds, defined as follows:
    ${30 * SECONDS}
    Auto Commit Determines if auto-commit mode is enabled. In auto-commit mode, the database commits the data for each record.

    Default is disabled.

    Enforce Read-only Connection Creates read-only connections to avoid any type of write.

    Default is enabled. Disabling this property is not recommended.

    Transaction Isolation Transaction isolation level used to connect to the database.

    Default is the default transaction isolation level set for the database. You can override the database default by setting the level to any of the following:

    • Read committed
    • Read uncommitted
    • Repeatable read
    • Serializable
    Advanced Property Description
    Maximum Pool Size The maximum number of connections to create.

    Default is 1. The recommended value is 1.

    Minimum Idle Connections The minimum number of connections to create and maintain. To define a fixed connection pool, set to the same value as Maximum Pool Size.

    Default is 1.

    Connection Timeout Maximum time to wait for a connection. Use a time constant in an expression to define the time increment.
    Default is 30 seconds, defined as follows:
    ${30 * SECONDS}
    Idle Timeout Maximum time to allow a connection to idle. Use a time constant in an expression to define the time increment.

    Use 0 to avoid removing any idle connections.

    Default is 30 minutes, defined as follows:
    ${30 * MINUTES}
    Max Connection Lifetime Maximum lifetime for a connection. Use a time constant in an expression to define the time increment.

    Use 0 to avoid removing any idle connections.

    Default is 30 seconds, defined as follows:
    ${30 * SECONDS}
    Auto Commit Determines if auto-commit mode is enabled. In auto-commit mode, the database commits the data for each record.

    Default is disabled.

    Enforce Read-only Connection Creates read-only connections to avoid any type of write.

    Default is enabled. Disabling this property is not recommended.

    Transaction Isolation Transaction isolation level used to connect to the database.

    Default is the default transaction isolation level set for the database. You can override the database default by setting the level to any of the following:

    • Read committed
    • Read uncommitted
    • Repeatable read
    • Serializable