MemSQL Fast Loader

Supported pipeline types:
  • Data Collector

The MemSQL Fast Loader destination uses a JDBC connection to insert data into a MemSQL or MySQL database table with a LOAD statement.

In pipelines that insert data into MemSQL or MySQL, you can use the MemSQL Fast Loader destination rather than the JDBC Producer destination for faster performance. In pipelines that update or delete data, however, you must use a JDBC Producer destination. The MemSQL Fast Loader destination does not process CDC records; the destination applies error handling to records with a CDC operation in a header attribute.

To configure a MemSQL Fast Loader destination, you specify connection information, table name, and optionally define field mappings and other properties that your driver requires. By default, the MemSQL Fast Loader destination writes data to the table based on the matching field names. You can override the default field mappings by defining specific mappings. To use a JDBC version older than 4.0, you can specify the driver class name and define a health check query.

Before you use the MemSQL Fast Loader destination, you must install the MemSQL stage library and complete the other prerequisite tasks. The MemSQL stage library is an Enterprise stage library that is free for development purposes only. For information about purchasing the stage library for use in production, contact StreamSets.

Prerequisites

Before using the MemSQL Fast Loader destination, complete the following prerequisites:

Install the MemSQL Stage Library

You must install the MemSQL stage library before using the MemSQL Fast Loader destination.

The MemSQL stage library is an Enterprise stage library that is free for development purposes only. For information about purchasing the stage library for use in production, contact StreamSets.

You can install the Enterprise stage library using Package Manager for a tarball Data Collector installation or as a custom stage library for a tarball, RPM, or Cloudera Manager Data Collector installation.

Supported Versions

The following table lists the versions of the MemSQL Enterprise stage library to use with specific Data Collector versions:
Data Collector Version Supported Stage Library Version
Data Collector 3.8.x and 3.9.x MemSQL Enterprise Library 1.0.1
Data Collector 3.7.x MemSQL Enterprise Library 1.0.0

Installing with Package Manager

You can use Package Manager to install the MemSQL stage library on a tarball Data Collector installation.

  1. Click the Package Manager icon: .
  2. In the Navigation panel, click Enterprise Stage Libraries.
  3. Select MemSQL Enterprise Library, then click the Install icon: .
  4. Read the StreamSets subscription terms of service. If you agree, select the checkbox and click Install.
    Data Collector installs the selected stage library.
  5. Restart Data Collector.

Installing as a Custom Stage Library

You can install the MemSQL Enterprise stage library as a custom stage library on a tarball, RPM, or Cloudera Manager Data Collector installation.

  1. To download the stage library, go to the StreamSets Download Enterprise Connectors page.
    The web page displays the Enterprise stage libraries organized by release date, with the latest versions at the top of the page.
  2. Click the Enterprise stage library name and version that you want to download.
  3. In the Download Enterprise Connectors form, enter your name and contact information.
  4. Read the StreamSets subscription terms of service. If you agree, accept the terms of service and click Submit.
    The stage library downloads.
  5. Install and manage the Enterprise stage library as a custom stage library.
    For more information, see Custom Stage Libraries.

Installing the JDBC Driver for MemSQL Fast Loader

The MemSQL Fast Loader destination requires that you install the JDBC driver for MySQL. Install the JDBC driver for MySQL whether you are using MemSQL or MySQL. The destination cannot access a MemSQL or MySQL database until you install this driver.
Note: StreamSets has tested the destination with MemSQL 6.5.16 with the MySQL Connector/J 8.0.12 driver.
  1. Download the JDBC driver for MySQL.
    You can download the driver on the Download Connector/J page on the MySQL website.
  2. Install the driver as an external library for the MemSQL Enterprise stage library.
    For information about installing additional drivers, see Install External Libraries.

Configuring a MemSQL Fast Loader Destination

Configure the MemSQL Fast Loader destination to insert data into a MemSQL or MySQL database table using JDBC. Do not use this destination to update or delete data, or to process CDC records.

Before you use MemSQL Fast Loader destination in a pipeline, complete the required prerequisites.

  1. In the Properties panel, on the General tab, configure the following properties:
    General Property Description
    Name Stage name.
    Description Optional description.
    Required Fields Fields that must include data for the record to be passed into the stage.
    Tip: You might include fields that the stage uses.

    Records that do not include all required fields are processed based on the error handling configured for the pipeline.

    Preconditions Conditions that must evaluate to TRUE to allow a record to enter the stage for processing. Click Add to create additional preconditions.

    Records that do not meet all preconditions are processed based on the error handling configured for the stage.

    On Record Error Error record handling for the stage:
    • Discard - Discards the record.
    • Send to Error - Sends the record to the pipeline for error handling.
    • Stop Pipeline - Stops the pipeline. Not valid for cluster pipelines.
  2. On the JDBC tab, configure the following properties:
    JDBC Property Description
    JDBC Connection String Connection string used to connect to the database.
    Use Credentials Enables entering credentials on the Credentials tab. Use when you do not include credentials in the JDBC connection string.
    Schema Name Optional database or schema name to use.

    Use when the database requires a fully-qualified table name.

    Table Name Database table name to use. Use the table name format required by the database.
    Field to Column Mapping Use to override the default field to column mappings. By default, fields are written to columns of the same name.
    When you override the mappings, you can define parameterized values to apply SQL functions to the field values before writing them to columns. For example, to convert a field value to an integer, enter the following for the parameterized value:
    CAST(? AS INTEGER)

    The question mark (?) is substituted with the value of the field. Leave the default value of ? if you do not need to apply a SQL function.

    Using simple or bulk edit mode, click the Add icon to create additional field to column mappings.

    Duplicate-key Error Handling Action to take when the record duplicates the primary key of a row in the table:
    • Ignore – Discard the record and retain existing row.
    • Replace – Overwrite the existing row with the record.
    Additional JDBC Configuration Properties Additional JDBC configuration properties to use. To add properties, click Add and define the JDBC property name and value.

    Use the property names and values as expected by JDBC.

  3. To enter JDBC credentials separately from the JDBC connection string, on the Credentials tab, configure the following properties:
    Credentials Property Description
    Username User name for the JDBC connection.
    Password Password for the JDBC account.
    Tip: To secure sensitive information such as user names and passwords, you can use runtime resources or credential stores.
  4. When using JDBC versions older than 4.0, on the Legacy Drivers tab, optionally configure the following properties:
    Legacy Drivers Property Description
    JDBC Class Driver Name Class name for the JDBC driver. Required for JDBC versions older than version 4.0.
    Connection Health Test Query Optional query to test the health of a connection. Recommended only when the JDBC version is older than 4.0.
  5. On the Advanced tab, optionally configure advanced properties.
    The defaults for these properties should work in most cases:
    Advanced Property Description
    Maximum Pool Size Maximum number of connections to create.

    Default is 1. The recommended value is 1.

    Minimum Idle Connections Minimum number of connections to create and maintain. To define a fixed connection pool, set to the same value as Maximum Pool Size.

    Default is 1.

    Connection Timeout Maximum time to wait for a connection. Use a time constant in an expression to define the time increment.
    Default is 30 seconds, defined as follows:
    ${30 * SECONDS}
    Idle Timeout Maximum time to allow a connection to idle. Use a time constant in an expression to define the time increment.

    Use 0 to avoid removing any idle connections.

    When the entered value is close to or more than the maximum lifetime for a connection, Data Collector ignores the idle timeout.

    Default is 10 minutes, defined as follows:
    ${10 * MINUTES}
    Max Connection Lifetime Maximum lifetime for a connection. Use a time constant in an expression to define the time increment.

    Use 0 to set no maximum lifetime.

    When a maximum lifetime is set, the minimum valid value is 30 minutes.

    Default is 30 minutes, defined as follows:
    ${30 * MINUTES}
    Transaction Isolation Transaction isolation level used to connect to the database.

    Default is the default transaction isolation level set for the database. You can override the database default by setting the level to any of the following:

    • Read committed
    • Read uncommitted
    • Repeatable read
    • Serializable
    Init Query SQL query to perform immediately after the stage connects to the database. Use to set up the database session as needed.

    For example, the following query sets the time zone for the session for a MySQL database: SET time_zone = timezone;