Kudu

The Kudu destination writes data to a Kudu table. You can also use the destination to write to a Kudu table created by Impala.

The destination writes record fields to table columns by matching field names to column names. The destination can insert data into the table or upsert it.
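The difference between the two write operations can be illustrated with a small in-memory model. This is a sketch only: the `write` function, the dictionary-backed table, and the `id` key column are hypothetical, and the code does not use the Kudu client. It assumes a table with a single primary key column, that record fields without a matching column are dropped, and that an insert never overwrites a row whose key already exists (Kudu itself rejects an insert with a duplicate primary key). How the destination reports such rows is not modeled here.

```python
from typing import Dict, Iterable, List

Row = Dict[str, object]


def write(table: Dict[object, Row], columns: List[str], key: str,
          records: Iterable[Row], operation: str) -> List[Row]:
    """Apply records to an in-memory table and return the rejected records."""
    if operation not in ("insert", "upsert"):
        raise ValueError(f"unsupported write operation: {operation}")
    rejected = []
    for record in records:
        # Match record fields to table columns by name.
        # Assumption: fields with no matching column are dropped.
        row = {name: record[name] for name in columns if name in record}
        pk = row[key]
        if pk in table:
            if operation == "insert":
                # Insert does not touch an existing row with the same key.
                rejected.append(record)
                continue
            # Upsert: merge the new values over the existing row.
            table[pk] = {**table[pk], **row}
        else:
            # A new key is written the same way by both operations.
            table[pk] = row
    return rejected
```

Calling `write(table, ["id", "name"], "id", records, "upsert")` on a table that already holds a row with the same `id` replaces that row's `name`, while the same call with `"insert"` leaves the row unchanged and returns the record as rejected.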

When you configure the Kudu destination, you specify the connection information for one or more Kudu masters. You configure the table and write mode to use. When needed, you can specify a maximum batch size for the destination.

Note: Due to a Kudu limitation on Spark, pipeline validation does not validate Kudu stage configuration.

Configuring a Kudu Destination

Configure a Kudu destination to write to a Kudu table.
  1. On the Properties panel, on the General tab, configure the following properties:
    General Property - Description
    Name - Stage name.
    Description - Optional description.
  2. On the Kudu tab, configure the following properties:
    Kudu Property - Description
    Kudu Masters - Comma-separated list of Kudu masters used to access the Kudu table.
        For each Kudu master, specify the host and port in the following format:
        <host>:<port>
    Kudu Table - Name of the table to write to.
        To write to a Kudu table created by Impala, use the following format:
        impala::default.<table name>
    Write Operation - Operation to perform when writing to Kudu:
        • Insert - Inserts all data into the table.
        • Upsert - Inserts new data into the table and updates existing data.
  3. On the Advanced tab, optionally configure the following property:
    Advanced Property - Description
    Max Batch Size - Maximum number of records to read in a batch.
        Set to -1 to use the batch size configured for the Spark cluster.
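
The property formats described in the steps above can be checked with a few small helpers. This is an illustrative sketch, not part of the product: `parse_masters`, `impala_table_name`, and `effective_batch_size` are hypothetical names, and `cluster_batch_size` stands in for whatever batch size the Spark cluster is configured with. It assumes the Impala database is `default`, as in the format shown, and that any batch size other than -1 is used as given.

```python
from typing import List, Tuple


def parse_masters(masters: str) -> List[Tuple[str, int]]:
    """Split a comma-separated '<host>:<port>' list into (host, port) pairs."""
    pairs = []
    for entry in masters.split(","):
        # Split on the last colon so the port is always the final component.
        host, sep, port = entry.strip().rpartition(":")
        if not sep or not host or not port.isdigit():
            raise ValueError(f"expected <host>:<port>, got {entry!r}")
        pairs.append((host, int(port)))
    return pairs


def impala_table_name(table: str) -> str:
    """Build the name of a Kudu table created by Impala in the default database."""
    return f"impala::default.{table}"


def effective_batch_size(max_batch_size: int, cluster_batch_size: int) -> int:
    """Return the cluster batch size when -1 is configured, else the configured value."""
    return cluster_batch_size if max_batch_size == -1 else max_batch_size
```

For example, `parse_masters("kudu1.example.com:7051, kudu2.example.com:7051")` yields two (host, port) pairs, and an entry without a port raises a `ValueError` before any connection is attempted. The host names and port here are placeholder values.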