UDP to Kafka

The UDP to Kafka origin reads messages from one or more UDP ports and writes each message directly to Kafka.

Use the UDP to Kafka origin to read large volumes of data from multiple UDP ports and write the data immediately to Kafka, without additional processing.

Here is an example of the recommended architecture for using the UDP to Kafka origin:

If you need to process data before writing it to Kafka, need to write to a destination system other than Kafka, or do not need to process high volumes of data, use the UDP Source origin instead.

UDP to Kafka can read collectd messages, NetFlow Version 5 messages, and syslog messages.

When you configure UDP to Kafka, you specify the UDP ports to use, Kafka configuration information, and advanced properties such as the maximum number of write requests.

You can add Kafka configuration properties and enable Kafka security as needed.

Pipeline Configuration

When you use a UDP to Kafka origin in a pipeline, connect the origin to a Trash destination.

The UDP to Kafka origin writes records directly to Kafka. The origin does not pass records to its output port, so you cannot perform additional processing or write the data to other destination systems.

However, since a pipeline requires a destination, you should connect the origin to the Trash destination to satisfy pipeline validation requirements.

A pipeline with the UDP to Kafka origin should look like this:

Additional Kafka Properties

You can add custom Kafka configuration properties to the UDP to Kafka origin.

When you add a Kafka configuration property, enter the exact property name and the value. The stage does not validate the property names or values.

Several properties are defined by default; you can edit or remove them as needed.

Note: Because the stage defines the following configuration properties itself, it ignores any user-defined values for them:
  • key.serializer.class
  • metadata.broker.list
  • partitioner.class
  • producer.type
  • serializer.class
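
For example, to compress messages before they are written to Kafka, you might add a compression property like the following. This is only an illustration: producer property names vary across Kafka versions (compression.type applies to newer producer versions), so verify the correct name against the Kafka documentation for your stage library version:

    compression.type=gzip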

Enabling Kafka Security

When using Kafka version 0.9.0.0 or later, you can configure the UDP to Kafka origin to connect securely through SSL/TLS, Kerberos, or both.

These versions provide features to support secure connections through SSL/TLS or Kerberos (SASL). The Kafka community considers these features beta quality.

Earlier versions of Kafka do not support security.

Enabling SSL/TLS

Perform the following steps to enable the UDP to Kafka origin to use SSL/TLS to connect to Kafka version 0.9.0.0 or later.

  1. To use SSL/TLS to connect, first make sure Kafka is configured for SSL/TLS as described in the Kafka documentation: http://kafka.apache.org/documentation.html#security_ssl.
  2. On the General tab of the stage, set the Stage Library property to Apache Kafka 0.9.0.0 or a later version.
  3. On the Kafka tab, add the security.protocol Kafka configuration property and set it to SSL.
  4. Then, add the following SSL Kafka configuration properties:
    • ssl.truststore.location
    • ssl.truststore.password
    When the Kafka broker requires client authentication (when the ssl.client.auth broker property is set to "required"), add and configure the following properties:
    • ssl.keystore.location
    • ssl.keystore.password
    • ssl.key.password
    Some brokers might require adding the following properties as well:
    • ssl.enabled.protocols
    • ssl.truststore.type
    • ssl.keystore.type

    For details about these properties, see the Kafka documentation.

For example, properties like the following allow the stage to use SSL/TLS to connect to Kafka 0.9.0.0 with client authentication. The paths and passwords shown are placeholders for your environment:
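
    security.protocol=SSL
    ssl.truststore.location=/opt/ssl/kafka.client.truststore.jks
    ssl.truststore.password=<truststore password>
    ssl.keystore.location=/opt/ssl/kafka.client.keystore.jks
    ssl.keystore.password=<keystore password>
    ssl.key.password=<key password>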

Enabling Kerberos (SASL)

When you use Kerberos authentication, Data Collector uses the Kerberos principal and keytab to connect to Kafka version 0.9.0.0 or later. Perform the following steps to enable the UDP to Kafka origin to use Kerberos to connect to Kafka.

  1. To use Kerberos, first make sure Kafka is configured for Kerberos as described in the Kafka documentation: http://kafka.apache.org/documentation.html#security_sasl.
  2. In the Data Collector configuration file, $SDC_CONF/sdc.properties, make sure the following Kerberos properties are configured:
    • kerberos.client.enabled
    • kerberos.client.principal
    • kerberos.client.keytab
  3. On the General tab of the stage, set the Stage Library property to Apache Kafka 0.9.0.0 or a later version.
  4. On the Kafka tab, add the security.protocol Kafka configuration property, and set it to SASL_PLAINTEXT.
  5. Then, add the sasl.kerberos.service.name configuration property, and set it to the Kerberos principal name that Kafka runs as.

For example, properties like the following enable connecting to Kafka 0.9.0.0 with Kerberos. The principal, keytab path, and service name are placeholders that depend on your environment:
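
    # In the Data Collector configuration file, $SDC_CONF/sdc.properties:
    kerberos.client.enabled=true
    kerberos.client.principal=sdc/<hostname>@EXAMPLE.COM
    kerberos.client.keytab=/etc/sdc/sdc.keytab

    # Kafka configuration properties on the Kafka tab of the stage:
    security.protocol=SASL_PLAINTEXT
    sasl.kerberos.service.name=kafka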

Enabling SSL/TLS and Kerberos

You can enable the UDP to Kafka origin to use SSL/TLS and Kerberos to connect to Kafka version 0.9.0.0 or later.

To use SSL/TLS and Kerberos, combine the steps required to enable each and set the security.protocol property as follows:
  1. Make sure Kafka is configured to use SSL/TLS and Kerberos (SASL) as described in the following Kafka documentation:
    • http://kafka.apache.org/documentation.html#security_ssl
    • http://kafka.apache.org/documentation.html#security_sasl
  2. In the Data Collector configuration file, $SDC_CONF/sdc.properties, make sure the following Kerberos properties are configured:
    • kerberos.client.enabled
    • kerberos.client.principal
    • kerberos.client.keytab
  3. On the General tab of the stage, set the Stage Library property to Apache Kafka 0.9.0.0 or a later version.
  4. On the Kafka tab, add the security.protocol property and set it to SASL_SSL.
  5. Then, add the sasl.kerberos.service.name configuration property, and set it to the Kerberos principal name that Kafka runs as.
  6. Then, add the following SSL Kafka configuration properties:
    • ssl.truststore.location
    • ssl.truststore.password
    When the Kafka broker requires client authentication (when the ssl.client.auth broker property is set to "required"), add and configure the following properties:
    • ssl.keystore.location
    • ssl.keystore.password
    • ssl.key.password
    Some brokers might require adding the following properties as well:
    • ssl.enabled.protocols
    • ssl.truststore.type
    • ssl.keystore.type

    For details about these properties, see the Kafka documentation.
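
As an illustration, a combined SSL/TLS and Kerberos configuration might look like the following. Paths, principals, passwords, and the service name are placeholders for your environment:

    # In the Data Collector configuration file, $SDC_CONF/sdc.properties:
    kerberos.client.enabled=true
    kerberos.client.principal=sdc/<hostname>@EXAMPLE.COM
    kerberos.client.keytab=/etc/sdc/sdc.keytab

    # Kafka configuration properties on the Kafka tab of the stage:
    security.protocol=SASL_SSL
    sasl.kerberos.service.name=kafka
    ssl.truststore.location=/opt/ssl/kafka.client.truststore.jks
    ssl.truststore.password=<truststore password>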

Configuring a UDP to Kafka Origin

Configure a UDP to Kafka origin to process UDP messages and write them directly to Kafka.

  1. In the Properties panel, on the General tab, configure the following properties:
    • Name - Stage name.
    • Description - Optional description.
    • Stage Library - Library version that you want to use.
    • On Record Error - Error record handling for the stage:
      • Discard - Discards the record.
      • Send to Error - Sends the record to the pipeline for error handling.
      • Stop Pipeline - Stops the pipeline. Not valid for cluster pipelines.
  2. On the UDP tab, configure the following properties:
    • Port - Port to listen to for data. To list additional ports, click the Add icon.
      Note: To listen to a port below 1024, Data Collector must be run by a user with root privileges. Otherwise, the operating system does not allow Data Collector to bind to the port.
    • Data Format - Data format passed by UDP:
      • collectd
      • NetFlow
      • syslog
  3. On the Kafka tab, configure the following properties:
    • Broker URI - Connection string for the Kafka broker. Use the following format: <host>:<port>.
      To ensure a connection, enter a comma-separated list of additional broker URIs (see the example after these steps).
    • Topic - Kafka topic to write to.
    • Kafka Configuration - Additional Kafka configuration properties to use. To add properties, click Add and define the Kafka property name and value.
      Use the property names and values as expected by Kafka.
      For information about enabling secure connections to Kafka, see Enabling Kafka Security.

  4. On the Advanced tab, configure the following properties:
    • Enable UDP Multithreading - Specifies whether to use multiple receiver threads for each port. Using multiple receiver threads can improve performance.
      Because the multithreading requires native libraries, it is available only when Data Collector runs on 64-bit Linux.
    • Accept Threads - Number of receiver threads to use for each port. For example, if you configure two threads per port and three ports, the origin uses a total of six threads.
      Set this to the number of CPU cores on the Data Collector machine that you expect to dedicate to Data Collector.
      Default is 1.
    • Write Concurrency - Maximum number of Kafka clients that the origin can use to write to Kafka.
      When configuring this property, consider the number of Kafka brokers, the number of partitions, and the volume of data to be written.
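
For example, a Broker URI that lists two brokers, so that the origin can fall back to the second broker if the first is unavailable, might look like the following. The host names are hypothetical:

    kafka01.example.com:9092,kafka02.example.com:9092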