UDP to Kafka (Deprecated)

The UDP to Kafka origin reads messages from one or more UDP ports and writes each message directly to Kafka. The origin is deprecated and will be removed in a future release. We recommend using the UDP Multithreaded Source origin instead, which can use multiple threads to process data from multiple UDP ports in parallel.

Use the UDP to Kafka origin to read large volumes of data from multiple UDP ports and write the data immediately to Kafka, without additional processing.

Use the UDP Source origin instead if you need to process data before writing it to Kafka, need to write to a destination system other than Kafka, or do not need to process high volumes of data.

UDP to Kafka can process collectd messages, NetFlow 5 and NetFlow 9 messages, and the following types of syslog messages:
  • RFC 5424
  • RFC 3164
  • Non-standard common messages, such as RFC 3339 dates with no version digit
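
For example, to exercise the syslog handling you can send a test RFC 5424 message to one of the configured ports. The following Python sketch is for illustration only; the port number (5514) and the message content are assumptions, not stage defaults:

# Send one RFC 5424 syslog test message to a UDP to Kafka listening port.
# The port and the message text are illustrative assumptions.
import socket

message = "<34>1 2023-10-11T22:14:15.003Z myhost testapp 1234 ID47 - Test message"
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.sendto(message.encode("utf-8"), ("localhost", 5514))
sock.close()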

When processing NetFlow messages, the stage generates different records based on the NetFlow version. When processing NetFlow 9, the records are generated based on the NetFlow 9 configuration properties. For more information, see NetFlow Data Processing.
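
Similarly, you can generate a synthetic NetFlow 5 packet to verify the pipeline end to end. The sketch below follows the published NetFlow 5 packet layout (a 24-byte header followed by 48-byte flow records); the target port (2055) and all flow values are illustrative assumptions:

# Emit one synthetic NetFlow 5 packet: a 24-byte header plus one 48-byte record.
import socket
import struct
import time

header = struct.pack(
    "!HHIIIIBBH",
    5,                 # version
    1,                 # count: one flow record follows
    60000,             # sysUptime in milliseconds
    int(time.time()),  # UNIX seconds
    0,                 # residual nanoseconds
    1,                 # sequence number of the first flow
    0, 0,              # engine type, engine ID
    0,                 # sampling interval
)
record = struct.pack(
    "!4s4s4sHHIIIIHHBBBBHHBBH",
    socket.inet_aton("10.0.0.1"),  # source address
    socket.inet_aton("10.0.0.2"),  # destination address
    socket.inet_aton("0.0.0.0"),   # next hop
    0, 0,                          # input/output interface index
    10, 840,                       # packets, octets
    0, 60000,                      # first/last packet sysUptime
    40000, 443,                    # source/destination port
    0, 0x10, 6, 0,                 # padding, TCP flags (ACK), protocol (TCP), ToS
    0, 0,                          # source/destination AS
    24, 24,                        # source/destination mask
    0,                             # padding
)
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.sendto(header + record, ("localhost", 2055))
sock.close()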

When you configure UDP to Kafka, you specify the UDP ports to use, Kafka configuration information, and advanced properties such as the maximum number of write requests.

You can add Kafka configuration properties and enable Kafka security as needed.

Pipeline Configuration

When you use a UDP to Kafka origin in a pipeline, connect the origin to a Trash destination.

The UDP to Kafka origin writes records directly to Kafka. The origin does not pass records to its output port, so you cannot perform additional processing or write the data to other destination systems.

However, since a pipeline requires a destination, you should connect the origin to the Trash destination to satisfy pipeline validation requirements.

Additional Kafka Properties

You can add custom Kafka configuration properties to the UDP to Kafka origin.

When you add a Kafka configuration property, enter the exact property name and the value. The stage does not validate the property names or values.

Several properties are defined by default; you can edit or remove these properties as needed.

Note: The stage uses the following configuration properties internally, so it ignores any user-defined values for these properties:
  • key.serializer.class
  • metadata.broker.list
  • partitioner.class
  • producer.type
  • serializer.class
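
Properties that you can safely add include, for example, batching and acknowledgment settings such as the following. The property names come from the Kafka producer configuration; the values are illustrative, not recommendations:

request.required.acks=1
compression.codec=snappy
queue.buffering.max.ms=500
message.send.max.retries=3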

Kafka Security

You can configure the UDP to Kafka origin to connect securely to Kafka through SSL/TLS, Kerberos, or both. For more information about the methods and details on how to configure each method, see Security in Kafka Stages.
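
As a rough sketch, connecting through SSL/TLS typically involves standard Kafka client properties along the following lines. Treat the property names, paths, and values below as assumptions; the authoritative list of properties and the additional setup steps are in Security in Kafka Stages:

security.protocol=SSL
ssl.truststore.location=/etc/kafka/truststore.jks
ssl.truststore.password=<truststore password>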

Configuring a UDP to Kafka Origin

Configure a UDP to Kafka origin to process UDP messages and write them directly to Kafka.

  1. In the Properties panel, on the General tab, configure the following properties:
    General Property - Description
    Name - Stage name.
    Description - Optional description.
    Stage Library - Library version that you want to use.
    On Record Error - Error record handling for the stage:
    • Discard - Discards the record.
    • Send to Error - Sends the record to the pipeline for error handling.
    • Stop Pipeline - Stops the pipeline. Not valid for cluster pipelines.
  2. On the Kafka tab, configure the following properties:
    Kafka Property - Description
    Connection - Existing connection that defines the information required to connect to an external system.

    To connect to an external system, you can select an existing connection that contains the details, or you can directly enter the details in the pipeline. When you select an existing connection, Pipeline Designer hides the other properties so that you cannot directly enter connection details in the pipeline.

    Broker URI - Comma-separated list of connection strings for the Kafka brokers. Use the following format for each broker: <host>:<port>.

    To ensure that a pipeline can connect to Kafka if a specified broker goes down, list as many brokers as possible.

    Topic - Kafka topic to write to.
    Kafka Configuration - Additional Kafka configuration properties to use. Using simple or bulk edit mode, click the Add icon to add properties. Define the Kafka property name and value.

    Use the property names and values as expected by Kafka.

    Kafka Message Key - Passes message key values stored in a record header attribute to Kafka as message keys.

    Enter an expression that specifies the attribute where the message keys are stored.

    To pass string message keys stored in an attribute, use:
    ${record:attribute('<message key attribute name>')}
    To pass Avro message keys stored in an attribute, use:
    ${avro:decode(record:attribute('avroKeySchema'),base64:decodeBytes(record:attribute('<message key attribute name>')))}

    For more information about working with Kafka message keys, see Kafka Message Keys.
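
    For example, a Broker URI that lists two brokers might read kafka01.example.com:9092,kafka02.example.com:9092. And assuming string message keys were stored in a record header attribute named kafkaMessageKey (an attribute name used here for illustration), the message key expression would be:
    ${record:attribute('kafkaMessageKey')}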

  3. On the Security tab, configure the security properties to enable the stage to securely connect to Kafka.

    For information about the security options and additional steps required to enable security, see Security in Kafka Stages.

  4. On the UDP tab, configure the following properties:
    UDP Property - Description
    Port - Port to listen to for data. Using simple or bulk edit mode, click the Add icon to list additional ports.

    To listen to a port below 1024, Data Collector must be run by a user with root privileges. Otherwise, the operating system does not allow Data Collector to bind to the port.

    Note: No other pipeline or process can be bound to the listening port; the listening port can be used by only one pipeline.
    Data Format - Data format passed by UDP:
    • collectd
    • NetFlow
    • syslog
  5. On the Advanced tab, configure the following properties:
    Advanced Property - Description
    Enable UDP Multithreading - Specifies whether to use multiple receiver threads for each port. Using multiple receiver threads can improve performance.

    Multiple receiver threads require epoll, which is typically available when Data Collector runs on recent versions of 64-bit Linux.

    Accept Threads - Number of receiver threads to use for each port. For example, if you configure two threads per port and configure the origin to use three ports, the origin uses a total of six threads.

    Use to increase the number of threads passing data to the origin when epoll is available on the Data Collector machine.

    Default is 1.

    Write Concurrency - Maximum number of Kafka clients that the origin can use to write to Kafka.

    When configuring this property, consider the number of Kafka brokers, the number of partitions, and the volume of data to be written.

  6. For NetFlow 9 data, on the NetFlow 9 tab, configure the following properties:
    These properties are ignored when the origin processes earlier versions of NetFlow data.
    NetFlow 9 Property - Description
    Record Generation Mode - Determines the type of values to include in the record. Select one of the following options:
    • Raw Only
    • Interpreted Only
    • Both Raw and Interpreted
    Max Templates in Cache - Maximum number of templates to store in the template cache. For more information about templates, see Caching NetFlow 9 Templates.

    Default is -1 for an unlimited cache size.

    Template Cache Timeout (ms) - Maximum number of milliseconds to cache an idle template. Templates unused for longer than the specified time are evicted from the cache. For more information about templates, see Caching NetFlow 9 Templates.

    Default is -1 to cache templates indefinitely.
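
After the pipeline runs, you can confirm that messages are reaching Kafka with the console consumer that ships with Kafka. The topic name below is an assumption; substitute your own, and use the connection flag appropriate to your Kafka version:

bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic udp_events --from-beginning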