HTTP to Kafka (Deprecated)

The HTTP to Kafka origin listens on an HTTP endpoint and writes the contents of all authorized HTTP POST requests directly to Kafka. The origin is now deprecated and will be removed in a future release. We recommend using the HTTP Server origin, which can use multiple threads to enable parallel processing of data from multiple HTTP clients.

Use the HTTP to Kafka origin to write large volumes of HTTP POST requests immediately to Kafka without additional processing. To perform processing, you can create a separate pipeline with a Kafka Consumer origin that reads from the Kafka topic.

If you need to process data before writing it to Kafka or need to write to a destination system other than Kafka, use the HTTP Server origin.

You can configure multiple HTTP clients to send data to the HTTP to Kafka origin. Just complete the necessary prerequisites before you configure the origin. In this architecture, multiple HTTP clients send POST requests to the origin, which writes each request to a Kafka topic for a separate consuming pipeline to read.

When you configure the HTTP to Kafka origin, you specify the listening port, Kafka configuration information, the maximum message size, and the application ID. You can also configure SSL/TLS properties, including default transport protocols and cipher suites.

You can add Kafka configuration properties and enable Kafka security as needed.
Tip: Data Collector provides several HTTP origins to address different needs. For a quick comparison chart to help you choose the right one, see Comparing HTTP Origins.

Prerequisites

Before you run a pipeline with the HTTP to Kafka origin, complete the following prerequisites:
Configure HTTP clients to send data to the HTTP to Kafka listening port
When you configure the origin, you define a listening port number where the origin listens for data.
To pass data to the pipeline, configure each HTTP client to send data to a URL that includes the listening port number.
Use the following format for the URL:
<http | https>://<sdc_hostname>:<listening_port>/
The URL includes the following components:
  • <http | https> - Use https for secure HTTP connections.
  • <sdc_hostname> - The Data Collector host name.
  • <listening_port> - The port number where the origin listens for data.
For example: https://localhost:8000/
Include the application ID in request headers
When you configure the origin, you define an application ID. All messages sent to the HTTP to Kafka origin must include the application ID in the request header.
Add the following information to the request header for all HTTP POST requests that you want the origin to process:
X-SDC-APPLICATION-ID: <applicationID>
For example:
X-SDC-APPLICATION-ID: sdc_http2kafka
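
For example, the following Python sketch shows how an HTTP client might satisfy both prerequisites. It is a minimal illustration, not part of the origin itself: it assumes a Data Collector on localhost with listening port 8000 and application ID sdc_http2kafka, and it uses the third-party requests library.

  import requests

  # URL that includes the listening port configured in the origin.
  # Use https instead when TLS is enabled on the origin.
  url = "http://localhost:8000/"

  # Every request must carry the application ID in the request header.
  headers = {"X-SDC-APPLICATION-ID": "sdc_http2kafka"}

  # The body of each POST request is written to Kafka as a single message.
  response = requests.post(url, data=b'{"event": "example"}', headers=headers)
  response.raise_for_status()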

Pipeline Configuration

When you use an HTTP to Kafka origin in a pipeline, connect the origin to a Trash destination.

The HTTP to Kafka origin writes records directly to Kafka. The origin does not pass records to its output port, so you cannot perform additional processing or write the data to other destination systems.

However, since a pipeline requires a destination, you should connect the origin to the Trash destination to satisfy pipeline validation requirements.

Kafka Maximum Message Size

Configure the Kafka maximum message size in the origin in relation to the equivalent Kafka cluster property.

The HTTP to Kafka origin writes the contents of each HTTP POST request to Kafka as a single message. As a result, the maximum message size configured in the origin determines the maximum size of the HTTP requests that the origin accepts and limits the size of the messages written to Kafka.

To ensure that all messages are written to Kafka, set the origin property to a value equal to or less than the Kafka cluster property. Attempts to write messages larger than the Kafka cluster property allows fail, returning an HTTP 500 error to the originating HTTP client.

For example, if the Kafka cluster allows a maximum message size of 2 MB, configure the Max Message Size (KB) property in the origin to 2048 KB (2 MB) or less to avoid HTTP 500 errors for larger messages.

By default, the maximum message size in a Kafka cluster is 1 MB, as defined by the message.max.bytes property.
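
Because the origin property is set in kilobytes, an HTTP client can guard against oversized requests before sending them. The following is a minimal Python sketch, assuming the origin's Max Message Size (KB) property is set to 2048 as in the example above; the constant and function names are illustrative only.

  MAX_MESSAGE_KB = 2048  # must match the origin's Max Message Size (KB) property

  def fits_in_kafka(payload: bytes) -> bool:
      # Each POST body becomes one Kafka message, so check the body size
      # against the configured limit before sending the request.
      return len(payload) <= MAX_MESSAGE_KB * 1024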

Kafka Security

You can configure the HTTP to Kafka origin to connect securely to Kafka through SSL/TLS, Kerberos, or both. For more information about the methods and details on how to configure each method, see Security in Kafka Stages.
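
For example, to connect to a Kafka cluster secured with SSL/TLS, you might add properties such as the following to the Kafka Configuration property on the Kafka tab. These are standard Kafka client properties shown for illustration only; the exact set depends on how your cluster is secured, so follow Security in Kafka Stages for the authoritative steps.

  security.protocol=SSL
  ssl.truststore.location=/path/to/truststore.jks
  ssl.truststore.password=<password>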

Configuring an HTTP to Kafka Origin

Configure an HTTP to Kafka origin to write high volumes of HTTP POST requests directly to Kafka.

  1. In the Properties panel, on the General tab, configure the following properties:
    • Name - Stage name.
    • Description - Optional description.
    • Stage Library - Library version that you want to use.
    • On Record Error - Error record handling for the stage:
      • Discard - Discards the record.
      • Send to Error - Sends the record to the pipeline for error handling.
      • Stop Pipeline - Stops the pipeline. Not valid for cluster pipelines.
  2. On the Kafka tab, configure the following properties:
    • Broker URI - Connection string for the Kafka broker. Use the following format: <host>:<port>. To ensure a connection, enter a comma-separated list of additional broker URIs.
    • Topic - Kafka topic to write messages to.
    • Max Message Size (KB) - Maximum size of the message to be written to Kafka. To avoid HTTP 500 errors, configure this property to a value equal to or less than the equivalent Kafka cluster property.
    • Kafka Configuration - Additional Kafka configuration properties to use. Using simple or bulk edit mode, click the Add icon to add properties, then define the Kafka property name and value. Use the property names and values as expected by Kafka. For information about enabling secure connections to Kafka, see Kafka Security.
    • Provide Keytab - Enables providing credentials for Kerberos authentication. For more information, see Providing Kerberos Credentials.
    Note: Configuring Kerberos credentials in stage properties is not supported in cluster pipelines at this time.
    • Keytab - Keytab for Kerberos authentication. Enter a Base64-encoded keytab or a credential function that returns a Base64-encoded keytab. Be sure to remove unnecessary characters, such as newline characters, before encoding the keytab.
    • Principal - Principal for Kerberos authentication. Use the following format: <principal name>/<host name>@<realm>
  3. On the HTTP tab, configure the following properties:
    • HTTP Listening Port - Listening port for the HTTP to Kafka origin. The port number must be included in the URL that the HTTP client uses to pass data. For more information, see Prerequisites.
    • Application ID - Application ID used to pass requests to the HTTP to Kafka origin. The application ID must be included in the header of each HTTP POST request. For more information, see Prerequisites.
    • Max Concurrent Requests - Maximum number of HTTP clients allowed to send messages to the origin at one time. If the origin reaches the configured maximum and receives additional requests from different clients, it processes those requests as slots become available.
    • Application ID in URL - Enables reading the application ID from the URL. Use when HTTP clients include the application ID in a URL query parameter instead of in the request header, as shown in the example after these steps.
  4. To use SSL/TLS, click the TLS tab and configure the following properties:
    • Use TLS - Enables the use of TLS.
    • Use Remote Keystore - Enables loading the contents of the keystore from a remote credential store or from values entered in the stage properties. For more information, see Remote Keystore and Truststore.
    • Private Key - Private key used in the remote keystore. Enter a credential function that returns the key or enter the contents of the key.
    • Certificate Chain - Each PEM certificate used in the remote keystore. Enter a credential function that returns the certificate or enter the contents of the certificate. Using simple or bulk edit mode, click the Add icon to add additional certificates.
    • Keystore File - Path to the local keystore file. Enter an absolute path to the file or a path relative to the Data Collector resources directory: $SDC_RESOURCES. For more information about environment variables, see Data Collector Environment Configuration. By default, no keystore is used.
    • Keystore Type - Type of keystore to use. Use one of the following types:
      • Java Keystore File (JKS)
      • PKCS #12 (p12 file)
      Default is Java Keystore File (JKS).
    • Keystore Password - Password to the keystore file. A password is optional, but recommended.
    Tip: To secure sensitive information such as passwords, you can use runtime resources or credential stores.
    • Keystore Key Algorithm - Algorithm used to manage the keystore. Default is SunX509.
    • Use Default Protocols - Uses the default TLSv1.2 transport layer security (TLS) protocol. To use a different protocol, clear this option.
    • Transport Protocols - TLS protocols to use. To use a protocol other than the default TLSv1.2, click the Add icon and enter the protocol name. You can use simple or bulk edit mode to add protocols.
    Note: Older protocols are not as secure as TLSv1.2.
    • Use Default Cipher Suites - Uses a default cipher suite for the SSL/TLS handshake. To use a different cipher suite, clear this option.
    • Cipher Suites - Cipher suites to use. To use a cipher suite that is not part of the default set, click the Add icon and enter the Java Secure Socket Extension (JSSE) name of the cipher suite. You can use simple or bulk edit mode to add cipher suites.
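
When you enable the Application ID in URL property, HTTP clients can append the application ID to the URL as a query parameter instead of setting the request header. For example, using the sdcApplicationId query parameter and the values from the earlier examples (verify the parameter name against your Data Collector version):

  https://localhost:8000?sdcApplicationId=sdc_http2kafka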