HTTP to Kafka

The HTTP to Kafka origin listens on an HTTP endpoint and writes the contents of all authorized HTTP POST requests directly to Kafka.

Use the HTTP to Kafka origin to write large volumes of HTTP POST requests immediately to Kafka without additional processing. To perform processing, you can create a separate pipeline with a Kafka Consumer origin that reads from the Kafka topic.

If you need to process data before writing it to Kafka or need to write to a destination system other than Kafka, use the HTTP Server origin.

You can configure multiple HTTP clients to send data to the HTTP to Kafka origin. Just complete the necessary prerequisites before you configure the origin.

When you configure HTTP to Kafka, you specify the listening port, Kafka configuration information, maximum message size, and the application ID. You can also configure SSL/TLS properties, including default transport protocols and cipher suites.

You can add Kafka configuration properties and enable Kafka security as needed.
Tip: Data Collector provides several HTTP origins to address different needs. For a quick comparison chart to help you choose the right one, see Comparing HTTP Origins.

Prerequisites

Before you run a pipeline with the HTTP to Kafka origin, configure the following prerequisites:
Configure HTTP clients to send data to the HTTP to Kafka listening port
When you configure the origin, you define a listening port number where the origin listens for data.
To pass data to the pipeline, configure each HTTP client to send data to a URL that includes the listening port number.
Use the following format for the URL:
<http | https>://<sdc_hostname>:<listening_port>/
The URL includes the following components:
  • <http | https> - Use https for secure HTTP connections.
  • <sdc_hostname> - The Data Collector host name.
  • <listening_port> - The port number where the origin listens for data.
For example: https://localhost:8000/
Include the application ID in request headers
When you configure the origin, you define an application ID. All messages sent to the HTTP to Kafka origin must include the application ID in the request header.
Add the following information to the request header for all HTTP POST requests that you want the origin to process:
X-SDC-APPLICATION-ID: <applicationID>
For example:
X-SDC-APPLICATION-ID: sdc_http2kafka
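
For example, the following client sketch uses the Python standard library to send a POST request to the origin. The host, port, application ID, and payload are placeholder values; substitute the values configured for your origin:

    import urllib.request

    # URL in the required format: <http | https>://<sdc_hostname>:<listening_port>/
    url = "https://localhost:8000/"

    # The origin writes the entire POST body to Kafka as a single message.
    payload = b'{"sensor": "temp-01", "reading": 21.5}'

    request = urllib.request.Request(
        url,
        data=payload,
        headers={
            # Must match the application ID configured in the origin.
            "X-SDC-APPLICATION-ID": "sdc_http2kafka",
            "Content-Type": "application/json",
        },
    )

    # A success response indicates that the origin accepted the request.
    with urllib.request.urlopen(request) as response:
        print(response.status)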

Pipeline Configuration

When you use an HTTP to Kafka origin in a pipeline, connect the origin to a Trash destination.

The HTTP to Kafka origin writes records directly to Kafka. The origin does not pass records to its output port, so you cannot perform additional processing or write the data to other destination systems.

However, a pipeline requires a destination, so connect the origin to the Trash destination to satisfy pipeline validation requirements.

Kafka Maximum Message Size

Configure the Kafka maximum message size in the origin in relation to the equivalent Kafka cluster property. The origin property must be equal to or less than the Kafka cluster property.

The HTTP to Kafka origin writes the contents of each HTTP POST request to Kafka as a single message. So the maximum message size configured in the origin determines the maximum size of the HTTP request and limits the size of messages written to Kafka.

To ensure that all messages are written to Kafka, set the origin property to a value equal to or less than the Kafka cluster property. Attempts to write messages larger than the specified Kafka cluster property fail and return an HTTP 500 error to the originating HTTP client.

For example, if the Kafka cluster allows a maximum message size of 2 MB, configure the Maximum Message Size property in the origin to 2 MB or less to avoid HTTP 500 errors for larger messages.
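
To make the conversion concrete, here is a sketch of the two settings for that example; the values are illustrative:

    # Kafka broker configuration (server.properties)
    message.max.bytes=2097152

    # Origin configuration: Max Message Size is specified in KB;
    # 2097152 bytes = 2048 KB, so set the property to 2048 or less.
    Max Message Size (KB): 2048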

By default, the maximum message size in a Kafka cluster is 1 MB, as defined by the message.max.bytes property.

Enabling Kafka Security

When using Kafka version 0.9.0.0 or later, you can configure the HTTP to Kafka origin to connect securely through SSL/TLS, Kerberos, or both.

These versions provide features to support secure connections through SSL/TLS or Kerberos (SASL). The Kafka community considers these features beta quality.

Earlier versions of Kafka do not support security.

Enabling SSL/TLS

Perform the following steps to enable the HTTP to Kafka origin to use SSL/TLS to connect to Kafka version 0.9.0.0 or later.

  1. To use SSL/TLS to connect, first make sure Kafka is configured for SSL/TLS as described in the Kafka documentation: http://kafka.apache.org/documentation.html#security_ssl.
  2. On the General tab of the stage, set the Stage Library property to Apache Kafka 0.9.0.0 or a later version.
  3. On the Kafka tab, add the security.protocol Kafka configuration property and set it to SSL.
  4. Then, add the following SSL Kafka configuration properties:
    • ssl.truststore.location
    • ssl.truststore.password
    When the Kafka broker requires client authentication (that is, when the ssl.client.auth broker property is set to "required"), add and configure the following properties:
    • ssl.keystore.location
    • ssl.keystore.password
    • ssl.key.password
    Some brokers might require adding the following properties as well:
    • ssl.enabled.protocols
    • ssl.truststore.type
    • ssl.keystore.type

    For details about these properties, see the Kafka documentation.

For example, properties like the following allow the stage to use SSL/TLS to connect to Kafka 0.9.0.0 with client authentication. The file paths and passwords are placeholders:
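
    security.protocol=SSL
    ssl.truststore.location=/opt/ssl/kafka.client.truststore.jks
    ssl.truststore.password=<truststore-password>
    ssl.keystore.location=/opt/ssl/kafka.client.keystore.jks
    ssl.keystore.password=<keystore-password>
    ssl.key.password=<key-password>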

Enabling Kerberos (SASL)

When you use Kerberos authentication, Data Collector uses the Kerberos principal and keytab to connect to Kafka version 0.9.0.0 or later. Perform the following steps to enable the HTTP to Kafka origin to use Kerberos to connect to Kafka.

  1. To use Kerberos, first make sure Kafka is configured for Kerberos as described in the Kafka documentation: http://kafka.apache.org/documentation.html#security_sasl.
  2. In the Data Collector configuration file, $SDC_CONF/sdc.properties, make sure the following Kerberos properties are configured:
    • kerberos.client.enabled
    • kerberos.client.principal
    • kerberos.client.keytab
  3. On the General tab of the stage, set the Stage Library property to Apache Kafka 0.9.0.0 or a later version.
  4. On the Kafka tab, add the security.protocol Kafka configuration property, and set it to SASL_PLAINTEXT.
  5. Then, add the sasl.kerberos.service.name configuration property, and set it to the Kerberos principal name that Kafka runs as.

For example, the following Kafka properties enable connecting to Kafka 0.9.0.0 with Kerberos. The service name is a placeholder that assumes Kafka runs as the "kafka" principal:
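
    security.protocol=SASL_PLAINTEXT
    sasl.kerberos.service.name=kafka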

Enabling SSL/TLS and Kerberos

You can enable the HTTP to Kafka origin to use SSL/TLS and Kerberos to connect to Kafka version 0.9.0.0 or later.

To use SSL/TLS and Kerberos, combine the steps required to enable each and set the security.protocol property as follows:
  1. Make sure Kafka is configured to use SSL/TLS and Kerberos (SASL) as described in the following Kafka documentation:
    • http://kafka.apache.org/documentation.html#security_ssl
    • http://kafka.apache.org/documentation.html#security_sasl
  2. In the Data Collector configuration file, $SDC_CONF/sdc.properties, make sure the following Kerberos properties are configured:
    • kerberos.client.enabled
    • kerberos.client.principal
    • kerberos.client.keytab
  3. On the General tab of the stage, set the Stage Library property to Apache Kafka 0.9.0.0 or a later version.
  4. On the Kafka tab, add the security.protocol property and set it to SASL_SSL.
  5. Then, add the sasl.kerberos.service.name configuration property, and set it to the Kerberos principal name that Kafka runs as.
  6. Then, add the following SSL Kafka configuration properties:
    • ssl.truststore.location
    • ssl.truststore.password
    When the Kafka broker requires client authentication (that is, when the ssl.client.auth broker property is set to "required"), add and configure the following properties:
    • ssl.keystore.location
    • ssl.keystore.password
    • ssl.key.password
    Some brokers might require adding the following properties as well:
    • ssl.enabled.protocols
    • ssl.truststore.type
    • ssl.keystore.type

    For details about these properties, see the Kafka documentation.
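
For example, properties like the following combine both mechanisms. As above, the file path, passwords, and service name are placeholders:

    security.protocol=SASL_SSL
    sasl.kerberos.service.name=kafka
    ssl.truststore.location=/opt/ssl/kafka.client.truststore.jks
    ssl.truststore.password=<truststore-password>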

Configuring an HTTP to Kafka Origin

Configure an HTTP to Kafka origin to write high volumes of HTTP POST requests directly to Kafka.

  1. In the Properties panel, on the General tab, configure the following properties:
    • Name - Stage name.
    • Description - Optional description.
    • Stage Library - Library version that you want to use.
    • On Record Error - Error record handling for the stage:
      • Discard - Discards the record.
      • Send to Error - Sends the record to the pipeline for error handling.
      • Stop Pipeline - Stops the pipeline. Not valid for cluster pipelines.
  2. On the Kafka tab, configure the following properties:
    • Broker URI - Connection string for the Kafka broker. Use the following format: <host>:<port>. To ensure a connection, enter a comma-separated list of additional broker URIs.
    • Topic - Kafka topic to write to.
    • Max Message Size (KB) - Maximum size of the message to be written to Kafka. To avoid HTTP 500 errors, configure this property to be equal to or less than the equivalent Kafka cluster property.
    • Kafka Configuration - Additional Kafka configuration properties to use. To add properties, click Add and define the Kafka property name and value. Use the property names and values as expected by Kafka. For information about enabling secure connections to Kafka, see Enabling Kafka Security.

  3. On the HTTP tab, configure the following properties:
    • HTTP Listening Port - Listening port for the HTTP to Kafka origin. The port number must be included in the URL that the HTTP client uses to pass data. For more information, see Prerequisites.
    • Max Concurrent Requests - Maximum number of HTTP clients allowed to send messages to the origin at one time. If the origin reaches the configured maximum and receives additional requests from different clients, it processes those requests as slots become available.
    • Application ID - Application ID used to pass requests to the HTTP to Kafka origin. The application ID must be included in the header of the HTTP POST request. For more information, see Prerequisites.
    • Application ID in URL - Enables reading the application ID from the URL. Use when HTTP clients include the application ID in a URL query parameter instead of in the request header. For an illustration, see the sketch after these steps.
  4. To use SSL/TLS, click the TLS tab and configure the following properties:
    • Enable TLS - Enables the use of TLS.
    • Keystore File - Path to the keystore file. Enter an absolute path to the file or a path relative to the Data Collector resources directory: $SDC_RESOURCES. For more information about environment variables, see Data Collector Environment Configuration. By default, no keystore is used.
    • Keystore Type - Type of keystore to use. Use one of the following types:
      • Java Keystore File (JKS)
      • PKCS-12 (p12 file)
      Default is Java Keystore File (JKS).
    • Keystore Password - Password to the keystore file. A password is optional, but recommended.
      Tip: To secure sensitive information such as passwords, you can use runtime resources or Hashicorp Vault secrets. For more information, see Using Runtime Resources or Accessing Hashicorp Vault Secrets.
    • Keystore Key Algorithm - Algorithm used to manage the keystore. Default is SunX509.
    • Use Default Protocols - Determines the transport layer security (TLS) protocol to use. The default protocol is TLSv1.2. To use a different protocol, clear this option.
    • Transport Protocols - TLS protocols to use. To use a protocol other than the default TLSv1.2, click the Add icon and enter the protocol name.
      Note: Older protocols are not as secure as TLSv1.2.
    • Use Default Cipher Suites - Determines the cipher suite to use when performing the SSL/TLS handshake. Data Collector provides a set of cipher suites that it can use by default. For a full list, see Cipher Suites.
    • Cipher Suites - Cipher suites to use. To use a cipher suite that is not part of the default set, click the Add icon and enter the Java Secure Socket Extension (JSSE) name of the cipher suite.
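
For example, here is a client sketch in Python for the Application ID in URL option. The sdcApplicationId query parameter name is an assumption; confirm the exact parameter name in the documentation for your Data Collector version:

    import urllib.request

    # Hypothetical example: the application ID travels in the query string
    # instead of the X-SDC-APPLICATION-ID request header.
    url = "https://localhost:8000/?sdcApplicationId=sdc_http2kafka"
    payload = b'{"sensor": "temp-01", "reading": 21.5}'

    request = urllib.request.Request(url, data=payload)
    with urllib.request.urlopen(request) as response:
        print(response.status)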