SDC RPC to Kafka (Deprecated)
The SDC RPC to Kafka origin reads data from one or more SDC RPC destinations and writes it immediately to Kafka. Use the SDC RPC to Kafka origin in an SDC RPC destination pipeline. Note, however, that the SDC RPC to Kafka origin is now deprecated and will be removed in a future release; we recommend using the SDC RPC origin instead.
Use the SDC RPC to Kafka origin when you have multiple SDC RPC origin pipelines with data that you want to write to Kafka without additional processing.
Like the SDC RPC origin, the SDC RPC to Kafka origin reads data from an SDC RPC destination in another pipeline. However, the SDC RPC to Kafka origin is optimized to write data from multiple pipelines directly to Kafka. When you use this origin, you cannot perform additional processing before writing to Kafka.
In the recommended architecture, several SDC RPC origin pipelines pass data to a single pipeline that uses the SDC RPC to Kafka origin, which writes the data from all of those pipelines directly to Kafka.
When you configure the SDC RPC to Kafka origin, you define the port that the origin listens to for data, the SDC RPC ID, the maximum number of concurrent requests, and the maximum batch request size. You can also configure SSL/TLS properties, including default transport protocols and cipher suites.
You also need to configure connection information for Kafka, including the broker URI, topic to write to, and maximum message size. You can add Kafka configuration properties and enable Kafka security as needed.
For more information about SDC RPC pipelines, see SDC RPC Pipeline Overview.
Pipeline Configuration
When you use an SDC RPC to Kafka origin in a pipeline, connect the origin to a Trash destination.
The SDC RPC to Kafka origin writes records directly to Kafka. The origin does not pass records to its output port, so you cannot perform additional processing or write the data to other destination systems.
However, since a pipeline requires a destination, you should connect the origin to the Trash destination to satisfy pipeline validation requirements.
A pipeline with the SDC RPC to Kafka origin therefore consists of just the origin connected to a Trash destination.
Concurrent Requests
You can specify the maximum number of requests the SDC RPC to Kafka origin handles at one time.
An SDC RPC destination in an origin pipeline sends a request to the SDC RPC to Kafka origin each time it has a batch of data to pass. If only one origin pipeline passes data to the SDC RPC to Kafka origin, you can set the maximum number of concurrent requests to 1, because the destination processes one batch of data at a time.
Typically, you would have more than one pipeline passing data to this origin. In this case, you should assess the number of origin pipelines, the expected output of the pipelines, and the resources of the Data Collector machine, and then tune the property as needed to improve pipeline performance.
For example, if you have 100 origin pipelines passing data to the SDC RPC to Kafka origin, but the pipelines produce data slowly, you can set the maximum to 20 to prevent these pipelines from using too much of the Data Collector resources during spikes in volume. Or, if the Data Collector has no resource issues and you want it to process data as quickly as possible, you can set the maximum to 90 or 100. Note that the SDC RPC destination also has advanced properties for retry and backoff periods that can be used to help tune performance.
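Conceptually, the limit acts like a bounded pool of permits for incoming batch requests. The following Java sketch is purely illustrative; the class and method names are hypothetical and do not reflect the origin's actual implementation.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;

// Illustrative only: models how a "max concurrent requests" setting bounds
// the number of SDC RPC batch requests handled at once. All names here are
// hypothetical; this is not the origin's actual code.
public class ConcurrentRequestGate {
    private final Semaphore permits;
    private final ExecutorService workers;

    public ConcurrentRequestGate(int maxConcurrentRequests) {
        this.permits = new Semaphore(maxConcurrentRequests);
        this.workers = Executors.newCachedThreadPool();
    }

    // Each incoming batch request must acquire a permit before it is
    // processed; requests beyond the limit block until a permit frees up.
    public void handleBatchRequest(Runnable writeBatchToKafka) throws InterruptedException {
        permits.acquire();
        workers.submit(() -> {
            try {
                writeBatchToKafka.run(); // e.g., serialize records and produce to Kafka
            } finally {
                permits.release();
            }
        });
    }
}
```

With the maximum set to 20, the 21st pipeline to send a batch simply waits until one of the 20 in-flight requests completes.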
Batch Request Size, Kafka Message Size, and Kafka Configuration
Configure the SDC RPC to Kafka maximum batch request size and Kafka message size properties in relation to each other and to the maximum message size configured in Kafka.
The Max Batch Request Size (MB) property determines the maximum size of the batch of data that the origin accepts from each SDC RPC destination. Upon receiving a batch of data, the origin immediately writes the data to Kafka.
To promote peak performance, the origin writes as many records as possible into a single Kafka message. The Kafka Max Message Size (KB) property determines the maximum size of each message that the origin creates.
For example, say the origin uses the default 100 MB for the maximum batch request size and the default 900 KB for the maximum message size, and Kafka uses the 1 MB default for message.max.bytes.
With these settings, the origin accepts up to 100 MB of data in each batch request. When the origin writes to Kafka, it groups records into as few messages as possible, packing up to 900 KB of records into each message. Because each message is smaller than the Kafka 1 MB limit, the origin successfully writes all messages to Kafka.
If a single record is larger than the 900 KB maximum message size, the origin generates an error and does not write the record, or the batch that includes the record, to Kafka. The SDC RPC destination that provided the batch with the oversized record processes the batch based on stage error record handling.
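The grouping arithmetic can be made concrete with a small sketch. Assuming the 900 KB default (921,600 bytes), the following hypothetical packer shows how records fill as few messages as possible and why a single oversized record fails outright; all names and sizes are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the grouping described above: records are packed
// into as few messages as possible, each at most maxMessageSizeBytes.
// A single record larger than the limit cannot be written at all.
public class MessagePacker {
    public static List<Long> packIntoMessages(long[] recordSizes, long maxMessageSizeBytes) {
        List<Long> messageSizes = new ArrayList<>();
        long current = 0;
        for (long size : recordSizes) {
            if (size > maxMessageSizeBytes) {
                // Corresponds to the error case: the record (and its batch)
                // is not written to Kafka.
                throw new IllegalArgumentException("Record of " + size
                        + " bytes exceeds max message size of " + maxMessageSizeBytes);
            }
            if (current + size > maxMessageSizeBytes) {
                messageSizes.add(current); // close the current message
                current = 0;
            }
            current += size;
        }
        if (current > 0) {
            messageSizes.add(current);
        }
        return messageSizes;
    }

    public static void main(String[] args) {
        long maxMessage = 900L * 1024;                 // 900 KB default = 921,600 bytes
        long[] records = {300_000, 400_000, 350_000};  // three records, ~1 MB total
        // First message: 300,000 + 400,000 = 700,000 bytes; the third record
        // would push it past 921,600 bytes, so it starts a second message.
        System.out.println(packIntoMessages(records, maxMessage)); // [700000, 350000]
    }
}
```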
Additional Kafka Properties
You can add custom Kafka configuration properties to the SDC RPC to Kafka origin.
When you add a Kafka configuration property, enter the exact property name and the value. The stage does not validate the property names or values.
Several properties are defined by default; you can edit or remove them as necessary (see the sketch after this list):
- key.serializer.class
- metadata.broker.list
- partitioner.class
- producer.type
- serializer.class
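As a rough illustration, these default names are legacy Kafka producer properties, and adding a custom property is just another name/value pair. The values below are typical Kafka 0.8-era settings rather than the stage's actual defaults, and compression.codec stands in for any custom property you might add.

```java
import java.util.Properties;

public class KafkaConfigExample {
    public static void main(String[] args) {
        // Illustrative values for the stage's default properties; the actual
        // defaults that the stage populates may differ.
        Properties props = new Properties();
        props.put("metadata.broker.list", "kafka01:9092,kafka02:9092"); // broker URI
        props.put("serializer.class", "kafka.serializer.DefaultEncoder");
        props.put("key.serializer.class", "kafka.serializer.StringEncoder");
        props.put("partitioner.class", "kafka.producer.DefaultPartitioner");
        props.put("producer.type", "sync");
        // A custom property added the same way; the stage does not validate it.
        props.put("compression.codec", "snappy");
        props.forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```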
Kafka Security
You can configure the SDC RPC to Kafka origin to connect securely to Kafka through SSL/TLS, Kerberos, or both. For more information about the methods and details on how to configure each method, see Security in Kafka Stages.
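For orientation only, Kafka client security typically involves properties along the following lines; the exact properties required, and where you set them, are covered in Security in Kafka Stages, and every value below is a placeholder.

```java
import java.util.Properties;

public class KafkaSecurityExample {
    public static void main(String[] args) {
        // Standard Kafka client security properties, shown as a hedged
        // illustration; paths, passwords, and the exact mechanism required
        // by the stage are placeholders.
        Properties props = new Properties();
        props.put("security.protocol", "SASL_SSL");       // Kerberos over SSL/TLS
        props.put("ssl.truststore.location", "/etc/kafka/truststore.jks");
        props.put("ssl.truststore.password", "changeit");
        props.put("sasl.kerberos.service.name", "kafka"); // service principal for the brokers
        props.forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```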
Configuring an SDC RPC to Kafka Origin
Configure an SDC RPC to Kafka origin to write data from multiple SDC RPC destinations directly to Kafka.