Error Record Handling

You can configure error record handling at a stage level and at a pipeline level.

When an error occurs as a stage processes a record, Data Collector handles the record based on the stage configuration. One of the stage options is to pass the record to the pipeline for error handling. For this option, Data Collector processes the record based on the pipeline error record handling configuration.

When you configure a pipeline, be aware that stage error handling takes precedence over pipeline error handling. That is, a pipeline might be configured to write error records to file, but if a stage is configured to discard error records, those records are discarded. You might use this functionality to reduce the types of error records that are saved for review and reprocessing.

Note that records missing required fields do not enter the stage. They are passed directly to the pipeline for error handling.
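
These precedence rules can be summarized in a short sketch. The following Python sketch is illustrative only and is not Data Collector code; the option names mirror the stage and pipeline settings described in the sections below.

    from enum import Enum

    class OnRecordError(Enum):
        # Mirrors the stage-level On Record Error options described below.
        DISCARD = "discard"
        SEND_TO_ERROR = "send_to_error"
        STOP_PIPELINE = "stop_pipeline"

    class PipelineStopped(Exception):
        pass

    def route_error_record(record, stage_policy, pipeline_error_handler):
        """Stage configuration takes precedence: the pipeline-level
        handler only sees records from stages set to SEND_TO_ERROR."""
        if stage_policy is OnRecordError.DISCARD:
            return  # dropped silently; never reaches pipeline error handling
        if stage_policy is OnRecordError.STOP_PIPELINE:
            raise PipelineStopped(f"error while processing record: {record!r}")
        pipeline_error_handler(record)  # SEND_TO_ERROR

    def route_missing_required_fields(record, pipeline_error_handler):
        # Records missing required fields never enter the stage; they
        # go straight to pipeline error handling.
        pipeline_error_handler(record)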

Pipeline Error Record Handling

Pipeline error record handling determines how Data Collector processes error records that stages send to the pipeline for error handling. It also handles records deliberately dropped from the pipeline, such as records without required fields.

The pipeline provides the following error record handling options:
Discard
The pipeline discards error records. Data Collector includes the records in error record counts and metrics.
Write to Another Pipeline
The pipeline writes error records to an SDC RPC pipeline. Data Collector includes the records in error record counts and metrics.
When you write to another pipeline, Data Collector effectively creates an SDC RPC destination pipeline to pass the error records to another pipeline.
You need to create an SDC RPC origin pipeline to process the error records. The pipeline must include an SDC RPC origin configured to read error records from this pipeline.
For more information about SDC RPC pipelines, see SDC RPC Pipeline Overview.
Write to Elasticsearch
The pipeline writes error records and related details to Elasticsearch. Data Collector includes the records in error record counts and metrics.
You define the configuration properties for the Elasticsearch cluster to use.
Write to File
The pipeline writes error records and related details to a local directory. Data Collector includes the records in error record counts and metrics.
You define the directory to use and the maximum file size. Error files are named based on the File Prefix pipeline property.
Write to file is not supported for cluster pipelines at this time.
Write to Kafka
The pipeline writes error records and related details to Kafka. Data Collector includes the records in error record counts and metrics.
You define the configuration properties for the Kafka cluster to use. For one way to consume these error records downstream, see the sketch after this list.
Write to MapR Streams
The pipeline writes error records and related details to MapR Streams. Data Collector includes the records in error record counts and metrics.
You define the configuration properties for MapR Streams.
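
As referenced above, error records written to Kafka can be consumed by a separate reprocessing job like any other topic. Below is a minimal sketch using the kafka-python client. The topic name is a placeholder, and the sketch assumes the error records are available as JSON; depending on your Data Collector version, error records may be written in the binary SDC Record format, which is better read back with another pipeline.

    import json
    from kafka import KafkaConsumer  # pip install kafka-python

    # Placeholder topic name: use the topic the pipeline's error
    # handling is configured to write to.
    ERROR_TOPIC = "sdc-error-records"

    consumer = KafkaConsumer(
        ERROR_TOPIC,
        bootstrap_servers="localhost:9092",
        auto_offset_reset="earliest",
        enable_auto_commit=False,
    )

    for message in consumer:
        # Assumes JSON-serialized error records; adjust if your error
        # records are written in the SDC Record binary format.
        error_record = json.loads(message.value)
        print(error_record)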

When Data Collector encounters an unexpected error, it stops the pipeline and logs the error.

Stage Error Record Handling

Most stages include error record handling options. When an error occurs as a stage processes a record, Data Collector handles the record based on the On Record Error property for the stage.

Stages include the following error handling options:
Discard
The stage silently discards the record. Data Collector does not log information about the error or note the specific record that encountered an error. The discarded record is not included in Monitor mode error record counts or metrics; the sketch after these options contrasts this with a pipeline-level discard.
Send to Error
The stage sends the record to the pipeline for error handling. The pipeline processes the record based on the pipeline error handling configuration.
When you monitor the pipeline, you can view the most recent error records and the issues they encountered on the Error Records tab for the stage. This information becomes unavailable after you stop the pipeline.
Stop Pipeline
Data Collector stops the pipeline and logs information about the error. The error that stopped the pipeline displays as an alert in Monitor mode and as an error in the pipeline history.
Stop pipeline is not supported for cluster mode pipelines at this time.
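
Note the asymmetry between the two Discard options: a stage-level Discard is completely silent, while a pipeline-level Discard still shows up in error record counts and metrics. A toy sketch of the assumed counting semantics, not Data Collector code:

    class Metrics:
        """Toy stand-in for Monitor mode error counts."""
        def __init__(self):
            self.error_records = 0

    metrics = Metrics()

    def stage_level_discard(record):
        # Silently dropped: nothing is logged and nothing is counted.
        pass

    def pipeline_level_discard(record):
        # Nothing is persisted, but the record still shows up in
        # error record counts and metrics.
        metrics.error_records += 1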

Example

A Kafka Consumer origin stage is configured to read JSON data with a maximum object length of 4096 characters, and the stage encounters an object of 5000 characters. Based on the stage configuration, Data Collector either discards the record, stops the pipeline, or passes the record to the pipeline for error record handling.

When the stage is configured to send the record to the pipeline, one of the following occurs based on how you configure the pipeline error handling:
  • When the pipeline discards error records, Data Collector discards the record without noting the action or the cause.

    When you monitor the pipeline, you can view the most recent set of error records and information about the errors on the Error Records tab for the stage. This information becomes unavailable after you stop the pipeline.

  • When the pipeline writes error records to a destination, Data Collector writes the error record and additional error information to the destination. It also includes the error records in monitor counts and metrics.
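
This scenario can be made concrete with a toy simulation. None of the following is Data Collector code: the parser, the stage name, and the error envelope field names (errorStage, errorMessage, errorTimestamp) are assumptions for illustration.

    import json
    import time

    MAX_OBJECT_LENGTH = 4096  # the stage's configured maximum

    def parse(raw):
        if len(raw) > MAX_OBJECT_LENGTH:
            raise ValueError(
                f"JSON object length {len(raw)} exceeds maximum {MAX_OBJECT_LENGTH}"
            )
        return json.loads(raw)

    def to_error_record(raw, err):
        # The error record plus the additional details the pipeline
        # writes to a destination; field names are hypothetical.
        return {
            "header": {
                "errorStage": "KafkaConsumer_01",
                "errorMessage": str(err),
                "errorTimestamp": int(time.time() * 1000),
            },
            "value": {"raw": raw},
        }

    oversized = '{"payload": "' + "x" * 5000 + '"}'  # a 5000+ character object
    try:
        record = parse(oversized)
    except ValueError as err:
        error_record = to_error_record(oversized, err)
        # Depending on the pipeline error handling, this is discarded
        # (counted only) or written to file, Kafka, Elasticsearch, and so on.
        print(error_record["header"]["errorMessage"])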