Pipeline Monitoring

Overview

You can monitor the health and performance of running pipelines.

When you monitor a running pipeline, you can view real-time statistics about the pipeline and each stage. You can also view the error records encountered by each stage. You can capture and review a snapshot of the data being processed.

You can view the run history of a pipeline when you configure or monitor a pipeline. You can view log data when you preview or monitor a pipeline.

If you configured rules and alerts for a pipeline, then you can also view triggered alerts as you monitor the pipeline.

Viewing Pipeline and Stage Statistics

When you monitor a running pipeline, you can view real-time statistics for the pipeline and for stages in the pipeline.

  1. Open an actively running pipeline in the pipeline canvas.
    1. In the left navigation pane, click the Pipelines icon.
    2. Click the name of the active pipeline that you want to monitor.
  2. Click the Monitoring tab.
    StreamSets Cloud displays statistics about the entire pipeline, including the record count for the pipeline, a summary of the record count by stage, record and batch throughput, and batch processing statistics.
    Note: The record and batch throughput graphs are calculated using an exponential moving average, which weights the most recent values more heavily and exponentially reduces the effect of older data.

    For example, the following image shows the monitoring information displayed for a sample pipeline:

  3. To view exact values in a chart, hover over the chart.
  4. To view statistics about a particular stage, select the stage in the pipeline canvas.

    StreamSets Cloud displays statistics about the selected stage, including record and batch throughput and batch processing statistics.
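The note in step 2 says that the throughput graphs use an exponential moving average. As a rough illustration only, the following sketch assumes the textbook recurrence EMA_t = alpha * x_t + (1 - alpha) * EMA_(t-1), seeded with the first sample. The function name `exponential_moving_average` and the smoothing factor `alpha` are hypothetical; StreamSets Cloud does not document the exact formula or weights that it uses.

```python
from typing import Iterable, List


def exponential_moving_average(values: Iterable[float], alpha: float = 0.2) -> List[float]:
    """Smooth a series of throughput samples with an exponential moving average.

    alpha is a hypothetical smoothing factor in (0, 1]; larger values weight
    recent samples more heavily.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in the range (0, 1]")

    smoothed: List[float] = []
    current = None
    for value in values:
        if current is None:
            # Seed the average with the first observed sample.
            current = float(value)
        else:
            # The new sample gets weight alpha; the previous average, which
            # already summarizes all older samples, gets weight (1 - alpha).
            current = alpha * value + (1 - alpha) * current
        smoothed.append(current)
    return smoothed


# Hypothetical per-second record throughput samples.
samples = [100, 100, 100, 400]
print(exponential_moving_average(samples, alpha=0.5))  # → [100.0, 100.0, 100.0, 250.0]
```

With alpha set to 0.5, a jump from 100 to 400 records per second appears as 250 in the smoothed series, and the influence of each older sample is halved with every new sample. This is why a sudden spike or drop shows up gradually in the graph instead of all at once.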

Monitoring Stage Errors

When you monitor a pipeline, you can view a sampling of the error records. You can also view error statistics for each stage.

  1. Open an actively running pipeline in the pipeline canvas.
    1. In the left navigation pane, click the Pipelines icon.
    2. Click the name of the active pipeline that you want to monitor.
  2. In the pipeline canvas, select a stage that has encountered errors.
  3. Click the Error tab.

    StreamSets Cloud displays the Error Records tab by default, which includes a sample of error records with related error messages, as well as an error record count and an error histogram.

    You can expand and review the data in each error record. If the error was produced by an exception, you can click View Stack Trace to view the full stack trace.

    For example, the following image shows the error records encountered by a Field Splitter processor:

  4. To view stage errors as well as an error count and histogram, click the Stage Errors tab.

    Stage errors are operational errors, such as an origin being unable to create a record because of invalid source data.

Snapshots

A snapshot is a set of data captured as it moves through a running pipeline.

You can capture and view snapshots when you monitor a pipeline.

View a snapshot to verify how a pipeline processes data. You can view how snapshot data moves through a pipeline stage by stage or across multiple stages, just as you can when you preview a pipeline. You can drill down to review the values of each record to determine if the stage or group of stages transforms data as expected.

Unlike preview, you cannot edit data to perform testing when you review a snapshot.

Failure Snapshot

A failure snapshot is a partial snapshot that is generated automatically when the pipeline stops due to unexpected data. You can view the failure snapshot to troubleshoot the problem.

A failure snapshot captures the pipeline data that was in memory when the problem occurred. As a result, the snapshot includes the data that caused the problem and might include other unrelated data. Unlike a full snapshot, it does not include data for every stage.

Pipelines generate the failure snapshot by default. You can configure pipelines to skip generating the failure snapshot by clearing the Create Failure Snapshot pipeline property on the General tab.

Capturing and Viewing a Snapshot

You can capture a snapshot of data when you monitor a pipeline.

After you capture a snapshot, you can view the snapshot data stage by stage or through a group of stages, just as you can when you preview a pipeline.

  1. Open an actively running pipeline in the pipeline canvas.
    1. In the left navigation pane, click the Pipelines icon.
    2. Click the name of an active pipeline.
  2. Above the pipeline canvas, click the Snapshot icon.
  3. In the Snapshot dialog box, click Capture Snapshot to capture a set of data.
    StreamSets Cloud captures a snapshot of the next batch that passes through the pipeline and displays it in the list.
  4. To view a snapshot, hover over the Actions column for the snapshot that you want to view, click the icon that appears, and then click View.
    The canvas highlights the origin stage of the pipeline. The monitor panel displays snapshot data in the Output Data column. Because the origin is the first stage in the pipeline, no input data displays.
  5. To view data for a different stage, select the stage in the pipeline canvas.
  6. To view the snapshot for multiple stages, click Multiple.
    By default, the first and last stages of the pipeline are selected in the pipeline canvas. The monitor panel displays the output data of the first stage in the group and the input data of the last stage in the group.
    1. To select a different stage as the first stage, select the first stage highlighted in green, and then select another stage.
    2. To select a different stage as the last stage, select the last stage highlighted in red, and then select another stage.
  7. To exit the snapshot review, click the Close Snapshot icon.

Deleting a Snapshot

StreamSets Cloud retains all snapshots for the current pipeline run. You can delete snapshots that you no longer need.

When you stop the pipeline, StreamSets Cloud deletes all snapshots captured for the current pipeline run. When a snapshot is deleted, the information is irrevocably removed. You cannot retrieve a deleted snapshot.

  1. Open an actively running pipeline in the pipeline canvas.
    1. In the left navigation pane, click the Pipelines icon.
    2. Click the name of an active pipeline.
  2. Above the pipeline canvas, click the Snapshot icon.
    The Snapshot dialog box displays all available snapshots for the current pipeline run.
  3. Hover over the Actions column for the snapshot that you want to delete, click the icon that appears, and then click Delete.

Viewing Pipeline Run History

You can view the run history of a pipeline when you configure or monitor the pipeline.

Run history shows the following information for each run of the pipeline:
  • Run count
  • Date and time that the run started
  • Duration of the run in hours, minutes, and seconds
  • Error, input, and output record counts for the run

  1. Open the pipeline in the pipeline canvas.
    1. In the left navigation pane, click the Pipelines icon.
    2. Click the name of the pipeline whose run history you want to view.
  2. In the toolbar above the pipeline canvas, click the History icon.

    The Run History dialog box appears.

    For example, the following image shows a sample pipeline run history:

    Record counts are displayed as rounded values. To see the exact value of a record count, hover over it to display a tooltip.

    For example, the following image shows a tooltip with the exact value for the input records in the first pipeline run:

  3. Click OK when you've finished viewing the history.

Pipeline Logs

StreamSets Cloud generates log data when you preview or run a pipeline.

Each log entry includes a timestamp and a message, along with additional information relevant to the message. The log can contain informational, warning, and error messages. View the logs to help with troubleshooting.

You can view the following types of logs:

Preview log
The preview log contains messages generated when you preview pipelines.
StreamSets Cloud generates a single preview log used for all of your previews. After a period of inactivity, the preview log is cleared.
To view a preview log, click the Preview Log tab in the properties pane as you preview a pipeline.
Run log
The run log contains messages generated when you run a pipeline.
StreamSets Cloud generates a separate run log for each run of the pipeline. The run log is viewable only while the pipeline run is active or has recently finished. When a pipeline run encounters an error and fails, the pipeline run itself remains active so that you can view the run log and troubleshoot issues. When you stop the pipeline run, the log is no longer accessible.
To view a run log, click the Log tab in the properties pane as you monitor a pipeline run.

The following image displays a sample run log:

Error Messages

When the log includes an error message, you can view the exception encountered with the error.

The log highlights error messages in red and displays an expand icon next to each error message, as follows:

When you click the expand icon, the log displays the exception encountered with the error, as follows: