Pipeline Monitoring

Overview

When Transformer runs a pipeline, you can view real-time statistics about the pipeline.

When you view a running pipeline, the Transformer UI displays the pipeline in Monitor mode. In Monitor mode, you can perform the following tasks:
  • View real-time pipeline and stage statistics.
  • Pause and then continue the monitoring.
  • Access the Spark web UI for the launched application.
  • View the pipeline run history.

Pipeline and Stage Statistics

When you monitor a pipeline, you can view real-time summary statistics for the pipeline and for stages in the pipeline.

In Monitor mode, the pipeline canvas displays a Running icon on the stages that are currently processing data. For example, the following image shows that both the Orders and Store Details origins are currently processing data:

The Monitoring panel below the pipeline canvas displays statistics on the Summary tab. By default, the Monitoring panel displays pipeline statistics. Pipeline statistics include the record count for the pipeline, record and batch throughput, runtime statistics, and batch processing statistics. For a pipeline started with runtime parameters, pipeline statistics also display the parameter values that the pipeline is currently using.

Select a stage in the pipeline canvas to view statistics about the stage. Stage statistics include record and batch throughput and batch processing statistics.

Tip: You can hover over different parts of the charts to view exact numbers.

The following image shows some of the pipeline statistics available on the Summary tab:

Pause Monitoring

When you view a running pipeline, the Monitoring panel updates the statistics on the Summary tab in real time. To analyze the current statistics, you can temporarily pause and then continue the monitoring.

To pause the monitoring of a running pipeline, click the More icon in the toolbar and then click Pause Monitoring. The pipeline continues to run, but the Monitoring panel stops updating the statistics.

To continue the monitoring, click the More icon and then click Continue Monitoring.

Spark Web UI

As you monitor a pipeline, you can also access the Spark web UI for the application launched for the pipeline. Use the Spark web UI to monitor the Spark jobs executed for the launched application, just as you monitor any other Spark application.

To access the Spark web UI for the current pipeline run, click the URL under Runtime Statistics on the Summary tab, as follows:

The Spark web UI lists the following information for the application:
  • Completed jobs and stages
  • Memory usage
  • Environment information
  • Running executors

Note that when Spark runs locally, it uses a single executor to run the pipeline. When Spark is deployed on a cluster, it uses as many executors as required to run the pipeline by default. To limit executor usage in the cluster, configure the spark.executor.instances property as described in Performance Tuning Properties.
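For example, a cluster pipeline can be capped at a fixed number of executors with an entry like the following among the pipeline's extra Spark configuration properties. This is a minimal sketch: spark.executor.instances is a standard Spark property, but the value 4 is an illustrative choice that you should tune for your cluster.

```
# Limit the Spark application launched for the pipeline to a fixed
# number of executors. 4 is an illustrative value, not a recommendation.
spark.executor.instances=4
```

With this property set, Spark requests at most four executors for the application instead of scaling to as many as the pipeline requires.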

For more information about using the Spark web UI to monitor Spark applications, see the Spark documentation.

Pipeline Run History

You can view the run history of a pipeline when you configure or monitor the pipeline. View the run history from either the Summary tab or the History tab.

Summary Tab

To view pipeline run history from the Summary tab, select the pipeline run that you want to view from the list in the toolbar.

For example, the following image shows the Summary tab for the Revenue pipeline, which has been run twice. To view the history of a run, select either pipeline run from the list in the toolbar:

When you view a previous run, the Summary tab displays a table of Spark application details at the top, including the application ID and name, the start and completion time, and the status of the run. The Summary tab also displays the most recent statistics captured for the pipeline.

Just as when you monitor a currently running pipeline, you can click the URL to access the Spark web UI. When you access the URL for a stopped pipeline, you are redirected to the Spark history server, as described in the Spark documentation.

History Tab

The History tab displays both the pipeline run history and the pipeline state history.

When you view the run history, the tab displays Spark application details for each pipeline run, including the application ID and name, the start and completion time, and the status of the run.

In the Summary column, click Metrics to view a summary of the completed run. The metrics summary includes the input and output count for the pipeline and for each stage. It also includes the pipeline start and stop times and the time that the last record was received.

In the Summary column, you can also click Spark UI to access the Spark web UI for the application. When you access the URL for a stopped pipeline, you are redirected to the Spark history server, as described in the Spark documentation.

The following image shows a sample run history:

When you view the state history, the tab displays the following:
  • Each time that the pipeline status changed
  • The changed pipeline status
  • Related messages
  • Parameter values used by the pipeline
  • Access to each run summary

The following image shows a sample state history: