Pipeline Maintenance

Pipeline Maintenance Overview

After creating and configuring a pipeline, you run the pipeline to start the flow of data from the origin to destination systems. Each pipeline runs until you stop the pipeline.

You can view the run history of a pipeline when you configure or monitor the pipeline. You can view log data generated when you preview a pipeline or run a pipeline.

You can export a pipeline and then import that pipeline into StreamSets Cloud. You might want to export a pipeline to create a backup or to share the pipeline with another user.

You can delete a pipeline when you no longer need the pipeline.

StreamSets Cloud charges for usage based on pipeline hours. You can view your pipeline hour usage for a specific time period.

Running a Pipeline

When you run a pipeline, you start the flow of data from the origin to destination systems. Each pipeline runs until you stop the pipeline.

A running pipeline is read-only. You must stop the pipeline to edit the pipeline.

When you stop a pipeline, most origins save an offset that records where they stopped reading. When you run the pipeline again, you configure the pipeline run either to start from an offset saved with a previous run or to start from the beginning and read all available data.

  1. Open the pipeline in the pipeline canvas.
    1. In the left navigation pane, click the Pipelines icon.
    2. Click the name of the pipeline that you want to run.
  2. In the toolbar above the pipeline canvas, click the Run icon.
  3. In the Run Configuration dialog box, select the offset that you want the origin to start reading from.

    For the initial run, you can start from the beginning only.

  4. Click Run Pipeline.

    It takes a few minutes for the pipeline to be deployed and started.
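
To picture the offset behavior described in this section, the following Python sketch models an origin that remembers where it stopped reading. It is a conceptual illustration only, not StreamSets Cloud code: the `run_pipeline` function, the in-memory `source` list, and the integer offset are all hypothetical stand-ins.

```python
from typing import List, Optional, Tuple


def run_pipeline(source: List[str], saved_offset: Optional[int],
                 max_records: int) -> Tuple[List[str], int]:
    """Read up to max_records from source, starting at saved_offset.

    Passing saved_offset=None models "start from the beginning".
    Returns the records read and the new offset to save when the run stops.
    """
    start = 0 if saved_offset is None else saved_offset
    records = source[start:start + max_records]
    return records, start + len(records)


# Hypothetical data standing in for the origin system.
source = ["r1", "r2", "r3", "r4", "r5"]

# Initial run: no saved offset exists, so reading starts from the beginning.
first_batch, offset = run_pipeline(source, None, 2)

# Later run: resume from the offset saved by the previous run.
second_batch, offset = run_pipeline(source, offset, 2)

# Alternatively, ignore the saved offset and reread all available data.
all_records, _ = run_pipeline(source, None, len(source))
```

In this model, resuming from the saved offset skips records that were already read, while starting from the beginning reads every record again.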

Viewing Pipeline Run History

You can view the run history of a pipeline when you configure or monitor the pipeline.

Run history shows the following information for each run of the pipeline:
  • Run count
  • Date and time that the run started
  • Duration of the run in hours, minutes, and seconds
  • Error, input, and output record count for the run

  1. Open the pipeline in the pipeline canvas.
    1. In the left navigation pane, click the Pipelines icon.
    2. Click the name of the pipeline that you want to view history for.
  2. In the toolbar above the pipeline canvas, click the History icon.

    The Run History dialog box appears.

    For example, the following image shows a sample pipeline run history:

    Record counts are displayed as rounded values. You can display the exact record count values in tooltips.

    For example, the following image shows a tooltip with the exact value for the input records in the first pipeline run:

  3. Click OK when you've finished viewing the history.
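
The run history reports each run's duration in hours, minutes, and seconds. If you track run times yourself, a duration in seconds can be broken down the same way. The sketch below is a hypothetical helper; the `format_duration` name and the output format are assumptions and do not reflect how StreamSets Cloud renders the value.

```python
def format_duration(total_seconds: int) -> str:
    """Split a run duration in seconds into hours, minutes, and seconds."""
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours}h {minutes}m {seconds}s"


print(format_duration(5025))  # 1h 23m 45s
```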

Pipeline Logs

StreamSets Cloud generates log data when you preview a pipeline or run a pipeline.

Each log entry includes a timestamp and message along with additional information relevant for the message. The log can contain informational, warning, and error messages. View the logs to help with troubleshooting.

You can view the following types of logs:

Preview log
    The preview log contains messages generated when you preview pipelines.
    StreamSets Cloud generates a single preview log used for all of your previews. After a period of inactivity, the preview log is cleared.
    To view a preview log, click the Preview Log tab in the properties pane as you preview a pipeline.

Run log
    The run log contains messages generated when you run a pipeline.
    StreamSets Cloud generates a separate run log for each run of the pipeline. The run log is viewable only while the pipeline run is active. When a pipeline run encounters an error and fails, the pipeline run itself remains active so that you can view the run log and troubleshoot issues. When you stop the pipeline run, the log is no longer accessible.
    To view a run log, click the Log tab in the properties pane as you monitor a pipeline run.

The following image displays a sample run log:

Stopping a Pipeline

Stop a pipeline when you want to stop processing data for the pipeline.

You must also stop a pipeline before you can edit it. A running pipeline is read-only.

  1. Open the pipeline in the pipeline canvas.
    1. In the left navigation pane, click the Pipelines icon.
    2. Click the name of the pipeline that you want to stop.
  2. In the toolbar above the pipeline canvas, click the Stop icon.
    When the pipeline successfully stops, you can edit the pipeline as needed.

Exporting a Pipeline

Export a pipeline to create a backup or to share the pipeline with another StreamSets Cloud user.

When you export a pipeline, StreamSets Cloud generates a JSON file named after the pipeline, as follows: <pipeline name>.json.

When pipeline stages contain secrets such as user names and passwords, the secret values are not exported.

  1. In the left navigation pane, click the Pipelines icon.
  2. Hover over the pipeline that you want to export, click the icon that appears, and then click Export Pipeline.

    StreamSets Cloud exports the pipeline to your default downloads directory.

Importing a Pipeline

Import a pipeline to restore a backup file or to use a pipeline shared by another StreamSets Cloud user.

You can import a pipeline JSON file that has been exported from StreamSets Cloud. You cannot import a pipeline that has been exported from another StreamSets product.

When you import a pipeline, StreamSets Cloud uses the name of the JSON file as the pipeline name. If an existing pipeline uses the same name, you cannot import the file unless you rename the file or rename the existing pipeline.

  1. In the left navigation pane, click the Pipelines icon.
  2. Click Import Pipeline.
  3. Browse and select the pipeline JSON file, and then click Open.
    Note: If the imported pipeline contains secrets such as user names and passwords, you must enter the secret values again. Secret values are not exported.
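
Because the pipeline name is taken from the JSON file name, you can check for a name conflict before you import. The following Python sketch shows one way to do that locally. The function names and the list of existing pipeline names are hypothetical, and the sketch assumes that names conflict only on an exact match.

```python
from pathlib import Path
from typing import Iterable


def pipeline_name_from_file(json_path: str) -> str:
    """Return the pipeline name implied by the file name.

    For example, 'downloads/Orders.json' implies the name 'Orders'.
    """
    return Path(json_path).stem


def can_import(json_path: str, existing_names: Iterable[str]) -> bool:
    """Return True if no existing pipeline already uses the implied name."""
    return pipeline_name_from_file(json_path) not in set(existing_names)


# Hypothetical names of pipelines that already exist in your account.
existing = ["Orders", "Clicks"]

print(can_import("downloads/Orders.json", existing))         # False
print(can_import("downloads/Orders backup.json", existing))  # True
```

If the check fails, rename the file or the existing pipeline before you import, as described above.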

Deleting a Pipeline

You can delete a pipeline when you no longer need the pipeline.

Deleting a pipeline is permanent. To keep a backup, export the pipeline before you delete it.

  1. In the left navigation pane, click the Pipelines icon.
  2. Hover over the pipeline that you want to delete, click the icon that appears, and then click Delete.
  3. Click OK to confirm the deletion.

Pipeline Hours

StreamSets Cloud charges for usage based on pipeline hours.

A pipeline hour is a fractional measurement of the number of hours a pipeline has been running. For example, if you run one pipeline for 1.05 hours and another pipeline for 1.62 hours, your total pipeline hours are 2.67.
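
The arithmetic in this example can be reproduced with a short Python sketch. The `total_pipeline_hours` helper is hypothetical. It uses `Decimal` rather than floating-point numbers so that the fractional hours add up exactly.

```python
from decimal import Decimal
from typing import Iterable


def total_pipeline_hours(run_hours: Iterable[Decimal]) -> Decimal:
    """Add up the fractional hours of each pipeline run."""
    return sum(run_hours, Decimal("0"))


# One pipeline ran for 1.05 hours and another ran for 1.62 hours.
total = total_pipeline_hours([Decimal("1.05"), Decimal("1.62")])
print(total)  # 2.67
```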

StreamSets Cloud charges you only while it is doing work for you. Pipeline hour metering does not begin and end at the exact moment that you click the Run icon or the Stop icon for a pipeline. After you click the Run or Stop icon, it can take a few minutes for the pipeline to be deployed and started or to come to a complete stop. As a result, StreamSets Cloud uses the following metering guidelines:

  • Begins metering pipeline hours when the pipeline deployment starts.
  • Ends metering pipeline hours after the pipeline completes the currently running batch and finishes cleaning up the deployment environment.
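
As a rough illustration of these guidelines, the following Python sketch computes the metered time between two timestamps: the moment the deployment starts and the moment cleanup finishes. The `metered_hours` function, the sample timestamps, and the rounding to two decimal places are assumptions made for this example; the actual metering is performed by StreamSets Cloud.

```python
from datetime import datetime


def metered_hours(deployment_started: datetime,
                  cleanup_finished: datetime) -> float:
    """Return the fractional hours between deployment start and cleanup end."""
    elapsed = cleanup_finished - deployment_started
    return round(elapsed.total_seconds() / 3600, 2)


# Hypothetical run: deployment starts at 09:00 and cleanup finishes at 10:03.
deployment_started = datetime(2020, 1, 15, 9, 0, 0)
cleanup_finished = datetime(2020, 1, 15, 10, 3, 0)

print(metered_hours(deployment_started, cleanup_finished))  # 1.05
```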

Pipeline hour charges are determined by your StreamSets Cloud plan. For a description of each plan, see the StreamSets Cloud pricing page.

Viewing Pipeline Hour Usage

You can view your total pipeline hour usage for a specific time period.

  1. In the left navigation pane, click the Administration icon.
  2. Click Metering.

    By default, the Metering page displays your pipeline hour usage for the current month.

  3. To view your pipeline hour usage for another time period, enter a different date range or group by a different time period.
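
If you keep your own record of pipeline hours, the same idea of filtering by a date range and grouping by a time period can be sketched in Python. The `usage_by_month` function and the sample figures below are hypothetical. They do not come from the Metering page or from any StreamSets Cloud API.

```python
from collections import defaultdict
from datetime import date
from decimal import Decimal
from typing import Dict


def usage_by_month(daily_usage: Dict[date, Decimal],
                   start: date, end: date) -> Dict[str, Decimal]:
    """Total the pipeline hours per month for days within [start, end]."""
    totals: Dict[str, Decimal] = defaultdict(Decimal)
    for day, hours in daily_usage.items():
        if start <= day <= end:
            totals[day.strftime("%Y-%m")] += hours
    return dict(totals)


# Hypothetical daily pipeline hour figures.
daily_usage = {
    date(2020, 1, 30): Decimal("1.05"),
    date(2020, 1, 31): Decimal("1.62"),
    date(2020, 2, 1): Decimal("0.50"),
}

monthly = usage_by_month(daily_usage, date(2020, 1, 1), date(2020, 2, 29))
print(monthly["2020-01"])  # 2.67
```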