Preview

Overview

You can preview pipeline data to help build or fine-tune a pipeline.

When you preview a pipeline, source data passes through the pipeline, allowing you to review how the data changes as it passes through each stage. Because you preview pipeline data during pipeline development, a preview by default does not write data to destinations or pass data to executors connected to destination stages. StreamSets Cloud does not maintain a history of each pipeline preview.

You can preview complete or incomplete pipelines.

During a preview, you can view the preview data in List or Table view, and reload the preview data. You can preview data for one stage at a time or for a group of stages. You can also edit preview data to test and tune the pipeline logic.

Example

Let's say that you are processing customer orders. The Type field in each record contains either Customer - Direct or Customer - Channel as the value. You'd like to clean this data by keeping only Direct or Channel as the value before loading the data to the destination system. You add an Expression Evaluator processor to the pipeline and define an expression to replace the value of the Type field with the shortened value.

After configuring the processor, you want to verify that the expression cleans the data as expected. So you click the Preview icon, even though the pipeline is incomplete because it does not yet include a destination. When the preview starts, you select the Expression Evaluator processor in the pipeline canvas. The preview panel displays the input and output data of the processor, highlighting the changed data in the Type field and confirming that the expression correctly removes the string Customer - from the Type values.
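
The input and output data described in this example can be pictured with a small sketch. It is not the Expression Evaluator's own expression syntax, only a Python stand-in for the same logic. The function name shorten_type, the OrderId field, and the sample records are assumptions made for illustration.

```python
# Hypothetical stand-in for the cleanup logic described in the example.
PREFIX = "Customer - "


def shorten_type(record: dict) -> dict:
    """Return a copy of the record with the 'Customer - ' prefix removed from Type."""
    cleaned = dict(record)
    value = cleaned.get("Type", "")
    if value.startswith(PREFIX):
        cleaned["Type"] = value[len(PREFIX):]
    return cleaned


# Sample records invented for illustration; OrderId is not from the source.
input_records = [
    {"OrderId": 1, "Type": "Customer - Direct"},
    {"OrderId": 2, "Type": "Customer - Channel"},
]
output_records = [shorten_type(r) for r in input_records]

# Show input and output side by side, as the preview panel does for a processor.
for before, after in zip(input_records, output_records):
    print(f"{before['Type']!r} -> {after['Type']!r}")
```

Each printed line pairs the original Type value with the shortened one, which is the comparison you make when reviewing the input and output data of the processor.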

Preview Availability

You can preview a complete pipeline or an incomplete pipeline that is open in the pipeline canvas.

You can preview a pipeline under the following conditions:
  • StreamSets Cloud has finished allocating the required preview resources for the pipeline.
  • The pipeline includes an origin.
  • All stages in the pipeline are connected.
  • All required properties have valid values.
Note: StreamSets Cloud automatically validates pipelines when you preview them. The pipeline canvas displays errors when a stage has an invalid value or cannot connect to an external system.
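
The conditions above can be read as a checklist. The following sketch expresses that checklist in Python under stated assumptions: the Stage class, its fields, and the preview_blockers function are hypothetical and do not correspond to any StreamSets Cloud API.

```python
from dataclasses import dataclass, field


@dataclass
class Stage:
    """Hypothetical, simplified view of a pipeline stage."""
    name: str
    kind: str  # "origin", "processor", "destination", or "executor"
    connected: bool = True
    missing_required: list = field(default_factory=list)


def preview_blockers(stages, resources_allocated):
    """Return the reasons a preview cannot start; an empty list means it can."""
    blockers = []
    if not resources_allocated:
        blockers.append("preview resources are still being allocated")
    if not any(s.kind == "origin" for s in stages):
        blockers.append("pipeline has no origin")
    for s in stages:
        if not s.connected:
            blockers.append(f"stage '{s.name}' is not connected")
        for prop in s.missing_required:
            blockers.append(f"stage '{s.name}' is missing a valid value for '{prop}'")
    return blockers


# An incomplete pipeline (no destination yet) still passes every check.
incomplete = [Stage("origin1", "origin"), Stage("expr1", "processor")]
print(preview_blockers(incomplete, resources_allocated=True))  # []
```

Note that the checklist does not require a destination, which matches the statement that incomplete pipelines can be previewed.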

Writing to Destinations and Executors

Because you preview pipeline data during pipeline development, a preview by default does not write data to destinations or pass data to executors connected to destination stages.

A preview also does not display the data that is written by destinations in the pipeline. You can, however, view the data that is passed to a destination stage, which is typically similar to what is written to destination systems.

When needed, you can configure a preview to write data to destination systems and to trigger executors connected to destination stages. For example, you might enable writing to an executor to verify that it performs the configured task as expected.

To write to destination systems and connected executors, in the Preview Configuration dialog box, select Write to Destinations and Executors.

Important: StreamSets advises against writing preview data to production destination systems.
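
As a rough mental model, the default preview behaves like a dry run: records flow through the stages, but the final write is skipped unless you enable it. The sketch below illustrates that idea only. The run_preview function, its parameters, and the sample processor are hypothetical and are not part of StreamSets Cloud.

```python
def run_preview(records, processors, destination_write, write_to_destinations=False):
    """Pass records through processors; call destination_write only when enabled.

    Returns the data that reaches the destination stage, which is what a
    preview can display as the input data of the destination.
    """
    data = list(records)
    for process in processors:
        data = [process(r) for r in data]
    if write_to_destinations:
        for r in data:
            destination_write(r)
    return data


def uppercase_type(record):
    """Sample processor invented for illustration."""
    return {**record, "Type": record["Type"].upper()}


written = []  # stands in for an external destination system

# Default: data reaches the destination stage but nothing is written.
shown = run_preview([{"Type": "direct"}], [uppercase_type], written.append)
print(shown, written)

# Opt in: the same data is also written to the stand-in destination.
run_preview([{"Type": "direct"}], [uppercase_type], written.append,
            write_to_destinations=True)
print(written)
```

The first call leaves the stand-in destination empty, while the second call writes to it, which is why enabling the option against a production system is discouraged.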

Preview Log

StreamSets Cloud generates log data when you preview a pipeline.

When a preview fails, view the error messages in the preview log to help you troubleshoot issues. For more information about the preview log, see Pipeline Logs.

Notes

Keep the following notes in mind when previewing pipeline data:
  • Date, datetime, and time data - A preview displays date, datetime, and time data using the default format of the browser locale. For example, if the browser uses the en_US locale, preview displays dates using the following format: MMM d, y h:mm:ss a.
  • Oracle CDC Client pipelines - When previewing a pipeline that uses the Oracle CDC Client origin, the preview might time out before connecting to the origin system. When this occurs, try increasing the timeout to 120,000 milliseconds to allow the origin time to connect.
  • Whole file data format - When previewing a pipeline that processes whole file data, the preview displays only one record.
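
To make the en_US pattern in the first note concrete, the following sketch reproduces MMM d, y h:mm:ss a in Python. It is an approximation written for illustration, assuming English month abbreviations and AM/PM markers. The function name format_en_us is hypothetical, and the exact rendering in a browser may differ.

```python
from datetime import datetime

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def format_en_us(dt: datetime) -> str:
    """Approximate the pattern 'MMM d, y h:mm:ss a' for an en_US locale."""
    hour12 = dt.hour % 12 or 12              # h: 12-hour clock, no zero padding
    am_pm = "AM" if dt.hour < 12 else "PM"   # a: AM/PM marker
    return (f"{MONTHS[dt.month - 1]} {dt.day}, {dt.year} "
            f"{hour12}:{dt.minute:02d}:{dt.second:02d} {am_pm}")


print(format_en_us(datetime(2020, 3, 5, 15, 4, 9)))  # Mar 5, 2020 3:04:09 PM
```

The day and hour are not zero padded, while minutes and seconds always use two digits.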

Preview Codes

A preview displays different colors for different types of data. It also uses other codes and formatting to highlight changed fields.

The following table describes the color and asterisk coding:
Preview Code              Description
Black values              Date data
Blue values               Numeric data
Green values              String data
Red values                Boolean data
Asterisk                  Records that include edited field values
Red italic field labels   Fields that contain edited data
Light red background      Fields removed by a stage
Italic values             Edited data
Green stage               First stage in a multiple-stage preview
Red stage                 Last stage in a multiple-stage preview

Previewing a Pipeline

Preview a pipeline to review the values for each record to determine if the pipeline transforms data as expected. You can preview data for a single stage or for a group of linked stages.

  1. Open the pipeline in the pipeline canvas.
    1. In the left navigation pane, click the Pipelines icon.
    2. Click the name of the pipeline that you want to preview.
  2. In the toolbar above the pipeline canvas, click the Preview icon.
    Note: The Preview icon is enabled after StreamSets Cloud has finished allocating the required preview resources for the pipeline.
  3. In the Preview Configuration dialog box, configure the following properties:
    Preview Property                      Description
    Preview Source                        Source data for the preview. Select Configured Source to preview data provided from the origin system.
    Preview Batch Size                    Number of records to use in the preview.
    Preview Timeout                       Milliseconds to wait for preview data. Use to limit the time that the preview waits for data to arrive at the origin.
    Write to Destinations and Executors   Passes preview data to destinations or executors. By default, does not pass data to destination or executor stages.
    Execute Pipeline Lifecycle Events     Triggers the generation of any appropriate pipeline events, typically the Start event. If the event is configured to be used, event consumption is also triggered.
    Show Record/Field Header              Displays record header attributes and field attributes when in List view. Attributes do not display in Table view.
    Show Field Type                       Displays the data type for fields in List view. Field types do not display in Table view.
    Remember the Configuration            Stores the current preview configuration for use every time you request a preview for this pipeline. After you run the preview, you can change this option in the preview panel by selecting the Preview Configuration icon and clearing the option. The change takes effect the next time you run a preview.

  4. Click Confirm.
    The pipeline canvas highlights the origin stage, and the preview panel displays preview data in List view. Since this is the origin of the pipeline, the panel displays no input data.

    To view preview data in Table view, click the Table View icon.

  5. To delete a record that you do not want to use, display the preview in List view, and then click the Delete icon.
  6. To view data for a different stage, select the stage in the pipeline canvas.
  7. To preview data for multiple stages, click Multiple.
    By default, the first and last stage of the pipeline are selected in the pipeline canvas. The preview panel displays the output data of the first stage in the group and the input data of the last stage in the group.
    1. To select a different stage as the first stage, select the first stage highlighted in green, and then select another stage.
    2. To select a different stage as the last stage, select the last stage highlighted in red, and then select another stage.
  8. To reload the preview data, click Reload Preview.
    Reloading the preview provides a new set of data.
  9. To exit the preview, click the Close Preview icon.
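
The properties in the Preview Configuration dialog box can be thought of as one settings object. The sketch below models them as a Python dataclass purely for illustration. The class name, field names, and validation rules are hypothetical. Only the default for writing to destinations and executors comes from this page, and the other defaults are assumptions.

```python
from dataclasses import dataclass


@dataclass
class PreviewConfig:
    """Hypothetical model of the Preview Configuration dialog box."""
    preview_source: str   # for example, "Configured Source"
    batch_size: int       # number of records to use in the preview
    timeout_ms: int       # milliseconds to wait for preview data
    write_to_destinations_and_executors: bool = False  # off by default, per this page
    execute_lifecycle_events: bool = False   # default assumed for this sketch
    show_record_field_header: bool = False   # default assumed for this sketch
    show_field_type: bool = False            # default assumed for this sketch
    remember_configuration: bool = False     # default assumed for this sketch

    def __post_init__(self):
        # Basic sanity checks; the real dialog box may validate differently.
        if self.batch_size <= 0:
            raise ValueError("batch_size must be a positive number of records")
        if self.timeout_ms <= 0:
            raise ValueError("timeout_ms must be a positive number of milliseconds")


# The batch size of 10 is an arbitrary example value. The timeout matches the
# 120,000 milliseconds suggested for Oracle CDC Client pipelines.
config = PreviewConfig(preview_source="Configured Source",
                       batch_size=10, timeout_ms=120_000)
print(config.write_to_destinations_and_executors)  # False
```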

Editing Preview Data

You can edit preview data to view how a stage or group of stages processes the changed data. Edit preview data to test for data conditions that might not appear in the preview data set.

Edit the preview data in the Output Data column for a stage that passes data to the stage that you want to test. After you edit preview data, you pass the changed data through the pipeline.

For example, let's say that you configure the Stream Selector processor to pass data to streams based on an expression that evaluates integer values. You want to test that the expression works as expected for positive and negative integer values, as well as zero. You edit the preview data in the Output Data column for the origin and then click Run With Changes to pass the changed data to the Stream Selector processor.
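
The Stream Selector scenario amounts to checking one routing condition against three representative inputs. The following sketch shows that idea in Python. The select_stream function, the value field, and the stream names are hypothetical and do not reflect Stream Selector configuration syntax.

```python
def select_stream(record: dict) -> str:
    """Route a record by the sign of its integer 'value' field (hypothetical condition)."""
    value = record["value"]
    if value > 0:
        return "positive"
    if value < 0:
        return "negative"
    return "zero"


# Edited preview values covering the three cases named above:
# a positive integer, a negative integer, and zero.
edited_records = [{"value": 7}, {"value": -3}, {"value": 0}]
routes = {r["value"]: select_stream(r) for r in edited_records}
print(routes)  # {7: 'positive', -3: 'negative', 0: 'zero'}
```

Editing the origin's output data to include these three values gives the same coverage in a preview, even when the source data happens to contain only one kind of value.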

  1. To change field values, in the Output Data column of an origin or a processor, click the value that you want to change and enter a new value.
  2. To process the changed data, click Run With Changes.
    This runs the preview with the current set of data and stage configuration.
    In the Input Data column, records with changed values display with an asterisk and the changed values are highlighted. The Output Data column displays the results of processing. You can change and process preview data as often as necessary.
  3. To revert your changes to the data, click Revert Data Changes.
  4. To reload the preview with a new set of data, click Reload Preview.