Preview

Overview

You can preview data to help build or fine-tune a pipeline. You can preview complete or incomplete pipelines.

When you preview data, source data from the origin passes through the pipeline, allowing you to review how the data passes and changes through each stage. You can edit stage properties and run the preview again to see how your changes affect the data.

You can preview data for one stage at a time or for a group of stages. You can also view the preview data in list or table view.

When previewing data for a processor, you can choose how to display the order of output records. You can display output records in the order that matches the input records or in the order produced by the processor.

Transformer always processes previews of pipelines locally on the Transformer machine. However, when Transformer runs a pipeline on a cluster, Spark distributes the processing across nodes in the cluster.

Preview Availability

You can preview complete and incomplete pipelines.

The Preview icon becomes active when preview is available. You can preview data under the following conditions:

  • All stages in the pipeline are connected.
  • All required properties are defined.
Tip: Stage configuration does not have to be accurate or complete to preview data. After you connect all stages, you can enable preview by entering any valid value for required properties.
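
The two conditions above can be sketched as a simple check. This is only an illustration: the Stage class, its fields, and preview_available are hypothetical names invented here, not part of Transformer.

```python
from dataclasses import dataclass, field


@dataclass
class Stage:
    """Hypothetical stand-in for a pipeline stage (not a Transformer class)."""
    name: str
    connected: bool
    # Required property name -> configured value (None or "" means undefined).
    required: dict = field(default_factory=dict)


def preview_available(stages):
    """Return True when all stages are connected and all required properties are defined."""
    all_connected = all(stage.connected for stage in stages)
    all_defined = all(
        value is not None and value != ""
        for stage in stages
        for value in stage.required.values()
    )
    return all_connected and all_defined
```

As the tip notes, the check only needs a valid value for each required property, not an accurate one.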

Writing to Destinations

As a tool for development, preview does not write data to destinations by default.

If you like, you can configure the preview to write data to destinations. We advise against writing preview data to production destinations.

Preview Data Types

Preview displays generic data types, such as Boolean, String, and List. These data types represent the Spark data types that are being used. For example, in preview, List represents the Array Spark data type, and Map can represent either the Map or Struct Spark data types.
Note: Preview also displays date, datetime, and time data using the default format of the browser locale. For example, if the browser uses the en_US locale, preview displays dates using the following format: MMM d, y h:mm:ss a.
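
As a rough illustration of the two points above, the following sketch lists the type pairs named in the text and renders a timestamp in the en_US pattern MMM d, y h:mm:ss a. The names PREVIEW_TYPE and format_en_us are made up for this example, and the formatter only approximates what a browser would produce.

```python
from datetime import datetime

# Spark type name -> generic type shown in preview.
# Only the pairs named in the text are listed; this is not a complete mapping.
PREVIEW_TYPE = {"Array": "List", "Map": "Map", "Struct": "Map"}

_MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
           "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def format_en_us(dt):
    """Approximate the pattern 'MMM d, y h:mm:ss a' for an en_US locale."""
    hour12 = dt.hour % 12 or 12          # 0 and 12 both display as 12
    meridiem = "AM" if dt.hour < 12 else "PM"
    return (f"{_MONTHS[dt.month - 1]} {dt.day}, {dt.year} "
            f"{hour12}:{dt.minute:02d}:{dt.second:02d} {meridiem}")


print(format_en_us(datetime(2024, 1, 5, 15, 7, 9)))  # Jan 5, 2024 3:07:09 PM
```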

Preview Codes

In Preview mode, Transformer displays different colors for different types of data. Transformer uses other codes and formatting to highlight changed fields.

The following table describes the color and asterisk coding:
Preview Code            Description
Black values            Date data
Blue values             Numeric data
Green values            String data
Red values              Boolean data
Light red background    Fields removed by a stage
Green stage             First stage in a multiple-stage preview
Red stage               Last stage in a multiple-stage preview

Processor Output Order

When previewing data for a processor, you can preview both the input and the output data. You can display the output records in the order that matches the input records or in the order produced by the processor.

In most cases when you preview data for a processor, you'll want to compare matching input and output records side by side because the processor produces updated records. For example, when you preview data for a Field Renamer processor, Transformer by default displays the output records in matching order with the input records. The Preview panel highlights the changed field in each record.

However, some processors, such as the Aggregate or Profile processor, don't update records; they create new records. Other processors, such as the Sort processor, reorder the records. In these cases, comparing matching input and output records isn't relevant. It's more helpful to display the output records in the order produced by the processor.

For example, when you preview data for an Aggregate processor, Transformer displays the output records in the output order by default. The Preview panel displays the input records under Input Data and the output records under Output Data without attempting to match the records.

If you display the output records in matching order with the input records for the same Aggregate processor, Transformer attempts to match the input and output records. The Preview panel displays the input records first, noting under Output Data that no matching records exist. The Preview panel then displays the new output records created by the processor.
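
The matching-order display described above can be sketched as follows. This is an illustration only: how Transformer actually matches records is not stated here, so the sketch assumes a hypothetical "id" field shared by an input record and its updated output record.

```python
def matching_order(inputs, outputs, key="id"):
    """Pair each input record with its matching output; new outputs follow.

    Returns a list of (input, output) rows. A row with output None means no
    matching record exists; a row with input None is a record the processor created.
    The 'key' field is an assumption made for this sketch.
    """
    remaining = list(outputs)
    rows = []
    for rec in inputs:
        match = next(
            (out for out in remaining if key in out and out[key] == rec[key]),
            None,
        )
        if match is not None:
            remaining.remove(match)
        rows.append((rec, match))
    # Outputs with no matching input (for example, aggregates) come last.
    rows.extend((None, out) for out in remaining)
    return rows


# A renaming processor updates records, so every input has a match.
renamed = matching_order([{"id": 1, "name": "a"}], [{"id": 1, "full_name": "a"}])

# An aggregating processor creates new records, so no input has a match.
aggregated = matching_order([{"id": 1, "v": 3}, {"id": 2, "v": 4}], [{"total": 7}])
```

In the aggregate case, the result lists both inputs with no match and then the single new record, which mirrors the layout described for matching order. Displaying in output order amounts to showing the two lists as they are, without pairing.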

Previewing a Pipeline

Preview a pipeline to review the values for each record to determine if the pipeline transforms data as expected. You can preview data for a single stage or for a group of linked stages.

  1. In the toolbar above the pipeline canvas, click the Preview icon.
    If the Preview icon is disabled, check the Issues list for unconnected stages and required properties that are not defined.
  2. In the Preview Configuration dialog box, configure the following properties:
    • Preview Batch Size: Number of records to use in the preview. Honors values up to the maximum preview batch size defined in the Transformer configuration file. Default is 10. The default maximum in the Transformer configuration file is 1,000.
    • Preview Timeout: Milliseconds to wait for preview data. Use to limit the time that preview waits for data to arrive at the origin. Relevant for transient origins only.
    • Write to Destinations: Determines whether the preview passes data to destinations. By default, does not pass data to destinations.
    • Show Record/Field Header: Displays record header attributes and field attributes when in List view. Attributes do not display in Table view.
    • Show Field Type: Displays the data type for fields in List view. Field types do not display in Table view.
    • Remember the Configuration: Stores the current preview configuration for use every time you request a preview for this pipeline. While running preview, you can change this option in the Preview panel by selecting the Preview Configuration tab and clearing the option. The change takes effect the next time you run preview.

  3. Click Run Preview.
    The Preview panel highlights the origin stage and displays preview data in list view. Since this is the origin of the pipeline, no input data displays.

    To view preview data in table view, click the Table View icon.

  4. To delete a record that you do not want to use, click the Delete icon.
  5. To view data for the next stage, click the Next Stage icon. Or, to view data for a different stage, select the stage in the pipeline canvas.

    When you preview data for a processor, you can choose the order in which to display the output data.

  6. To preview data for multiple stages, click Multiple to display two lists of stages in the Preview panel.
    1. From the list on the left, select the first stage to use.
    2. From the list on the right, select the last stage to use.
      The Preview panel displays the output data of the first stage in the group and the input data of the last stage in the group.
  7. To refresh the preview, click the Refresh Preview icon.
    Refreshing the preview provides a new set of data.
  8. To exit preview, click the Close Preview icon.
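
The Preview Batch Size property in step 2 honors values up to a configured maximum. A minimal sketch of that rule follows, assuming that a larger request is capped at the maximum rather than rejected, which the text does not state. The function name and constants are hypothetical.

```python
DEFAULT_PREVIEW_BATCH_SIZE = 10        # default in the Preview Configuration dialog box
DEFAULT_MAX_PREVIEW_BATCH_SIZE = 1000  # default maximum in the Transformer configuration file


def effective_batch_size(requested=DEFAULT_PREVIEW_BATCH_SIZE,
                         configured_max=DEFAULT_MAX_PREVIEW_BATCH_SIZE):
    """Return the number of records preview would use under the capping assumption."""
    return min(requested, configured_max)


print(effective_batch_size())      # 10
print(effective_batch_size(5000))  # 1000
```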

Editing Properties

When running preview, you can edit stage properties to see how the changes affect preview data. For example, you might edit the condition in a Stream Selector processor to see how the condition alters which records pass to the different output streams.

When you edit properties, you can test the change by refreshing the preview data.

  1. To edit stage properties while running preview, select the stage you want to edit and click the Stage Configuration icon.
  2. Change properties as needed.
  3. To test the changed properties, click the Refresh Preview icon.
    This refreshes the preview data.
  4. To revert your change, manually change the property back.
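
The Stream Selector example above can be imitated outside Transformer to show why refreshing after an edit is useful. In this sketch, a Python function stands in for the condition, and split_by_condition is a hypothetical helper, not a Transformer API. It models a single condition with one matching stream and one default stream.

```python
def split_by_condition(records, condition):
    """Route records that satisfy the condition to one stream, the rest to a default stream."""
    matched = [rec for rec in records if condition(rec)]
    default = [rec for rec in records if not condition(rec)]
    return matched, default


records = [{"amount": 5}, {"amount": 50}, {"amount": 500}]

# Original condition, then an edited condition applied to the same preview records.
before = split_by_condition(records, lambda rec: rec["amount"] > 10)
after = split_by_condition(records, lambda rec: rec["amount"] > 100)

print(before[0])  # [{'amount': 50}, {'amount': 500}]
print(after[0])   # [{'amount': 500}]
```

Comparing the two results shows which records move to the default stream after the edit, which is the same comparison you make by refreshing the preview.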