Data Preview

Data Preview Overview

You can preview data to help build or fine-tune a pipeline. When you preview data, Data Collector passes data through the pipeline and lets you review how each stage processes the data.

After you configure the origin and connect the stages that you want to review, you can start the data preview.

In Preview mode, you cannot add or delete stages, but you can edit stage properties and run the preview again to see how your changes affect preview data. Similarly, you can edit preview data to test and tune pipeline configuration.

You can preview data for one stage at a time or for a group of stages. You can view the data in list or table view. You can also refresh the preview data.

Data Preview Availability

You can preview complete and incomplete pipelines. The Data Preview icon becomes active when data preview is available.

You can preview data under the following conditions:
  • All stages in the pipeline are connected
  • All required properties are defined
Tip: Stage configuration does not have to be accurate or complete to preview data. After you connect all stages, you can enable data preview by entering any valid value for required properties.
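
As a rough illustration of the two conditions above, the following Python sketch checks a simplified pipeline model. The Stage class, its fields, and the preview_issues function are hypothetical and are not part of Data Collector. The sketch also assumes that a stage counts as connected when it has at least one link.

```python
from dataclasses import dataclass, field


@dataclass
class Stage:
    """Hypothetical stand-in for a pipeline stage; not a Data Collector class."""
    name: str
    required: dict = field(default_factory=dict)  # required property name -> value
    links: list = field(default_factory=list)     # names of stages linked to this one


def preview_issues(stages):
    """Return the reasons preview is unavailable; an empty list means it is available."""
    issues = []
    for stage in stages:
        # Assumption: a stage is "connected" when it has at least one link.
        if not stage.links:
            issues.append(f"{stage.name}: not connected to another stage")
        for prop, value in stage.required.items():
            # Only presence is checked here; None and "" count as undefined.
            if value in (None, ""):
                issues.append(f"{stage.name}: required property '{prop}' is not defined")
    return issues
```

In this model, an empty result corresponds to an active Data Preview icon, and each returned string corresponds to an entry that you would expect to find in the Issues list.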

Source Data for Data Preview

You can use the following types of data for a data preview:
  • Data from the origin - Use available data from the origin.
  • Data from a snapshot - Use snapshot data from the same pipeline or another pipeline.

Writing to Destinations

As a tool for development, data preview does not write data to destinations by default.

If you like, you can configure the preview to write data to destinations. We advise against writing preview data to production destinations.

Notes

Keep the following notes in mind when previewing your data:
  • Date, datetime, and time data - Data preview displays date, datetime, and time data using the default format of the browser locale. For example, if the browser uses the en_US locale, preview displays dates using the following format: MMM d, y h:mm:ss a.
  • Oracle CDC Client pipelines - When previewing a pipeline that uses the Oracle CDC Client origin, data preview might time out before connecting to the origin system. When this occurs, try increasing the timeout to 120,000 milliseconds to allow the origin time to connect.
  • Whole file data format - When previewing a pipeline that processes whole file data, data preview displays only one record.
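
To show what the en_US format in the first note looks like in practice, the following Python sketch renders a timestamp in the shape of MMM d, y h:mm:ss a. It assumes the usual reading of the pattern letters (abbreviated month, unpadded day, full year, unpadded 12-hour clock, AM/PM marker). The function name is hypothetical, and the sketch does not reproduce how the browser formats the value.

```python
from datetime import datetime

# English month abbreviations, hard-coded so the output does not depend on
# the locale of the machine that runs this sketch.
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def format_like_en_us_preview(ts: datetime) -> str:
    """Render ts in the shape of the pattern 'MMM d, y h:mm:ss a'."""
    hour12 = ts.hour % 12 or 12                # h: 12-hour clock, no zero padding
    meridiem = "AM" if ts.hour < 12 else "PM"  # a: AM/PM marker
    return (f"{MONTHS[ts.month - 1]} {ts.day}, {ts.year} "
            f"{hour12}:{ts.minute:02d}:{ts.second:02d} {meridiem}")


print(format_like_en_us_preview(datetime(2024, 1, 5, 15, 4, 5)))
# Jan 5, 2024 3:04:05 PM
```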

Data Collector UI - Preview Mode

You can use Data Collector to view how data passes through the pipeline.

The following image shows Data Collector in Preview mode:

Area/Icon Name - Description
1 Pipeline canvas - Displays the pipeline.
2 Preview panel - Displays the data that enters and exits the selected stage or group of stages. It can also display stage properties and preview configuration.
StreamSets Control Hub icon - Provides information about StreamSets Control Hub (SCH) and lets you register this Data Collector with Control Hub.
Home icon - Displays a home page with a list of pipelines and their statuses, allowing you to perform pipeline maintenance and navigate to individual pipelines.
Package Manager icon - Displays the Package Manager which allows you to install additional stage libraries for a core Data Collector installation.
Notifications icon - Displays notifications.
Administration icon - Provides access to Data Collector configuration properties, directories, and log. Also allows you to shut down Data Collector.
User icon - Displays the active user and the roles assigned to the user. Also allows you to log out of Data Collector.
Help icon - Provides context-sensitive help based on the information in the panel. Allows you to configure display settings and to specify whether to use a local or hosted version of the help. Provides access to the REST API and the Data Collector version.
Link to a pipeline list - Link to a pipeline list on the Home page. Use to view a list of available pipelines, perform pipeline maintenance like starting or sharing a pipeline, and navigate to individual pipelines.
Records icon - Displays data preview records.
Stage Configuration icon - Displays stage properties.
Preview Configuration icon - Displays data preview properties.
Single button - Displays input and output data for a single stage.
Multiple button - Displays input and output data for a group of stages.
List View icon - Displays preview data in a list.
Table View icon - Displays preview data in a table.
Previous Stage icon - Moves the preview to the previous stage.
Next Stage icon - Moves the preview to the next stage.
Refresh Preview icon - Provides a fresh set of data from the origin.
Run Preview with Changes icon - Runs the data preview using changed data. Use to see how edited data is processed by the pipeline.
Revert Changes icon - Reverts all changes to preview data and returns the preview to the origin.
Close Preview icon - Closes the data preview.
Note: Some icons and options might not display. The items that display are based on the task that you are performing and roles assigned to your user account.

For information about maintaining pipelines on the Home page, see Data Collector UI - Pipelines on the Home Page.

For information about configuring pipelines, see Data Collector UI - Edit Mode.

For information about pipeline monitoring options, see Data Collector UI - Monitor Mode.

Preview Codes

In Preview mode, Data Collector displays different colors for different types of data. Data Collector uses other codes and formatting to highlight changed fields.

The following table describes the color and asterisk coding:
Preview Code - Description
Black values - Date data
Blue values - Numeric data
Green values - String data
Red values - Boolean data
Asterisk - Records that include edited field values
Red italic field labels - Fields that contain edited data
Light red background - Fields removed by a stage
Italic values - Edited data
Green stage - First stage in a multiple-stage preview
Red stage - Last stage in a multiple-stage preview
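
The value colors in the table above amount to a lookup by data type. The following Python sketch expresses that lookup. The preview_color function is hypothetical, and the mapping from Python types to Data Collector field types is an assumption made only for illustration.

```python
from datetime import date


def preview_color(value) -> str:
    """Return the color that the table above lists for the value's data type."""
    # bool is tested before numbers because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "red"
    if isinstance(value, (int, float)):
        return "blue"
    if isinstance(value, str):
        return "green"
    # datetime is a subclass of date, so this branch covers both.
    if isinstance(value, date):
        return "black"
    return "unknown"  # types that the table does not cover
```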

Previewing a Single Stage

You can preview data for a single stage. In the Preview panel, you can review the values for each record to determine if the stage transforms data as expected.

  1. Above the pipeline canvas, click the Preview icon.
    If the Preview icon is disabled, check the Issues list for unconnected stages and required properties that are not defined.
  2. In the Preview Configuration dialog box, configure the following properties, then click Run Preview.
    Preview Property - Description
    Preview Source - Source data for the preview:
    • Configured Source - Provides data from the origin system.
    • Snapshot Data - Uses available snapshot data.
    Preview Batch Size - Number of records to use in the preview. Honors values up to the Data Collector preview batch size. Default is 10. The default Data Collector preview batch size is also 10.
    Preview Timeout - Milliseconds to wait for preview data. Use to limit the time data preview waits for data to arrive at the origin. Relevant for transient origins only.
    Write to Destinations and Executors - Determines whether the preview passes data to destinations or executors. By default, does not pass data to destination or executor stages.
    Execute Pipeline Lifecycle Events - Triggers the generation of any appropriate pipeline events, typically the Start event. If the event is configured to be used, event consumption is also triggered.
    Show Record/Field Header - Displays record header attributes and field attributes when in List view. Attributes do not display in Table view.
    Show Field Type - Displays the data type for fields in List view. Field types do not display in Table view.
    Snapshot Data - When using a snapshot for source data, select the snapshot to use.
    Remember the Configuration - Stores the current preview configuration for use every time you request a preview for this pipeline. After you run data preview, you can change this option in the Preview panel by clicking the Preview Configuration icon and clearing the option. The change takes effect the next time you run data preview.

    The Preview panel highlights the origin stage and displays preview data in table view. Since this is the origin of the pipeline, no input data displays.

    To view preview data in list view, click the List View icon.

  3. To delete a record that you do not want to use, click the Delete icon.
  4. To view data for the next stage, click the Next Stage icon. Or, to view data for a different stage, select the stage in the pipeline canvas.
  5. To refresh the preview, click the Refresh Preview icon.
    Based on the origin, refreshing the preview either provides a new set of data or reverts any changes to the existing data.
  6. To exit data preview, click the Close Preview icon.
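
The Preview Batch Size property in step 2 honors values only up to the Data Collector preview batch size. One way to read that rule is as a simple cap, sketched below in Python. The function name, the parameter names, and the lower bound of 1 are assumptions for illustration and are not taken from Data Collector.

```python
def effective_batch_size(requested: int, collector_max: int = 10) -> int:
    """Cap the requested preview batch size at the Data Collector limit.

    collector_max defaults to 10, the Data Collector default stated in step 2.
    """
    # Assumption: a preview needs at least one record to be useful.
    if requested < 1:
        raise ValueError("preview batch size must be at least 1")
    return min(requested, collector_max)


print(effective_batch_size(5))        # 5
print(effective_batch_size(50))       # 10
print(effective_batch_size(50, 100))  # 50
```

Under this reading, requesting more records than the Data Collector limit has no effect until the limit itself is raised.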

Previewing Multiple Stages

You can preview data for a group of linked stages within a pipeline.

When you preview multiple stages, you select the first stage and the last stage in the group. The Preview panel then displays the input data and the output data for the group. The input data is the data that enters the first stage. The output data is the data that passes from the last stage.

In the Preview panel, you can review the values for each record to determine if the group of stages transforms data as expected.
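
The relationship between a group's input and output data can be sketched in Python as follows. This is a simplified model that assumes a linear chain of processors, each represented as a function over a list of records. The preview_group function and the example stages are hypothetical.

```python
def preview_group(stages, first, last, records):
    """Return (input_data, output_data) for stages[first] through stages[last].

    stages is an ordered list of functions that each map a list of records to
    a list of records. records is the data that leaves the origin.
    """
    data = records
    for stage in stages[:first]:          # run upstream stages to reach the group
        data = stage(data)
    group_input = data                    # data that enters the first stage
    for stage in stages[first:last + 1]:  # run the selected group
        data = stage(data)
    return group_input, data              # data that passes from the last stage


def double(recs):
    """Example processor: double the value of field n."""
    return [{**r, "n": r["n"] * 2} for r in recs]


def keep_big(recs):
    """Example processor: keep records where n is greater than 2."""
    return [r for r in recs if r["n"] > 2]


group_in, group_out = preview_group([double, keep_big], 0, 1, [{"n": 1}, {"n": 2}])
print(group_in)   # [{'n': 1}, {'n': 2}]
print(group_out)  # [{'n': 4}]
```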

  1. Above the pipeline canvas, click the Preview icon.
    If the Preview icon is disabled, check the Issues list for unconnected stages and required properties that are not defined.
  2. In the Preview Configuration dialog box, configure the following properties, then click Run Preview.
    Preview Property - Description
    Preview Source - Source data for the preview:
    • Configured Source - Provides data from the origin system.
    • Snapshot Data - Uses available snapshot data.
    Preview Batch Size - Number of records to use in the preview. Honors values up to the Data Collector preview batch size. Default is 10. The default Data Collector preview batch size is also 10.
    Preview Timeout - Milliseconds to wait for preview data. Use to limit the time data preview waits for data to arrive at the origin. Relevant for transient origins only.
    Write to Destinations and Executors - Determines whether the preview passes data to destinations or executors. By default, does not pass data to destination or executor stages.
    Execute Pipeline Lifecycle Events - Triggers the generation of any appropriate pipeline events, typically the Start event. If the event is configured to be used, event consumption is also triggered.
    Show Record/Field Header - Displays record header attributes and field attributes when in List view. Attributes do not display in Table view.
    Show Field Type - Displays the data type for fields in List view. Field types do not display in Table view.
    Snapshot Data - When using a snapshot for source data, select the snapshot to use.
    Remember the Configuration - Stores the current preview configuration for use every time you request a preview for this pipeline. After you run data preview, you can change this option in the Preview panel by clicking the Preview Configuration icon and clearing the option. The change takes effect the next time you run data preview.

    The Preview panel highlights the origin stage and displays preview data in table view. Since this is the origin of the pipeline, no input data displays.

    To view preview data in list view, click the List View icon.

  3. To delete a record that you do not want to use, click the Delete icon.
  4. To preview multiple stages, click Multiple.
    The Preview panel displays two lists of stages.
  5. From the list on the left, select the first stage to use.
  6. In the list on the right, select the last stage to use.
    The Preview panel displays the input and output data for the group of stages. You can review the details of each record.
  7. To refresh the preview, click the Refresh Preview icon.
    Based on the origin, refreshing the preview either provides a new set of data or reverts any changes to the existing data.
  8. To return to previewing a single stage, click Single.
  9. To exit data preview and return to pipeline configuration, click Close Preview.

Editing Preview Data

You can edit preview data to view how a stage or group of stages processes the changed data. Edit preview data to test for data conditions that might not appear in the preview data set.

For example, when the stage filters integer data based on an expression, you might change the input data to test positive and negative integer values, as well as zero.
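
The following Python sketch mirrors that example. It assumes a hypothetical filter that keeps records whose integer field is greater than zero, and it feeds the filter the three boundary cases that you might type into the preview data by hand. The keep_positive function and the amount field are illustrative names only.

```python
def keep_positive(records, field="amount"):
    """Hypothetical filter stage: pass records whose integer field is above zero."""
    return [r for r in records if r[field] > 0]


# Edited input data covering a negative value, zero, and a positive value.
edited_input = [{"amount": -5}, {"amount": 0}, {"amount": 7}]

print(keep_positive(edited_input))
# [{'amount': 7}]
```

Zero is the case most likely to expose a mistake, because it separates a "greater than" condition from a "greater than or equal to" condition.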

You can edit preview data in the following locations:
  • The output data column for an origin.
  • The input data column for processors.

When you edit preview data, you can pass the changed data through the pipeline, or you can revert your changes to return to the original data.

  1. To change field values, in the Output Data column of an origin or the Input Data column for all other stages, click the value that you want to change and enter a new value.
    You can edit values for any input data.
  2. To process changed data, click the Run With Changes icon.
    This runs the data preview with the current set of data and stage configuration.
    In the Input Data column, records with changed values display with an asterisk and the changed values are highlighted. The Output Data column displays the results of processing. You can change and process preview data as often as necessary.
  3. To refresh the preview, click the Refresh Preview icon.
    Based on the origin, refreshing the preview either provides a new set of data or reverts any changes to the existing data.
  4. To revert changes to data, click the Revert Data Changes icon.

Editing Properties

In data preview, you can edit stage properties to see how the changes affect preview data. For example, you might edit the expression in an Expression Evaluator to see how the expression alters data.

When you edit properties, you can test the change with the existing preview data or you can refresh the preview data.

When you change properties in the origin, refresh the preview data to test your changes. Refreshing the preview data allows Data Collector to process preview data with the latest origin properties instead of using cached data.
Note: Unlike changes to data, you cannot automatically revert property changes. Manually revert any changes that you do not want to preserve.
  1. To edit stage properties while in data preview, select the stage you want to edit and click the Stage Configuration icon.
  2. Change properties as needed.
  3. To test properties changed in the origin, click the Refresh Preview icon.
    This refreshes the preview data. Based on the origin type, it might use the same data or a new set of data with the updated origin properties.
    To test properties in any non-origin stage using the same set of data, click the Run With Changes icon.
  4. If you want to revert your change, manually change the property back.
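
The difference between refreshing the preview and running with changes can be pictured with a small caching model. The following Python sketch is only an illustration of that idea. The PreviewSession class and its methods are hypothetical and do not describe how Data Collector is implemented.

```python
class PreviewSession:
    """Hypothetical model of cached preview data; not Data Collector code."""

    def __init__(self, read_origin, processor):
        self.read_origin = read_origin  # callable that returns a fresh list of records
        self.processor = processor      # callable applied to the cached records
        self.cached = read_origin()     # origin output captured when preview starts

    def run_with_changes(self):
        # Reuses the cached origin output, so origin property edits are not visible.
        return self.processor(self.cached)

    def refresh(self):
        # Reads the origin again, so edited origin properties take effect.
        self.cached = self.read_origin()
        return self.processor(self.cached)


origin_config = {"prefix": "a"}
session = PreviewSession(
    read_origin=lambda: [origin_config["prefix"] + "1"],
    processor=lambda recs: [r.upper() for r in recs],
)

origin_config["prefix"] = "b"      # edit an origin property
print(session.run_with_changes())  # ['A1']
print(session.refresh())           # ['B1']
```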