Design in Control Hub

You can design pipelines and pipeline fragments in Control Hub using the Control Hub Pipeline Designer. You can use Pipeline Designer to develop pipelines and fragments for Data Collector or Data Collector Edge.

Pipeline Designer enables you to configure pipelines, preview data, and publish pipelines. You can also design and publish pipeline fragments.

You can create new pipelines or edit previously published pipelines. When you create a pipeline in Pipeline Designer, you can start with a blank canvas or with a pipeline template. Pipeline Designer provides several system pipeline templates. You can also create user-defined pipeline templates. Use the Pipelines view to create a new pipeline or to access existing pipelines in the pipeline repository.

You can also create fragments or edit previously published fragments. When you create a fragment in Pipeline Designer, you start with a blank canvas. Use the Pipeline Fragments view to create a new fragment or access existing fragments.

When you configure a pipeline or pipeline fragment in Pipeline Designer, you specify the authoring Data Collector to use. Pipeline Designer displays stages, stage libraries, and functionality based on the selected authoring Data Collector.

For more information about using Pipeline Designer, see Pipeline Designer UI and Pipeline Designer Tips.

Authoring Data Collector

When you create or edit a pipeline or pipeline fragment in Pipeline Designer, you select the authoring Data Collector to use. You can choose the system Data Collector or any Data Collector registered with your Control Hub organization that meets all of the requirements.

Choose an authoring Data Collector that is the same version as the execution Data Collectors that you intend to use to run the pipeline. Using a different Data Collector version can result in developing a pipeline that is invalid for execution Data Collectors.

For example, if the authoring Data Collector is a more recent version than the execution Data Collector, pipelines might include a stage, stage library, or stage functionality that does not exist in the execution Data Collector.
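The version rule above can be sketched as a small comparison. This is a minimal illustration in Python, not part of Control Hub: the function names and the result labels are hypothetical, and it assumes version strings are dot-separated integers such as 3.5.0 or 3.2.0.0.

```python
def parse_version(text):
    """Turn a dotted version string such as '3.5.0' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def compare_versions(a, b):
    """Return -1, 0, or 1 after padding the shorter version with zeros."""
    va, vb = parse_version(a), parse_version(b)
    width = max(len(va), len(vb))
    va = va + (0,) * (width - len(va))
    vb = vb + (0,) * (width - len(vb))
    return (va > vb) - (va < vb)


def authoring_match(authoring, execution):
    """Classify an authoring/execution version pair (hypothetical labels)."""
    order = compare_versions(authoring, execution)
    if order == 0:
        return "match"             # recommended: same version on both sides
    if order > 0:
        return "authoring-newer"   # pipeline may use stages the executor lacks
    return "authoring-older"       # still a mismatch, so validate before running


print(authoring_match("3.6.0.0", "3.5.0.0"))
```

Any result other than "match" signals that the pipeline should be checked against the execution Data Collector before it is used in a job.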

Select one of the following types of Data Collectors to use as the authoring Data Collector:
System Data Collector

The system Data Collector is provided with Control Hub for exploration and light development. The system Data Collector is accessible to all users.

Use the system Data Collector to design pipelines only. It cannot be used to perform data preview or explicit pipeline validation.

For more information about how the system Data Collector works as an authoring Data Collector, see System Data Collector.
Registered Data Collector
You can select a registered Data Collector that meets all of the following requirements:
  • The Data Collector is version 3.0.0.0 or later. StreamSets recommends using the latest version of Data Collector.

    To design pipeline fragments, the minimum supported Data Collector version is 3.2.0.0. To use Data Protector, the minimum supported Data Collector version is 3.5.0.

  • The Data Collector uses the HTTPS protocol because Control Hub also uses the HTTPS protocol.
    Note: StreamSets recommends using a certificate signed by a certifying authority for a Data Collector that uses the HTTPS protocol. If you use a self-signed certificate, you must first use a browser to access the Data Collector URL and accept the web browser warning message about the self-signed certificate before users can select the Data Collector as the authoring Data Collector.
  • The Data Collector uses a publicly accessible URL.
For more information about how a registered Data Collector works as an authoring Data Collector, see Registered Data Collector.
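
The requirements for a registered Data Collector can be expressed as a simple checklist. The following Python sketch is illustrative only: the function name, its parameters, and the example URLs are hypothetical and are not a Control Hub API. It checks the minimum versions and the HTTPS requirement listed above. Whether a URL is publicly accessible depends on your network, so that requirement is not checked here.

```python
from urllib.parse import urlparse

# Minimum versions taken from the requirements above, padded to four parts.
MIN_PIPELINE = (3, 0, 0, 0)
MIN_FRAGMENT = (3, 2, 0, 0)
MIN_DATA_PROTECTOR = (3, 5, 0, 0)


def version_tuple(text, width=4):
    """Parse '3.5.0' into (3, 5, 0, 0) so versions compare part by part."""
    parts = [int(p) for p in text.split(".")]
    parts += [0] * (width - len(parts))
    return tuple(parts)


def eligibility_problems(version, url, for_fragments=False,
                         with_data_protector=False):
    """Return the list of unmet requirements (empty if none are found)."""
    required = MIN_PIPELINE
    if for_fragments:
        required = max(required, MIN_FRAGMENT)
    if with_data_protector:
        required = max(required, MIN_DATA_PROTECTOR)

    problems = []
    if version_tuple(version) < required:
        problems.append("version below " + ".".join(map(str, required)))
    if urlparse(url).scheme != "https":
        problems.append("URL does not use HTTPS")
    return problems


# Hypothetical Data Collector: old version and plain HTTP.
print(eligibility_problems("3.1.0.0", "http://sdc.example.com:18630",
                           for_fragments=True))
```

An empty list means the sketch found no problem with the version or protocol. It does not guarantee that the Data Collector can be selected.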

For example, the following Select an Authoring Data Collector window displays the system Data Collector and two registered Data Collectors as choices for the authoring Data Collector. Notice how the second registered Data Collector listed in this image is not accessible and thus cannot be selected because it uses the HTTP protocol:

When you edit a pipeline or fragment in Pipeline Designer, you can change the authoring Data Collector. Click the Authoring Data Collector icon in the top right corner of Pipeline Designer to view which Data Collector is being used and to optionally change the authoring Data Collector.

For example, the following image shows a pipeline that is currently using the system Data Collector:

Creating a New Pipeline

You can use Pipeline Designer to create new pipelines.

Pipeline Designer provides Data Collector and SDC Edge system pipeline templates. You can also create user-defined pipeline templates. You can review the pipeline templates to learn how you might develop a similar pipeline, or you might use the templates as a starting point for pipeline development.

When you create a pipeline, you specify the type of pipeline to create (Data Collector or SDC Edge), whether to start from a blank canvas or from a template, and the authoring Data Collector to use. You can change the authoring Data Collector used during development.

  1. To create a pipeline, click Pipeline Repository > Pipelines to access the Pipelines view.
  2. Click the Add icon.
  3. Enter the name and optional description.
  4. Select the type of pipeline to create: Data Collector or Data Collector Edge.
  5. Select how you want to create the pipeline, and then click Next.
    • Blank Pipeline - Use a blank canvas for pipeline development.
    • Pipeline Template - Use an existing template as the basis for pipeline development.
  6. If you selected Pipeline Template, in the Select a Pipeline Template dialog box, filter by the template type, select the template to use, then click Next.
  7. In the Select an Authoring Data Collector dialog box, select the authoring Data Collector to use, then click Create.
    Pipeline Designer opens a blank canvas or the selected pipeline template.
If needed, you can change the authoring Data Collector for the pipeline by clicking the Authoring Data Collector icon in the top right corner of Pipeline Designer.

Creating a New Fragment

You can use Pipeline Designer to create new pipeline fragments.

When you create a pipeline fragment, you specify the execution engine for the fragment, Data Collector or SDC Edge, and select the authoring Data Collector to use.

  1. To create a pipeline fragment, click Pipeline Repository > Pipeline Fragments to access the Pipeline Fragments view.
  2. Click the Add icon.
  3. Enter the name and optional description.
  4. Select the execution engine to use: Data Collector or Data Collector Edge.
  5. In the Select an Authoring Data Collector dialog box, select the authoring Data Collector to use, then click Create.
    Pipeline Designer opens a blank canvas.
If needed, you can change the authoring Data Collector for the fragment by clicking the Authoring Data Collector icon in the top right corner of Pipeline Designer.

Running a Test of a Draft Pipeline

As you design a pipeline, you can perform a test run of the draft pipeline in Pipeline Designer. Perform a test run of a draft pipeline to quickly test the pipeline logic.

You can perform a test run of a draft version of a fully configured pipeline. The Test Run menu becomes active when a draft pipeline is complete.

You cannot perform a test run of a published pipeline version. To run a published pipeline version, you must first create a job for the published pipeline version and then start the job.

  1. While viewing a draft version of a completed pipeline in Pipeline Designer, click Test Run in the top right corner of the toolbar, and then select one of the following options:
    • Start Pipeline - Start a test run of the pipeline.
    • Reset Origin and Start - Reset the origin and then start a test run of the pipeline.
    • Start with Parameters - Specify the parameter values to use and then start a test run of the pipeline.
    Monitor the test run of the pipeline, including viewing real-time statistics and error information.
  2. When you've finished monitoring the test run, click the Stop Test Run icon.
    You can continue designing the pipeline and performing additional test runs until you decide that the pipeline is complete, and then publish the pipeline.
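
The rules for when a test run is available can be summarized in a few lines. This is a hypothetical Python sketch of the decision described above, not Control Hub code. The function name and the returned phrases are illustrative.

```python
def how_to_run(is_draft, is_complete):
    """Describe how a pipeline version can be run, per the rules above."""
    if not is_draft:
        # Published versions cannot be test run; they run through jobs.
        return "create a job for the published version, then start the job"
    if is_complete:
        # Test Run becomes active only for a complete draft.
        return "test run"
    return "finish configuring the draft first"


print(how_to_run(is_draft=True, is_complete=True))
```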

Publishing a Fragment or Pipeline

You can publish fragments and pipelines that are designed or updated in Pipeline Designer.

Publish a fragment to use the fragment in a pipeline. Pipelines can only use published fragments.

Publish a pipeline to create a job that runs the pipeline, to use the pipeline as a template, or to retain the published pipeline version for historical reference. You can only use published pipelines in jobs or as templates.

You can only publish valid pipelines. Pipeline Designer performs explicit validation before publishing a pipeline. Because the system Data Collector cannot perform explicit validation, the authoring Data Collector for the pipeline must be a registered Data Collector. For more information, see Authoring Data Collector.

  1. While viewing a fragment or pipeline in edit mode, click the Publish icon.

    The Publish window appears.

  2. Enter a commit message.

    As a best practice, state what changed in this version so that you can track the commit history of the fragment or pipeline.

  3. Click Publish.
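
As a summary of how the type of authoring Data Collector affects what you can do, the following Python sketch encodes the restrictions described in this section. The action names and functions are hypothetical. The entries for the registered Data Collector are inferred from the restrictions stated for the system Data Collector and from the publishing requirement, and they assume that the registered Data Collector meets all of the listed requirements.

```python
# Actions available per authoring Data Collector type (illustrative names).
CAPABILITIES = {
    "system": frozenset({"design"}),
    "registered": frozenset(
        {"design", "preview", "validate", "publish_pipeline"}
    ),
}


def is_allowed(authoring_type, action):
    """Return True if the action is available with this authoring type."""
    return action in CAPABILITIES[authoring_type]


def blocked_actions(authoring_type):
    """List the actions that require switching to another authoring type."""
    all_actions = set().union(*CAPABILITIES.values())
    return sorted(all_actions - CAPABILITIES[authoring_type])


print(blocked_actions("system"))
```

If an action is blocked, change the authoring Data Collector for the pipeline before retrying it.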