Orchestration Pipelines

Orchestration Pipeline Overview

An orchestration pipeline is a Data Collector pipeline that uses orchestrator stages to schedule and coordinate tasks across systems, arranging those tasks into complete workflows.

For example, you might have a workflow that involves periodically reading data from several origins and writing data to different destinations. The destinations might depend on the data read. An orchestration pipeline can complete the entire workflow. Based on a schedule, the orchestration pipeline can periodically start pipelines that read from several origins in parallel. Based on the data read, the orchestration pipeline can start subsequent pipelines that write data to the desired destinations.

Orchestrator Stages

Orchestration pipelines use orchestrator stages to schedule and arrange tasks that complete workflows.

The following stages are orchestrator stages:
  • Cron Scheduler origin - Generates a record periodically, based on a schedule. Use to schedule tasks.
  • Start Pipeline origin - Starts one or more pipelines with a StreamSets execution engine when the orchestration pipeline starts.
  • Control Hub API processor - Sends requests to a Control Hub RESTful API upon receipt of a record from the orchestration pipeline and writes data from the response to a specified output field.
  • Start Job processor - Starts one or more Control Hub jobs or job instances from a job template upon receipt of a record from the orchestration pipeline.
  • Start Pipeline processor - Starts one or more pipelines with a StreamSets execution engine upon receipt of a record from the orchestration pipeline.
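The stages above share a record-driven model: an origin emits a record on a trigger, and each processor acts when it receives that record and writes results back into it. The following Python sketch models that flow under stated assumptions; the classes, functions, and field names are illustrative inventions, not actual StreamSets record fields or APIs.

```python
import time
from dataclasses import dataclass, field

# Hypothetical model of an orchestration record passed between stages.
# Field names here are illustrative assumptions, not real StreamSets fields.
@dataclass
class OrchestrationRecord:
    fields: dict = field(default_factory=dict)

def cron_scheduler_origin():
    """Emit a trigger record, as a Cron Scheduler origin does on schedule."""
    return OrchestrationRecord(fields={"triggered_at": time.time()})

def start_pipeline_processor(record, pipeline_ids):
    """On receipt of a record, start pipelines and note them in the record."""
    record.fields["started_pipelines"] = list(pipeline_ids)
    return record

# A trigger record flows from the origin into the processor.
record = start_pipeline_processor(cron_scheduler_origin(), ["P1", "P2"])
```

Each downstream stage can then read the fields written by earlier stages to decide what to do next.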
The orchestrator stages have features that support orchestration:
  • To pass information between tasks in an orchestration pipeline, orchestrator stages that start jobs or pipelines can pass runtime parameters to the started jobs or pipelines.
  • To control the order of tasks in an orchestration pipeline, you configure when orchestrator stages that start jobs or pipelines send the record to the next stage. If you configure the orchestrator stage to run started jobs or pipelines in the background, the stage sends the record to the next stage immediately after starting them. Otherwise, the stage sends the record only after the started jobs or pipelines finish.
  • When starting multiple jobs or pipelines, the orchestrator stages start those jobs or pipelines in parallel.
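The parallel-start and background-versus-wait behavior described above can be sketched in Python with `concurrent.futures`. This is an illustrative model of the semantics, not StreamSets code; the function and parameter names are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor, wait

def start_tasks(tasks, run_in_background, pool):
    """Start all tasks in parallel, then either return immediately
    (background mode) or block until every task finishes."""
    futures = [pool.submit(t) for t in tasks]  # start tasks in parallel
    if run_in_background:
        return futures  # pass the record downstream immediately
    wait(futures)       # wait for every started task to finish first
    return futures

with ThreadPoolExecutor() as pool:
    # run_in_background=False: the caller blocks until P1 and P2 finish,
    # mirroring an orchestrator stage that does not run tasks in the background.
    done = start_tasks([lambda: "P1 done", lambda: "P2 done"], False, pool)
    results = [f.result() for f in done]
```

With `run_in_background=True`, the caller would receive the futures at once and continue, just as the record moves to the next stage while the started jobs or pipelines keep running.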

Sample Pipelines

You can use the orchestrator stages to develop orchestration pipelines that schedule and arrange tasks to complete workflows.

Recurring Task that Runs Multiple Dependent Pipelines

With the orchestrator stages, you can design an orchestration pipeline that triggers a set of tasks daily and starts dependent pipelines based on the success of an earlier pipeline. For example, an orchestration pipeline might run three separate but dependent pipelines: P1, P2, and P3. The orchestration pipeline runs P1 and P2 in parallel, then starts P3 only after both P1 and P2 finish and only if P2 succeeds. The following image shows the orchestration pipeline.

The Cron Scheduler origin triggers a data flow through the orchestration pipeline each day. When triggered, the Start Pipeline processor starts the P1 and P2 pipelines in parallel, but does not run the pipelines in the background. Instead, the processor waits until both pipelines finish, and then updates and passes the record to a Stream Selector processor, which checks for the success of P2. If P2 succeeds, the processor sends the record to start P3; otherwise, the processor sends the record for error handling. The Start Pipeline processor runs P3 in the background. Therefore, the record immediately passes to the Email executor, which sends an email message announcing the start of P3.
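The control flow of this sample pipeline can be sketched as plain Python, again as an illustrative model rather than StreamSets code; the pipeline functions and return values are hypothetical stand-ins.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for the P1, P2, and P3 pipelines.
def run_p1(): return True
def run_p2(): return True   # returns whether the pipeline succeeded
def run_p3(): return "P3 running"

def orchestrate(pool):
    # Start Pipeline processor: run P1 and P2 in parallel...
    f1 = pool.submit(run_p1)
    f2 = pool.submit(run_p2)
    # ...and wait for both to finish (not run in the background).
    f1.result()
    p2_succeeded = f2.result()
    # Stream Selector processor: route on the success of P2.
    if p2_succeeded:
        pool.submit(run_p3)           # run P3 in the background
        return "email: P3 started"    # record moves on to the Email executor
    return "error handling"

with ThreadPoolExecutor() as pool:
    outcome = orchestrate(pool)
```

Because P3 is submitted without waiting on its result, the "email" branch runs immediately after P3 starts, matching the behavior of the Email executor in the sample pipeline.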

To run the orchestration pipeline manually rather than automatically, replace the Cron Scheduler origin with a Start Pipeline origin configured to start P1 and P2 in parallel.