Designing the Data Flow

You can branch and merge streams of data in a pipeline.

Branching Streams

You can branch streams of data in a pipeline by connecting a stage to multiple downstream stages.

When you branch streams of data, all data passes to all connected stages. You can configure required fields for a stage to discard records before they enter the stage, but by default all records are passed.

For example, the following pipeline includes two branches. All of the data from the MySQL Query Consumer origin passes to both branches of the pipeline for different types of processing. However, you can optionally configure required fields for the Field Splitter processor or the Field Replacer processor to discard any records that the stage does not need.
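The branching behavior described above can be illustrated with a minimal Python sketch. This is a plain simulation of the semantics, not pipeline configuration or a product API. The function names, the stage labels, and the sample records are hypothetical, and the sketch assumes that a required field is satisfied when the field exists in the record.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def has_required_fields(record: Record, required: List[str]) -> bool:
    """Assumption: a record qualifies when every required field is present."""
    return all(field in record for field in required)


def branch(records: List[Record],
           required_by_stage: Dict[str, List[str]]) -> Dict[str, List[Record]]:
    """Pass a copy of every record to every connected stage.

    A stage with required fields only receives the records that include
    all of those fields. A stage with no required fields receives everything.
    """
    streams: Dict[str, List[Record]] = {}
    for stage, required in required_by_stage.items():
        streams[stage] = [
            dict(record)  # each branch works on its own copy
            for record in records
            if has_required_fields(record, required)
        ]
    return streams


# Hypothetical records read by the origin.
origin_records = [
    {"id": 1, "name": "Ada Lovelace", "status": "active"},
    {"id": 2, "status": "inactive"},  # no "name" field
]

streams = branch(
    origin_records,
    {
        "field_splitter": ["name"],  # requires "name", so record 2 is discarded
        "field_replacer": [],        # no required fields, so all records pass
    },
)

for stage, stage_records in streams.items():
    print(stage, [record["id"] for record in stage_records])
```

Without required fields, both stages would receive both records. With them, each stage filters its own input and the other branch is unaffected.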

To route data based on more complex conditions, use a Stream Selector processor.

Some stages generate events that pass to event streams. Event streams create another branch in the pipeline. Event streams originate from an event-generating stage, such as an origin or destination, and pass from the stage through an event stream output, as follows:

For more information about the event framework and event streams, see Overview.

Merging Streams

You can merge streams of data in a pipeline by connecting two or more stages to the same downstream stage. When you merge streams of data, the pipeline channels the data from all streams to the same stage, but does not join the records in the streams.

For example, in the following pipeline, the Stream Selector processor sends data with null values to the Field Replacer processor:

The data from the Stream Selector default stream and all data from the Field Replacer processor pass to the Expression Evaluator processor for further processing, but in no particular order and with no record merging.
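As a rough illustration of this merge, the following Python sketch simulates the three stages with ordinary functions. It is not product code: the function names, the "unknown" replacement value, and the sample records are hypothetical. The sketch concatenates the two streams for simplicity, but a real pipeline makes no ordering guarantee, so nothing downstream should rely on the order shown here.

```python
from typing import Any, Dict, List, Tuple

Record = Dict[str, Any]


def stream_selector(records: List[Record]) -> Tuple[List[Record], List[Record]]:
    """Route records that contain a null value to one stream, the rest to the default stream."""
    null_stream: List[Record] = []
    default_stream: List[Record] = []
    for record in records:
        if any(value is None for value in record.values()):
            null_stream.append(record)
        else:
            default_stream.append(record)
    return null_stream, default_stream


def field_replacer(records: List[Record], replacement: Any = "unknown") -> List[Record]:
    """Replace null values with a placeholder (hypothetical replacement value)."""
    return [
        {name: (replacement if value is None else value) for name, value in record.items()}
        for record in records
    ]


def merge(*streams: List[Record]) -> List[Record]:
    """Channel all streams into one list. Records are passed through, never joined."""
    merged: List[Record] = []
    for stream in streams:
        merged.extend(stream)
    return merged


records = [
    {"id": 1, "city": "Paris"},
    {"id": 2, "city": None},
    {"id": 3, "city": "Lima"},
]

null_stream, default_stream = stream_selector(records)
replaced = field_replacer(null_stream)

# Input to the downstream stage: same number of records as the origin produced.
merged = merge(default_stream, replaced)
print(len(merged), merged)
```

The merged stream contains one record for each input record. No fields are combined across records, which is the difference between merging streams and joining data.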

Important: Pipeline validation does not prevent duplicate data. To avoid writing duplicate data to destinations, configure the pipeline logic to remove duplicate data or to prevent the generation of duplicate data.
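To see how duplicates can arise, consider a stream that is branched to two stages and then merged again: every record reaches the downstream stage twice. The sketch below simulates that case in plain Python and removes the duplicates by key. It is an illustration only. The "id" key field, the function name, and the sample records are assumptions, and the right key for deduplication depends on your data.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def deduplicate(records: List[Record], key_fields: List[str]) -> List[Record]:
    """Keep the first record seen for each combination of key field values."""
    seen = set()
    unique: List[Record] = []
    for record in records:
        key = tuple(record.get(field) for field in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


records = [
    {"id": 1, "value": "a"},
    {"id": 2, "value": "b"},
]

# Branching passes all records to both stages.
branch_a = [dict(record) for record in records]
branch_b = [dict(record) for record in records]

# Merging both branches again delivers every record twice.
merged = branch_a + branch_b
print(len(merged))

# Hypothetical fix: deduplicate on the "id" field before writing.
unique = deduplicate(merged, ["id"])
print(len(unique))
```

An alternative is to prevent the duplicates from being generated, for example by routing each record to only one of the branches before the streams merge.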

Note that you cannot merge event streams with data streams. Event records must stream from the event-generating stage to destinations or executors without merging with data streams. For more information about the event framework and event streams, see Overview.