What is a Pipeline?

A pipeline describes the flow of data from the origin system to destination systems and defines how to transform the data along the way.

You can use a single origin stage to represent the origin system, multiple processor stages to transform data, and multiple destination stages to represent destination systems.
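To make that structure concrete, the sketch below models a pipeline as one origin, a chain of processors, and a set of destinations. It is a plain-Python illustration of the concept only, not Data Collector code; the stage functions and sample data are hypothetical.

```python
from typing import Callable, Iterable, List

Record = dict

def run_pipeline(origin: Callable[[], Iterable[Record]],
                 processors: List[Callable[[Record], Record]],
                 destinations: List[Callable[[Record], None]]) -> None:
    """Conceptual model: a single origin, multiple processors, multiple destinations."""
    for record in origin():              # the origin stage reads from the origin system
        for process in processors:      # each processor stage transforms the record
            record = process(record)
        for write in destinations:      # each destination stage writes to a destination system
            write(record)

# Hypothetical stages for illustration only.
origin = lambda: ({"value": i} for i in range(3))          # e.g. a directory or Kafka origin
processors = [lambda r: {**r, "doubled": r["value"] * 2}]  # e.g. an expression-style processor
destinations = [print]                                     # e.g. a local FS or HDFS destination

run_pipeline(origin, processors, destinations)
```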

When you start a pipeline, Data Collector runs it until you stop the pipeline or shut down Data Collector. A single Data Collector instance can run multiple pipelines.
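As a sketch of driving this from a script, the example below starts a pipeline and polls its status over Data Collector's REST API using the Python requests library. The host, credentials, pipeline ID, and the exact endpoint paths shown are assumptions; confirm them against the RESTful API reference bundled with your Data Collector version.

```python
import time
import requests

# Assumed values: adjust the host, credentials, and pipeline ID for your installation.
SDC_URL = "http://localhost:18630"
AUTH = ("admin", "admin")            # basic auth for illustration; your setup may differ
PIPELINE_ID = "MyPipelineId"         # hypothetical pipeline ID
HEADERS = {"X-Requested-By": "sdc"}  # header commonly required on Data Collector POST calls

# Start the pipeline (endpoint path assumed; see your version's REST API docs).
resp = requests.post(f"{SDC_URL}/rest/v1/pipeline/{PIPELINE_ID}/start",
                     auth=AUTH, headers=HEADERS)
resp.raise_for_status()

# Poll until the pipeline reports RUNNING.
while True:
    status = requests.get(f"{SDC_URL}/rest/v1/pipeline/{PIPELINE_ID}/status",
                          auth=AUTH, headers=HEADERS).json()
    print("status:", status.get("status"))
    if status.get("status") == "RUNNING":
        break
    time.sleep(2)
```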

While a pipeline runs, you can monitor it to verify that it performs as expected. You can also define metric and data rules and alerts that notify you when specified thresholds are reached.

You can add an event stream to a pipeline to enable event-driven task execution or to save event information. For more information, see Dataflow Triggers Overview.

To process large volumes of data from a Kafka cluster or HDFS, you can configure a pipeline to run in cluster execution mode. For more information, see Cluster Pipelines.