Batches

Data passes through a pipeline in batches.

The origin creates a batch as it reads data from the origin system, or as data arrives from it, and notes the offset. The offset is the location where the origin stops reading.

The origin sends the batch when the batch is full or when the batch wait time limit elapses. The batch moves through the pipeline from processor to processor until it reaches pipeline destinations.
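The two send conditions above can be sketched in a few lines of Python. This is only an illustration, not the product's implementation: `read_batch`, `next_record`, `max_batch_size`, `batch_wait_time`, and `clock` are hypothetical names chosen for the example, and `next_record` is assumed to return `None` when no data is available yet.

```python
import time
from typing import Callable, List, Optional


def read_batch(
    next_record: Callable[[], Optional[str]],
    max_batch_size: int,
    batch_wait_time: float,
    clock: Callable[[], float] = time.monotonic,
) -> List[str]:
    """Collect records until the batch is full or the wait time elapses."""
    batch: List[str] = []
    deadline = clock() + batch_wait_time
    while len(batch) < max_batch_size:
        if clock() >= deadline:
            break  # wait time elapsed: send a partial (possibly empty) batch
        record = next_record()  # assumption: None means "nothing available yet"
        if record is None:
            time.sleep(0.001)  # poll again shortly instead of spinning
            continue
        batch.append(record)
    return batch
```

With a fast source the function returns as soon as `max_batch_size` records are collected. With a slow source it returns whatever it has gathered when the wait time runs out.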

Destinations write the batch to destination systems, and the pipeline commits the offset internally. Based on the pipeline delivery guarantee, the pipeline commits the offset either as soon as it writes to any destination system or after it receives confirmation of the write from all destination systems. After the offset commit, the origin creates a new batch.
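To make the difference between the two commit points concrete, here is a minimal sketch. The `CommitPolicy` enum, `deliver_batch`, and `commit_offset` are hypothetical names for this example, not product settings or APIs. Each destination is modeled as a function that returns once its write is confirmed and raises an exception on failure.

```python
from enum import Enum
from typing import Callable, List, Sequence


class CommitPolicy(Enum):
    # Illustrative labels only; not the names of actual pipeline settings.
    AFTER_FIRST_WRITE = 1    # commit as soon as any destination is written
    AFTER_ALL_CONFIRMED = 2  # commit only after every destination confirms


def deliver_batch(
    batch: List[str],
    offset: int,
    destinations: Sequence[Callable[[List[str]], None]],
    commit_offset: Callable[[int], None],
    policy: CommitPolicy,
) -> None:
    """Write a batch to every destination and commit the offset per policy."""
    committed = False
    for write in destinations:
        write(batch)  # assumption: returns on confirmed write, raises on failure
        if policy is CommitPolicy.AFTER_FIRST_WRITE and not committed:
            commit_offset(offset)
            committed = True
    if policy is CommitPolicy.AFTER_ALL_CONFIRMED:
        commit_offset(offset)
```

If a later destination fails, the first policy has already committed the offset, so the batch is not read again. Under the second policy the offset stays uncommitted, so the batch can be read again after a restart.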

When you stop and then restart the pipeline, the origin can start reading from the last-saved offset or can start reading from the beginning.
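The restart choice can be pictured as follows. This sketch assumes the offset is an integer index of the first unread record. `resume_position` and the `reset_origin` flag are hypothetical names used only for illustration.

```python
from typing import Optional


def resume_position(saved_offset: Optional[int], reset_origin: bool = False) -> int:
    """Return the index at which the origin starts reading after a restart."""
    if reset_origin or saved_offset is None:
        return 0  # start from the beginning
    return saved_offset  # continue from the last-saved offset


records = ["r0", "r1", "r2", "r3", "r4"]
remaining = records[resume_position(3):]  # continues with "r3"
```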

Note: This describes general pipeline behavior. Behavior can differ based on the specific pipeline configuration.

Batches in Multithreaded Pipelines

The information above describes a standard single-threaded pipeline: the origin creates a batch, passes it through the pipeline, and creates a new batch only after the previous batch is processed.

Some origins can generate multiple threads to enable parallel processing in multithreaded pipelines. In a multithreaded pipeline, you configure the origin to create the number of threads or amount of concurrency that you want to use. The pipeline also creates a number of pipeline runners, based on the pipeline Max Runners property, to perform pipeline processing. Each thread connects to the origin system, creates a batch of data, and passes the batch to an available pipeline runner.

Each pipeline runner processes one batch at a time, just like a pipeline that runs on a single thread. When the flow of data slows, the pipeline runners wait idly until they are needed, generating an empty batch at regular intervals. You can configure the Runner Idle Time pipeline property to specify the interval or to opt out of empty batch generation.
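The hand-off between origin threads and pipeline runners can be modeled roughly as below. This is a simplified sketch, not the actual engine: `run_multithreaded`, `partitions`, and `process` are hypothetical names, the runner pool is represented by a queue of runner IDs, and empty batch generation for idle runners is not modeled.

```python
import queue
import threading
from typing import Callable, List, Sequence


def run_multithreaded(
    partitions: Sequence[Sequence[list]],
    max_runners: int,
    process: Callable[[int, list], object],
) -> List[object]:
    """Run one origin thread per partition, sharing a fixed pool of runners."""
    runners: "queue.Queue[int]" = queue.Queue()
    for runner_id in range(max_runners):
        runners.put(runner_id)  # every runner starts out available

    results: List[object] = []
    results_lock = threading.Lock()

    def origin_thread(batches: Sequence[list]) -> None:
        for batch in batches:
            runner_id = runners.get()  # block until a runner is available
            try:
                outcome = process(runner_id, batch)  # one batch per runner at a time
                with results_lock:
                    results.append(outcome)
            finally:
                runners.put(runner_id)  # return the runner to the pool

    threads = [threading.Thread(target=origin_thread, args=(p,)) for p in partitions]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

Because a runner ID is taken from the queue before processing and returned afterward, no more than `max_runners` batches are in flight at once, even when there are more origin threads than runners.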

All general references to pipelines in this guide describe single-threaded pipelines, but the information generally applies to multithreaded pipelines as well.