Origins

An origin stage represents the source for the pipeline. You can use a single origin stage in a pipeline.

You can use the following origins in a pipeline:
To help create or test pipelines, you can use the following development origins:
  • Dev Data Generator
  • Dev Random Record Source
  • Dev Raw Data Source
  • Dev Snapshot Replaying

For more information, see Development Stages.

Batch Size and Wait Time

For origin stages, the batch size determines the maximum number of records sent through the pipeline at one time. The batch wait time determines the time that the origin waits for data before sending a batch. At the end of the wait time, it sends the batch regardless of how many records the batch contains.

For example, an Amazon S3 origin is configured for a batch size of 20 records and a batch wait time of 240 seconds. When data arrives quickly, the Amazon S3 origin fills a batch with 20 records and sends it through the pipeline immediately, then creates a new batch and sends that one as soon as it is full. As incoming data slows, the current batch holds only a few records and gains another record periodically. 240 seconds after creating the batch, the Amazon S3 origin sends the partially full batch through the pipeline. It immediately creates a new batch and starts a new countdown.
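The interplay of the two settings can be illustrated with a small simulation. This is only a sketch of the rules described above, not the origin's actual implementation: the function name `simulate_batches`, its parameters, and the use of plain timestamps in seconds are all assumptions made for illustration.

```python
from typing import List, Tuple


def simulate_batches(
    arrivals: List[Tuple[float, str]],
    batch_size: int,
    batch_wait_time: float,
    end_time: float,
) -> List[Tuple[float, List[str]]]:
    """Toy model of batch size and batch wait time (hypothetical helper).

    arrivals is a time-sorted list of (arrival_time_seconds, record).
    Returns a list of (send_time_seconds, batch) in the order batches are sent.
    """
    if batch_size < 1 or batch_wait_time <= 0:
        raise ValueError("batch_size must be >= 1 and batch_wait_time > 0")

    sent: List[Tuple[float, List[str]]] = []
    batch: List[str] = []
    started_at = 0.0  # time at which the current batch was created

    for arrival_time, record in arrivals:
        # Send any batches whose wait time ran out before this record arrived,
        # regardless of how many records they contain (possibly none).
        while arrival_time >= started_at + batch_wait_time:
            deadline = started_at + batch_wait_time
            sent.append((deadline, batch))
            batch, started_at = [], deadline

        batch.append(record)

        # A full batch is sent immediately and a new countdown starts.
        if len(batch) == batch_size:
            sent.append((arrival_time, batch))
            batch, started_at = [], arrival_time

    # After the last arrival, keep honoring the wait time until end_time.
    while started_at + batch_wait_time <= end_time:
        deadline = started_at + batch_wait_time
        sent.append((deadline, batch))
        batch, started_at = [], deadline

    return sent
```

With a batch size of 20 and a wait time of 240 seconds, 40 records arriving one per second followed by 5 slow records yield two full batches sent as soon as they fill and one 5-record batch sent when its countdown expires. With no incoming data at all, the model sends an empty batch at each deadline, which is the case the next paragraph refers to.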

Configure the batch wait time based on your processing needs. You might reduce the batch wait time to ensure all data is processed within a specified time frame or to make regular contact with pipeline destinations. Use the default or increase the wait time if you prefer not to process partial or empty batches.

Maximum Record Size

Most data formats have a property that limits the maximum size of the record that an origin can parse. For example, the delimited data format has a Max Record Length property, the JSON data format has Max Object Length, and the text data format has Max Line Length.

When the origin processes data that is larger than the specified length, the behavior differs based on the origin and the data format. For example, with some data formats, oversized records are handled based on the record error handling configured for the origin. With other data formats, the origin might truncate the data. For details on how an origin handles size overruns for each data format, see the "Data Formats" section of the origin documentation.
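The two behaviors mentioned above can be contrasted in a few lines. This is a simplified sketch, not how any particular origin is implemented: the function `apply_max_length`, its `truncate` flag, and the use of character counts for length are hypothetical choices made for illustration.

```python
from typing import List, Tuple


def apply_max_length(
    lines: List[str], max_length: int, truncate: bool
) -> Tuple[List[str], List[str]]:
    """Split input into (records, errors) under a maximum length (toy model).

    truncate=True  models formats where oversized data is cut to the limit.
    truncate=False models formats where oversized data is passed to record
    error handling instead of being emitted as a record.
    """
    records: List[str] = []
    errors: List[str] = []
    for line in lines:
        if len(line) <= max_length:
            records.append(line)
        elif truncate:
            records.append(line[:max_length])  # keep only the allowed prefix
        else:
            errors.append(line)  # hand the oversized input to error handling
    return records, errors
```

Which of the two paths applies in practice depends on the origin and data format, as documented in each origin's "Data Formats" section.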

When available, the maximum record size properties are limited by the maximum parser buffer size, which is 10670080 bytes.
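One way to read this limit is that a configured maximum above the parser buffer size has no additional effect. The sketch below expresses that reading; it is an interpretation, not confirmed product behavior. The helper name `effective_max_record_size` is hypothetical, and it treats the configured value and the buffer size as the same unit for simplicity.

```python
# Parser buffer size as stated in the text above, in bytes.
PARSER_BUFFER_LIMIT_BYTES = 10670080


def effective_max_record_size(
    configured: int, buffer_limit: int = PARSER_BUFFER_LIMIT_BYTES
) -> int:
    """Return the limit that would actually apply (assumed interpretation).

    The configured maximum is capped at the parser buffer size, so raising
    the property beyond the buffer size changes nothing in this model.
    """
    if configured <= 0:
        raise ValueError("configured maximum must be positive")
    return min(configured, buffer_limit)
```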