Cron Scheduler

Supported pipeline types:
  • Data Collector

The Cron Scheduler origin generates records periodically based on a schedule. Use the origin to schedule tasks by triggering downstream stages in a pipeline.

The Cron Scheduler origin is an orchestration stage that you use in orchestration pipelines. Orchestration stages perform tasks, such as scheduling and starting pipelines and Control Hub jobs, that you can use to create an orchestrated workflow across the StreamSets platform. For example, an orchestration pipeline can use the Cron Scheduler origin to generate a record every Monday at 6 AM to trigger the Start Pipelines processor, which starts a pipeline that loads data from the previous week and generates a report.

When you configure the Cron Scheduler origin, you define the schedule for generating records. You specify the schedule with a cron expression and a time zone for that expression. At the scheduled time, the origin generates a record and passes it to the next stage in the orchestration pipeline.

Cron Expression

The origin uses the UNIX cron expression from the Cron Schedule property to determine when to generate records. You specify a time zone for the expression in the Time Zone property.

A cron expression is a string with six or seven fields separated by white space. For example, the following cron expression generates a record on the first day of every month at 9 AM:
0 0 9 1 1/1 ? *
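To make the field layout concrete, the following minimal Python sketch splits an expression into named fields. The field order follows the order that this section lists for manually entered expressions. The helper name `split_cron_expression` and the dictionary keys are hypothetical and are not part of Data Collector.

```python
# Field names in the order used by this section; the names are illustrative.
FIELD_NAMES = [
    "seconds", "minutes", "hours",
    "day_of_month", "month", "day_of_week", "year",
]

def split_cron_expression(expression):
    """Split a six- or seven-field cron expression into a dict of named fields."""
    parts = expression.split()  # fields are separated by white space
    if len(parts) not in (6, 7):
        raise ValueError(f"expected 6 or 7 fields, got {len(parts)}")
    # zip stops at the shorter sequence, so an omitted year is simply left out.
    return dict(zip(FIELD_NAMES, parts))

fields = split_cron_expression("0 0 9 1 1/1 ? *")
print(fields)
```

Read this way, the example expression has 9 in the hours field and 1 in the day-of-month field, which matches the description of 9 AM on the first day of every month.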
The UI can create the expression based on your specifications or you can enter an expression in the cron expression syntax:
UI-created expressions
Select the tab that matches the desired frequency for record generation, and then indicate when you want records generated. The UI produces a valid cron expression based on your selections.

For example, the following image shows a configuration that generates records on the first day of every month at 9 AM in the Central time zone in the United States. The UI shows the read-only cron expression produced by the UI configuration.

Manually-entered expressions
Select the Advanced tab and enter a cron expression directly. The fields in the cron expression can contain any of the following allowed values:
Field          Mandatory   Allowed Values
Seconds        yes         0-59
Minutes        yes         0-59
Hours          yes         0-23
Day of month   yes         1-31
Month          yes         1-12 or JAN-DEC
Day of week    yes         1-7 or SUN-SAT
Year           no          empty, 1970-2099

Each field can also contain various combinations of special characters allowed in the field. For example, the asterisk (*) special character can be used in all fields to represent all values within the field.

For a list of the special characters allowed in each field and for example expressions, see the Quartz Scheduler documentation.
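As a rough illustration of the allowed values listed above, the following Python sketch checks only the fields that contain a plain integer. It is not the validation that the origin performs: names such as JAN or SUN and all special characters are skipped, and the function name `check_numeric_fields` is hypothetical.

```python
# Numeric ranges copied from the allowed-values table in this section.
RANGES = [
    ("Seconds", 0, 59),
    ("Minutes", 0, 59),
    ("Hours", 0, 23),
    ("Day of month", 1, 31),
    ("Month", 1, 12),
    ("Day of week", 1, 7),
    ("Year", 1970, 2099),
]

def check_numeric_fields(expression):
    """Return a list of problems found in fields that hold a plain integer."""
    parts = expression.split()
    if len(parts) not in (6, 7):
        return [f"expected 6 or 7 fields, got {len(parts)}"]
    problems = []
    for (name, low, high), value in zip(RANGES, parts):
        # Skip anything that is not a plain ASCII integer (for example *, ?, 1/1, JAN).
        if value.isascii() and value.isdigit() and not low <= int(value) <= high:
            problems.append(f"{name}: {value} is outside {low}-{high}")
    return problems

ok = check_numeric_fields("0 0 9 1 1/1 ? *")
bad = check_numeric_fields("0 0 24 1 * ?")
print(ok)
print(bad)
```

A check like this can catch simple typos before you paste an expression into the Advanced tab, but the Quartz Scheduler documentation remains the reference for the full syntax.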

Generated Record

The Cron Scheduler origin creates an orchestration record that includes a single timestamp field, which contains the datetime when the record was created.

For example, the following preview shows an orchestration record generated by a Cron Scheduler origin that is configured to generate a record at 1 AM every day:

Configuring a Cron Scheduler Origin

Configure a Cron Scheduler origin to generate records as scheduled by a cron expression. The Cron Scheduler origin is an orchestration stage that you use in orchestration pipelines.

  1. In the Properties panel, on the General tab, configure the following properties:
    General Property   Description
    Name               Stage name.
    Description        Optional description.
    On Record Error    Error record handling for the stage:
                       • Discard - Discards the record.
                       • Send to Error - Sends the record to the pipeline for error handling.
                       • Stop Pipeline - Stops the pipeline.
  2. On the Cron tab, configure the following properties:
    Cron Property   Description
    Cron Schedule   UNIX cron expression that specifies when to generate a record.

                    For more information about cron expressions, see the Quartz Scheduler documentation.

    Time Zone       Time zone for the schedule specified in the cron expression.
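To show why the Time Zone property matters, the following Python sketch works out the next "9 AM on the first day of the month" in two different zones and compares the results in UTC. It hard-codes that single schedule rather than parsing a cron expression, and it is not how the origin computes fire times. The function name is hypothetical, and the fixed UTC offsets are stand-ins that ignore daylight saving time. A `zoneinfo.ZoneInfo` object such as `ZoneInfo("America/Chicago")` could be passed instead where time zone data is installed.

```python
from datetime import datetime, timedelta, timezone

def next_first_of_month_9am(now_utc, zone):
    """Next 9 AM on the first day of a month, evaluated in the given tzinfo."""
    local_now = now_utc.astimezone(zone)
    candidate = local_now.replace(day=1, hour=9, minute=0, second=0, microsecond=0)
    if candidate <= local_now:
        # This month's occurrence has passed, so move to the following month.
        year = candidate.year + (1 if candidate.month == 12 else 0)
        month = candidate.month % 12 + 1
        candidate = candidate.replace(year=year, month=month)
    return candidate

# Fixed offsets used as illustrative stand-ins for real time zones.
us_central_standard = timezone(timedelta(hours=-6))
japan = timezone(timedelta(hours=9))

now = datetime(2024, 1, 15, 12, 0, tzinfo=timezone.utc)
central_fire = next_first_of_month_9am(now, us_central_standard)
japan_fire = next_first_of_month_9am(now, japan)

print(central_fire.astimezone(timezone.utc))
print(japan_fire.astimezone(timezone.utc))
```

The same schedule resolves to instants 15 hours apart in this example, so choose the time zone that matches when the downstream work is expected to run.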