Scheduled Task Types

A scheduled task periodically triggers an action on one of the following task types:
  • Job
  • Report

Jobs

Use the scheduler to schedule jobs to start or stop on a regular basis.

When you define a scheduled task for a job, you specify whether the task starts or stops the job. A single scheduled task cannot both start and stop the job.

Before scheduling a job, consider whether the job is a batch job or a streaming job:
Batch job
A batch job includes a pipeline that processes all available data, and then stops. Create schedules for batch jobs to start the jobs on a regular basis.
For example, suppose that your dataflow topology updates a database table daily at 4:00 am. Rather than have the pipeline process the data in a few minutes and then sit idle for the rest of the day, you want to kick off the pipeline, have it process all data, and then stop, just like traditional batch processing. To do this, you use the Pipeline Finisher executor in the pipeline to stop the pipeline when all data is processed.
You add the pipeline to a job and schedule the job to run daily at 4:00 am. The scheduler starts the job daily at the specified time. After the remote pipeline instance transitions to a finished state, the job also transitions to an inactive state. The next day, the scheduler starts the job again so that the pipeline can process the new set of data.
When you schedule batch jobs, you typically schedule them as recurring events.
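The recurring daily start described above can be illustrated with a small calculation of the next trigger time. This is only a sketch of the scheduling arithmetic, not the scheduler's actual implementation; the function name next_daily_run and its parameters are hypothetical.

```python
from datetime import datetime, time, timedelta


def next_daily_run(now: datetime, at: time = time(4, 0)) -> datetime:
    """Return the first occurrence of `at` strictly after `now`.

    Hypothetical helper: models a task that recurs daily at 4:00 am.
    """
    candidate = datetime.combine(now.date(), at)
    if candidate <= now:
        # Today's slot has already passed, so the next run is tomorrow.
        candidate += timedelta(days=1)
    return candidate
```

For instance, a call made at 3:00 am returns 4:00 am on the same day, while a call made at or after 4:00 am returns 4:00 am on the following day.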
Streaming job

A streaming job includes a pipeline that maintains a connection to the origin system and processes data as it becomes available. Because you expect data to arrive continuously, the pipeline runs until you manually stop it. In most cases, there's no need to schedule streaming jobs.

However, you might want to schedule a streaming job so that the job initially starts at some point in the future. For example, you might schedule a job to initially start next Saturday at midnight, when no DevOps engineer is available to start the job manually.

In this case, you would schedule the start of the streaming job as a one-time event.

Or, you might want to schedule a streaming job to start and stop on a regular basis. For example, you want to run a streaming job continuously every day of the week except for Sunday. You create one scheduled task that starts the job every Monday at 12:00 am. Then, you create another scheduled task that stops the same job every Sunday at 12:00 am. The next Monday at 12:00 am, the scheduler starts the job again so that the pipeline can continue running.

In this case, you would schedule both the start and the stop of the streaming job as recurring events.
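
To see how a paired start task and stop task combine, the following sketch works out the state a job is expected to be in at a given moment, based on whichever weekly task fired most recently. It is a minimal model under stated assumptions: the WeeklyTask class and the expected_state function are hypothetical names, and weekdays follow Python's datetime.weekday() numbering (Monday is 0, Sunday is 6).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class WeeklyTask:
    """Hypothetical model of a scheduled task that recurs weekly."""
    action: str       # "start" or "stop"; one task performs only one action
    weekday: int      # Monday=0 ... Sunday=6
    hour: int = 0
    minute: int = 0

    def last_fired(self, now: datetime) -> datetime:
        """Return the most recent firing time at or before `now`."""
        days_back = (now.weekday() - self.weekday) % 7
        fired = (now - timedelta(days=days_back)).replace(
            hour=self.hour, minute=self.minute, second=0, microsecond=0
        )
        if fired > now:
            # Same weekday, but the time of day has not been reached yet.
            fired -= timedelta(days=7)
        return fired


def expected_state(tasks, now: datetime) -> str:
    """The task that fired most recently decides the expected job state."""
    latest = max(tasks, key=lambda task: task.last_fired(now))
    return "active" if latest.action == "start" else "inactive"


# Start every Monday at 12:00 am, stop every Sunday at 12:00 am.
tasks = [WeeklyTask("start", weekday=0), WeeklyTask("stop", weekday=6)]
```

With these two tasks, expected_state returns "active" for any time from Monday through Saturday and "inactive" for any time on Sunday.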

If a scheduled task triggers a job start when the job is already active or a job stop when the job is already inactive, then no action is performed. The scheduled task simply logs that it was not able to start or stop the job. The task then continues running until the next scheduled time when it triggers another job start or stop.
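
The no-op behavior described above can be expressed as a simple state check. The sketch below is illustrative only: the trigger function, its parameters, and the log messages are hypothetical and do not reflect the scheduler's real interface or log format.

```python
from typing import Tuple


def trigger(job_active: bool, action: str) -> Tuple[bool, str]:
    """Apply a scheduled "start" or "stop" action to a job.

    Returns the job's new active flag and a log message. If the job is
    already in the requested state, nothing changes and the attempt is
    only logged.
    """
    if action not in ("start", "stop"):
        raise ValueError(f"unsupported action: {action!r}")

    wants_active = action == "start"
    if job_active == wants_active:
        state = "active" if job_active else "inactive"
        return job_active, f"could not {action} job: job is already {state}"
    return wants_active, f"{action} performed"
```

Because the check leaves the job untouched when it is already in the requested state, repeated triggers of the same action are harmless, and the task simply waits for its next scheduled time.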

Reports

Use the scheduler to schedule the generation of data delivery reports on a regular basis. Data delivery reports present data ingestion metrics for a given job or topology. For example, you can schedule a daily report that shows the number of records that a job processed the previous day.