Managing Jobs

When a job is active, you can synchronize or stop the job.

When a job is inactive, you can reset the origin for the job, edit the job, or delete the job.

When a job is active or inactive, you can edit the latest pipeline version, upgrade the job to use the latest pipeline version, or schedule the job to start, stop, or upgrade on a regular basis.
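The state rules above can be summarized as a small lookup. The following is a minimal Python sketch for illustration only: the `JobState` enum, the action names, and the `allowed_actions` helper are hypothetical and are not part of any Control Hub API.

```python
from enum import Enum


class JobState(Enum):
    """Hypothetical job states used only for this illustration."""
    ACTIVE = "active"
    INACTIVE = "inactive"


# Actions available whether the job is active or inactive.
COMMON_ACTIONS = {
    "edit_latest_pipeline_version",
    "upgrade_to_latest_pipeline_version",
    "schedule",
}

# Actions that depend on the job state.
STATE_ACTIONS = {
    JobState.ACTIVE: {"synchronize", "stop"},
    JobState.INACTIVE: {"reset_origin", "edit", "delete"},
}


def allowed_actions(state):
    """Return the set of actions permitted for a job in the given state."""
    return STATE_ACTIONS[state] | COMMON_ACTIONS


if __name__ == "__main__":
    print(sorted(allowed_actions(JobState.ACTIVE)))
    print(sorted(allowed_actions(JobState.INACTIVE)))
```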

Synchronizing Jobs

Synchronize a job when you've changed the labels assigned to Data Collectors or Edge Data Collectors and the job is actively running on those components. Or, synchronize a job to trigger a restart of a non-running pipeline that has encountered an error.

When you synchronize an active job, Control Hub performs the following actions:
  • Starts pipelines on additional Data Collectors or Edge Data Collectors that match the same labels as the job.
  • Stops pipelines on Data Collectors or Edge Data Collectors that no longer match the same labels as the job.
  • Restarts non-running pipelines from the last-saved offset on the same Data Collector or Data Collector Edge that matches the same labels as the job. For example, a pipeline might have stopped running after encountering an error or after being deleted from that Data Collector.

Control Hub does not perform any action on Data Collectors or Edge Data Collectors that have labels that still match the job labels and that are already running a pipeline for the job.

For example, let’s say a job is active on three Data Collectors with label Test. If you remove label Test from one of the Data Collectors, synchronize the active job so that the pipeline stops running on that Data Collector. Or, let's say that one of the three pipelines running for the job has encountered an error and has stopped running. If you synchronize the active job, Control Hub triggers a restart of the pipeline on that same Data Collector.
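The synchronization decisions described above can be modeled as a simple per-component rule. The following Python sketch is a simplified illustration, not Control Hub's implementation: the `Collector` class, its fields, and the `plan_sync` function are hypothetical, and the sketch ignores settings such as the number of instances and failover.

```python
from dataclasses import dataclass


@dataclass
class Collector:
    """Hypothetical view of one Data Collector as seen by a job."""
    name: str
    labels: set
    pipeline_running: bool = False  # a pipeline for the job is running here now
    had_pipeline: bool = False      # this component previously ran the job's pipeline


def plan_sync(job_labels, collectors):
    """Return the action a synchronize would take on each collector."""
    plan = {}
    for c in collectors:
        matches = job_labels <= c.labels  # collector has every label of the job
        if matches and c.pipeline_running:
            plan[c.name] = "no action"
        elif matches and c.had_pipeline:
            plan[c.name] = "restart from last-saved offset"
        elif matches:
            plan[c.name] = "start pipeline"
        elif c.pipeline_running:
            plan[c.name] = "stop pipeline"
        else:
            plan[c.name] = "no action"
    return plan


if __name__ == "__main__":
    # The example from the text: three Data Collectors that had label Test.
    collectors = [
        Collector("dc1", {"Test"}, pipeline_running=True, had_pipeline=True),
        Collector("dc2", {"Test"}, pipeline_running=False, had_pipeline=True),  # stopped on error
        Collector("dc3", set(), pipeline_running=True, had_pipeline=True),      # label removed
    ]
    for name, action in plan_sync({"Test"}, collectors).items():
        print(name, "->", action)
```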

Note: To redistribute the pipeline load for a job enabled for failover, you must balance the job. For a comparison of the key differences between balancing and synchronizing jobs, see Comparing Balance Jobs and Synchronize Jobs.
To synchronize active jobs from the Jobs view, select jobs in the list, and then click the Sync Job icon. Or, to synchronize an active job when monitoring the job, click the Sync Job icon.
Tip: You can also synchronize jobs from a topology.

Job Offsets

Just as Data Collector and Data Collector Edge maintain the last-saved offset for some origins when you stop a pipeline, Control Hub maintains the last-saved offset for the same origins when you stop a job.

Control Hub maintains offsets for the following origins:

  • Amazon S3
  • Directory
  • Elasticsearch
  • File Tail
  • Google Cloud Storage
  • Hadoop FS Standalone
  • HTTP Client
  • JDBC Multitable Consumer
  • JDBC Query Consumer
  • Kinesis Consumer
  • MapR DB JSON
  • MapR FS Standalone
  • MongoDB
  • MongoDB Oplog
  • MySQL Binary Log
  • Salesforce
  • SFTP/FTP Client
  • SQL Server CDC Client
  • SQL Server Change Tracking
  • Teradata Consumer
  • Windows Event Log

Let's look at how Control Hub maintains the offset for pipelines running on Data Collectors with these origins. Edge Data Collectors maintain the offset in the exact same way:

  1. When you start a job, Control Hub can run a remote pipeline instance on each Data Collector assigned all labels tagged to the job. As a Data Collector runs a pipeline instance, it periodically sends the latest offset to Control Hub. If a Data Collector becomes disconnected from Control Hub, the Data Collector maintains the offset. It updates Control Hub with the latest offset as soon as it reconnects to Control Hub.
  2. When you stop a job, Control Hub instructs all Data Collectors running pipelines for the job to stop the pipelines. The Data Collectors send the last-saved offsets back to Control Hub. Control Hub maintains the last-saved offsets for all pipeline instances in that job.
  3. When you restart the job, Control Hub sends the last-saved offset for each pipeline instance to a Data Collector so that processing can continue from where the pipeline last stopped. Control Hub determines the Data Collector to use on restart based on whether failover is enabled for the job:
    • Failover is disabled - Control Hub sends the offset to the same Data Collector that originally ran the pipeline instance. In other words, Control Hub associates each pipeline instance with the same Data Collector.
    • Failover is enabled - Control Hub sends the offset to a different Data Collector with matching labels.
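The three steps above can be sketched as a small offset store. This Python example is illustrative only: the `OffsetStore` class, its method names, and the offset string format are hypothetical. The text does not say what happens when failover is enabled and no other Data Collector is available, so the fallback to the original Data Collector in that case is an assumption of this sketch.

```python
class OffsetStore:
    """Hypothetical stand-in for the offsets kept per pipeline instance."""

    def __init__(self):
        # pipeline instance id -> (Data Collector name, last-saved offset)
        self._offsets = {}

    def report(self, instance_id, collector, offset):
        """Record the latest offset sent by a Data Collector (steps 1 and 2)."""
        self._offsets[instance_id] = (collector, offset)

    def restart_target(self, instance_id, candidates, failover_enabled):
        """Pick where to send the last-saved offset on restart (step 3)."""
        original, offset = self._offsets[instance_id]
        if not failover_enabled:
            # Failover disabled: same Data Collector as before.
            return original, offset
        # Failover enabled: a different Data Collector with matching labels.
        others = [c for c in candidates if c != original]
        # Assumption: fall back to the original if no other candidate exists.
        target = others[0] if others else original
        return target, offset


if __name__ == "__main__":
    store = OffsetStore()
    store.report("instance-1", "dc1", "offset-100")
    store.report("instance-1", "dc1", "offset-250")  # later update replaces earlier one
    print(store.restart_target("instance-1", ["dc1", "dc2"], failover_enabled=False))
    print(store.restart_target("instance-1", ["dc1", "dc2"], failover_enabled=True))
```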

You can view the last-saved offset sent by each Data Collector or SDC Edge in the job History view.

If you want the Data Collectors or Edge Data Collectors to process all available data instead of processing data from the last-saved offset, simply reset the origin for the job before restarting the job. When you reset the origin for a job, you also reset the job metrics.

Note: If you edit the job so that it contains a new pipeline version with a different origin, reset the origin before restarting the job.

Resetting the Origin for Jobs

Reset the origin when you want the Data Collectors or Edge Data Collectors running the pipeline to process all available data instead of processing data from the last-saved offset.

You can reset the origin for all inactive jobs. When you reset an origin that Control Hub maintains the offset for, you reset both the origin and the metrics for the job. When you reset an origin that Control Hub does not maintain the offset for, you reset only the metrics for the job.
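The difference between the two reset cases can be sketched as follows. This is a minimal Python illustration under stated assumptions: the `Job` class and `reset_origin` function are hypothetical, `TRACKED_ORIGINS` holds only a few entries from the list in Job Offsets, and "Some Other Origin" is a placeholder rather than a real origin name.

```python
from dataclasses import dataclass, field

# A few origins from the Job Offsets list; intentionally not the complete set.
TRACKED_ORIGINS = {"Amazon S3", "Directory", "JDBC Query Consumer"}


@dataclass
class Job:
    """Hypothetical job record used only for this illustration."""
    origin: str
    active: bool = False
    offset: object = None
    metrics: dict = field(default_factory=dict)


def reset_origin(job):
    """Reset an inactive job and return the names of what was cleared."""
    if job.active:
        raise ValueError("the origin can be reset only for inactive jobs")
    job.metrics.clear()  # metrics are reset in both cases
    cleared = ["metrics"]
    if job.origin in TRACKED_ORIGINS:
        job.offset = None  # the offset is reset only when it is maintained
        cleared.append("offset")
    return cleared


if __name__ == "__main__":
    tracked = Job("Directory", offset="offset-250", metrics={"records": 10})
    untracked = Job("Some Other Origin", metrics={"records": 10})
    print(reset_origin(tracked))
    print(reset_origin(untracked))
```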

You can reset the origin from the Jobs view or when monitoring a job.
Tip: You can also reset the origin for a job from a topology.

To reset origins from the Jobs view, select jobs in the list, click the More icon, and then click Reset Origin.

To reset the origin when monitoring an inactive job, click the More icon and then click Reset Origin.

Editing the Latest Pipeline Version

While viewing an inactive job or monitoring an active job, you can access the latest version of the pipeline to edit the pipeline in Pipeline Designer.

When you view or monitor a job, Control Hub displays a read-only view of the pipeline in the pipeline canvas. To edit the latest version of the pipeline, click the Edit icon next to the job name, and then click Edit Latest Version of Pipeline.

Control Hub creates a new draft of the latest version of the pipeline, and opens the draft in edit mode in Pipeline Designer.

When you edit a pipeline from a job, the job is not automatically updated to use the newly edited version. You must upgrade the job to use the latest published pipeline version. When working with job templates, you upgrade the job template to use the latest version.

Upgrading to the Latest Pipeline Version

You can upgrade a job or a job template to use the latest published pipeline version.

When a job or job template includes a pipeline that has a later published version, Control Hub notifies you by displaying the New Pipeline Version icon next to the job or template.

You can click the icon to upgrade the job or job template to use the latest pipeline version. Or, you can select jobs or job templates in the Jobs view, click the More icon, and then click Use Latest Pipeline Version.

When you upgrade to the latest pipeline version, the tasks that Control Hub completes depend on the job type:

  • Inactive job or job template - When you upgrade an inactive job or a job template, Control Hub updates the job or job template to use the latest pipeline version.
    When working with job templates, you must stop and restart the job instances so that they use the latest published pipeline version included in the job template.
  • Active job - When you upgrade an active job, Control Hub stops the job, updates the job to use the latest pipeline version, and then restarts the job. During the process, Control Hub displays a temporary Upgrading status for the job.
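The upgrade behavior for each job type can be sketched as an ordered list of tasks. This Python example is only an illustration: the `upgrade` function, the dictionary keys, and the status strings other than Upgrading are hypothetical names chosen for the sketch.

```python
def upgrade(job):
    """Upgrade a hypothetical job record and return the tasks performed, in order."""
    tasks = []
    if job["active"]:
        # Active job: stop, update, restart, with a temporary Upgrading status.
        job["status"] = "Upgrading"
        tasks.append("stop job")
        job["pipeline_version"] = job["latest_published_version"]
        tasks.append("update pipeline version")
        tasks.append("restart job")
        job["status"] = "Active"
    else:
        # Inactive job or job template: only the pipeline version changes.
        job["pipeline_version"] = job["latest_published_version"]
        tasks.append("update pipeline version")
    return tasks


if __name__ == "__main__":
    active = {"active": True, "status": "Active",
              "pipeline_version": "v1", "latest_published_version": "v2"}
    inactive = {"active": False, "status": "Inactive",
                "pipeline_version": "v1", "latest_published_version": "v2"}
    print(upgrade(active), active["pipeline_version"])
    print(upgrade(inactive), inactive["pipeline_version"])
```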

Stopping Jobs

Stop a job when you want to stop processing data for the pipeline included in the job.

When you stop a job that includes an origin that can be reset, Control Hub maintains the last-saved offset for the job. For more information, see Job Offsets.

  1. In the Navigation panel, click Jobs.
  2. Select active jobs in the list, and then click the Stop icon. Or, when monitoring an active job, click the Stop icon.
    Tip: You can also stop a job from a topology.
  3. In the confirmation dialog box that appears, click Yes.
  4. If a job remains in a Deactivating state, you can force Control Hub to stop the job immediately.
    In some situations, a job can remain in a Deactivating state for up to ten minutes. For example, if the Data Collector running the pipeline shuts down unexpectedly, Control Hub waits for ten minutes before forcing the job to stop.
    1. To force a deactivating job to stop, select the job in the Jobs view, click the More icon, and then click Force Stop. Or from the job monitoring view, click Force Stop.
      A confirmation dialog box appears.
    2. To force stop the job, click Yes.

Scheduling Jobs

You can use the Control Hub scheduler to schedule jobs to start, stop, or upgrade to the latest pipeline version on a regular basis.

For more information about using the scheduler to schedule jobs, see Scheduled Task Types.

Editing Jobs

You can edit inactive jobs to change the job definition. When job instances are started from a job template, edit the job template to change the job definition. You cannot edit inactive job instances started from a job template.

Edit inactive jobs or job templates from the Jobs view. Hover over the inactive job or job template, and click the Edit icon.

You can edit inactive jobs or job templates to change the following information:
  • Description
  • Pipeline commit/tag - You can select a different pipeline version to run.

    For example, after you start a job, you realize that the developer forgot to enable a metric rule for the pipeline, so you stop the job. You inform your developer, who edits the pipeline rules in Pipeline Designer and republishes the pipeline as another version. You edit the inactive job to select that latest published version of the pipeline, and then start the job again.

    Important: If you edit the job so that it contains a new pipeline version with a different origin, you must reset the origin before restarting the job.
  • Labels - You can assign and remove labels from the job to change the group of Data Collectors or Edge Data Collectors that run the pipeline.
  • Statistics Refresh Interval - You can change the milliseconds to wait before Control Hub refreshes the statistics when you monitor the job.
  • Enable Time Series Analysis - You can change whether Control Hub stores time series data that you can analyze when you monitor the job.
  • Number of Instances - You can change the number of pipeline instances run for the job.
  • Pipeline Force Stop Timeout - You can change the number of milliseconds to wait before forcing remote pipeline instances to stop.
  • Runtime Parameters - You can change the values used for the runtime parameters defined in the pipeline.
  • Enable or disable failover - You can enable or disable pipeline failover for the job.

Deleting Jobs

You can delete inactive jobs and job templates. Before you delete a job template, delete all inactive job instances created from that template.

  1. In the Navigation panel, click Jobs.
  2. Select jobs or templates in the list, and then click the Delete icon.
  3. To view all deleted jobs and job templates, click the More icon in the Jobs view, and then click Show Deleted Jobs.
    You can view the details and view the last monitoring statistics for deleted jobs and templates. However, you cannot perform any other actions on deleted jobs or templates.