Jobs Overview

A job defines the pipeline to run and the execution engine that runs the pipeline: Data Collector, Data Collector Edge, or Transformer. A job is the unit of execution for a dataflow.

After you publish pipelines to Control Hub, you create a job to specify the published pipeline to run. You also assign labels to the job so that Control Hub knows which group of execution engines should run the pipeline.

By default, when you start a job that contains a Data Collector pipeline, Control Hub sends an instance of the pipeline to one Data Collector that has all of the labels assigned to the job. Similarly, when you start a job that contains a Data Collector Edge pipeline, Control Hub sends an instance of the pipeline to one Data Collector Edge that has all of the assigned labels. The Data Collector or Data Collector Edge remotely runs the pipeline instance. You can increase the number of pipeline instances that Control Hub runs for a Data Collector or Data Collector Edge job.
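The label matching described above can be sketched in a few lines. This is a conceptual illustration only, not Control Hub code; the engine names and labels below are hypothetical:

```python
# Conceptual sketch of job-to-engine label matching; not actual Control Hub code.

def matching_engines(engines, job_labels):
    """Return the engines whose label sets include every label assigned to the job."""
    return [name for name, labels in engines.items()
            if set(job_labels) <= set(labels)]

# Hypothetical Data Collectors and their assigned labels:
engines = {
    "sdc-east-1": {"dev", "east"},
    "sdc-east-2": {"prod", "east"},
    "sdc-west-1": {"prod", "west"},
}

# A job labeled both "prod" and "east" can run only on engines with both labels:
candidates = matching_engines(engines, {"prod", "east"})  # ["sdc-east-2"]
```

An engine qualifies only when it carries every label on the job, which is why assigning an extra label to a job narrows, rather than widens, the set of engines that can run it.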

In contrast, when you start a job that contains a Transformer pipeline, Control Hub sends an instance of the pipeline to one Transformer that has all of the assigned labels. The Transformer remotely runs the pipeline instance on Apache Spark deployed to a cluster. Spark runs the pipeline just as it runs any other application, distributing the processing across the nodes in the cluster. You cannot increase the number of pipeline instances that Control Hub runs for a Transformer job.

When Data Protector is enabled for the organization, Control Hub uses the read and write protection policies specified in the job to alter and protect sensitive data.

To minimize downtime due to unexpected pipeline failures, enable pipeline failover for jobs.

When a Data Collector pipeline is configured to aggregate statistics, Control Hub also creates a system pipeline for the job and instructs one of the Data Collectors to run the system pipeline. The system pipeline collects, aggregates, and pushes metrics for all of the remote pipeline instances back to Control Hub so that you can monitor the progress of the job.

Note: At this time, Transformer jobs cannot be used with Data Protector and cannot be enabled for pipeline failover. Data Collector Edge and Transformer pipelines cannot be configured to aggregate statistics.

If a job includes a pipeline that uses runtime parameters, you specify the parameter values that the job uses for the pipeline instances. Or, you can enable the job to work as a job template. A job template lets you run multiple job instances with different runtime parameter values from a single job definition.
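Conceptually, a job template pairs a pipeline with default runtime parameter values, and each job instance overrides a subset of them. The following sketch illustrates that idea only; the `create_instance` function and the parameter names are hypothetical, not part of the Control Hub API:

```python
# Illustrative sketch of the job-template concept; not the Control Hub API.

def create_instance(template_params, overrides):
    """Merge per-instance overrides onto the template's default parameter values."""
    params = dict(template_params)  # copy the template's defaults
    params.update(overrides)        # apply instance-specific runtime parameter values
    return params

# A template's default runtime parameters (hypothetical names):
defaults = {"ORIGIN_DIR": "/data/incoming", "BATCH_SIZE": "1000"}

# Two job instances from one template, differing only in parameter values:
instance_a = create_instance(defaults, {"ORIGIN_DIR": "/data/region-a"})
instance_b = create_instance(defaults, {"ORIGIN_DIR": "/data/region-b"})
```

The point of the template is exactly this merge: one job definition, many instances that differ only in the runtime parameter values you supply when you start them.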

When you stop a job, Control Hub instructs all execution engines running pipelines for the job to stop the pipelines.

After you create jobs, you can create a topology to map multiple related jobs into a single view. A topology provides an end-to-end view of multiple dataflows. From a single topology view, you can start, stop, monitor, and synchronize all of the jobs included in the topology.

Working with Jobs

The Jobs view lists all jobs and job templates that have been created for your organization.

You can complete the following tasks in the Jobs view:

  • View job details, including the pipeline version, the job status, and the execution engine that runs the pipeline.
  • Create jobs and job templates.
  • Start and stop jobs.
  • Create and start job instances from job templates.
  • Monitor active jobs.
  • Upgrade a job to use the latest pipeline version.
  • Reset the origin and metrics for jobs.
  • Enable pipeline failover for jobs.
  • Balance a job enabled for pipeline failover to redistribute the pipeline load across available execution engines.
  • Synchronize an active job after you update the labels assigned to execution engines.
  • Schedule jobs to start, stop, or upgrade on a regular basis.
  • Create a topology for selected jobs, as described in Creating a Topology for Multiple Jobs.
  • Import and export jobs and job templates.
  • Share a job or job template with other users and groups.
  • Delete jobs and job templates.

The following image shows a list of jobs in the Jobs view. Each job is listed with the job name, pipeline name, pipeline version, and job status:

Note the following icons that are displayed in the Jobs view or when you hover over a single job or job template. You'll use these icons frequently as you manage jobs:

  • Add Job - Add a job or job template.
  • Import Jobs - Import jobs or job templates.
  • Refresh - Refresh the list of jobs and job templates in the view.
  • Toggle Filter Column - Toggle the display of the Filter column, where you can search for jobs and job templates by name or filter by execution engine, status, or assigned label.
  • Start Job - Start the job or job template.
  • Synchronize Job - Synchronize an active job after you have updated the labels assigned to execution engines.
  • Balance Job - Balance a job enabled for pipeline failover to redistribute the pipeline load across available execution engines.
  • Stop Job - Stop the job.
  • Acknowledge Error - Acknowledge error messages for the job.
  • Schedule Job - Schedule the job to start on a regular basis.
  • Share - Share the job or job template with other users and groups, as described in Permissions.
  • New Pipeline Version - Upgrade a job to use the latest pipeline version.
  • Edit - Edit an inactive job or a job template.
  • Delete - Delete an inactive job or a job template.
  • Export Jobs - Export the selected jobs or job templates.

Requirement for Jobs

Before you create a job, you need to publish the pipeline that you want to use.

You can publish the pipeline in several different ways, depending on where the pipeline was developed: