Labels Overview

Use labels to group execution engines registered with Control Hub.

Execution engines of the same type that share a label function as a group. When you create a job, you assign labels to the job so that Control Hub knows which group of execution engines can run it. For example, if you assign a sales label to a Transformer job, then the job can run on any Transformer with a sales label.

When applying labels, you can use any classification structure that you choose. For example, you might use labels to group Data Collectors by environment, to group Edge Data Collectors by project, and to group Transformers by department.

By default, when you start a job, Control Hub runs one pipeline instance on an available execution engine. For example, if three Data Collectors have all of the specified labels for a job that contains a standalone pipeline, by default Control Hub runs one pipeline instance on the Data Collector that is running the fewest pipelines.
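The default selection described above can be sketched in a few lines of Python. This is only an illustrative model, not Control Hub code: the Engine class, the pick_engine function, and the engine names are hypothetical, and the sketch assumes that an engine qualifies only when it carries every label specified for the job.

```python
from dataclasses import dataclass, field


@dataclass
class Engine:
    """Hypothetical stand-in for a registered execution engine."""
    name: str
    labels: set = field(default_factory=set)
    running_pipelines: int = 0


def pick_engine(engines, job_labels):
    """Return the engine that has every job label and runs the fewest pipelines."""
    required = set(job_labels)
    # Keep only engines whose labels include all labels specified for the job.
    candidates = [e for e in engines if required <= e.labels]
    if not candidates:
        return None  # no engine in the group matches
    return min(candidates, key=lambda e: e.running_pipelines)


engines = [
    Engine("dc-1", {"sales", "west"}, running_pipelines=3),
    Engine("dc-2", {"sales", "west"}, running_pipelines=1),
    Engine("dc-3", {"sales"}, running_pipelines=0),
]
chosen = pick_engine(engines, ["sales", "west"])
print(chosen.name)  # dc-2: dc-3 is idle but lacks the west label
```

In this model, dc-3 is skipped even though it runs no pipelines, because it does not have all of the labels specified for the job.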

For jobs that contain Data Collector or SDC Edge pipelines, you can increase the number of pipeline instances that Control Hub runs. For Transformer pipelines, Control Hub runs a single pipeline instance for each job.

You can include one or more backup Data Collectors in a group to support pipeline failover for jobs.

You assign labels to the following execution components, using the same label for the components that you want to function as a group:

Data Collector
Each registered Data Collector is either an authoring Data Collector used to design all types of pipelines or an execution Data Collector used to run standalone and cluster pipelines.
Use labels to clearly designate which Data Collectors are dedicated to pipeline design. For example, assign an Authoring label to the authoring Data Collector used to design pipelines in Pipeline Designer. When you create jobs, avoid selecting the Authoring label to ensure that jobs are not started on the authoring Data Collector.
Use labels to group execution Data Collectors by any classification you choose.

When you start a job on a group of execution Data Collectors with the same label, any of the Data Collectors can run a pipeline instance for the job. As a result, all Data Collectors that function as a group must use the same Data Collector version and must have an identical configuration to ensure consistent processing.

Transformer
Each registered Transformer can act as an authoring Transformer and as an execution engine used to run Transformer pipelines. Use labels to group Transformers by any classification you choose.

When you start a job, the Transformer with the same labels that is running the fewest pipelines runs the job. Since any Transformer in the group might run the job, all Transformers that function as a group must be the same Transformer version and have an identical configuration to ensure consistent processing.

Data Collector Edge (SDC Edge)
Each registered SDC Edge is an execution engine used to run edge pipelines. Use labels to group Edge Data Collectors by any classification you choose.
When you start a job on a group of Edge Data Collectors with the same label, any of the Edge Data Collectors can run a pipeline instance for the job. As a result, all Edge Data Collectors that function as a group should use the same SDC Edge version and should have an identical configuration to ensure consistent processing.

Deployment
Labels that you assign to a deployment are assigned to all Data Collector containers provisioned by the deployment. After the provisioned Data Collectors are running and registered with Control Hub, you can assign additional labels to the provisioned Data Collectors just as you can for any registered Data Collector.
In most cases, a deployment automatically provisions execution Data Collectors used to run standalone or cluster pipelines. A deployment can also automatically provision authoring Data Collectors dedicated to pipeline design, as long as the authoring Data Collectors are provisioned from a separate deployment that does not include any execution Data Collectors.
Use labels to clearly designate a deployment that provisions authoring Data Collectors. Use labels for deployments that provision execution Data Collectors by any classification you choose.
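Because every engine in a group must match the others, it can help to audit an inventory before starting jobs. The following is a minimal sketch of such a check, assuming you already have a list of engines with their type, version, and labels. The dictionary keys, engine names, and version strings are made up for illustration and are not Control Hub fields. The sketch compares versions only; a configuration comparison would follow the same pattern.

```python
from collections import defaultdict


def find_mixed_version_groups(engines):
    """Return {(engine_type, label): sorted versions} for groups with more than one version."""
    versions = defaultdict(set)
    for engine in engines:
        # An engine belongs to one group per label, scoped by its engine type.
        for label in engine["labels"]:
            versions[(engine["type"], label)].add(engine["version"])
    return {group: sorted(found) for group, found in versions.items() if len(found) > 1}


# Illustrative inventory; names and version strings are hypothetical.
engines = [
    {"name": "dc-1", "type": "DATA_COLLECTOR", "version": "1.0.0", "labels": ["Production"]},
    {"name": "dc-2", "type": "DATA_COLLECTOR", "version": "1.1.0", "labels": ["Production"]},
    {"name": "tx-1", "type": "TRANSFORMER", "version": "1.0.0", "labels": ["Production"]},
]
mismatches = find_mixed_version_groups(engines)
print(mismatches)  # {('DATA_COLLECTOR', 'Production'): ['1.0.0', '1.1.0']}
```

The Transformer is not reported, because groups are formed per engine type: it shares the Production label with the Data Collectors but never runs their pipelines.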

Labels and Pipeline Type

Control Hub determines the execution engine used to run a remote pipeline instance based on the pipeline type and on the labels assigned to both the job and the execution engine.

For example, when you start a job for a Data Collector pipeline, Control Hub runs a remote pipeline instance on Data Collectors with labels that match those defined in the job.

Control Hub runs a pipeline only on the type of execution engine that the pipeline is designed for. For example, it never runs a Data Collector pipeline on a Transformer. You can therefore use the same labels on different types of execution engines without worrying that Control Hub might run a pipeline on the wrong engine.

For example, you assign the label IoT to a Data Collector and to an SDC Edge. When you run a job with the IoT label that contains a standalone pipeline, Control Hub runs the pipeline on the Data Collector only. When you run a job with the IoT label that contains an edge pipeline, Control Hub runs the pipeline on the SDC Edge only.
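The IoT example can be modeled as a two-step filter: first by engine type, then by labels. The sketch below is a simplified illustration under stated assumptions. The ENGINE_FOR_PIPELINE mapping, the type names, and the eligible_engines function are hypothetical and only mirror the pairings described in this section.

```python
# Hypothetical mapping from pipeline type to the engine type that runs it,
# based on the pairings described in this section.
ENGINE_FOR_PIPELINE = {
    "standalone": "DATA_COLLECTOR",
    "cluster": "DATA_COLLECTOR",
    "edge": "SDC_EDGE",
    "transformer": "TRANSFORMER",
}


def eligible_engines(engines, pipeline_type, job_labels):
    """Return names of engines with the right type that carry every job label."""
    engine_type = ENGINE_FOR_PIPELINE[pipeline_type]
    required = set(job_labels)
    return [
        name
        for name, etype, labels in engines
        if etype == engine_type and required <= set(labels)
    ]


# Both engines carry the same IoT label.
engines = [
    ("dc-1", "DATA_COLLECTOR", {"IoT"}),
    ("edge-1", "SDC_EDGE", {"IoT"}),
]
print(eligible_engines(engines, "standalone", ["IoT"]))  # ['dc-1']
print(eligible_engines(engines, "edge", ["IoT"]))        # ['edge-1']
```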

Label Examples

Let's look at some ways that you can use labels to group execution engines:

Labels by geographic region
Your organization has multiple data centers located in different geographic regions, and one central location that manages the flow of data across all of the data centers. Data engineers in the central location design pipelines used for all of the data centers. You assign an Authoring label to the single authoring Data Collector that runs in the central location.
You create a unique label for each of your data centers to designate the execution Data Collectors that run in those data centers.
You assign the label WestDataCenter to the Data Collectors installed in the data center located in the western region, and assign the label EastDataCenter to the Data Collectors installed in the eastern data center. When you create jobs, you select the appropriate data center label to ensure that the jobs are started on the group of execution Data Collectors installed in that data center.

Labels by environment
Your organization uses development and test environments to design and test pipelines before replicating the final pipelines in the production environment. You assign an Authoring label to an authoring Data Collector used to design Data Collector and edge pipelines.
You create Test and Production labels to designate the execution Data Collectors and Edge Data Collectors that run pipelines in the two environments.
You assign the Test label to execution Data Collectors and Edge Data Collectors used to run test pipelines. You assign the Production label to execution Data Collectors and Edge Data Collectors used to run production pipelines. When you create jobs, you select the appropriate label to ensure that the jobs run in the correct environment.

Labels by department
Your organization needs to build some Transformer pipelines for the Marketing department and for the Finance department.
Since you can use the same Transformer for pipeline design and pipeline execution, you can skip the Authoring label. Instead, you assign the Marketing label to the Transformers dedicated to the Marketing department, and you assign the Finance label to the Transformers dedicated to the Finance department.
When designing pipelines, you use a Transformer with the Marketing label to design the Marketing pipelines, and a Transformer with the Finance label to design the Finance pipelines.
When you create jobs, you select the appropriate department label to ensure that the jobs run on one of the Transformers dedicated to that department.
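In the Data Collector examples above, jobs should never select the Authoring label. If you script job creation, a small guard can enforce that convention before a job is submitted. This is a hedged sketch: check_job_labels and RESERVED_LABELS are hypothetical names, and the guard assumes that Authoring is the label your organization reserves for pipeline design.

```python
# Assumption: "Authoring" is the label reserved for design-only Data Collectors.
RESERVED_LABELS = {"Authoring"}


def check_job_labels(job_labels, reserved=RESERVED_LABELS):
    """Raise ValueError if a job would target engines reserved for pipeline design."""
    clash = sorted(set(job_labels) & set(reserved))
    if clash:
        raise ValueError(f"job must not use design-only labels: {clash}")
    return list(job_labels)


check_job_labels(["WestDataCenter"])  # passes and returns the labels unchanged

try:
    check_job_labels(["Authoring", "Test"])
except ValueError as err:
    print(err)  # job must not use design-only labels: ['Authoring']
```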