Labels Overview

Use labels to group Data Collectors and Edge Data Collectors (SDC Edge) registered with Control Hub. For example, you can use labels to group Data Collectors and Edge Data Collectors by project, geographic region, environment, department, or any other classification you choose.

When you create a job, you assign labels to the job so that Control Hub knows which group of Data Collectors or Edge Data Collectors should run the job.

You assign labels to the following execution components, using the same label for the components that you want to function as a group:

Data Collector
Each registered Data Collector is either an authoring Data Collector used to design all types of pipelines or an execution Data Collector used to run standalone and cluster pipelines.
Use labels to clearly designate which Data Collectors are dedicated to pipeline design. For example, assign an Authoring label to the authoring Data Collector used to design pipelines in Pipeline Designer. When you create jobs, avoid selecting the Authoring label to ensure that jobs are not started on the authoring Data Collector.
Use labels to group execution Data Collectors by any classification you choose.

When you start a job on a group of execution Data Collectors with the same label, any of the Data Collectors can run a pipeline instance for the job. As a result, all Data Collectors that function as a group must use the same Data Collector version and must have an identical configuration to ensure consistent processing.

Data Collector Edge (SDC Edge)
Each registered SDC Edge is an execution component used to run edge pipelines. Use labels to group Edge Data Collectors by any classification you choose.
When you start a job on a group of Edge Data Collectors with the same label, any of the Edge Data Collectors can run a pipeline instance for the job. As a result, all Edge Data Collectors that function as a group should use the same SDC Edge version and should have an identical configuration to ensure consistent processing.
Deployment
Labels that you assign to a deployment are assigned to all Data Collector containers provisioned by the deployment. After the provisioned Data Collectors are running and registered with Control Hub, you can assign additional labels to the provisioned Data Collectors just as you can for any registered Data Collector.
In most cases, a deployment automatically provisions execution Data Collectors used to run standalone or cluster pipelines. A deployment can also automatically provision authoring Data Collectors dedicated to pipeline design as long as the authoring Data Collectors are provisioned from a unique deployment that doesn't include any execution Data Collectors.
Use labels to clearly designate a deployment that provisions authoring Data Collectors. Use labels for deployments that provision execution Data Collectors by any classification you choose.
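
The requirement that Data Collectors functioning as a group share one version and one configuration can be checked before you start jobs. The following is a minimal sketch, not a Control Hub API: the dictionary fields (name, labels, version, config_hash), the version strings, and the mixed_groups helper are all hypothetical, and the check is deliberately conservative because it treats every shared label as a group.

```python
from collections import defaultdict


def mixed_groups(collectors):
    """Return labels whose Data Collectors differ in version or configuration."""
    variants = defaultdict(set)
    for collector in collectors:
        for label in collector["labels"]:
            # Record each distinct (version, configuration) pair seen under this label.
            variants[label].add((collector["version"], collector["config_hash"]))
    # A label with more than one distinct pair marks an inconsistent group.
    return sorted(label for label, seen in variants.items() if len(seen) > 1)


# Hypothetical inventory of registered Data Collectors.
registered = [
    {"name": "sdc-1", "labels": {"Marketing"}, "version": "3.0", "config_hash": "a1"},
    {"name": "sdc-2", "labels": {"Marketing"}, "version": "3.1", "config_hash": "a1"},
    {"name": "sdc-3", "labels": {"Finance"}, "version": "3.0", "config_hash": "a1"},
    {"name": "sdc-4", "labels": {"Finance"}, "version": "3.0", "config_hash": "a1"},
]

print(mixed_groups(registered))  # ['Marketing']
```

In this made-up inventory, the Marketing group is flagged because its two Data Collectors report different versions.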

By default, when you start a job, Control Hub runs one pipeline instance on an available Data Collector or SDC Edge. For example, if three Data Collectors have all of the specified labels for a job that contains a standalone pipeline, by default Control Hub runs one pipeline instance on the Data Collector that is running the fewest pipelines. You can increase the number of pipeline instances that Control Hub runs for a job.
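
The default selection described above can be pictured as a filter followed by a minimum. This is only an illustrative sketch of the behavior, not how Control Hub is implemented: the Executor class, its fields, and the pick_executor function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Executor:
    """Hypothetical stand-in for a registered Data Collector."""
    name: str
    labels: set
    running_pipelines: int = 0


def pick_executor(executors, job_labels):
    """Return the executor that has every job label and runs the fewest pipelines."""
    wanted = set(job_labels)
    # Keep only executors that carry all of the labels specified for the job.
    candidates = [e for e in executors if wanted <= e.labels]
    if not candidates:
        return None
    return min(candidates, key=lambda e: e.running_pipelines)


collectors = [
    Executor("sdc-1", {"WestDataCenter", "Production"}, running_pipelines=3),
    Executor("sdc-2", {"WestDataCenter", "Production"}, running_pipelines=1),
    Executor("sdc-3", {"WestDataCenter", "Production"}, running_pipelines=2),
    Executor("sdc-4", {"WestDataCenter"}, running_pipelines=0),
]

chosen = pick_executor(collectors, ["WestDataCenter", "Production"])
print(chosen.name)  # sdc-2
```

Here sdc-4 is idle but is skipped because it lacks the Production label, so the least-loaded of the three fully matching Data Collectors is chosen.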

You can include one or more backup Data Collectors in a group to support pipeline failover for jobs.

Labels and Pipeline Type

Control Hub determines whether to run a remote pipeline instance on a Data Collector or on an SDC Edge based on the job's labels and pipeline type.

When you start a job that contains a standalone or cluster pipeline, Control Hub runs a remote pipeline instance on Data Collectors with matching labels. When you start a job that contains an edge pipeline, Control Hub runs a remote pipeline instance on Edge Data Collectors with matching labels.

As a result, if you assign the same label to a Data Collector and to an SDC Edge, Control Hub does not run the same pipelines on both.

For example, you assign the label IoT to a Data Collector and to an SDC Edge. When you run a job with the IoT label that contains a standalone pipeline, Control Hub runs the pipeline on the Data Collector only. When you run a job with the IoT label that contains an edge pipeline, Control Hub runs the pipeline on the SDC Edge only.
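
The IoT example can be sketched as a lookup on pipeline type combined with a label match. The mapping, the component names (sdc-1, edge-1), and the eligible_components function below are hypothetical and only model the rule described in this section.

```python
# Hypothetical model: the kind of execution component that runs each pipeline type.
COMPONENT_FOR_PIPELINE = {
    "standalone": "data_collector",
    "cluster": "data_collector",
    "edge": "sdc_edge",
}


def eligible_components(components, job_labels, pipeline_type):
    """Return names of components of the right kind that have every job label."""
    wanted = set(job_labels)
    kind = COMPONENT_FOR_PIPELINE[pipeline_type]
    return [
        name
        for name, (component_kind, labels) in components.items()
        if component_kind == kind and wanted <= labels
    ]


# One Data Collector and one SDC Edge share the IoT label.
components = {
    "sdc-1": ("data_collector", {"IoT"}),
    "edge-1": ("sdc_edge", {"IoT"}),
}

print(eligible_components(components, ["IoT"], "standalone"))  # ['sdc-1']
print(eligible_components(components, ["IoT"], "edge"))  # ['edge-1']
```

Although both components carry the same label, the pipeline type narrows each job to a single kind of execution component.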

Label Examples

Let's look at some ways that you can use labels to group Data Collectors and Edge Data Collectors:

Labels by geographic region
Your organization has multiple data centers located in different geographic regions, and one central location that manages the flow of data across all of the data centers. Data engineers in the central location design pipelines used for all of the data centers. You assign an Authoring label to the single authoring Data Collector that runs in the central location.
You create a unique label for each of your data centers to designate the execution Data Collectors that run in those data centers.
You assign the label WestDataCenter to the Data Collectors installed in the data center located in the western region, and assign the label EastDataCenter to the Data Collectors installed in the eastern data center. When you create jobs, you select the appropriate data center label to ensure that the jobs are started on the group of execution Data Collectors installed in that data center.
Labels by environment
Your organization uses development and test environments to design and test edge and standalone pipelines before replicating the final pipelines in the production environment. You assign an Authoring label to the single authoring Data Collector used to design the edge and standalone pipelines.
You create Test and Production labels to designate the execution Data Collectors and Edge Data Collectors that run pipelines in those environments.
You assign the Test label to execution Data Collectors and Edge Data Collectors used to run test pipelines. You assign the Production label to execution Data Collectors and Edge Data Collectors used to run production pipelines. When you create jobs, you select the appropriate label to ensure that the jobs are started in the correct environment.
Labels by department
Your organization has some pipelines used by the Marketing department, and another set of pipelines used by the Finance department. Data engineers use a single authoring Data Collector to design the pipelines for both departments. You assign an Authoring label to the single authoring Data Collector.
You need to run the pipelines for each department on a group of execution Data Collectors dedicated to that department.
You assign the Marketing label to the execution Data Collectors dedicated to the Marketing department, and assign the Finance label to the execution Data Collectors dedicated to the Finance department. When you create jobs, you select the appropriate department label to ensure that the jobs are started on the group of Data Collectors dedicated to that department.