Glossary

authoring Data Collector
A registered Data Collector dedicated to pipeline design. You can design pipelines in the Control Hub Pipeline Designer after selecting an available authoring Data Collector to use. The selected authoring Data Collector determines the stages, stage libraries, and functionality that display in Pipeline Designer. Alternatively, you can log in to an authoring Data Collector directly and design pipelines there.
batch
A set of records that passes through a pipeline. Data Collector processes data in batches.
CDC-enabled origin
An origin that can process changed data and place CRUD operation information in the sdc.operation.type record header attribute.
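For example, the attribute stores a numeric code for each operation; the commonly documented codes include the following:
    1 = INSERT
    2 = DELETE
    3 = UPDATE
    4 = UPSERT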
classification rules
As part of Data Protector, classification rules identify and categorize data, enabling data protection by protection policies.
classifiers
Classifiers are defined within Data Protector classification rules and specify the details of how to identify a category of data.
cluster execution mode
Pipeline execution mode that allows you to process large volumes of data from Kafka or HDFS.
cluster pipeline, cluster mode pipeline
A pipeline configured to run in cluster execution mode.
control character
A non-printing character in a character set, such as the acknowledgement or escape characters.
Control Hub controlled pipeline
A pipeline that is managed by Control Hub and run remotely on execution Data Collectors. Control Hub controlled pipelines include published and system pipelines run from jobs.
CRUD-enabled stage
A processor or destination that can use the CRUD operation written in the sdc.operation.type header attribute to write changed data.
data alerts
Alerts based on rules that gather information about the data that passes between two stages.
Data Collector configuration file (sdc.properties)
Configuration file with most Data Collector properties. Found in the following location:
$SDC_CONF/sdc.properties
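A minimal excerpt might look like the following; the property names appear in a default installation, and the values shown are illustrative:
    # Port for the Data Collector UI and REST API
    http.port=18630
    # Maximum number of records per batch
    production.maxBatchSize=1000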
Data Collector Edge (SDC Edge)
A lightweight agent without a UI that runs pipelines in edge execution mode on edge devices.
data delivery report
A report that presents data ingestion metrics for a given job or topology.
data drift alerts
Alerts based on data drift functions that gather information about the structure of data that passes between two stages.
data preview
Preview of data as it moves through a pipeline. Use to develop and test pipelines.
data SLA
A service level agreement that defines the data processing rates that jobs within a topology must meet.
dataflow triggers
Instructions for the pipeline to kick off asynchronous tasks in external systems in response to events that occur in the pipeline. For more information, see Dataflow Triggers Overview.
delivery guarantee
Pipeline property that determines how the Data Collector handles data when the pipeline stops unexpectedly. You can configure the pipeline to process data at least once or at most once.
deployment
A logical grouping of Data Collector containers deployed by a Provisioning Agent to a container orchestration system, such as Kubernetes. All Data Collector containers in a deployment are identical and highly available.
destination
A stage type used in a pipeline to represent where the Data Collector writes processed data.
development stages, dev stages
Stages such as the Dev Data Generator origin and the Dev Random Error processor that enable pipeline development and testing. Not meant for use in production pipelines.
edge pipeline, edge mode pipeline
A pipeline that runs in edge execution mode on a Data Collector Edge (SDC Edge) installed on an edge device. Use edge pipelines to read data from the edge device or to receive data from another pipeline and then act on that data to control the edge device.
event framework
The event framework enables the pipeline to trigger tasks in external systems based on actions that occur in the pipeline, such as running a MapReduce job after the pipeline writes a file to HDFS. You can also use the event framework to store event information, such as when an origin starts or completes reading a file.
event record
A record created by an event-generating stage when a stage-related event occurs, like when an origin starts reading a new file or a destination closes an output file.
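Event records store event details in record header attributes. For example, assuming a Directory origin that generates a new-file event, the record header might include attributes such as the following (the timestamp value is illustrative):
    sdc.event.type=new-file
    sdc.event.version=1
    sdc.event.creation_timestamp=1533241200000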
execution Data Collector
A registered Data Collector dedicated to running pipelines from Control Hub jobs.
executor
A stage type used to perform tasks in external systems upon receiving an event record.
explicit validation
A semantic validation that checks all configured values for validity and verifies whether the pipeline can run as configured. Occurs when you click the Validate icon, request data preview, or start the pipeline.
field path
The path to a field in a record. Use to reference a field.
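For example, assuming a record with a nested order map and a list of item entries (hypothetical field names), field paths might look like:
    /id
    /order/total
    /items[0]/sku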
implicit validation
Lists missing or incomplete configuration. Occurs by default as Data Collector saves your changes in the pipeline canvas.
job
The execution of a dataflow. A job defines the pipeline to run and the Data Collectors that run the pipeline.
job template
A job definition that lets you run multiple job instances with different runtime parameter values. When creating a job, you can enable the job to work as a job template if the job includes a pipeline that uses runtime parameters.
label
A grouping of Data Collectors registered with Control Hub. You assign labels to each Data Collector, using the same label for Data Collectors that you want to function as a group. When you create a job, you assign labels to the job so that Control Hub knows which group of Data Collectors should run the job.
late directories
Origin directories that appear after a pipeline starts.
local pipeline
A pipeline that is managed by a Data Collector and run locally on that Data Collector. Use a Data Collector to start, stop, and monitor local pipelines.
metric alerts
Email alerts based on stage or pipeline metrics.
microservice pipeline
A pipeline that creates a fine-grained service to perform a specific task.
multithreaded pipelines
A pipeline with an origin that generates multiple threads, enabling the processing of high volumes of data in a single pipeline on one Data Collector.
organization
A secure space provided to a set of user accounts from an enterprise. All Data Collectors, pipelines, fragments, jobs, and topologies added by any user in the organization belong to that organization. A user logs in to Control Hub as a member of an organization and can access only the data that belongs to that organization.
organization administrator
A user account that has the Organization Administrator role for an organization, allowing the user to perform administrative tasks for that organization.
origin
A stage type used in a pipeline to represent the source of data.
pipeline
A representation of a stream of data that is processed by the Data Collector.
Pipeline Designer
The Control Hub pipeline development tool. Based on the Data Collector pipeline configuration canvas, you can use Pipeline Designer to design and publish Data Collector and SDC Edge pipelines and fragments.
pipeline fragment
A stage or set of connected stages that you can reuse in pipelines. Use pipeline fragments to easily add the same processing logic to multiple pipelines.
pipeline label
A label that enables grouping similar pipelines or pipeline fragments. Use pipeline labels to easily search and filter pipelines and fragments when viewing them in the pipeline repository.
pipeline repository
The Control Hub repository that stores all pipelines and fragments designed in the Control Hub Pipeline Designer and all pipelines published or imported from an authoring Data Collector. The pipeline repository maintains a version history of all published and imported pipelines and fragments.
pipeline runner
Used in multithreaded pipelines to run a sourceless instance of a pipeline.
pipeline tag
A pointer to a specific commit or version in the Control Hub pipeline repository.
preconditions
Conditions that a record must satisfy to enter the stage for processing. Records that don't meet all preconditions are processed based on stage error handling.
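For example, a precondition is an expression that must evaluate to true, such as this sketch that passes only records whose status field (a hypothetical field) is set to ACTIVE:
    ${record:value('/status') == 'ACTIVE'}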
procedures
Procedures are defined as part of a Data Protector protection policy. They specify how data is altered and protected.
processors
A stage type that performs specific processing on pipeline data.
protection policies
As part of Data Protector, protection policies alter and protect data in motion. Policies are applied to jobs and can be used upon read or write.
Provisioning Agent
A containerized application that runs in a container orchestration framework, such as Kubernetes. The agent communicates with Control Hub to automatically provision Data Collector containers in the Kubernetes cluster in which it runs. Provisioning includes deploying, registering, starting, scaling, and stopping the Data Collector containers.
published pipeline
A pipeline that has been published or imported to Control Hub. Publish a pipeline before creating a job for the pipeline.
required fields
A field that must exist in a record for the record to enter the stage for processing. Records that don't have all required fields are processed based on pipeline error handling.
RPC ID
A user-defined identifier configured in the SDC RPC origin and destination to allow the destination to write to the origin.
runtime parameters
Parameters that you define for the pipeline and call from within that same pipeline. Use to specify values for pipeline properties when you start the pipeline.
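For example, after defining a runtime parameter named BATCH_SIZE (a hypothetical name), a stage property in the same pipeline can call it with:
    ${BATCH_SIZE}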
runtime properties
Properties that you define in a file local to the Data Collector and call from within a pipeline. Use to define different sets of values for different Data Collector instances.
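For example, assuming a runtime property defined in the Data Collector configuration file as runtime.conf_dirPath=/data/incoming (the dirPath name is hypothetical), a pipeline property can call it with the runtime:conf function:
    ${runtime:conf('dirPath')}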
runtime resources
Values that you define in a restricted file local to the Data Collector and call from within a pipeline. Use to load sensitive information from files at runtime.
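For example, assuming a file named keytab.txt (a hypothetical name) stored in the Data Collector resources directory, a stage property can load its contents with the runtime:loadResource function; passing true as the second argument requires the file to have restricted permissions:
    ${runtime:loadResource('keytab.txt', true)}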
scheduled task
A long-running task that periodically triggers an action on other Control Hub tasks at the specified frequency. For example, a scheduled task can start or stop a job or generate a data delivery report on a weekly or monthly basis.
SDC Record data format
A data format used for Data Collector error records and an optional format to use for output records.
SDC RPC pipelines
A set of pipelines that use the SDC RPC destination and SDC RPC origin to pass data from one pipeline to another without writing to an intermediary system.
security violation destination
A Data Protector protection policy feature that enables writing records with classified but unprotected fields to a destination for review.
sourceless pipeline instance
An instance of the pipeline that includes all of the processors and destinations in the pipeline and represents all pipeline processing after the origin. Used in multithreaded pipelines.
standalone pipeline, standalone mode pipeline
A pipeline configured to run in the default standalone execution mode.
subscription
An object that listens for Control Hub events and then completes an action when those events occur.
system Data Collector
The default authoring Data Collector provided with the Control Hub Pipeline Designer for exploration and light development. The system Data Collector cannot be used to perform data preview or explicit pipeline validation.
system pipeline
A pipeline that Control Hub automatically creates when the published pipeline included in a job is configured to aggregate statistics. System pipelines collect, aggregate, and push metrics for all of the remote pipeline instances to Control Hub.
topology
An interactive end-to-end view of data as it traverses multiple pipelines that work together. You can map all data flow activities that serve the needs of one business function in a single topology.