Monitoring Data Collectors

When you view registered Data Collectors in the Execute view, you can monitor the performance of each Data Collector and the pipelines currently running on each Data Collector.

To monitor a Data Collector, expand its details in the Execute > Data Collectors view.

Performance

When you view the details of a Data Collector version 3.4.0 or later in the Execute view, you can monitor the performance of the Data Collector. You can monitor the performance of both manually administered and automatically provisioned Data Collectors.

Note: Control Hub does not display performance information for earlier versions of Data Collector.

Control Hub displays the following performance information for Data Collectors:
CPU Load
Percentage of CPU being used by the Data Collector.
Memory Used
Amount of memory being used by the Data Collector out of the total amount of memory allocated to that Data Collector.
For example, suppose that a Data Collector displays the following value for Memory Used:
216.36 MB of 1038.88 MB
This value means that the Data Collector is using 216.36 MB of the 1038.88 MB of memory allocated to it as Java heap. You configure the Data Collector Java heap size in the SDC_JAVA_OPTS environment variable. For more information, see Java Heap Size in the Data Collector documentation.
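To make the Memory Used reading concrete, the following Python sketch parses the example value and computes the fraction of the heap in use. The function name parse_memory_used is hypothetical, and the sketch assumes the value always follows the "<used> MB of <total> MB" form shown in the example. Control Hub may display other units.

```python
import re

def parse_memory_used(text):
    """Return (used_mb, total_mb, fraction_used) from an 'X MB of Y MB' string."""
    # Assumption: both figures are reported in MB, as in the example value.
    match = re.fullmatch(r"\s*([\d.]+)\s*MB of\s*([\d.]+)\s*MB\s*", text)
    if match is None:
        raise ValueError(f"unexpected Memory Used format: {text!r}")
    used = float(match.group(1))
    total = float(match.group(2))
    return used, total, used / total

used, total, fraction = parse_memory_used("216.36 MB of 1038.88 MB")
print(f"{fraction:.1%} of the allocated heap is in use")
```

For the example value, the Data Collector is using roughly one fifth of its allocated heap.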

You can sort the list of Data Collectors by CPU load or by memory usage to identify which Data Collectors are using the most resources.

You can also analyze historical time series charts for the CPU load and memory usage. For example, you can view the performance information for the last hour or for the last seven days. The following image displays the location where you select a time period for analysis of the charts:

By default, registered Data Collectors send the CPU load and memory usage to Control Hub every minute. You can change the frequency with which each Data Collector sends this information to Control Hub by modifying the dpm.remote.control.status.events.interval property in the Control Hub configuration file, $SDC_CONF/dpm.properties.
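As a way to check the current setting, the following Python sketch reads the property from properties-style text. The helper read_property is hypothetical, and the sample file contents are illustrative only: the value 60000 assumes the interval is expressed in milliseconds, which you should confirm in the Data Collector documentation before changing the property.

```python
def read_property(properties_text, key):
    """Return the value of `key` from Java-properties-style text, or None."""
    for raw_line in properties_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith(("#", "!")):
            continue
        name, sep, value = line.partition("=")
        if sep and name.strip() == key:
            return value.strip()
    return None

# Illustrative contents only; the value and its unit are assumptions.
sample = """
# Control Hub settings (sample)
dpm.remote.control.status.events.interval=60000
"""

INTERVAL_KEY = "dpm.remote.control.status.events.interval"
interval = read_property(sample, INTERVAL_KEY)
print(INTERVAL_KEY, "=", interval)
```

To inspect a real installation, you could pass the text of $SDC_CONF/dpm.properties to the same helper instead of the sample string.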

Pipeline Status

When you view the details of a Data Collector in the Execute view, Control Hub displays the list of pipelines currently running on that Data Collector.

Control Hub can display the following types of running pipelines for each Data Collector:

Local pipelines
A local pipeline is a pipeline that is managed by a Data Collector and run locally on that Data Collector. Local pipelines should only be run on development Data Collectors. Use a Data Collector to start, stop, and monitor local pipelines. You can click the Monitor link for a local pipeline to log in to this Data Collector and display the local pipeline in Monitor mode.
Control Hub controlled pipelines
A Control Hub controlled pipeline is a pipeline that is managed by Control Hub and run remotely on registered Data Collectors. Control Hub controlled pipelines should only be run on execution Data Collectors. Control Hub controlled pipelines include the following:
  • Published pipelines run from Control Hub jobs.

    After you publish or import pipelines to Control Hub, you add them to a job, and then start the job. When you start a job on a group of Data Collectors, Control Hub remotely runs an instance of the published pipeline on each Data Collector. Use Control Hub to start, stop, and monitor published pipelines that are run from jobs.

    Control Hub uses the following format to name published pipelines:
    <pipeline name>:<job ID>:<organization ID>
  • System pipelines run from Control Hub jobs.

    Control Hub automatically generates and runs system pipelines to aggregate statistics for jobs. System pipelines collect, aggregate, and push metrics for all of the remote pipeline instances run from a job. When you start a job on a group of Data Collectors, Control Hub picks one Data Collector to run the system pipeline.

    Control Hub uses the following format to name system pipelines:
    System Pipeline for Job <job name>:<system job ID>:<organization ID>
    Note: Control Hub generates system pipelines as needed. Published pipelines that are not configured to aggregate statistics do not require system pipelines.
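The two naming formats above can be told apart programmatically. The following Python sketch splits a pipeline name into its parts and classifies it as a published or system pipeline. The function parse_pipeline_name and the sample IDs are hypothetical, and the sketch assumes that job IDs and organization IDs contain no colons, which may not hold in a real deployment.

```python
SYSTEM_PREFIX = "System Pipeline for Job "

def parse_pipeline_name(name):
    """Split a Control Hub controlled pipeline name into labeled parts."""
    # Split from the right so that colons in the pipeline or job name survive.
    # Assumption: the job ID and organization ID contain no colons.
    parts = name.rsplit(":", 2)
    if len(parts) != 3:
        raise ValueError(f"unexpected pipeline name format: {name!r}")
    label, job_id, organization_id = parts
    if label.startswith(SYSTEM_PREFIX):
        return {
            "type": "system",
            "job_name": label[len(SYSTEM_PREFIX):],
            "system_job_id": job_id,
            "organization_id": organization_id,
        }
    return {
        "type": "published",
        "pipeline_name": label,
        "job_id": job_id,
        "organization_id": organization_id,
    }

# Made-up names that follow the two formats.
published = parse_pipeline_name("Orders Ingest:job123:acme")
system = parse_pipeline_name("System Pipeline for Job Orders Job:sysjob456:acme")
print(published["type"], published["pipeline_name"])
print(system["type"], system["job_name"])
```

Splitting from the right keeps any colons in the pipeline name or job name intact, at the cost of the assumption about the two ID fields.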

The following image shows the Pipeline Status area for a Data Collector that is currently running a local pipeline and two published pipelines:

Tip: As a best practice, use labels to separate development Data Collectors from execution Data Collectors. That way, you can ensure that published pipelines are only run on execution Data Collectors and not on a Data Collector that a developer is currently using to design pipelines and run pipelines locally.