Administration

Viewing Data Collector Configuration Properties

To view Data Collector configuration properties, click Administration > Configuration.

For details about the configuration properties or to edit the configuration file, see Configuring Data Collector.

Viewing Data Collector Directories

You can view the directories that the Data Collector uses. For example, you might check which directories are in use to locate a file, or to increase the amount of available space for a directory.

Data Collector directories are defined in environment variables. For more information, see Data Collector Environment Configuration.

To view Data Collector directories, click Administration > SDC Directories.

The following Data Collector directories display:
  • Runtime (SDC_DIST) - Base directory for Data Collector executables and related files.
  • Configuration (SDC_CONF) - The Data Collector configuration file, sdc.properties, and related realm properties files and keystore files. Also includes the log4j properties file.
  • Data (SDC_DATA) - Pipeline configuration and run details.
  • Log (SDC_LOG) - Data Collector log file, sdc.log.
  • Resources (SDC_RESOURCES) - Directory for runtime resource files.
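Because each directory is exposed through an environment variable, a quick disk-space check can be scripted. The following is a minimal sketch, assuming the variables are set in the shell that runs it; the helper name is illustrative:

```python
import os
import shutil

# Environment variables that define the Data Collector directories.
SDC_DIRS = ["SDC_DIST", "SDC_CONF", "SDC_DATA", "SDC_LOG", "SDC_RESOURCES"]

def free_space(var: str) -> str:
    """Report free disk space for the directory named by an environment variable."""
    path = os.environ.get(var)
    if not path:
        return f"{var}: not set"
    free_gib = shutil.disk_usage(path).free / 1024 ** 3
    return f"{var}: {path} ({free_gib:.1f} GiB free)"

for var in SDC_DIRS:
    print(free_space(var))
```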

Viewing Data Collector Metrics

You can view metrics about Data Collector, such as the CPU usage or the number of pipeline runners in the thread pool.

  1. To view Data Collector metrics, click Administration > SDC Metrics.
    The Data Collector Metrics page displays all metrics by default.
  2. To modify the metrics that display on the page, click the More icon, and then click Settings.
  3. Remove any metric charts that you don't want to display, and then click Save.

Viewing Data Collector Logs

You can view and download log data. When you download log data, you can select the file to download.

  1. To view log data for the Data Collector, click Administration > Logs.
    The Data Collector UI displays roughly 50,000 characters of the most recent log information.
  2. To stop the automatic refresh of log data, click Stop Auto Refresh.
    Or, click Start Auto Refresh to view the latest data.
  3. To view earlier events, click Load Previous Logs.
  4. To download the latest log file, click Download. To download a specific log file, click Download > <file name>.
    The most recent information is in the file with the highest number.

Modifying the Log Level

If the Data Collector logs do not provide enough troubleshooting information, you can modify the log level to display messages at another severity level.

By default, Data Collector logs messages at the INFO severity level. You can configure the following log levels:
  • TRACE
  • DEBUG
  • INFO (Default)
  • WARN
  • ERROR
  • FATAL
  1. Click Administration > Logs.
  2. Click Log Config.
    Data Collector displays the contents of the log configuration file, $SDC_CONF/sdc-log4j.properties.
  3. Change the default value of INFO for the following lines in the file:
    log4j.logger.com.streamsets.pipeline=INFO
    log4j.logger.com.streamsets.datacollector=INFO

    For example, to set the log level to DEBUG, modify the lines as follows:

    log4j.logger.com.streamsets.pipeline=DEBUG
    log4j.logger.com.streamsets.datacollector=DEBUG
  4. Click Save.
    The changes that you make to the log level take effect immediately; you do not need to restart Data Collector. You can also change the log level by directly editing the log configuration file, $SDC_CONF/sdc-log4j.properties.
    Note: For a Cloudera Manager installation, use Cloudera Manager to modify the log level. In Cloudera Manager, select the StreamSets service, then click Configuration. Click Category > Logs, and then modify the value of the Data Collector Logging Threshold property.

When you’ve finished troubleshooting, set the log level back to INFO to avoid having verbose log files.
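The edit that the Log Config page performs on the two logger lines can be sketched as a small text transformation. This is a minimal sketch of the same rewrite, not the Data Collector implementation; the function name is illustrative:

```python
import re

def set_log_level(text: str, level: str) -> str:
    """Rewrite the two StreamSets logger lines to the given severity level."""
    pattern = re.compile(
        r"^(log4j\.logger\.com\.streamsets\.(?:pipeline|datacollector))=\w+$",
        re.MULTILINE,
    )
    return pattern.sub(lambda m: f"{m.group(1)}={level}", text)

# The default contents of the two lines in $SDC_CONF/sdc-log4j.properties.
original = (
    "log4j.logger.com.streamsets.pipeline=INFO\n"
    "log4j.logger.com.streamsets.datacollector=INFO\n"
)
print(set_log_level(original, "DEBUG"))
```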

Shutting Down Data Collector

You can shut down and then manually launch Data Collector to apply changes to the Data Collector configuration file, environment configuration file, or user logins.
To use the command line for shutdown when Data Collector is started as a service, use the required command for your operating system:
  • For CentOS 6, Red Hat Enterprise Linux 6, or Ubuntu 14.04 LTS, use: service sdc stop

  • For CentOS 7, Red Hat Enterprise Linux 7, or Ubuntu 16.04 LTS, use: systemctl stop sdc

To use the command line for shutdown when Data Collector is started manually, use the Data Collector process ID in the following command:
kill -15 <process ID>

To use the Data Collector UI for shutdown:

  1. Click Administration > Shut Down.
  2. When a confirmation dialog box appears, click Yes.
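The kill -15 command sends SIGTERM, which lets the process shut down gracefully rather than being killed outright. The signal semantics can be sketched on a POSIX system; the child process here is a stand-in for a manually started Data Collector process:

```python
import signal
import subprocess

# Stand-in for a manually started Data Collector process (POSIX only).
proc = subprocess.Popen(["sleep", "60"])

# Equivalent of: kill -15 <process ID>
proc.send_signal(signal.SIGTERM)

proc.wait(timeout=10)
# A negative return code -N means the process ended on signal N.
print(proc.returncode)
```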

Restarting Data Collector

You can restart Data Collector to apply changes to the Data Collector configuration file, environment configuration file, or user logins. During the restart process, Data Collector shuts down and then automatically restarts.
Note: If you installed Data Collector through Cloudera Manager, you must use Cloudera Manager to restart Data Collector. For information about how to restart a service through Cloudera Manager, see the Cloudera documentation. If you run Data Collector from Docker, you must restart Data Collector with the following Docker command: docker restart.
  1. Click Administration > Restart.
  2. When a confirmation dialog box appears, click Yes.
    The restart process can take a few moments to complete. Refresh the browser to log in again.

Viewing Users and Groups

If you use file-based authentication, you can view all user accounts granted access to this Data Collector instance, including the roles and groups assigned to each user.

To view users and groups, click Administration > Users and Groups. Data Collector displays a read-only view of the users, groups, and roles.

You configure users, groups, and roles for file-based authentication in the associated realm.properties file located in the Data Collector configuration directory, $SDC_CONF. For more information, see Configuring File-Based Authentication.

Note: If the Data Collector is registered with StreamSets Control Hub and you click Administration > Users and Groups, the Data Collector logs you into Control Hub and displays the Users view within Control Hub. Registered Data Collectors use Control Hub user authorization. For more information, see Register Data Collector with Control Hub.

Support Bundles

You can use Data Collector to generate a support bundle. A support bundle is a ZIP file that includes Data Collector logs, environment and configuration information, pipeline JSON files, resource files, and pipeline snapshots. You upload the generated file to the StreamSets support team so that we can use the information to troubleshoot your support tickets. Or, you can download the generated file and then send the file to another StreamSets community member.

Data Collector uses several generators to create a support bundle. Each generator bundles different types of information. You can choose to use all or some of the generators.

Each generator automatically redacts all passwords entered in pipelines, configuration files, or resource files. The generators replace all passwords with the text "REDACTED" in the generated files. You can customize the generators to redact other sensitive information, such as machine names or usernames.

Before uploading a generated ZIP file to support, we recommend verifying that the file does not include any other sensitive information that you do not want to share.

Generators

Data Collector can use the following generators to create a support bundle:

SDC Info
Includes the following information:
  • Data Collector configuration files.
  • Permissions granted to users on Data Collector directories.
  • Data Collector environment configuration file.
  • Data Collector version and system properties for the machine where Data Collector is installed.
  • Data Collector runtime information, including pipeline metrics and a thread dump.
Pipelines
Includes the following JSON files for each pipeline:
  • history.json
  • info.json
  • offset.json
  • pipeline.json
By default, all Data Collector pipelines are included in the bundle.
Logs
Includes the most recent content of the following log files:
  • Garbage collector log - gc.log
  • Data Collector log - sdc.log
Snapshots
Includes snapshots created for each pipeline.

In addition, Data Collector always generates the following files when you create a support bundle:

  • metadata.properties - ID and version of the Data Collector that generated the bundle.
  • generators.properties - List of generators used for the bundle.

Generating a Support Bundle

When you generate a support bundle, you choose the information to include in the bundle. Only users with the admin role can generate support bundles.

You can download the bundle to verify the contents or you can directly upload the bundle to the StreamSets support team.
Note: If needed, you can disable the ability to automatically upload bundles to StreamSets by modifying the bundle.upload.enabled property in the Data Collector configuration file, $SDC_CONF/sdc.properties. For more information, see Configuring Data Collector.
  1. Click the Help icon, and then click Support Bundle.
  2. Select the generators that you want to use.
  3. Click one of the following options:
    • Download - Generates the support bundle and saves the ZIP file to your default downloads directory.

      Use to verify that the ZIP file does not include other sensitive information that you do not want to share. For example, you might want to remove the pipelines not associated with your support ticket. By default, all Data Collector pipelines are included in the bundle. If you modify the ZIP file in any way, you must manually upload the file to StreamSets support.

    • Upload - Generates the support bundle and automatically uploads the ZIP file to the StreamSets support team.

Customizing Generators

By default, the generators redact all passwords entered in pipelines, configuration files, or resource files. You can customize the generators to redact other sensitive information, such as machine names or usernames.

To customize the generators, modify the support bundle redactor file, $SDC_CONF/support-bundle-redactor.json. The file contains rules that the generators use to redact sensitive information. Each rule contains the following information:

  • description - Description of the rule.
  • trigger - String constant that triggers a redaction. If a line contains this trigger string, then the redaction continues by applying the regular expression specified in the search property.
  • search - Regular expression that defines the sub-string to redact.
  • replace - String to replace the redacted information with.
You can add additional rules that the generators use to redact information. For example, to customize the generators to redact the names of all machines in the StreamSets domain, add the following rule to the file:
{
"description": "Custom domain names",
"trigger": ".streamsets.com",
"search": "[a-z_-]+.streamsets.com",
"replace": "REDACTED.streamsets.com"
}
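The trigger/search/replace logic of a rule can be checked outside Data Collector before you add it to the redactor file. The following sketch mirrors the rule behavior described above; it is not the Data Collector redactor itself, and the sample line is illustrative:

```python
import re

# The custom rule from the example above.
rule = {
    "description": "Custom domain names",
    "trigger": ".streamsets.com",
    "search": "[a-z_-]+.streamsets.com",
    "replace": "REDACTED.streamsets.com",
}

def redact(line: str, rule: dict) -> str:
    """Apply the search regex only when the line contains the trigger string."""
    if rule["trigger"] not in line:
        return line  # trigger not present: leave the line untouched
    return re.sub(rule["search"], rule["replace"], line)

print(redact("host=etl-node.streamsets.com port=18630", rule))
```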

REST Response

You can view REST response JSON data for different aspects of the Data Collector, such as pipeline configuration information or monitoring details.

You can use the REST response information to provide Data Collector details to a REST-based monitoring system. Or you might use the information in conjunction with the Data Collector REST API.

You can access the following REST response data:
  • Pipeline Configuration - Provides information about the pipeline and each stage in the pipeline.
  • Pipeline Rules - Provides information about metric and data rules and alerts.
  • Definitions - Provides information about all available Data Collector stages.
  • Preview Data - Provides information about the preview data moving through the pipeline. Also includes monitoring information that is not used in preview.
  • Pipeline Monitoring - Provides monitoring information for the pipeline.
  • Pipeline Status - Provides the current status of the pipeline.
  • Data Collector Metrics - Provides metrics about Data Collector.
  • Thread Dump - Lists all active Java threads used by Data Collector.
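The same JSON that displays in the UI can be retrieved over HTTP for a REST-based monitoring system. The following is a minimal sketch assuming the default Data Collector URL and basic authentication; the endpoint path shown here is an assumption and should be verified against the REST response URL your Data Collector version displays:

```python
import base64
import json
import urllib.request

SDC_URL = "http://localhost:18630"    # default Data Collector URL
ENDPOINT = "/rest/v1/system/metrics"  # assumed metrics endpoint; verify for your version

def fetch_metrics(url: str = SDC_URL, user: str = "admin", password: str = "admin") -> dict:
    """Fetch Data Collector metrics JSON using HTTP basic authentication."""
    request = urllib.request.Request(url + ENDPOINT)
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(request) as response:
        return json.load(response)

# Example (requires a running Data Collector):
# metrics = fetch_metrics()
# print(sorted(metrics.keys()))
```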

Viewing REST Response Data

You can view REST response data from the location where the relevant information displays. For example, you can view Data Collector Metrics REST response data from the Data Collector Metrics page.

You can view REST response data from the following locations:
Edit mode
From the Properties panel, you can use the More icon to view the following REST response data:
  • Pipeline Configuration
  • Pipeline Rules
  • Pipeline Status
  • Definitions
Preview mode
From the Preview panel, you can use the More icon to view the Preview Data REST response data.
Monitor mode
From the Monitor panel, you can use the More icon to view the following REST response data:
  • Pipeline Monitoring
  • Pipeline Configuration
  • Pipeline Rules
  • Pipeline Status
  • Definitions
Data Collector Metrics page
From the Data Collector Metrics page, Administration > SDC Metrics, you can use the More icon to view the following REST response data:
  • Data Collector Metrics
  • Thread Dump

Disabling the REST Response Menu

You can configure the Data Collector to disable the display of REST responses.

  1. To disable the REST Response menus, click the Help icon, and then click Settings.
  2. In the Settings window, select Hide the REST Response Menu.

Command Line Interface

Data Collector provides a command line interface that includes a basic cli command. Use the command to perform some of the same actions that you can complete from the Data Collector UI. Data Collector must be running before you can use the cli command.

You can use the following commands with the basic cli command:
help
Provides information about each command or subcommand.
manager
Provides the following subcommands:
  • start - Starts a pipeline.
  • status - Returns the status of a pipeline.
  • stop - Stops a pipeline.
  • reset-origin - Resets the origin when possible.
  • get-committed-offsets - Returns the last-saved offset for pipeline failover.
  • update-committed-offsets - Updates the last-saved offset for pipeline failover.
store
Provides the following subcommands:
  • import - Imports a pipeline.
  • list - Lists information for all available pipelines.
system
Provides the following subcommands:
  • enableDPM - Registers the Data Collector with StreamSets Control Hub.
  • disableDPM - Unregisters the Data Collector from Control Hub.

Java Configuration Options for the Cli Command

Use the SDC_CLI_JAVA_OPTS environment variable to modify Java configuration options for the cli command.

For example, to set the -Djavax.net.ssl.trustStore option for the cli command when using Data Collector with HTTPS, run the following command:

export SDC_CLI_JAVA_OPTS="-Djavax.net.ssl.trustStore=<path to truststore file> ${SDC_CLI_JAVA_OPTS}"

Using the Cli Command

Call the cli command from the $SDC_DIST directory.

Use the following command as the base for all cli commands:
bin/streamsets cli \
(-U <sdcURL> | --url <sdcURL>) \
[(-a <sdcAuthType> | --auth-type <sdcAuthType>)] \
[(-u <sdcUser> | --user <sdcUser>)] \
[(-p <sdcPassword> | --password <sdcPassword>)] \
[(-D <dpmURL> | --dpmURL <dpmURL>)] \
<command> <subcommand> [<args>] 

The usage of the basic command options depends on whether or not the Data Collector is registered with Control Hub.

Not Registered with Control Hub

The following options are available for the basic command when the Data Collector is not registered with Control Hub:
-U <sdcURL> or --url <sdcURL>
Required. URL of the Data Collector. The default URL is http://localhost:18630/.
-a <sdcAuthType> or --auth-type <sdcAuthType>
Optional. HTTP authentication type used by the Data Collector.
-u <sdcUser> or --user <sdcUser>
Optional. User name to use to log in. The roles assigned to the user account determine the tasks that you can perform. If you omit this option, the Data Collector allows admin access.
-p <sdcPassword> or --password <sdcPassword>
Optional. Required when you enter a user name. Password for the user account.
-D <dpmURL> or --dpmURL <dpmURL>
Not applicable. Do not use when the Data Collector is not registered with Control Hub.
<command>
Required. Command to perform.
<subcommand>
Required for all commands except help. Subcommand to perform.
<args>
Optional. Include arguments and options as needed.

Registered with Control Hub

The following options are available for the basic command when the Data Collector is registered with Control Hub:
-U <sdcURL> or --url <sdcURL>
Required. URL of the Data Collector. The default URL is http://localhost:18630/.
-a <sdcAuthType> or --auth-type <sdcAuthType>
Required. Authentication type used by the Data Collector. Set to dpm. If you omit this option, Data Collector uses the Form authentication type, which causes the command to fail.
-u <sdcUser> or --user <sdcUser>
Required. User account to use to log in. Enter your Control Hub user ID using the following format: <ID>@<organization ID>. The roles assigned to the Control Hub user account determine the tasks that you can perform. If you omit this option, Data Collector uses the admin user account, which causes the command to fail.
-p <sdcPassword> or --password <sdcPassword>
Required. Password for your Control Hub user account.
-D <dpmURL> or --dpmURL <dpmURL>
Required. Set to https://cloud.streamsets.com.
<command>
Required. Command to perform.
<subcommand>
Required for all commands except help. Subcommand to perform.
<args>
Optional. Include arguments and options as needed.

Help Command

Use the help command to view additional information for the specified command.

For additional information for each command, including the available arguments, use the help command as follows:
bin/streamsets cli \
(-U <sdcURL> | --url <sdcURL>) \
[(-a <sdcAuthType> | --auth-type <sdcAuthType>)] \
[(-u <sdcUser> | --user <sdcUser>)] \
[(-p <sdcPassword> | --password <sdcPassword>)] \
[(-D <dpmURL> | --dpmURL <dpmURL>)] \
help <command> [<subcommand>]
For example, the following command displays the details for the manager command. Use the same command options when the Data Collector is registered or is not registered with Control Hub:
bin/streamsets cli -U http://localhost:18630 help manager

Manager Command

The manager command provides subcommands to start and stop a pipeline, view the status of all pipelines, and reset the origin for a pipeline. It can also be used to get the last-saved offset and to update the last-saved offset for a pipeline.

The manager command returns the pipeline status object after it successfully completes the specified subcommand. The following is a sample of the pipeline status object:
{
  "user" : "admin",
  "name" : "MyPipelinejf45e1f1-dfc1-402c-8587-918bc6e831db",
  "pipelineID" : "MyPipelinejf45e1f1-dfc1-402c-8587-918bc6e831db",
  "rev" : "0",
  "status" : "STOPPING",
  "message" : null,
  "timeStamp" : 1447116703147,
  "attributes" : { },
  "executionMode" : "STANDALONE",
  "metrics" : null,
  "retryAttempt" : 0,
  "nextRetryTimeStamp" : 0
}

Note that the timestamp is a Long value, in milliseconds.

You can use the following manager subcommands:

start
Starts a pipeline. Returns the pipeline status when successful.
The start subcommand uses the following syntax:
manager start \
(-n <pipelineID> | --name <pipelineID>) \
[(-r <pipelineRev> | --revision <pipelineRev>)] \
[--stack] \
[(-R <runtimeParametersString> | --runtimeParameters <runtimeParametersString>)]
-n <pipelineID> or --name <pipelineID>
Required. ID of the pipeline to start. Data Collector generates the ID when the pipeline is created, using the alphanumeric characters entered for the pipeline title as a prefix for the generated ID.
-r <pipelineRev> or --revision <pipelineRev>
Optional. The revision of the pipeline. Use to start an older version of the pipeline. By default, the Data Collector starts the most recent version.
--stack
Optional. Returns additional information when the Data Collector cannot start the pipeline. Use to debug the problem or pass to StreamSets for help.
-R <runtimeParametersString> or --runtimeParameters <runtimeParametersString>
Optional. Runtime parameter values to start the pipeline with. Overrides the parameter default values defined for the pipeline. Enter the runtime parameters using the following format:
'{"<runtime parameter1>": "<value1>", "<runtime parameter2>": "<value2>"}'
For example:
'{"RootDir": "/error", "JDBCConnection": "jdbc:mysql://localhost:3306/customers"}'
For example, the following command starts the pipeline with an ID of MyPipelinejf45e1f1-dfc1-402c-8587-918bc6e831db when the Data Collector is not registered with Control Hub:
bin/streamsets cli -U http://localhost:18630 manager start -n MyPipelinejf45e1f1-dfc1-402c-8587-918bc6e831db
The following command starts the same pipeline when the Data Collector is registered with Control Hub:
bin/streamsets cli -U http://localhost:18630 -a dpm -u user1@mycompany -p MyPassword \
--dpmURL https://cloud.streamsets.com manager start -n MyPipelinejf45e1f1-dfc1-402c-8587-918bc6e831db
The following command starts the first version of the same pipeline when the Data Collector is not registered with Control Hub:
bin/streamsets cli -U http://localhost:18630 manager start -n MyPipelinejf45e1f1-dfc1-402c-8587-918bc6e831db -r 1
stop
Stops a pipeline. Returns the pipeline status when successful.
The stop subcommand uses the following syntax:
manager stop \
[--forceStop] \
(-n <pipelineID> | --name <pipelineID>) \
[(-r <pipelineRev> | --revision <pipelineRev>)] \
[--stack] 
--forceStop
Optional. Forces the pipeline to stop immediately. In some situations, a pipeline can remain in a Stopping state for up to five minutes. For example, if a scripting processor in the pipeline includes code with a timed wait or an infinite loop, Data Collector waits for five minutes before forcing the pipeline to stop.
-n <pipelineID> or --name <pipelineID>
Required. ID of the pipeline to stop. Data Collector generates the ID when the pipeline is created, using the alphanumeric characters entered for the pipeline title as a prefix for the generated ID.
-r <pipelineRev> or --revision <pipelineRev>
Optional. The revision of the pipeline. Use to stop an older version of the pipeline. By default, the Data Collector stops the most recent version.
--stack
Optional. Returns additional information when the Data Collector cannot stop the pipeline. Use to debug the problem or pass to StreamSets for help.
For example, the following command stops the pipeline with an ID of MyPipelinejf45e1f1-dfc1-402c-8587-918bc6e831db when the Data Collector is not registered with Control Hub:
bin/streamsets cli -U http://localhost:18630 manager stop -n MyPipelinejf45e1f1-dfc1-402c-8587-918bc6e831db
The following command stops the same pipeline when the Data Collector is registered with Control Hub:
bin/streamsets cli -U http://localhost:18630 -a dpm -u user1@mycompany -p MyPassword \
--dpmURL https://cloud.streamsets.com manager stop -n MyPipelinejf45e1f1-dfc1-402c-8587-918bc6e831db
The following command forces the first version of the same pipeline to stop immediately when the Data Collector is not registered with Control Hub:
bin/streamsets cli -U http://localhost:18630 manager stop --forceStop -n MyPipelinejf45e1f1-dfc1-402c-8587-918bc6e831db -r 1
status
Returns the status of a pipeline. Returns the pipeline status when successful.
The status subcommand uses the following syntax:
manager status \
(-n <pipelineID> | --name <pipelineID>) \
[(-r <pipelineRev> | --revision <pipelineRev>)] \
[--stack] 
-n <pipelineID> or --name <pipelineID>
Required. ID of the pipeline. Data Collector generates the ID when the pipeline is created, using the alphanumeric characters entered for the pipeline title as a prefix for the generated ID.
-r <pipelineRev> or --revision <pipelineRev>
Optional. The revision of the pipeline. Use for older versions of the pipeline. By default, the Data Collector returns information for the most recent version.
--stack
Optional. Returns additional information when the Data Collector cannot return the status of the pipeline. Use to debug the problem or pass to StreamSets for help.

For example, the following command returns the status of the pipeline with an ID of MyPipelinejf45e1f1-dfc1-402c-8587-918bc6e831db when the Data Collector is not registered with Control Hub:
bin/streamsets cli -U http://localhost:18630  manager status -n MyPipelinejf45e1f1-dfc1-402c-8587-918bc6e831db
The following command returns the status of the same pipeline when the Data Collector is registered with Control Hub:
bin/streamsets cli -U http://localhost:18630  -a dpm -u user1@mycompany -p MyPassword \
--dpmURL https://cloud.streamsets.com manager status -n MyPipelinejf45e1f1-dfc1-402c-8587-918bc6e831db
The following command returns the status of the first version of the same pipeline when the Data Collector is not registered with Control Hub:
bin/streamsets cli -U http://localhost:18630 manager status -n MyPipelinejf45e1f1-dfc1-402c-8587-918bc6e831db -r 1
reset-origin
Resets the origin of a pipeline. Use only for pipeline origins that can be reset; some origins cannot be reset. Returns the pipeline status when successful.
The reset-origin subcommand uses the following syntax:
manager reset-origin \
(-n <pipelineID> | --name <pipelineID>) \
[(-r <pipelineRev> | --revision <pipelineRev>)] \
[--stack]
-n <pipelineID> or --name <pipelineID>
Required. ID of the pipeline to reset the origin for. Data Collector generates the ID when the pipeline is created, using the alphanumeric characters entered for the pipeline title as a prefix for the generated ID.
-r <pipelineRev> or --revision <pipelineRev>
Optional. The revision of the pipeline. Use to reset the origin for an older version of the pipeline. By default, the Data Collector resets the origin for the most recent version.
--stack
Optional. Returns additional information when the Data Collector cannot reset the origin. Use to debug the problem or pass to StreamSets for help.

For example, the following command resets the origin of the pipeline with an ID of MyPipelinejf45e1f1-dfc1-402c-8587-918bc6e831db when the Data Collector is not registered with Control Hub:
bin/streamsets cli -U http://localhost:18630  manager reset-origin -n MyPipelinejf45e1f1-dfc1-402c-8587-918bc6e831db
The following command resets the origin of the same pipeline when the Data Collector is registered with Control Hub:
bin/streamsets cli -U http://localhost:18630  -a dpm -u user1@mycompany -p MyPassword \
--dpmURL https://cloud.streamsets.com manager reset-origin -n MyPipelinejf45e1f1-dfc1-402c-8587-918bc6e831db
get-committed-offsets
Returns the last-saved offset for a pipeline with an origin that saves offsets. Some origins, such as the HTTP Server, have no need to save offsets.
Pipeline offsets are managed by Data Collector. There's no need to get or replace the last-saved offset unless implementing pipeline failover using an external storage system.
When implementing pipeline failover, use this subcommand to store the last-saved offset to a file. When necessary, you can use the update-committed-offsets command to update the pipeline offset with the contents of the file.
The get-committed-offsets subcommand uses the following syntax:
manager get-committed-offsets \
(-n <pipelineID> | --name <pipelineID>) \
[(-r <pipelineRev> | --revision <pipelineRev>)] \
[--stack] 
-n <pipelineID> or --name <pipelineID>
Required. ID of the pipeline. Data Collector generates the ID when the pipeline is created, using the alphanumeric characters entered for the pipeline title as a prefix for the generated ID.
-r <pipelineRev> or --revision <pipelineRev>
Optional. The revision of the pipeline. Use to get the last-saved offset for an older version of the pipeline. By default, the Data Collector uses the most recent version.
--stack
Optional. Returns additional information when the Data Collector cannot retrieve the last-saved offset. Use to debug the problem or pass to StreamSets for help.

For example, the following command returns the last-saved offset for a pipeline with an ID of MyPipelinejf45e1f1-dfc1-402c-8587-918bc6e236lc when the Data Collector is not registered with Control Hub:
bin/streamsets cli -U http://localhost:18630 manager get-committed-offsets \
 -n MyPipelinejf45e1f1-dfc1-402c-8587-918bc6e236lc
The following command returns the last-saved offset of the same pipeline when the Data Collector is registered with Control Hub:
bin/streamsets cli -U http://localhost:18630 -a dpm -u user1@mycompany -p MyPassword \
--dpmURL https://cloud.streamsets.com manager get-committed-offsets -n MyPipelinejf45e1f1-dfc1-402c-8587-918bc6e236lc
update-committed-offsets
Updates the last-saved offset for a pipeline with an origin that saves offsets. Some origins, such as the HTTP Server, have no need to save offsets.
Pipeline offsets are managed by Data Collector. There's no need to update the last-saved offset unless performing pipeline failover from a file that contains the last-saved offset stored by using get-committed-offsets.
Change the last-saved offset with great caution and only when the pipeline is not running.
The update-committed-offsets subcommand uses the following syntax:
manager update-committed-offsets \
(-f <fileName> | --file <fileName>) \
(-n <pipelineID> | --name <pipelineID>) \
[(-r <pipelineRev> | --revision <pipelineRev>)] \
[--stack] 
-f <fileName> or --file <fileName>
Required. Relative or absolute path to the file that contains the last-saved offset. The file should contain only the last-saved offset retrieved by using the get-committed-offsets subcommand.
-n <pipelineID> or --name <pipelineID>
Required. ID of the pipeline. Data Collector generates the ID when the pipeline is created, using the alphanumeric characters entered for the pipeline title as a prefix for the generated ID.
-r <pipelineRev> or --revision <pipelineRev>
Optional. The revision of the pipeline. Use to update the last-saved offset for an older version of the pipeline. By default, the Data Collector uses the most recent version.
--stack
Optional. Returns additional information when the Data Collector cannot update the last-saved offset. Use to debug the problem or pass to StreamSets for help.

For example, the following command updates the last-saved offset for a pipeline using the offset in the specified file when the Data Collector is not registered with Control Hub:
bin/streamsets cli -U http://localhost:18630 manager update-committed-offsets \
-f /sdc/offsetfiles/mypipeline/offset.txt -n MyPipelinejf45e1f1-dfc1-402c-8587-918bc6e236lc
The following command updates the last-saved offset for the same pipeline using the offset in the specified file when the Data Collector is registered with Control Hub:
bin/streamsets cli -U http://localhost:18630 -a dpm -u user1@mycompany -p MyPassword \
--dpmURL https://cloud.streamsets.com manager update-committed-offsets -f /sdc/offsetfiles/mypipeline/offset.txt \
-n MyPipelinejf45e1f1-dfc1-402c-8587-918bc6e236lc

Store Command

The store command provides subcommands to view a list of all pipelines and to import a pipeline.

You can use the following subcommands with the store command:
list
Lists all available pipelines. The list subcommand uses the following syntax:
store list
Returns the following information for each pipeline:
 {
  "name" : "<pipeline ID>",
  "pipelineId" : "<pipeline ID>",
  "title" : "<pipeline title>",
  "description" : "< >",
  "created" : <created time>,
  "lastModified" : <last modified time>,
  "creator" : "admin",
  "lastModifier" : "admin",
  "lastRev" : "0",
  "uuid" : "<internal ID used for optimistic locking>",
  "valid" : true,
  "metadata" : {
    "labels" : [ ],
    "dpm.pipeline.id" : "<Control Hub pipeline ID>:<organization name>",
    "dpm.pipeline.version" : "<published pipeline version>"
  }
},
For example, the following command lists all pipelines associated with the Data Collector when it is not registered with Control Hub:
bin/streamsets cli -U http://localhost:18630 store list
The following command lists all pipelines associated with the Data Collector when it is registered with Control Hub:
bin/streamsets cli -U http://localhost:18630 -a dpm -u user1@mycompany -p MyPassword \
--dpmURL https://cloud.streamsets.com store list
import
Imports a pipeline. Use to import a pipeline JSON file, typically exported from a Data Collector. Returns a message when the import is successful.
The import subcommand uses the following syntax:
store import \
(-n <pipelineTitle> | --name <pipelineTitle>) \
[--stack] \
[(-f <fileName> | --file <fileName>)]
-n <pipelineTitle> or --name <pipelineTitle>
Required. Title for the imported pipeline. If the title includes spaces, surround the title in quotation marks.
--stack
Optional. Returns additional information when the Data Collector cannot import the pipeline. Use to debug the problem or pass to StreamSets for help.
-f <fileName> or --file <fileName>
Optional. The location and name of the file to import. Enter a path relative to the Data Collector installation directory.

For example, the following command creates a pipeline with the title "Files to HDFS" based on the files2hdfs.json file when the Data Collector is not registered with Control Hub:
bin/streamsets cli -U http://localhost:18630 store import -n "Files to HDFS" -f ../../exported_pipelines/files2hdfs.json
The following command creates a pipeline with the title "Files to HDFS" based on the files2hdfs.json file when the Data Collector is registered with Control Hub:
bin/streamsets cli -U http://localhost:18630 -a dpm -u user1@mycompany -p MyPassword \
--dpmURL https://cloud.streamsets.com store import -n "Files to HDFS" -f ../../exported_pipelines/files2hdfs.json

System Command

The system command provides subcommands to register and unregister the Data Collector with Control Hub.

You can use the following subcommands with the system command:

enableDPM
Registers the Data Collector with Control Hub. For a description of the syntax, see Registering from the Command Line Interface.
disableDPM
Unregisters the Data Collector from Control Hub. For a description of the syntax, see Unregistering from the Command Line Interface.