Credential Stores

Data Collector pipeline stages communicate with external systems to read and write data. Many of these external systems require credentials, such as user names and passwords, to access the data. When you configure pipeline stages for these external systems, you define the credentials that the stage uses to connect to the system.

If you enter credential values directly in stage properties, you expose the credentials to any user with access to the pipeline. To access external systems without exposing the credentials, define credentials in a credential store and then use the Data Collector credential functions in the stage properties to retrieve those values.

At this time, the following JDBC stages can use the credential functions:
  • JDBC Multitable Consumer origin
  • JDBC Query Consumer origin
  • Oracle CDC Client origin
  • SQL Server CDC Client origin
  • SQL Server Change Tracking origin
  • JDBC Lookup processor
  • JDBC Tee processor
  • JDBC Producer destination
  • JDBC Query executor
Data Collector has a credential store API that integrates with the following credential store systems:
  • Java keystore
  • Vault

You can configure a Data Collector to use multiple credential stores at the same time. Each credential store is identified by a unique ID.

Tip: When you define credentials in a credential store instead of directly in stage properties, you also make it easier to migrate pipelines to another environment. For example, if you migrate multiple pipelines from a development to a production environment, you do not need to edit each pipeline to define the correct credentials for the production environment. You can simply replace the development credential store with the production version.

Group Access to Credentials

When you use credential functions in a pipeline, you can further secure the credential values by allowing only a specific group to validate, preview, or run the pipeline.

The credential functions include a group argument that defines the group that can access the credential. Only users that have execute permission on the pipeline and that belong to this group can validate, preview, or run the pipeline that retrieves the credential values.

If you do not want to restrict access to the credentials, specify the default "all" group so that all users with execute permission on the pipeline can validate, preview, or run the pipeline that retrieves the credential values.

Note: If Data Collector shuts down while running a pipeline that uses a credential function, Data Collector restarts the pipeline without checking the group access.

Java Keystore Credential Store

To use the Java keystore credential store system, install the Java keystore credential store stage library and define the configuration properties used to connect to the credential store.

Use the jks-cs command to add credentials to the credential store. Then, use credential functions in pipeline stage properties to retrieve the credential values.

Step 1. Install the Credential Store Stage Library

By default, a full Data Collector installation includes the Java keystore credential store stage library. The core installation does not include the library.

To verify that a Data Collector has the Java keystore credential store stage library installed, click the Package Manager icon to display the list of installed stage libraries. If the library is not installed, install the library before configuring the Java keystore credential store.

Step 2. Configure the Credential Store Properties

To enable Data Collector to connect to the Java keystore credential store, configure the Java keystore properties in the $SDC_CONF/credential-stores.properties file.

Important: For a Cloudera Manager installation, configure all credential store properties through Cloudera Manager. In Cloudera Manager, select the StreamSets service and then click Configuration. Add a line for each credential store property that you need to configure to the Data Collector Advanced Configuration Snippet (Safety Valve) for sdc.properties field as follows:
credentialStores=jks
  1. Uncomment the credentialStores property in the file.

    If enabling only the Java keystore credential store, set the property to "jks". If enabling both the Java keystore and the Vault credential stores, leave the default value of "jks,vault".

  2. Configure the following properties in the Java Keystore Credential Store section of the file:
    credentialStore.jks.config.keystore.type
      Format of the Java keystore file:
      • JCEKS
      • PKCS12
      Default is PKCS12.

    credentialStore.jks.config.keystore.file
      Path and name of the Java keystore file. Enter an absolute path to the file, or a path relative to the Data Collector configuration directory, $SDC_CONF.
      Default is jks-credentialStore.pkcs12.

    credentialStore.jks.config.keystore.storePassword
      Password that Data Collector uses to access the Java keystore file.
      You must change the default value before using the keystore file.

  3. Restart Data Collector to enable the changes.
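Before restarting, it can help to sanity-check the edited file. The following is a minimal sketch rather than a Data Collector feature: the function name check_jks_store is hypothetical, and the sketch assumes that properties are written one per line as name=value with no spaces around the equals sign. It checks only the two things stated above: that jks appears in the uncommented credentialStores property, and that any configured keystore type is JCEKS or PKCS12.

```shell
# Hypothetical helper, not shipped with Data Collector.
check_jks_store() {
  # $1 = path to credential-stores.properties
  conf="$1"

  # The credentialStores line must be uncommented and must include jks.
  if ! grep -Eq '^credentialStores=(.*,)?jks(,.*)?$' "$conf"; then
    echo "jks is not enabled in credentialStores" >&2
    return 1
  fi

  # Read the configured keystore type, if any (last definition wins here).
  ks_type=$(sed -n 's/^credentialStore\.jks\.config\.keystore\.type=//p' "$conf" | tail -n 1)

  # An unset type falls back to the documented default, PKCS12.
  case "$ks_type" in
    ''|JCEKS|PKCS12) return 0 ;;
    *) echo "unsupported keystore type: $ks_type" >&2; return 1 ;;
  esac
}

# Usage (assumes SDC_CONF is set in the environment):
# check_jks_store "$SDC_CONF/credential-stores.properties" && echo "looks OK"
```

The check deliberately never prints the storePassword property, so it is safe to run in a shared terminal.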

Step 3. Add Credentials to the Credential Store

Use the jks-cs add command to add credentials to the Java keystore file. You can add multiple credentials to the file.

Use the command as follows:
$SDC_DIST/bin/streamsets jks-cs add -i <storeId> -n <credential name> -c <credential value>
For example, the following command adds a credential named OracleDBPassword with the value 278yT6!u to the Java keystore credential store:
$SDC_DIST/bin/streamsets jks-cs add -i jks -n OracleDBPassword -c '278yT6!u'
Note: The jks-cs command also includes delete and list subcommands that you use to manage the credentials defined in the keystore file. For information on using these commands, see jks-cs Command.
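The sample value contains an exclamation mark, which an interactive bash session treats as history expansion unless the value is enclosed in single quotes. The following sketch wraps the add and list subcommands so that a credential can be added and its name confirmed in one step. The function name add_and_confirm is hypothetical, and the sketch assumes that $SDC_DIST is set to the Data Collector installation directory.

```shell
# Hypothetical helper, not shipped with Data Collector.
add_and_confirm() {
  # $1 = store ID, $2 = credential name, $3 = credential value
  # Add the credential, then list the stored names only if the add succeeded.
  "$SDC_DIST/bin/streamsets" jks-cs add -i "$1" -n "$2" -c "$3" &&
    "$SDC_DIST/bin/streamsets" jks-cs list -i "$1"
}

# Usage: single quotes keep the shell from interpreting the "!" in the value.
# add_and_confirm jks OracleDBPassword '278yT6!u'
```

Because the value is passed as a command-line argument, it can end up in the shell history on many systems, so consider removing that history entry afterward.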

Step 4. Call the Credentials from the Pipeline

Use the credential:get() function in pipeline stage properties to retrieve credential values from the Java keystore.

Use the credential function in any JDBC stage property that displays the key icon next to it.

Important: When you use a credential function in a stage property, the function must be the only value defined in the property. For example, you cannot include another function or a literal value along with the credential function.
The credential:get() function uses the following arguments:
  • storeId - Unique ID of the credential store to use. Enter "jks" to access the Java keystore credential store.
  • group - Group to which a user must belong before that user can access the credential. Only users that have execute permission on the pipeline and that belong to this group can validate, preview, or run the pipeline that retrieves the credential values. If you specify the default "all" group, then all users with execute permission on the pipeline can validate, preview, or run the pipeline that retrieves the credential values.
  • credential name - Name of the credential value to retrieve from the credential store.
For example, the following expression returns the value of the OracleDBPassword credential and allows any user belonging to the devops group access to the credential when validating, previewing, or running the pipeline:
${credential:get("jks", "devops", "OracleDBPassword")}

jks-cs Command

The jks-cs command provides subcommands to add, list, and delete credentials in the Java keystore credential store.

Any changes made to the Java keystore file take effect immediately. For example, if you change the value of an existing credential in the file, running pipelines that require a new connection to the external system use the new credential value.

You can use the following subcommands with the jks-cs command:
add
Adds a credential to the Java keystore credential store.
Use the command as follows:
$SDC_DIST/bin/streamsets jks-cs add \
(-i <storeId> | --id <storeId>) \
(-n <credential name> | --name <credential name>) \
(-c <credential value> | --credential <credential value>)
-i <storeId> or --id <storeId>
  Required. Unique ID for the credential store. Use jks.
-n <credential name> or --name <credential name>
  Required. Name of the credential to add to the Java keystore credential store.
-c <credential value> or --credential <credential value>
  Required. Value of the credential to add to the Java keystore credential store.
For example, the following command adds a credential named OracleDBPassword with the value 278yT6!u to the Java keystore credential store:
$SDC_DIST/bin/streamsets jks-cs add -i jks -n OracleDBPassword -c '278yT6!u'
delete
Deletes a credential from the Java keystore credential store.
Use the command as follows:
$SDC_DIST/bin/streamsets jks-cs delete \
(-i <storeId> | --id <storeId>) \
(-n <credential name> | --name <credential name>)
-i <storeId> or --id <storeId>
  Required. Unique ID for the credential store. Use jks.
-n <credential name> or --name <credential name>
  Required. Name of the credential to delete from the Java keystore credential store.
For example, the following command deletes a credential named SQLServerDBPassword from the Java keystore credential store:
$SDC_DIST/bin/streamsets jks-cs delete -i jks -n SQLServerDBPassword
list
Lists the names of all credentials defined in the Java keystore credential store. The command does not list the credential values.
Use the command as follows:
$SDC_DIST/bin/streamsets jks-cs list \
(-i <storeId> | --id <storeId>)
-i <storeId> or --id <storeId>
  Required. Unique ID for the credential store. Use jks.
For example, the following command lists the names of all credentials defined in the Java keystore credential store:
$SDC_DIST/bin/streamsets jks-cs list -i jks
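
Because changes to the keystore take effect immediately, the three subcommands can be combined to rotate a credential from the command line. This section does not state whether add overwrites an existing entry, so the sketch below takes the conservative route of deleting the old entry first. The function name rotate_credential is hypothetical, and the sketch makes two assumptions: that $SDC_DIST is set to the Data Collector installation directory, and that the list subcommand prints one credential name per line.

```shell
# Hypothetical helper, not shipped with Data Collector.
rotate_credential() {
  # $1 = store ID, $2 = credential name, $3 = new credential value
  sdc="$SDC_DIST/bin/streamsets"

  # Delete the old entry only if the name is currently listed.
  # Assumes "list" prints one credential name per line.
  if "$sdc" jks-cs list -i "$1" | grep -Fqx -- "$2"; then
    "$sdc" jks-cs delete -i "$1" -n "$2" || return 1
  fi

  # Store the new value under the same name.
  "$sdc" jks-cs add -i "$1" -n "$2" -c "$3"
}

# Usage:
# rotate_credential jks OracleDBPassword 'newSecretValue'
```

Between the delete and the add, the credential briefly does not exist, so a pipeline that opens a new connection in that window may fail to retrieve it.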

Vault Credential Store

To use the Vault credential store system, install the Vault credential store stage library and define the configuration properties used to connect to Vault. Then, use credential functions in pipeline stage properties to retrieve the credential values.

Note: This documentation includes details about Hashicorp Vault to simplify the configuration process. For more information, see the Vault documentation.

Step 1. Install the Credential Store Stage Library

By default, a full Data Collector installation includes the Vault credential store stage library. The core installation does not include the library.

To verify that a Data Collector has the Vault credential store stage library installed, click the Package Manager icon to display the list of installed stage libraries. If the library is not installed, install the library before configuring the Vault credential store.

Step 2. Configure the Credential Store Properties

To enable Data Collector to connect to the Vault credential store, configure the Vault properties in the $SDC_CONF/credential-stores.properties file.

Important: For a Cloudera Manager installation, configure all credential store properties through Cloudera Manager. In Cloudera Manager, select the StreamSets service and then click Configuration. Add a line for each credential store property that you need to configure to the Data Collector Advanced Configuration Snippet (Safety Valve) for sdc.properties field as follows:
credentialStores=vault
  1. Uncomment the credentialStores property in the file.

    If enabling only the Vault credential store, set the property to "vault". If enabling both the Java keystore and the Vault credential stores, leave the default value of "jks,vault".

  2. Configure the following properties in the Hashicorp Vault Credential Store section of the file.

    The Vault server URL, Role ID, and Secret ID are required properties. Configure other properties as needed:

    credentialStore.vault.config.pathKey.separator
      Optional. Separator to use for the path and key values in the credential name argument used by the credential functions.
      You use the following format for the name argument:
      <path><separator><key>
      For example, if you keep the default of &, the format for the name argument is:
      <path>&<key>

    credentialStore.vault.config.addr
      Required. Vault server URL entered in the following format:
      https://<host name>:<port number>
      Use HTTPS to avoid unencrypted communication.

    credentialStore.vault.config.role.id
      Required. Vault Role ID that Data Collector uses to authenticate with Vault. The Role ID is configured within Vault by your Vault administrator.
      The Data Collector Vault integration relies on Vault's App Role authentication backend.
      Important: The App ID authentication backend has been deprecated by Hashicorp and will be removed in a future release. As a result, do not configure the credentialStore.vault.config.app.id property for new installations.

    credentialStore.vault.config.secret.id
      Required. Vault Secret ID that Data Collector uses to authenticate with Vault. The Secret ID is configured within Vault by your Vault administrator.
      Enter one of the following:
      • Secret ID value.
      • File that contains the Secret ID value. For increased security, store the Secret ID in a separate file and reference the file in the $SDC_CONF/credential-stores.properties file as follows: ${file("<filename>")}.
        By default, the file name is vault-secret-id and the file is expected in the $SDC_CONF directory. For more information, see Referencing Sensitive Values in Files.

    credentialStore.vault.config.lease.renewal.interval.sec
      Optional. Seconds to wait before checking for leases that need renewal.
      Default is 60.

    credentialStore.vault.config.lease.expiration.buffer.sec
      Optional. Buffer for expiring leases. Data Collector renews leases that expire in less than the specified number of seconds.
      Default is 120.

    credentialStore.vault.config.open.timeout
      Optional. Timeout to establish an HTTP connection to Vault in milliseconds.
      Default is 0 for no limit.

    credentialStore.vault.config.proxy.address
      Optional. Proxy URL. Configure to use a proxy to access Vault.

    credentialStore.vault.config.proxy.port
      Optional. Proxy port. Configure to use a proxy to access Vault.

    credentialStore.vault.config.proxy.username
      Optional. Proxy username. Configure to use a proxy to access Vault.

    credentialStore.vault.config.proxy.password
      Optional. Proxy password. Configure to use a proxy to access Vault.

    credentialStore.vault.config.read.timeout
      Optional. Milliseconds to wait for data before timing out.
      Default is 0 for no limit.

    credentialStore.vault.config.ssl.enabled.protocols
      Optional. SSL/TLS-enabled protocols. Versions TLSv1.2 or later are recommended.
      Default is TLSv1.2,TLSv1.3.

    credentialStore.vault.config.ssl.truststore.file
      Optional. Path to a Java truststore file. Required when using a private CA or certificates not trusted by the Java default truststore.

    credentialStore.vault.config.ssl.truststore.password
      Optional. Password for the truststore file.

    credentialStore.vault.config.ssl.verify
      Optional. Whether to verify that the Vault server hostname matches its certificate.
      Default is true. False is not recommended.

    credentialStore.vault.config.ssl.timeout
      Optional. Timeout for the SSL/TLS handshake in milliseconds.
      Default is 0 for no limit.

    credentialStore.vault.config.timeout
      Optional. Timeout to read from Vault in milliseconds, after a connection has been established.
      Default is 0 for no limit.

  3. Restart Data Collector to enable the changes.
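
Before restarting, the three required Vault properties can be verified with a small preflight check. This is a minimal sketch rather than a Data Collector feature: the function name check_vault_required is hypothetical, and it assumes that properties are written one per line as name=value with no spaces around the equals sign. A Secret ID given as a ${file("<filename>")} reference counts as set; the sketch does not check that the referenced file exists.

```shell
# Hypothetical helper, not shipped with Data Collector.
check_vault_required() {
  # $1 = path to credential-stores.properties
  missing=0
  for name in addr role.id secret.id; do
    key="credentialStore.vault.config.$name"
    # Look for an uncommented line whose name matches exactly
    # and whose value is not empty.
    if ! awk -F= -v k="$key" \
        '$1 == k && length($2) > 0 { found = 1 } END { exit !found }' "$1"; then
      echo "missing required property: $key" >&2
      missing=1
    fi
  done

  # The Vault address should use HTTPS to avoid unencrypted communication.
  if grep -q '^credentialStore\.vault\.config\.addr=http://' "$1"; then
    echo "warning: Vault address uses unencrypted HTTP" >&2
  fi

  return "$missing"
}

# Usage (assumes SDC_CONF is set in the environment):
# check_vault_required "$SDC_CONF/credential-stores.properties" && echo "required properties set"
```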

Step 3. Call the Credentials from the Pipeline

Use the credential:get() or credential:getWithOptions() function in pipeline stage properties to retrieve credential values from Vault.

Use the credential functions in any JDBC stage property that displays the key icon next to it.

Important: When you use a credential function in a stage property, the function must be the only value defined in the property. For example, you cannot include another function or a literal value along with the credential function.
The credential functions use the following arguments:
  • storeId - Unique ID of the credential store to use. Enter "vault" to access the Vault credential store.
  • group - Group to which a user must belong before that user can access the credential. Only users that have execute permission on the pipeline and that belong to this group can validate, preview, or run the pipeline that retrieves the credential values. If you specify the default "all" group, then all users with execute permission on the pipeline can validate, preview, or run the pipeline that retrieves the credential values.
  • credential name - Name of the credential value to retrieve from Vault. Use the following format: "<path><separator><key>", where:
    • <path> is the path in Vault to read.
    • <separator> is the separator defined for the path and key values in the $SDC_CONF/credential-stores.properties file.
    • <key> is the key for the value that you want returned.
  • options - Used only by the credential:getWithOptions() function. Additional options to communicate with the credential store. For Vault, you can enter a delay in milliseconds to allow time for external processing.

    Use the following format to specify an option:

    "<option>=<value>"
    For example, to set the Vault delay to 1,000 milliseconds, enter the following for the options argument:
    "delay=1000"
For example, the following expression returns the value of the OracleDBPassword credential stored in the Vault /databases/ path after waiting for a delay of 1,000 milliseconds. The credential name argument uses the default separator of &. The expression allows any user belonging to the devops group access to the credential when validating, previewing, or running the pipeline:
${credential:getWithOptions("vault", "devops", "/databases/&OracleDBPassword", "delay=1000")}
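
To make the composition of the credential name argument concrete, the following sketch assembles a credential:get() expression from its parts. The function name vault_credential_expr is hypothetical and exists only for illustration. The fourth argument stands in for the separator configured in the credentialStore.vault.config.pathKey.separator property and defaults to &.

```shell
# Hypothetical helper, for illustration only.
vault_credential_expr() {
  # $1 = group, $2 = Vault path, $3 = key, $4 = separator (optional)
  sep="${4:-&}"
  # The name argument is the path, the separator, and the key joined together.
  printf '${credential:get("vault", "%s", "%s%s%s")}\n' "$1" "$2" "$sep" "$3"
}

# Usage:
# vault_credential_expr devops /databases/ OracleDBPassword
```

If the separator property is changed in the properties file, every expression in existing pipelines must use the new separator, which is one reason to keep the default unless a path or key contains the & character.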