Credential Stores

Data Collector pipeline stages communicate with external systems to read and write data. Many of these external systems require credentials, such as user names and passwords, to access the data. When you configure pipeline stages for these external systems, you define the credentials that the stage uses to connect to the system.

If you enter credential values directly in stage properties, you expose the credentials to any user with access to the pipeline. To access external systems without exposing the credentials, define credentials in a credential store and then use the Data Collector credential functions in the stage properties to retrieve those values.

Data Collector has a credential store API that integrates with the following credential store systems:
  • CyberArk
  • Hashicorp Vault
  • Java keystore
  • Microsoft Azure Key Vault
Important: Use the Java keystore credential store system in a development environment only. In a production environment, use a centralized keystore, such as CyberArk, Hashicorp Vault, or Azure Key Vault, to better secure credentials. A Java keystore credential storage system requires the distribution of a keystore file, which complicates security. Before using a Java keystore system, decide how the keystore will be distributed and consult with your IT security team to ensure that the system meets IT policies.

You can configure a Data Collector to use multiple credential stores at the same time. Each credential store is identified by a unique ID.

Tip: When you define credentials in a credential store instead of directly in stage properties, you also make it easier to migrate pipelines to another environment. For example, if you migrate multiple pipelines from a development to a production environment, you do not need to edit each pipeline to define the correct credentials for the production environment. You can simply replace the development credential store with the production version.

Group Access to Credentials

When you use credential functions in a pipeline, you can further secure the credential values by allowing only users in a specific group to validate, preview, or run the pipeline.

The credential functions include a group argument that defines the group that can access the credential. Only users that have execute permission on the pipeline and that belong to this group can validate, preview, or run the pipeline that retrieves the credential values.

When working only with Data Collector, simply specify the group name, such as "devops". When working with Control Hub, specify the group using the required naming convention: <group ID>@<organization ID>. For example, "devops@MyCompany".

If you do not want to restrict access to the credentials, specify the default "all" group when working only with Data Collector or the default "all@<organization ID>" group when working with Control Hub.
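
As a minimal sketch of the naming convention above, the following Python helper builds the group argument for both cases. The function name group_argument and its parameters are hypothetical and are not part of Data Collector or Control Hub; they only illustrate the two forms.

```python
from typing import Optional


def group_argument(group: str = "all", organization_id: Optional[str] = None) -> str:
    """Return the group string to pass to a credential function.

    Hypothetical helper for illustration only.
    """
    if organization_id is None:
        # Data Collector only: the plain group name, such as "devops" or "all".
        return group
    # Control Hub: <group ID>@<organization ID>.
    return f"{group}@{organization_id}"


print(group_argument("devops"))
print(group_argument("devops", "MyCompany"))
print(group_argument(organization_id="MyCompany"))
```

Calling the helper without a group name falls back to the default "all" group, which matches the unrestricted case described above.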

Note: If Data Collector shuts down while running a pipeline that uses a credential function, Data Collector restarts the pipeline without checking the group access.

CyberArk

To use the CyberArk credential store system, install the CyberArk credential store stage library and define the configuration properties used to connect to CyberArk Application Identity Manager. Then, use credential functions in pipeline stage properties to retrieve the credential values.

Currently, Data Collector supports CyberArk integration only through web services to the CyberArk Central Credential Provider.
Note: This documentation includes details about CyberArk to simplify the configuration process. For more information, see the CyberArk documentation.

Step 1. Install the Credential Store Stage Library

By default, a full Data Collector installation includes the CyberArk credential store stage library. The core installation does not include the library.

To verify that a Data Collector has the CyberArk credential store stage library installed, click the Package Manager icon to display the list of installed stage libraries. If the library is not installed, install the library before configuring the CyberArk credential store.

Step 2. Configure the Credential Store Properties

To enable Data Collector to connect to the CyberArk credential store, configure the CyberArk properties in the $SDC_CONF/credential-stores.properties file.
Important: For a Cloudera Manager installation, configure all credential store properties through Cloudera Manager. In Cloudera Manager, select the StreamSets service and then click Configuration. Add a line for each credential store property to the Data Collector Advanced Configuration Snippet (Safety Valve) for sdc.properties field as follows:
credentialStores=cyberark
  1. Uncomment the credentialStores property in the file.

    If enabling only the CyberArk credential store, set the property to "cyberark". If enabling multiple credential stores, set the property to a comma-separated list of the credential store types. For example, to use both the Java keystore and the CyberArk credential stores, set the value to "jks,cyberark".

  2. Configure the following properties in the CyberArk Credential Store section of the file.

    The CyberArk credential store definition, web service URL, and application ID are required properties. Uncomment and configure other properties as needed.

    The file includes the following properties:

    • credentialStore.cyberark.def - Required. Defines the implementation of the CyberArk credential store.

      Do not change the default value.

    • credentialStore.cyberark.config.credential.refresh.millis - Optional. Number of milliseconds that Data Collector locally caches a credential. When the time expires, Data Collector retrieves the credential from CyberArk.
    • credentialStore.cyberark.config.credential.retry.millis - Optional. Number of milliseconds that Data Collector waits before retrying the retrieval of a credential from CyberArk after an error.
    • credentialStore.cyberark.config.connector - Optional. Connector type to CyberArk. Keep the default, "webservices", because only web services are currently supported.
    • credentialStore.cyberark.config.ws.url - Required. CyberArk Central Credential Provider web service URL.

      Use the following format:

      https://<host name>:<port>/AIMWebService/api/Accounts

    • credentialStore.cyberark.config.ws.appId - Required. CyberArk application ID for this Data Collector. You must create the application ID in CyberArk.
    • credentialStore.cyberark.config.ws.maxConcurrentConnections - Optional. Maximum number of concurrent web service calls that Data Collector can make to CyberArk.
    • credentialStore.cyberark.config.ws.validateAfterInactivity.millis - Optional. Number of milliseconds of inactivity before Data Collector validates the HTTP connection to CyberArk.
    • credentialStore.cyberark.config.ws.connectionTimeout.millis - Optional. Number of milliseconds to wait for a connection to CyberArk.
    • credentialStore.cyberark.config.ws.nameSeparator - Optional. Separator to use for the CyberArk safe, folder, object name, and element name values in the credential name argument used by the credential functions.

      Use the following format for the name argument:

      <safe><separator><folder><separator><object name><separator><element name>

      For example, if you keep the default ampersand (&), the format for the name argument is:

      <safe>&<folder>&<object name>&<element name>

    • credentialStore.cyberark.config.ws.http.authentication - Optional. Authentication type used by the CyberArk Central Credential Provider web services: none, basic, or digest.

      Default is none.

    • credentialStore.cyberark.config.ws.http.authentication.user - Optional. User name if using basic or digest authentication.
    • credentialStore.cyberark.config.ws.http.authentication.password - Optional. Password if using basic or digest authentication.
    • credentialStore.cyberark.config.ws.truststoreFile - Optional. Path to the truststore file if using HTTPS and the server certificate is signed by a private CA or is not trusted by the Java default truststore file.

      Enter a path relative to the Data Collector configuration directory, $SDC_CONF, or enter an absolute path.

    • credentialStore.cyberark.config.ws.truststorePassword - Optional. Password for the truststore file.
    • credentialStore.cyberark.config.ws.supportedProtocols - Optional. Enabled SSL/TLS protocols. TLSv1.2 or later is recommended.
    • credentialStore.cyberark.config.ws.hostnameVerifier.skip - Optional. Whether to skip verifying the host name of the CyberArk Central Credential Provider web services against the domain defined in the HTTPS certificate.

      By default, the host name is verified.

    • credentialStore.cyberark.config.ws.keystoreFile - Optional. If using HTTPS and the CyberArk Central Credential Provider web services require client-side certificates, the path to the keystore file that contains the client certificate.

      Enter a path relative to the Data Collector configuration directory, $SDC_CONF, or enter an absolute path.

    • credentialStore.cyberark.config.ws.keystorePassword - Optional. Password for the keystore file.
    • credentialStore.cyberark.config.ws.keyPassword - Optional. Password to access the certificate within the keystore file.
  3. Restart Data Collector to enable the changes.
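
Before restarting, it can help to confirm that the required CyberArk properties from step 2 are actually set. The following Python sketch is one way to do that, under stated assumptions: the function names, the simplified parser (key=value lines only, no line continuations), and the host name cyberark.example.com are all hypothetical. It is not a Data Collector tool.

```python
# Required properties named in step 2 of this section.
REQUIRED_CYBERARK_KEYS = (
    "credentialStore.cyberark.def",
    "credentialStore.cyberark.config.ws.url",
    "credentialStore.cyberark.config.ws.appId",
)


def parse_properties(text: str) -> dict:
    """Parse simple key=value lines, skipping blank lines and # or ! comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def check_cyberark(text: str) -> list:
    """Return a list of problems; an empty list means the basics look set."""
    props = parse_properties(text)
    problems = []
    # credentialStores holds a comma-separated list of store types.
    stores = [s.strip() for s in props.get("credentialStores", "").split(",") if s.strip()]
    if "cyberark" not in stores:
        problems.append("credentialStores does not include cyberark")
    for key in REQUIRED_CYBERARK_KEYS:
        if not props.get(key):
            problems.append(f"missing required property: {key}")
    return problems


# Sample file content; the URL host is a placeholder, not a real endpoint.
sample = """
# credentialStores=jks
credentialStores=jks,cyberark
credentialStore.cyberark.config.ws.url=https://cyberark.example.com:443/AIMWebService/api/Accounts
#credentialStore.cyberark.config.ws.appId=
"""
for problem in check_cyberark(sample):
    print(problem)
```

In the sample, the definition property is absent and the application ID is still commented out, so both are reported as missing.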

Step 3. Call the Credentials from the Pipeline

Use the credential:get() or credential:getWithOptions() function in pipeline stage properties to retrieve credential values from CyberArk.

Use the credential functions in any stage property that displays the key icon next to it.

Important: When you use a credential function in a stage property, the function must be the only value defined in the property. For example, you cannot include another function or a literal value along with the credential function.
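
The rule above can be approximated with a simple pattern check. The sketch below is an illustration only: the helper name is hypothetical, the regular expression is a simplification that rejects any "$", "{", or "}" inside the argument list, and the validation that Data Collector itself performs may differ.

```python
import re

# A value that consists of exactly one credential:get() or
# credential:getWithOptions() call wrapped in ${...}, with nothing else.
SOLE_CREDENTIAL_CALL = re.compile(
    r"\$\{credential:(?:get|getWithOptions)\([^${}]*\)\}"
)


def is_sole_credential_call(value: str) -> bool:
    """Return True when the property value is a single credential function call."""
    return SOLE_CREDENTIAL_CALL.fullmatch(value) is not None


ok = '${credential:get("jks", "devops", "OracleDBPassword")}'
with_literal = 'jdbc-' + ok   # literal text combined with the function
two_calls = ok + ok           # two functions in one property

print(is_sole_credential_call(ok))
print(is_sole_credential_call(with_literal))
print(is_sole_credential_call(two_calls))
```

Only the first value passes; the other two combine the function with a literal value or with another function.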
The credential functions use the following arguments:
  • storeId - Unique ID of the credential store to use. Enter "cyberark" to access the CyberArk credential store.
  • userGroup - Group to which a user must belong before that user can access the credential. Only users that have execute permission on the pipeline and that belong to this group can validate, preview, or run the pipeline that retrieves the credential values.

    If working with Control Hub, specify the group using the required naming convention: <group ID>@<organization ID>. To grant access to all users, specify the default "all" group when working only with Data Collector or the default "all@<organization ID>" group when working with Control Hub.

  • name - Name of the credential value to retrieve from CyberArk. Use the following format: "<safe><separator><folder><separator><object name><separator><element name>", where:
    • <safe> is the CyberArk safe to read. For example, "production".
    • <separator> is the separator defined for the safe, folder, object name, and element name values in the $SDC_CONF/credential-stores.properties file. Alternatively, if you use the credential:getWithOptions() function, you can define the separator in the options argument.
    • <folder> is the folder in CyberArk to read. For example, "Root\\sqldatabases".
    • <object name> is the object in CyberArk to read. For example, "payroll".
    • <element name> is the name for the value that you want returned. For example, use "Content" to return the password or "Username" to return an optional user name value. If you do not specify <element name> in the credential name argument, Data Collector uses "Content".
  • storeOptions - Used only by the credential:getWithOptions() function. Additional options to communicate with the credential store. For CyberArk, you can use the following options:
    • separator - Separator to use for the credential name.
    • ConnectionTimeout - Connection timeout value in milliseconds.
    • FailRequestOnPasswordChange - Whether to fail the request on a password change, set to true or false. See the CyberArk documentation for details on this option.
    Use the following format to specify options:
    "<option1>=<value>,<option2>=<value>"
    For example, to use the pipe symbol (|) as the separator, enter the following for the options argument:
    "separator=|"
For example, the following expression returns the password for the payroll object stored in the CyberArk Root\\sqldatabases folder in the production safe. The credential name argument uses the default ampersand (&) as the separator. The expression allows any user belonging to the devops group access to the credential when validating, previewing, or running the pipeline:
${credential:get("cyberark", "devops", "production&Root\\sqldatabases&payroll&Content")}
The following expression returns the same password, but specifies the pipe symbol (|) as the separator:
${credential:getWithOptions("cyberark", "devops", "production|Root\\sqldatabases|payroll|Content", "separator=|")}
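
To make the name format concrete, the following Python sketch assembles the two expressions shown above from their parts. The helper names cyberark_name and cyberark_expression are hypothetical and not part of Data Collector. The folder is passed with the doubled backslash exactly as it is written in the examples.

```python
def cyberark_name(safe, folder, object_name, element_name="Content", separator="&"):
    """Join safe, folder, object name, and element name with the separator."""
    parts = [safe, folder, object_name, element_name]
    for part in parts:
        # A part that contains the separator would make the name ambiguous.
        if separator in part:
            raise ValueError(f"{part!r} contains the separator {separator!r}")
    return separator.join(parts)


def cyberark_expression(group, name, separator=None):
    """Build the expression; pass separator only to override the configured one."""
    if separator is None:
        return f'${{credential:get("cyberark", "{group}", "{name}")}}'
    return (
        f'${{credential:getWithOptions("cyberark", "{group}", "{name}", '
        f'"separator={separator}")}}'
    )


# Raw strings keep both backslash characters, as typed in the expressions above.
folder = r"Root\\sqldatabases"

default_name = cyberark_name("production", folder, "payroll")
print(cyberark_expression("devops", default_name))

pipe_name = cyberark_name("production", folder, "payroll", separator="|")
print(cyberark_expression("devops", pipe_name, separator="|"))
```

Choosing a separator that never appears in safe, folder, or object names avoids the ambiguity that the ValueError guards against.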

Hashicorp Vault

To use the Hashicorp Vault credential store system, install the Vault credential store stage library and define the configuration properties used to connect to Hashicorp Vault. Then, use credential functions in pipeline stage properties to retrieve the credential values.

Note: This documentation includes details about Hashicorp Vault to simplify the configuration process. For more information, see the Vault documentation.

Step 1. Install the Credential Store Stage Library

By default, a full Data Collector installation includes the Hashicorp Vault credential store stage library. The core installation does not include the library.

To verify that a Data Collector has the Hashicorp Vault credential store stage library installed, click the Package Manager icon to display the list of installed stage libraries. If the library is not installed, install the library before configuring the Hashicorp Vault credential store.

Step 2. Configure the Credential Store Properties

To enable Data Collector to connect to the Hashicorp Vault credential store, configure the Hashicorp Vault properties in the $SDC_CONF/credential-stores.properties file.

Important: For a Cloudera Manager installation, configure all credential store properties through Cloudera Manager. In Cloudera Manager, select the StreamSets service and then click Configuration. Add a line for each credential store property to the Data Collector Advanced Configuration Snippet (Safety Valve) for sdc.properties field as follows:
credentialStores=vault
  1. Uncomment the credentialStores property in the file.

    If enabling only the Hashicorp Vault credential store, set the property to "vault". If enabling multiple credential stores, set the property to a comma-separated list of the credential store types. For example, to use both the Java keystore and the Hashicorp Vault credential stores, set the value to "jks,vault".

  2. Configure the following properties in the Hashicorp Vault Credential Store section of the file.

    The Vault credential store definition, server URL, Role ID, and Secret ID are required properties. Configure other properties as needed:

    • credentialStore.vault.def - Required. Defines the implementation of the Vault credential store.

      Do not change the default value.

    • credentialStore.vault.config.pathKey.separator - Optional. Separator to use for the path and key values in the credential name argument used by the credential functions.

      Use the following format for the name argument:

      <path><separator><key>

      For example, if you keep the default ampersand (&), the format for the name argument is:

      <path>&<key>

    • credentialStore.vault.config.addr - Required. Vault server URL entered in the following format:

      https://<host name>:<port number>

      Use HTTPS to avoid unencrypted communication.

    • credentialStore.vault.config.role.id - Required. Vault Role ID that Data Collector uses to authenticate with Vault. The Role ID is configured within Vault by your Vault administrator.

      The Data Collector Vault integration relies on Vault's App Role authentication backend.

      Important: The App ID authentication backend has been deprecated by Hashicorp and will be removed in a future release. As a result, do not configure the credentialStore.vault.config.app.id property for new installations.

    • credentialStore.vault.config.secret.id - Required. Vault Secret ID that Data Collector uses to authenticate with Vault. The Secret ID is configured within Vault by your Vault administrator.

      Enter one of the following:
      • Secret ID value.
      • File that contains the Secret ID value. For increased security, store the Secret ID in a separate file and reference the file in the $SDC_CONF/credential-stores.properties file as follows: ${file("<filename>")}.

        By default, the file name is vault-secret-id and the file is expected in the $SDC_CONF directory. For more information, see Referencing Sensitive Values in Files.

    • credentialStore.vault.config.lease.renewal.interval.sec - Optional. Seconds to wait before checking for leases that need renewal.

      Default is 60.

    • credentialStore.vault.config.lease.expiration.buffer.sec - Optional. Buffer for expiring leases. Data Collector renews leases that expire in less than the specified number of seconds.

      Default is 120.

    • credentialStore.vault.config.open.timeout - Optional. Timeout to establish an HTTP connection to Vault in milliseconds.

      Default is 0 for no limit.

    • credentialStore.vault.config.proxy.address - Optional. Proxy URL. Configure to use a proxy to access Vault.
    • credentialStore.vault.config.proxy.port - Optional. Proxy port. Configure to use a proxy to access Vault.
    • credentialStore.vault.config.proxy.username - Optional. Proxy user name. Configure to use a proxy to access Vault.
    • credentialStore.vault.config.proxy.password - Optional. Proxy password. Configure to use a proxy to access Vault.
    • credentialStore.vault.config.read.timeout - Optional. Milliseconds to wait for data before timing out.

      Default is 0 for no limit.

    • credentialStore.vault.config.ssl.enabled.protocols - Optional. Enabled SSL/TLS protocols. TLSv1.2 or later is recommended.

      Default is TLSv1.2,TLSv1.3.

    • credentialStore.vault.config.ssl.truststore.file - Optional. Path to a Java truststore file. Required when using a private CA or certificates not trusted by the Java default truststore.
    • credentialStore.vault.config.ssl.truststore.password - Optional. Password for the truststore file.
    • credentialStore.vault.config.ssl.verify - Optional. Whether to verify that the Vault server host name matches its certificate.

      Default is true. Setting the property to false is not recommended.

    • credentialStore.vault.config.ssl.timeout - Optional. Timeout for the SSL/TLS handshake in milliseconds.

      Default is 0 for no limit.

    • credentialStore.vault.config.timeout - Optional. Timeout to read from Vault in milliseconds, after a connection has been established.

      Default is 0 for no limit.


  3. Restart Data Collector to enable the changes.
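
The two lease properties in step 2 work together: credentialStore.vault.config.lease.renewal.interval.sec sets how often leases are checked, and credentialStore.vault.config.lease.expiration.buffer.sec sets which leases count as due for renewal at each check. The Python sketch below only illustrates that selection rule as described in the property text. It is not how Data Collector is implemented, and the function name and lease IDs are hypothetical.

```python
def leases_to_renew(seconds_until_expiry, buffer_sec=120):
    """Return the IDs of leases that expire in less than buffer_sec seconds."""
    return [
        lease_id
        for lease_id, remaining in seconds_until_expiry.items()
        if remaining < buffer_sec
    ]


# Hypothetical lease IDs mapped to the seconds left before each one expires.
leases = {"aws/creds/a": 45, "database/creds/b": 120, "database/creds/c": 3600}

# With the default buffer of 120 seconds, only leases with less than
# 120 seconds remaining are selected at this check.
print(leases_to_renew(leases))
```

Under this reading, keeping the buffer larger than the check interval, as the defaults of 120 and 60 seconds do, means that each lease is checked at least once while it is inside the buffer window.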

Step 3. Call the Credentials from the Pipeline

Use the credential:get() or credential:getWithOptions() function in pipeline stage properties to retrieve credential values from Hashicorp Vault.

Use the credential functions in any stage property that displays the key icon next to it.

Important: When you use a credential function in a stage property, the function must be the only value defined in the property. For example, you cannot include another function or a literal value along with the credential function.
The credential functions use the following arguments:
  • storeId - Unique ID of the credential store to use. Enter "vault" to access the Hashicorp Vault credential store.
  • userGroup - Group to which a user must belong before that user can access the credential. Only users that have execute permission on the pipeline and that belong to this group can validate, preview, or run the pipeline that retrieves the credential values.

    If working with Control Hub, specify the group using the required naming convention: <group ID>@<organization ID>. To grant access to all users, specify the default "all" group when working only with Data Collector or the default "all@<organization ID>" group when working with Control Hub.

  • name - Name of the credential value to retrieve from Hashicorp Vault. Use the following format: "<path><separator><key>", where:
    • <path> is the path in Vault to read.
    • <separator> is the separator defined for the path and key values in the $SDC_CONF/credential-stores.properties file.
    • <key> is the key for the value that you want returned.
  • storeOptions - Used only by the credential:getWithOptions() function. Additional options to communicate with the credential store. For Hashicorp Vault, you can enter a delay in milliseconds to allow time for external processing. Use the delay option when using the Vault AWS secret backend to generate AWS access credentials based on IAM policies. According to Vault documentation, you might need a delay of 10 seconds or more before the credentials can be used successfully.

    Use the following format to specify an option:

    "<option>=<value>"
    For example, to set the Vault delay to 1,000 milliseconds, enter the following for the options argument:
    "delay=1000"
For example, the following expression returns the value of the key password stored in the Vault path /secret/databases/oracle after waiting for a delay of 1,000 milliseconds. The credential name argument uses the default ampersand (&) as the separator. The expression allows any user belonging to the devops group access to the credential when validating, previewing, or running the pipeline:
${credential:getWithOptions("vault", "devops", "/secret/databases/oracle&password", "delay=1000")}
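
Because the delay option is given in milliseconds while the Vault guidance above is phrased in seconds, a small conversion helper can reduce mistakes. The following Python sketch builds the Vault expression from its parts. The function name vault_expression and its parameters are hypothetical and not part of Data Collector.

```python
def vault_expression(path, key, group="all", delay_seconds=None, separator="&"):
    """Build a Vault credential expression.

    The separator must match the one configured in the
    credential-stores.properties file; the default here is the ampersand.
    """
    name = f"{path}{separator}{key}"
    if delay_seconds is None:
        return f'${{credential:get("vault", "{group}", "{name}")}}'
    # The delay option is expressed in milliseconds.
    delay_ms = round(delay_seconds * 1000)
    return (
        f'${{credential:getWithOptions("vault", "{group}", "{name}", '
        f'"delay={delay_ms}")}}'
    )


# Reproduces the expression shown above (1 second = 1,000 milliseconds).
print(vault_expression("/secret/databases/oracle", "password", "devops", delay_seconds=1))

# A 10-second delay, as might be needed with the Vault AWS secret backend.
print(vault_expression("/secret/databases/oracle", "password", "devops", delay_seconds=10))
```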

Java Keystore

To use the Java keystore credential store system, install the Java keystore credential store stage library and define the configuration properties used to connect to the credential store.

Use the stagelib-cli jks-credentialstore command to add credentials to the credential store. Then, use credential functions in pipeline stage properties to retrieve the credential values.
Important: Use the Java keystore credential store system in development environments only.

Step 1. Install the Credential Store Stage Library

By default, a full Data Collector installation includes the Java keystore credential store stage library. The core installation does not include the library.

To verify that a Data Collector has the Java keystore credential store stage library installed, click the Package Manager icon to display the list of installed stage libraries. If the library is not installed, install the library before configuring the Java keystore credential store.

Step 2. Configure the Credential Store Properties

To enable Data Collector to connect to the Java keystore credential store, configure the Java keystore properties in the $SDC_CONF/credential-stores.properties file.

Important: For a Cloudera Manager installation, configure all credential store properties through Cloudera Manager. In Cloudera Manager, select the StreamSets service and then click Configuration. Add a line for each credential store property to the Data Collector Advanced Configuration Snippet (Safety Valve) for sdc.properties field as follows:
credentialStores=jks
  1. Uncomment the credentialStores property in the file.

    If enabling only the Java keystore credential store, set the property to "jks". If enabling multiple credential stores, set the property to a comma-separated list of the credential store types. For example, to use both the Java keystore and the Vault credential stores, set the value to "jks,vault".

  2. Configure the following properties in the Java Keystore Credential Store section of the file:
    • credentialStore.jks.def - Defines the implementation of the Java Keystore credential store.

      Do not change the default value.

    • credentialStore.jks.config.keystore.type - Format of the Java keystore file:
      • JCEKS
      • PKCS12

      Default is PKCS12.

    • credentialStore.jks.config.keystore.file - Path and name of the Java keystore file. Enter an absolute path to the file, or a path relative to the Data Collector configuration directory, $SDC_CONF.

      Default is jks-credentialStore.pkcs12.

    • credentialStore.jks.config.keystore.storePassword - Password that Data Collector uses to access the Java keystore file.

      You must change the default value before using the keystore file.

  3. Restart Data Collector to enable the changes.

Step 3. Add Credentials to the Credential Store

Use the stagelib-cli jks-credentialstore command to add credentials to the Java keystore file. You can add multiple credentials to the file.

Use the command from the $SDC_DIST directory as follows:
bin/streamsets stagelib-cli jks-credentialstore add -i <storeId> -n <credential name> -c <credential value>
For example, the following command adds a credential named OracleDBPassword with the value 278yT6u to the Java keystore credential store:
bin/streamsets stagelib-cli jks-credentialstore add -i jks -n OracleDBPassword -c 278yT6u
Note: The stagelib-cli jks-credentialstore command also includes delete and list subcommands that you use to manage the credentials defined in the keystore file. For information on using these commands, see jks-credentialstore Command.

Step 4. Call the Credentials from the Pipeline

Use the credential:get() function in pipeline stage properties to retrieve credential values from the Java keystore.

Use the credential function in any stage property that displays the key icon next to it.

Important: When you use a credential function in a stage property, the function must be the only value defined in the property. For example, you cannot include another function or a literal value along with the credential function.
The credential:get() function uses the following arguments:
  • storeId - Unique ID of the credential store to use. Enter "jks" to access the Java keystore credential store.
  • userGroup - Group to which a user must belong before that user can access the credential. Only users that have execute permission on the pipeline and that belong to this group can validate, preview, or run the pipeline that retrieves the credential values.

    If working with Control Hub, specify the group using the required naming convention: <group ID>@<organization ID>. To grant access to all users, specify the default "all" group when working only with Data Collector or the default "all@<organization ID>" group when working with Control Hub.

  • name - Name of the credential value to retrieve from the credential store.
For example, the following expression returns the value of the OracleDBPassword credential and allows any user belonging to the devops group access to the credential when validating, previewing, or running the pipeline:
${credential:get("jks", "devops", "OracleDBPassword")}

jks-credentialstore Command

The stagelib-cli jks-credentialstore command provides subcommands to add, list, and delete credentials in the Java keystore credential store.

Any changes made to the Java keystore file take effect immediately. For example, if you change the value of an existing credential in the file, running pipelines that require a new connection to the external system use the new credential value.
Note: In previous releases, the jks-cs command provided the same subcommands to add, list, and delete credentials in the Java keystore credential store. However, the jks-cs command is now deprecated and will be removed in a future release.
You can use the following subcommands with the stagelib-cli jks-credentialstore command:
add
Adds a credential to the Java keystore credential store.
Use the command from the $SDC_DIST directory as follows:
bin/streamsets stagelib-cli jks-credentialstore add \
(-i <storeId> | --id <storeId>) \
(-n <credential name> | --name <credential name>) \
(-c <credential value> | --credential <credential value>)
The add subcommand uses the following options:
  • -i <storeId> or --id <storeId> - Required. Unique ID for the credential store. Use jks.
  • -n <credential name> or --name <credential name> - Required. Name of the credential to add to the Java keystore credential store.

    If the name includes non-alphanumeric characters, use single quotation marks around the name.

  • -c <credential value> or --credential <credential value> - Required. Value of the credential to add to the Java keystore credential store.

    If the value includes non-alphanumeric characters, use single quotation marks around the value.

For example, the following command adds a credential named OracleDBPassword with the value df35yT_&5 to the Java keystore credential store:

bin/streamsets stagelib-cli jks-credentialstore add -i jks -n OracleDBPassword -c 'df35yT_&5'
delete
Deletes a credential from the Java keystore credential store.
Use the command from the $SDC_DIST directory as follows:
bin/streamsets stagelib-cli jks-credentialstore delete \
(-i <storeId> | --id <storeId>) \
(-n <credential name> | --name <credential name>)
The delete subcommand uses the following options:
  • -i <storeId> or --id <storeId> - Required. Unique ID for the credential store. Use jks.
  • -n <credential name> or --name <credential name> - Required. Name of the credential to delete from the Java keystore credential store.

    If the name includes non-alphanumeric characters, use single quotation marks around the name.

For example, the following command deletes a credential named SQLServerDBPassword from the Java keystore credential store:
bin/streamsets stagelib-cli jks-credentialstore delete -i jks -n SQLServerDBPassword
list
Lists the names of all credentials defined in the Java keystore credential store. The command does not list the credential values.
Use the command from the $SDC_DIST directory as follows:
bin/streamsets stagelib-cli jks-credentialstore list \
(-i <storeId> | --id <storeId>)
The list subcommand uses the following option:
  • -i <storeId> or --id <storeId> - Required. Unique ID for the credential store. Use jks.
For example, the following command lists the names of all credentials defined in the Java keystore credential store:
bin/streamsets stagelib-cli jks-credentialstore list -i jks
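
When the commands above are generated from a script, quoting mistakes are easy to make. The following Python sketch builds the command line for each subcommand, checks the required options listed above, and quotes values with the standard library's shlex.join (Python 3.8 or later). The helper name jks_command and the REQUIRED_OPTIONS table are hypothetical; the sketch only builds the command string and does not run it.

```python
import shlex

# Options that each subcommand requires, taken from the option lists above.
REQUIRED_OPTIONS = {
    "add": ("-i", "-n", "-c"),
    "delete": ("-i", "-n"),
    "list": ("-i",),
}


def jks_command(subcommand, **options):
    """Return a shell-quoted jks-credentialstore command line.

    options maps short option letters to values, for example i="jks".
    """
    if subcommand not in REQUIRED_OPTIONS:
        raise ValueError(f"unknown subcommand: {subcommand}")
    argv = ["bin/streamsets", "stagelib-cli", "jks-credentialstore", subcommand]
    for flag in REQUIRED_OPTIONS[subcommand]:
        value = options.get(flag.lstrip("-"))
        if value is None:
            raise ValueError(f"{subcommand} requires {flag}")
        argv += [flag, value]
    # shlex.join adds single quotation marks only where the shell needs them.
    return shlex.join(argv)


print(jks_command("add", i="jks", n="OracleDBPassword", c="df35yT_&5"))
print(jks_command("delete", i="jks", n="SQLServerDBPassword"))
print(jks_command("list", i="jks"))
```

Note that shlex quotes a value only when it contains characters that are special to the shell, so a value such as df35yT_&5 is quoted while a plain alphanumeric name is left as is.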

Microsoft Azure Key Vault

Before Data Collector can connect to the Microsoft Azure Key Vault credential store system, you must complete several prerequisites in Azure so that Data Collector can access the Azure Key Vault as an application.

After completing the prerequisites, install the Azure Key Vault credential store stage library and define the configuration properties used to connect to Azure Key Vault. Then, use credential functions in pipeline stage properties to retrieve the credential values.

Note: This documentation includes details about Azure Key Vault to simplify the configuration process. For more information, see the Azure Key Vault documentation.

Prerequisites

Before Data Collector can connect to the Microsoft Azure Key Vault credential store system, complete the following prerequisites within Azure:

Register Data Collector with Azure Active Directory
Use the Azure portal to register Data Collector as an application in Azure Active Directory. When an application such as Data Collector accesses credentials in an Azure key vault, the application must use an authentication token from Azure Active Directory.
The registration process assigns Data Collector the following values, which you will specify when you configure the credential store properties:
  • application ID
  • authentication key
For more information about registering applications in Azure Active Directory, see the Azure Key Vault documentation.
Authorize Data Collector to use keys or secrets in the Azure key vault
Use the Azure portal to authorize Data Collector to use the keys or secrets in the Azure key vault. Azure Key Vault requires that applications be authorized to access each key vault.
For information about authorizing applications to use keys or secrets, see the Azure Key Vault documentation.

Step 1. Install the Credential Store Stage Library

By default, a full Data Collector installation includes the Azure Key Vault credential store stage library. The core installation does not include the library.

To verify that a Data Collector has the Azure Key Vault credential store stage library installed, click the Package Manager icon to display the list of installed stage libraries. If the library is not installed, install the library before configuring the Azure Key Vault credential store.

Step 2. Configure the Credential Store Properties

To enable Data Collector to connect to the Azure Key Vault credential store, configure the Azure Key Vault properties in the $SDC_CONF/credential-stores.properties file.
Important: For a Cloudera Manager installation, configure all credential store properties through Cloudera Manager. In Cloudera Manager, select the StreamSets service and then click Configuration. Add a line for each credential store property to the Data Collector Advanced Configuration Snippet (Safety Valve) for sdc.properties field as follows:
credentialStores=azure
  1. Uncomment the credentialStores property in the file.

    If enabling only the Azure Key Vault credential store, set the property to "azure". If enabling multiple credential stores, set the property to a comma-separated list of the credential store types. For example, to use both the Java keystore and the Azure Key Vault credential stores, set the value to "jks,azure".

  2. Configure the following properties in the Azure Key Vault Credential Store section of the file.

    The Azure Key Vault credential store definition, URL, client ID, and client key are required properties. Uncomment and configure other properties as needed.

    The file includes the following properties:

    • credentialStore.azure.def - Required. Defines the implementation of the Azure Key Vault credential store. Do not change the default value.
    • credentialStore.azure.config.credential.refresh.millis - Optional. Number of milliseconds that Data Collector locally caches a credential. When the time expires, Data Collector retrieves the credential from Azure Key Vault.
    • credentialStore.azure.config.credential.retry.millis - Optional. Number of milliseconds that Data Collector waits before attempting to retry a retrieval of a credential from Azure Key Vault, in the case of an error.
    • credentialStore.azure.config.vault.url - Required. URL to the key vault created in Azure Key Vault.
      Use the following format:
      https://<key vault name>.vault.azure.net/
    • credentialStore.azure.config.client.id - Required. Application ID assigned to this Data Collector when you registered Data Collector as an application in Azure Active Directory, as described in prerequisites.
    • credentialStore.azure.config.client.key - Required. Authentication key assigned to this Data Collector when you registered Data Collector as an application in Azure Active Directory, as described in prerequisites.
  3. Restart Data Collector to enable the changes.
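
As a rough illustration of steps 1 and 2, the edited portion of the $SDC_CONF/credential-stores.properties file might look like the following sketch. The key vault name "my-vault", the angle-bracket placeholders, and the millisecond values are assumptions chosen for illustration only; substitute the values from your own Azure registration. The credentialStore.azure.def line is deliberately not reproduced here, because its default value should be left exactly as it appears in the file.

```properties
# Enable the Azure Key Vault credential store.
# To also enable the Java keystore credential store, use "jks,azure" instead.
credentialStores=azure

# credentialStore.azure.def is required: uncomment the line already present
# in the file and do not change its default value.

# URL to the key vault. "my-vault" is a hypothetical key vault name.
credentialStore.azure.config.vault.url=https://my-vault.vault.azure.net/

# Placeholders for the application ID and authentication key assigned when
# Data Collector was registered in Azure Active Directory.
credentialStore.azure.config.client.id=<application ID>
credentialStore.azure.config.client.key=<authentication key>

# Optional. Illustrative values only: cache each credential for 30 seconds,
# and wait 15 seconds before retrying a failed retrieval.
credentialStore.azure.config.credential.refresh.millis=30000
credentialStore.azure.config.credential.retry.millis=15000
```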

Step 3. Call the Credentials from the Pipeline

Use the credential:get() or credential:getWithOptions() function in pipeline stage properties to retrieve credential values from Azure Key Vault.

Use the credential functions in any stage property that displays the key icon next to it.

Important: When you use a credential function in a stage property, the function must be the only value defined in the property. For example, you cannot include another function or a literal value along with the credential function.
The credential functions use the following arguments:
  • storeId - Unique ID of the credential store to use. Enter "azure" to access the Azure Key Vault credential store.
  • userGroup - Group to which a user must belong before that user can access the credential. Only users that have execute permission on the pipeline and that belong to this group can validate, preview, or run the pipeline that retrieves the credential values.

    If working with Control Hub, specify the group using the required naming convention: <group ID>@<organization ID>. To grant access to all users, specify the default "all" group when working only with Data Collector or the default "all@<organization ID>" group when working with Control Hub.

  • name - Name of the key or secret to retrieve from Azure Key Vault.
  • storeOptions - Used only by the credential:getWithOptions() function. Additional options to communicate with the credential store. For Azure Key Vault, you can use the following options to override several properties in the $SDC_CONF/credential-stores.properties file:
    • url - Overrides the credentialStore.azure.config.vault.url property.
    • retry - Overrides the credentialStore.azure.config.credential.retry.millis property.
    • refresh - Overrides the credentialStore.azure.config.credential.refresh.millis property.
    Use the following format to specify options:
    "<option1>=<value>,<option2>=<value>"
For example, the following expression returns the value of the SQLpassword secret. The expression allows any user belonging to the devops group access to the credential when validating, previewing, or running the pipeline:
${credential:get("azure", "devops", "SQLpassword")}
The following expression returns the same secret value, but overrides the retry time configured in the $SDC_CONF/credential-stores.properties file:
${credential:getWithOptions("azure", "devops", "SQLpassword", "retry=3000")}
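Options can also be combined using the comma-separated format described for storeOptions. As a sketch with illustrative values, the following expression returns the same secret value while overriding both the retry time and the refresh time configured in the $SDC_CONF/credential-stores.properties file:

```text
${credential:getWithOptions("azure", "devops", "SQLpassword", "retry=3000,refresh=60000")}
```

The values 3000 and 60000 are assumptions chosen for illustration. Both are in milliseconds, matching the properties that the options override.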