Credential Stores

You can configure Transformer to access sensitive information that is secured in a credential store.

Transformer pipelines communicate with external systems to perform tasks such as launching a Spark application, or reading and writing data. Most of these external systems require sensitive information, such as user names or passwords, to access the system. When you configure pipeline stages for these external systems, you must specify the details that the stages need to connect to the system.

If you enter sensitive information directly in stage and pipeline properties, you expose those details to any user with access to the pipeline. To access external systems without exposing sensitive details, add them as secrets to a credential store and then use StreamSets credential functions in stage and pipeline properties to retrieve those values.

Defining secrets in a credential store can make it easier to migrate pipelines to another environment. For example, if you migrate multiple pipelines from a development to a production environment, you do not need to edit each pipeline with details for the production environment. You can simply replace the development credential store with the production version.

You can configure Transformer to use multiple credential stores. Each credential store is identified by a unique credential store ID.

You can use the following credential stores with Transformer:
  • AWS Secrets Manager
  • Azure Key Vault
  • Java keystore

Enabling Credential Stores

You can configure Transformer to use one or more credential stores. Each credential store is identified by a unique credential store ID.

You specify the credential stores that Transformer can use in the $TRANSFORMER_CONF/credential-stores.properties file. The file includes the following information:
credentialStores property
This property defines the credential stores that Transformer can use.
By default, the property is commented out and includes a default credential store ID for each of the supported credential store types, such as aws for AWS Secrets Manager and azure for Azure Key Vault.
To enable using credential stores, you uncomment this property and enter a comma-separated list of the credential store IDs to use.
You can specify multiple credential stores of the same type or of different types, such as two AWS Secrets Manager credential stores and one Java keystore. You simply specify a unique ID for each credential store.
Sets of related properties
Each supported credential store type has a set of related properties. The property names include the default credential store IDs originally specified in the credentialStores property.
For example, the AWS Secrets Manager properties include aws, the default Secrets Manager ID, in each Secrets Manager property name, such as credentialStore.aws.config.region and credentialStore.aws.config.access.key.
When you use a custom credential store ID, you must update all related property names to match the custom ID. For example, to use awsUS as a custom ID, replace aws with awsUS in every Secrets Manager property name, so that credentialStore.aws.config.region becomes credentialStore.awsUS.config.region.
Note: When you want to use multiple credential stores of the same type, you must have a set of related store properties that are renamed and defined appropriately for each credential store.

For example, say you want to use two Azure credential stores, azureDev for development and azureProd for production. To do this, you specify the credential store IDs in the credentialStores property and make a copy of the related Azure credential store properties, so you have one set for each credential store.

Then, you rename and configure the properties for azureDev, and you do the same for azureProd. The resulting properties might look as follows:
################################################
#        Transformer Credential Stores         #
################################################

credentialStores=azureDev,azureProd

############################################################
# azureDev: Azure Key Vault Credential Store Configuration #
############################################################

credentialStore.azureDev.def=streamsets-transformer-azure-keyvault-credentialstore-lib::com_streamsets_datacollector_credential_azure_keyvault_AzureKeyVaultCredentialStore
credentialStore.azureDev.config.credential.refresh.millis=30000
credentialStore.azureDev.config.credential.retry.millis=15000
credentialStore.azureDev.config.vault.url=https://development.vault.azure.net/
credentialStore.azureDev.config.client.id=devClientID
credentialStore.azureDev.config.client.key=devClientKey
credentialStore.azureDev.config.enforceEntryGroup=false

#############################################################
# azureProd: Azure Key Vault Credential Store Configuration #
#############################################################

credentialStore.azureProd.def=streamsets-transformer-azure-keyvault-credentialstore-lib::com_streamsets_datacollector_credential_azure_keyvault_AzureKeyVaultCredentialStore
credentialStore.azureProd.config.credential.refresh.millis=30000
credentialStore.azureProd.config.credential.retry.millis=15000
credentialStore.azureProd.config.vault.url=https://production.vault.azure.net/
credentialStore.azureProd.config.client.id=prodClientID
credentialStore.azureProd.config.client.key=prodClientKey
credentialStore.azureProd.config.enforceEntryGroup=false
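
The copy-and-rename step described above can also be scripted. The following Python sketch is illustrative only and is not part of Transformer: the function name rename_store_properties and the abbreviated azure_defaults list are hypothetical, and the sketch assumes the default property lines have already been uncommented.

```python
def rename_store_properties(lines, default_id, custom_id):
    """Return copies of property lines with the credential store ID replaced.

    Only the ID segment of names that start with credentialStore.<ID>. is
    rewritten. Property values and all other lines are left untouched.
    """
    old_prefix = f"credentialStore.{default_id}."
    new_prefix = f"credentialStore.{custom_id}."
    renamed = []
    for line in lines:
        if line.startswith(old_prefix):
            line = new_prefix + line[len(old_prefix):]
        renamed.append(line)
    return renamed


# Hypothetical, abbreviated copy of the default Azure property set.
azure_defaults = [
    "credentialStore.azure.config.vault.url=",
    "credentialStore.azure.config.client.id=",
    "credentialStore.azure.config.enforceEntryGroup=false",
]

# One renamed set per credential store, plus the matching credentialStores line.
dev_props = rename_store_properties(azure_defaults, "azure", "azureDev")
prod_props = rename_store_properties(azure_defaults, "azure", "azureProd")
stores_line = "credentialStores=" + ",".join(["azureDev", "azureProd"])
```

Matching on the full credentialStore.<ID>. prefix, rather than replacing every occurrence of the ID, avoids accidentally rewriting a value such as a vault URL that happens to contain the same text.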

Group Access to Secrets

As an additional layer of security, you can employ user groups to further limit access to the secrets defined in credential stores.

Transformer provides two methods to limit access with user groups:
Required group argument in credential functions
Credential functions include a group argument that defines the group that can access the secret. The group argument ensures that the user who attempts to preview, validate, or start a pipeline that includes a credential function belongs to the group specified in the function. The user must also have execute permission on the pipeline.
When working only with Transformer, simply specify the group name, such as devops. When working with Control Hub, specify the group using the required naming convention: <group ID>@<organization ID>. For example, devops@MyCompany.
If you do not want to restrict access to a secret, specify the default all group when working only with Transformer. When working with Control Hub and Transformer version 3.14.0 or later, you can specify the default group using all or all@<organization ID>. StreamSets recommends using all so that you do not need to modify credential functions when migrating pipelines from Transformer to Control Hub.
Note: When working with Control Hub and a Transformer version earlier than 3.14.0, you must use the default all@<organization ID> group.
If Transformer shuts down while running a pipeline that uses a credential function, Transformer restarts the pipeline without checking the group access.
Optional group secrets in the credential store

In addition to using the group argument in credential functions, you can configure Transformer to require group secrets for a credential store.

A group secret is a comma-delimited list of Transformer user groups that are permitted to access the associated secret.

When Transformer requires group secrets, you must define a group secret for every secret that Transformer accesses. The name of the group secret is based on the secret name, as follows:
<secret name>-groups
When you configure a credential function to call a secret, the user group specified in the credential function must be listed in the associated group secret.

To require the use of group secrets, in the $TRANSFORMER_CONF/credential-stores.properties file, set the credentialStore.<cstore ID>.config.enforceEntryGroup property to true.

For example, say you enable Transformer to require group secrets for Azure Key Vault. Then, in an Azure Event Hubs origin, you use the following expression to retrieve a shared access key from the azure credential store:
${credential:get("azure", "production", "sharedAccessKey")}
When you run the pipeline, Transformer validates all of the following:
  • The user who starts the pipeline is in the production Transformer user group.
  • The sharedAccessKey secret has an associated sharedAccessKey-groups secret defined in the credential store.
  • The sharedAccessKey-groups secret includes the production user group.

When Transformer is not configured to require group secrets, Transformer validates only the first point, verifying that the user belongs to the specified group.
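
The three checks listed above can be modeled in a few lines. The following Python sketch is a simplified illustration of the described behavior, not Transformer's actual implementation: the function can_access_secret and its parameters are hypothetical, the credential store is represented as a plain dictionary, and special handling of the default all group and of pipeline execute permission is left out.

```python
def can_access_secret(user_groups, function_group, secret_name,
                      store_secrets, enforce_entry_group):
    """Model the group checks for one credential function call."""
    # Check 1: the user belongs to the group named in the credential function.
    if function_group not in user_groups:
        return False
    # Without enforceEntryGroup, only the first check applies.
    if not enforce_entry_group:
        return True
    # Check 2: the secret has an associated <secret name>-groups secret.
    group_secret = store_secrets.get(f"{secret_name}-groups")
    if group_secret is None:
        return False
    # Check 3: the group secret, a comma-delimited list, includes the group.
    allowed = {group.strip() for group in group_secret.split(",")}
    return function_group in allowed


# Hypothetical store contents for the sharedAccessKey example.
store = {
    "sharedAccessKey": "placeholder-value",
    "sharedAccessKey-groups": "production,devops",
}
allowed = can_access_secret({"production"}, "production",
                            "sharedAccessKey", store, True)
```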

AWS Secrets Manager

To use the AWS Secrets Manager credential store system, install the AWS Secrets Manager credential store stage library and define the configuration properties used to connect to Secrets Manager. Then, use credential functions in pipeline stage properties to retrieve stored values.

In Secrets Manager, you must configure an access key and secret key pair that has permission to read the secrets. To follow best practices, make secrets read-only and limit access. See the Secrets Manager documentation on identity and access management (IAM) policies.

Note: This documentation includes Secrets Manager information needed for the configuration process. For more information, see the AWS Secrets Manager documentation.

Step 1. Configure the Credential Store Properties

To enable Transformer to connect to the AWS Secrets Manager credential store, configure the Secrets Manager properties in the $TRANSFORMER_CONF/credential-stores.properties file.

  1. Uncomment the credentialStores property in the file and specify the credential store ID to use. Use only alphabetic characters for the credential store ID.

    By default, the property lists a default credential store ID for each type of credential store, aws for AWS Secrets Manager, azure for Azure Key Vault, and so on. When using one credential store of any type, it's simplest to use the default value.

    To use just a single Secrets Manager, set the value to aws.

    To enable multiple credential stores, specify a comma-separated list of credential store IDs. For example, to use a Java keystore and a Secrets Manager credential store, set the value to jks,aws. To use multiple Secrets Manager credential stores, simply specify separate IDs for each, such as awsDev,awsProd.

  2. Uncomment and configure the following properties as needed.

    If you specified a custom credential store ID, update the names of the following properties, and then configure them as needed. When using the default credential store ID, aws, leave the property names intact, and simply configure the properties.

    To use multiple AWS Secrets Manager credential stores, make a copy of the properties for each credential store. Then, update the credential store ID in each set of property names before defining the properties. For an example, see Enabling Credential Stores.

    Important: Instead of entering sensitive data such as passwords in clear text in the configuration file, you can protect the sensitive data by storing the data in an external location and then using functions to retrieve the data.

    These properties are grouped in the AWS Secrets Manager section of the file:

    credentialStore.<cstore ID>.def
        Required. Defines the implementation of the AWS Secrets Manager credential store. Do not change the default value.

    credentialStore.<cstore ID>.config.nameKey.separator
        Optional. Separator to use in the name argument for credential functions.
        Note: In Secrets Manager, names can contain alphanumeric characters and the following special characters: / _ + = . @ -
        Therefore, avoid using those characters as separators.

    credentialStore.<cstore ID>.config.region
        Required. AWS region that hosts Secrets Manager. For a list of available regions, see the AWS Region Table.

    credentialStore.<cstore ID>.config.access.key
        Required. AWS access key.

    credentialStore.<cstore ID>.config.secret.key
        Required. AWS secret key.

    credentialStore.<cstore ID>.config.cache.max.size
        Optional. Maximum number of secrets Transformer can cache locally. Default is 1024.

    credentialStore.<cstore ID>.config.cache.ttl.millis
        Optional. Number of milliseconds that Transformer considers a cached secret valid before requiring a refresh. Default is 1 hour.

    credentialStore.<cstore ID>.config.enforceEntryGroup
        Optional. Requires Transformer to verify that the user who previews, validates, or starts the pipeline belongs to a group that is permitted to access the secret.
        When set to true, each secret must have a corresponding <secret name>-groups secret that contains a comma-separated list of groups that are permitted to access the secret.
        For more information, see Group Access to Secrets.
        Default is false.

  3. Restart Transformer to enable the changes.
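
Before restarting, it can help to sanity-check the edited file. The following Python sketch is a hypothetical pre-flight check, not a Transformer tool: parse_properties and check_aws_stores are illustrative names, the parser handles only simple key=value lines rather than the full Java properties syntax, and the list of required suffixes is taken from the property descriptions above.

```python
import re

# Suffixes of the properties marked Required in the Secrets Manager list.
REQUIRED_AWS_SUFFIXES = ("def", "config.region",
                         "config.access.key", "config.secret.key")


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def check_aws_stores(props, aws_ids):
    """Return a list of problems found for the given Secrets Manager IDs."""
    problems = []
    enabled = [s.strip() for s in props.get("credentialStores", "").split(",")
               if s.strip()]
    for store_id in aws_ids:
        if not re.fullmatch(r"[A-Za-z]+", store_id):
            problems.append(f"{store_id}: use only alphabetic characters")
        if store_id not in enabled:
            problems.append(f"{store_id}: not listed in credentialStores")
        for suffix in REQUIRED_AWS_SUFFIXES:
            name = f"credentialStore.{store_id}.{suffix}"
            if not props.get(name):
                problems.append(f"{name}: required property missing or empty")
    return problems


# Hypothetical file contents; <default value> stands in for the shipped .def value.
sample = """
credentialStores=awsDev,awsProd
credentialStore.awsDev.def=<default value>
credentialStore.awsDev.config.region=us-west-2
credentialStore.awsDev.config.access.key=devAccessKey
credentialStore.awsDev.config.secret.key=devSecretKey
credentialStore.awsProd.def=<default value>
credentialStore.awsProd.config.region=us-east-1
"""
problems = check_aws_stores(parse_properties(sample), ["awsDev", "awsProd"])
```

In this sample, the check reports the two awsProd key properties that were never copied from the default set.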

Step 2. Call Secrets from the Pipeline

Use the credential:get() or credential:getWithOptions() function in pipeline stage properties to retrieve secrets from AWS Secrets Manager.

Use the credential functions in any stage property that displays the key icon next to it.

Important: When you use a credential function in a stage or pipeline property, the function must be the only value defined in the property. For example, you cannot include another function or a literal value along with the credential function.

For details about credential functions, see Credential Functions.
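
The rule that a credential function must be the only value in a property can be checked mechanically, for example when reviewing exported pipeline configurations. The following Python sketch is a rough, hypothetical lint, not a Transformer feature: the function is_sole_credential_call is an illustrative name, the pattern does not validate the arguments themselves, and the secret name dbPassword is a placeholder.

```python
import re

# Matches a value that is exactly one ${credential:get(...)} or
# ${credential:getWithOptions(...)} call. Braces are not allowed inside
# the argument list, so two calls in a row cannot match as one.
CREDENTIAL_CALL = re.compile(
    r"\$\{\s*credential:(?:get|getWithOptions)\([^{}]*\)\s*\}"
)


def is_sole_credential_call(value):
    """Return True if the property value is a single credential function."""
    return CREDENTIAL_CALL.fullmatch(value.strip()) is not None


# A property that contains only the credential function passes the check.
ok = is_sole_credential_call('${credential:get("aws", "all", "dbPassword")}')

# A literal prefix combined with the function is rejected.
mixed = is_sole_credential_call(
    'jdbc:${credential:get("aws", "all", "dbPassword")}'
)
```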

Azure Key Vault

Before Transformer can connect to the Microsoft Azure Key Vault credential store system, you must complete several prerequisites in Azure so that Transformer can access the Azure Key Vault as an application.

After completing the prerequisites, install the Azure Key Vault credential store stage library and define the configuration properties used to connect to Azure Key Vault. Then, define credential functions in stage or pipeline properties to retrieve stored values.

Note: This documentation includes details about Azure Key Vault to simplify the configuration process. For more information, see the Azure Key Vault documentation.

Prerequisites

Before Transformer can connect to the Microsoft Azure Key Vault credential store system, you must complete the following prerequisites within Azure:

Register Transformer with Azure Active Directory
Use the Azure portal to register Transformer as an application in Azure Active Directory. When an application such as Transformer accesses secrets in an Azure key vault, the application must use an authentication token from Azure Active Directory.
The registration process assigns Transformer the following values, which you will specify when you configure the credential store properties:
  • application ID
  • authentication key
For more information about registering applications in Azure Active Directory, see the Azure Key Vault documentation.
Authorize Transformer to use keys or secrets in the Azure key vault
Use the Azure portal to authorize Transformer to use the keys or secrets in the Azure key vault. Azure Key Vault requires that applications be authorized to access each key vault.
For information about authorizing applications to use keys or secrets, see the Azure Key Vault documentation.

Step 1. Configure the Credential Store Properties

To enable Transformer to connect to the Azure Key Vault credential store, configure the Azure Key Vault properties in the $TRANSFORMER_CONF/credential-stores.properties file.

  1. Uncomment the credentialStores property in the file and specify the credential store ID to use. Use only alphabetic characters for the credential store ID.

    By default, the property lists a default credential store ID for each type of credential store, aws for AWS Secrets Manager, azure for Azure Key Vault, and so on. When using one credential store of any type, it's simplest to use the default value.

    To use just a single Azure Key Vault, set the value to azure.

    To enable multiple credential stores, specify a comma-separated list of credential store IDs. For example, to use a Java keystore and an Azure Key Vault credential store, set the value to jks,azure. To use multiple Azure Key Vault credential stores, simply specify separate IDs for each, such as azureDev,azureProd.

  2. Uncomment and configure the following properties as needed.

    If you specified a custom credential store ID, update the names of the following properties, and then configure them as needed. When using the default credential store ID, azure, leave the property names intact, and simply configure the properties.

    To use multiple Azure Key Vault credential stores, make a copy of the properties for each credential store. Then, update the credential store ID in each set of property names before defining the properties. For an example, see Enabling Credential Stores.

    Important: Instead of entering sensitive data such as passwords in clear text in the configuration file, you can protect the sensitive data by storing the data in an external location and then using functions to retrieve the data.

    The properties are grouped in the Azure Key Vault section of the file:

    credentialStore.<cstore ID>.def
        Required. Defines the implementation of the Azure Key Vault credential store. Do not change the default value.

    credentialStore.<cstore ID>.config.credential.refresh.millis
        Optional. Number of milliseconds that Transformer locally caches a secret. When the time expires, Transformer retrieves the secret from Azure Key Vault.

    credentialStore.<cstore ID>.config.credential.retry.millis
        Optional. Number of milliseconds that Transformer waits after an error before retrying the retrieval of a secret from Azure Key Vault.

    credentialStore.<cstore ID>.config.vault.url
        Required. URL to the key vault created in Azure Key Vault. Use the following format:
        https://<key vault name>.vault.azure.net/

    credentialStore.<cstore ID>.config.client.id
        Required. Application ID assigned to this Transformer when you registered Transformer as an application in Azure Active Directory, as described in Prerequisites.

    credentialStore.<cstore ID>.config.client.key
        Required. Authentication key assigned to this Transformer when you registered Transformer as an application in Azure Active Directory, as described in Prerequisites.

    credentialStore.<cstore ID>.config.enforceEntryGroup
        Optional. Requires Transformer to verify that the user who previews, validates, or starts the pipeline belongs to a group that is permitted to access the secret.
        When set to true, each secret must have a corresponding <secret name>-groups secret that contains a comma-separated list of groups that are permitted to access the secret.
        For more information, see Group Access to Secrets.
        Default is false.

  3. Restart Transformer to enable the changes.
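
A malformed vault URL is an easy mistake to make when editing the file by hand. The following Python sketch checks a value against the URL format given above. It is an illustration only: the function key_vault_name is a hypothetical name, the character set allowed for the vault name (letters, digits, and hyphens) is an assumption, and accepting a missing trailing slash is a choice made here rather than documented behavior.

```python
import re

# https://<key vault name>.vault.azure.net/ with an optional trailing slash.
# The allowed vault-name characters are an assumption for this sketch.
VAULT_URL = re.compile(r"https://([A-Za-z0-9-]+)\.vault\.azure\.net/?")


def key_vault_name(url):
    """Return the key vault name if the URL matches the format, else None."""
    match = VAULT_URL.fullmatch(url.strip())
    return match.group(1) if match else None


# The URLs from the azureDev and azureProd example earlier in this section.
dev_name = key_vault_name("https://development.vault.azure.net/")
prod_name = key_vault_name("https://production.vault.azure.net/")
```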

Step 2. Call Secrets from the Pipeline

Specify credential functions in stage or pipeline properties to retrieve secrets stored in Azure Key Vault.

You can configure credential functions in any property that displays the key icon next to it.

Important: When you use a credential function in a stage or pipeline property, the function must be the only value defined in the property. For example, you cannot include another function or a literal value along with the credential function.

For details about credential functions, see Credential Functions.

Java Keystore

To use the Java keystore credential store system, install the Java keystore credential store stage library and define the configuration properties used to connect to the credential store.

Use the stagelib-cli jks-credentialstore command to add secrets to the credential store. Then, use credential functions in stage or pipeline properties to retrieve those secrets.
Important: Use the Java keystore credential store system in development environments only.

A Java keystore credential storage system requires the distribution of a keystore file, which complicates security. Before using a Java keystore system, decide how the keystore will be distributed and consult with your IT security team to ensure that the system meets IT policies.

Step 1. Configure Credential Store Properties

To enable Transformer to connect to the Java keystore credential store, configure the Java keystore properties in the $TRANSFORMER_CONF/credential-stores.properties file.

  1. Uncomment the credentialStores property in the file and specify the credential store ID to use. Use only alphabetic characters for the credential store ID.

    By default, the property lists a default credential store ID for each type of credential store, aws for AWS Secrets Manager, azure for Azure Key Vault, and so on. When using one credential store of any type, it's simplest to use the default value.

    To use just a single Java keystore, set the value to jks.

    To enable multiple credential stores, specify a comma-separated list of credential store IDs. For example, to use a Java keystore and a Secrets Manager credential store, set the value to jks,aws. To use multiple Java keystore credential stores, simply specify separate IDs for each, such as jksDev,jksProd.

  2. Uncomment and configure the following properties as needed.

    If you specified a custom credential store ID, update the names of the following properties, and then configure them as needed. When using the default credential store ID, jks, leave the property names intact, and simply configure the properties.

    To use multiple Java keystore credential stores, make a copy of the properties for each credential store. Then, update the credential store ID in each set of property names before defining the properties. For an example, see Enabling Credential Stores.

    Important: Instead of entering sensitive data such as passwords in clear text in the configuration file, you can protect the sensitive data by storing the data in an external location and then using functions to retrieve the data.

    These properties are grouped in the Java keystore section of the file:

    credentialStore.<cstore ID>.def
        Required. Defines the implementation of the Java Keystore credential store. Do not change the default value.

    credentialStore.<cstore ID>.config.keystore.type
        Required. Format of the Java keystore file:
        • JCEKS
        • PKCS12
        Default is PKCS12.

    credentialStore.<cstore ID>.config.keystore.file
        Required. Path and name of the Java keystore file. Enter an absolute path to the file, or a path relative to the Transformer configuration directory, $TRANSFORMER_CONF.
        Default is jks-credentialStore.pkcs12.

    credentialStore.<cstore ID>.config.keystore.storePassword
        Required. Password that Transformer uses to access the Java keystore file.
        You must change the default value before using the keystore file.
        To protect the password, store the password in an external location and then use a function to retrieve the password.

  3. Restart Transformer to enable the changes.
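
The keystore.file property accepts either an absolute path or a path relative to $TRANSFORMER_CONF. The following Python sketch shows how such a value could be resolved to a full path, for example in a script that backs up the keystore file. It is not Transformer code: the function resolve_keystore_path and the directory /etc/transformer are hypothetical, and POSIX-style paths are assumed.

```python
import posixpath


def resolve_keystore_path(keystore_file, conf_dir):
    """Resolve the keystore.file value against the configuration directory.

    Absolute paths are used as given. Relative paths are joined to
    conf_dir, which stands in for $TRANSFORMER_CONF.
    """
    if posixpath.isabs(keystore_file):
        return posixpath.normpath(keystore_file)
    return posixpath.normpath(posixpath.join(conf_dir, keystore_file))


# The default file name resolves inside the configuration directory.
default_path = resolve_keystore_path("jks-credentialStore.pkcs12",
                                     "/etc/transformer")

# An absolute path is returned unchanged.
custom_path = resolve_keystore_path("/opt/keys/dev.pkcs12",
                                    "/etc/transformer")
```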

Step 2. Add Secrets to the Java Keystore

Use the stagelib-cli jks-credentialstore command to define secrets in the Java keystore file. You can define multiple secrets in the file.

Use the command from the $TRANSFORMER_DIST directory as follows:

bin/streamsets stagelib-cli jks-credentialstore add -i <cstore ID> -n <secret name> -c <secret value>

For example, the following command adds a secret named OracleDBPassword with the value 278yT6u to the jks Java keystore credential store:

bin/streamsets stagelib-cli jks-credentialstore add -i jks -n OracleDBPassword -c 278yT6u
Note: The stagelib-cli jks-credentialstore command also includes delete and list subcommands that you use to manage the secrets defined in the keystore file. For information on using these commands, see jks-credentialstore Command.

Step 3. Call Secrets from the Pipeline

Specify the credential:get() function in stage or pipeline properties to call secrets from the Java keystore.

You can configure a credential function in any property that displays the key icon next to it.

Important: When you use a credential function in a stage or pipeline property, the function must be the only value defined in the property. For example, you cannot include another function or a literal value along with the credential function.
For details about the credential:get() function, see Credential Functions.
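
To make the argument order concrete, the following Python sketch assembles a credential:get() expression from its three arguments: the credential store ID, the group, and the secret name. The helper credential_get is a hypothetical convenience for generating property values, not a StreamSets API. The example reuses the OracleDBPassword secret added in Step 2 and the default all group.

```python
def credential_get(store_id, group, name):
    """Build a credential:get() expression string for a stage property."""
    for arg in (store_id, group, name):
        # Keep the sketch simple: reject values that would need escaping.
        if '"' in arg:
            raise ValueError(f"unsupported character in argument: {arg!r}")
    return f'${{credential:get("{store_id}", "{group}", "{name}")}}'


# Expression for the OracleDBPassword secret in the jks credential store.
expression = credential_get("jks", "all", "OracleDBPassword")
print(expression)
```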

jks-credentialstore Command

The stagelib-cli jks-credentialstore command provides subcommands to add, list, and delete secrets in the Java keystore credential store.

Any changes made to the Java keystore file take effect immediately. For example, if you change the value of an existing secret in the file, running pipelines that require a new connection to the external system use the updated value.

You can use the following subcommands with the stagelib-cli jks-credentialstore command:
add
Adds a secret to the Java keystore credential store.
Use the command from the $TRANSFORMER_DIST directory as follows:
bin/streamsets stagelib-cli jks-credentialstore add \
(-i <cstore ID> | --id <cstore ID>) \
(-n <secret name> | --name <secret name>) \
(-c <secret value> | --credential <secret value>)
Add options:

-i <cstore ID> or --id <cstore ID>
    Required. Unique ID for the credential store. The default ID for a Java keystore is jks.

-n <secret name> or --name <secret name>
    Required. Name of the secret to add to the Java keystore credential store. If the name includes non-alphanumeric characters, use single quotation marks around the name.

-c <secret value> or --credential <secret value>
    Required. Value of the secret to add to the Java keystore credential store. If the value includes non-alphanumeric characters, use single quotation marks around the value.

For example, the following command adds a secret named OracleDBPassword with the value df35yT_&5 to the devjks Java keystore credential store:

bin/streamsets stagelib-cli jks-credentialstore add -i devjks -n OracleDBPassword -c 'df35yT_&5'
delete
Deletes a secret from the Java keystore credential store.
Use the command from the $TRANSFORMER_DIST directory as follows:
bin/streamsets stagelib-cli jks-credentialstore delete \
(-i <cstore ID> | --id <cstore ID>) \
(-n <secret name> | --name <secret name>)
Delete options:

-i <cstore ID> or --id <cstore ID>
    Required. Unique ID for the credential store. The default ID for a Java keystore is jks.

-n <secret name> or --name <secret name>
    Required. Name of the secret to delete from the Java keystore credential store. If the name includes non-alphanumeric characters, use single quotation marks around the name.

For example, the following command deletes a secret named SQLServerDBPassword from the devjks Java keystore credential store:
bin/streamsets stagelib-cli jks-credentialstore delete -i devjks -n SQLServerDBPassword
list
Lists the names of all secrets defined in the Java keystore credential store. The command does not list the secret values.
Use the command from the $TRANSFORMER_DIST directory as follows:
bin/streamsets stagelib-cli jks-credentialstore list \
(-i <cstore ID> | --id <cstore ID>)
List options:

-i <cstore ID> or --id <cstore ID>
    Required. Unique ID for the credential store. The default ID for a Java keystore is jks.

For example, the following command lists the names of all secrets defined in the devjks Java keystore credential store:
bin/streamsets stagelib-cli jks-credentialstore list -i devjks
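
When the subcommands are driven from a script, the quoting rule for non-alphanumeric characters is easy to get wrong. The following Python sketch builds the argument list for each subcommand using only the short options documented above, and uses the standard shlex module to render a correctly quoted command line. The helper jks_command is a hypothetical wrapper, not part of the stagelib-cli tool.

```python
import shlex


def jks_command(subcommand, store_id, name=None, value=None):
    """Build the argument list for a jks-credentialstore subcommand."""
    if subcommand not in ("add", "delete", "list"):
        raise ValueError(f"unknown subcommand: {subcommand}")
    args = ["bin/streamsets", "stagelib-cli", "jks-credentialstore",
            subcommand, "-i", store_id]
    if subcommand in ("add", "delete"):
        if name is None:
            raise ValueError("a secret name is required")
        args += ["-n", name]
    if subcommand == "add":
        if value is None:
            raise ValueError("a secret value is required")
        args += ["-c", value]
    return args


# shlex.join adds single quotation marks only where the shell needs them.
add_line = shlex.join(
    jks_command("add", "devjks", "OracleDBPassword", "df35yT_&5")
)
list_line = shlex.join(jks_command("list", "devjks"))
print(add_line)
```

Passing the argument list directly to a process launcher, with the working directory set to $TRANSFORMER_DIST, avoids shell quoting altogether. The rendered string is mainly useful for logging or for pasting into a terminal.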