Accessing HashiCorp Vault Secrets

Data Collector can access sensitive information, known as secrets, stored in HashiCorp Vault.

You can use Vault secrets in place of literal values in username and password properties, in similar properties such as AWS access key IDs and secret access keys, and in HTTP headers and bodies when using HTTPS.

To access Vault secrets, you must authorize Data Collector with Vault. After authorization, you can use expressions in the pipeline that access Vault at run time.

To access information stored in Vault, perform the following steps:
  1. Configure the Vault properties file.
  2. Authorize Data Collector in Vault.
  3. In the pipeline, use an expression to access Vault secrets.

Note: This documentation includes details about HashiCorp Vault to simplify the installation and configuration process. For more information, see the Vault documentation.

Step 1. Configure Vault Properties

To enable Data Collector to connect to Vault, configure the Data Collector Vault properties file, $SDC_CONF/vault.properties.

The Vault server URL and App ID are required properties. Configure other properties as needed.

The file includes the following properties. Uncomment any properties that you want to use:

vault.addr
Required. Vault server URL. Use HTTPS to avoid unencrypted communication.
vault.app.id
Required. App ID for Data Collector. The App ID must exist in Vault and should be a UUID or a similarly complex string to ensure better security.
vault.lease.renewal.interval.sec
Seconds to wait before checking for leases that need renewal. Default is 60.
vault.lease.expiration.buffer.sec
Buffer for expiring leases. Data Collector renews leases that expire in less than the specified number of seconds. Default is 120.
vault.proxy.address
Optional proxy URL. Configure to use a proxy to access Vault.
vault.proxy.port
Optional proxy port. Configure to use a proxy to access Vault.
vault.proxy.username
Optional proxy username. Configure to use a proxy to access Vault.
vault.proxy.password
Optional proxy password. Configure to use a proxy to access Vault.
vault.read.timeout
Milliseconds to wait for data before timing out. Default is 0 for no limit.
vault.ssl.enabled.protocols
SSL/TLS-enabled protocols. Versions TLSv1.2 or later are recommended.
vault.ssl.truststore.file
Path to a Java TrustStore file. Required when using a private CA or certificates not trusted by the Java default TrustStore.
vault.ssl.truststore.password
Password for the TrustStore file.
vault.ssl.verify
Whether to verify that the Vault server hostname matches its certificate. Default is true. False is not recommended.
vault.ssl.timeout
Timeout for the SSL/TLS handshake in milliseconds. Default is 0 for no limit.
vault.open.timeout
Connection timeout for requests to Vault in milliseconds. Default is 0 for no limit.
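
As a rough illustration of how these properties fit together, a minimal vault.properties might look like the following sketch. The server URL, App ID, TrustStore path, and password are placeholder values rather than values from a real installation, and the exact format accepted by vault.ssl.enabled.protocols is an assumption to check against the comments in the shipped file.

```properties
# Required: Vault server URL and the App ID that exists in Vault.
# Both values are placeholders for illustration only.
vault.addr=https://vault.example.com:8200
vault.app.id=0b6c9b52-2f3a-4c8e-9a1d-5e7f3c2d1a90

# Optional: lease handling, shown with the documented defaults.
vault.lease.renewal.interval.sec=60
vault.lease.expiration.buffer.sec=120

# Optional: TLS settings for a Vault server that uses a private CA.
# The path and password are hypothetical.
vault.ssl.enabled.protocols=TLSv1.2
vault.ssl.truststore.file=/path/to/truststore.jks
vault.ssl.truststore.password=changeit
vault.ssl.verify=true
```

With the lease values shown, Data Collector would check for leases every 60 seconds and renew any lease that expires in less than 120 seconds.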

Step 2. Authorize Data Collector in Vault

The Data Collector Vault integration relies on Vault's App ID auth backend. The App ID auth backend requires authorizing a combination of User ID and App ID.

Each Data Collector has a unique User ID based on the host it resides on. To determine the User ID, run the following command:
bin/streamsets show-vault-id

After determining the User ID for Data Collector, authorize it in Vault with the appropriate App ID. For more information, see the Vault documentation.

Because the User ID is based on the host, it changes if you move Data Collector to another host. After a move, determine the new User ID and authorize it in Vault.

Step 3. Call Vault from the Pipeline

After enabling Data Collector to access Vault and authorizing Data Collector with Vault, you can use expressions in pipeline and stage properties to access Vault secrets.

The expression language provides Vault functions to return Vault secrets. You can use Vault functions in username, password, and similar properties such as AWS access key IDs and secret access keys. You can also use the functions in HTTP headers and bodies when using HTTPS.

You can use the following functions to access Vault secrets:
vault:read()
Use to return the value for the Vault path and key that you provide. Typically, you'll use this function to access secrets.
vault:readWithDelay()
Use to return the value for the specified Vault path and key with a delay. Use this function to incorporate a delay in the response to allow time for other processes to complete.
For example, you should use this function when using the Vault AWS secret backend to generate AWS access credentials based on IAM policies. According to Vault documentation, you might need a delay of 10 seconds or more before the credentials can be used successfully.
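
The two functions might be used in stage properties along the lines of the sketch below. The Vault paths and keys are hypothetical, and the argument order (path, key, then delay) and the millisecond unit for the delay are assumptions that should be confirmed in Miscellaneous Functions.

```text
Password property (hypothetical path and key):
${vault:read("secret/databases/orders", "password")}

AWS access key ID property (hypothetical role path; assumed 10000 ms delay):
${vault:readWithDelay("aws/creds/sdc-role", "access_key", 10000)}
```

The second expression assumes that a delay of about 10 seconds is enough for newly generated AWS credentials to become usable. Adjust the delay if the credentials are rejected on first use.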

For more information, see Miscellaneous Functions.