Encrypt and Decrypt Fields

The Encrypt and Decrypt Fields processor encrypts or decrypts field values.

You can use the processor to encrypt one or more fields in a record. You can also use the processor to decrypt one or more fields that were encrypted by another Encrypt and Decrypt Fields processor. A single processor cannot both encrypt and decrypt data. To perform both tasks, use a separate processor for each.

The Encrypt and Decrypt Fields processor uses the Amazon AWS Encryption SDK to encrypt and decrypt fields. When encrypting fields, the processor encrypts the data key and any additional encryption details, and stores the encrypted details along with the encrypted data. When decrypting fields, the processor extracts the encrypted data key and additional details, decrypts the key, and then uses it to decrypt the data.
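The following Python sketch illustrates the envelope pattern described above. A fresh data key encrypts the payload, and only a wrapped (encrypted) copy of that data key is stored with the ciphertext. It is a toy under stated assumptions: the SHA-256 counter-mode keystream stands in for the AES-GCM cipher suites that the SDK actually uses, it provides no integrity protection, and the function names and message layout are hypothetical. They are not the AWS Encryption SDK message format.

```python
import hashlib
import os


def toy_stream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """TOY cipher for illustration only: XOR data with SHA-256(key || nonce || counter).
    Stands in for AES-GCM; it has no authentication tag and is not for real use.
    XOR is its own inverse, so the same call encrypts and decrypts."""
    out = bytearray()
    for counter, offset in enumerate(range(0, len(data), 32)):
        block = hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        out.extend(b ^ k for b, k in zip(data[offset:offset + 32], block))
    return bytes(out)


def envelope_encrypt(master_key: bytes, plaintext: bytes) -> dict:
    """Encrypt plaintext with a fresh data key and store only the wrapped data key."""
    data_key = os.urandom(32)  # new data key for this message
    key_nonce, data_nonce = os.urandom(12), os.urandom(12)
    return {
        # The data key travels only in encrypted (wrapped) form.
        "encrypted_data_key": toy_stream_xor(master_key, key_nonce, data_key),
        "key_nonce": key_nonce,
        "data_nonce": data_nonce,
        "ciphertext": toy_stream_xor(data_key, data_nonce, plaintext),
    }


def envelope_decrypt(master_key: bytes, message: dict) -> bytes:
    """Unwrap the data key with the master key, then decrypt the payload with it."""
    data_key = toy_stream_xor(master_key, message["key_nonce"], message["encrypted_data_key"])
    return toy_stream_xor(data_key, message["data_nonce"], message["ciphertext"])
```

Because only the wrapped data key is stored, decryption must first recover the data key with the master key. This is why decryption depends on the same key provider that was used for encryption.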

You can use Amazon AWS Key Management Service (KMS) as a key provider for the processor, or you can supply the master key in the processor configuration properties. When using Amazon AWS KMS, you specify the Amazon Resource Name (ARN) of the KMS key. You can use IAM roles or AWS access key pairs to connect to Amazon AWS. When using a user-supplied key, you specify a Base64 encoded key and can optionally configure a key ID.

For both key provider types, you specify the cipher suite and frame size to use. When encrypting data, you can optionally define an encryption context and configure data key caching.

Note: When decrypting fields that were encrypted by an Encrypt and Decrypt Fields processor, you need to use the same key provider, cipher suite, and any additional details, such as encryption contexts, that were used by the processor that encrypted the data.

For information about the structure of encrypted data, see the AWS Encryption SDK documentation.

Supported Data Types

When encrypting a field, the Encrypt and Decrypt Fields processor includes the data type of the field in the encrypted data. When decrypting the same field, the processor restores the field to its original data type.

The Encrypt and Decrypt Fields processor encrypts and decrypts string or byte array data, so it can process any data that can be converted to a string or byte array.

You can use the Encrypt and Decrypt Fields processor to encrypt or decrypt the following data types:

  • Boolean
  • Byte
  • Byte Array
  • Character
  • Date
  • Datetime
  • Decimal
  • Double
  • Float
  • Integer
  • Long
  • Short
  • String
  • Time
  • Zoned Datetime
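To illustrate how a field's data type can travel with its encrypted bytes and be restored on decryption, the following sketch prefixes each serialized value with a type tag before it would be encrypted. The tag-prefix layout, the type names used as dictionary keys, and the pack and unpack helpers are hypothetical. This documentation does not describe the processor's actual encoding, and the sketch covers only a subset of the listed types.

```python
import datetime
from decimal import Decimal

# Hypothetical codecs: type tag -> (serialize value to bytes, parse bytes back to value)
CODECS = {
    "BOOLEAN": (lambda v: b"true" if v else b"false", lambda b: b == b"true"),
    "INTEGER": (lambda v: str(v).encode(), lambda b: int(b.decode())),
    "DOUBLE": (lambda v: repr(v).encode(), lambda b: float(b.decode())),
    "DECIMAL": (lambda v: str(v).encode(), lambda b: Decimal(b.decode())),
    "STRING": (lambda v: v.encode("utf-8"), lambda b: b.decode("utf-8")),
    "BYTE_ARRAY": (lambda v: bytes(v), lambda b: bytes(b)),
    "DATE": (lambda v: v.isoformat().encode(),
             lambda b: datetime.date.fromisoformat(b.decode())),
    "DATETIME": (lambda v: v.isoformat().encode(),
                 lambda b: datetime.datetime.fromisoformat(b.decode())),
}


def pack(type_name: str, value) -> bytes:
    """Prefix the serialized value with its type tag; the result is what gets encrypted."""
    serialize, _ = CODECS[type_name]
    return type_name.encode("ascii") + b":" + serialize(value)


def unpack(blob: bytes):
    """Split off the type tag (first ':' only) and rebuild a value of the original type."""
    tag, _, payload = blob.partition(b":")
    _, parse = CODECS[tag.decode("ascii")]
    return parse(payload)
```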

Key Provider

When you use the Encrypt and Decrypt Fields processor, you specify the key provider for the stage.

You can use Amazon AWS Key Management Service (KMS) as the key provider or you can use your own user-supplied key:

Amazon AWS KMS
Uses a master key managed by AWS KMS.
Requires configuring the KMS Key ARN property in the processor to identify the Amazon Resource Name (ARN) of the customer master key (CMK). For information about locating the key ARN, see the AWS KMS documentation.
To connect to AWS, you can use IAM roles or specify an AWS access key ID and secret access key.
User supplied key
Requires specifying a Base64 encoded master key.

You can use credential functions to retrieve the key from a supported credential store. You can also use the base64EncodeString() function to Base64 encode the string returned by a credential function.

The key, once decoded from Base64, must have the length expected by the selected cipher suite. For example, when using a 256-bit cipher suite, the decoded key must be 32 bytes long.

You can optionally include a string key ID to be used when encrypting the data.
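For a user-supplied key, the following sketch generates a random key of the required size, Base64 encodes it, and checks that an encoded key decodes to the expected length. It uses only the Python standard library, and the function names are illustrative. Consider storing the resulting value in a supported credential store rather than entering it directly in the processor.

```python
import base64
import secrets


def generate_master_key(key_bits: int = 256) -> str:
    """Return a random key_bits-bit key, Base64 encoded as text."""
    return base64.b64encode(secrets.token_bytes(key_bits // 8)).decode("ascii")


def check_key_length(encoded_key: str, key_bits: int) -> None:
    """Raise ValueError if the decoded key does not match the cipher suite's key size."""
    raw = base64.b64decode(encoded_key, validate=True)  # rejects non-Base64 characters
    if len(raw) != key_bits // 8:
        raise ValueError(f"expected a {key_bits // 8}-byte key, got {len(raw)} bytes")
```

A 32-byte key encodes to a 44-character Base64 string, so the decoded length, not the length of the encoded string, is what must match the cipher suite.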

AWS Credentials

When you use Amazon AWS KMS as the key provider, Data Collector must pass credentials to AWS.

Use one of the following methods to pass AWS credentials:

IAM roles
When Data Collector runs on an Amazon EC2 instance, you can use the AWS Management Console to configure an IAM role for the EC2 instance. Data Collector uses the IAM instance profile credentials to automatically connect to AWS.
When you use IAM roles, you do not need to specify the Access Key ID and Secret Access Key properties in the processor.
For more information about assigning an IAM role to an EC2 instance, see the Amazon EC2 documentation.
AWS access key pairs
When Data Collector does not run on an Amazon EC2 instance, or when the EC2 instance does not have an IAM role, you must specify the Access Key ID and Secret Access Key properties in the processor.
Tip: To secure sensitive information such as access key pairs, you can use runtime resources or credential stores.

Cipher Suite

When you use the Encrypt and Decrypt Fields processor, you specify the cipher suite to use. The processor uses the selected cipher suite to encrypt or decrypt the data.

The processor provides the following cipher suites for processing:
  • ALG_AES_256_GCM_IV12_TAG16_HKDF_SHA384_ECDSA_P384 (default)
  • ALG_AES_192_GCM_IV12_TAG16_HKDF_SHA384_ECDSA_P384
  • ALG_AES_128_GCM_IV12_TAG16_HKDF_SHA256_ECDSA_P256
  • ALG_AES_256_GCM_IV12_TAG16_HKDF_SHA256 (no signature)
  • ALG_AES_192_GCM_IV12_TAG16_HKDF_SHA256 (no signature)
  • ALG_AES_128_GCM_IV12_TAG16_HKDF_SHA256 (no signature)
  • ALG_AES_256_GCM_IV12_TAG16_NO_KDF (not recommended)
  • ALG_AES_192_GCM_IV12_TAG16_NO_KDF (not recommended)
  • ALG_AES_128_GCM_IV12_TAG16_NO_KDF (not recommended)

For an overview of how the AWS Encryption SDK supports cipher suites, see the AWS Encryption SDK documentation. The documentation also provides additional details about cipher suites.
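The cipher suite names encode their parameters. The following sketch parses the names listed above to report the AES key size, whether an HKDF key derivation step is used, and whether the suite adds an ECDSA signature. The helper name and returned fields are illustrative, and the parsing relies only on the naming pattern of the suites in this list.

```python
import re

# Naming pattern of the suites listed above (an assumption drawn from that list only).
SUITE_PATTERN = re.compile(
    r"ALG_AES_(128|192|256)_GCM_IV12_TAG16_(HKDF_SHA(?:256|384)|NO_KDF)(_ECDSA_P(?:256|384))?"
)


def describe_suite(name: str) -> dict:
    """Return the key size and properties encoded in a cipher suite name."""
    match = SUITE_PATTERN.fullmatch(name)
    if match is None:
        raise ValueError(f"unrecognized cipher suite: {name}")
    key_bits = int(match.group(1))
    return {
        "key_bits": key_bits,
        "key_bytes": key_bits // 8,                 # expected decoded length of a user-supplied key
        "uses_kdf": match.group(2) != "NO_KDF",     # HKDF derives a per-message key
        "signed": match.group(3) is not None,       # ECDSA suites add a signature
    }
```

For a user-supplied key, key_bytes is the decoded key length that the key length requirement in the Key Provider section refers to.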

Encryption Contexts

You can specify encryption contexts to be included in the encrypted data. Encryption contexts, also known as additional authenticated data (AAD), are non-secret key-value pairs that are cryptographically bound to the encrypted data. The AWS Encryption SDK stores them unencrypted with the encrypted data, so do not include sensitive values in an encryption context.

Optionally use encryption contexts as an additional safeguard against tampering with encrypted data.

If you specify encryption contexts when encrypting data, you must specify the same encryption contexts to decrypt the data.
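The AWS Encryption SDK binds the encryption context to the encrypted message cryptographically. To show the effect without the SDK, the following sketch swaps in HMAC-SHA256 from the Python standard library. A tag computed over the ciphertext and the context verifies only when the same context is supplied again. This illustrates binding authenticated data in general, not the SDK's mechanism or message format, and the function names are hypothetical.

```python
import hashlib
import hmac
import json


def _canonical(context: dict) -> bytes:
    # Sorted keys: the same pairs serialize to the same bytes regardless of order.
    return json.dumps(context, sort_keys=True, separators=(",", ":")).encode("utf-8")


def bind(key: bytes, ciphertext: bytes, context: dict) -> bytes:
    """Return an HMAC-SHA256 tag over the encryption context and the ciphertext."""
    ctx = _canonical(context)
    # Length-prefix the context so bytes cannot shift between context and ciphertext.
    message = len(ctx).to_bytes(4, "big") + ctx + ciphertext
    return hmac.new(key, message, hashlib.sha256).digest()


def verify(key: bytes, ciphertext: bytes, context: dict, tag: bytes) -> bool:
    """True only if both the ciphertext and the context match what was tagged."""
    return hmac.compare_digest(bind(key, ciphertext, context), tag)
```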

Data Key Caching

By default, the Encrypt and Decrypt Fields processor generates a new data key for each encryption operation. You can enable caching and reusing data keys to increase pipeline performance when security considerations allow.

Consider the possible security ramifications before enabling data key caching. An AWS blog post describes some of the issues to consider. For details on how data key caching works, see the AWS Encryption SDK documentation.

When you enable data key caching, you configure the following properties:
  • Cache Capacity
  • Max Data Key Age
  • Max Records per Data Key
  • Max Bytes per Data Key
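The following sketch models the retirement rules behind these properties. A cached data key is reused until it exceeds an age, record, or byte limit, and the cache holds at most a fixed number of keys. It is a simplified, hypothetical model with illustrative class and parameter names. The caching cryptographic materials manager in the AWS Encryption SDK implements the real behavior and differs in detail.

```python
import time
from collections import OrderedDict


class DataKeyCache:
    """Hypothetical LRU cache that retires data keys when a usage limit is exceeded."""

    def __init__(self, capacity, max_age_s, max_records, max_bytes, clock=time.monotonic):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.max_age_s = max_age_s
        self.max_records = max_records
        self.max_bytes = max_bytes
        self.clock = clock
        # cache_id -> [data_key, created_at, records_encrypted, bytes_encrypted]
        self._entries = OrderedDict()

    def get_key(self, cache_id, n_bytes, new_key):
        """Return a data key for encrypting n_bytes, reusing a cached key when allowed."""
        if n_bytes > self.max_bytes:
            return new_key()  # exceeds any key's byte budget: use a fresh, uncached key
        now = self.clock()
        entry = self._entries.get(cache_id)
        if entry is not None:
            _, created, records, used = entry
            expired = now - created > self.max_age_s
            exhausted = records + 1 > self.max_records or used + n_bytes > self.max_bytes
            if expired or exhausted:
                del self._entries[cache_id]  # retire the data key
                entry = None
        if entry is None:
            entry = [new_key(), now, 0, 0]
            self._entries[cache_id] = entry
            if len(self._entries) > self.capacity:
                self._entries.popitem(last=False)  # evict the least recently used key
        entry[2] += 1
        entry[3] += n_bytes
        self._entries.move_to_end(cache_id)  # mark as most recently used
        return entry[0]
```

Lower limits shorten how long any single data key stays in use, trading performance for a smaller amount of data protected by each key.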

Encrypt and Decrypt Records

You can use the Encrypt and Decrypt Fields processor to encrypt or decrypt a whole record by serializing the record to a single field before passing it to the processor.

You can use the Data Generator processor to serialize the record to the root field of the record. When you configure the Data Generator processor, you specify the data format to use for the serialized record. Use a text-based format such as JSON, which results in a String field, or a binary format such as Avro, which results in a Byte Array field.
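In the pipeline, the Data Generator processor performs the serialization. The following sketch only mimics the resulting shape in Python: a whole record collapses into one String value under the root field, which is the single field the Encrypt and Decrypt Fields processor then encrypts. Representing the record as a dictionary and the root field as the key "/" is an assumption made for illustration.

```python
import json

ROOT = "/"  # illustrative key standing in for the record's root field


def serialize_to_root(record: dict) -> dict:
    """Collapse a whole record into a single String root field (JSON, a text format)."""
    return {ROOT: json.dumps(record, sort_keys=True, separators=(",", ":"))}


def restore_from_root(wrapped: dict) -> dict:
    """Parse the decrypted root field back into the original record fields."""
    return json.loads(wrapped[ROOT])
```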

Configuring an Encrypt and Decrypt Fields Processor

Configure an Encrypt and Decrypt Fields processor to encrypt or decrypt field values.
  1. In the Properties panel, on the General tab, configure the following properties:
    General Property Description
    Name Stage name.
    Description Optional description.
    Required Fields Fields that must include data for the record to be passed into the stage.
    Tip: You might include fields that the stage uses.

    Records that do not include all required fields are processed based on the error handling configured for the pipeline.

    Preconditions Conditions that must evaluate to TRUE to allow a record to enter the stage for processing. Click Add to create additional preconditions.

    Records that do not meet all preconditions are processed based on the error handling configured for the stage.

    On Record Error Error record handling for the stage:
    • Discard - Discards the record.
    • Send to Error - Sends the record to the pipeline for error handling.
    • Stop Pipeline - Stops the pipeline. Not valid for cluster pipelines.
  2. On the Action tab, configure the following properties:
    Action Property Description
    Mode The action for the processor to perform: encrypting or decrypting data in the specified fields.
    Fields Field paths for the fields to encrypt or decrypt.
    Tip: To encrypt an entire record, you can use a Data Generator processor earlier in the pipeline to serialize the record to a single field.
  3. On the Key Provider tab, configure the following properties:
    Key Provider Property Description
    Master Key Provider The master key provider for encrypting or decrypting data:
    • Amazon AWS KMS - Uses data keys from the Amazon AWS Key Management Service.
    • User Supplied Key - Uses a Base64 encoded key specified in the processor.
    Cipher The cipher suite to use for encrypting or decrypting data.
    Frame Size The frame size. For more information, see the AWS Encryption SDK documentation.
    Access Key ID AWS access key ID.

    Required when using Amazon AWS KMS as the key provider and not using IAM roles with IAM instance profile credentials.

    Secret Access Key AWS secret access key.

    Required when using Amazon AWS KMS as the key provider and not using IAM roles with IAM instance profile credentials.

    KMS Key ARN The Amazon Resource Name (ARN) for the KMS key. Required when using the Amazon AWS KMS key provider.

    For information about locating the key ARN, see the AWS KMS documentation.

    Base64 Encoded Key The Base64 encoded master key to use with a user-supplied key.

    You can use credential functions to retrieve the key from a supported credential store. You can also use the base64EncodeString() function to Base64 encode the string returned by a credential function.

    The key, once decoded from Base64, must have the length expected by the selected cipher suite. For example, when using a 256-bit cipher suite, the decoded key must be 32 bytes long.

    Key ID An optional key ID to use in addition to the Base64 encoded key when using a user-supplied key.

    Enter a string value.

    Encryption Context (AAD) Key value pairs to be used as encryption contexts, also known as additional authenticated data.
    Data Key Caching Enables caching and reusing data keys. Use to improve pipeline performance when security considerations allow.
    Cache Capacity The maximum number of keys to cache in memory.
    Max Data Key Age The maximum number of seconds that a data key can be used before the data key is retired.
    Max Records per Data Key The maximum number of fields that a data key can encrypt before the data key is retired.
    Max Bytes per Data Key The maximum number of bytes that a data key can be used to encrypt before the data key is retired.
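The following sketch restates the requirement rules of the Key Provider tab as a validation function, which can serve as a checklist before running a pipeline. The dictionary keys, the provider values AWS_KMS and USER_SUPPLIED, and the use_iam_roles flag are hypothetical names chosen for illustration. They are not Data Collector property names or an API.

```python
CACHE_PROPS = ("cache_capacity", "max_data_key_age",
               "max_records_per_data_key", "max_bytes_per_data_key")


def validate_key_provider(cfg: dict) -> list:
    """Return a list of problems with a hypothetical Key Provider tab configuration."""
    problems = []
    provider = cfg.get("master_key_provider")
    if provider == "AWS_KMS":
        if not cfg.get("kms_key_arn"):
            problems.append("KMS Key ARN is required with Amazon AWS KMS")
        # Access keys are needed only when IAM instance profile credentials are not used.
        if not cfg.get("use_iam_roles"):
            for prop in ("access_key_id", "secret_access_key"):
                if not cfg.get(prop):
                    problems.append(f"{prop} is required when not using IAM roles")
    elif provider == "USER_SUPPLIED":
        if not cfg.get("base64_encoded_key"):
            problems.append("Base64 Encoded Key is required with a user-supplied key")
    else:
        problems.append(f"unknown master key provider: {provider!r}")
    if cfg.get("data_key_caching"):
        for prop in CACHE_PROPS:
            value = cfg.get(prop)
            if not isinstance(value, int) or value <= 0:
                problems.append(f"{prop} must be a positive integer when caching is enabled")
    return problems
```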