Amazon S3 Executor

The Amazon S3 executor performs a task in Amazon S3 each time it receives an event.

Upon receiving an event, the executor can perform one of the following tasks:
  • Create a new Amazon S3 object for the specified content
  • Copy an object under 5 GB to another location in the same bucket and optionally delete the original object
  • Add tags to an existing object

Each Amazon S3 executor can perform one type of task. To perform additional tasks, use additional executors.

Use the Amazon S3 executor as part of an event stream. You can use the executor in any logical way, such as writing information from an event record to a new S3 object, or copying or tagging objects after they are written by the Amazon S3 destination.

When you configure the Amazon S3 executor, you specify the connection information, such as access keys, region, and bucket. You configure the expression that represents the object name and location. When creating new objects, you specify the content to place in the objects. When copying objects, you specify the location of the object and the location for the copy. You can also configure the executor to delete the original object after it is copied. When adding tags to an existing object, you specify the tags that you want to use.

You can optionally use an HTTP proxy to connect to Amazon S3.

AWS Credentials

When Data Collector uses the Amazon S3 executor, it must pass credentials to Amazon Web Services.

Use one of the following methods to pass AWS credentials:

IAM roles
When Data Collector runs on an Amazon EC2 instance, you can use the AWS Management Console to configure an IAM role for the EC2 instance. Data Collector uses the IAM instance profile credentials to automatically connect to AWS.
When you use IAM roles, you do not need to specify the Access Key ID and Secret Access Key properties in the executor.
For more information about assigning an IAM role to an EC2 instance, see the Amazon EC2 documentation.
AWS access key pairs

When Data Collector does not run on an Amazon EC2 instance or when the EC2 instance does not have an IAM role, you must specify the Access Key ID and Secret Access Key properties in the executor.

Tip: To secure sensitive information such as access key pairs, you can use runtime resources or credential stores.
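The credential resolution described above can be sketched in plain Python. This is an illustrative model of the documented behavior, not Data Collector internals; the function and parameter names are invented for the example.

```python
def resolve_credentials(access_key_id=None, secret_access_key=None):
    """Pick a credential source as the docs describe: use an explicit
    access key pair when both properties are set, otherwise fall back
    to the IAM instance profile on the EC2 instance."""
    if access_key_id and secret_access_key:
        return {"source": "access key pair",
                "access_key_id": access_key_id,
                "secret_access_key": secret_access_key}
    # No key pair configured: rely on the IAM instance profile.
    return {"source": "instance profile"}

print(resolve_credentials()["source"])                    # instance profile
print(resolve_credentials("AKIA...", "abc")["source"])    # access key pair
```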

Create New Objects

You can use the Amazon S3 executor to create new Amazon S3 objects and write the specified content to the object when the executor receives an event record.

When you create an object, you specify where to create the object and the content to write to the object. You can use an expression to represent both the location for the object and the content to use.

For example, say you want the executor to create a new Amazon S3 object for each object that the Amazon S3 destination writes, and to use the new object to store the record count information for each written object. Since the object-written event record includes the record count, you can enable the destination to generate records and route the event to the Amazon S3 executor.

The object-written event record includes the bucket and object key of the written object. So, to create a new record-count object in the same bucket as the written object, you can use the following expression for the Object property:
${record:value('/bucket')}/${record:value('/objectKey')}.recordcount
The event record also includes the number of records written to the object. So, to write this information to the new object, you can use the following expression for the Content property:
${record:value('/recordCount')}
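To see how these two expressions resolve, here is a short Python sketch that evaluates them against a sample object-written event record. The field values are invented for illustration; the field names match the event record described above.

```python
# Sample object-written event record (values are illustrative only).
event = {"bucket": "sales", "objectKey": "sdc-2024-01-15.txt", "recordCount": 42}

# Object property:
# ${record:value('/bucket')}/${record:value('/objectKey')}.recordcount
object_path = f"{event['bucket']}/{event['objectKey']}.recordcount"

# Content property: ${record:value('/recordCount')}
content = str(event["recordCount"])

print(object_path)  # sales/sdc-2024-01-15.txt.recordcount
print(content)      # 42
```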
Tip: Stage-generated event records differ from stage to stage. For a description of stage events, see "Event Record" in the documentation for the event-generating stage. For a description of pipeline events, see Pipeline Event Records.

Copy Objects

You can use the Amazon S3 executor to copy an object to another location within the same bucket when the executor receives an event record. You can optionally delete the original object after the copy. The object must be under 5 GB in size.

When you copy an object, you specify the location of the object to be copied, and the location for the copy. The target location must be within the same bucket as the original object. You can use an expression to represent both locations. You can also specify whether to delete the original object.

A simple example is to move each written object to a Completed directory after it is closed. To do this, you configure the Amazon S3 destination to generate events. Since the object-written event record includes the bucket and object key, you can use that information to configure the Object property, as follows:
${record:value('/bucket')}/${record:value('/objectKey')}
Then, to move the object to a Completed directory, retaining the same object name, you can configure the New Object Path property, as follows:
${record:value('/bucket')}/completed/${record:value('/objectKey')}

You can then select Delete Original Object to remove the original object.
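The path expressions in this example can be sketched as follows. This is a simulation of how the Object and New Object Path expressions resolve against an event record; the helper function and sample values are invented for the example.

```python
def copy_paths(event, subdir="completed"):
    """Resolve the Object and New Object Path expressions from the
    example above: the source is bucket/objectKey, and the copy goes
    to a Completed directory in the same bucket, keeping the name."""
    source = f"{event['bucket']}/{event['objectKey']}"
    target = f"{event['bucket']}/{subdir}/{event['objectKey']}"
    return source, target

src, dst = copy_paths({"bucket": "sales", "objectKey": "sdc-1.txt"})
print(src)  # sales/sdc-1.txt
print(dst)  # sales/completed/sdc-1.txt
```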

To do something more complicated, like move only the subset of objects with a _west suffix to a different location, you can add a Stream Selector processor in the event stream to route only events where the /objectKey field includes a _west suffix to the Amazon S3 executor.
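The routing condition described above can be modeled as a simple predicate. The function below is an illustrative stand-in for the Stream Selector condition, not Data Collector code, and the sample event records are invented.

```python
def route_to_s3_executor(event):
    """Stream Selector condition from the example: only events whose
    /objectKey field ends with a _west suffix reach the executor."""
    return event["objectKey"].endswith("_west")

events = [{"objectKey": "sales_west"}, {"objectKey": "sales_east"}]
routed = [e for e in events if route_to_s3_executor(e)]
print([e["objectKey"] for e in routed])  # ['sales_west']
```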

Tag Existing Objects

You can use the Amazon S3 executor to add tags to existing Amazon S3 objects. Tags are key-value pairs that you can use to categorize objects, such as product: <product>.

You can configure multiple tags. When you configure a tag, you can define a tag with just the key or specify a key and value. You can also use expressions to define tag values.

For example, you can use an expression to specify the number of records that were written to an object based on the recordCount field in the event record, as follows:
key: processed records
value: ${record:value('/recordCount')}
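Resolved against a sample event record, this tag definition produces a key-value pair like the one below. The helper function is illustrative, and the record count value is invented for the example.

```python
def build_tags(event):
    """Resolve the tag example above: a fixed key whose value is
    taken from the event record's recordCount field."""
    return {"processed records": str(event["recordCount"])}

print(build_tags({"recordCount": 42}))  # {'processed records': '42'}
```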

For more information about tags, including Amazon S3 restrictions, see the Amazon S3 documentation.

Configuring an Amazon S3 Executor

Configure an Amazon S3 executor to create new Amazon S3 objects, copy objects to another location in the same bucket, or add tags to existing objects.

  1. In the Properties panel, on the General tab, configure the following properties:
    General Property | Description
    Name - Stage name.
    Description - Optional description.
    Required Fields - Fields that must include data for the record to be passed into the stage.
    Tip: You might include fields that the stage uses.
    Records that do not include all required fields are processed based on the error handling configured for the pipeline.
    Preconditions - Conditions that must evaluate to TRUE to allow a record to enter the stage for processing. Click Add to create additional preconditions.
    Records that do not meet all preconditions are processed based on the error handling configured for the stage.
    On Record Error - Error record handling for the stage:
    • Discard - Discards the record.
    • Send to Error - Sends the record to the pipeline for error handling.
    • Stop Pipeline - Stops the pipeline.
  2. On the Amazon S3 tab, configure the following properties:
    Amazon S3 Property | Description
    Access Key ID - AWS access key ID. Required when not using IAM roles with IAM instance profile credentials.
    Secret Access Key - AWS secret access key. Required when not using IAM roles with IAM instance profile credentials.
    Region - Amazon S3 region.
    Endpoint - Endpoint to connect to when you select Other for the region. Enter the endpoint name.
    Bucket - Bucket that contains the objects to be created, copied, or updated.
    Note: The bucket name must be DNS compliant. For more information about bucket naming conventions, see the Amazon S3 documentation.
  3. On the Tasks tab, configure the following properties:
    Task Property | Description
    Task - Task to perform upon receiving an event record. Select one of the following options:
    • Create New Object - Use to create a new S3 object with the configured content.
    • Copy Object - Use to copy a closed S3 object to another location in the same bucket.
    • Add Tags to Existing Object - Use to add tags to a closed S3 object.
    Object - Path to the object to use. To use the object whose closure generated the event record, use the following expression:
    ${record:value('/bucket')}/${record:value('/objectKey')}
    To use a whole file whose closure generated the event record, use the following expression:
    ${record:value('/targetFileInfo/bucket')}/${record:value('/targetFileInfo/objectKey')}
    Content - The content to write to new objects. You can use expressions to represent the content to use. For more information, see Create New Objects.
    New Object Path - Path for the copied object. You can use expressions to represent the location and name of the object. For more information, see Copy Objects.
    Tags - The tags to add to an existing object. Using simple or bulk edit mode, click the Add icon to configure a tag.
    You can configure multiple tags. When you configure a tag, you can define a tag with just the key or specify a key and value. You can also use expressions to define tag values.

  4. To use an HTTP proxy, on the Advanced tab, configure the following properties:
    Advanced Property | Description
    Use Proxy - Specifies whether to use a proxy to connect.
    Proxy Host - Proxy host.
    Proxy Port - Proxy port.
    Proxy User - User name for proxy credentials.
    Proxy Password - Password for proxy credentials.
    Tip: To secure sensitive information such as user names and passwords, you can use runtime resources or credential stores.