Kinesis Firehose

Supported pipeline types:
  • Data Collector
  • Data Collector Edge

The Kinesis Firehose destination writes data to an Amazon Kinesis Firehose delivery stream. Firehose automatically delivers the data to the Amazon S3 bucket or Amazon Redshift table that you specify in the delivery stream.

To write data to Amazon Kinesis Streams, use the Kinesis Producer destination. To write data directly to Amazon S3, use the Amazon S3 destination.

When you use the Kinesis Firehose destination to deliver data to Amazon S3, Firehose can buffer incoming records into larger file sizes before delivering the data to Amazon S3. You configure the buffer size and buffer interval when you create the delivery stream.

When you configure the Kinesis Firehose destination, you specify an existing delivery stream to write to, AWS credentials and region, and the data format to use.

AWS Credentials

When Data Collector writes data to Kinesis Firehose, it must pass credentials to Amazon Web Services.

Use one of the following methods to pass AWS credentials:

IAM roles
When Data Collector runs on an Amazon EC2 instance, you can use the AWS Management Console to configure an IAM role for the EC2 instance. Data Collector uses the IAM instance profile credentials to automatically connect to AWS.
When you use IAM roles, you do not need to specify the Access Key ID and Secret Access Key properties in the destination.
For more information about assigning an IAM role to an EC2 instance, see the Amazon EC2 documentation.
AWS access key pairs
When Data Collector does not run on an Amazon EC2 instance or when the EC2 instance does not have an IAM role, you must specify the Access Key ID and Secret Access Key properties in the destination.

Tip: To secure sensitive information such as access key pairs, you can use runtime resources or credential stores.
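
The choice between the two methods above can be pictured as a small decision rule. The following Python sketch is illustrative only: the function name, its parameters, and the returned labels are hypothetical and are not part of Data Collector or of any AWS SDK.

```python
from typing import Optional


def choose_credential_method(
    has_instance_role: bool,
    access_key_id: Optional[str],
    secret_access_key: Optional[str],
) -> str:
    """Return which credential method applies (hypothetical helper)."""
    # With an IAM role on the EC2 instance, the key properties can stay empty.
    if has_instance_role:
        return "iam_role"
    # Without an IAM role, both key properties must be specified.
    if access_key_id and secret_access_key:
        return "access_keys"
    raise ValueError(
        "Access Key ID and Secret Access Key are required when no IAM role is available"
    )
```

For example, `choose_credential_method(False, None, None)` raises an error, which mirrors the requirement that the key pair must be specified when no IAM role is available.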

Delivery Stream

The Kinesis Firehose destination writes data to an existing delivery stream in Amazon Kinesis Firehose. Before using the Kinesis Firehose destination, use the AWS Management Console to create a delivery stream to an Amazon S3 bucket or Amazon Redshift table.

For more information about creating a Firehose delivery stream, see the Amazon Kinesis Firehose documentation.

Data Formats

The Kinesis Firehose destination writes data to a Kinesis Firehose delivery stream based on the data format that you select.

In Data Collector Edge pipelines, the destination supports only the JSON data format.

The Kinesis Firehose destination processes data formats as follows:

Delimited
The destination writes records as delimited data. When you use this data format, the root field must be a list or list-map.
You can use the following delimited format types:
  • Default CSV - File that includes comma-separated values. Ignores empty lines in the file.
  • RFC4180 CSV - Comma-separated file that strictly follows RFC4180 guidelines.
  • MS Excel CSV - Microsoft Excel comma-separated file.
  • MySQL CSV - MySQL comma-separated file.
  • PostgreSQL CSV - PostgreSQL comma-separated file.
  • PostgreSQL Text - PostgreSQL text file.
  • Tab-Separated Values - File that includes tab-separated values.
  • Custom - File that uses user-defined delimiter, escape, and quote characters.
JSON
The destination writes records as JSON data. Use the multiple objects format, where each file includes multiple JSON objects. Each object is a JSON representation of a record.
Note: The JSON array of objects format is not supported for the Kinesis Firehose destination.
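
As a rough illustration of the two formats above, the following Python sketch serializes the same records as delimited text with custom characters and as multiple JSON objects. It uses only the Python standard library, not Data Collector code. The sample records, the function names, and the newline between JSON objects are assumptions made for the example.

```python
import csv
import io
import json

# Hypothetical records; each one is an ordered mapping of field name to value,
# standing in for a record whose root field is a list-map.
records = [
    {"id": 1, "city": "Oslo", "note": "first|entry"},
    {"id": 2, "city": "Lima", "note": "second"},
]


def to_delimited(rows, delimiter="|", escapechar="\\", quotechar='"'):
    """Write rows as delimited text with a header line, one row per record."""
    buffer = io.StringIO()
    writer = csv.writer(
        buffer,
        delimiter=delimiter,
        escapechar=escapechar,
        quotechar=quotechar,
        quoting=csv.QUOTE_MINIMAL,
        lineterminator="\n",
    )
    writer.writerow(rows[0].keys())  # header line built from the field names
    for row in rows:
        writer.writerow(row.values())
    return buffer.getvalue()


def to_multiple_json_objects(rows):
    """Write one JSON object per record, back to back, not wrapped in an array."""
    # Assumption: objects are separated by a newline.
    return "".join(json.dumps(row) + "\n" for row in rows)


print(to_delimited(records))
print(to_multiple_json_objects(records))
```

In the delimited output, the value that contains the delimiter character is wrapped in the quote character. In the JSON output, each line parses on its own as one object, which is what distinguishes the multiple objects format from a single JSON array.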

Configuring a Kinesis Firehose Destination

Configure a Kinesis Firehose destination to write data to an Amazon Kinesis Firehose delivery stream.

  1. In the Properties panel, on the General tab, configure the following properties:
    General Property - Description
    Name - Stage name.
    Description - Optional description.
    Required Fields - Fields that must include data for the record to be passed into the stage.
      Tip: You might include fields that the stage uses.
      Records that do not include all required fields are processed based on the error handling configured for the pipeline.
    Preconditions - Conditions that must evaluate to TRUE to allow a record to enter the stage for processing. Click Add to create additional preconditions.
      Records that do not meet all preconditions are processed based on the error handling configured for the stage.
    On Record Error - Error record handling for the stage:
      • Discard - Discards the record.
      • Send to Error - Sends the record to the pipeline for error handling.
      • Stop Pipeline - Stops the pipeline. Not valid for cluster pipelines.
  2. On the Kinesis tab, configure the following properties:
    Kinesis Property - Description
    Access Key ID - AWS access key ID.
      Required when not using IAM roles with IAM instance profile credentials.
    Secret Access Key - AWS secret access key.
      Required when not using IAM roles with IAM instance profile credentials.
    Destination Type - Type of Amazon destination to write to. Select Existing Stream.
    Region - Amazon Web Services region.
    Endpoint - Endpoint to connect to when you select Other for the region. Enter the endpoint name.
    Stream Name - Existing delivery stream to write to.
      Use the AWS Management Console to create the delivery stream to an Amazon S3 bucket or Amazon Redshift table.
    Maximum Record Size (KB) - Maximum size of a single record. When records exceed this size, the destination handles the records based on the error record handling configured for the stage.
      Warning: A Firehose record can have a maximum size of 1,000 KB. If you configure a maximum size larger than 1,000 KB, Firehose does not accept any data written by the destination.
  3. On the Data Format tab, configure the following property:
    Data Format Property - Description
    Data Format - Data format to use. Use one of the following data formats:
      • Delimited
      • JSON
      In Data Collector Edge pipelines, the destination supports only the JSON data format.
  4. For delimited data, on the Data Format tab, configure the following properties:
    Delimited Property - Description
    Delimiter Format - Format for delimited data:
      • Default CSV - File that includes comma-separated values. Ignores empty lines in the file.
      • RFC4180 CSV - Comma-separated file that strictly follows RFC4180 guidelines.
      • MS Excel CSV - Microsoft Excel comma-separated file.
      • MySQL CSV - MySQL comma-separated file.
      • PostgreSQL CSV - PostgreSQL comma-separated file.
      • PostgreSQL Text - PostgreSQL text file.
      • Tab-Separated Values - File that includes tab-separated values.
      • Custom - File that uses user-defined delimiter, escape, and quote characters.
    Header Line - Indicates whether to create a header line.
    Replace New Line Characters - Replaces new line characters with the configured string.
      Recommended when writing data as a single line of text.
    New Line Character Replacement - String to replace each new line character. For example, enter a space to replace each new line character with a space.
      Leave empty to remove the new line characters.
    Delimiter Character - Delimiter character for a custom delimiter format. Select one of the available options or use Other to enter a custom character.
      You can enter a Unicode control character using the format \uNNNN, where N is a hexadecimal digit from the numbers 0-9 or the letters A-F. For example, enter \u0000 to use the null character as the delimiter or \u2028 to use a line separator as the delimiter.
      Default is the pipe character ( | ).
    Escape Character - Escape character for a custom delimiter format. Select one of the available options or use Other to enter a custom character.
      Default is the backslash character ( \ ).
    Quote Character - Quote character for a custom delimiter format. Select one of the available options or use Other to enter a custom character.
      Default is the quotation mark character ( " ).
    Charset - Character set to use when writing data.
  5. For JSON data, on the Data Format tab, configure the following property:
    JSON Property - Description
    JSON Content - Determines how JSON data is written. Select Multiple JSON Objects. Each file includes multiple JSON objects. Each object is a JSON representation of a record.
      Note: The JSON array of objects format is not supported for the Kinesis Firehose destination.
    Charset - Character set to use when writing data.
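
The Maximum Record Size (KB) property and its warning in step 2 can be pictured with a short Python sketch. This is only an illustration under stated assumptions: the function names are hypothetical, 1 KB is counted as 1024 bytes, and the sketch does not reproduce how Data Collector itself measures a record.

```python
FIREHOSE_MAX_RECORD_KB = 1000  # per-record limit stated in the warning in step 2


def validate_max_record_size(max_record_size_kb: int) -> None:
    """Reject a configured maximum that is above the Firehose per-record limit."""
    if max_record_size_kb > FIREHOSE_MAX_RECORD_KB:
        raise ValueError(
            f"Maximum Record Size must not exceed {FIREHOSE_MAX_RECORD_KB} KB"
        )


def exceeds_max_record_size(payload: bytes, max_record_size_kb: int) -> bool:
    """Return True when a serialized record is larger than the configured maximum."""
    # Assumption: 1 KB is counted as 1024 bytes.
    return len(payload) > max_record_size_kb * 1024


validate_max_record_size(900)  # accepted: below the 1,000 KB limit
record = b'{"id": 1, "city": "Oslo"}\n'
if exceeds_max_record_size(record, 900):
    print("record goes to error handling for the stage")
else:
    print("record is written to the delivery stream")
```

A record that exceeds the configured maximum is the case that the On Record Error property in step 1 decides.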