Google BigQuery

The Google BigQuery destination streams data into Google BigQuery.

You can use other destinations to write to Google Cloud Storage and Google Pub/Sub.

When you configure the destination, you define the existing BigQuery dataset and table to stream data into. The destination streams each record into a row in the BigQuery table. You can optionally define an expression that evaluates to the insert ID to insert or update. The insert ID is a unique ID for each row. If you do not specify an insert ID, the destination inserts each record into a new row.
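
The insert ID corresponds to the insertId used by BigQuery streaming inserts. As a rough illustration only, the following Python sketch uses the google-cloud-bigquery client library to stream rows with and without insert IDs; the project, dataset, table, and field names are placeholders, and this is not the destination's internal code:

    # Illustration of BigQuery streaming insert IDs, not the destination's code.
    # "my-project.my_dataset.my_table" and the field names are placeholders.
    from google.cloud import bigquery

    client = bigquery.Client(project="my-project")
    table = "my-project.my_dataset.my_table"

    rows = [
        {"order_id": "1001", "amount": 25.0},
        {"order_id": "1002", "amount": 40.0},
    ]

    # With row_ids, BigQuery treats each value as an insertId and can
    # de-duplicate repeated sends of the same row on a best-effort basis.
    errors = client.insert_rows_json(table, rows, row_ids=[r["order_id"] for r in rows])

    # Without row_ids, every call appends the rows as new rows.
    # errors = client.insert_rows_json(table, rows)

    if errors:
        print("Insert errors:", errors)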

The destination maps fields from records to BigQuery columns based on matching names and compatible data types. You can configure the destination to ignore invalid columns when the destination cannot map a field to an existing BigQuery column. You can also configure the table cache size for the destination.

You also define the project and credentials to use to connect to Google BigQuery.

For more information about streaming data into Google BigQuery, see the Google BigQuery documentation.

Credentials

When the Google BigQuery destination streams data into Google BigQuery, it must pass credentials to BigQuery using a Google Cloud service account credentials file.

Complete the following steps to configure the destination to use a service account credentials file:

  1. Generate a service account credentials file in JSON format.

    Use the Google Cloud Platform Console or the gcloud command-line tool to generate and download the credentials file. For more information about generating a credentials file, see the Google Cloud Platform documentation.

  2. On the Credentials tab for the stage, select Service Account Credentials (JSON) for the Credentials Provider property, then paste the full contents of the credentials file into the Credentials File Content (JSON) property.
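
The destination handles authentication for you once the credentials content is pasted. For context, the following Python sketch shows how the same service account JSON content is typically used to build an authenticated BigQuery client; the local file name is a placeholder, and this is not the destination's internal implementation:

    # Sketch only: building a BigQuery client from service account JSON content.
    import json

    from google.cloud import bigquery
    from google.oauth2 import service_account

    with open("service-account.json") as f:   # placeholder file name
        info = json.load(f)

    credentials = service_account.Credentials.from_service_account_info(info)
    client = bigquery.Client(project=info["project_id"], credentials=credentials)
    print(client.project)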

BigQuery Data Types

The Google BigQuery destination maps fields from records to BigQuery columns in existing tables based on matching names and compatible data types. If needed, the destination converts StreamSets Cloud data types to BigQuery data types.

The following table lists the StreamSets Cloud data types and the BigQuery data types that the destination converts them to:

    StreamSets Cloud Data Type    BigQuery Data Type
    Boolean                       Boolean
    Byte Array                    Bytes
    Date                          Date
    Datetime                      Datetime or Timestamp
    Double                        Float
    Float                         Float
    Integer                       Integer
    List                          Array
    List_Map                      Record with repeated fields
    Long                          Integer
    Map                           Record
    Short                         Integer
    String                        String
    Time                          Time

The Google BigQuery destination cannot convert the following StreamSets Cloud data types:
  • Byte
  • Character
  • Decimal
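
For example, a record with String, Long, Datetime, List, and Map fields could be written to a table whose columns use matching names and the corresponding BigQuery types. The following Python sketch creates such a table with the google-cloud-bigquery client library; the project, dataset, table, and column names are hypothetical:

    # Hypothetical table whose columns match a record's field names and
    # converted data types. All names below are placeholders.
    from google.cloud import bigquery

    client = bigquery.Client()  # uses default project and credentials

    schema = [
        bigquery.SchemaField("order_id", "STRING"),               # String   -> String
        bigquery.SchemaField("quantity", "INTEGER"),              # Long     -> Integer
        bigquery.SchemaField("ordered_at", "DATETIME"),           # Datetime -> Datetime
        bigquery.SchemaField("tags", "STRING", mode="REPEATED"),  # List     -> Array
        bigquery.SchemaField(                                     # Map      -> Record
            "customer",
            "RECORD",
            fields=[
                bigquery.SchemaField("name", "STRING"),
                bigquery.SchemaField("email", "STRING"),
            ],
        ),
    ]

    table = bigquery.Table("my-project.my_dataset.orders", schema=schema)
    client.create_table(table, exists_ok=True)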

Configuring a Google BigQuery Destination

Configure a Google BigQuery destination to stream data into Google BigQuery.

  1. In the properties pane, on the General tab, configure the following properties:

    Name - Stage name.

    Description - Optional description.

    Required Fields - Fields that must include data for the record to be passed into the stage.
      Tip: You might include fields that the stage uses.
      Records that do not include all required fields are processed based on the error handling configured for the pipeline.

    Preconditions - Conditions that must evaluate to TRUE to allow a record to enter the stage for processing. Click Add to create additional preconditions.
      Records that do not meet all preconditions are processed based on the error handling configured for the stage.

    On Record Error - Error record handling for the stage:
      • Discard - Discards the record.
      • Send to Error - Sends the record to the pipeline for error handling.
      • Stop Pipeline - Stops the pipeline.
  2. On the BigQuery tab, configure the following properties:

    Dataset - BigQuery dataset to write to.
      Enter the name of an existing dataset or an expression that evaluates to the name of an existing dataset. For example, if the dataset name is stored in the 'dataset' record attribute, enter the following expression:
      ${record:attribute('dataset')}

    Table Name - Name of the BigQuery table to write to.
      Enter the name of an existing table or an expression that evaluates to the name of an existing table. For example, if the table name is stored in the 'table' record attribute, enter the following expression:
      ${record:attribute('table')}

    Insert ID Expression - Expression that evaluates to the BigQuery insert ID to insert or update. The insert ID is a unique ID for each row. Leave blank to insert each record into a new row.
      For more information about the insert ID property used to stream data into BigQuery, see the Google BigQuery documentation.

    Ignore Invalid Column - Ignores invalid columns.
      If selected and the destination encounters a field path that cannot be mapped to a BigQuery column with the same name and a compatible data type, the destination ignores the invalid column and writes the remaining fields in the record to BigQuery. If cleared and the destination encounters an invalid column, the record is sent to the stage for error handling.

    Table Cache Size - Maximum number of table ID entries to cache locally.
      When the destination evaluates the dataset and table name to write to, it checks that the table ID exists in BigQuery and then caches the table ID. When possible, the destination uses the cache to avoid unnecessary retrievals from BigQuery. When the cache reaches the maximum size, the oldest cached entries are evicted to allow for new data. A rough model of this cache appears after the procedure.
      Default is -1, an unlimited cache size.

  3. On the Credentials tab, configure the following properties:

    Project ID - Google BigQuery project ID to connect to.

    Credentials Provider - Select Service Account Credentials (JSON).
      The remaining options are not supported at this time.

    Credentials File Content (JSON) - Google Cloud service account credentials file in JSON format used to connect to Google BigQuery.
      Use a text editor to open the credentials file, and then copy and paste the full contents of the file into the property.
      Note: When you enter secrets such as the credentials file content, the stage encrypts the secret values.
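
The exact caching implementation is internal to the destination. As a rough model only, a bounded table ID cache that evicts its oldest entry when full might behave like the following Python sketch; all names are illustrative:

    # Illustrative model of a bounded table ID cache, not the destination's code.
    from collections import OrderedDict

    class TableIdCache:
        def __init__(self, max_size):
            self.max_size = max_size      # -1 models an unlimited cache
            self.entries = OrderedDict()

        def contains(self, table_id):
            return table_id in self.entries

        def add(self, table_id):
            self.entries[table_id] = True
            if self.max_size >= 0 and len(self.entries) > self.max_size:
                self.entries.popitem(last=False)   # evict the oldest entry

    cache = TableIdCache(max_size=2)
    for table_id in ("sales.orders", "sales.customers", "sales.refunds"):
        if not cache.contains(table_id):
            # The destination would confirm here that the table exists in
            # BigQuery before caching its ID.
            cache.add(table_id)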