Cassandra

Supported pipeline types:
  • Data Collector

The Cassandra destination writes data to a Cassandra cluster.

When you configure the Cassandra destination, you define connection information and map incoming fields to columns in the Cassandra table. You also configure whether the destination writes each batch to Cassandra as a logged batch or an unlogged batch.

You configure whether the destination uses no authentication or username and password authentication to access the Cassandra cluster. If you install the DataStax Enterprise (DSE) Java driver, you can configure the destination to use DSE username and password authentication or Kerberos authentication.

You can also enable SSL/TLS for the connection.

Batch Type

The Cassandra destination can write batches to a Cassandra cluster using one of the following batch types:

Logged
Logged batches written to Cassandra use the Cassandra distributed batch log and are atomic. This means that the destination can only write entire batches of records to Cassandra. If an error occurs with one or more records in a batch, the destination fails the entire batch. When a batch fails, all records are sent to the stage for error handling.
Unlogged
Unlogged batches written to Cassandra do not use the Cassandra distributed batch log and are nonatomic. This means that the destination can write partial batches of records to Cassandra. If an error occurs with one or more records in a batch, the destination sends only those failed records to the stage for error handling. The destination writes the remaining records in the batch to Cassandra.

By default, the destination uses the logged batch type.
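The difference in error routing between the two batch types can be modeled with a short Python sketch. It does not connect to Cassandra. The `route_batch` function, the `is_valid` callback, and the record shape are hypothetical names used only to illustrate the behavior described above.

```python
from typing import Callable, Iterable, List, Tuple


def route_batch(
    records: Iterable[dict],
    is_valid: Callable[[dict], bool],
    batch_type: str = "LOGGED",
) -> Tuple[List[dict], List[dict]]:
    """Model which records are written and which go to error handling.

    Returns a (written, errors) tuple. `is_valid` stands in for
    "Cassandra accepted this record" and is purely illustrative.
    """
    records = list(records)
    failed = [r for r in records if not is_valid(r)]
    kind = batch_type.upper()

    if kind == "LOGGED":
        # Atomic: one bad record fails the entire batch.
        if failed:
            return [], records
        return records, []

    if kind == "UNLOGGED":
        # Nonatomic: only the failed records go to error handling.
        written = [r for r in records if is_valid(r)]
        return written, failed

    raise ValueError(f"unknown batch type: {batch_type!r}")
```

With one invalid record in a batch of three, the logged variant returns all three records as errors, while the unlogged variant writes two records and returns one error.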

For more information about the Cassandra distributed batch log, see the Cassandra Query Language (CQL) documentation.
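For orientation, CQL expresses the two batch types as `BEGIN BATCH ... APPLY BATCH` and `BEGIN UNLOGGED BATCH ... APPLY BATCH`. The Python sketch below only assembles that CQL text. The `build_batch_cql` helper is hypothetical, and the sketch does not claim to show how the destination issues batches internally.

```python
from typing import Sequence


def build_batch_cql(statements: Sequence[str], logged: bool = True) -> str:
    """Wrap CQL statements in a logged or unlogged batch.

    Each statement is terminated with exactly one semicolon.
    """
    if not statements:
        raise ValueError("a batch needs at least one statement")
    header = "BEGIN BATCH" if logged else "BEGIN UNLOGGED BATCH"
    body = "\n".join(
        "  " + stmt.rstrip().rstrip(";") + ";" for stmt in statements
    )
    return f"{header}\n{body}\nAPPLY BATCH;"
```

Calling `build_batch_cql(stmts, logged=False)` yields the unlogged form; the default yields the logged form.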

Authentication

Configure the Cassandra destination to use one of the following authentication providers to access the Cassandra cluster:

  • None - Performs no authentication.
  • Username/Password - Uses Cassandra username and password authentication.
  • Username/Password (DSE) - Uses DataStax Enterprise username and password authentication. Requires that you install the DSE Java driver.
  • Kerberos (DSE) - Uses Kerberos authentication. Requires that you install the DSE Java driver.

Before selecting one of the DSE authentication providers, install the DSE Java driver version 1.2.4 or later. For a compatibility matrix, see the Cassandra documentation. For information about installing additional drivers, see Install External Libraries.
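The driver requirement above can be expressed as a small pre-flight check. This is a minimal Python sketch: the `provider_problems` function and the way the installed driver version is passed in are hypothetical, and the version parser assumes a purely numeric dotted version string.

```python
from typing import List, Optional, Tuple

# Provider names as listed above; the two DSE providers need the DSE Java driver.
DSE_PROVIDERS = frozenset({"Username/Password (DSE)", "Kerberos (DSE)"})
ALL_PROVIDERS = frozenset({"None", "Username/Password"}) | DSE_PROVIDERS
MIN_DSE_DRIVER = (1, 2, 4)


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '1.2.4' into (1, 2, 4). Assumes numeric components only."""
    return tuple(int(part) for part in version.split("."))


def provider_problems(
    provider: str, dse_driver_version: Optional[str] = None
) -> List[str]:
    """Return a list of problems; an empty list means the choice looks fine."""
    if provider not in ALL_PROVIDERS:
        return [f"unknown authentication provider: {provider!r}"]
    if provider not in DSE_PROVIDERS:
        return []
    if dse_driver_version is None:
        return ["the DSE Java driver is not installed"]
    if parse_version(dse_driver_version) < MIN_DSE_DRIVER:
        return [f"DSE Java driver {dse_driver_version} is older than 1.2.4"]
    return []
```

Comparing versions as integer tuples avoids the string-comparison trap where "1.10.0" would sort before "1.2.4".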

Kerberos (DSE) Authentication

If you install the DSE Java driver, you can use Kerberos authentication to connect to a Cassandra cluster. When you use Kerberos authentication, Data Collector uses the Kerberos principal and keytab to connect to the cluster. By default, Data Collector connects as the user account that started it.

The Kerberos principal and keytab are defined in the Data Collector configuration file, $SDC_CONF/sdc.properties. To use Kerberos authentication, configure all Kerberos properties in the Data Collector configuration file, install the DSE Java driver, and then enable Kerberos (DSE) authentication in the Cassandra destination.
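Before enabling Kerberos (DSE) authentication in the stage, it can help to confirm that the Kerberos properties are present in the configuration file. The Python sketch below is a simplified check. The property names `kerberos.client.enabled`, `kerberos.client.principal`, and `kerberos.client.keytab` are assumptions that are not stated in this section, so verify them against your own sdc.properties. The parser handles only simple `key=value` lines, not the full Java properties syntax.

```python
from typing import Dict, List


def parse_properties(text: str) -> Dict[str, str]:
    """Parse simple key=value lines, skipping blanks and comments."""
    props: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def kerberos_problems(props: Dict[str, str]) -> List[str]:
    """List missing or disabled Kerberos settings (assumed property names)."""
    problems = []
    if props.get("kerberos.client.enabled", "").lower() != "true":
        problems.append("kerberos.client.enabled is not true")
    for key in ("kerberos.client.principal", "kerberos.client.keytab"):
        if not props.get(key):
            problems.append(f"{key} is not set")
    return problems
```

In practice, you would read the text from `$SDC_CONF/sdc.properties` and pass it to `parse_properties`.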

Cassandra Data Types

Due to Cassandra requirements, the data types of the incoming fields must match the data types of the corresponding Cassandra columns. When appropriate, use a Field Type Converter processor earlier in the pipeline to convert data types.

For details about the conversion of Java data types to Cassandra data types, see the Cassandra documentation.

The Cassandra destination supports the following Cassandra data types:
  • ASCII
  • Bigint
  • Boolean
  • Counter
  • Decimal
  • Double
  • Float
  • Int
  • List
  • Map
  • Text
  • Timestamp
  • Timeuuid
  • Uuid
  • Varchar
  • Varint

The following data types are not supported at this time:
  • Blob
  • Inet
  • Set
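
The two lists above can be turned into a quick check of a table definition before you build the pipeline. This Python sketch is illustrative only: the `unsupported_columns` function and the column dictionary are hypothetical, and only the outermost type name is inspected, so wrappers such as `frozen<...>` are reported as unsupported.

```python
from typing import Dict

# Types listed as supported by the Cassandra destination.
SUPPORTED_TYPES = frozenset({
    "ascii", "bigint", "boolean", "counter", "decimal", "double",
    "float", "int", "list", "map", "text", "timestamp", "timeuuid",
    "uuid", "varchar", "varint",
})


def base_type(cql_type: str) -> str:
    """Return the outer type name: 'list<text>' becomes 'list'."""
    return cql_type.split("<", 1)[0].strip().lower()


def unsupported_columns(columns: Dict[str, str]) -> Dict[str, str]:
    """Return the columns whose outer CQL type is not in the supported list."""
    return {
        name: cql_type
        for name, cql_type in columns.items()
        if base_type(cql_type) not in SUPPORTED_TYPES
    }
```

For a table with `uuid`, `list<text>`, `inet`, `blob`, and `set<text>` columns, the function returns only the `inet`, `blob`, and `set<text>` columns.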

Configuring a Cassandra Destination

Configure a Cassandra destination to write data to a Cassandra cluster.
  1. In the Properties panel, on the General tab, configure the following properties:
    General Property - Description
    Name - Stage name.
    Description - Optional description.
    Required Fields - Fields that must include data for the record to be passed into the stage.
    Tip: You might include fields that the stage uses.

    Records that do not include all required fields are processed based on the error handling configured for the pipeline.

    Preconditions - Conditions that must evaluate to TRUE to allow a record to enter the stage for processing. Click Add to create additional preconditions.

    Records that do not meet all preconditions are processed based on the error handling configured for the stage.

    On Record Error - Error record handling for the stage:
    • Discard - Discards the record.
    • Send to Error - Sends the record to the pipeline for error handling.
    • Stop Pipeline - Stops the pipeline. Not valid for cluster pipelines.
  2. On the Cassandra tab, configure the following properties:
    Cassandra Property - Description
    Cassandra Contact Points - Host names for nodes in the Cassandra cluster. Using simple or bulk edit mode, click the Add icon to enter several host names to ensure a connection.
    Cassandra Port - The port number for the Cassandra nodes.
    Authentication Provider - Determines the authentication provider used to access the cluster:
    • None - Performs no authentication.
    • Username/Password - Uses Cassandra username and password authentication.
    • Username/Password (DSE) - Uses DataStax Enterprise username and password authentication. Requires that you install the DSE Java driver.
    • Kerberos (DSE) - Uses Kerberos authentication. Requires that you install the DSE Java driver.
    Protocol Version - Native protocol version that defines the format of the binary messages exchanged between the driver and Cassandra. Select the protocol version that you are using.

    For information about determining your protocol version, see the Cassandra documentation.

    Compression - Optional compression type for transport-level requests and responses.
    Batch Type - Type of batch to write to Cassandra:
    • Logged
    • Unlogged
    Max Batch Size - Maximum number of statements to include in each batch written to Cassandra. Ensure that this number does not exceed the batch size configured in the Cassandra cluster.
    Fully-Qualified Table Name - Name of the Cassandra table to use. Enter a fully-qualified name using the following format: <keyspace>.<table name>.
    Field to Column Mapping - Map fields from the record to Cassandra columns. Using simple or bulk edit mode, click the Add icon to create additional field mappings.
    Note: The record field data type must match the data type of the Cassandra column.
  3. To use username/password authentication, click the Credentials tab, and then enter a user name and password.
    Tip: To secure sensitive information such as user names and passwords, you can use runtime resources or credential stores.
  4. To use SSL/TLS, on the TLS tab, configure the following properties:
    TLS Property - Description
    Use TLS - Enables the use of TLS.

    Keystore File - Path to the keystore file. Enter an absolute path to the file or a path relative to the Data Collector resources directory: $SDC_RESOURCES.

    For more information about environment variables, see Data Collector Environment Configuration.

    By default, no keystore is used.

    Keystore Type - Type of keystore to use. Use one of the following types:
    • Java Keystore File (JKS)
    • PKCS #12 (p12 file)

    Default is Java Keystore File (JKS).

    Keystore Password - Password to the keystore file. A password is optional, but recommended.
    Tip: To secure sensitive information such as passwords, you can use runtime resources or credential stores.
    Keystore Key Algorithm - The algorithm used to manage the keystore.

    Default is SunX509.

    Truststore File - The path to the truststore file. Enter an absolute path to the file or a path relative to the Data Collector resources directory: $SDC_RESOURCES.

    For more information about environment variables, see Data Collector Environment Configuration.

    By default, no truststore is used.

    Truststore Type - Type of truststore to use. Use one of the following types:
    • Java Keystore File (JKS)
    • PKCS #12 (p12 file)

    Default is Java Keystore File (JKS).

    Truststore Password - Password to the truststore file. A password is optional, but recommended.
    Tip: To secure sensitive information such as passwords, you can use runtime resources or credential stores.
    Truststore Trust Algorithm - The algorithm used to manage the truststore.

    Default is SunX509.

    Use Default Protocols - Determines the transport layer security (TLS) protocol to use. The default protocol is TLSv1.2. To use a different protocol, clear this option.
    Transport Protocols - The TLS protocols to use. To use a protocol other than the default TLSv1.2, click the Add icon and enter the protocol name. You can use simple or bulk edit mode to add protocols.
    Note: Older protocols are not as secure as TLSv1.2.
    Use Default Cipher Suites - Determines the default cipher suite to use when performing the SSL/TLS handshake. To use a different cipher suite, clear this option.
    Cipher Suites - Cipher suites to use. To use a cipher suite that is not a part of the default set, click the Add icon and enter the name of the cipher suite. You can use simple or bulk edit mode to add cipher suites.

    Enter the Java Secure Socket Extension (JSSE) name for the additional cipher suites that you want to use.
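
To make the Fully-Qualified Table Name, Field to Column Mapping, and Max Batch Size properties from step 2 more concrete, the following Python sketch shows how such settings could translate into a parameterized CQL INSERT statement and a series of size-limited batches. All function names, the `/field` paths, and the `sales.orders` table are hypothetical, and quoted identifiers that contain dots are not handled. This is an illustration of the idea, not the destination's implementation.

```python
from typing import Dict, List, Sequence, Tuple


def split_table_name(fq_name: str) -> Tuple[str, str]:
    """Split '<keyspace>.<table name>' and reject anything else."""
    keyspace, sep, table = fq_name.partition(".")
    if not sep or not keyspace or not table or "." in table:
        raise ValueError(f"expected <keyspace>.<table name>, got {fq_name!r}")
    return keyspace, table


def build_insert(fq_name: str, field_to_column: Dict[str, str]) -> str:
    """Build a parameterized INSERT for the mapped columns."""
    split_table_name(fq_name)  # validates the name
    columns = list(field_to_column.values())
    column_list = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)
    return f"INSERT INTO {fq_name} ({column_list}) VALUES ({placeholders})"


def chunk(records: Sequence, max_batch_size: int) -> List[list]:
    """Split records into batches of at most max_batch_size statements."""
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be at least 1")
    return [
        list(records[i:i + max_batch_size])
        for i in range(0, len(records), max_batch_size)
    ]
```

For example, a mapping of `/id` to `order_id` and `/total` to `amount` on table `sales.orders` produces one INSERT with two placeholders, and five records with a maximum batch size of 2 are split into three batches.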