Database Connections

When you design pipelines that access databases, you must decide how the pipeline stages connect to your databases.

You can configure stages to connect to databases in one of the following ways:
Connect directly by whitelisting
You can grant pipeline stages direct access to a database by whitelisting StreamSets Cloud IP addresses in the database server.
After you whitelist the IP addresses in the database server, any pipeline stage can connect to the database, as long as the stage connection properties are also correctly configured. For example, if you whitelist the StreamSets Cloud IP addresses in your Oracle database, then any Oracle stage that you add to a pipeline can access the database.
Connect through an SSH tunnel
You can configure pipeline stages to connect to a database server through an SSH tunnel. You grant the stage direct access to an SSH server. The SSH server then connects to the database.
After completing the initial setup tasks for the SSH server, you must configure each stage to use SSH tunneling to connect to the database. For example, if you have added multiple Oracle stages to your pipelines, you must separately configure each stage to use an SSH tunnel to connect to the database.

StreamSets recommends connecting through an SSH tunnel instead of connecting directly by whitelisting IP addresses. When you use an SSH tunnel, the stage uses an encrypted connection without requiring any changes to the database. In addition, you can harden the SSH tunnel server against external attacks without requiring any changes to the database.

Whitelist IP Addresses

You can whitelist all StreamSets Cloud IP addresses in a database server to allow pipeline stages direct access to the database, as displayed in the following image:

When allowing a pipeline stage direct access to a database, StreamSets Cloud requires that you configure the stage to use SSL/TLS encryption to connect to the database server. Using SSL/TLS encryption ensures that your data is processed securely. For details about configuring SSL/TLS for a stage, see "SSL/TLS Encryption" in the stage documentation.

Use an SSH Tunnel

You can configure pipeline stages to use an SSH tunnel to securely connect to a database.

To use SSH tunneling, you must have an SSH server running on a host machine. The SSH server acts as an intermediary to the database servers.

When you configure SSH tunneling, a pipeline stage uses an encrypted connection to communicate with the SSH server. The SSH server then connects to the database server using an unencrypted connection.

The following image displays how pipeline stages use SSH tunneling to connect to a database:

To use SSH tunneling, you set up the SSH server to accept incoming connections from StreamSets Cloud. Then, you configure each stage that connects to a database to use SSH tunneling.

Tip: You can optionally configure the pipeline stage to use encrypted connections from the SSH server to the database server using SSL/TLS. For details about configuring encrypted connections for a stage, see "SSL/TLS Encryption" in the stage documentation.

Prerequisites

Complete the following prerequisites before using SSH tunneling:

SSH server requirements
Verify that the SSH server can access the database servers.
SSH user account requirements
Verify that the SSH server includes a user account allowed to receive incoming SSH connections.

Setting Up the SSH Server

Setting up the SSH server includes granting StreamSets Cloud direct access to the SSH server, and then downloading and installing the public SSH key used by StreamSets Cloud on the SSH server.

SSH tunneling uses SSH keys to encrypt and decrypt data. StreamSets Cloud provides you with a single public SSH key that is unique to your account. All stages that you add to pipelines use the same public key to connect to the SSH server.

You install a public key on the SSH server for a specific SSH user account. If you configure each stage to use the same SSH user, then you install the public key once on the SSH server. However, if you configure stages to use different SSH users, then you must install the public key for each SSH user account.

  1. To grant StreamSets Cloud access to the SSH server, allow incoming connections to the SSH server port from all StreamSets Cloud IP addresses.
  2. Download the public SSH key created for your StreamSets Cloud user account.

    You can download the public key from any database stage that supports SSH tunneling.

    1. Open the pipeline in the pipeline canvas.
    2. In the pipeline canvas, select a pipeline stage that supports SSH tunneling.

      For example, select the MySQL Query Consumer origin.

    3. On the SSH Tunnel tab, select Use SSH Tunneling.
    4. Next to the SSH Public Key property, click the Download icon: .

      StreamSets Cloud downloads the key to a file named streamsetscloud_tunnelkey.pub in your default downloads directory.

  3. Install the downloaded public SSH key on the SSH server.
    You might run the following command to use the ssh-copy-id tool to install the key:
    ssh-copy-id -f -i ~/<path to local file>/streamsetscloud_tunnelkey.pub <SSH user>@<SSH server host>
    For example:
    ssh-copy-id -f -i ~/.ssh/public_keys/streamsetscloud_tunnelkey.pub my_user@my_host

Configuring Each Stage to Use SSH Tunneling

Configure each stage that connects to a database to use SSH tunneling.

  1. Open the pipeline in the pipeline canvas.
  2. In the pipeline canvas, select a pipeline stage that supports SSH tunneling.
    For example, select the MySQL Query Consumer origin.
  3. On the SSH Tunnel tab, configure the following properties:
    SSH Tunnel Property Description
    Use SSH Tunneling Enables the stage to connect to the database server through an SSH tunnel.

    When cleared, the stage must connect to the database server using SSL/TLS encryption.

    SSH Tunnel Host Host name for the SSH server.
    SSH Tunnel Port Port number for the SSH server.

    For example: 22.

    SSH Tunnel Host Fingerprint Host key fingerprint used to verify the identity of the SSH server. For example: 97:3c:ae:76:73:f3:ef:a7:18:02:6a:c6:57:43:82:f6.

    You can enter multiple fingerprints separated by commas. Be sure to properly generate the fingerprint.

    Enter the host key fingerprint when you want the stage to verify the identity of the SSH server. Leave blank when you want the stage to establish an SSH connection without any verification.

    SSH Tunnel Username SSH user account used to install the StreamSets Cloud public SSH key on the SSH server.
    SSH Public Key Public SSH key for your StreamSets Cloud account.

    You download and install the key on the SSH server when you set up the SSH server.

Generating Host Key Fingerprints

When you want the pipeline stage to verify the identity of the SSH server, generate and enter the host key fingerprint for the SSH server.
Important: You can use multiple methods to generate the host key fingerprint for an SSH server. However, StreamSets Cloud requires that you use the following instructions to generate the fingerprint.
  1. Connect to the SSH server host.
  2. Run the following command:
    echo "$(ssh-keyscan localhost 2>/dev/null | ssh-keygen -l -f - | cut -d ' ' -f 2)" | paste -sd "," -
  3. Copy the output of the command.
  4. On the SSH Tunnel tab for the stage, paste the output of the command as the value for the SSH Tunnel Host Fingerprint property.