Overview

A connection defines the information required to connect to an external system.

Pipelines communicate with external systems to read and write data. Most of these external systems require sensitive information, such as user names or passwords, to access the data. When you configure pipelines and pipeline fragments, you can enter the details needed to connect to the external system, or you can select an existing connection that contains the details.

Using connections provides the following benefits:
Increased security
When you use connections, you can limit the number of users who need to know the security credentials for external systems.
For example, you want to ensure that only the DevOps team knows the security credentials required to access external systems. A DevOps engineer logs in to Control Hub to create all connections to the external systems, and then shares the connections with the data engineers who design pipelines, granting them the ability to use the connections. Data engineers select the appropriate connection name for a pipeline stage, but cannot view the connection details.
Reusability
You can create a connection once and then reuse that connection in multiple pipelines. Reusing connections reduces the possibility of user errors and simplifies updates to connection values.
For example, you might create a single connection to your source data stored in Amazon S3. You name the connection SourceData. You develop multiple pipelines to process this source data. Each time you add an Amazon S3 origin to a pipeline, you simply select the existing SourceData connection. You do not need to re-enter the AWS authentication details for each Amazon S3 origin. When you need to update the authentication details, you make a single update to the connection. All Amazon S3 origins using that connection reflect the updated values in subsequent pipeline runs.
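
The reuse behavior described above can be pictured with a small sketch. It is only an illustration of the idea that stages refer to a connection by name and pick up its current values at run time. The `Connection` and `Stage` classes, the `resolve` function, and the `access_key_id` setting are hypothetical names made up for this example and are not part of any Control Hub API.

```python
from dataclasses import dataclass


@dataclass
class Connection:
    """Hypothetical stand-in for a shared connection (not a Control Hub class)."""
    name: str
    settings: dict


@dataclass
class Stage:
    """Hypothetical stage that stores only the name of the connection it uses."""
    label: str
    connection_name: str


def resolve(stage, registry):
    """Look up the stage's connection settings when the pipeline runs."""
    return registry[stage.connection_name].settings


# One connection, selected by two Amazon S3 origins in different pipelines.
registry = {"SourceData": Connection("SourceData", {"access_key_id": "OLD_KEY"})}
origins = [
    Stage("origin in pipeline A", "SourceData"),
    Stage("origin in pipeline B", "SourceData"),
]

# A single update to the connection...
registry["SourceData"].settings["access_key_id"] = "NEW_KEY"

# ...is what every stage resolves on its next run.
resolved = [resolve(stage, registry)["access_key_id"] for stage in origins]
print(resolved)
```

Because each stage holds a reference rather than a copy of the credentials, there is only one place to make the update.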

When you create a connection, you select an available authoring Data Collector. The Data Collector version and the installed stage libraries determine the connection types, such as Amazon S3 or JDBC, that you can create.

You can use connections in Control Hub Pipeline Designer when designing Data Collector or Transformer pipelines. You cannot use connections in the Data Collector or Transformer pipeline canvas.

For more information on the supported connection types, see Connection Types Overview.

Connection Requirements

Before you create connections, note the following requirements:
Data Collector and Transformer versions

To create and use connections, use the following minimum versions for registered Data Collectors and Transformers:

  • Data Collector version 3.19.0 or later
  • Transformer version 3.16.0 or later

Later versions introduce support for additional connection types, as listed in Execution Engine Versions.

Important: You cannot use the system Data Collector to configure a pipeline that uses connections.
Data Collector stage libraries
When you create a connection, you select an available authoring Data Collector. The stage libraries installed on that Data Collector determine the connection types, such as Amazon S3 or JDBC, that you can create. For example, to create an Amazon S3 connection, you must select an authoring Data Collector that has the Amazon Web Services stage library installed.

Run pipelines that use connections on an execution Data Collector or Transformer version that supports connections. Pipelines that use connections fail when run on a Data Collector or Transformer version that does not support connections. In addition, ensure that the stage libraries required by each connection are installed on all execution Data Collectors.
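
As a rough illustration of these requirements, the following sketch checks an engine version against the minimum versions listed above and reports any required stage libraries that are not installed. The function names and the way versions and library names are passed in are assumptions made for this example. The sketch does not call Control Hub, and it assumes plain dotted version strings such as 3.19.0.

```python
# Minimum versions that support connections, as listed in this section.
MIN_VERSIONS = {
    "Data Collector": (3, 19, 0),
    "Transformer": (3, 16, 0),
}


def parse_version(text):
    """Turn a dotted version string such as '3.19.0' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def supports_connections(engine_type, version):
    """Return True if the engine version meets the minimum for connections."""
    return parse_version(version) >= MIN_VERSIONS[engine_type]


def missing_stage_libraries(required, installed):
    """Return the required stage libraries that are not installed, sorted."""
    return sorted(set(required) - set(installed))


# Example checks with illustrative values.
print(supports_connections("Data Collector", "3.18.1"))  # False
print(supports_connections("Transformer", "3.16.0"))     # True

# 'Placeholder Library' is a made-up name standing in for whatever is installed.
print(missing_stage_libraries(["Amazon Web Services"], ["Placeholder Library"]))
```

A check like this could run before a job starts, so that a pipeline that uses connections is not sent to an engine that cannot run it.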

Execution Engine Versions

Connection support was introduced with Data Collector version 3.19.0 and Transformer version 3.16.0. Later versions introduce support for additional connection types.

The following table lists the new connection types supported with each execution engine version:

Execution Engine Version: Newly Supported Connection Types
Data Collector version 3.20.0
  • Azure Data Lake Storage Gen2
  • JMS
Data Collector version 3.19.0
  • Amazon Kinesis Firehose
  • Amazon Kinesis Streams
  • Amazon S3
  • Amazon SQS
  • Databricks Delta Lake (requires the Databricks Enterprise stage library version 1.2.x or later)
  • Google BigQuery
  • Google Cloud Storage
  • Google Pub/Sub
  • JDBC
  • Kafka
  • Kudu
  • Salesforce
Transformer version 3.16.0
  • Amazon EMR Cluster Manager
  • Amazon S3
  • JDBC
  • Kudu
Note: To create these connection types for use in Transformer pipelines, you must use an authoring Data Collector version 3.19.0 or later.
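
Because each row lists only the types that are new in that version, the full set of types available for an engine is the union of all rows at or below its version. The following sketch encodes the table above and computes that set. The `NEW_TYPES` dictionary and the `supported_connection_types` function are names invented for this example, and the data covers only the versions listed in this table, so later releases may support types that do not appear here.

```python
# Newly supported connection types per engine version, copied from the table.
NEW_TYPES = {
    "Data Collector": {
        (3, 19, 0): [
            "Amazon Kinesis Firehose",
            "Amazon Kinesis Streams",
            "Amazon S3",
            "Amazon SQS",
            "Databricks Delta Lake",
            "Google BigQuery",
            "Google Cloud Storage",
            "Google Pub/Sub",
            "JDBC",
            "Kafka",
            "Kudu",
            "Salesforce",
        ],
        (3, 20, 0): ["Azure Data Lake Storage Gen2", "JMS"],
    },
    "Transformer": {
        (3, 16, 0): ["Amazon EMR Cluster Manager", "Amazon S3", "JDBC", "Kudu"],
    },
}


def supported_connection_types(engine_type, version):
    """Return all connection types introduced at or before the given version."""
    target = tuple(int(part) for part in version.split("."))
    supported = set()
    for introduced_in, types in NEW_TYPES[engine_type].items():
        if introduced_in <= target:
            supported.update(types)
    return sorted(supported)


print("JMS" in supported_connection_types("Data Collector", "3.19.0"))  # False
print("JMS" in supported_connection_types("Data Collector", "3.20.0"))  # True
print(supported_connection_types("Transformer", "3.16.0"))
```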

Working with Connections

The Connections view lists all connections that you have access to.

You can complete the following tasks in the Connections view:
  • Create connections.
  • Assign tags to connections.
  • Test that configured connection values are valid.
  • View connection details, including the connection type, assigned tags, and the list of pipelines and pipeline fragments that use the connection.
  • Edit connection details.
  • Share connections with other users and groups.
  • Delete connections.

The following image shows a list of connections in the Connections view. Each connection is listed with its type, tags, and owner:

Note the following icons, which display when you select a connection. You'll use these icons frequently as you manage connections:

Icon Name: Description
  • Add: Add a connection.
  • Refresh: Refresh the list of connections.
  • Toggle Filter Column: Toggle the display of the Filter column, where you can filter connections by connection type or tag. You can also search for connections by name or description.
  • Share: Share connections with other users and groups, as described in Permissions.
  • Edit: Edit the connection.
  • Delete: Delete the connection.