Einstein Analytics

The Einstein Analytics destination writes data to Salesforce Einstein Analytics. The destination connects to Einstein Analytics to upload external data to a dataset.

When you configure the destination, you define connection information, including the API version that the destination uses to connect to Einstein Analytics.

You specify the edgemart alias, or name, of the dataset to upload data to. You can optionally define the name of the edgemart container, or app, that contains the dataset.

The destination can upload external data to a new dataset or to an existing dataset using an append, delete, overwrite, or upsert operation. Based on the operation type, you define the metadata of the data to be uploaded in JSON format.

You can optionally use an HTTP proxy to connect to Salesforce Einstein Analytics. When enabled in Salesforce, you can configure the destination to use mutual authentication to connect to Salesforce.

Changing the API Version

Data Collector ships with version 43.0 of the Salesforce Web Services Connector libraries. You can use a different Salesforce API version if you need to access functionality not present in version 43.0.

  1. On the Salesforce tab, set the API Version property to the version that you want to use, for example, 39.0.
  2. Download the relevant version of the following JAR files from Salesforce Web Services Connector (WSC):
    • WSC JAR file - force-wsc-<version>.0.0.jar

    • Partner API JAR file - force-partner-api-<version>.0.0.jar

    Where <version> is the API version number, for example, 39.

    For information about downloading libraries from Salesforce WSC, see https://developer.salesforce.com/page/Introduction_to_the_Force.com_Web_Services_Connector.

  3. In the following Data Collector directory, replace the default force-wsc-43.0.0.jar and force-partner-api-43.0.0.jar files with the versioned JAR files that you downloaded:
    $SDC_DIST/streamsets-libs/streamsets-datacollector-salesforce-lib/lib/
  4. Restart Data Collector for the changes to take effect.
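The JAR file names in steps 2 and 3 follow a simple pattern. As a minimal Python sketch, the following derives the file names and target paths for a given API version; the default installation path is an assumption, so adjust SDC_DIST to your environment:

```python
import os

# Sketch: derive the JAR file names and target directory for a given API
# version, following the naming pattern described in the steps above.
# The fallback installation path below is illustrative.
api_version = "39"  # major version matching the API Version property

jar_files = [
    f"force-wsc-{api_version}.0.0.jar",
    f"force-partner-api-{api_version}.0.0.jar",
]
lib_dir = os.path.join(
    os.environ.get("SDC_DIST", "/opt/streamsets-datacollector"),
    "streamsets-libs", "streamsets-datacollector-salesforce-lib", "lib",
)
for jar in jar_files:
    # These are the paths where the downloaded JARs replace the 43.0.0 files.
    print(os.path.join(lib_dir, jar))
```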

Define the Operation

Configure the Einstein Analytics destination to perform one of the following operations when it uploads external data to a dataset:
  • Append - Appends data to the dataset, creating the dataset if it doesn't exist.
  • Delete - Deletes rows from the dataset. The rows to delete must contain a single field with a unique identifier.
  • Overwrite - Replaces data in the dataset, creating the dataset if it doesn't exist.
  • Upsert - Inserts or updates rows in the dataset, creating the dataset if it doesn't exist. The rows to upsert must contain a single field with a unique identifier.

For more information about unique identifiers, see the Salesforce Developer documentation.
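One way to picture the unique identifier requirement is as a sketch of the uploaded rows. The field names here ("Id", "Revenue") are invented for the example:

```python
# Hypothetical sketch of the unique identifier requirement. For a delete,
# each row carries a single field holding the unique identifier; for an
# upsert, each row carries the identifier field plus the values to insert
# or update. Field names are invented for the example.
delete_rows = [
    {"Id": "001xx0000001"},
    {"Id": "001xx0000002"},
]
upsert_rows = [
    {"Id": "001xx0000001", "Revenue": 1250.00},
]

# Each delete row contains exactly one field: the unique identifier.
print(all(len(row) == 1 for row in delete_rows))  # → True
```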

Metadata JSON

Uploading external data to an Einstein Analytics dataset involves using the following files:
  • Data file that contains the external data.
  • Optional metadata file that describes the schema of the data in JSON format.

The Einstein Analytics destination creates the data file from the incoming records. You define the metadata in JSON format when you configure the destination.

You must define metadata for the append, upsert, and delete operations. For append and upsert, the metadata must match the metadata of the dataset being uploaded to. For delete, the metadata must be a subset of the dataset columns.

You can optionally define metadata for the overwrite operation so that Einstein Analytics can correctly interpret the data type of the data. If you do not enter metadata, then Einstein Analytics treats every field as text.

For more information about how Einstein Analytics handles JSON metadata for uploaded external data, see the Salesforce Developer documentation.
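As an illustration, metadata for an upsert might be built as follows. This is a hypothetical sketch: the dataset name, field names, and attribute values ("Accounts", "Id", "Revenue", and so on) are invented for the example, and the authoritative schema is defined in the Salesforce External Data Format documentation:

```python
import json

# Hypothetical metadata sketch for an upsert into an "Accounts" dataset.
# All names and attribute values are illustrative; consult the Salesforce
# External Data Format documentation for the authoritative schema.
metadata = {
    "fileFormat": {
        "charsetName": "UTF-8",
        "fieldsDelimitedBy": ",",
        "linesTerminatedBy": "\n",
    },
    "objects": [
        {
            "connector": "SDC",
            "fullyQualifiedName": "Accounts",
            "name": "Accounts",
            "label": "Accounts",
            "fields": [
                # An upsert requires a field designated as the unique identifier.
                {"fullyQualifiedName": "Accounts.Id", "name": "Id",
                 "label": "Id", "type": "Text", "isUniqueId": True},
                {"fullyQualifiedName": "Accounts.Revenue", "name": "Revenue",
                 "label": "Revenue", "type": "Numeric",
                 "precision": 18, "scale": 2, "defaultValue": "0"},
            ],
        }
    ],
}
# The JSON string is what goes into the destination's Metadata JSON property.
print(json.dumps(metadata, indent=2))
```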

Dataflow (Deprecated)

In previous releases, you could configure the destination to use an Einstein Analytics dataflow to combine multiple datasets together. However, using dataflows is now deprecated and will be removed in a future release. We recommend configuring the destination to use the append operation to combine data into a single dataset.

An Einstein Analytics dataflow includes instructions and transformations to combine datasets. Create the dataflow in Einstein Analytics. Then when you configure the Einstein Analytics destination, specify the name of the existing dataflow. The dataflow should not contain any content, as the Einstein Analytics destination overwrites any existing content.

By default, a dataflow runs every 24 hours. However, you can configure the dataflow to run each time the destination closes and uploads a dataset to Einstein Analytics. In Einstein Analytics, a dataflow can run a maximum of 24 times in a 24-hour period. So if you choose to run the dataflow after each dataset upload, make sure that the configured dataset wait time is more than an hour.

For more information about creating dataflows, see the Salesforce Einstein Analytics documentation.

Configuring an Einstein Analytics Destination

Configure an Einstein Analytics destination to write data to Salesforce Einstein Analytics.
  1. In the Properties panel, on the General tab, configure the following properties:
    • Name - Stage name.
    • Description - Optional description.
    • Required Fields - Fields that must include data for the record to be passed into the stage.

      Tip: You might include fields that the stage uses.

      Records that do not include all required fields are processed based on the error handling configured for the pipeline.
    • Preconditions - Conditions that must evaluate to TRUE to allow a record to enter the stage for processing. Click Add to create additional preconditions.

      Records that do not meet all preconditions are processed based on the error handling configured for the stage.
    • On Record Error - Error record handling for the stage:
      • Discard - Discards the record.
      • Send to Error - Sends the record to the pipeline for error handling.
      • Stop Pipeline - Stops the pipeline.
  2. On the Analytics tab, configure the following properties:
    • Username - Salesforce username in the following email format: <text>@<text>.com.
    • Password - Salesforce password.

      If the machine running Data Collector is outside the trusted IP range configured in your Salesforce environment, you must generate a security token and then set this property to the password followed by the security token.

      For example, if the password is abcd and the security token is 1234, then set this property to abcd1234. For more information about generating a security token, see Reset Your Security Token.

      Tip: To secure sensitive information such as user names and passwords, you can use runtime resources or credential stores.
    • Auth Endpoint - Salesforce SOAP API authentication endpoint. Enter one of the following values:
      • login.salesforce.com - Use to connect to a Production or Developer Edition organization.
      • test.salesforce.com - Use to connect to a sandbox organization.

      Default is login.salesforce.com.
    • API Version - Salesforce API version to use to connect to Salesforce.

      Default is 43.0. If you change the version, you must also download the relevant JAR files from Salesforce Web Services Connector (WSC).
    • Edgemart Alias - Dataset name. The alias must be unique across an organization.
    • Append Timestamp to Alias - Appends the timestamp of the dataset upload to the edgemart alias or dataset name.

      To create a new dataset for each upload of data, select this option. To append, delete, overwrite, or upsert data to an existing dataset, clear this option.
    • Edgemart Container - Name of the edgemart container or app that contains the dataset. Enter the developer name or the ID of the app rather than the display label.

      For example, the developer name of an app is "AnalyticsCloudPublicDatasets", but the display label of the app is "Analytics Cloud Public Datasets".

      To get the developer name or ID, run the following query in Salesforce:

      SELECT Id, DeveloperName, Name, AccessType, CreatedDate, Type FROM Folder WHERE Type = 'Insights'

      If not defined when the destination creates a new dataset, the destination uses the user's private app. If not defined when the destination uploads to an existing dataset, Einstein Analytics resolves the app name.

      If defined when the destination uploads to an existing dataset, the name must match the name of the app that currently contains the dataset.
    • Operation - Operation to perform when uploading external data to a dataset.
    • Dataset Wait Time (secs) - Maximum time in seconds to wait for new data to arrive. When no data has arrived within this time, the destination uploads the accumulated data to Einstein Analytics.

      The dataset wait time must be at least as long as the Batch Wait Time configured for the origin in the pipeline.
    • Use Dataflow - Determines whether to use an Einstein Analytics dataflow to combine multiple datasets together.

      Important: Using dataflows is deprecated and will be removed in a future release. We recommend configuring the destination to use the append operation to combine data into a single dataset.
    • Dataflow Name - Name of the existing dataflow.

      You must create the dataflow in Einstein Analytics.
    • Run Dataflow After Upload - Determines whether the destination runs the dataflow each time that it uploads a dataset to Einstein Analytics.
    • Metadata JSON - Metadata in JSON format that describes the schema of the data to be uploaded.

      Required for the append, upsert, and delete operations. Optional for the overwrite operation.
  3. On the Advanced tab, configure the following properties:
    • Use Proxy - Specifies whether to use an HTTP proxy to connect to Salesforce.
    • Proxy Hostname - Proxy host.
    • Proxy Port - Proxy port.
    • Proxy Requires Credentials - Specifies whether the proxy requires a user name and password.
    • Proxy Username - User name for proxy credentials.
    • Proxy Password - Password for proxy credentials.

      Tip: To secure sensitive information such as user names and passwords, you can use runtime resources or credential stores.
    • Use Mutual Authentication - When enabled in Salesforce, you can use SSL/TLS mutual authentication to connect to Salesforce.

      Mutual authentication is not enabled in Salesforce by default. To enable mutual authentication, contact Salesforce.

      Before enabling mutual authentication, you must store a mutual authentication certificate in the Data Collector resources directory. For more information, see Keystore and Truststore Configuration.
    • Keystore File - Path to the keystore file. Enter an absolute path to the file or a path relative to the Data Collector resources directory: $SDC_RESOURCES.

      For more information about environment variables, see Data Collector Environment Configuration.

      By default, no keystore is used.
    • Keystore Type - Type of keystore to use. Use one of the following types:
      • Java Keystore File (JKS)
      • PKCS #12 (p12 file)

      Default is Java Keystore File (JKS).
    • Keystore Password - Password to the keystore file. A password is optional, but recommended.

      Tip: To secure sensitive information such as passwords, you can use runtime resources or credential stores.
    • Keystore Key Algorithm - Algorithm used to manage the keystore.

      Default is SunX509.