Installation with Cloud Service Providers

You can install the full Data Collector using cloud service providers such as Microsoft Azure or Microsoft Azure HDInsight. When you install Data Collector using a cloud service provider, you install Data Collector as a service.

Data Collector installed using a cloud service provider includes a 30-day trial license. To renew the license, see Renewing the License.

Install Data Collector on Azure

You can install the full Data Collector on a CentOS 7 virtual machine hosted on Microsoft Azure. When you install Data Collector on Azure, you run Data Collector as a service.

  1. Log in to the Microsoft Azure portal: https://portal.azure.com.
  2. In the Navigation panel, click Create a resource.
  3. Search the Marketplace for "StreamSets Data Collector for Microsoft Azure", and then click Create.
  4. On the Create virtual machine > Basics page, enter the name of the new virtual machine, the user name to log in to that virtual machine, and the authentication method to use for logins.

    You can create the virtual machine in a new or existing resource group.

    For example, you might create a virtual machine named "sdctrial" with a user named "sdc" who logs in to the virtual machine using password authentication, and place the virtual machine in a new resource group named "sdctrial".

  5. Click OK.
  6. On the Choose a size page, select a virtual machine size that meets the minimum Data Collector requirements, and then click Select.
  7. On the Settings page, accept the defaults or configure the optional features, and then click OK.
    Note: The virtual machine is automatically configured to allow incoming connections on the default Data Collector port of 18630 used for the HTTP protocol. If you change the default port or configure HTTPS after installation, you'll also need to configure the virtual machine to allow incoming connections on the changed port.
  8. Verify the details on the Summary page, and then click Create.
  9. After the resource is successfully deployed, use SSH to log in to the virtual machine using the authentication method you selected:
    • For SSH public key authentication, use the following command:
      ssh -i <private key file> <username>@<virtual machine IP address>
      For example:
      ssh -i sdctrial_key.txt sdc@13.66.169.236
    • For password authentication, use the following command and then enter the password when prompted:
      ssh <username>@<virtual machine IP address>
      For example:
      ssh sdc@13.66.169.236
    After you log in successfully, the following message displays:
    -------------------------------------------------------------------------------
    Please run /opt/sdc_setup.sh as ROOT to install StreamSets Data Collector (SDC)
    -------------------------------------------------------------------------------
  10. Installing Data Collector as a service requires root privileges. Switch to the root user by running the following command:
    sudo su
  11. Optionally, create a system user and group on the virtual machine that Data Collector uses to run as a service.
    If you want to run Data Collector using the same user that you used to log in to the virtual machine, you can skip this step.
    If you want to use a different user, create another system user and group on the virtual machine before running the installation script.
  12. Use the following command to run the Data Collector installation script:
    /opt/sdc_setup.sh
  13. When prompted, enter the existing system user name and group on the virtual machine that Data Collector uses to run as a service.
    The sdc_setup.sh script installs Oracle Java 8 and installs Data Collector as a service in the /opt directory.
  14. As the root user, run the following command to start Data Collector as a service:
    service sdc start
    Note: It can take up to five minutes for Data Collector to start.
  15. To access the Data Collector UI, enter the following URL in the address bar of your browser:
    http://<virtual machine IP address>:18630
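
Because Data Collector can take up to five minutes to start (step 14), it may be convenient to poll the UI address from step 15 instead of reloading the browser by hand. The following is a minimal sketch and not part of the StreamSets installation: the function names sdc_url and wait_for_sdc are hypothetical, the script assumes that curl is available on the machine you run it from, and it assumes the default HTTP port of 18630.

```shell
#!/bin/sh
# Hypothetical helpers; they are not shipped with Data Collector.

# Build the Data Collector UI address for a host.
# The port defaults to 18630, the default HTTP port named in this guide.
sdc_url() {
  local host="$1" port="${2:-18630}"
  printf 'http://%s:%s\n' "$host" "$port"
}

# Poll the address until it answers or the timeout (in seconds) elapses.
# Returns 0 when the address responds and 1 on timeout.
wait_for_sdc() {
  local url="$1" timeout="${2:-300}" interval="${3:-10}" waited=0
  until curl --silent --fail --output /dev/null --max-time 5 "$url"; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "Data Collector did not respond at $url within ${timeout}s" >&2
      return 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  echo "Data Collector is reachable at $url"
}

# Example call, using the example IP address from step 9:
#   wait_for_sdc "$(sdc_url 13.66.169.236)"
```

The default timeout of 300 seconds mirrors the five-minute startup note. If you changed the port after installation, pass it as the second argument to sdc_url.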

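Step 11 of the Azure procedure mentions creating a separate system user and group but does not show the commands. One possible way to do this on the virtual machine is sketched below. This is an assumption rather than the documented method: create_service_account is a hypothetical helper, "sdcsvc" is a placeholder name, and whether Data Collector needs any further account settings is not covered here. The helper only prints the commands unless DRY_RUN is set to 0, in which case it must be run as root.

```shell
#!/bin/sh
# Hypothetical helper; the account name "sdcsvc" is a placeholder.
# Prints the commands by default. Set DRY_RUN=0 and run as root to execute them.
create_service_account() {
  local user="$1" group="${2:-$1}" run=""
  [ "${DRY_RUN:-1}" = "1" ] && run="echo"
  # Create the system group first, then a system user that belongs to it.
  $run groupadd --system "$group" &&
    $run useradd --system --gid "$group" "$user"
}

# Dry run: shows the two commands without changing the system.
create_service_account sdcsvc sdcsvc
```

You would then enter the same user name and group when the installation script prompts for them in step 13.
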
Install Data Collector on Azure HDInsight

You can install the full Data Collector on a Microsoft Azure HDInsight cluster on Ubuntu 16.04. When you install Data Collector on HDInsight, you run Data Collector as a service.

  1. Log in to the Microsoft Azure portal: https://portal.azure.com.
  2. In the Navigation panel, click Create a resource.
  3. Search the Marketplace for "StreamSets Data Collector for HDInsight", and then click Create.
  4. On the HDInsight page, click Custom (size, settings, apps).
  5. On the Basics page, enter a cluster name, choose a cluster type, and enter a cluster login user name and password.

    You can create the cluster in a new or existing resource group.

    For example, you might create a cluster named "sdctrial" with the cluster type Hadoop 2.7 on Linux (HDI 3.6), and place the cluster in a new resource group named "sdctrial".

  6. Click Next.
  7. On the Storage page, accept the defaults or configure the storage options and then click Next.
  8. On the Applications page, click StreamSets Data Collector for HDInsight.
  9. Review and accept the legal terms, and then click Next on the Applications page.
  10. On the Cluster size page, select a cluster size that meets the minimum Data Collector requirements, and then click Next.
  11. On the Advanced settings page, accept the defaults or configure the advanced features, and then click Next.
  12. Verify the details on the Summary page, and then click Create.
    It can take up to 20 minutes to deploy the cluster.
  13. After the cluster is successfully deployed, view the HDInsight cluster in the Azure portal, and then click Applications.
  14. On the Installed apps page, click StreamSets Data Collector for HDInsight.
  15. On the Properties page, click the Data Collector URL.

    The login page of the Data Collector UI displays.
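
The installed applications viewed in steps 13 and 14 can also be listed from a terminal. The sketch below assumes that the Azure CLI (az) is installed and signed in to your subscription, which this procedure does not otherwise require. list_hdinsight_apps is a hypothetical helper name, and "sdctrial" is the example cluster and resource group name from step 5. The helper prints the command by default instead of contacting Azure.

```shell
#!/bin/sh
# Hypothetical helper; it assumes the Azure CLI is installed and signed in.
# Prints the command by default. Set DRY_RUN=0 to run it against Azure.
list_hdinsight_apps() {
  local cluster="$1" resource_group="$2" run=""
  [ "${DRY_RUN:-1}" = "1" ] && run="echo"
  $run az hdinsight application list \
    --cluster-name "$cluster" \
    --resource-group "$resource_group"
}

# Dry run with the example names from step 5.
list_hdinsight_apps sdctrial sdctrial
```

The Properties page in the portal remains the documented place to find the Data Collector URL. The command is only a quick way to confirm that the application was installed on the cluster.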

Renewing the License

Data Collector for cloud service providers requires an active license. By default, the installation includes a 30-day trial license. If the license is about to expire, you'll need to request a new activation key to renew the license.

Tip: Be sure to renew the license before it expires. When the license expires, you can no longer use Data Collector.
  1. To view the license details, log in to Data Collector and click Help > Register.
  2. In the Data Collector Activation Key dialog box, copy the value of the SDC ID property, which is a unique ID for your Data Collector installation.
  3. Open a StreamSets support ticket or contact a StreamSets sales representative to request the activation key for your Data Collector ID.
  4. After you receive the activation key, log in to Data Collector and click Help > Register.
  5. In the Data Collector Activation Key dialog box, click Browse to select your activation key, and then click Upload.