Installation with Cloud Service Providers

You can install the full Data Collector on Microsoft Azure, either on a virtual machine or on a Microsoft Azure HDInsight cluster.

Install Data Collector on Azure

You can install the full Data Collector on a CentOS 7.x virtual machine hosted on Microsoft Azure.

When you install Data Collector on Azure, you run Data Collector as a service.

  1. Log in to the Microsoft Azure portal: https://portal.azure.com.
  2. In the Navigation panel, click Create a resource.
  3. Search the Marketplace for StreamSets Data Collector for Microsoft Azure, and then click Create.
  4. On the Create virtual machine > Basics page, enter the name of the new virtual machine, the user name to log in to that virtual machine, and the authentication method to use for logins.
    Important: Do not use sdc as the user name to log in to the virtual machine. The sdc user account must be reserved as the system user account that runs Data Collector as a service.

    You can create the virtual machine in a new or existing resource group.

    You can optionally change the virtual machine size, but the default size is sufficient in most cases. If you change the default, select a size that meets the minimum Data Collector requirements.

    For example, you might create a virtual machine named sdctrial with a user named sdcuser who can log in to the virtual machine using password authentication, and create the virtual machine in a new resource group named sdctrial.

  5. Click Next.
  6. On the Disks page under Advanced, verify that Use managed disks is enabled.
  7. On the remaining pages, accept the defaults or configure the optional features.
    Note: The virtual machine is automatically configured to allow incoming connections on the default Data Collector port of 18630 used for the HTTP protocol. If you change the default port or configure HTTPS after installation, you'll also need to configure the virtual machine to allow incoming connections on the changed port.
  8. Verify the details on the Review and Create page, and then click Create.
    It can take several minutes for the resource to deploy and for Data Collector to start as a service.
  9. To access the Data Collector UI, enter the following URL in the address bar of your browser:
    http://<virtual machine IP address>:18630
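
Because deployment can take several minutes, it can help to check that the port is reachable before opening the browser. The following Python sketch builds the URL from step 9 and tests whether a TCP connection to the port succeeds. The function names sdc_url and port_is_open are hypothetical helpers written for this example, not part of Data Collector. The sketch only confirms that something accepts connections on the port; it does not confirm that the UI has finished starting.

```python
import socket

DEFAULT_SDC_PORT = 18630  # default Data Collector HTTP port described in this section


def sdc_url(host: str, port: int = DEFAULT_SDC_PORT, https: bool = False) -> str:
    """Build the Data Collector UI URL for a virtual machine address.

    Hypothetical helper: pass a different port or https=True only if you
    changed those settings after installation.
    """
    scheme = "https" if https else "http"
    return f"{scheme}://{host}:{port}"


def port_is_open(host: str, port: int = DEFAULT_SDC_PORT, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

To use the sketch, call sdc_url and port_is_open with the IP address of your virtual machine. If port_is_open returns False after the deployment completes, one thing to check is whether the virtual machine allows incoming connections on the port you are using.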

Install Data Collector on Azure HDInsight

You can install the full Data Collector on a Microsoft Azure HDInsight cluster on Ubuntu 16.04. When you install Data Collector on HDInsight, you run Data Collector as a service.

Data Collector installed on HDInsight includes a 30-day trial license. To renew the license, see Renewing the License.

  1. Log in to the Microsoft Azure portal: https://portal.azure.com.
  2. In the Navigation panel, click Create a resource.
  3. Search the Marketplace for StreamSets Data Collector for HDInsight Cloud, and then click Create.
  4. On the HDInsight page, click Custom (size, settings, apps).
  5. On the Basics page, enter a cluster name, choose a cluster type, and enter a cluster login user name and password.

    You can create the cluster in a new or existing resource group.

    For example, you might create a cluster named sdctrial with the Hadoop 2.7 (HDI 3.6) cluster type, in a new resource group named sdctrial.

  6. Click Next.
  7. On the Security + networking page, accept the defaults or configure the security options and then click Next.
  8. On the Storage page, configure the storage options and then click Next.
  9. On the Applications page, click StreamSets Data Collector for HDInsight.
  10. Review and accept the legal terms, click Create, and then click Next.
  11. On the Cluster size page, select a cluster size that meets the minimum Data Collector requirements, and then click Next.
  12. On the Script actions page, click Next.
  13. Verify the details on the Summary page, and then click Create.
    It can take up to 20 minutes to deploy the cluster.
  14. After the cluster is successfully deployed, view the HDInsight cluster in the Azure portal, and then click Applications.
  15. Locate the StreamSets Data Collector for HDInsight Cloud application, and then click Portal in the URI column to access the Data Collector UI.

Renewing the License

Data Collector for Azure HDInsight requires an active license. By default, the installation includes a 30-day trial license. If the license is about to expire, you'll need to request a new activation key to renew the license.

Tip: Be sure to renew the license before it expires. When the license expires, you can no longer use Data Collector.
  1. To view the license details, log in to Data Collector and click Help > Register.
  2. In the Data Collector Activation Key dialog box, copy the value of the SDC ID property, which is a unique ID for your Data Collector installation.
  3. Open a StreamSets support ticket or contact a StreamSets sales representative to request the activation key for your Data Collector ID.
  4. After you receive the activation key, log in to Data Collector and click Help > Register.
  5. In the Data Collector Activation Key dialog box, click Browse to select your activation key, and then click Upload.
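
To plan a renewal ahead of time, you can estimate when the 30-day trial ends. The following Python sketch assumes that the trial is counted in whole days from the installation date, which this section does not state explicitly. The function names are hypothetical, and the license details shown under Help > Register remain the authoritative source.

```python
from datetime import date, timedelta

TRIAL_DAYS = 30  # trial length stated in this section


def trial_expiry(installed_on: date) -> date:
    """Estimate the trial end date, assuming it starts on the install date."""
    return installed_on + timedelta(days=TRIAL_DAYS)


def days_remaining(installed_on: date, today: date) -> int:
    """Return the estimated whole days left; negative once the trial has ended."""
    return (trial_expiry(installed_on) - today).days
```

For example, calling days_remaining with your installation date and date.today() gives a rough count of the days left to request an activation key.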