Enabling Data Protector

Enable Data Protector to use classification rules and protection policies to identify and protect your organization's sensitive data.

You must complete several steps to enable Data Protector in Control Hub.

Step 1. Stop All Active Jobs

If your organization has existing jobs, you must stop all active jobs before you enable Data Protector for the organization.

  1. Log in to Control Hub as a job operator for the organization.
  2. In the Navigation panel, click Jobs.
  3. Select all active jobs in the list, and then click the Stop icon.
  4. In the confirmation dialog box that appears, click Stop.
  5. If a job remains in a Deactivating state, you can force Control Hub to stop the job immediately.
    1. To force a deactivating job to stop, select the job in the Jobs view, click the More icon, and then click Force Stop.
      A confirmation dialog box appears.
    2. To force stop the job, click Stop.

Step 2. Upgrade All Data Collectors to Version 3.5.0 or Later

Data Collector version 3.5.0 or later is required to use Data Protector. Upgrade all registered Data Collectors belonging to the organization that plans to use Data Protector.

For upgrade instructions, see Upgrade in the Data Collector documentation.

Tip: Be sure to correctly update the Data Collector configuration files during the upgrade. That way, you don't need to register the upgraded Data Collector with Control Hub again. Instead, each upgraded Data Collector uses the same authentication token that the previous Data Collector used to issue authenticated requests to Control Hub.
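
When you manage many Data Collectors, the 3.5.0 minimum can be checked in a script before you move on. The following is a minimal sketch and not part of the product: version_at_least is a hypothetical helper, it assumes a sort command that supports -V (version sort, as in GNU coreutils), and it expects you to supply each Data Collector's version string yourself.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when version $1 is the same as or later than $2.
# Assumes plain dotted versions such as 3.5.0 and a sort that supports -V.
version_at_least() {
  local current="$1" minimum="$2"
  # Version-sort the two values; the minimum must sort first (or be equal).
  [ "$(printf '%s\n' "$minimum" "$current" | sort -V | head -n 1)" = "$minimum" ]
}

# Example: 3.10.1 satisfies the 3.5.0 requirement, 3.4.9 does not.
for v in 3.10.1 3.4.9; do
  if version_at_least "$v" 3.5.0; then
    echo "$v: OK"
  else
    echo "$v: upgrade required"
  fi
done
```

A version sort is used instead of a plain string comparison so that a value such as 3.10.1 is treated as later than 3.5.0.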

Step 3. Install Data Protector on All Data Collectors

Install version 1.3 or later of the Data Protector stage library on all registered Data Collectors belonging to the organization that plans to use Data Protector. The Data Collectors must be version 3.5.0 or later.

To install Data Protector, store the Data Protector stage library in a local custom stage library directory outside the Data Collector installation directory, so that the stage library remains available after Data Collector upgrades.

Use the following procedure that matches how Data Collector was installed.

Installing on Cloudera Manager Installations

To install Data Protector on a Cloudera Manager installation of Data Collector, perform the following steps:

  1. Use the information provided in the email from StreamSets to download and extract the Data Protector tarball.
    The tarball contains the Data Protector stage library:
    streamsets-datacollector-dataprotector-lib
  2. In Cloudera Manager, select the StreamSets service and then click Configuration.
  3. On the Configuration page, in the Data Collector Advanced Configuration Snippet (Safety Valve) for sdc-env.sh field, add the USER_LIBRARIES_DIR environment variable and point it to the custom stage library directory, as follows:
    export USER_LIBRARIES_DIR="<custom stage library directory>"
    For example:
    export USER_LIBRARIES_DIR="/opt/sdc-user-libs/"
  4. On every node that runs Data Collector, copy the Data Protector stage library to the directory defined for the USER_LIBRARIES_DIR environment variable.
  5. When using the Java Security Manager, which is enabled by default, update the Data Collector Advanced Configuration Snippet (Safety Valve) for sdc-security.policy property to include the custom stage library directory as follows:
    // custom stage library directory
    grant codebase "file://<custom stage library directory>-" {
       permission java.security.AllPermission;
    };
    For example:
    // custom stage library directory
    grant codebase "file:///opt/sdc-user-libs/-" {
       permission java.security.AllPermission;
    };
  6. Restart Data Collector.
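
Copying the stage library to every node (step 4) is repetitive on a large cluster, so it can be scripted. The sketch below is one possible approach and makes several assumptions: copy_lib_to_nodes is a hypothetical helper, the host names are placeholders, and you have SSH access to each node with permission to write to the custom stage library directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy the extracted stage library to each Data Collector node.
# Usage: copy_lib_to_nodes <extracted library dir> <custom stage library dir> <host>...
# Assumes SSH access and write permission on every host.
copy_lib_to_nodes() {
  local src_dir="$1" dest_dir="$2"
  shift 2
  local host
  for host in "$@"; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      # Dry run: print the commands instead of running them.
      echo "ssh $host mkdir -p $dest_dir"
      echo "scp -r $src_dir $host:$dest_dir"
    else
      # Create the target directory, then copy the library directory into it.
      ssh "$host" "mkdir -p '$dest_dir'" || return 1
      scp -r "$src_dir" "$host:$dest_dir" || return 1
    fi
  done
}
```

For example, running `DRY_RUN=1 copy_lib_to_nodes ./streamsets-datacollector-dataprotector-lib /opt/sdc-user-libs/ node1 node2` prints the commands for two placeholder hosts without touching them. The destination must match the directory set in USER_LIBRARIES_DIR.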

Installing on RPM and Tarball Installations

To install Data Protector on an RPM or tarball Data Collector installation, perform the following steps:

  1. Use the information provided in the email from StreamSets to download and extract the Data Protector tarball.
    The tarball contains the Data Protector stage library:
    streamsets-datacollector-dataprotector-lib
  2. If necessary, create a local directory for custom stage libraries outside the Data Collector installation directory, so that the libraries remain available after Data Collector upgrades.
    For example, if you installed Data Collector in the following directory:
    /opt/sdc/
    you might create the custom stage library directory at:
    /opt/sdc-user-libs
  3. Copy the Data Protector stage library to the directory.
  4. Add the USER_LIBRARIES_DIR environment variable to the appropriate file and point it to the custom stage library directory.
    Modify environment variables using the method required by your installation type.

    Set the environment variable as follows:

    export USER_LIBRARIES_DIR="<custom stage library directory>"

    For example:

    export USER_LIBRARIES_DIR="/opt/sdc-user-libs/"
  5. When using the Java Security Manager, which is enabled by default, update the Data Collector security policy to include the custom stage library directory as follows:
    1. In the Data Collector configuration directory, open the security policy file:
      $SDC_CONF/sdc-security.policy
    2. Add the following lines to the file:
      // custom stage library directory
      grant codebase "file://<custom stage library directory>-" {
         permission java.security.AllPermission;
      };
      For example:
      // custom stage library directory
      grant codebase "file:///opt/sdc-user-libs/-" {
         permission java.security.AllPermission;
      };
  6. Restart Data Collector.
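
Steps 2 through 5 above can be sketched as a single shell function. This is an illustration under stated assumptions, not an official installer: install_dataprotector_lib is a hypothetical name, all four arguments are paths that you supply, the custom stage library directory must be an absolute path, and the policy grant is only needed when the Java Security Manager is enabled.

```shell
#!/usr/bin/env bash
# Hypothetical helper covering steps 2-5 for an RPM or tarball installation.
install_dataprotector_lib() {
  local extracted_lib="$1"   # extracted streamsets-datacollector-dataprotector-lib directory
  local user_libs_dir="$2"   # custom stage library directory (absolute path)
  local env_file="$3"        # file where your installation type sets environment variables
  local policy_file="$4"     # security policy file, such as $SDC_CONF/sdc-security.policy

  # Normalize to exactly one trailing slash so the codebase URL ends in "/-".
  user_libs_dir="${user_libs_dir%/}/"

  # Steps 2 and 3: create the external directory and copy the library into it.
  mkdir -p "$user_libs_dir" || return 1
  cp -R "$extracted_lib" "$user_libs_dir" || return 1

  # Step 4: set USER_LIBRARIES_DIR unless the file already sets it.
  if ! grep -q '^export USER_LIBRARIES_DIR=' "$env_file" 2>/dev/null; then
    printf 'export USER_LIBRARIES_DIR="%s"\n' "$user_libs_dir" >> "$env_file" || return 1
  fi

  # Step 5: grant permissions to the directory unless the grant already exists.
  local codebase="file://${user_libs_dir}-"
  if ! grep -qF "grant codebase \"$codebase\"" "$policy_file" 2>/dev/null; then
    cat >> "$policy_file" <<EOF || return 1
// custom stage library directory
grant codebase "$codebase" {
   permission java.security.AllPermission;
};
EOF
  fi
}
```

Both file updates are guarded by a grep check, so running the function a second time does not append duplicate entries. Restart Data Collector afterward, as in step 6.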

Step 4. Enable for the Organization

Before you can use Data Protector, the organization administrator must enable Data Protector for the organization.

  1. Log in to Control Hub as the organization administrator.
  2. Click Administration > Organizations.
  3. Hover over a specific organization name, and then click the Configuration icon.
    The Organization Configuration window appears.
  4. Scroll down in the window and select Enable Data Protector.
  5. Click Save.

Step 5. Assign Data Protector Roles to Users

Before users can manage classification rules and protection policies in Control Hub, assign the required Data Protector roles to user accounts or to the groups to which they belong. By default, the Data Protector roles are not assigned to new users or groups.

  1. In the Navigation panel, click Administration > Groups or Administration > Users.
    As the organization administrator, you'll see all the users or groups in your organization.
  2. Click the name of the group or user.
  3. Select the following Data Protector roles:
    • Classification Administrator
    • Policy Manager
  4. Click Save.

Step 6. Start All Jobs

After you enable Data Protector for an organization with existing jobs, you can start the jobs while you implement data protection. Creating classification rules and protection policies to implement data protection can take some time.

Note: When you start the jobs, Control Hub assigns the default read and write policies to the jobs. However, the default policies do not actually protect data until you define procedures for the policies.

  1. Log in to Control Hub as a job operator for the organization.
  2. In the Navigation panel, click Jobs.
  3. Select the inactive jobs in the list, and then click the Start Job icon above the job list.