Data Protection in Control Hub

In Control Hub, each job uses two protection policies: a read policy that protects sensitive data as it is read, and a write policy that protects sensitive data as it is written. Users who configure jobs can specify any read or write policy that is available to them.

The following image illustrates how read and write policies work with a job:

Before sensitive data can be protected, it must be identified. Data Protector uses classification rules to identify sensitive data, and then applies protection policies to the data that the rules identify.

Data Protector provides StreamSets classification rules to identify general sensitive data, such as birth dates and IP addresses, as well as country-specific data, such as driver's license numbers for different countries. You can create additional classification rules to identify sensitive data that the StreamSets rules do not recognize, such as industry-specific or organization-specific data.
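
As a rough illustration of the idea, the following Python sketch treats a classification rule as a named pattern that is matched against field values. It is not Data Protector syntax or its API: the rule names, the patterns, the EMPLOYEE_ID format, and the classify function are all hypothetical, chosen only to show how built-in style rules and a custom organization-specific rule might sit side by side.

```python
import re

# Hypothetical rules: category name -> pattern. Illustrative only,
# not the rule format that Data Protector actually uses.
RULES = {
    "IP_ADDRESS": re.compile(r"(?:\d{1,3}\.){3}\d{1,3}"),
    "BIRTH_DATE": re.compile(r"\d{4}-\d{2}-\d{2}"),
    # A custom, organization-specific rule (made-up employee ID format).
    "EMPLOYEE_ID": re.compile(r"EMP-\d{6}"),
}


def classify(record):
    """Return {field_name: category} for each field whose whole value matches a rule."""
    classified = {}
    for field, value in record.items():
        if not isinstance(value, str):
            continue
        for category, pattern in RULES.items():
            if pattern.fullmatch(value):
                classified[field] = category
                break  # first matching rule wins in this sketch
    return classified
```

In this sketch, a record such as {"name": "Ann", "dob": "1990-04-12", "src": "10.0.0.1", "badge": "EMP-004217"} would have its dob, src, and badge fields classified, while name is left unclassified because no rule matches it.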

Classification rules operate at an organization level, so once you create and commit a classification rule, that rule is available throughout the organization. It can then be used by any policy to identify the data that the policy is meant to protect.
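
The relationship between organization-wide classification rules and per-job read and write policies can be sketched in Python as follows. This is a simplified model under stated assumptions, not the Control Hub implementation: ORG_RULES, Policy, run_job, and the masking functions are hypothetical names invented for the example.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical organization-level rules, shared by every policy.
ORG_RULES = {
    "IP_ADDRESS": re.compile(r"(?:\d{1,3}\.){3}\d{1,3}"),
    "BIRTH_DATE": re.compile(r"\d{4}-\d{2}-\d{2}"),
}


def category_of(value):
    """Return the first rule category whose pattern matches the whole value, or None."""
    if isinstance(value, str):
        for category, pattern in ORG_RULES.items():
            if pattern.fullmatch(value):
                return category
    return None


@dataclass
class Policy:
    name: str
    # Maps a classification category to a function that protects the value.
    procedures: Dict[str, Callable[[str], str]]

    def apply(self, record):
        protected = {}
        for field, value in record.items():
            action = self.procedures.get(category_of(value))
            protected[field] = action(value) if action else value
        return protected


def run_job(record, read_policy, write_policy):
    """Apply the read policy as data is read, then the write policy as it is written."""
    return write_policy.apply(read_policy.apply(record))


# Two policies that rely on the same shared rules but protect different categories.
read_policy = Policy("read", {"BIRTH_DATE": lambda v: "REDACTED"})
write_policy = Policy("write", {"IP_ADDRESS": lambda v: v.rsplit(".", 1)[0] + ".0"})
```

Because both policies look up categories in the same shared rule set, adding a rule to ORG_RULES in this model makes that category available to every policy, which mirrors the organization-level behavior described above.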