Understanding Classification Rules

Classification rules identify and classify data. Used together with protection policies, they let you classify and then protect sensitive data. For example, you might use a classification rule to identify user addresses, then use a protection policy to drop that information from all records.
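
The address example can be sketched in a few lines of Python. This is only an illustration of the classify-then-protect idea, not Data Protector code: the function names, the ADDRESS category label, and the flat dictionary used as a record are all assumptions made for the sketch.

```python
import re


def classify_addresses(record):
    """Hypothetical rule: label a field ADDRESS when its name contains 'address'."""
    return {name: "ADDRESS" for name in record if re.search(r"(?i)address", name)}


def apply_drop_policy(record, classifications, category):
    """Hypothetical policy: drop every field classified under the given category."""
    return {
        name: value
        for name, value in record.items()
        if classifications.get(name) != category
    }


# Made-up test record.
record = {"user": "jdoe", "home_address": "12 Elm St", "plan": "pro"}
labels = classify_addresses(record)                       # step 1: classify
protected = apply_drop_policy(record, labels, "ADDRESS")  # step 2: protect
print(protected)
```

The classification step only labels fields. The policy decides what happens to the labeled fields, which is why the same rule could be reused by policies that take different actions.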

Data Protector includes a set of StreamSets classification rules to classify a wide range of general data, from social security numbers and IP addresses to Indian and German taxpayer IDs. Review the StreamSets classification rules and verify how they classify your data. Then, create custom classification rules to identify and classify categories of data that are not automatically classified or that are specific to your organization, such as customer IDs.

Creating a custom classification rule requires configuring the rule to indicate the category of data to be classified, then defining one or more classifiers that specify how to identify the data. You can identify data based on field names, field paths, or field values, and use regular expressions to encompass a range of field names, values, or paths.
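
As a rough model of how a rule and its classifiers relate, the following Python sketch matches regular expressions against field names, field paths, or field values. The class names, the CUSTOMER_ID category, the patterns, and the slash-separated field paths are assumptions chosen for illustration, and the sketch requires the whole name, path, or value to match the expression. It does not show how Data Protector evaluates classifiers internally.

```python
import re
from dataclasses import dataclass


@dataclass
class Classifier:
    """Hypothetical classifier: a regular expression applied to one aspect of a field."""
    match_on: str  # "field_name", "field_path", or "field_value"
    pattern: str   # regular expression that must match the whole target

    def matches(self, path, value):
        name = path.rsplit("/", 1)[-1]  # last path segment is taken as the field name
        target = {
            "field_name": name,
            "field_path": path,
            "field_value": str(value),
        }[self.match_on]
        return re.fullmatch(self.pattern, target) is not None


@dataclass
class ClassificationRule:
    """Hypothetical rule: one category plus one or more classifiers."""
    category: str
    classifiers: list

    def classify(self, record):
        """Return {field_path: category} for each field that any classifier matches."""
        return {
            path: self.category
            for path, value in record.items()
            if any(c.matches(path, value) for c in self.classifiers)
        }


rule = ClassificationRule(
    category="CUSTOMER_ID",
    classifiers=[
        Classifier("field_name", r"(?i)cust(omer)?_?id"),
        Classifier("field_value", r"CUST-\d{5}"),
    ],
)

# Made-up record, keyed by field path.
record = {
    "/customer/id": "CUST-00123",  # classified by the field value classifier
    "/order/customer_id": 42,      # classified by the field name classifier
    "/order/total": "19.99",       # matched by neither classifier
}
result = rule.classify(record)
print(result)
```

In this sketch a field is classified as soon as any one classifier matches, so classifiers that target names and values can catch data that either approach alone would miss.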

Use classification preview or data preview to see how classification rules classify test data. Then, commit new or updated classification rules when they are ready. Once committed, rules are available for use by any policy in the organization.
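
One way to picture the preview-then-commit workflow is a store that keeps draft rules separate from committed ones. The RuleStore class below is hypothetical, and it assumes that preview evaluates draft rules while policies see only committed rules. It simplifies each rule to a single regular expression on field names.

```python
import re


class RuleStore:
    """Hypothetical store that separates draft rules from committed rules."""

    def __init__(self):
        self.draft = {}      # rule name -> field name pattern; used by preview
        self.committed = {}  # rules available to policies

    def save(self, name, pattern):
        """Create or update a rule in the draft set only."""
        self.draft[name] = pattern

    def has_uncommitted_changes(self):
        return self.draft != self.committed

    def preview(self, test_record):
        """Classify test data with the draft rules, committed or not."""
        return {
            field: name
            for field in test_record
            for name, pattern in self.draft.items()
            if re.fullmatch(pattern, field)
        }

    def commit(self):
        """Publish the draft rules so that policies can use them."""
        self.committed = dict(self.draft)


store = RuleStore()
store.save("CUSTOMER_ID", r"(?i)customer_?id")
preview_result = store.preview({"customer_id": 7, "total": 3})
pending_before = store.has_uncommitted_changes()
store.commit()
pending_after = store.has_uncommitted_changes()
print(preview_result, pending_before, pending_after)
```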

For more information about how classification rules function as part of Data Protector, see Data Protection in Control Hub.

Working with Custom Classification Rules and Classifiers

The Custom Classifications view lists all of the custom classification rules that have been created for your organization. When you view rule details, you can also view all classifiers associated with the rule.

You can complete the following tasks from the Custom Classifications view:

  • Create classification rules - Create new classification rules to identify custom categories of data that need to be altered and protected. Use the Add Rule icon to create a new rule. When you create a new rule, it includes two default classifiers. These classifiers are templates for a field name and a field value classifier that you can edit or delete.
  • View classification rule details - Display the rule name, classification category, and score, as well as all associated classifiers. To view details about a rule, click the rule name.
  • Add, edit, and delete classifiers - When viewing classification rule details, you can add, edit, or delete classifiers.
    • To add a classifier to help determine how the rule is applied, click Add Classifier.
    • To edit a classifier to update how the rule is applied, click Edit for the classifier.
    • To delete a classifier that is no longer needed, click Delete for the classifier.
      Note: The classifier is immediately deleted and cannot be recovered.
  • Edit classification rules - Edit rules to update rule properties. To edit a rule, click the rule name to view rule details, then click Edit.
  • Delete classification rules - Delete rules that you no longer require. To delete a rule, click the rule name to view rule details, then click Delete.
    Note: The classification rule and its classifiers are immediately deleted and cannot be recovered.
  • Preview classification rules - Preview how classification rules apply to test data.
  • Commit classification rules - Commit classification rules to make them available for use in the organization.

The following image shows a list of rules in the Custom Classifications view. Note that the number of rules displays at the top of the page. If there are uncommitted changes, an asterisk also displays.

Each rule displays the rule name, classification category, and score. When you click the rule name, rule details show associated classifiers.

In this example, two classifiers define how the Company ID rule is applied to data, with one classifier matching on the field name and the other on field values: