Classification Rule Components

When you configure a classification rule, you specify the category of data to be classified and the score to assign to the data that matches the classification rule. Both are used by any protection policies that alter and protect data classified by the rule.

After you create a classification rule, define classifiers to identify the data to be classified.

When you configure a classification rule, you define the following properties:
Name
A name for the classification rule. Displays in the list of classification rules.
Category
A name that represents the type or category of data that the classification rule identifies, such as company IDs or medical record numbers. You specify the category to protect when you configure protection policy procedures.
When you create a classification rule, the category name requires CUSTOM: as a prefix, as follows: CUSTOM:<category>. For example, CUSTOM:COMPANY_ID.
The categories associated with StreamSets classification rules, such as US_SSN, do not use the prefix.
You can create multiple rules for a single category, each applying a different score when it matches. For example, you can create one rule that requires a strong data and metadata match and assigns a score of 1.0, and then a second rule that performs only a data match and assigns a lower score.
When you configure a protection policy procedure to alter and protect the data classified by a rule, you specify category names to define the data that you want to protect.
Score
A classification score that you assign to data classified by the rule. You specify a classification score threshold when you configure protection policy procedures.
The classification score is a value between 0 and 1.0, where 0 represents the lowest possible score and 1.0 the highest. For example, if you know that all records are captured by the classifiers defined for the company ID rule, you can set the score to 1.0.
When you configure a protection policy procedure to alter and protect the data classified by a rule, you specify a classification score threshold. Only data classified with a score above the threshold is protected by the procedure.
Comments
You can add comments for the classification rule that clarify how it operates or should be used. Comments display only within the rule properties.
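
The constraints on these properties can be summarized in a short sketch. It is illustrative only: the ClassificationRule class and its field names are hypothetical and are not part of any StreamSets API. It shows only the CUSTOM: prefix requirement for user-defined categories and the 0 to 1.0 score range described above.

```python
from dataclasses import dataclass

# Prefix required for categories of user-created rules, per the text above.
CUSTOM_PREFIX = "CUSTOM:"


@dataclass(frozen=True)
class ClassificationRule:
    """Hypothetical in-memory model of a user-defined classification rule."""

    name: str
    category: str
    score: float
    comments: str = ""

    def __post_init__(self):
        # The category must look like CUSTOM:<category>, with a non-empty name.
        suffix = self.category[len(CUSTOM_PREFIX):]
        if not self.category.startswith(CUSTOM_PREFIX) or not suffix:
            raise ValueError(
                f"category must use the form {CUSTOM_PREFIX}<category>: {self.category!r}"
            )
        # The score must fall between 0 and 1.0, inclusive.
        if not 0.0 <= self.score <= 1.0:
            raise ValueError(f"score must be between 0 and 1.0: {self.score!r}")


# A rule whose classifiers are expected to capture every company ID.
rule = ClassificationRule(
    name="Company IDs",
    category="CUSTOM:COMPANY_ID",
    score=1.0,
    comments="Matches internal company identifiers.",
)
```

In this sketch, a category without the prefix, such as COMPANY_ID, or a score outside the range, such as 1.5, raises a ValueError when the rule is constructed.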

For example, a classification rule might use a category of CUSTOM:CompanyID and a score of 0.99 when you are confident that the classifiers that you create will capture all company IDs.
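
To see how the category and score interact with a protection policy procedure, consider the following minimal sketch. It is not how the product evaluates rules internally. The is_protected function, the rule names, the 0.6 score of the second rule, and the 0.9 threshold are all assumptions chosen for illustration. The comparison follows the wording above, where only data with a score above the threshold is protected.

```python
# Two hypothetical rules for the same category: a strong match with a high
# score and a weaker, data-only match with a lower score.
rules = [
    {"name": "Company ID (data and metadata match)",
     "category": "CUSTOM:CompanyID", "score": 0.99},
    {"name": "Company ID (data match only)",
     "category": "CUSTOM:CompanyID", "score": 0.6},
]


def is_protected(category, score, protected_categories, threshold):
    """Return True when a procedure configured with these categories and
    this score threshold would act on data classified with the given
    category and score (assumed semantics: score strictly above threshold)."""
    return category in protected_categories and score > threshold


# Hypothetical procedure settings: protect company IDs scored above 0.9.
protected_categories = {"CUSTOM:CompanyID"}
threshold = 0.9

selected = [
    r["name"]
    for r in rules
    if is_protected(r["category"], r["score"], protected_categories, threshold)
]
print(selected)
```

With these assumed values, only data classified by the first rule would be protected. Lowering the threshold below 0.6 would also include data classified by the data-only rule.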

When you save a new classification rule, the new rule displays with two classifiers that act as templates that you can edit or delete. You can configure the classifiers at this time, or come back to them later.