Understanding Procedures

Procedures define how a policy protects data. Typically, a procedure protects data based on classification categories, but you can also specify a known field path for protection when needed.

The procedures used for a policy are created from within the policy. You cannot move or reuse a procedure from one policy to another.

When you configure a procedure, you specify the basis for the protection (classification category or field path) and the category pattern or field path to use. Then, you select the protection method and configure related properties.

When using a field path, you can specify a single field path and the protection method for that data.

When using classification categories, you can use a single procedure to protect a set of categories when their category names can be defined by one regular expression, and when you want to use the same protection method for those categories.

For example, the following procedure rounds the dates in all classification categories whose names end in DATE, such as BIRTH_DATE and START_DATE, where the classification score is 0.8 or higher. The Round Dates protection method rounds to the year, but can also round to the month or quarter:
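
Outside the product, the selection logic that this example describes can be sketched in Python. This is an illustration only, not Data Protector configuration: the pattern `.*DATE$`, the function names, and the reading of "rounds to the year" as truncating to January 1 are all assumptions.

```python
import re
from datetime import date

# Hypothetical category pattern: any category name that ends in DATE.
CATEGORY_PATTERN = re.compile(r".*DATE$")
# Minimum classification score taken from the example in the text.
SCORE_THRESHOLD = 0.8


def procedure_applies(category: str, score: float) -> bool:
    """Return True when the category name matches and the score is high enough."""
    name_matches = CATEGORY_PATTERN.fullmatch(category) is not None
    return name_matches and score >= SCORE_THRESHOLD


def round_to_year(value: date) -> date:
    """Assumed behavior: truncate a date to the first day of its year."""
    return value.replace(month=1, day=1)


if __name__ == "__main__":
    for category, score in [("BIRTH_DATE", 0.9), ("DATE_OF_BIRTH", 0.95)]:
        print(category, score, procedure_applies(category, score))
    print(round_to_year(date(1987, 6, 15)))
```

A single pattern covers both BIRTH_DATE and START_DATE, which is why one procedure is enough when the same protection method suits every matching category.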

Standardizing Data

You can configure procedures to standardize data. You can use category functions with an Expression Evaluator protection method to standardize the format of the following categories of data: email addresses, phone numbers, social security numbers, and zip codes.

When Data Protector recognizes a category of data, such as phone numbers or social security numbers, you can configure procedures to standardize the format of the data. For example, you can standardize the data to use hyphens instead of periods or spaces between groups of numbers.

You might standardize data simply to have a standard format for the data. Or you might standardize data as a precursor to using deterministic data generation functions to protect the data. Deterministic data generation functions generate fake data while ensuring that the same fake data is generated for the same input data.

For example, say you have phone numbers with a range of formats, such as (xxx) xxx-xxxx, xxx-xxx-xxxx, and xxx.xxx.xxxx. When used with a deterministic phone number function, (222) 333-4444 is treated as a different phone number from 222-333-4444. As a result, the function generates two different fake phone numbers to replace the two values.

If you standardize the data to one format before passing it to the deterministic function, then the function treats the two numbers as a recurrence of the same number, and replaces both values with the same fake number. This standardization allows downstream data analysis to recognize the recurrence of the numbers, even though they have been protected.
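
The effect of standardizing before deterministic generation can be sketched in Python. This is not how Data Protector generates fake data: `fake_phone` is a hypothetical stand-in that derives digits from a SHA-256 hash, and `standardize` is an assumed helper that handles only ten-digit numbers.

```python
import hashlib
import re


def fake_phone(value: str) -> str:
    """Hypothetical deterministic generator: same input string, same output."""
    digest = hashlib.sha256(value.encode("utf-8")).hexdigest()
    # Reduce the hash to ten decimal digits and format them like a phone number.
    digits = f"{int(digest, 16) % 10**10:010d}"
    return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"


def standardize(value: str) -> str:
    """Assumed helper: keep only the digits and rejoin them with hyphens."""
    digits = re.sub(r"\D", "", value)
    return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"


if __name__ == "__main__":
    first, second = "(222) 333-4444", "222-333-4444"
    # Raw inputs are different strings, so the fake values differ.
    print(fake_phone(first), fake_phone(second))
    # Standardized inputs are identical, so the fake values match.
    print(fake_phone(standardize(first)), fake_phone(standardize(second)))
```

Because the generator sees only the input string, any difference in formatting changes the result. Standardizing first removes that difference.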

The following expression shows how you might use category functions to standardize the format of phone numbers:
${US_PHONE:areaCode()}-${US_PHONE:exchangeCode()}-${US_PHONE:lineNumber()}

The expression returns the original area code of the phone number, the original exchange code of the phone number, and the original line number of the phone number, while replacing the existing formatting with hyphens.

If the phone numbers occasionally omit an area code, you can use the US_PHONE:areaCodeOrDefault() function to create a placeholder as follows:

${US_PHONE:areaCodeOrDefault('xxx')}-${US_PHONE:exchangeCode()}-${US_PHONE:lineNumber()}

This expression returns the original phone number with hyphens as the separator character and uses the string xxx as the default area code when none is available. The invalid xxx area code indicates that the area code did not exist in the original data.

This expression converts the data as follows:
Original Phone Number    Standardized Phone Number
(415) 262-3333           415-262-3333
617.345.8888             617-345-8888
565-6666                 xxx-565-6666
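
For readers who want to check the conversions, the following Python sketch mimics the behavior of the expression. It does not use the Data Protector category functions. The function name `standardize_us_phone` and the rule that a seven-digit number lacks an area code are assumptions made for illustration.

```python
import re


def standardize_us_phone(value: str, default_area_code: str = "xxx") -> str:
    """Rejoin the parts of a US phone number with hyphens.

    Assumption: ten digits include an area code, seven digits do not.
    """
    digits = re.sub(r"\D", "", value)
    if len(digits) == 10:
        area, exchange, line = digits[:3], digits[3:6], digits[6:]
    elif len(digits) == 7:
        # No area code in the original data, so use the placeholder.
        area, exchange, line = default_area_code, digits[:3], digits[3:]
    else:
        raise ValueError(f"unexpected number of digits in {value!r}")
    return f"{area}-{exchange}-{line}"


if __name__ == "__main__":
    for original in ["(415) 262-3333", "617.345.8888", "565-6666"]:
        print(original, "->", standardize_us_phone(original))
```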

The procedure that standardizes phone numbers as described might look like this: