Configuring a Procedure

Configure procedures to specify how a policy should alter and protect data. Configure the procedures for a policy from within the policy.
  1. On the Protection Policies view, click the name of the policy, then click View Procedures.
    The Procedure view displays.
  2. To create a new procedure, click the Add Procedure icon.
    To edit an existing procedure, select the procedure name, then click Edit.
  3. In the New Procedure or Manage Procedure dialog box, configure the following properties:
    Procedure Property - Description
    Procedure Basis - The basis for applying the procedure:
    • Category Pattern - One or more classification categories to apply the procedure to. The category names must be defined by a single regular expression.
    • Field Path - A single field path to apply the procedure to.
    Classification Category Pattern - A regular expression that represents one or more classification categories to apply the procedure to.

    When specifying the regular expression, be sure to consider the StreamSets classification categories. Verify that these categories are deliberately included or excluded, as needed.

    Custom categories are named as follows: CUSTOM:<category>.

    Classification Score Threshold - Minimum classification score that a classified field must have to be protected by the procedure.

    Enter a value between 0 and 1.0.

    Used for procedures based on category patterns only.

    Field Path - The field path to apply the procedure to.

    Used for procedures based on field paths only.

    Authoring SDC - The Data Collector to use to author the procedure.

    Default is System Data Collector. Typically, you can use the default.

    You can select a registered Data Protector-enabled Data Collector when the Data Collector has updated functionality that you want to use.
    Protection Method - Protection method to use to alter and protect the data. Select one of the following methods, then configure the related properties:
    • Custom Mask - Masks any data that can be converted to a string using a user-defined mask.
    • Drop Field - Drops qualifying fields. Use for any type of data.
    • Expression Evaluator - Replaces data of any type with the results of the expression.
    • Groovy Script Runner - Protects data using Groovy code. Use for any type of data.
    • Hash Data - Hashes data using one of the supported algorithms. Use for any data that can be converted to a string.
    • JavaScript Script Runner - Protects data using JavaScript code. Use for any type of data.
    • Jython Script Runner - Protects data using Jython code. Use for any type of data.
    • Obfuscate Names - Reduces names to initials or first names. Use for string data.
    • Replace Values - Replaces data of any type with the specified numeric, string, or datetime data.
    • Round Dates - Rounds dates to the year, quarter or month. Use for Date, Datetime, or Zoned Datetime data.
    • Round Numbers - Rounds numbers to above or below a specified threshold or to ranges of a specified size. Use for numeric or datetime data.
    • Scramble Numbers - Scrambles numbers by adding or subtracting a specified range of values. Use for numeric or datetime data.
    • Standard Mask - Masks data classified by the following StreamSets classification rules:
      • CREDIT_CARD
      • EMAIL
      • US_PHONE
      • US_SSN
      • US_ZIP_CODE
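
As a rough illustration of how a single regular expression can select several classification categories, the following Python sketch tests two patterns against a list of category names. The custom category CUSTOM:EMPLOYEE_ID is hypothetical, and full-string matching is an assumption made for this example; confirm how Data Protector evaluates the pattern before relying on it.

```python
import re

# StreamSets category names come from this topic;
# CUSTOM:EMPLOYEE_ID is a hypothetical custom category.
categories = [
    "CREDIT_CARD", "EMAIL", "US_PHONE", "US_SSN", "US_ZIP_CODE",
    "CUSTOM:EMPLOYEE_ID",
]

def matching_categories(pattern, names):
    """Return the names that the pattern matches in full (an assumption)."""
    compiled = re.compile(pattern)
    return [name for name in names if compiled.fullmatch(name)]

# One pattern that deliberately includes only the US_ categories.
us_only = matching_categories(r"US_.*", categories)

# One pattern that covers every custom category plus EMAIL.
custom_or_email = matching_categories(r"CUSTOM:.*|EMAIL", categories)

print(us_only)
print(custom_or_email)
```

Listing the categories that a pattern selects in this way is a quick check that StreamSets categories are included or excluded on purpose.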
  4. When using the Custom Mask protection method, configure the following properties:
    Custom Mask Property - Description
    Masks - The mask to replace data of the same length. To configure additional masks, click Add Another.

    Use a pound symbol (#) to reveal characters in the data. All other characters are used as literals.

    You can configure masks in simple or bulk edit mode. In bulk edit mode, configure parameter values in JSON format.

    Missing Mask Behavior - The action to take when a mask does not exist with the exact length of the data being processed:
    • Terminate Pipeline Execution - Stops the job.
    • Apply Partial Mask - Applies the specified mask to the data. Truncates longer data to the length of the mask.
    • Drop Field - Drops the field from the record.
    Mask for Partial Mask - Mask to use for data that does not match the length of the specified masks.

    Use a pound symbol (#) to reveal characters in the data. All other characters are used as literals.

    Available when applying a partial mask. Use to protect data of unexpected lengths.
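
The mask rules above can be sketched in Python as follows. This is an illustration of the described behavior, not the Data Protector implementation: the function names are hypothetical, the ValueError stands in for Terminate Pipeline Execution, and the Drop Field option is not modeled. How data shorter than the partial mask is handled is not stated in this topic, so the sketch simply stops at the end of the data.

```python
def apply_mask(value, mask):
    """Keep the data character where the mask has '#'; otherwise use the mask character."""
    # zip() stops at the shorter input, so data longer than the mask
    # is truncated to the length of the mask.
    return "".join(v if m == "#" else m for v, m in zip(value, mask))

def custom_mask(value, masks, partial_mask=None):
    """Apply the mask whose length equals the data length, else the partial mask."""
    for mask in masks:
        if len(mask) == len(value):
            return apply_mask(value, mask)
    if partial_mask is not None:
        # Corresponds to the Apply Partial Mask behavior.
        return apply_mask(value, partial_mask)
    # Stand-in for Terminate Pipeline Execution.
    raise ValueError("no mask matches the data length")

exact = custom_mask("4111111111111111", ["xxxxxxxxxxxx####"])
partial = custom_mask("AB-12345", ["####"], partial_mask="##-xxx")
print(exact)
print(partial)
```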

  5. When using the Expression Evaluator protection method, configure the following property:
    Expression Evaluator Property - Description
    Expression - The expression to use to generate replacement data. The results of the expression replace the classified data.
    You can use most functions, as well as constants, datetime variables, literals, and operators available in the StreamSets expression language. In addition, you can use the following Data Protector functions:
    • Category functions - Return parts of the original data.
    • Data generation functions - Generate fake data.

    For more information, see Expression Evaluator.

  6. When using the Groovy Script Runner protection method, configure the following properties:
    Groovy Script Runner Property - Description
    Init Script - Optional initialization script to use.

    Use to set up any required connections or resources. Runs once when the pipeline starts.

    Script - Main processing script to use.

    Runs once for each field value that the procedure protects.

    Destroy Script - Optional destroy script to use.

    Use to close any connections or resources that were used. Runs once when the pipeline stops.

  7. When using the Hash Data protection method, configure the following properties:
    Hash Data Property - Description
    Algorithm - The hash algorithm to use:
    • MD2
    • MD5
    • SHA_1
    • SHA_256
    • SHA_384
    • SHA_512
    Base64 Decode Input - Enables decoding Base64 encoded data before hashing.
    Base64 Encode Output - Enables performing Base64 encoding on hashed data.
    Append Salt - Enables appending a user-defined salt to the data before hashing.
    Salt is Base64 Encoded - Enables using a Base64 encoded salt.
    Salt - The salt to append to data before hashing.

    When using a Base64 encoded salt, you can define an expression that uses a Base64 function to encode the salt, or you can enter a Base64 encoded salt.
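
To show how the Hash Data properties interact, here is a minimal Python sketch that uses the standard hashlib and base64 modules. It rests on several assumptions that this topic does not confirm: strings are encoded as UTF-8, the digest is returned as hexadecimal text when Base64 Encode Output is off, and the function and parameter names are hypothetical. MD2 is left out because Python's hashlib does not guarantee it.

```python
import base64
import hashlib

def hash_value(value, algorithm="sha256", salt=None, salt_is_base64=False,
               decode_input=False, encode_output=False):
    """Hash a string, optionally appending a salt and Base64 handling."""
    # Base64 Decode Input: decode the data before hashing.
    data = base64.b64decode(value) if decode_input else value.encode("utf-8")
    if salt is not None:
        # Salt is Base64 Encoded: decode the salt to its raw bytes first.
        salt_bytes = base64.b64decode(salt) if salt_is_base64 else salt.encode("utf-8")
        data += salt_bytes  # Append Salt: the salt follows the data
    digest = hashlib.new(algorithm, data).digest()
    # Base64 Encode Output: encode the hashed bytes; hex output is an assumption.
    return base64.b64encode(digest).decode("ascii") if encode_output else digest.hex()

plain_salt = hash_value("abc", salt="pepper")
encoded_salt = hash_value("abc", salt=base64.b64encode(b"pepper").decode("ascii"),
                          salt_is_base64=True)
print(plain_salt == encoded_salt)
```

In this sketch a plain salt and its Base64 encoded form yield the same hash, because the encoded salt is decoded before it is appended.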

  8. When using the JavaScript Script Runner protection method, configure the following properties:
    JavaScript Script Runner Property - Description
    Init Script - Optional initialization script to use.

    Use to set up any required connections or resources. Runs once when the pipeline starts.

    Script - Main processing script to use.

    Runs once for each field value that the procedure protects.

    Destroy Script - Optional destroy script to use.

    Use to close any connections or resources that were used. Runs once when the pipeline stops.

  9. When using the Jython Script Runner protection method, configure the following properties:
    Jython Script Runner Property - Description
    Init Script - Optional initialization script to use.

    Use to set up any required connections or resources. Runs once when the pipeline starts.

    Script - Main processing script to use.

    Runs once for each field value that the procedure protects.

    Destroy Script - Optional destroy script to use.

    Use to close any connections or resources that were used. Runs once when the pipeline stops.
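
The Groovy, JavaScript, and Jython Script Runner methods share the same three-script lifecycle. The plain Python sketch below only simulates the order in which the scripts run; it does not use the objects that the script runners make available to a script, and every name in it is hypothetical.

```python
def run_with_lifecycle(values, init_script, script, destroy_script):
    """Call init once, script once per value, and destroy once at the end."""
    state = {}
    init_script(state)  # runs once when the pipeline starts
    try:
        # Runs once for each field value that the procedure protects.
        return [script(state, value) for value in values]
    finally:
        destroy_script(state)  # runs once when the pipeline stops

calls = []

def init_script(state):
    calls.append("init")
    state["suffix"] = "-protected"  # stands in for a connection or resource

def script(state, value):
    calls.append("script")
    return value[:1] + state["suffix"]

def destroy_script(state):
    calls.append("destroy")
    state.clear()  # release whatever the init script set up

protected = run_with_lifecycle(["alice", "bob"], init_script, script, destroy_script)
print(protected)
print(calls)
```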

  10. When using the Obfuscate Names protection method, configure the following properties:
    Obfuscate Names Property - Description
    Transformation - Type of obfuscation to perform:
    • Abbreviate name to first letter of each name - Abbreviates each component of the name to an initial, in the original case. Hyphenated names are treated as a single name.
    • Preserve only the first name - Retains the first component of the name and drops the rest of the name.
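
The two transformations can be approximated in Python as shown below. The output format for initials (single letters separated by spaces, without periods) is an assumption, as is splitting names on whitespace; the function names are hypothetical.

```python
def abbreviate(name):
    """Reduce each whitespace-separated component to its first letter."""
    # A hyphenated component such as "Mary-Jane" counts as a single name,
    # so it contributes one initial. The original case is kept.
    return " ".join(part[0] for part in name.split())

def first_name_only(name):
    """Keep the first component of the name and drop the rest."""
    parts = name.split()
    return parts[0] if parts else name

initials = abbreviate("Mary-Jane van Buren")
first = first_name_only("Ada Lovelace")
print(initials)
print(first)
```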
  11. When using the Replace Values protection method, configure the following properties:
    Replace Values Property - Description
    Data Type - Data type of the field to be replaced.
    Value - Replacement value. Use numeric, datetime, or string data.
  12. When using the Round Dates protection method, configure the following properties:
    Round Dates Property - Description
    Round Format - Round format to use:
    • Year
    • Year and month
    • Year and quarter
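
One way to picture the three round formats is as truncation to the first day of the year, month, or quarter. That interpretation, the function name, and the format keys are assumptions made for this Python sketch; the representation that Data Protector writes to the field may differ.

```python
from datetime import date

def round_date(value, round_format):
    """Truncate a date to the start of its year, month, or quarter."""
    if round_format == "year":
        return date(value.year, 1, 1)
    if round_format == "year_and_month":
        return date(value.year, value.month, 1)
    if round_format == "year_and_quarter":
        # Quarters start in months 1, 4, 7, and 10.
        first_month = 3 * ((value.month - 1) // 3) + 1
        return date(value.year, first_month, 1)
    raise ValueError("unknown round format: " + round_format)

original = date(2024, 8, 17)
print(round_date(original, "year"))
print(round_date(original, "year_and_month"))
print(round_date(original, "year_and_quarter"))
```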
  13. When using the Round Numbers protection method, configure the following properties:
    Round Numbers Property - Description
    Round Method - Round method to use:
    • Above/Below - Rounds numbers to above or below the specified threshold.
    • Range - Rounds numbers to a range based on the specified size.
    Threshold - The threshold to use for the Above/Below round method. The threshold is included in the Below category.
    Range Size - The size of the range to use for the Range round method.
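
The two round methods can be sketched in Python as follows. Only the rule that the threshold belongs to the Below category comes from this topic. The returned labels, the alignment of ranges to multiples of the range size, and the function names are assumptions.

```python
import math

def above_below(value, threshold):
    """Classify a number against the threshold; the threshold itself counts as below."""
    return "below" if value <= threshold else "above"

def to_range(value, size):
    """Return the (low, high) bounds of the range that contains the value."""
    # Assumes ranges start at multiples of the range size.
    low = math.floor(value / size) * size
    return (low, low + size)

print(above_below(50, 50))
print(above_below(51, 50))
print(to_range(37, 10))
```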
  14. When using the Scramble Numbers protection method, configure the following properties:
    Scramble Numbers Property - Description
    Lower Bound - Lower boundary to use for the scramble.
    Upper Bound - Upper boundary to use for the scramble.
    Allow Negative - Allows subtracting the scramble range from original values.
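
As a hedged sketch of how these three properties might combine, the Python function below adds a random offset between the lower and upper bounds and, when negative offsets are allowed, subtracts it about half of the time. The uniform distribution, the even split between adding and subtracting, and the function name are assumptions, not documented behavior.

```python
import random

def scramble(value, lower, upper, allow_negative=False, rng=random):
    """Shift a number by a random offset between lower and upper."""
    offset = rng.uniform(lower, upper)
    if allow_negative and rng.random() < 0.5:
        offset = -offset  # Allow Negative: subtract the offset instead of adding it
    return value + offset

rng = random.Random(0)  # seeded only to make the example repeatable
print(scramble(100, 1, 5, rng=rng))
print(scramble(100, 1, 5, allow_negative=True, rng=rng))
```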
  15. When using the Standard Mask protection method, configure the following properties, as needed:
    Like other protection methods, the Standard Mask protection method protects the data defined in the procedure. Unrelated properties are ignored.
    Standard Mask Format - Description
    CREDIT_CARD - Masks data classified by the CREDIT_CARD StreamSets classification rule. Use one of the following options:
    • VISA 1234 - Replaces data with the credit card type and the last part of the credit card number.
    • x6842 - Replaces data with the last part of the credit card number preceded by an x.
    • Custom Format - Replaces data with a user-defined custom format. The default, ${CREDIT_CARD:type()} x${CREDIT_CARD:lastPart()}, shows the expression used to create the VISA 1234 option. Alter or replace the default as needed.

    Default is VISA 1234.

    EMAIL - Masks data classified by the EMAIL StreamSets classification rule. Use one of the following options:
    • s*@streamsets.com - Replaces data with addresses that reduce the local part of the address to an initial, while retaining the original domain name.
    • sales@s*.com - Replaces data with addresses that retain the original local part of the address while reducing the domain name to an initial.
    • Custom Format - Replaces data with a user-defined custom format. The default, ${str:substring(EMAIL:localPart(), 0, 1)}*@${EMAIL:domain()}, shows the expression used to create the s*@streamsets.com option. Alter or replace the default as needed.

    Default is s*@streamsets.com.

    US_PHONE - Masks data classified by the US_PHONE StreamSets classification rule. Use one of the following options:
    • (xxx) xxx 7890 - Replaces data with numbers that retain the original line number while obscuring the rest of the data.
    • Custom Format - Replaces data with a user-defined custom format. The default, (xxx) xxx ${US_PHONE:lineNumber()}, shows the expression used to create the (xxx) xxx 7890 option. Alter or replace the default as needed.

    Default is (xxx) xxx 7890.

    US_SSN - Masks data classified by the US_SSN StreamSets classification rule. Use one of the following options:
    • xxx-xx-1234 - Replaces data with numbers that retain the original serial number while obscuring the rest of the data.
    • Custom Format - Replaces data with a user-defined custom format. The default, xxx-xx-${US_SSN:serialNumber()}, shows the expression used to create the xxx-xx-1234 option. Alter or replace the default as needed.

    Default is xxx-xx-1234.

    US_ZIP_CODE - Masks data classified by the US_ZIP_CODE StreamSets classification rule. Use one of the following options:
    • Prefix Only (940xx) - Replaces data with numbers that retain the original state group and region numbers while obscuring the rest of the data.
    • Suffix Only (xxx86) - Replaces data with numbers that retain the original city area while obscuring the rest of the data.
    • Custom Format - Replaces data with a user-defined custom format. The default, xxx${US_ZIP_CODE:cityArea()}, shows the expression used to create the Suffix Only option. Alter or replace the default as needed.

    Default is Prefix Only (940xx).
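
To make the default formats concrete, the following Python sketch reproduces the sample outputs named above for the EMAIL, US_SSN, and US_ZIP_CODE options. It is not how Data Protector evaluates the format expressions. The function names are hypothetical, and the sketch assumes well-formed input such as a five-digit ZIP code.

```python
def mask_email(address):
    """Reduce the local part to an initial and keep the domain."""
    local, _, domain = address.partition("@")
    return f"{local[:1]}*@{domain}"

def mask_ssn(ssn):
    """Keep the four-digit serial number and obscure the rest."""
    digits = [c for c in ssn if c.isdigit()]
    return "xxx-xx-" + "".join(digits[-4:])

def mask_zip_prefix(zip_code):
    """Prefix Only: keep the first three digits."""
    return zip_code[:3] + "xx"

def mask_zip_suffix(zip_code):
    """Suffix Only: keep the last two digits of a five-digit ZIP code."""
    return "xxx" + zip_code[3:5]

print(mask_email("sales@streamsets.com"))
print(mask_ssn("123-45-1234"))
print(mask_zip_prefix("94086"))
print(mask_zip_suffix("94086"))
```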

  16. Click Save to save your changes.