Classification Preview

Classification preview displays real-time classification of test data by both StreamSets classification rules and custom classification rules. Classification preview requires access to an authoring Data Protector-enabled Data Collector.
Important: Classification preview is a technology preview feature. Do not expect production-level quality. For a proven method of previewing classification by classification rules, use data preview.

Preview classifications from the Custom Classifications view. When you preview classification rules, the preview applies all defined rules to the data. This includes all StreamSets classification rules as well as all committed and uncommitted classification rules listed on the Custom Classifications view.

If a committed classification rule has an uncommitted update, the preview uses the uncommitted update. This enables you to test changes to existing rules before committing them.
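
The precedence described above can be pictured with a minimal Python sketch. This is only an illustration, not how Data Protector stores or resolves rules: the function name, the dictionaries, and the rule names and patterns are all hypothetical, and the sketch assumes custom rule names do not collide with StreamSets rule names.

```python
def effective_rules(streamsets_rules, committed, uncommitted):
    """Return the rule set a preview would apply, keyed by rule name.

    Hypothetical model: each argument maps a rule name to its definition.
    """
    rules = dict(streamsets_rules)  # all StreamSets rules always apply
    rules.update(committed)         # plus every committed custom rule
    rules.update(uncommitted)       # uncommitted edits and new rules take precedence
    return rules


# Hypothetical rule names and definitions, for illustration only.
streamsets_rules = {"US_PHONE": "built-in phone pattern"}
committed = {"EMPLOYEE_ID": "v1 pattern"}
uncommitted = {"EMPLOYEE_ID": "v2 pattern", "BADGE": "new pattern"}

rules = effective_rules(streamsets_rules, committed, uncommitted)
```

In this model, EMPLOYEE_ID resolves to its uncommitted "v2 pattern" definition, and the new uncommitted BADGE rule is applied alongside the committed and StreamSets rules.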

Classification preview displays the same type of information as data preview, but does so without requiring you to create a test pipeline.

The following classification preview displays how StreamSets rules classify the default JSON test data:

The test data window displays on the left, and the preview results display on the right. As with data preview, the following information displays to the right of the field values in the classification preview results:
  • The classification score, such as 95%.

    For custom rules, the score is based on the score defined in the classifiers. For StreamSets rules, the scoring guidelines are outlined here.

  • The category name for the rule that classified the data, such as US_PHONE.

    When multiple classifications occur, the preview displays the strongest classification and provides a note icon indicating the name and score for the other classification. When you hover over the note icon, all classifications and scores display.
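
As a rough illustration of how one field with several classifications is summarized, the following Python sketch separates the strongest classification from the rest. It assumes that "strongest" means the highest score, which this topic does not state explicitly. The function name, the CUSTOM_ID category, and the scores are hypothetical.

```python
def summarize(classifications):
    """Split (category, score) pairs into the strongest one and the others.

    Assumes the strongest classification is the one with the highest score.
    """
    ranked = sorted(classifications, key=lambda item: item[1], reverse=True)
    return ranked[0], ranked[1:]


# Hypothetical classifications for a single field value.
strongest, others = summarize([("CUSTOM_ID", 0.60), ("US_PHONE", 0.95)])
```

Here `strongest` is the entry shown next to the field value, and `others` corresponds to what the note icon reports.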

Test Data

Classification preview provides default test data in JSON format. You can update or replace the data to see how the preview classifies custom test data. You can enter a JSON record or a JSON array.

To test a single JSON record, use the following format:

{
  "<field1>": <numeric value>,
  "<field2>": "<string>"
}

For example:

{
  "phoneNumber": "650-333-2222",
  "zipcode": "111113333",
  "url": "http://www.company.com",
  "email": "jdoe@company.com",
  "ID": 33333,
  "misc": "misc info"
}

To test multiple records, enter a JSON array using the following format:

[{
  "<field1>": <numeric value>,
  "<field2>": "<string>"
},
{
  "<field1>": <numeric value>,
  "<field2>": "<string>"
},
{
  "<field1>": <numeric value>,
  "<field2>": "<string>"
}]

The following example contains three records:

[{
  "userID": 14533,
  "email": "jb@company.com",
  "IP": "216.3.128.12"
},
{
  "user_ID": 53842,
  "email": "a_user@myemail.com",
  "ip": "23.129.64.104"
},
{
  "user": 25901,
  "email": "myname@whatco.net",
  "IPaddress": "24.172.133.234"
}]
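
Before pasting custom test data into the preview, you might check its shape locally. The following Python sketch uses only the standard `json` module and accepts either a single JSON record or a JSON array of records. The function name `check_test_data` is hypothetical, and the check covers only the formats described above, not any additional validation that the preview itself might perform.

```python
import json


def check_test_data(text):
    """Return the records in text, or raise ValueError if the shape is wrong."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as err:
        raise ValueError(f"not valid JSON: {err}") from err

    # Treat a single record as a one-item list so both formats share one check.
    records = data if isinstance(data, list) else [data]
    for index, record in enumerate(records):
        if not isinstance(record, dict):
            raise ValueError(f"item {index} is not a JSON object")
    return records


single = check_test_data('{"phoneNumber": "650-333-2222", "ID": 33333}')
several = check_test_data(
    '[{"userID": 14533, "email": "jb@company.com"},'
    ' {"user_ID": 53842, "email": "a_user@myemail.com"}]'
)
```

A bare value such as `"650-333-2222"`, or an array that contains anything other than records, raises a `ValueError` instead of being returned.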

Previewing Classifications

Preview classifications to test StreamSets and custom classification rules.

  1. Click Data Protector > Custom Classifications.
  2. On the Custom Classifications view, click the Preview icon.
    Test data displays with classifications based on StreamSets classification rules and all committed and uncommitted custom classification rules.
    If you have uncommitted changes to a committed custom rule, the preview displays classifications based on the uncommitted changes.
    If the classifications do not display, ensure that at least one authoring Data Protector-enabled Data Collector is available.
  3. To alter or replace the default test data, click in the test data window on the left, and change the test data as needed.
    You can enter a single JSON record or a valid JSON array. Be sure to use the correct JSON format.
    Classifications update in real time.
  4. When your testing is complete, click Close to close the preview window.