Filtering rows in data protection rules (IBM Knowledge Catalog)
You can select to filter rows as the action in data protection rules. You can include or exclude rows based on values in a specified column in the same asset or in a reference asset. To select a reference asset, you can choose any data asset that you have permission to view from any catalog that you are collaborator in.
For IBM Watson Query, row filtering can be based on columns of any data type. The filter value must be valid for the column data type it applies to. For example, you cannot define a rule such as include rows where CREDIT_SCORE = 'ABC'
if CREDIT_SCORE
is an integer column. Since ABC
is a string filter value, but the column is of type integer, then the queries with such a row filter results in
an error.
Row filtering methods
When you create a data protection rule, you can select Filter rows from the list of actions and then specify the filter method.
For example, suppose you have a database with tables that combine government, enterprise, and retail client data. The Billing table has data for all types of customers and a CustomerType column that indicates
the type of customer. You can create a data protection rule that excludes all rows for government customers. You can either create a rule that excludes all rows if the value of the CustomerType column is government
,
or create a rule that includes only the rows where the value of the CustomerType column is equal to enterprise
or retail
.
Now suppose that your Billing table does not include a CustomerType column, but it does include a CustomerID column. You have another table, CustomerInfo, that contains both the CustomerID and the CustomerType columns. You can create a data protection rule that effectively joins the Billing and CustomerInfo tables and filters rows based on the CustomerType values.
Filter method | Description | Example |
---|---|---|
Include rows | Include only the rows that fit the criteria of the specified values in the specified column of the asset. | Include rows if the CustomerType column has values that are equal to enterprise or retail . |
Exclude rows | Exclude the rows that fit the criteria of the specified values in the specified column of the asset. | Exclude rows if the CustomerType column has values that are equal to government . |
Include rows based on reference asset | Include only the rows that fit the criteria of the specified values in the specified column of the reference asset. | Include rows if the CustomerID column values match values in the CustomerID column of the reference data asset for rows where the CustomerType column has values that are equal to enterprise or retail . |
Exclude rows based on reference asset | Exclude the rows that fit the criteria of the specified values in the specified column of the reference asset. | Exclude rows if the CustomerID column values match values in the CustomerID column of the reference data asset for rows where the CustomerType column has values that are equal to government . |
Row filtering conditions
The row filtering conditions specify the column and values to filter.
If you choose the Include rows or Exclude rows method, you select a column name, select an operator, and specify a column name. Available operators include equals, greater than, lesser than, greater than equals (greater than or equals), or lesser than equals (lesser than or equals). You can add more conditions.
If you choose the Include rows based on reference asset or Exclude rows based on reference asset method, you select the reference asset and then match the column in the reference asset to the column in the target asset. Then, you specify a column name in the reference asset to filter on, select an operator, and specify a column name. Available operators include equals, greater than, lesser than, greater than equals (greater than or equals), or lesser than equals (lesser than or equals). You can add more conditions.
Value guidelines
- Values that include only numbers are processed as a numeric data type.
- Values that include only numbers that are enclosed by single quotation marks are processed as a string data type.
- Values that include any characters that are not numbers are processed as a string data type.
- Wildcards are not supported. You must include actual values.
- Values are case-sensitive. For example, if you create a rule to filter rows based on the value of
government
, the rule does not filter rows with the value ofGovernment
.
Reference asset guidelines
- The reference data asset must have a column with some of the same values as the specified matching column in the target data assets or the rule isn't applied.
- If the reference data asset is in a different catalog from the target asset, the target asset preview can't be seen and the asset can't be downloaded.
- If the reference data asset references a different connection asset from the target asset, the target asset preview can't be seen and the asset can't be downloaded.
- For Apache Hive data assets, the reference asset must belong to the same database as the source data asset.
Limitations to filtering rows
Row filtering runs an SQL statement to enforce the rule on the target data asset. The SQL statement is run exactly as coded. The only way to know whether the rule works as expected is to examine the schema of the data assets in governed catalogs
that have matching criteria. Any data assets with columns that have similar values but different column names are not affected. For example, if the rule specified a column name of CustomerID, the rule does not affect assets
with the column name CustomerIdentifier. You might not receive any direct indication of a rule that did not trigger due to missing or incorrect filter criteria, however administrators can inspect the policy enforcement log
to view any such situations. Your data quality must also be high. The values that your data protection rule with row filtering depend on must be consistent. For example, if government
is misspelled as goverment
or
written as Government
, that row is not filtered out.
If your filtering criteria includes a reference data asset, the data access might encounter performance delays inherent to the join. For example, the lack of proper indexing or SQL errors due to a missing join object can cause delays.
Previews and downloads might not be available
You cannot preview or download an asset that is affected by data protection rules that filter rows in the following situations:
- The data asset is affected by more than one data protection rule that filters rows by more than one reference asset.
- The data asset is affected by a data protection rule that filters rows by a reference asset that is not in same catalog as the source data asset.
- The data asset is affected by a data protection rule that filters rows by a reference asset that does not use the same connection as the source data asset.
Learn more
Parent topic: Designing data protection rules