CSV file for term assignment based on rules

Last updated: Jul 04, 2025

Create a CSV file named ikc-term-assignment-rules.csv that defines the rules for term assignment, and upload it to the project. The CSV file must conform to the formatting rules described in this topic.

General formatting rules

The CSV file must comply with RFC 4180, Common Format and MIME Type for Comma-Separated Values (CSV) Files, and must be encoded in UTF-8.

Limitations

The maximum recommended size of the CSV import file is 50 MB.

Header row

The header row of the CSV file represents the properties that make up the rule and the action to take.

Follow these guidelines for the header row:

The header row must be the first row in the file and must not be repeated.
Separate column names with a comma. If you create the file in a spreadsheet editor, the commas are added automatically when you save the file in CSV format.
The header row must include the mandatory columns for the rule.
You can omit any optional columns.
You can add arbitrary other columns, which will be ignored.
Use the exact column names in the header row. Column names are case-sensitive.
Make sure the column names do not include extra white space characters. A spreadsheet or text editor might add white space characters that are not visible. If you receive an import error that the column names are incorrect even though your columns are spelled and capitalized correctly, check for white space.
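
The header guidelines above can be checked before upload. The following Python sketch is only an illustration: the function name check_header is hypothetical, and treating "TERM_NAME or TERM_ID" as a header-level requirement is an assumption based on the column descriptions later in this topic.

```python
import csv
import io

# Mandatory condition columns named in this topic.
MANDATORY = ["OBJECT_TYPE", "PROPERTY", "MATCH_STRING", "MATCH_TYPE"]


def check_header(csv_text):
    """Return a list of problems found in the header row (empty if none)."""
    header = next(csv.reader(io.StringIO(csv_text)), [])
    problems = []
    for name in header:
        # Invisible leading or trailing white space makes a name "incorrect".
        if name != name.strip():
            problems.append(f"column name {name!r} has surrounding white space")
    for name in MANDATORY:
        # Exact, case-sensitive comparison.
        if name not in header:
            problems.append(f"missing mandatory column {name}")
    # Assumption: at least one of the two term columns must be in the header.
    if "TERM_NAME" not in header and "TERM_ID" not in header:
        problems.append("missing TERM_NAME or TERM_ID")
    return problems
```

For example, a header that contains "PROPERTY " with a trailing space is reported twice: once for the white space and once because the exact name PROPERTY is missing.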

Column specification

To delimit values for different columns, use a comma. If you create the file in a spreadsheet editor, the commas are added automatically when you save the file in CSV format.

To omit a value for a column, use a comma directly after the previous comma and without any other characters. For example, two consecutive commas indicate that the second column is empty.

To enclose fields, use double quotation marks (").
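
If you generate the file with a script instead of a spreadsheet editor, a CSV library can handle the delimiters and quotation marks. The following sketch uses Python's standard csv module; the match string "city, state" is a made-up value chosen only because it contains a comma.

```python
import csv
import io

buffer = io.StringIO()
writer = csv.writer(buffer)  # default dialect: comma delimiter, double-quote enclosure
writer.writerow(["OBJECT_TYPE", "PROPERTY", "MATCH_TYPE", "MATCH_STRING", "TERM_NAME", "CONFIDENCE"])
# The match string contains a comma, so the writer encloses it in double
# quotation marks. The empty CONFIDENCE value becomes a trailing comma.
writer.writerow(["column", "name", "contains", "city, state", "GDPR >> personal data", ""])
csv_text = buffer.getvalue()
print(csv_text)

# To write a real file, open it with encoding="utf-8" and newline="":
# with open("ikc-term-assignment-rules.csv", "w", encoding="utf-8", newline="") as f:
#     f.write(csv_text)
```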

Term category paths

You must specify the full category path for a term. To delimit the category path, use two greater-than (>>) symbols between each level of the category hierarchy and between the category path and the artifact name. If you start the path with >>, the root category is [uncategorized].
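
As an illustration of the path syntax, the following Python sketch splits a term path into its category levels and the term name. The function name split_term_path is hypothetical, and trimming white space around >> is an assumption; this topic does not state how the product treats it.

```python
def split_term_path(path):
    """Split 'Cat1 >> Cat2 >> Term' into (list_of_categories, term_name)."""
    # Assumption: white space around ">>" is not significant.
    parts = [part.strip() for part in path.split(">>")]
    if len(parts) < 2:
        raise ValueError("a full category path is required, for example 'GDPR >> personal data'")
    # A path that starts with ">>" has an empty first part, which stands for
    # the [uncategorized] root category.
    if parts[0] == "":
        parts[0] = "[uncategorized]"
    *categories, term = parts
    return categories, term
```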

Rule columns

The CSV file can contain mandatory and optional columns.

To define the rule condition, include these columns:

OBJECT_TYPE

The type of object where terms should be assigned. Valid values:

asset
column

This column is mandatory and must not be empty.

PROPERTY

The property to match. Valid values:

name
description
mostfreqvalues
Any of the most frequent values of the data profile. Rules with this property require data profiling before the rule can be properly applied.
OBJECT_TYPE must be column.
dataclassname
The name of the data class that is assigned to a column.
OBJECT_TYPE must be column.
assetid
The ID of the data asset.

This column is mandatory and must not be empty.

MATCH_STRING

The string to match against the property. You can set any value. This column is mandatory and must not be empty.

MATCH_TYPE

Describes how the match string should be matched against the property. This column is mandatory and must not be empty. Valid values:

equals
Case-insensitive exact match.
equalscs
Case-sensitive exact match.
contains
Match if the property contains the match string. Matching is case-insensitive.
containscs
Match if the property contains the match string. Matching is case-sensitive.
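
The four match types can be pictured with a small Python sketch. It is not the product's implementation: the function name matches is hypothetical, and str.lower() is used here as a simple approximation of case-insensitive comparison.

```python
def matches(property_value, match_string, match_type):
    """Return True if property_value satisfies the given match type."""
    if match_type == "equals":
        return property_value.lower() == match_string.lower()
    if match_type == "equalscs":
        return property_value == match_string
    if match_type == "contains":
        return match_string.lower() in property_value.lower()
    if match_type == "containscs":
        return match_string in property_value
    raise ValueError(f"unknown MATCH_TYPE: {match_type!r}")
```

With these definitions, the column name Customer_Address matches the string address for contains but not for containscs.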

To define which terms to assign with which confidence, include these columns:

TERM_NAME

The name of the term, including the category path as described in Term category paths. For example, Category1 >> Category2 >> MyTerm.

Either TERM_NAME or TERM_ID must be present. You can specify both. In that case, TERM_ID takes precedence. If you plan to use the rules file in different systems with similar terms and category hierarchies, use term names instead of term IDs.

TERM_ID

The ID of the term. You can use the artifact ID or the global ID.

CONFIDENCE

A float value between 0 and 1 that indicates the confidence to assign. The default value is 1.0 (100%). Regardless of the locale, the decimal separator is a period (.).
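
A script that prepares or checks rule files could validate CONFIDENCE values as in the following Python sketch. The function name parse_confidence is hypothetical. Python's float() always expects a period as the decimal separator, so a value such as 0,9 is rejected.

```python
def parse_confidence(raw, default=1.0):
    """Convert a CONFIDENCE cell to a float, using the default for empty cells."""
    text = raw.strip()
    if not text:
        return default
    value = float(text)  # raises ValueError for "0,9" or other non-numeric text
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"confidence must be between 0 and 1, got {text!r}")
    return value
```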

Additional columns that you can include:

ACTIVE

If you set the value to no, the rule is not considered during assignment. During development, you might want to disable certain rules without removing them from the CSV file.

GROUP

A group of rules that allows you to set up more complex assignment rules. For example: if a column name contains X and its description contains Y, then assign terms T1 and T2.

At least one condition and one action must be defined per rule group.

Rule file options

In the description field of the uploaded rule file, you can supply additional options that influence how rules are applied. Add each option on its own line in the format <option-name>=<option-value>. The description field can also contain any other text.

default_confidence_if_missing

A float value between 0 and 1 that sets the confidence to use instead of 1.0 when the CONFIDENCE column is empty.

use_expanded_names

Defines when a generated name should also be considered when rules are evaluated. This option is valid only if gen AI-based enrichment capabilities are enabled in IBM Knowledge Catalog Standard or IBM Knowledge Catalog Premium.

Possible values:

NEVER: Do not consider generated names.
SUGGESTED: Consider a suggested generated name.
ACCEPTED: Consider an assigned generated name.

Default value is ACCEPTED.

use_generated_descriptions

Defines when a generated description should also be considered as a description when rules are evaluated. This option is valid only if gen AI-based enrichment capabilities are enabled in IBM Knowledge Catalog Standard or IBM Knowledge Catalog Premium.

Possible values:

NEVER: Do not consider generated descriptions.
SUGGESTED: Consider a suggested generated description.
ACCEPTED: Consider an assigned generated description.

Default value is ACCEPTED.

Examples

Rule examples

The following example describes three rules:

If a column has a name that contains the string address, assign term personal data with 100% confidence. 100% is the default if the CONFIDENCE column is empty.
If a column has a name that contains the string customer, assign term data subject with 90% confidence.
If an asset has a description that contains the string client, also assign term data subject, but with 100% confidence.

The term names are written as a path in the category tree: GDPR is a root category that contains the terms personal data and data subject.

The COMMENT column contains additional information about the rule but does not affect term assignment.

OBJECT_TYPE	PROPERTY	MATCH_TYPE	MATCH_STRING	TERM_NAME	CONFIDENCE	COMMENT
column	name	contains	address	GDPR >> personal data		Address is personal data
column	name	contains	customer	GDPR >> data subject	0.9	Customers are data subjects
asset	description	contains	client	GDPR >> data subject		Clients are data subjects
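
The table above shows the rules as rendered columns. In the CSV file itself, the values are separated by commas. The following Python sketch shows what the file content might look like and reads it back with the standard csv module; the fallback to 1.0 for an empty CONFIDENCE cell mirrors the default described in this topic.

```python
import csv
import io

RULES_CSV = """\
OBJECT_TYPE,PROPERTY,MATCH_TYPE,MATCH_STRING,TERM_NAME,CONFIDENCE,COMMENT
column,name,contains,address,GDPR >> personal data,,Address is personal data
column,name,contains,customer,GDPR >> data subject,0.9,Customers are data subjects
asset,description,contains,client,GDPR >> data subject,,Clients are data subjects
"""

rules = list(csv.DictReader(io.StringIO(RULES_CSV)))
for rule in rules:
    # Two consecutive commas leave CONFIDENCE empty, so the default applies.
    confidence = float(rule["CONFIDENCE"] or 1.0)
    print(rule["OBJECT_TYPE"], rule["PROPERTY"], rule["MATCH_STRING"],
          "->", rule["TERM_NAME"], confidence)
```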

Rule group example

The following example shows a rule group G1 that joins two conditions and a rule group G2 that defines two terms to be assigned for one condition:

G1: If a column's name contains address and its description contains identifier, then assign term online identifier with 92% confidence.
G2: If a column has postfach ("P.O. Box" in German) as one of its most frequent values, then assign term European Union with 90% confidence and term data subject with 95% confidence.

OBJECT_TYPE	PROPERTY	MATCH_TYPE	MATCH_STRING	TERM_NAME	CONFIDENCE	GROUP
column	name	contains	address			G1
column	description	contains	identifier	GDPR >> online identifier	0.92	G1
column	mostfreqvalues	contains	postfach	GDPR >> European Union	0.9	G2
				GDPR >> data subject	0.95	G2
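
To make the grouping concrete, the following Python sketch evaluates the two groups above against the properties of one column. It is a simplified model, not the product's logic. It assumes that all conditions in a group must match, that every term listed in a matching group is assigned, and it implements only the contains match type. The names assign_terms and _contains are hypothetical.

```python
import csv
import io
from collections import defaultdict

RULES_CSV = """\
OBJECT_TYPE,PROPERTY,MATCH_TYPE,MATCH_STRING,TERM_NAME,CONFIDENCE,GROUP
column,name,contains,address,,,G1
column,description,contains,identifier,GDPR >> online identifier,0.92,G1
column,mostfreqvalues,contains,postfach,GDPR >> European Union,0.9,G2
,,,,GDPR >> data subject,0.95,G2
"""


def _contains(value, needle):
    # Case-insensitive "contains"; other match types are omitted for brevity.
    return needle.lower() in value.lower()


def assign_terms(rows, properties):
    """Return {term_name: confidence} for one column.

    rows       -- rule rows as dicts keyed by CSV column name
    properties -- property name -> list of candidate strings
    """
    groups = defaultdict(lambda: {"conditions": [], "actions": []})
    for row in rows:
        group = groups[row["GROUP"]]
        if row["PROPERTY"]:
            group["conditions"].append((row["PROPERTY"], row["MATCH_STRING"]))
        if row["TERM_NAME"]:
            group["actions"].append((row["TERM_NAME"], float(row["CONFIDENCE"] or 1.0)))

    assigned = {}
    for group in groups.values():
        # Assumption: a group applies only if every one of its conditions matches.
        if group["conditions"] and all(
            any(_contains(value, needle) for value in properties.get(prop, []))
            for prop, needle in group["conditions"]
        ):
            assigned.update(group["actions"])
    return assigned


RULES = list(csv.DictReader(io.StringIO(RULES_CSV)))
```

A column named home_address whose description mentions an identifier and whose most frequent values include a Postfach entry would receive all three terms under these assumptions.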

Sample rule file description

The following example is a valid rule file description:

This is the best rule file in the world.

default_confidence_if_missing = 0.95
use_expanded_names = ACCEPTED
use_generated_descriptions = SUGGESTED

Closing remarks.
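
A description like the sample above mixes free text with option lines. The following Python sketch shows one way a script could pick out the options. The function name parse_options is hypothetical, and tolerating spaces around the equal sign is an assumption based on the sample.

```python
import re

KNOWN_OPTIONS = {
    "default_confidence_if_missing",
    "use_expanded_names",
    "use_generated_descriptions",
}

# One option per line: <option-name>=<option-value>, optionally with spaces.
_OPTION_LINE = re.compile(r"^\s*([A-Za-z_]+)\s*=\s*(\S+)\s*$")


def parse_options(description):
    """Return the recognized options in a rule file description as a dict of strings."""
    options = {}
    for line in description.splitlines():
        match = _OPTION_LINE.match(line)
        # Lines that are not known options are treated as ordinary text.
        if match and match.group(1) in KNOWN_OPTIONS:
            options[match.group(1)] = match.group(2)
    return options
```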
