Data governance (Watson Knowledge Catalog)

Data governance is the process of tracking and controlling data assets based on asset metadata. Catalogs are workspaces where you provide controlled access to governed assets.

Required service
Watson Knowledge Catalog

A catalog contains assets and collaborators. Collaborators are the people who add assets into the catalog and the people who need to use the assets. You can customize data governance to enrich and control data assets in catalogs.

Learn more about governance approaches, or get started with catalogs and governance, in the sections that follow.

Data governance approaches

You can set up data governance in an iterative manner. You can start with a simple implementation of data governance that relies on predefined artifacts and default features. Then, as your needs change, you can customize your data governance framework to better describe and protect your data assets.

To see the tools that you can use to govern data, open the tools and services map and click Governance in the tasks section.

Simplest implementation of data governance

You use a catalog to share assets across your organization. A catalog can act as a feature store by containing data sets with columns that are used as features (inputs) in machine learning models. A Watson Knowledge Catalog administrator must create a catalog for sharing assets and add data engineers, data scientists, and business analysts as collaborators.

Catalogs store and track assets. Projects are where users prepare data assets and build models. Assets move between the catalog and projects.

Catalog collaborators can add assets to the catalog to share with others or find and use assets in the following ways:

  • Data engineers add cleansed data, virtualized data, and integrated data to the catalog.
  • Data engineers import tables or files from a data source to the catalog.
  • Data scientists and business analysts find data assets in catalogs and add them to projects to work with the data.

Data assets accumulate metadata over time in the following ways:

  • Data assets are profiled, which automatically assigns predefined data classes that describe the format of the data.
  • Catalog collaborators add tags, relationships, and ratings to assets, along with predefined business terms, data classes, and classifications.
  • All actions on assets are automatically saved in the asset history.
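
As a rough illustration of the profiling step described above, the following sketch assigns a data class to each column by matching values against format patterns. It is not how Watson Knowledge Catalog implements profiling: the class names, the regular expressions, the `threshold` parameter, and the function names are all assumptions made for this example.

```python
import re

# Hypothetical data classes: name -> pattern that a value must fully match.
# These patterns are illustrative assumptions, not the product's definitions.
DATA_CLASSES = {
    "Email Address": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
    "US Zip Code": re.compile(r"\d{5}(-\d{4})?"),
    "Date (ISO 8601)": re.compile(r"\d{4}-\d{2}-\d{2}"),
}


def profile_column(values, threshold=0.8):
    """Return the best-matching data class for a column, or None.

    A class is assigned only if at least `threshold` of the non-empty
    values match its pattern (the threshold is an assumed parameter).
    """
    non_empty = [v for v in values if v]
    if not non_empty:
        return None
    best_name, best_ratio = None, 0.0
    for name, pattern in DATA_CLASSES.items():
        matches = sum(1 for v in non_empty if pattern.fullmatch(v))
        ratio = matches / len(non_empty)
        if ratio > best_ratio:
            best_name, best_ratio = name, ratio
    return best_name if best_ratio >= threshold else None


def profile_table(columns):
    """Map each column name to its assigned data class (or None)."""
    return {name: profile_column(values) for name, values in columns.items()}


result = profile_table({
    "contact": ["a@example.com", "b@example.org", ""],
    "zip": ["12345", "54321-0001"],
    "notes": ["hello", "2024-01-01"],
})
```

In this sketch a column such as `notes`, where only half of the values look like dates, is left unclassified, so a collaborator would assign a data class manually.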

See Creating a catalog.

Customization options for data governance

You can add any of these customization options to your data governance implementation, or update them, at any time. When your data changes, you can reimport metadata about the tables or files and enrich your data assets with your business vocabulary and data quality analysis. You can create increasingly precise rules to protect data as you expand your business vocabulary. Throughout the data governance cycle, your data scientists and other data consumers can find trusted data in catalogs. The following illustration shows how data governance is a continuous cycle of refreshing the metadata for data assets to reflect changes in the data and changes in your business vocabulary.

[Figure: The cycle of data governance tasks]

Establish your business vocabulary

  • Your governance team can establish a business vocabulary that describes the meaning of data with business terms and the format of data with data classes. A business vocabulary helps your business users more easily find what they are looking for using nontechnical terms.
  • Your team can quickly establish your business vocabulary by importing your existing business vocabulary or by importing Knowledge Accelerators, which provide from dozens to thousands of governance artifacts.
  • Your Watson Knowledge Catalog administrator can customize the workflow, organization, properties, and relationships of governance artifacts.

See Planning to implement a governance framework.

Import and enrich data assets with your business vocabulary

  • Data stewards can regularly run metadata import and enrichment jobs that update the catalog with changes to tables or files from your data sources and automatically assign the appropriate business terms and data classes.
  • When your team adds governance artifacts, the metadata enrichment jobs suggest the new artifacts for new or updated data assets.
  • When data stewards confirm or adjust business term assignments during metadata enrichment, the machine learning algorithms for term assignment become more accurate for your data.
  • Data stewards can configure metadata import and enrichment to run only when changes are detected.
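
The idea of running import and enrichment only when changes are detected can be sketched as a fingerprint comparison. This is an illustration under stated assumptions, not the product's change-detection mechanism: the table description format (a dict with `name` and `columns`) and both function names are hypothetical.

```python
import hashlib
import json


def schema_fingerprint(table):
    """Hash the parts of a table description that should trigger a re-import.

    `table` is a plain dict with the hypothetical keys "name" and "columns"
    (a list of [column_name, column_type] pairs).
    """
    canonical = json.dumps(
        {"name": table["name"], "columns": table["columns"]},
        sort_keys=True,
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def tables_to_refresh(source_tables, known_fingerprints):
    """Return the names of tables that are new or whose schema changed.

    `known_fingerprints` maps table name -> fingerprint from the last run
    and is updated in place.
    """
    changed = []
    for table in source_tables:
        fingerprint = schema_fingerprint(table)
        if known_fingerprints.get(table["name"]) != fingerprint:
            changed.append(table["name"])
            known_fingerprints[table["name"]] = fingerprint
    return changed


known = {}
tables = [
    {"name": "customers", "columns": [["id", "int"], ["email", "varchar"]]},
    {"name": "orders", "columns": [["id", "int"]]},
]
first_run = tables_to_refresh(tables, known)    # every table is new
second_run = tables_to_refresh(tables, known)   # nothing changed
tables[1]["columns"].append(["total", "decimal"])
third_run = tables_to_refresh(tables, known)    # only "orders" changed
```

Only the tables returned by `tables_to_refresh` would then be passed to the import and enrichment steps, which keeps repeated runs cheap when the sources are stable.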

See Planning to curate data assets to share in catalogs.

Analyze data quality

  • Data stewards can analyze data quality with default settings during metadata enrichment. Data quality analysis is applied to each asset as a whole and to columns in tables.
  • Data stewards can create custom data quality definitions and apply them in data quality rules, or apply SQL-based data quality rules.
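
To make the notion of an SQL-based data quality rule concrete, the following sketch scores a table by the fraction of rows that do not violate a rule. It uses Python's built-in `sqlite3` module with an in-memory table. The table, the two rules, the scoring formula, and the `run_sql_rule` helper are assumptions for illustration and do not reflect how the service computes data quality scores.

```python
import sqlite3


def run_sql_rule(conn, table, failing_rows_sql):
    """Score a rule as the fraction of rows that do NOT violate it.

    `failing_rows_sql` is a SELECT statement that returns the violating rows.
    The SQL is interpolated directly, so use trusted input only.
    """
    total = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    failed = conn.execute(
        f"SELECT COUNT(*) FROM ({failing_rows_sql})"
    ).fetchone()[0]
    score = 1.0 if total == 0 else (total - failed) / total
    return {"total": total, "failed": failed, "score": score}


# Sample data with one missing email and one out-of-range age.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, email TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [
        (1, "a@example.com", 34),
        (2, None, 51),
        (3, "c@example.com", -4),
        (4, "d@example.com", 28),
    ],
)

missing_email = run_sql_rule(
    conn, "customers", "SELECT * FROM customers WHERE email IS NULL"
)
invalid_age = run_sql_rule(
    conn, "customers", "SELECT * FROM customers WHERE age < 0 OR age > 120"
)
```

Writing each rule as a query that returns the failing rows makes it possible to report both a score and the specific records that need attention.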

See Planning to curate data assets to share in catalogs.

Protect your data with rules

  • Your governance team can create a plan for data protection rules by writing policies that document your organization’s standards and guidelines for protecting and managing data. For example, a policy can describe a specific regulation and how a data protection rule ensures compliance with that regulation.
  • Your governance team can create data protection rules that define how to keep private information private. Data protection rules are automatically evaluated for enforcement every time a user attempts to access a data asset in any governed catalog on the platform. Data protection rules can define how to control access to data, mask sensitive values, or filter rows from data assets.
  • Your team can start with data protection rules that are based on custom tags, users, or predefined data classes, business terms, and classifications. When your governance team adds governance artifacts, the team can define data protection rules based on your business vocabulary.
  • Data engineers can enforce data protection rules on virtualized data.
  • Data engineers can permanently mask data in data assets with masking flows.
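
The three rule actions described above (controlling access, masking values, and filtering rows) can be sketched as a small evaluator that runs on every access attempt. This is a simplified model, not the product's rule engine or API: the `Rule` class, the `access` function, the asset layout, and the evaluation order (deny first, then filter, then mask) are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Rule:
    name: str
    applies: Callable[[dict, dict], bool]  # (user, asset) -> does the rule match?
    action: str                            # "deny", "mask", or "filter"
    mask_classes: frozenset = frozenset()  # data classes to mask (for "mask")
    keep_row: Optional[Callable[[dict], bool]] = None  # rows to keep (for "filter")


def access(user, asset, rules):
    """Evaluate all matching rules and return the rows the user may see.

    Raises PermissionError if any matching rule denies access.
    """
    matching = [rule for rule in rules if rule.applies(user, asset)]
    for rule in matching:
        if rule.action == "deny":
            raise PermissionError(f"Access denied by rule: {rule.name}")
    # Copy the rows so that masking never alters the stored asset.
    rows = [dict(row) for row in asset["rows"]]
    for rule in matching:
        if rule.action == "filter":
            rows = [row for row in rows if rule.keep_row(row)]
    masked_columns = {
        column
        for rule in matching if rule.action == "mask"
        for column, data_class in asset["column_classes"].items()
        if data_class in rule.mask_classes
    }
    for row in rows:
        for column in masked_columns:
            row[column] = "XXXXXXXX"
    return rows


asset = {
    "tags": {"customer"},
    "column_classes": {"name": "Person Name", "email": "Email Address",
                       "country": "Country Code"},
    "rows": [
        {"name": "Ana", "email": "ana@example.com", "country": "DE"},
        {"name": "Bo", "email": "bo@example.com", "country": "US"},
    ],
}
rules = [
    Rule("Block contractors from customer data",
         lambda u, a: u["role"] == "contractor" and "customer" in a["tags"],
         "deny"),
    Rule("Mask email addresses for analysts",
         lambda u, a: u["role"] == "analyst",
         "mask", mask_classes=frozenset({"Email Address"})),
    Rule("Analysts see only DE rows",
         lambda u, a: u["role"] == "analyst",
         "filter", keep_row=lambda row: row["country"] == "DE"),
]

analyst_view = access({"role": "analyst"}, asset, rules)
```

In this model an analyst sees a single row with the email value replaced, a contractor is refused with `PermissionError`, and a user that no rule matches sees the data unchanged. Because the rules refer to tags and data classes rather than to specific tables, one rule can cover every asset that carries the matching metadata.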

See Planning to protect data with rules.

Getting started with Watson Knowledge Catalog

The tasks to get started with Watson Knowledge Catalog depend on your goal. The actions that you can take are defined by your Cloud Pak for Data service access roles. Some actions also have workspace role requirements, such as being a collaborator in a catalog or category.

To check your service access roles, see Determining your IBM Cloud account and service access roles. To understand your Watson Knowledge Catalog roles, see User roles and permissions.

The following table shows common goals, the required Cloud Pak for Data service access roles, and links to information to get you started.

| Goal | Required Cloud Pak for Data service access role | More information |
| --- | --- | --- |
| Set up or administer Watson Knowledge Catalog | Manager | Planning to implement data governance; Setting up Watson Knowledge Catalog; Managing Watson Knowledge Catalog |
| Find assets or features in a catalog | Any role | Finding assets in a catalog; Searching for assets across the platform; Adding a catalog asset to a project |
| Curate data | CloudPak Data Steward or CloudPak Data Engineer | Curating data; Planning to curate data |
| Manage data quality | CloudPak Data Steward or CloudPak Data Engineer | Managing data quality |
| Create governance artifacts | CloudPak Data Steward or CloudPak Data Engineer | Managing governance artifacts; Importing Knowledge Accelerators; Planning to implement a governance framework |
| Create data protection rules | CloudPak Data Steward or CloudPak Data Engineer | Data protection rules; Planning to protect data with rules |
| Run Watson Knowledge Catalog APIs | The same role for performing the task in the UI. | Watson APIs |
| Generate reports on Watson Knowledge Catalog | Reporting administrator | Setting up reporting |
