Data governance use case

Many enterprises struggle to balance the benefits of providing access to data with the need to protect sensitive data. Cloud Pak for Data as a Service provides the methods that your enterprise needs to automate data governance so that data is both accessible and protected.

Challenges

Many enterprises face the following data governance challenges:

Providing data privacy at scale
Organizations must comply with data privacy regulations for data in data sources across multiple cloud platforms and on-premises.
Accessing high-quality data
Organizations must provide access to high-quality enterprise data across multiple teams.
Creating a complete customer profile
Teams need to build accurate views of customers quickly and at scale to optimize self-service processes and data stewardship.
Providing self-service data consumption
Data consumers, such as data scientists, struggle to find and use the data that they need.

You can solve these challenges by implementing a data fabric with Cloud Pak for Data as a Service.

Example: Golden Bank's challenges

Follow the story of Golden Bank as the governance team implements data governance. Golden Bank has a large amount of customer and mortgage data that includes sensitive data. The bank wants to ensure the quality of the data, mask the sensitive data, and make it available for use across several departments.

Process

How you implement data governance depends on the needs of your organization. You can implement data governance in a linear or iterative manner. You can rely on default features and predefined artifacts, or customize your solution.

To implement data governance, your organization might follow this process:

  1. Establish your business vocabulary
  2. Define rules to protect your data
  3. Curate your data
  4. Share or work with your data

The IBM Knowledge Catalog service in Cloud Pak for Data provides the tools and processes that your organization needs to implement a data governance solution.

Figure: The flow of assets in the data governance use case

1. Establish your business vocabulary

To meet the challenges, your team needs to establish a business vocabulary by importing or creating governance artifacts that act as metadata to classify and describe the data:

  • Before you can automate data privacy, your team needs to ensure that the data to control is accurately identified.
  • Before you can analyze data quality, you need to identify the format of the data.
  • To make data easy to find, your team needs to ensure that the content of the data is accurately described.

In this first step of the process, your governance team can build on the foundation of the predefined governance artifacts and create custom governance artifacts that are specific to your organization. You can create artifacts to describe the format, business meaning, sensitivity, range of values, and governance policies of the data.
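
For illustration only, the following sketch models governance artifacts as simple metadata records that can be attached to a column. The class names, regex pattern, and sample values are hypothetical assumptions, not the IBM Knowledge Catalog API.

```python
# A minimal sketch of governance artifacts as metadata records that describe a column.
# Names and values are hypothetical; this is not the IBM Knowledge Catalog API.
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BusinessTerm:
    name: str           # business meaning, for example "Personal identifier"
    category: str       # category that owns the artifact, for example "Banking"


@dataclass
class DataClass:
    name: str           # format or meaning of values, for example "US Social Security Number"
    match_pattern: str  # regex (or reference data) that identifies matching values


@dataclass
class ColumnMetadata:
    column_name: str
    business_terms: List[BusinessTerm] = field(default_factory=list)
    data_class: Optional[DataClass] = None


# Describing a column this way lets search, quality analysis, and data
# protection rules act on the column's meaning rather than its name.
ssn = DataClass("US Social Security Number", r"^\d{3}-?\d{2}-?\d{4}$")
column = ColumnMetadata(
    column_name="ID",
    business_terms=[BusinessTerm("Personal identifier", "Banking")],
    data_class=ssn,
)
print(column)
```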

Categories
What you can do: Use the predefined category to store your governance artifacts. Create categories to organize governance artifacts in a hierarchical structure similar to folders. Add collaborators with roles that define their permissions on the artifacts in the category.
Best to use when: You need more than the predefined category, or you want fine-grained control of who can own, author, and view governance artifacts.

Workflows
What you can do: Use the default workflow configuration, which does not restrict who creates governance artifacts or require reviews, or configure workflows for governance artifacts and designate who can create which types of governance artifacts in which categories.
Best to use when: You want to control who creates governance artifacts, or you want draft governance artifacts to be reviewed before they are published.

Governance artifacts
What you can do: Use the predefined business terms, data classes, and classifications. Create governance artifacts that act as metadata to enrich, define, and control data assets.
Best to use when: You want to add knowledge and meaning to assets to help people understand the data, or you want to improve data quality analysis.

Knowledge Accelerators
What you can do: Import a set of predefined governance artifacts to improve data classification, regulatory compliance, self-service analytics, and other governance operations.
Best to use when: You need a standard vocabulary to describe business issues, business performance, industry standards, and regulations, or you want to save time by importing pre-created governance artifacts.

Example: Golden Bank's business vocabulary

The governance team leader at Golden Bank starts by creating a category, Banking, to hold the governance artifacts that the team plans to create. The team leader adds the rest of the governance team members as collaborators to the Banking category with the Editor role so that they have permission to create governance artifacts. Then, the team leader configures workflows so that a different team member is responsible for creating each type of artifact. All workflows require an approval step by the team leader.

One governance team member imports a set of business terms from a spreadsheet. Some of the business terms are associated with the occupations of the personal clients. Another team member creates a reference data set, “Professions”, that contains a list of occupations, where each occupation has an ID number. A third team member creates a custom data class, “Profession”, to identify the profession of personal clients, based on the reference data set.
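
A custom data class that is based on a reference data set essentially checks how many of a column's values appear in that set. The following sketch illustrates the idea with a hypothetical, abbreviated "Professions" list and an assumed 75% match threshold; it is not how IBM Knowledge Catalog implements data class matching.

```python
# Illustrative only: assign the hypothetical "Profession" data class to a column
# when enough of its values appear in the "Professions" reference data.
PROFESSIONS = {"1": "Teacher", "2": "Software Engineer", "3": "Nurse"}  # abbreviated stand-in
PROFESSION_VALUES = {v.lower() for v in PROFESSIONS.values()}


def matches_profession(column_values, threshold=0.75):
    """Return True when the share of values found in the reference data
    meets the assumed match threshold."""
    values = [v for v in column_values if v]
    if not values:
        return False
    hits = sum(1 for v in values if v.strip().lower() in PROFESSION_VALUES)
    return hits / len(values) >= threshold


# 3 of 4 values match the reference data, so the column is classified.
print(matches_profession(["Teacher", "Nurse", "software engineer", "Pilot"]))  # True
```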

2. Define rules to protect your data

In the next step of the process, your team defines rules to ensure compliance with data privacy regulations by controlling who can see what data. Your team creates data protection rules to define how to protect data in governed catalogs. Your team can use these data protection rules to mask sensitive data based on the content, format, or meaning of the data, or the identity of the users who access the data.

Data protection rules
What you can do: Protect sensitive information from unauthorized access in governed catalogs by denying access to data, masking data values, or filtering rows in data assets. Dynamically and consistently mask data in governed catalogs at a user-defined granular level.
Best to use when: You need to automatically enforce data privacy across your governed catalogs, or you want to retain the availability and utility of data while you comply with privacy regulations.

Masking flows
What you can do: Use advanced format-preserving data masking capabilities when you extract copies or subsets of production data.
Best to use when: You need anonymized training data and test sets that retain data integrity.

Policies and governance rules
What you can do: Describe and document your organization’s guidelines, regulations, standards, or procedures for data security. Describe the required behavior or actions to implement the governance policy.
Best to use when: You want the people who use the data to understand the data governance policies.

Example: Golden Bank's data protection rules

To create a predictive model for mortgage approvals, Golden Bank's data scientists need access to data sets that include sensitive data. For example, the data scientists want to access the table with data about mortgage applicants, which includes a column with social security numbers.

A governance team member creates a data protection rule that masks social security numbers. If the assigned data class of a column in a data asset is "US Social Security Number", the values in that column are replaced with 10 Xs.

A governance team member creates a policy that includes the data protection rule. The policy describes the business reasons for implementing the rule.
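
The following sketch shows, in simplified form, the effect of such a rule: any column whose assigned data class is "US Social Security Number" is returned with its values replaced by 10 Xs. The function and sample data are hypothetical; in the product, enforcement happens automatically in governed catalogs.

```python
# Simplified illustration of the rule's effect; not the actual rule engine.
MASKED_SSN = "X" * 10   # 10 Xs, as described in the rule


def apply_protection_rule(rows, column_data_classes):
    """Mask columns whose assigned data class is US Social Security Number."""
    masked = []
    for row in rows:
        masked.append({
            column: MASKED_SSN
            if column_data_classes.get(column) == "US Social Security Number"
            else value
            for column, value in row.items()
        })
    return masked


applicants = [{"NAME": "J. Doe", "ID": "123-45-6789"}]
classes = {"ID": "US Social Security Number"}        # assigned during metadata enrichment
print(apply_protection_rule(applicants, classes))    # [{'NAME': 'J. Doe', 'ID': 'XXXXXXXXXX'}]
```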

3. Curate data to share in catalogs

Data stewards curate high-quality data assets in projects and publish them to catalogs where the people who need the data can find them. Data stewards enrich the data assets by assigning governance artifacts as metadata that describes the data and informs the semantic search for data.
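
As a rough illustration of what metadata import collects, the sketch below reads table and column definitions from a SQLite database, used here as a stand-in for a Cloud Pak for Data connection, and turns them into asset records. The file name, sample table, and record layout are assumptions.

```python
# Illustrative only: read table and column definitions from a SQLite database
# (a stand-in for a governed connection) to build technical metadata records.
import sqlite3

conn = sqlite3.connect("mortgage.db")   # hypothetical data source
conn.execute("CREATE TABLE IF NOT EXISTS APPLICANTS (ID TEXT, NAME TEXT, INCOME INTEGER)")

tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]

assets = []
for table in tables:
    columns = [
        {"name": col[1], "type": col[2]}   # PRAGMA table_info returns (cid, name, type, ...)
        for col in conn.execute(f"PRAGMA table_info({table})")
    ]
    assets.append({"asset_name": table, "columns": columns})

print(assets)   # technical metadata, ready to be enriched with governance artifacts
conn.close()
```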

Metadata import
What you can do: Automatically import technical metadata for the data that is associated with a connection to create data assets.
Best to use when: You need to create many data assets from a data source, or you need to refresh the data assets that you previously imported.

Metadata enrichment
What you can do: Profile multiple data assets in a single run to automatically assign data classes and identify data types and formats of columns. Automatically assign business terms to assets and generate term suggestions based on data classification. Rerun the import and enrichment jobs at intervals to discover and evaluate changes to data assets.
Best to use when: You need to curate and publish many data assets that you imported.

Data quality analysis
What you can do: Run quality analysis on multiple data sets in a single run to scan for common dimensions of data quality, such as missing values or data class violations. Continuously track changes to the content and structure of data, and analyze changed data on a recurring basis.
Best to use when: You need to know whether the quality of your data might affect the accuracy of your data analysis or models, or your users need to identify which data sets to remediate.

Example: Golden Bank's data curation

The data stewards on the governance team start importing metadata to create data assets in a project. After metadata import, Golden Bank has two data assets that represent tables with a column that is named "ID". After metadata enrichment, those columns are clearly differentiated by their assigned metadata:

  • One column is assigned the business terms “Occupation” and “Profession”, and the data class “Profession”.
  • The other column is assigned the business terms "Personal identifier" and "Private individual" and the data class "US Social Security Number".

The data stewards run data quality analysis on the data assets to make sure that the overall data quality score exceeds the Golden Bank threshold of 95%.
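
A data quality score can be thought of as the fraction of cell-level checks that pass. The sketch below computes such a score for completeness and data class violations and compares it with Golden Bank's 95% threshold; the checks, column names, and data are illustrative and much simpler than the product's quality dimensions.

```python
# Illustrative only: overall quality score = passed checks / total checks.
import re

SSN_PATTERN = re.compile(r"^\d{3}-\d{2}-\d{4}$")


def quality_score(rows):
    """Check each cell for completeness and, for the ID column, for
    violations of its assigned data class."""
    passed = total = 0
    for row in rows:
        for column, value in row.items():
            total += 1
            if value in (None, ""):
                continue                                  # missing value: check fails
            if column == "ID" and not SSN_PATTERN.match(value):
                continue                                  # data class violation: check fails
            passed += 1
    return passed / total if total else 0.0


rows = [
    {"NAME": "J. Doe", "ID": "123-45-6789"},
    {"NAME": "A. Roe", "ID": "987-65-4321"},
    {"NAME": "", "ID": "not-an-ssn"},
]
score = quality_score(rows)
print(f"{score:.0%}", "meets threshold" if score >= 0.95 else "needs remediation")
```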

The governance team leader creates a catalog, "Mortgage Approval Catalog", and adds the data stewards and data scientists as catalog collaborators. The data stewards publish the data assets that they created in the project into the catalog.

4. Share or work with your data

The catalog helps your teams understand your data and makes the right data available for the right use. Data scientists and other types of users can help themselves to the data that they need while they remain compliant with corporate access and data protection policies. They can add data assets from a catalog into a project, where they collaborate to prepare, analyze, and model the data.

Catalogs
What you can do: Organize your assets to share among the collaborators in your organization. Take advantage of AI-powered semantic search and recommendations to help users find what they need.
Best to use when: Your users need to easily understand, collaborate on, enrich, and access high-quality data. You want to increase visibility of data and collaboration between business users. You need users to view, access, manipulate, and analyze data without understanding its physical format or location, and without having to move or copy it. You want users to enhance assets by rating and reviewing them.

Global search
What you can do: Search for assets across all the projects, catalogs, and deployment spaces to which you have access. Search for governance artifacts across the categories to which you have access.
Best to use when: You need to find data or another type of asset, or a governance artifact.

Data Refinery
What you can do: Cleanse data to fix or remove data that is incorrect, incomplete, improperly formatted, or duplicated. Shape data to customize it by filtering, sorting, combining, or removing columns.
Best to use when: You need to improve the quality or usefulness of data.

Example: Golden Bank's catalog

The data scientists find the data assets that they need in the catalog and copy those assets to a project. In their project, the data scientists can refine the data to prepare it for training a model.
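
The sketch below uses pandas to stand in for the kinds of cleanse-and-shape steps the data scientists might apply in Data Refinery: removing duplicate records, filtering rows, and dropping columns that the model does not need. The column names, values, and filter are hypothetical.

```python
# Illustrative only: pandas stand-in for typical cleanse-and-shape steps.
import pandas as pd

applicants = pd.DataFrame({
    "ID": ["XXXXXXXXXX", "XXXXXXXXXX", "XXXXXXXXXX"],   # already masked by the protection rule
    "INCOME": [82000, 82000, 45000],
    "STATE": ["NY", "NY", "TX"],
    "NOTES": ["", "", "call back"],
})

prepared = (
    applicants
    .drop_duplicates()            # remove duplicated records
    .query("INCOME >= 50000")     # keep only the rows relevant to the model
    .drop(columns=["NOTES"])      # drop free-text columns not used for training
)
print(prepared)
```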

Tutorials for Data governance

Curate high-quality data
Description: Create high-quality data assets by enriching your data and running data quality analysis.
Expertise for tutorial: Run the Metadata import and Metadata enrichment tools.

Protect your data
Description: Control access to data across Cloud Pak for Data as a Service.
Expertise for tutorial: Create data protection rules.

Consume your data
Description: Find, shape, and analyze data.
Expertise for tutorial: Explore a catalog and run the Data Refinery tool.

Govern virtualized data
Description: Enrich virtualized data and ensure that virtual data is protected.
Expertise for tutorial: Use the Data Virtualization interface, projects, and catalogs to govern virtualized data.
