AI governance tutorial: Build and deploy a model

This tutorial is the first in a series of two tutorials. Take this tutorial to build, deploy, and track a model with the AI governance use case of the data fabric trial. Your goal is to train a model to predict which applicants qualify for mortgages and then deploy the model for evaluation. You must also set up tracking for the model to document the model history and generate an explanation for its performance.

The following animated image provides a quick preview of what you’ll accomplish by the end of the second tutorial, where you will use Watson OpenScale to configure and evaluate monitors for the deployed model to ensure that the model is accurate and treats all applicants fairly. Right-click the image and open it in a new tab to view a larger image.

Screenshots of the tutorial

The story for the tutorial is that Golden Bank wants to expand its business by offering low-rate mortgage renewals for online applications. Online applications expand the bank’s customer reach and reduce the bank’s application processing costs. As a data scientist at Golden Bank, you must create a mortgage approval model that avoids unanticipated risk and treats all applicants fairly. You will run a Jupyter Notebook to build a model and automatically capture metadata that tracks the model in an AI Factsheet.

In this tutorial, you will complete these tasks:

  • Create the sample project
  • Task 1: Set up tracking for your model
  • Task 2: Configure IBM OpenPages Model Risk Governance
  • Task 3: Create the model use case in the model inventory
  • Task 4: Run the notebook to create the model
  • Task 5: View the model's factsheet and associate it with a model use case
  • Task 6: Deploy the model
  • Task 7: Perform model risk assessment

If you need help with this tutorial, you can ask a question or find an answer in the Cloud Pak for Data Community discussion forum.

Tip: For the optimal experience completing this tutorial, open Cloud Pak for Data in one browser window, and keep this tutorial page open in another browser window to switch easily between the two applications. Consider arranging the two browser windows side-by-side to make it easier to follow along.

Side-by-side tutorial and UI

Preview the tutorial

Watch this video to preview the steps in this tutorial. There might be slight differences in the user interface shown in the video. The video is intended to be a companion to the written tutorial.

This video provides a visual method as an alternative to following the written steps in this documentation.

Prerequisites

The following prerequisites are required to complete this tutorial.

Access type: Services
  Description: Watson Studio; Watson Machine Learning; AI Factsheets; Watson Knowledge Catalog; OpenPages; Watson OpenScale; Db2
  Documentation: Watson Studio; Watson Machine Learning; AI Factsheets; Watson Knowledge Catalog; IBM OpenPages with Watson; Watson OpenScale; Db2

Access type: Role
  Description: Data Scientist
  Documentation: Predefined roles and permissions; Manage roles

Access type: Permissions
  Description: Manage information assets; Access advanced governance; Administer platform
  Documentation: Predefined roles and permissions; Manage roles

Access type: Additional access
  Description: Editor access to Default Catalog; Admin access to the OpenPages instance; MRG - All Permissions access role assigned in OpenPages; Admin access to the Watson OpenScale instance; Completed setup for Watson OpenScale
  Documentation: Add collaborators; Manage users for the OpenPages service; Assign and remove a role from a user or group; Manage users for the Watson OpenScale service; Automated setup

Access type: Additional configuration
  Description: Disable the "Enforce the exclusive use of secrets" setting
  Documentation: Require users to use secrets for credentials

Follow these steps to verify your roles and permissions. If your Cloud Pak for Data account does not meet all of the prerequisites, contact your administrator.

  1. Click your profile image in the toolbar.

  2. Click Profile and settings.

  3. Select the Roles tab.

The permissions that are associated with your role (or roles) are listed in the Enabled permissions column. If you are a member of any user groups, you inherit the roles that are assigned to that group. These roles are also displayed on the Roles tab, and the group from which you inherit the role is specified in the User groups column. If the User groups column shows a dash, that means the role is assigned directly to you.

Roles and permissions

Create the sample project

If you did not already create the sample project for this tutorial, follow these steps:

  1. Download the AI-governance.zip file.

  2. From the Cloud Pak for Data navigation menu Navigation menu, choose Projects > All projects.

  3. On the Projects page, click New project.

  4. Select Create a project from a file.

  5. Upload the previously downloaded ZIP file.

  6. On the Create a project page, copy and paste the project name and add an optional description for the project.

    AI governance
    
  7. Click Create.

  8. Click View new project to verify that the project and assets were created successfully.

  9. Click the Assets tab to view the project's assets.

Check your progress

The following image shows the sample project. You are now ready to start the tutorial.

Sample project

Tip: If you encounter a guided tour while completing this tutorial in the Cloud Pak for Data user interface, click Maybe later.

Task 1: Set up tracking for your model

You track models by adding model use cases to a catalog. You can create a new catalog, if you have access to create catalogs, or you can use the Default Catalog.

Option 1: Use the Default Catalog

Follow these steps to see whether you have Editor access to the Default Catalog:

  1. From the Cloud Pak for Data navigation menu Navigation menu, choose Catalogs > All catalogs.

  2. On the Catalogs page, open the Default Catalog. If you do not see the Default Catalog on the Catalogs page, then contact your administrator to request Editor access to the Default Catalog.

  3. Click the Access control tab.

  4. Verify that your access is Editor or higher. If your access is Viewer, then contact your administrator to request Editor access to the Default Catalog.

  5. Copy the catalog ID from the catalog URL. The catalog ID is the string after “/catalogs/” in the URL and before the first question mark. For example, if the URL contains …data/catalogs/bc40d84c-2b5f-4dbf-bdc6-c1e69c608326?context…, then the catalog ID is bc40d84c-2b5f-4dbf-bdc6-c1e69c608326. You will need the catalog ID later to associate the model use case in OpenPages with this catalog.
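
If you prefer to pull the catalog ID out of the URL programmatically rather than copying it by hand, the following minimal Python sketch shows one way to do it. The sample URL is illustrative only; substitute the URL from your own catalog page.

    # Minimal sketch: extract the catalog ID (the segment after /catalogs/) from a catalog URL
    from urllib.parse import urlparse

    def catalog_id_from_url(url: str) -> str:
        """Return the path segment that immediately follows 'catalogs' in the URL."""
        path_parts = urlparse(url).path.split("/")  # the query string is ignored by .path
        return path_parts[path_parts.index("catalogs") + 1]

    # Illustrative URL only; use the URL from your own browser
    url = "https://mycpdcluster.mycompany.com/data/catalogs/bc40d84c-2b5f-4dbf-bdc6-c1e69c608326"
    print(catalog_id_from_url(url))  # bc40d84c-2b5f-4dbf-bdc6-c1e69c608326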

Option 2: Create a catalog

Follow these steps to create a catalog:

  1. From the Cloud Pak for Data navigation menu Navigation menu, choose Catalogs > All catalogs.

  2. Click New Catalog.

  3. For the Name, copy and paste the catalog name exactly as shown with no leading or trailing spaces:

    Mortgage Approval Catalog
    
  4. Select Enforce data protection rules, confirm the selection, and accept the defaults for the other fields.

  5. Click Create.

  6. Copy the catalog ID from the catalog URL. The catalog ID is the string after “/catalogs/” in the URL and before the first question mark. For example, if the URL contains …data/catalogs/bc40d84c-2b5f-4dbf-bdc6-c1e69c608326?context…, then the catalog ID is bc40d84c-2b5f-4dbf-bdc6-c1e69c608326. You will need the catalog ID later to associate the model use case in OpenPages with this catalog.

Check your progress

The following image shows your catalog. You are now ready to create the model use case that is stored in the catalog.

Mortgage Approval Catalog

Task 2: Configure IBM OpenPages Model Risk Governance

This tutorial uses a model risk governance sample. Follow these steps to download the sample files and configure IBM OpenPages Model Risk Governance:

  1. Download the following files:

  2. In OpenPages, click the settings icon Settings icon and select System Migration > Import Configuration. Add the MRG-Users-op-config.xml file from the zip.

  3. Click the settings icon Settings icon and select FastMap Import. Import the Golden_Bank_trial_content.xlsx file from the zip.

  4. Click the settings icon Settings icon and select System Migration > Import Configuration. Add the MRG-CP4D-trial-contents-op-config.xml file from the zip.

Task 3: Create the model use case in the model inventory

For this type of project, it is best to create the model use case when a project commences. A model use case can reference multiple machine learning models that you can use to solve business problems. Then, data engineers and model evaluators can add models to the model use case and track the model as it progresses through its lifecycle.

Create the business entity

You first need to create a business entity. A business entity is an abstract representation of your business structure. A business entity can contain sub-entities (such as departments, business units, or geographic locations). By creating the business entity, you tell OpenPages which catalog you want to use for syncing the model use cases. Follow these steps to create the business entity:

Tip: If this is your first time accessing the Model inventory, you see a guided tour asking if you want to set up model governance. For now, click Maybe later.
  1. From the Cloud Pak for Data home page, navigate to Services > Instances.

  2. Click your OpenPages instance.

  3. In the Access information section, click the launch icon Launch icon next to the URL. The OpenPages dashboard displays.

  4. From the navigation menu Navigation menu, choose Organization > Business Entities.

  5. Click New.

  6. For the business entity name, type the same name as the catalog that you are using: either Default Catalog or the Mortgage Approval Catalog that you created in Task 1.

  7. Set the primary business entity:

    1. Click Select Primary Business Entity.

    2. Search for catalogs.

    3. Select Catalogs (Library > MRG > WKC > Catalogs).

    4. Click Done.

  8. Click Save.

  9. Click the edit icon Edit icon next to the Catalog ID field, and paste your catalog ID.

  10. Click Save.

Create the model use case

Now you are ready to create the model use case so that data scientists and model evaluators can add models to the model use case and track model progress throughout its lifecycle. Follow these steps to create the model use case:

  1. Click the home icon Home icon to return to the OpenPages dashboard.

  2. Scroll down, and click New Model Use Case.

  3. For the Model use case name, copy and paste the name exactly as shown with no leading or trailing spaces:

    Mortgage Approval Model Use Case
    
  4. For the Description, copy and paste the following text:

    This model use case is for the Mortgage approval model at Golden Bank.
    
  5. For the Purpose, copy and paste the following text:

    Assists with automating the process of issuing a mortgage to an applicant. Decide if the person should be given a mortgage or not.
    
  6. Add two business entities:

    1. On the Primary Business Entity tab, click Add.

    2. Search for and select the business entity (either Mortgage Approval Catalog or Default Catalog).

    3. Click the Other Business Entity tab.

    4. Click Add.

    5. Select Golden Bank.

    6. Click Done.

  7. Click Save.

Check your progress

The following image shows your model use case. The model use case is now ready for data engineers and model evaluators to add models and track models as they progress through their lifecycle. The next task is to run the notebook to create the model.

Mortgage Approval Model Use Case

Task 4: Run the notebook to create the model

Now you are ready to run the first notebook included in the sample project. The notebook includes code for the following tasks (a simplified sketch follows this list):

  • Set up AI Factsheets used to track the lifecycle of the model.
  • Load the training data, which is stored in the Db2 Warehouse connection in the sample project.
  • Specify the target, categorical, and numerical columns along with the thresholds used to build the model.
  • Build data pipelines.
  • Build machine learning models.
  • View the model results.
  • Save the model.
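
Before you run the notebook, here is a heavily simplified sketch of the kind of scikit-learn training pipeline it builds. The column names, thresholds, and estimator below are illustrative placeholders rather than the exact ones in the sample notebook, and the real notebook also initializes the AI Factsheets Python client so that the training run is captured automatically.

    # Simplified sketch of the training pipeline (column names and settings are illustrative)
    import pandas as pd
    from sklearn.compose import ColumnTransformer
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import OneHotEncoder, StandardScaler

    # Placeholder for the read from the Db2 Warehouse connection in the sample project
    df = pd.read_csv("mortgage_training_data.csv")

    target = "MORTGAGE_APPROVAL"                            # illustrative target column
    categorical = ["EMPLOYMENT_STATUS", "LOCATION"]         # illustrative categorical columns
    numerical = ["INCOME", "CREDIT_SCORE", "LOAN_AMOUNT"]   # illustrative numerical columns

    X_train, X_test, y_train, y_test = train_test_split(
        df[categorical + numerical], df[target], test_size=0.2, random_state=42
    )

    preprocess = ColumnTransformer([
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
        ("num", StandardScaler(), numerical),
    ])

    model = Pipeline([
        ("preprocess", preprocess),
        ("classifier", RandomForestClassifier(n_estimators=100, random_state=42)),
    ])

    model.fit(X_train, y_train)
    print("Training accuracy:", model.score(X_train, y_train))
    print("Test accuracy:", model.score(X_test, y_test))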

Follow these steps to run the notebook included in the sample project. Take some time to read through the comments in the notebook, which explain the code in each cell.

  1. From the Cloud Pak for Data navigation menu Navigation menu, choose Projects > All projects.

  2. Click the AI governance project name.

  3. Click the Assets tab, and then navigate to Notebooks.
    Left navigation

  4. Click the Overflow menu Overflow menu for the 1-model-training-with-factsheets notebook, and choose Edit.

  5. Click Cell > Run All to run all of the cells in the notebook. Alternatively, you can run the notebook cell by cell if you want to explore each cell and its output.

  6. The first cell requires your input; a sketch of a typical credential-prompt cell appears after these steps.

    1. At the Enter host name prompt, type your Cloud Pak for Data hostname beginning with https://, and press Enter. For example, https://mycpdcluster.mycompany.com.

    2. At the Username prompt, type your Cloud Pak for Data username, and press Enter.

    3. At the Password prompt, type your Cloud Pak for Data password, and press Enter.

  7. The notebook takes 1 to 3 minutes to complete. You can monitor the progress cell by cell by watching the asterisk in "In [*]" change to a number, for example, "In [1]".

  8. If you encounter any errors during the notebook run, try these tips:

    • Click Kernel > Restart & Clear Output to restart the kernel, and then run the notebook again.
    • Verify that you created the model use case by copying and pasting the specified artifact name exactly with no leading or trailing spaces.
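
For reference, the credential cell that prompts for these values typically looks something like the following sketch, which uses input, getpass, and the ibm_watson_machine_learning client. The instance_id and version values are assumptions that vary by Cloud Pak for Data release; the prompts in the actual notebook are authoritative.

    # Sketch of a credential-prompt cell; instance_id and version are assumptions
    # that depend on your Cloud Pak for Data release
    import getpass
    from ibm_watson_machine_learning import APIClient

    host = input("Enter host name (starting with https://): ")
    username = input("Username: ")
    password = getpass.getpass("Password: ")

    wml_credentials = {
        "url": host,
        "username": username,
        "password": password,
        "instance_id": "openshift",   # typical value for Cloud Pak for Data clusters
        "version": "4.0",             # adjust to your Cloud Pak for Data version
    }

    client = APIClient(wml_credentials)
    print(client.version)  # confirm that the client loaded and connected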

Check your progress

The following image shows the notebook when the run is complete. The notebook saved the model in the project, so you are now ready to view and add it to the model inventory.

Notebook run complete

Task 5: View the model's factsheet and associate it with a model use case

After running all the cells in the notebook, follow these steps to view the model's factsheet in the project and then associate that model with a model use case in the model inventory:

  1. Click the AI governance project name in the navigation trail.
    Navigation trail

  2. Click the Assets tab, and then navigate to Models.

  3. Click the Mortgage Approval Prediction Model asset that was created by the notebook.

  4. Review the AI Factsheet for your model. AI Factsheets capture model metadata across the model development lifecycle, which facilitates subsequent enterprise validation or external regulation. AI Factsheets give model validators and approvers an accurate, always up-to-date view of the model lifecycle details.
    In the previous task, you ran a notebook containing AI Factsheets Python client code that captured training metadata. Scroll to the Training metrics and Training tags sections to review the captured training metadata.
    The following image shows the AI Factsheet for the model:

    Model's AI Factsheet

  5. Scroll up on the model page, and click Track this model.

    1. From the list of model use cases, select Mortgage Approval Model Use Case.

    2. Select Create a new record.

    3. Click Track.

  6. Back on the model page, click Open in model inventory.

  7. On the model use case page, click the Asset tab.

  8. Review the Model tracking. AI Factsheets track models through their lifecycle. This model is still in the Develop stage as it has not been deployed yet.

Check your progress

The following image shows the model use case with the model in the Develop phase. Now that you reviewed metadata such as the training data source, training metrics, and input schema that was captured in the AI Factsheet, you are ready to deploy the model.

Model use case in Develop phase

Task 6: Deploy the model

Before you can deploy the model, you need to promote the model to a new deployment space. Deployment spaces help you to organize supporting resources such as input data and environments; deploy models or functions to generate predictions or solutions; and view or edit deployment details.
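
The tutorial creates the deployment space from the user interface in the following steps. For reference, a space can also be created programmatically with the ibm_watson_machine_learning Python client, as in this minimal sketch; it assumes the client credentials from the earlier notebook sketch, and some environments might require additional metadata properties.

    # Sketch: create a deployment space with the Python client (optional alternative to the UI)
    from ibm_watson_machine_learning import APIClient

    client = APIClient(wml_credentials)  # credentials as in the earlier sketch

    space_metadata = {
        client.spaces.ConfigurationMetaNames.NAME: "Golden Bank Preproduction Space",
        client.spaces.ConfigurationMetaNames.DESCRIPTION: "Preproduction space for the mortgage approval model",
    }

    space_details = client.spaces.store(meta_props=space_metadata)
    space_id = space_details["metadata"]["id"]
    print("Deployment space ID:", space_id)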

Promote the model to a deployment space

Follow these steps to promote the model to a new deployment space:

  1. From the model use case, under the Develop phase, click Mortgage Approval Prediction Model.

  2. Click Open in project to open the model in the AI governance project.

  3. On the model page, click Promote to deployment space.

  4. For the Target space, select Create a new deployment space.

    1. For the deployment space name, copy and paste the name exactly as shown with no leading or trailing spaces:

      Golden Bank Preproduction Space
      
    2. Click Create.

    3. Click Close.

  5. For the Target space, ensure that Golden Bank Preproduction Space is selected.

  6. Check the Go to model in the space after promoting it option.

  7. Click Promote.

Check your progress

The following image shows the model in the deployment space. You are now ready to create a model deployment.

Model in deployment space

Create an online deployment for the model

Follow these steps to create an online deployment for your model:

  1. When the deployment space opens, click New deployment.

    1. For the Deployment type, select Online.

    2. For the Name, copy and paste the deployment name exactly as shown with no leading or trailing spaces:

      Mortgage Approval Model Deployment
      
    3. For the Serving name, you can specify a descriptive name to use in place of the deployment ID, which helps you to identify this deployment quickly. Copy and paste the following serving name with no leading or trailing spaces. The name is validated to be unique per region; if this serving name already exists, add a number (or any unique character) to the end of the serving name.

      mortgage_approval_service
      
    4. Click Create.

  2. The model deployment might take several minutes to complete. When the model is deployed successfully, you can optionally send a test scoring request (see the sketch after these steps), and then return to the model inventory: from the navigation menu Navigation menu, choose Catalogs > Model inventory.

  3. For the Mortgage Approval Model Use Case, click View details.

  4. Click the Asset tab. Under Model tracking, you can see that the model is now in the Test stage.
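
After the deployment is ready (step 2 above), you can optionally send it a test scoring request programmatically. The following sketch uses the ibm_watson_machine_learning Python client; the space ID and deployment ID come from your own deployment space, and the payload field names are illustrative placeholders that must match the model's actual input schema.

    # Sketch of an online scoring request (IDs and field names are placeholders)
    from ibm_watson_machine_learning import APIClient

    client = APIClient(wml_credentials)          # credentials as in the earlier sketch
    client.set.default_space("<YOUR_SPACE_ID>")  # ID of Golden Bank Preproduction Space

    payload = {
        client.deployments.ScoringMetaNames.INPUT_DATA: [{
            "fields": ["INCOME", "CREDIT_SCORE", "LOAN_AMOUNT"],  # illustrative fields
            "values": [[75000, 720, 250000]],                     # one applicant row
        }]
    }

    result = client.deployments.score("<YOUR_DEPLOYMENT_ID>", payload)
    print(result["predictions"])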

Check your progress

The following image shows the model use case with the model in the Deploy phase. Your model is now ready for you to evaluate in Watson OpenScale.

Model use case in Deploy phase

Task 7: Perform model risk assessment

The Model Owner can use the captured metadata to carefully consider whether to approve this model, because the implications for end customers and Golden Bank’s reputation can be significant. When the data scientist created the test deployment, all of the training-time facts were captured. Now you perform a model risk assessment to determine whether this model should move to a preproduction environment and, eventually, to a production environment.

  1. On the model tracking page, click the Mortgage Approval Model Use Case link to open the same use case in IBM OpenPages Model Risk Governance.

  2. Under Associated Models, click the Mortgage Approval Prediction Model name to view the model.

  3. Retrieve the training accuracy score:

    1. Scroll down to the Associations section, and click the Metrics node in the tree.

    2. In the side panel, click the training_accuracy_score metric.

      training_accuracy_score metric

    3. Copy the value for the training accuracy score for the model. You need this value for the risk assessment.

      Tip: If you have trouble locating the training_accuracy_score metric, click the Metrics tab to locate the metric.
  4. Return to the model tab to create the model risk assessment:

    1. In the Model Risk Assessment section, click New.

    2. For the Description, type Initial risk assessment for Mortgage Approval model.

    3. Answer the following questions to assess the risk of the model.

      • Model uses other models outputs or feeds downstream models: Choose No since this model makes a decision on the loan application and does not depend on any other models.

      • Model Training Accuracy score: Paste the value for the training accuracy score metric that you copied earlier.

      • Is this model used in granting loans or mortgages?: Choose Yes since this is a mortgage approval model.

      • Is there information on protected groups in the training data?: Choose No since this model is trained with mortgage data that has been anonymized.

    4. After you provide the information for all of the required fields, click Save.

  5. When the risk assessment is complete, review the Computed tier. The risk assessment generates a computed risk tier for the model (Tier 3 for low risk, Tier 2 for medium risk, and Tier 1 for high risk). Note that you can override the computed tier if that’s appropriate.
    Override field

  6. Edit the Status field:

    1. Click the edit icon Edit icon next to the Status field.

    2. Choose Confirmed to confirm the assessment.

    3. Click Save.

  7. Return to the model page to assign an owner and submit the model for pre implementation review.

    1. Click the edit icon Edit icon next to the Model Owner field.

    2. Select your user name from the list.

    3. Click Save.

    4. Click Action > Submit for Pre Implementation Review, and then click Continue to send the model for validation.

Check your progress

The following image shows the Activity tab for the model in OpenPages.

As a data scientist at Golden Bank, you created a mortgage approval model by running a Jupyter Notebook that built the model and automatically captured metadata to track the model in an AI Factsheet. You then promoted the model to a deployment space, and deployed the model. You tracked all of the activity for the model by using IBM OpenPages with Watson.

Next steps

You are now ready to validate and monitor your deployed machine learning model to ensure it is working accurately and fairly. For this task, you will use Watson OpenScale. See the Test and validate the model tutorial.

Learn more

Parent topic: Data fabric tutorials
