Creating custom evaluations and metrics
To create custom evaluations, select a set of custom metrics to quantitatively track your model deployment and business application. You can define these custom metrics and use them alongside metrics that are generated by other types of evaluations.
You can use one of the following methods to manage custom evaluations and metrics:
- The Python SDK
- watsonx.governance
Managing custom metrics with the Python SDK
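To manage custom metrics with the Python SDK, you must perform the following tasks:
- Register the custom monitor with its metrics definition.
- Enable the custom monitor.
- Store metric values.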
The following advanced tutorial shows how to do this:
You can disable and re-enable custom monitoring at any time. You can remove a custom monitor if you no longer need it.
For more information, see the Python SDK documentation.
Step 1: Register custom monitor with metrics definition.
Before you can start using custom metrics, you must register the custom monitor, which is the processor that tracks the metrics. You must also define the metrics themselves.
- Use the get_definition(monitor_name) helper to check whether a monitor definition with that name already exists.
- Use the metrics list to define the metrics, each of which requires a name and thresholds; each threshold requires a type and a default value.
- Use the tags list to define metadata.
The following code is from the working sample notebook that was previously mentioned:
def get_definition(monitor_name):
    monitor_definitions = wos_client.monitor_definitions.list().result.monitor_definitions
    for definition in monitor_definitions:
        if monitor_name == definition.entity.name:
            return definition
    return None

monitor_name = 'my model performance'
metrics = [MonitorMetricRequest(name='sensitivity',
                                thresholds=[MetricThreshold(type=MetricThresholdTypes.LOWER_LIMIT, default=0.8)]),
           MonitorMetricRequest(name='specificity',
                                thresholds=[MetricThreshold(type=MetricThresholdTypes.LOWER_LIMIT, default=0.75)])]
tags = [MonitorTagRequest(name='region', description='customer geographical region')]

existing_definition = get_definition(monitor_name)
if existing_definition is None:
    custom_monitor_details = wos_client.monitor_definitions.add(name=monitor_name, metrics=metrics, tags=tags, background_mode=False).result
else:
    custom_monitor_details = existing_definition
To verify your work, list the monitor definitions, for example with the wos_client.monitor_definitions.list() call that the get_definition helper uses, and check that your newly created monitor and metrics are configured properly.
You can also get the monitor ID by running the following command:
custom_monitor_id = custom_monitor_details.metadata.id
print(custom_monitor_id)
For a more detailed look, run the following command:
custom_monitor_details = wos_client.monitor_definitions.get(monitor_definition_id=custom_monitor_id).result
print('Monitor definition details:', custom_monitor_details)
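The lookup-before-add pattern in this step keeps registration idempotent: rerunning the notebook reuses the existing definition instead of creating a duplicate. The following minimal sketch illustrates that pattern only. FakeDefinition and FakeRegistry are hypothetical in-memory stand-ins, not part of the Python SDK, and no service calls are made.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FakeDefinition:
    """Hypothetical stand-in for a monitor definition; only the name is modeled."""
    name: str


@dataclass
class FakeRegistry:
    """Hypothetical in-memory replacement for the monitor definitions service."""
    definitions: List[FakeDefinition] = field(default_factory=list)

    def find(self, monitor_name: str) -> Optional[FakeDefinition]:
        # Same linear search by name as get_definition in the sample code.
        for definition in self.definitions:
            if definition.name == monitor_name:
                return definition
        return None

    def get_or_add(self, monitor_name: str) -> FakeDefinition:
        # Reuse an existing definition; create one only when none is found.
        existing = self.find(monitor_name)
        if existing is not None:
            return existing
        created = FakeDefinition(monitor_name)
        self.definitions.append(created)
        return created


registry = FakeRegistry()
first = registry.get_or_add("my model performance")
second = registry.get_or_add("my model performance")  # second run of the same cell
print(len(registry.definitions))
```

Because the second call finds the definition that the first call created, both variables refer to the same object and the registry holds a single entry.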
Step 2: Enable custom monitor.
Next, you must enable the custom monitor for the subscription. This activates the monitor and sets the thresholds.
- Create a Target object that identifies the subscription.
- Use the thresholds list to set the metric lower_limit value. Supply the metric_id value as one of the parameters. If you don't remember it, you can print custom_monitor_details to get the details, as shown in the previous example.
The following code is from the working sample notebook that was previously mentioned:
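target = Target(
    target_type=TargetTypes.SUBSCRIPTION,
    target_id=subscription_id
)
thresholds = [MetricThresholdOverride(metric_id='sensitivity', type=MetricThresholdTypes.LOWER_LIMIT, value=0.9)]
custom_monitor_instance_details = wos_client.monitor_instances.create(
    data_mart_id=data_mart_id,
    background_mode=False,
    monitor_definition_id=custom_monitor_id,
    target=target,
    thresholds=thresholds  # pass the override so that it takes effect
).result
# Keep the instance ID; the next step needs it to store measurements.
custom_monitor_instance_id = custom_monitor_instance_details.metadata.id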
To check on your configuration details, use the subscription.monitoring.get_details(monitor_uid=monitor_uid) command.
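To make the threshold settings concrete, the following sketch shows one plausible reading of how the default lower limits from the monitor definition and the subscription-level override combine. It assumes that an override replaces the default for the same metric and that a value strictly below the lower limit counts as a violation. The LowerLimit class, the find_violations function, and the measured values are illustrative and are not part of the Python SDK.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class LowerLimit:
    """Illustrative lower-limit threshold for one metric."""
    metric_id: str
    value: float


def find_violations(measured: Dict[str, float], limits: List[LowerLimit]) -> List[str]:
    """Return the IDs of metrics whose measured value is below the lower limit."""
    return [
        limit.metric_id
        for limit in limits
        if limit.metric_id in measured and measured[limit.metric_id] < limit.value
    ]


# Defaults as set in the monitor definition.
defaults = {"sensitivity": 0.8, "specificity": 0.75}
# Override as set when the monitor is enabled for the subscription.
overrides = {"sensitivity": 0.9}
# Assumption: an override wins over the default for the same metric.
effective = {**defaults, **overrides}

limits = [LowerLimit(metric_id, value) for metric_id, value in effective.items()]
measured = {"specificity": 0.78, "sensitivity": 0.67}  # illustrative values
violations = find_violations(measured, limits)
print(violations)  # ['sensitivity']
```

With these values, sensitivity falls below its overridden limit of 0.9, while specificity stays above its default limit of 0.75.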
Step 3: Store metric values.
You must store, or save, your custom metrics to the region where your service instance exists.
- Build a MonitorMeasurementRequest whose metrics field sets which metrics you are storing.
- Use the wos_client.monitor_instances.measurements.add method to commit the metrics.
The following code is from the working sample notebook that was previously mentioned:
from datetime import datetime, timezone, timedelta
from ibm_watson_openscale.base_classes.watson_open_scale_v2 import MonitorMeasurementRequest

custom_monitoring_run_id = "11122223333111abc"
measurement_request = [MonitorMeasurementRequest(timestamp=datetime.now(timezone.utc),
                                                 metrics=[{"specificity": 0.78, "sensitivity": 0.67, "region": "us-south"}],
                                                 run_id=custom_monitoring_run_id)]
print(measurement_request[0])

published_measurement_response = wos_client.monitor_instances.measurements.add(
    monitor_instance_id=custom_monitor_instance_id,
    monitor_measurement_request=measurement_request).result
published_measurement_id = published_measurement_response[0]["measurement_id"]
print(published_measurement_response)
To retrieve the measurement that you published, run the following command:
published_measurement = wos_client.monitor_instances.measurements.get(monitor_instance_id=custom_monitor_instance_id, measurement_id=published_measurement_id).result
print(published_measurement)
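The sample stores fixed values for sensitivity and specificity. In practice you would compute them from model predictions first. The following sketch shows one way to do that for a binary classifier and to shape the result like the metrics dictionary in the sample. The sensitivity_specificity function, the label lists, and the region value are illustrative assumptions and are not part of the Python SDK.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(y_true: Sequence[int], y_pred: Sequence[int],
                            positive: int = 1) -> Tuple[float, float]:
    """Compute sensitivity (true positive rate) and specificity (true negative rate)."""
    tp = fn = tn = fp = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == positive:
            if predicted == positive:
                tp += 1
            else:
                fn += 1
        else:
            if predicted == positive:
                fp += 1
            else:
                tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("Both classes must appear in y_true.")
    return tp / (tp + fn), tn / (tn + fp)


# Illustrative labels: 4 positives (3 detected) and 4 negatives (3 rejected).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]

sensitivity, specificity = sensitivity_specificity(y_true, y_pred)

# Same shape as the metrics entry in the sample: metric values plus the tag value.
metrics_entry = {"specificity": specificity, "sensitivity": sensitivity, "region": "us-south"}
print(metrics_entry)
```

A dictionary of this shape could then take the place of the hard-coded values in the metrics list of the measurement request.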
Managing custom metrics with watsonx.governance
Step 1: Add metric groups
- On the Configure tab, click Add metric group.
- If you want to configure a metric group manually, click Configure new group.
a. Specify a name and a description for the metric group.
The length of the name that you specify must be less than or equal to 48 characters.
b. Click the Edit icon on the Input parameters tile and then specify the details for your input parameters.
The parameter name that you specify must match the parameter name that is specified in the metric API.
c. If the parameter is required to configure your custom monitor, select the Required parameter checkbox.
d. Click Add.
After you add the input parameters, click Next.
e. Select the model types that your evaluation supports and click Next.
f. If you don't want to specify an evaluation schedule, click Save.
g. If you want to specify an evaluation schedule, click the toggle.
You must specify the interval for the evaluation schedule and click Save.
h. Click Add metric and specify the metric details.
Click Save.
- If you want to configure a metric group by using a JSON file, click Import from file.
Upload a JSON file and click Import.
Step 2: Add metric endpoints
- In the Metric endpoints section, click Add metric endpoint.
- Specify a name and a description for the metric endpoint.
- Click the Edit icon on the Connection tile and specify the connection details.
Click Next.
- Select the metric groups that you want to associate with the metric endpoint and click Save.
Step 3: Configure custom monitors
- On the Insights Dashboard page, select Configure monitors on a model deployment tile.
- In the Evaluations section, select the name of the metric group that you added.
- Select the Edit icon on the Metric endpoint tile.
- Select a metric endpoint and click Next.
If you don't want to use a metric endpoint, select None.
- Use the toggles to specify the metrics that you want to use to evaluate the model and provide threshold values.
Click Next.
- Specify values for the input parameters. If you selected JSON as the data type for the metric group, add the JSON data.
Click Next.
You can now evaluate models with a custom monitor.
Accessing and visualizing custom metrics
To access and visualize custom metrics, you can use the programmatic interface. The following advanced tutorial shows how to do this:
- Working with IBM watsonx.ai Runtime
For more information, see the Python SDK documentation.
Visualization of your custom metrics appears on the Insights Dashboard.