Drift v2 evaluations
Cosine distance
Cosine distance measures the difference between embedding vectors. The following formula is used to measure cosine distance:
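cosine distance(A, B) = 1 − (A · B) / (‖A‖ ‖B‖)

In this formula, A and B are the embedding vectors that are compared, A · B is their dot product, and ‖A‖ and ‖B‖ are their magnitudes.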
The cosine distance ranges from 0, which indicates identical vectors, through 1, which indicates no correlation between the vectors, to 2, which indicates opposite vectors.
Euclidean distance
Euclidean distance is the shortest distance between embedding vectors in Euclidean space. The following formula is used to measure Euclidean distance:
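Euclidean distance(A, B) = √( Σ (Aᵢ − Bᵢ)² )

In this formula, Aᵢ and Bᵢ are the individual components of the embedding vectors A and B.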
The Euclidean distance ranges from 0, which indicates identical vectors, to infinity. However, for vectors that are normalized to have unit length, the maximum Euclidean distance is 2.
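For example, the following sketch computes both distances for a pair of embedding vectors with NumPy. It illustrates the formulas above, not the exact implementation that is used by the service.

    import numpy as np

    def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
        """1 minus cosine similarity: 0 for identical directions, 2 for opposite directions."""
        return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    def euclidean_distance(a: np.ndarray, b: np.ndarray) -> float:
        """Straight-line distance between the two embedding vectors."""
        return float(np.linalg.norm(a - b))

    baseline_embedding = np.array([0.1, 0.3, 0.6])
    production_embedding = np.array([0.2, 0.1, 0.7])

    print(cosine_distance(baseline_embedding, production_embedding))
    print(euclidean_distance(baseline_embedding, production_embedding))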
You can configure drift v2 evaluations to measure changes in your data over time to ensure consistent outcomes for your model. Use drift v2 evaluations to identify changes in your model output, the accuracy of your predictions, and the distribution of your input data.
The following sections describe how to configure drift v2 evaluations:
Configuring drift v2 evaluations
If you log payload data when you prepare for model evaluations, you can configure drift v2 evaluations to help you understand how changes in your data affect model outcomes.
Compute the drift archive
You must choose the method that is used to analyze your training data to determine the data distributions of your model features. If you connect training data and the size of your data is less than 500 MB, you can choose to compute the drift v2 archive.
If you don't connect your training data, or if the size of your data is larger than 500 MB, you must choose to compute the drift v2 archive in a notebook. You must also compute the drift v2 archive in notebooks if you want to evaluate image or text models.
You can specify a limit for the size of your training data by setting maximum sample sizes for the amount of training data that is used for scoring and computing the drift v2 archive. For non-watsonx.ai Runtime deployments, computing the drift v2 archive has a cost associated with scoring the training data against your model's scoring endpoint.
Set drift thresholds
You must set threshold values for each metric to identify issues with your evaluation results. The values that you set create alerts on the Insights dashboard that appear when metric scores violate your thresholds. You must set values in the range of 0 to 1. The metric scores must remain lower than the threshold values to avoid violations.
Select important features
For tabular models only, feature importance is calculated to determine the impact of feature drift on your model. To calculate feature importance, you can select the important and most important features from your model that have the biggest impact on your model outcomes.
When you configure SHAP explanations, the important features are automatically detected by using global explanations.
You can also upload a list of important features by uploading a JSON file. Sample snippets are provided that you can use to upload a JSON file. For more information, see Feature importance snippets.
Set sample size
Sample sizes control the number of transactions that are processed during evaluations. You must set a minimum sample size to indicate the lowest number of transactions that you want to evaluate. You can also set a maximum sample size to indicate the highest number of transactions that you want to evaluate.
Supported drift v2 metrics
When you enable drift v2 evaluations, you can view a summary of evaluation results with metrics for the type of model that you're evaluating.
You can view the results of your drift v2 evaluations on the Insights dashboard. For more information, see Reviewing drift v2 results.
The following metrics are supported by drift v2 evaluations:
Output drift
Output drift measures the change in the model confidence distribution.
- How it works:
Output drift measures how much your model output changes from the time that you train the model. For regression models, output drift is calculated by measuring the change in the distribution of predictions on the training and payload data. For classification models, output drift is calculated for each class probability by measuring the change in the distribution of class probabilities on the training and payload data. For multiclass classification models, output drift is aggregated across the class probabilities by calculating a weighted average.
- Do the math:
The following formulas are used to calculate output drift:
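As a rough sketch of the per-class aggregation that is described under How it works, the following example combines per-class distances into a single output drift score with a weighted average. The per-class distance values and the use of training class frequencies as weights are illustrative assumptions, not the exact implementation.

    # Hypothetical sketch: aggregate per-class probability drift into one
    # output drift score for a multiclass classification model.
    # Assumption: per-class distances between the training and payload
    # class-probability distributions are already computed, and training
    # class frequencies are used as weights.

    def aggregate_output_drift(class_distance: dict, class_frequency: dict) -> float:
        """Weighted average of per-class drift distances."""
        total_weight = sum(class_frequency[c] for c in class_distance)
        weighted = sum(class_distance[c] * class_frequency[c] for c in class_distance)
        return weighted / total_weight

    # Example with made-up numbers:
    distances = {"approved": 0.12, "rejected": 0.30, "review": 0.05}
    frequencies = {"approved": 700, "rejected": 200, "review": 100}
    print(aggregate_output_drift(distances, frequencies))  # 0.149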
Model quality drift
Model quality drift compares the estimated runtime accuracy to the training accuracy to measure the drop in accuracy.
- How it works:
When you configure drift v2 evaluations, a drift detection model is built that processes your payload data to predict whether your model generates accurate predictions without ground truth. The drift detection model uses the input features and class probabilities from your model to create its own input features.
- Do the math:
The following formula is used to calculate model quality drift:
The accuracy of your model is calculated as the base_accuracy by measuring the fraction of correctly predicted transactions in your training data. During evaluations, your transactions are scored against the drift detection model to estimate the number of transactions that are likely predicted correctly by your model. These transactions are compared to the total number of transactions that are processed to calculate the predicted_accuracy. If the predicted_accuracy is less than the base_accuracy, a model quality drift score is generated.
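As a minimal sketch of this comparison, assuming that the drift detection model flags, for each payload transaction, whether the underlying model's prediction is likely correct, the predicted_accuracy and a simple drift score could be derived as follows. The exact scoring and aggregation that is used by the service might differ.

    from typing import Sequence

    def model_quality_drift(likely_correct: Sequence[bool], base_accuracy: float) -> float:
        """Return a drift score when the estimated runtime accuracy drops below the base accuracy.

        likely_correct: one flag per payload transaction from the drift detection model.
        base_accuracy: fraction of correctly predicted transactions in the training data.
        """
        predicted_accuracy = sum(likely_correct) / len(likely_correct)
        # A score is generated only when predicted_accuracy falls below base_accuracy.
        return max(0.0, base_accuracy - predicted_accuracy)

    # Example: 82 of 100 payload transactions are flagged as likely correct,
    # while the model scored 90% accuracy on its training data.
    print(model_quality_drift([True] * 82 + [False] * 18, base_accuracy=0.90))  # ≈ 0.08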
Feature drift
Feature drift measures the change in value distribution for important features.
- How it works:
Drift is calculated for categorical and numeric features by measuring the probability distribution of continuous and discrete values. To identify discrete values for numeric features, a binary logarithm is used to compare the number of distinct values of each feature to the total number of values of each feature. The following binary logarithm formula is used to identify discrete numeric features:
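distinct_values_count < log2(total_count)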
If the distinct_values_count is less than the binary logarithm of the total_count, the feature is identified as discrete, as shown in the sketch that follows this list.
- Do the math:
The following formulas are used to calculate feature drift:
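As a rough illustration of the discreteness check that is described under How it works, the following sketch classifies a numeric column as discrete or continuous. It is an approximation for illustration with pandas, not the service's implementation.

    import math
    import pandas as pd

    def is_discrete(values: pd.Series) -> bool:
        """Treat a numeric feature as discrete when it has fewer distinct values
        than the binary logarithm of its total number of values."""
        distinct_values_count = values.nunique(dropna=True)
        total_count = values.count()
        return distinct_values_count < math.log2(total_count)

    # Example: 3 distinct values among about 1000 rows -> log2(1002) ~ 10 -> discrete.
    print(is_discrete(pd.Series([1, 2, 3] * 334)))  # True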
The following formulas are used to calculate drift v2 evaluation metrics:
Total variation distance
Total variation distance measures the maximum difference between the probabilities that two probability distributions, baseline (B) and production (P), assign to the same transaction as shown in the following formula:
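δ(B, P) = max | B(𝑥) − P(𝑥) |, where the maximum is taken over all transactions 𝑥.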
If the two distributions are equal, the total variation distance between them becomes 0.
The following formula is used to calculate total variation distance:
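total variation distance ≈ ( Σ | f_P(𝑥) − f_B(𝑥) | Δ𝑥 ) / ( Σ f_P(𝑥) Δ𝑥 + Σ f_B(𝑥) Δ𝑥 )

In this formula: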
- 𝑥 is a series of equidistant samples that span the domain, ranging from the combined minimum of the baseline and production data to the combined maximum of the baseline and production data.
- Δ𝑥 is the difference between two consecutive 𝑥 samples.
- f_P(𝑥) is the value of the density function for production data at an 𝑥 sample.
- f_B(𝑥) is the value of the density function for baseline data at an 𝑥 sample.
The denominator represents the total area under the density function plots for production and baseline data. These summations are an approximation of the integrations over the domain space; each of these terms should be approximately 1, so the total should be approximately 2.
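A minimal sketch of this discretized calculation, assuming that the baseline and production density values were already evaluated on the shared grid of 𝑥 samples:

    import numpy as np

    def total_variation_distance(f_baseline: np.ndarray, f_production: np.ndarray, dx: float) -> float:
        """Discretized total variation distance between two density estimates
        that are evaluated on the same equidistant grid of x samples."""
        numerator = np.sum(np.abs(f_production - f_baseline)) * dx
        # The denominator approximates the total area under both density plots (about 2).
        denominator = np.sum(f_production) * dx + np.sum(f_baseline) * dx
        return float(numerator / denominator)

    # Example with two normalized density estimates on a shared grid:
    x = np.linspace(0.0, 1.0, 201)
    dx = x[1] - x[0]
    f_b = np.exp(-((x - 0.4) ** 2) / 0.02)
    f_p = np.exp(-((x - 0.6) ** 2) / 0.02)
    f_b /= np.sum(f_b) * dx  # normalize to unit area
    f_p /= np.sum(f_p) * dx
    print(total_variation_distance(f_b, f_p))  # 0 for identical, 1 for disjoint distributions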
Overlap coefficient
The overlap coefficient is calculated by measuring the total area of the intersection between two probability distributions. To measure dissimilarity between distributions, the intersection or the overlap area is subtracted from 1 to calculate the amount of drift. The following formula is used to calculate the overlap coefficient:
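drift = 1 − Σ min( f_P(𝑥), f_B(𝑥) ) Δ𝑥

In this formula: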
- 𝑥 is a series of equidistant samples that span the domain, ranging from the combined minimum of the baseline and production data to the combined maximum of the baseline and production data.
- Δ𝑥 is the difference between two consecutive 𝑥 samples.
- f_P(𝑥) is the value of the density function for production data at an 𝑥 sample.
- f_B(𝑥) is the value of the density function for baseline data at an 𝑥 sample.
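A minimal sketch of this calculation on the same kind of shared grid of 𝑥 samples:

    import numpy as np

    def overlap_coefficient_drift(f_baseline: np.ndarray, f_production: np.ndarray, dx: float) -> float:
        """1 minus the intersection area of two density estimates that are
        evaluated on the same equidistant grid of x samples."""
        overlap_area = np.sum(np.minimum(f_baseline, f_production)) * dx
        return float(1.0 - overlap_area)

    # Example with two uniform densities that overlap on [0.25, 0.5]:
    x = np.linspace(0.0, 1.0, 101)
    dx = x[1] - x[0]
    f_b = np.where(x <= 0.5, 2.0, 0.0)         # uniform density on [0, 0.5]
    f_p = np.where(x >= 0.25, 4.0 / 3.0, 0.0)  # uniform density on [0.25, 1.0]
    print(overlap_coefficient_drift(f_b, f_p))  # ~ 0.65 on this grid (exact overlap area is 1/3)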
Jensen Shannon distance
Jensen Shannon distance is the normalized form of Kullback-Leibler (KL) divergence, which measures how much one probability distribution differs from a second probability distribution. Jensen Shannon distance is a symmetrical score and always has a finite value.
The following formula is used to calculate the Jensen Shannon distance for two probability distributions, baseline (B) and production (P):
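Jensen Shannon distance(B, P) = √( ½ KL(B ∥ M) + ½ KL(P ∥ M) )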
In this formula, M = ½ (B + P) is the mixture of the baseline and production distributions, and KL is the Kullback-Leibler divergence.
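For example, the Jensen Shannon distance between two discrete distributions can be computed with SciPy. The choice of logarithm base 2, which bounds the distance by 1, is an assumption for illustration.

    import numpy as np
    from scipy.spatial.distance import jensenshannon

    # Two discrete probability distributions over the same bins, for example
    # binned class probabilities from baseline and production data.
    baseline = np.array([0.10, 0.40, 0.50])
    production = np.array([0.80, 0.15, 0.05])

    # jensenshannon returns the distance (the square root of the JS divergence).
    print(jensenshannon(baseline, production, base=2))  # 0 = identical, 1 = maximally different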
Parent topic: Configuring model evaluations