Evaluate metrics
Last updated: Jul 07, 2025
The evaluate metrics module can help you calculate LLM metrics.
Evaluate metrics is a module in the ibm-watsonx-gov
Python SDK that contains methods to compute scores for the context relevance, faithfulness,
and answer similarity metrics. You can use model insights to visualize the evaluation results.
To use the metrics evaluation module, you must install the ibm-watsonx-gov
Python SDK with specific settings:
pip install "ibm-watsonx-gov[metrics]"
Examples
You can use the evaluate metrics module to calculate metrics as shown in the following examples:
Simplified metrics evaluation
import os

from ibm_watsonx_gov.evaluators.metrics_evaluator import MetricsEvaluator
from ibm_watsonx_gov.metrics import AnswerSimilarityMetric

os.environ["WATSONX_APIKEY"] = "..."

evaluator = MetricsEvaluator()
metrics = [AnswerSimilarityMetric()]
result = evaluator.evaluate(data=input_df, metrics=metrics)
Advanced metrics evaluation
from ibm_watsonx_gov.evaluators.metrics_evaluator import MetricsEvaluator
from ibm_watsonx_gov.metrics import AnswerSimilarityMetric
from ibm_watsonx_gov.config import GenAIConfiguration
from ibm_watsonx_gov.clients.api_client import APIClient
from ibm_watsonx_gov.credentials import Credentials

config = GenAIConfiguration(input_fields=["question"],
                            context_fields=["context"],
                            output_fields=["generated_text"],
                            reference_fields=["reference_answer"])
wxgov_client = APIClient(credentials=Credentials(api_key=""))
evaluator = MetricsEvaluator(configuration=config, api_client=wxgov_client)
metrics = [AnswerSimilarityMetric()]
result = evaluator.evaluate(data=input_df, metrics=metrics)
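In both examples, input_df is assumed to be a pandas DataFrame whose column names match the fields named in the configuration (question, context, generated_text, and reference_answer). A minimal sketch of building such a DataFrame, with illustrative sample values:

```python
import pandas as pd

# A minimal evaluation dataset. The column names must match the
# input_fields, context_fields, output_fields, and reference_fields
# passed to GenAIConfiguration.
input_df = pd.DataFrame([
    {
        "question": "What is the capital of France?",
        "context": "France is a country in Europe. Its capital is Paris.",
        "generated_text": "The capital of France is Paris.",
        "reference_answer": "Paris",
    },
])

print(input_df.columns.tolist())
```

Each row is one question-answer record to score; add one row per record you want to evaluate.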
For more information, see the Evaluate metrics notebook.
Parent topic: Metrics computation using Python SDK