Prompting a custom foundation model
After a custom foundation model is installed, registered, and deployed, use the capabilities of watsonx.ai to prompt the model.
When your deployed custom foundation model is available, you can prompt it by using one of these methods:
Prompt a custom model by using Prompt Lab
Open the custom model from the list of available foundation models. You can then work with the model as you do with foundation models that are provided with watsonx.ai.
- Use the Prompt Lab to create and review prompts for your custom foundation model
- Build and save reusable prompts as prompt templates with variables
- Deploy and test prompt templates
Prompt a custom model by using the API
Refer to the following examples to code a prompt for the custom foundation model.
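All of the examples pass a bearer token in the Authorization header. As a minimal sketch, assuming your deployment runs in IBM Cloud, you can exchange an IBM Cloud API key for a token at the IAM identity endpoint (on self-managed clusters, obtain the token from your platform's authorization route instead):

curl -X POST "https://iam.cloud.ibm.com/identity/token" \
-H "Content-Type: application/x-www-form-urlencoded" \
--data-urlencode "grant_type=urn:ibm:params:oauth:grant-type:apikey" \
--data-urlencode "apikey=<your API key>"

The response JSON contains an access_token field; export its value as the TOKEN environment variable that the following examples reference.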
Generating text response
The following code sample shows how to generate a text response by using the /ml/v1/deployments/<deployment ID>/text/generation API endpoint:
curl -X POST "https://<cluster_url>/ml/v1/deployments/<your deployment ID>/text/generation?version=2024-01-29" \
-H "Authorization: Bearer $TOKEN" \
-H "content-type: application/json" \
--data '{
"input": "Hello, what is your name",
"parameters": {
"max_new_tokens": 200,
"min_new_tokens": 20
}
}'
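A successful request returns a JSON body whose results array holds the generated output. To print only the generated text, you can pipe the response through jq (a sketch, assuming the standard watsonx.ai text generation response schema with a results[0].generated_text field):

curl -s -X POST "https://<cluster_url>/ml/v1/deployments/<your deployment ID>/text/generation?version=2024-01-29" \
-H "Authorization: Bearer $TOKEN" \
-H "content-type: application/json" \
--data '{"input": "Hello, what is your name", "parameters": {"max_new_tokens": 200, "min_new_tokens": 20}}' \
| jq -r '.results[0].generated_text'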
You can also use the chat API to prompt your deployed custom foundation model. To enable chat API support, the model content (tokenizer_config.json file in model content) must include a chat template. Chat API support can only be used to prompt models that use the vLLM runtime for deployment, therefore they must be deployed with the watsonx-cfm-caikit-1.1 software specification.
You can only use the chat API to prompt new deployments. The chat API cannot be used to patch support for existing deployments.
The following code sample shows how to generate a chat response by using the /ml/v1/deployments/<deployment ID>/text/chat API endpoint:
curl --request POST \
--url 'https://{region}.ml.cloud.ibm.com/ml/v1/deployments/<deployment-id>/text/chat?version=2020-10-10' \
--header "Authorization: Bearer $token" \
--header 'Content-Type: application/json' \
--data '{
"messages": [
{
"role": "user",
"content": [
{
"type": "text",
"text": "What is the capital of India"
}
]
},
{
"role": "assistant",
"content": "New Delhi."
},
{
"role": "user",
"content": [
{
"type": "text",
"text": "Which continent"
}
]
}
]
}'
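Assuming the deployment chat endpoint follows the watsonx.ai chat response schema, the reply is returned as an assistant message inside a choices array, which you can extract with jq:

curl -s --request POST \
--url 'https://{region}.ml.cloud.ibm.com/ml/v1/deployments/<deployment-id>/text/chat?version=2020-10-10' \
--header "Authorization: Bearer $token" \
--header 'Content-Type: application/json' \
--data '{"messages": [{"role": "user", "content": [{"type": "text", "text": "What is the capital of India"}]}]}' \
| jq -r '.choices[0].message.content'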
Generating stream response
The following code sample shows how to generate a stream response by using the /ml/v1/deployments/<deployment ID>/text/generation_stream API endpoint:
curl -X POST "https://<cluster_url>/ml/v1/deployments/<your deployment ID>/text/generation_stream?version=2024-01-29" \
-H "Authorization: Bearer $TOKEN" \
-H "content-type: application/json" \
--data '{
"input": "Hello, what is your name",
"parameters": {
"max_new_tokens": 200,
"min_new_tokens": 20
}
}'
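The stream endpoint returns the output incrementally as server-sent events rather than a single JSON document; add the -N flag to curl to disable buffering. An illustrative (not verbatim) event stream might look like this, with each data: line carrying a JSON chunk of partial output:

id: 1
event: message
data: {"results": [{"generated_text": "Hello, ", "stop_reason": "not_finished"}]}

id: 2
event: message
data: {"results": [{"generated_text": "my name is", "stop_reason": "not_finished"}]}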
You can use the chat API to prompt your deployed custom foundation model. The chat API implements methods for interacting with foundation models in a conversational way. To enable chat API support, the model content (tokenizer_config.json file in model content) must include a chat template. Chat API support can only be used to prompt models that use the vLLM runtime for deployment, therefore they must be deployed with the watsonx-cfm-caikit-1.1 software specification.
You can only use the chat API to prompt new deployments. The chat API cannot be used to patch support for existing deployments.
The following code sample shows how to generate a stream response by using the /ml/v1/deployments/<deployment ID>/text/chat_stream API endpoint:
curl --request POST \
--url 'https://{region}.ml.cloud.ibm.com/ml/v1/deployments/<deployment-id>/text/chat_stream?version=2020-10-10' \
--header "Authorization: Bearer $token" \
--header 'Content-Type: application/json' \
--data '{
"messages": [
{
"role": "user",
"content": [
{
"type": "text",
"text": "What is the capital of USA"
}
]
}
]
}'
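As with the generation stream, the chat stream endpoint returns server-sent events. The following is a sketch of the chunked output, assuming an incremental choices/delta schema in which each data: payload carries the next fragment of the assistant message:

data: {"choices": [{"index": 0, "delta": {"role": "assistant", "content": "The capital of the USA"}}]}

data: {"choices": [{"index": 0, "delta": {"content": " is Washington, D.C."}}]}

data: {"choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}]}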
Parent topic: Deploying a custom foundation model