Using the model gateway on IBM Cloud (beta)
The model gateway provides a single OpenAI-compatible LLM-as-a-service API that routes requests to LLM providers. The gateway currently offers features such as load balancing and usage statistics tracking.
Before you begin
Environment variables and keys
To use the model gateway on IBM Cloud, you must provide API keys for the supported LLM providers that you plan to use. It is recommended to configure environment variables for all your keys.
Besides the LLM provider API keys, it is recommended to use environment variables for tracking other API keys (such as for IBM Cloud IAM) and for other configuration values such as URLs. For the examples in this topic, export the host URL of the IBM Cloud model gateway to the environment variable GATEWAY_URL and the IAM API key to IBM_CLOUD_APIKEY.
export GATEWAY_URL="https://{region}.ml.cloud.ibm.com/ml/gateway"
export IBM_CLOUD_APIKEY="xxxx" # See below for more info on how to create an IBM Cloud IAM API key.
Setting up authentication
To authenticate requests, use IBM Cloud Identity and Access Management (IAM).
To work with the API, authenticate your application or service by including your IBM Cloud IAM access token in API requests.
- Create a new IAM API key
- To create an IAM API key in the IBM Cloud UI, see Create a user API key in the UI.
- To create an IAM API key in the IBM Cloud CLI, use the following command:
ibmcloud iam api-key-create MyKey -d "this is my API key" --file key_file --action-if-leaked "DELETE"
For more information, see Managing user API keys.
- Create an IBM Cloud Secrets Manager instance
To use the model gateway, provision and link an IBM Secrets Manager instance.
- To configure using the UI, see Creating a Secrets Manager instance in the UI.
- To configure by using the CLI, use the following command.
Replace <region> with us-south. Replace <plan> with one of the following pricing plan IDs:
- Trial: 869c191a-3c2a-4faf-98be-18d48f95ba1f
- Standard plan: 7713c3a8-3be8-4a9a-81bb-ee822fcaac3d
ibmcloud login
ibmcloud target -r <region> -g <resource_group_name>
ibmcloud resource service-instance-create <instance_name> secrets-manager <plan> -p '{"allowed_network": "public-and-private"}'
ibmcloud resource service-instances # Optional, verifies that the service instance was created successfully.
For more information, see Creating a Secrets Manager instance from the CLI.
- Authorize Secrets Manager
Before authorizing, make sure that you have the SecretsReader service role or higher on your Secrets Manager instance. For more information, see Authorizing an IBM Cloud service to access Secrets Manager.
- To authorize by using the UI, see Creating an authorization in the console.
- To authorize by using the IBM Cloud CLI, use the following commands:
ibmcloud login -a https://cloud.ibm.com --apikey ${IBM_CLOUD_APIKEY}
ibmcloud iam authorization-policy-create SecretsReader \
--source-service-name pm-20 \
--target-service-name secrets-manager
Note: pm-20 is the service name for watsonx.ai runtime.
For more information, see Using authorizations to grant access between services.
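Optionally, you can confirm that the authorization was created by listing your account's authorization policies with the IBM Cloud CLI:
ibmcloud iam authorization-policies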
You can now use the IBM_CLOUD_APIKEY as a valid bearer token for the model gateway on IBM Cloud.
curl ${GATEWAY_URL}/v1/... \
-H "Content-Type: application/json" \
-H "Authorization: Bearer ${IBM_CLOUD_APIKEY}" \
...
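For example, after you complete the setup steps in the following sections, you can verify that your key is accepted by listing enabled models (see Listing providers and models later in this topic):
curl -sS ${GATEWAY_URL}/v1/models \
-H "Content-Type: application/json" \
-H "Authorization: Bearer ${IBM_CLOUD_APIKEY}"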
Setting up the model gateway
- To begin using the model gateway, first create a tenant by using the authorized IBM Secrets Manager instance. As with other configuration values, it is recommended to use an environment variable to track the Secrets Manager host address. In this topic, the examples use the environment variable SECRETS_MANAGER.
export SECRETS_MANAGER="https://xxxx.xxxx.secrets-manager.appdomain.cloud"
To configure the model gateway, send a request to POST /v1/tenant, including a new name and your authorized Secrets Manager address.
Example
curl ${GATEWAY_URL}/v1/tenant \
-H "Content-Type: application/json" \
-H "Authorization: Bearer ${IBM_CLOUD_APIKEY}" \
-d @- <<EOF
{
"name": "test",
"secrets_manager": "${SECRETS_MANAGER}"
}
EOF
- You can configure LLM providers for your account. To configure each provider, send a request to POST /v1/providers/<PROVIDER TYPE>. The body of the request includes:
- name - A custom user-defined identifier for this LLM provider credential. This identifier can be anything, but it must be unique. The examples in this topic use "my-xxxx-provider" for each provider type.
- data - Contains any configuration arguments that are supplied for setting up the provider connection. Typically, some form of apiKey is required. Some providers may require, or optionally accept, additional configuration parameters.
The response to this request includes a UUID for the provider. Note this UUID; later steps use it to enable models on the provider.
{
"uuid": "de972dcf-7042-4cag-e7a3-d90a16229e5b",
"name": "my-openai-provider",
"type": "openai"
}
For more details and examples for each of the supported providers, see Choosing an LLM provider.
- To retrieve information on configured model providers, call GET /v1/providers. The following example demonstrates retrieving the UUID of your my-openai-provider OpenAI provider by filtering the response by credential name and exporting the UUID into an environment variable. Provider UUIDs will be used in the next step to enable models.
export OPENAI_PROVIDER_UUID=$(curl -sS ${GATEWAY_URL}/v1/providers \
-H "Content-Type: application/json" \
-H "Authorization: Bearer ${IBM_CLOUD_APIKEY}" \
| jq -r --arg name "my-openai-provider" '.[] | select(.name == $name) | .uuid')
- After a model provider is added, you can enable the models that you want to use through that provider by calling POST /v1/providers/{provider_id}/models with the provider's UUID and supplying the model's ID in the request. The model ID must be an existing unique identifier for a model that is known by the LLM provider. Often, models are available from multiple providers; model aliases are custom user-defined names that can be used to identify the model instead of the ID. For examples, see Enabling models for LLM providers.
Choosing an LLM provider
The model gateway currently supports the following LLM provider types:
OpenAI
Endpoint: /v1/providers/openai
Required Arguments: apiKey
Optional Arguments: baseURL
Example:
export OPENAI_APIKEY="xxxx"
curl -sS ${GATEWAY_URL}/v1/providers/openai \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $IBM_CLOUD_APIKEY" \
-d "$(jq -n \
--arg name "my-openai-provider" \
--arg apikey "$OPENAI_APIKEY" \
'{name: $name, data: {apikey: $apikey}}')"
watsonx.ai
Endpoint: /v1/providers/watsonxai
Required Arguments: apiKey, and one of projectID or spaceID
Optional Arguments: baseURL, authURL, apiVersion
Example:
export WATSONX_AI_APIKEY="xxxx"
export WATSONX_AI_PROJECT_ID="xxxx"
curl -sS ${GATEWAY_URL}/v1/providers/watsonxai \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $IBM_CLOUD_APIKEY" \
-d "$(jq -n \
--arg name "my-watsonxai-provider" \
--arg apikey "$WATSONX_AI_APIKEY" \
--arg projectid "$WATSONX_AI_PROJECT_ID" \
'{name: $name, data: {apikey: $apikey, project_id: $projectid}}')"
Azure OpenAI
Endpoint: /v1/providers/azure-openai
Required Arguments: apiKey, resourceName
Optional Arguments: subscriptionID, resourceGroupName, accountName, apiVersion
Example:
export AZURE_OPENAI_APIKEY="xxxx"
export AZURE_OPENAI_RESOURCE_NAME="xxxx"
curl -sS ${GATEWAY_URL}/v1/providers/azure-openai \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $IBM_CLOUD_APIKEY" \
-d "$(jq -n \
--arg name "my-azure-openai-provider" \
--arg apikey "$AZURE_OPENAI_APIKEY" \
--arg resourcename "$AZURE_OPENAI_RESOURCE_NAME" \
'{name: $name, data: {apikey: $apikey, resource_name: $resourcename}}')"
Anthropic
Endpoint: /v1/providers/anthropic
Required Arguments: apiKey
Example:
export ANTHROPIC_APIKEY="xxxx"
curl -sS ${GATEWAY_URL}/v1/providers/anthropic \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $IBM_CLOUD_APIKEY" \
-d "$(jq -n \
--arg name "my-anthropic-provider" \
--arg apikey "$ANTHROPIC_APIKEY" \
'{name: $name, data: {apikey: $apikey}}')"
AWS Bedrock
Endpoint: /v1/providers/bedrock
Required Arguments: accessKeyId, secretAccessKey, region
Example:
export AWS_BEDROCK_KEY_ID="xxxx"
export AWS_BEDROCK_ACCESS_KEY="xxxx"
export AWS_BEDROCK_REGION="us-east-1"
curl -sS ${GATEWAY_URL}/v1/providers/bedrock \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $IBM_CLOUD_APIKEY" \
-d "$(jq -n \
--arg name "my-bedrock-provider" \
--arg keyid "$AWS_BEDROCK_KEY_ID" \
--arg accesskey "$AWS_BEDROCK_ACCESS_KEY" \
--arg region "$AWS_BEDROCK_REGION" \
'{name: $name, data: {access_key_id: $keyid, secret_access_key: $accesskey, region: $region}}')"
Cerebras
Endpoint: /v1/providers/cerebras
Required Arguments: apiKey
Example:
export CEREBRAS_APIKEY="xxxx"
curl -sS ${GATEWAY_URL}/v1/providers/cerebras \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $IBM_CLOUD_APIKEY" \
-d "$(jq -n \
--arg name "my-cerebras-provider" \
--arg apikey "$CEREBRAS_APIKEY" \
'{name: $name, data: {apikey: $apikey}}')"
NVIDIA NIM
Endpoint: /v1/providers/nim
Required Arguments: apiKey
Example:
export NIM_APIKEY="xxxx"
curl -sS ${GATEWAY_URL}/v1/providers/nim \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $IBM_CLOUD_APIKEY" \
-d "$(jq -n \
--arg name "my-nim-provider" \
--arg apikey "$NIM_APIKEY" \
'{name: $name, data: {apikey: $apikey}}')"
Enabling models for LLM providers
The following examples demonstrate how to enable models for providers. The examples reference the provider UUIDs through environment variables, which you can set as shown in the sketch below.
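A minimal sketch that sets the UUID variables by using the same jq filtering pattern shown earlier, assuming that you named your providers my-azure-openai-provider and my-bedrock-provider as in the previous sections:
export AZURE_OPENAI_PROVIDER_UUID=$(curl -sS ${GATEWAY_URL}/v1/providers \
-H "Authorization: Bearer ${IBM_CLOUD_APIKEY}" \
| jq -r --arg name "my-azure-openai-provider" '.[] | select(.name == $name) | .uuid')
export AWS_BEDROCK_PROVIDER_UUID=$(curl -sS ${GATEWAY_URL}/v1/providers \
-H "Authorization: Bearer ${IBM_CLOUD_APIKEY}" \
| jq -r --arg name "my-bedrock-provider" '.[] | select(.name == $name) | .uuid')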
Enable OpenAI's GPT-4o model for your Azure OpenAI provider:
curl -X POST "${GATEWAY_URL}/v1/providers/${AZURE_OPENAI_PROVIDER_UUID}/models" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer ${IBM_CLOUD_APIKEY}" \
-d '{ "alias": "azure/gpt-4o", "id": "scribeflowgpt4o"}'
Enable Anthropic's Claude 3.7 Sonnet model for your AWS Bedrock provider:
curl -X POST "${GATEWAY_URL}/v1/providers/${AWS_BEDROCK_PROVIDER_UUID}/models" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer ${IBM_CLOUD_APIKEY}" \
-d '{ "alias": "aws/claude-3-sonnet", "id": "anthropic.claude-3-7-sonnet-20250219-v1:0"}'
Listing providers and models
You can list both the providers and the models that you configured. The following examples list all of the configured providers, all the models for a given provider, and all the models across the configured providers.
To list all configured model providers, use the following command:
curl -sS ${GATEWAY_URL}/v1/providers \
-H "Content-Type: application/json" \
-H "Authorization: Bearer ${IBM_CLOUD_APIKEY}"
To list all models enabled for the configured Azure OpenAI provider, use the following command:
curl -sS "${GATEWAY_URL}/v1/providers/${AZURE_OPENAI_DALLE_PROVIDER_UUID}/models" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer ${IBM_CLOUD_APIKEY}"
To list all models enabled (across all the configured providers), use the following command:
curl -sS "${GATEWAY_URL}/v1/models" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer ${IBM_CLOUD_APIKEY}"
Using the model gateway
The model gateway on IBM Cloud currently supports the following endpoints:
- Chat Completions (supports streaming) - /v1/chat/completions
- Text Completions/Generations (supports streaming) - /v1/completions
- Embeddings Generation - /v1/embeddings
These endpoints expose an OpenAI-compatible but provider-agnostic API for the model gateway, which is used to route LLM requests. The gateway supports all of the preceding endpoints; however, some model providers may not support a specific endpoint's service in their backend. Trying to use a configured model provider with an unsupported endpoint service results in an appropriate error response.
Examples
Chat Completions - /v1/chat/completions
curl ${GATEWAY_URL}/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $IBM_CLOUD_APIKEY" \
-d '{
"model": "azure/gpt-4o",
"messages": [
{
"role": "system",
"content": "Please explain everything in a way a 5th grader could understand—simple language, clear steps, and easy examples."
},
{
"role": "user",
"content": "Can you explain what TLS is and how I can use it?"
}
]
}'
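Because chat completions support streaming, you can request a streamed response by adding the OpenAI-compatible "stream": true parameter. A sketch of a streaming variant; the -N flag disables curl's output buffering so that tokens print as they arrive:
curl -N ${GATEWAY_URL}/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $IBM_CLOUD_APIKEY" \
-d '{
"model": "azure/gpt-4o",
"stream": true,
"messages": [
{
"role": "user",
"content": "Can you explain what TLS is and how I can use it?"
}
]
}'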
Text Completions/Generation - /v1/completions
curl ${GATEWAY_URL}/v1/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $IBM_CLOUD_APIKEY" \
-d '{
"model": "ibm/llama-3.3-70b",
"prompt": "Say this is a test",
"max_tokens": 7,
"temperature": 0
}'
Embeddings Generation - /v1/embeddings
curl ${GATEWAY_URL}/v1/embeddings \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $IBM_CLOUD_APIKEY" \
-d '{
"input": "The food was delicious and the waiter...",
"model": "text-embedding-3-large",
"encoding_format": "float"
}'
Using the OpenAI SDK
The model gateway maintains compatibility with the OpenAI API and as a result, the OpenAI SDKs can be used to interact with the gateway service by passing the $GATEWAY_URL
rather than the OpenAI URL and IBM Cloud API key instead
of the OpenAI API key.
To use the OpenAI Python SDK to make a chat completions request to the model gateway, see the following example:
import os
from openai import OpenAI
# Note that since we exported GATEWAY_URL="https://us-south.ml.cloud.ibm.com/ml/gateway", we must append "/v1".
# This is because the client invokes OpenAI child paths like "/chat/completions", not "/v1/chat/completions".
gateway_url = os.getenv("GATEWAY_URL") + "/v1"
ibm_cloud_api_key = os.getenv("IBM_CLOUD_APIKEY")
print("Using GATEWAY_URL:", gateway_url)
print("Using IBM_CLOUD_APIKEY:", ibm_cloud_api_key)
# Customize client to connect to the model gateway using the IBM Cloud API key.
client = OpenAI(
base_url=gateway_url,
api_key=ibm_cloud_api_key,
)
# Create a Chat Completions request to the model gateway.
completion = client.chat.completions.create(
model="openai/gpt-4o",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Hello!"}
]
)
print(completion.choices[0].message)
Parent topic: Supported foundation models