
Using the model gateway on IBM Cloud (beta)

Last updated: Jun 19, 2025

The model gateway provides a single OpenAI-compatible LLM-as-a-service API that proxies requests to LLM providers. The gateway currently offers features such as load balancing and usage statistics tracking.

Note: The model gateway feature is in beta and available only in the Dallas region.

Before you begin

Environment variables and keys

To use the model gateway on IBM Cloud, you must provide API keys for the supported LLM providers that you plan to use. It is recommended to configure environment variables for all your keys.

Besides the LLM provider API keys, it is recommended to use environment variables to track other API keys (such as your IBM Cloud IAM key) and other configuration values such as URLs. For the examples in this topic, export the host URL of the IBM Cloud model gateway to the environment variable GATEWAY_URL and the IAM API key to IBM_CLOUD_APIKEY.

export GATEWAY_URL="https://{region}.ml.cloud.ibm.com/ml/gateway" # During the beta, {region} must be us-south (Dallas).
export IBM_CLOUD_APIKEY="xxxx" # See the following section for how to create an IBM Cloud IAM API key.

Setting up authentication

To authenticate requests, use IBM Cloud Identity and Access Management (IAM).

To work with the API, authenticate your application or service by including your IBM Cloud IAM access token in API requests.

  1. Create a new IAM API key

ibmcloud iam api-key-create MyKey -d "this is my API key" --file key_file --action-if-leaked "DELETE"

For more information, see Managing user API keys.

  2. Create an IBM Cloud Secrets Manager instance

    To use the model gateway, provision and link an IBM Cloud Secrets Manager instance.

Note: Select Public and private in the Endpoints option dropdown when configuring the new resource.
  • To configure by using the CLI, use the following commands.
    Replace <region> with us-south. Replace <plan> with one of the following pricing plan IDs:
    • Trial: 869c191a-3c2a-4faf-98be-18d48f95ba1f
    • Standard plan: 7713c3a8-3be8-4a9a-81bb-ee822fcaac3d
  ibmcloud login
  ibmcloud target -r <region> -g <resource_group_name>
  ibmcloud resource service-instance-create <instance_name> secrets-manager <plan> -p '{"allowed_network": "public-and-private"}'
  ibmcloud resource service-instances # Optional, verifies that the service instance was created successfully.

For more information, see Creating a Secrets Manager instance from the CLI.

  3. Authorize Secrets Manager

    Before authorizing, make sure that you have the SecretsReader service role or higher on your Secrets Manager instance. For more information, see Authorizing an IBM Cloud service to access Secrets Manager.

  ibmcloud login -a https://cloud.ibm.com --apikey ${IBM_CLOUD_APIKEY}
  ibmcloud iam authorization-policy-create SecretsReader \
      --source-service-name pm-20 \
      --target-service-name secrets-manager

Note: pm-20 is the service name for watsonx.ai runtime.

For more information, see Using authorizations to grant access between services.

You can now use the IBM_CLOUD_APIKEY as a valid bearer token for the model gateway on IBM Cloud.

curl ${GATEWAY_URL}/v1/...  \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${IBM_CLOUD_APIKEY}" \
  ...
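
For example, the following Python sketch sends an authenticated request to the gateway's models endpoint (described later in this topic). It is a minimal sketch, assuming the requests package is installed and that GATEWAY_URL and IBM_CLOUD_APIKEY are exported as shown earlier.

import os

import requests

# Read the gateway URL and the IBM Cloud API key from the environment.
gateway_url = os.environ["GATEWAY_URL"].rstrip("/")
api_key = os.environ["IBM_CLOUD_APIKEY"]

# Every gateway endpoint accepts the IBM Cloud API key as a bearer token.
response = requests.get(
    f"{gateway_url}/v1/models",
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    },
    timeout=30,
)
response.raise_for_status()
print(response.json())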

Setting up the model gateway

  1. To begin using the model gateway, first create tenancy by using the authorized IBM Cloud Secrets Manager instance. As with other configuration values, it is recommended to use an environment variable to track the Secrets Manager host address. In this topic, the examples use the environment variable SECRETS_MANAGER.
  export SECRETS_MANAGER="https://xxxx.xxxx.secrets-manager.appdomain.cloud"

To configure the model gateway, send a request to POST /v1/tenant that includes a new name and the address of your authorized Secrets Manager instance.

Example

curl ${GATEWAY_URL}/v1/tenant \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${IBM_CLOUD_APIKEY}" \
  -d @- <<EOF
{
  "name": "test",
  "secrets_manager": "${SECRETS_MANAGER}"
}
EOF
  2. You can configure LLM providers for your account. To configure each provider, send a request to
    POST /v1/providers/<PROVIDER TYPE>. The body of the request includes:

    • name - A custom user-defined identifier for this LLM provider credential. The identifier can be any string but must be unique. The examples in this topic use "my-xxxx-provider" for each provider type.
    • data - Contains any configuration arguments that are supplied for setting up the provider connection. Typically, some form of apiKey is required. Some providers require or optionally accept additional configuration parameters.

    The response to this request includes a UUID for the provider. Note the UUID, because later steps use it to enable models on the provider.

  {
    "uuid": "de972dcf-7042-4cag-e7a3-d90a16229e5b",
    "name": "my-openai-provider",
    "type": "openai"
  }

For more details and examples for each of the supported providers, see Choosing an LLM provider.

  3. To retrieve information about configured model providers, call GET /v1/providers and filter the response by the credential name.

    The following example demonstrates how to retrieve the UUID of your my-openai-provider OpenAI provider by exporting the UUID from the response into an environment variable. Provider UUIDs are used in the next step to enable models.

  export OPENAI_PROVIDER_UUID=$(curl -sS ${GATEWAY_URL}/v1/providers  \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${IBM_CLOUD_APIKEY}" \
  | jq -r --arg name "my-openai-provider" '.[] | select(.name == $name) | .uuid')

  4. After a model provider is added, you can enable the models that you want to use through that provider by calling POST /v1/providers/{provider_id}/models with the provider's UUID and supplying the model's ID in the request. The model ID must be an existing unique identifier for a model that is known to the LLM provider. Because the same model is often available from multiple providers, you can also supply an alias: a custom user-defined name that can be used to identify the model instead of the ID. For examples, see Enabling models for LLM providers and the sketch that follows this list.
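
As a programmatic alternative to the curl examples, the following Python sketch enables a model on a configured provider. It is a minimal sketch, assuming the requests package is installed, the environment variables exported earlier are set, and the model ID and alias (here OpenAI's gpt-4o) are example values for illustration.

import os

import requests

GATEWAY_URL = os.environ["GATEWAY_URL"].rstrip("/")
HEADERS = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {os.environ['IBM_CLOUD_APIKEY']}",
}

# UUID of a previously configured provider (see the earlier GET /v1/providers step).
provider_uuid = os.environ["OPENAI_PROVIDER_UUID"]

# Enable a model on the provider. The id must be known to the provider;
# the alias is a custom name of your choosing.
response = requests.post(
    f"{GATEWAY_URL}/v1/providers/{provider_uuid}/models",
    headers=HEADERS,
    json={"alias": "openai/gpt-4o", "id": "gpt-4o"},  # example values
    timeout=30,
)
response.raise_for_status()
print(response.json())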

Choosing an LLM provider

The model gateway currently supports the following LLM provider types:

OpenAI

Endpoint: /v1/providers/openai
Required Arguments: apiKey
Optional Arguments: baseURL

Example:

export OPENAI_APIKEY="xxxx"

curl -sS ${GATEWAY_URL}/v1/providers/openai \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $IBM_CLOUD_APIKEY" \
  -d "$(jq -n \
        --arg name "my-openai-provider" \
        --arg apikey "$OPENAI_APIKEY" \
        '{name: $name, data: {apikey: $apikey}}')"

watsonx.ai

Endpoint: /v1/providers/watsonxai
Required Arguments: apiKey, one of projectID or spaceID
Optional Arguments: baseURL, authURL, apiVersion

Example:

export WATSONX_AI_APIKEY="xxxx"
export WATSONX_AI_PROJECT_ID="xxxx"

curl -sS ${GATEWAY_URL}/v1/providers/watsonxai \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $IBM_CLOUD_APIKEY" \
  -d "$(jq -n \
        --arg name "my-watsonxai-provider" \
        --arg apikey "$WATSONX_AI_APIKEY" \
        --arg projectid "$WATSONX_AI_PROJECT_ID" \
        '{name: $name, data: {apikey: $apikey, project_id: $projectid}}')"

Azure OpenAI

Endpoint: /v1/providers/azure-openai
Required Arguments: apiKey, resourceName
Optional Arguments: subscriptionID, resourceGroupName, accountName, apiVersion

Example:

export AZURE_OPENAI_APIKEY="xxxx"
export AZURE_OPENAI_RESOURCE_NAME="xxxx"

curl -sS ${GATEWAY_URL}/v1/providers/azure-openai \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $IBM_CLOUD_APIKEY" \
  -d "$(jq -n \
        --arg name "my-azure-openai-provider" \
        --arg apikey "$AZURE_OPENAI_APIKEY" \
        --arg resourcename "$AZURE_OPENAI_RESOURCE_NAME" \
        '{name: $name, data: {apikey: $apikey, resource_name: $resourcename}}')"

Anthropic

Endpoint: /v1/providers/anthropic
Required Arguments: apiKey

Example:

export ANTHROPIC_APIKEY="xxxx"

curl -sS ${GATEWAY_URL}/v1/providers/anthropic \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $IBM_CLOUD_APIKEY" \
  -d "$(jq -n \
        --arg name "my-anthropic-provider" \
        --arg apikey "$ANTHROPIC_APIKEY" \
        '{name: $name, data: {apikey: $apikey}}')"

AWS Bedrock

Endpoint: /v1/providers/bedrock
Required Arguments: accessKeyId, secretAccessKey, region

Example:

export AWS_BEDROCK_KEY_ID="xxxx"
export AWS_BEDROCK_ACCESS_KEY="xxxx"
export AWS_BEDROCK_REGION="us-east-1"

curl -sS ${GATEWAY_URL}/v1/providers/bedrock \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $IBM_CLOUD_APIKEY" \
  -d "$(jq -n \
        --arg name "my-bedrock-provider" \
        --arg keyid "$AWS_BEDROCK_KEY_ID" \
        --arg accesskey "$AWS_BEDROCK_ACCESS_KEY" \
        --arg region "$AWS_BEDROCK_REGION" \
        '{name: $name, data: {access_key_id: $keyid, secret_access_key: $accesskey, region: $region}}')"

Cerebras

Endpoint: /v1/providers/cerebras
Required Arguments: apiKey

Example:

export CEREBRAS_APIKEY="xxxx"

curl -sS ${GATEWAY_URL}/v1/providers/cerebras \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $IBM_CLOUD_APIKEY" \
  -d "$(jq -n \
        --arg name "my-cerebras-provider" \
        --arg apikey "$CEREBRAS_APIKEY" \
        '{name: $name, data: {apikey: $apikey}}')"

NVIDIA NIM

Endpoint: /v1/providers/nim
Required Arguments: apiKey

Example:

export NIM_APIKEY="xxxx"

curl -sS ${GATEWAY_URL}/v1/providers/nim \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $IBM_CLOUD_APIKEY" \
  -d "$(jq -n \
        --arg name "my-nim-provider" \
        --arg apikey "$NIM_APIKEY" \
        '{name: $name, data: {apikey: $apikey}}')"

Enabling models for LLM providers

The following examples demonstrate how to enable models for providers:

Enable OpenAI's GPT-4o model for the user's Azure OpenAI provider:

  curl -X POST "${GATEWAY_URL}/v1/providers/${AZURE_OPENAI_PROVIDER_UUID}/models" \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer ${IBM_CLOUD_APIKEY}" \
    -d '{ "alias": "azure/gpt-4o", "id": "scribeflowgpt4o"}'

Enable Anthropic's Claude 3.7 Sonnet model for the user's AWS Bedrock provider:

  curl -X POST "${GATEWAY_URL}/v1/providers/${AWS_BEDROCK_PROVIDER_UUID}/models" \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer ${IBM_CLOUD_APIKEY}" \
    -d '{ "alias": "aws/claude-3-sonnet", "id": "anthropic.claude-3-7-sonnet-20250219-v1:0"}'

Listing providers and models

You can list both the providers and the models that you configured. The following examples list all of the configured providers, all of the models for a given provider, and all of the models across all configured providers.

To list all configured model providers, use the following command:

curl -sS ${GATEWAY_URL}/v1/providers \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${IBM_CLOUD_APIKEY}"

To list all models enabled for the configured Azure OpenAI provider, use the following command:

curl -sS "${GATEWAY_URL}/v1/providers/${AZURE_OPENAI_DALLE_PROVIDER_UUID}/models" \
	-H "Content-Type: application/json" \
	-H "Authorization: Bearer ${IBM_CLOUD_APIKEY}"

To list all models enabled (across all the configured providers), use the following command:

curl -sS "${GATEWAY_URL}/v1/models" \
	-H "Content-Type: application/json" \
	-H "Authorization: Bearer ${IBM_CLOUD_APIKEY}"

Using the model gateway

The model gateway on IBM Cloud currently supports the following endpoints:

  • Chat Completions (supports streaming) - /v1/chat/completions
  • Text Completions/Generations (supports streaming) - /v1/completions
  • Embeddings Generation - /v1/embeddings

These endpoints expose an OpenAI-compatible but provider-agnostic API that the model gateway uses to route LLM requests. The gateway supports all of the preceding endpoints; however, some model providers might not support a specific endpoint's service in their backend. Trying to use a configured model provider with an unsupported endpoint results in an appropriate error response.

Examples

Chat Completions - /v1/chat/completions

curl ${GATEWAY_URL}/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $IBM_CLOUD_APIKEY" \
  -d '{
    "model": "azure/gpt-4o",
    "messages": [
      {
        "role": "system",
        "content": "Please explain everything in a way a 5th grader could understand—simple language, clear steps, and easy examples."
      },
      {
        "role": "user",
        "content": "Can you explain what TLS is and how I can use it?"
      }
    ]
  }'

Text Completions/Generation - /v1/completions

curl ${GATEWAY_URL}/v1/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $IBM_CLOUD_APIKEY" \
  -d '{
    "model": "ibm/llama-3.3-70b",
    "prompt": "Say this is a test",
    "max_tokens": 7,
    "temperature": 0
  }'

Embeddings Generation - /v1/embeddings

curl ${GATEWAY_URL}/v1/embeddings \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $IBM_CLOUD_APIKEY" \
  -d '{
    "input": "The food was delicious and the waiter...",
    "model": "text-embedding-3-large",
    "encoding_format": "float"
  }'

Using the OpenAI SDK

The model gateway maintains compatibility with the OpenAI API. As a result, the OpenAI SDKs can be used to interact with the gateway service by passing $GATEWAY_URL instead of the OpenAI URL and your IBM Cloud API key instead of an OpenAI API key.

To use the OpenAI Python SDK to make a chat completions request to the model gateway, see the following example:

import os
from openai import OpenAI

# Note: the OpenAI client appends child paths such as "/chat/completions", not "/v1/chat/completions",
# so the base URL passed to the client must end with "/v1". Normalizing with rstrip keeps this correct
# whether or not GATEWAY_URL was exported with a trailing slash.
gateway_url = os.getenv("GATEWAY_URL").rstrip("/") + "/v1"
ibm_cloud_api_key = os.getenv("IBM_CLOUD_APIKEY")

print("Using GATEWAY_URL:", gateway_url)
print("Using IBM_CLOUD_APIKEY:", ibm_cloud_api_key)

# Customize client to connect to the model gateway using the IBM Cloud API key.
client = OpenAI(
    base_url=gateway_url,
    api_key=ibm_cloud_api_key,
)

# Create a Chat Completions request to the model gateway.
completion = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"}
    ]
)

print(completion.choices[0].message)
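
Because the chat completions endpoint supports streaming, the same client can also stream the response as it is generated. The following minimal sketch reuses the client from the preceding example and the OpenAI SDK's stream option:

# Stream a chat completion from the model gateway.
stream = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Write one sentence about IBM Cloud."}],
    stream=True,
)

# Print each content delta as it arrives.
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()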

Parent topic: Supported foundation models