IBM watsonx.data Presto connection

To access your data in IBM watsonx.data, create a connection asset for it. The connection asset includes information for connecting to a watsonx.data instance and to the Presto query engine that is running on that instance.

IBM watsonx.data is an open, hybrid, and governed data lakehouse that is optimized by a query engine for all data and AI workloads.

Before you begin

Differences between the watsonx.data Presto and the Presto connections

IBM watsonx.data incorporates the Presto SQL Query Engine. Both the watsonx.data Presto and Presto connections can create connection assets to interact with the Presto SQL Query Engine in IBM watsonx.data.

watsonx.data Presto connection

The watsonx.data Presto connection supports reading from IBM watsonx.data by using the Presto SQL Query Engine. It also supports writing tables in the Iceberg table format to Amazon S3, Apache Ozone, IBM Ceph, and IBM Cloud Object Storage buckets in IBM watsonx.data. This connection is also required if you want to use the IBM Knowledge Catalog integration with watsonx.data.

IBM recommends using the watsonx.data Presto connection when connecting from Cloud Pak for Data to IBM watsonx.data.

For more information about the watsonx.data Presto connection, see the rest of the topic.

Presto

The Presto connection can create a read-only connection to any Presto engine, including the implementation in IBM watsonx.data.

For more information about the Presto connection, see Presto connection.

Prerequisite

Set up an instance of watsonx.data.

You can connect to a software instance or to an as-a-Service instance:

watsonx.data software on Cloud Pak for Data: See Installing watsonx.data on Cloud Pak for Data.
watsonx.data as a Service on IBM Cloud: See Getting started with watsonx.data on IBM Cloud.
watsonx.data stand-alone software: See Installing stand-alone watsonx.data.

Create a connection to watsonx.data

The connection details vary depending on the deployment type that you choose. To create the connection asset, in the Connection details section of the Connect to a data source page, select the deployment type:

IBM watsonx.data Developer edition
IBM watsonx.data on IBM Cloud
IBM watsonx.data on Red Hat OpenShift

You can also leave the deployment type at its default value, in which case the legacy connection details are shown.

The details that you need to provide depend on the deployment type that you select:

IBM watsonx.data Developer edition

You can import a JSON file to fill in these fields by using Import connection values. To get the JSON file for this connection, go to the console page of your watsonx.data instance, navigate to the Connect information field, and copy the JSON file.

Hostname or IP address: Find this information in the console under Configurations > Connection information > Instance details.
Port: The default port number is 443. You can find this information in the console under Configurations > Connection information > Instance details.
Instance ID: Find this value in the watsonx.data console. Click Instance details from the navigation menu. You can also find this information in the console under Configurations > Connection information > Instance details.

IBM watsonx.data on IBM Cloud

Hostname or IP address: Find this information in the console under Configurations > Connection information > Instance details.
Port: The default port number is 443. You can find this information in the console under Configurations > Connection information > Instance details.
CRN (Cloud resource name): Find this value in the watsonx.data console. Click Instance details from the navigation menu. You can also find this information in the console under Configurations > Connection information > Instance details.

IBM watsonx.data on Red Hat OpenShift

Hostname or IP address: Find this information in the console under Configurations > Connection information > Instance details.
Port: The default port number is 443. You can find this information in the console under Configurations > Connection information > Instance details.
Instance ID: Find this value in the watsonx.data console. Click Instance details from the navigation menu. You can also find this information in the console under Configurations > Connection information > Instance details.
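
The per-deployment field lists above can be summarized as a small checklist. The following Python sketch is illustrative only: the dictionary keys, field names, and the missing_fields helper are assumptions made for this example and are not part of any IBM API. It records which fields each deployment type asks for and reports the ones that are still empty.

```python
# Field names mirror the labels in this topic. The keys and the helper
# below are hypothetical names chosen for illustration.
REQUIRED_FIELDS = {
    "IBM watsonx.data Developer edition": ["hostname", "port", "instance_id"],
    "IBM watsonx.data on IBM Cloud": ["hostname", "port", "crn"],
    "IBM watsonx.data on Red Hat OpenShift": ["hostname", "port", "instance_id"],
}

DEFAULT_PORT = 443  # default instance port stated in this topic


def missing_fields(deployment_type, details):
    """Return the required fields that are absent or empty in `details`."""
    # Fall back to the default port when the caller does not supply one.
    filled = {"port": DEFAULT_PORT, **details}
    return [
        name
        for name in REQUIRED_FIELDS[deployment_type]
        if not filled.get(name)
    ]
```

For example, calling missing_fields for the IBM Cloud deployment type with only a hostname returns a list that contains crn, because the port falls back to the default value.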

Legacy connection details

watsonx.data software

To create the connection asset, in the Connection details section of the Connect to a data source page, select Connect to watsonx.data on Cloud Pak for Data and provide these details:

Hostname or IP address: Find this information in the console under Configurations > Connection information > Instance details.
Port: The default port number is 443. You can find this information in the console under Configurations > Connection information > Instance details.
Instance ID: Find this value in the watsonx.data console. Click Instance details from the navigation menu. You can also find this information in the console under Configurations > Connection information > Instance details.
Instance name: Find the instance name in the Cloud Pak for Data web client home page. Click Services > Instances from the navigation menu.

watsonx.data as a Service

Hostname or IP address: Find this information in the console under Configurations > Connection information > Instance details.
Port: The default port number is 443. You can find this information in the console under Configurations > Connection information > Instance details.
Instance ID: Find this value in the watsonx.data console. Click Instance details from the navigation menu. You can also find this information in the console under Configurations > Connection information > Instance details.
Instance name: Find this value on the watsonx.ai Service instances page. Click Administration > Services > Service instances. For example, watsonx.data-aaa. Do not use the suggested instance name that is shown in the field.
CRN (Cloud resource name): Find this value in the watsonx.data console. Click Instance details from the navigation menu. You can also find this information in the console under Configurations > Connection information > Instance details.

Credentials

The credentials vary depending on the deployment type that you choose:

IBM watsonx.data Developer edition
IBM watsonx.data on IBM Cloud
IBM watsonx.data on Red Hat OpenShift

You can also leave the deployment type at its default value, in which case the legacy connection details are shown.

IBM watsonx.data Developer edition

Username and password: The username and password that are used to log in to the watsonx.data stand-alone console.

IBM watsonx.data on IBM Cloud

API key: The API key of the account that has access to the watsonx.data instance on IBM Cloud.

The API key can be generated in the IBM Cloud console.

IBM watsonx.data on Red Hat OpenShift

You must select an authentication method:

Username and password: The username and password that are used to access Cloud Pak for Data where the watsonx.data instance is located.
Username and API key: The username and API key that are used to access Cloud Pak for Data where the watsonx.data instance is located.

This authentication method is recommended if Cloud Pak for Data uses an Identity Management Service (IAM), for example, LDAP or SSO. The API key is located in the Profile and settings of the target Cloud Pak for Data cluster. For information on API keys, see Generating API keys for authentication.

Legacy connection details

watsonx.data software

The username and password, or the username and API key, for the watsonx.data instance. The same credentials are also used for the engine.

You must select the authentication method:

Username and password: The username and password that are used to access Cloud Pak for Data where the watsonx.data instance is located, or the username and password for watsonx.data stand-alone.
Username and API key: The username and API key that are used to access Cloud Pak for Data where the watsonx.data instance is located, or the username and password for watsonx.data stand-alone. This authentication method is recommended if Cloud Pak for Data uses an Identity Management Service (IAM), for example, LDAP or SSO. The API key is located in the Profile and settings of the target Cloud Pak for Data cluster. For information on API keys, see Generating API keys for authentication.

watsonx.data as a Service

The username and password for the watsonx.data instance. The same credentials are also used for the engine.

Username: The default username is ibmlhapikey_<cloud-account-email-address>, that is, the prefix ibmlhapikey_ followed by the email address of the cloud account.
Password: The password is the user's API key. To create an API key, see IBM Cloud docs: Creating an API key in the console.
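
As a minimal sketch of the username convention described above, the following Python helper joins the ibmlhapikey_ prefix with an account email address. The function name and the example address are hypothetical. The password (the API key) is not handled here.

```python
USERNAME_PREFIX = "ibmlhapikey_"  # prefix given in this topic


def legacy_saas_username(account_email):
    """Build the default username from a cloud account email address."""
    email = account_email.strip()
    if not email:
        raise ValueError("account email must not be empty")
    return USERNAME_PREFIX + email


# Hypothetical address, used only to show the resulting shape.
print(legacy_saas_username("user@example.com"))
```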

Certificates

By default, the SSL is enabled option is selected. This setting is recommended for increased security. If you do not use SSL, the data might be subject to vulnerabilities such as data leakage. Although the database that is hosted in watsonx.data can also have an SSL certificate, the connection goes through the engine.

The SSL certificate must be in PEM format.
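
Before you paste a certificate into the SSL certificate field, you might want to confirm that the text is PEM encoded. The following Python sketch uses the standard library function ssl.PEM_cert_to_DER_cert, which checks for the BEGIN CERTIFICATE and END CERTIFICATE markers and attempts to decode the Base64 body. The helper name is an assumption for this example. The check does not verify that the certificate is valid, trusted, or the one that your instance uses.

```python
import ssl


def looks_like_pem_certificate(text):
    """Return True if `text` has a PEM certificate envelope that decodes."""
    try:
        # Raises ValueError when the BEGIN/END markers are missing.
        ssl.PEM_cert_to_DER_cert(text.strip())
    except ValueError:
        return False
    return True
```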

The SSL certificate information varies depending on the deployment type that you choose:

IBM watsonx.data Developer edition
IBM watsonx.data on IBM Cloud
IBM watsonx.data on Red Hat OpenShift

IBM watsonx.data Developer edition

The SSL certificate is optional.

If SSL is enabled on a watsonx.data instance on Cloud Pak for Data and the certificate is a self-signed certificate, you must enter the certificate in the SSL certificate field.

Ask your watsonx.data administrator if SSL is set up. You can find the SSL certificate in the watsonx.data console under Configurations > Connection information > Instance details.

IBM watsonx.data on IBM Cloud

The SSL certificate is optional.

IBM watsonx.data on Red Hat OpenShift

The SSL certificate is optional.

If SSL is enabled on a watsonx.data instance on Cloud Pak for Data and the certificate is a self-signed certificate, you must enter the certificate in the SSL certificate field.

Ask your watsonx.data administrator if SSL is set up. You can find the SSL certificate in the watsonx.data console under Configurations > Connection information > Instance details.

Engine connection details

Enter the engine connection details.

Supported engine versions

For watsonx.data on Cloud Pak for Data version 5.0.3 and later:

Presto (Java)
Presto (C++)

For watsonx.data on Cloud Pak for Data version 5.0.2 and earlier:

Presto (Java)

For watsonx.data as a Service:

Presto (Java)
Presto (C++)
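
The version rule for watsonx.data on Cloud Pak for Data can be expressed as a simple comparison. The following Python sketch is illustrative: the function name is hypothetical, and it assumes that the version is a dotted string of integers such as 5.0.3.

```python
def supported_engines_on_cpd(version):
    """Return the engine types listed in this topic for a Cloud Pak for Data version."""
    # Assumes a dotted numeric version string, for example "5.0.3".
    parts = tuple(int(piece) for piece in version.split("."))
    if parts >= (5, 0, 3):
        return ["Presto (Java)", "Presto (C++)"]
    return ["Presto (Java)"]
```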

Provide these engine connection details. Find this information in the watsonx.data web console under Configurations > Connection information > Engine and service connection details.

Engine's hostname or IP address: The hostname or IP address is the value of the Internal host field.
Engine ID: This value is in the Engine ID field.
Engine's port: The port number is the value in the Internal host field after the colon (:). The default port number is 8443.
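
Because the engine's hostname and port both come from the Internal host field, it can help to see the split spelled out. The following Python sketch separates the value at the last colon and falls back to the default port 8443 when no port is present. The function name and the example host are hypothetical, and the sketch assumes a hostname or IPv4 address rather than a bare IPv6 address.

```python
DEFAULT_ENGINE_PORT = 8443  # default engine port stated in this topic


def split_internal_host(internal_host):
    """Split an Internal host value into (hostname, port)."""
    value = internal_host.strip()
    host, separator, port = value.rpartition(":")
    if not separator:
        # No colon in the value: use it as the hostname with the default port.
        return value, DEFAULT_ENGINE_PORT
    return host, int(port)


# Hypothetical Internal host value, used only for illustration.
print(split_internal_host("engine.example.internal:8443"))
```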

Choose the method for creating a connection based on where you are in the platform

In a project: Click Assets > New asset > Connect to a data source. See Adding a connection to a project.
In a catalog: Click Add to catalog > Connection. See Adding a connection asset to a catalog.
In the Platform assets catalog: Click New connection. See Adding platform connections.

Next step: Add data assets from the connection

Where you can use this connection

You can use the watsonx.data Presto connection in the following workspaces and tools:

Projects

Data Refinery (watsonx.ai Studio or IBM Knowledge Catalog)
DataStage (DataStage service). See Connecting to a data source in DataStage.
Decision Optimization (watsonx.ai Studio and watsonx.ai Runtime)
Metadata import (IBM Knowledge Catalog)

Catalogs

Platform assets catalog
Other catalogs (IBM Knowledge Catalog)

Writing data into watsonx.data

You can ingest data into watsonx.data with DataStage. Specify the catalog_name, schema_name, and table_name properties. The table_name property is required. You can pass the fully qualified name, catalog_name.schema_name.table_name, in the table_name property.
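
To illustrate the fully qualified form, the following Python sketch builds and splits a catalog_name.schema_name.table_name value. The helper names and the sample identifiers are hypothetical, and the sketch assumes that none of the three names contains a dot.

```python
def qualified_table_name(catalog_name, schema_name, table_name):
    """Join the three parts into catalog_name.schema_name.table_name."""
    return ".".join([catalog_name, schema_name, table_name])


def split_qualified_table_name(name):
    """Split a fully qualified name into (catalog, schema, table)."""
    parts = name.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError("expected catalog_name.schema_name.table_name")
    return tuple(parts)


# Hypothetical identifiers, used only to show the resulting shape.
print(qualified_table_name("iceberg_data", "sales", "orders"))
```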

The watsonx.data Presto connector creates Iceberg tables directly on storage that is defined in IBM watsonx.data. Currently, the connector supports writing to the following storage types:

Amazon S3
Apache Ozone
IBM Ceph
IBM Cloud Object Storage

watsonx.data web console

Learn more

Related connections

Parent topic: Supported connections