IBM watsonx.data Presto connection
To access your data in IBM watsonx.data, create a connection asset for it. The connection asset includes information for connecting to a watsonx.data instance and to the Presto query engine that is running on that instance.
IBM watsonx.data is an open, hybrid, and governed data lakehouse that is optimized by a query engine for all data and AI workloads.
Before you begin
Differences between the watsonx.data Presto and the Presto connections
IBM watsonx.data incorporates the Presto SQL Query Engine. Both the watsonx.data Presto and Presto connections can create connection assets to interact with the Presto SQL Query Engine in IBM watsonx.data.
watsonx.data Presto connection
The watsonx.data Presto connection supports reading from IBM watsonx.data by using the Presto SQL Query Engine, and supports writing tables in the Iceberg table format to Amazon S3, Apache Ozone, IBM Ceph, and IBM Cloud Object Storage buckets in IBM watsonx.data. This connection is also required if you want the IBM Knowledge Catalog integration to work with watsonx.data.
IBM recommends using the watsonx.data Presto connection when connecting from Cloud Pak for Data to IBM watsonx.data.
For more information about the watsonx.data Presto connection, see the rest of the topic.
Presto
The Presto connection can create a read-only connection to any Presto engine, including the implementation in IBM watsonx.data.
For more information about the Presto connection, see Presto connection.
Prerequisite
Set up an instance of watsonx.data.
You can connect to software or as-a-Service instances:
- watsonx.data software on Cloud Pak for Data: See Installing watsonx.data on Cloud Pak for Data.
- watsonx.data as a Service on IBM Cloud: See Getting started with watsonx.data on IBM Cloud.
- watsonx.data stand-alone software: See Installing stand-alone watsonx.data.
Create a connection to watsonx.data
The connection details vary depending on the deployment type that you choose. To create the connection asset, in the Connection details section of the Connect to a data source page, select the deployment type:
- IBM watsonx.data Developer edition
- IBM watsonx.data on IBM Cloud
- IBM watsonx.data on Red Hat OpenShift
You can also leave the deployment type at its default value, in which case the legacy connection details are shown.
The details that you need to provide depend on the deployment type that you pick:
IBM watsonx.data Developer edition
You can import a JSON file to fill in these fields by using Import connection values. To get the JSON file for this connection, go to the console page of your watsonx.data instance, navigate to the Connect information field, and copy the JSON file.
- Hostname or IP address: Find this information in the console under Configurations > Connection information > Instance details.
- Port: The default port number is 443. You can find this information in the console under Configurations > Connection information > Instance details.
- Instance ID: Find this value in the watsonx.data console. Click Instance details from the navigation menu. You can also find this information in the console under Configurations > Connection information > Instance details.
IBM watsonx.data on IBM Cloud
You can import a JSON file to fill in these fields by using Import connection values. To get the JSON file for this connection, go to the console page of your watsonx.data instance, navigate to the Connect information field, and copy the JSON file.
- Hostname or IP address: Find this information in the console under Configurations > Connection information > Instance details.
- Port: The default port number is 443. You can find this information in the console under Configurations > Connection information > Instance details.
- CRN: Cloud resource name: Find this value in the watsonx.data console. Click Instance details from the navigation menu. You can also find this information in the console under Configurations > Connection information > Instance details.
IBM watsonx.data on Red Hat OpenShift
You can import a JSON file to fill in these fields by using Import connection values. To get the JSON file for this connection, go to the console page of your watsonx.data instance, navigate to the Connect information field, and copy the JSON file.
- Hostname or IP address: Find this information in the console under Configurations > Connection information > Instance details.
- Port: The default port number is 443. You can find this information in the console under Configurations > Connection information > Instance details.
- Instance ID: Find this value in the watsonx.data console. Click Instance details from the navigation menu. You can also find this information in the console under Configurations > Connection information > Instance details.
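The per-deployment fields listed above can be summarized in a small lookup. The following is a minimal sketch, not part of the product: the field labels (hostname, port, instance_id, crn) and the missing_fields helper are hypothetical names chosen for illustration, and they are not the keys of the JSON file that the console provides.

```python
# Hypothetical field labels for illustration only; these are NOT the keys
# of the JSON file that you copy from the watsonx.data console.
REQUIRED_FIELDS = {
    "IBM watsonx.data Developer edition": ("hostname", "port", "instance_id"),
    "IBM watsonx.data on IBM Cloud": ("hostname", "port", "crn"),
    "IBM watsonx.data on Red Hat OpenShift": ("hostname", "port", "instance_id"),
}

DEFAULT_PORT = 443  # default port number stated in this topic


def missing_fields(deployment_type, details):
    """Return the required fields that are absent or empty in details."""
    required = REQUIRED_FIELDS[deployment_type]
    # When no port is supplied, fall back to the documented default of 443.
    merged = {"port": DEFAULT_PORT, **details}
    return [name for name in required if merged.get(name) in (None, "")]
```

For example, calling missing_fields for the IBM Cloud deployment type with only a hostname reports that the CRN is still needed.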
Legacy connection details
watsonx.data software
To create the connection asset, in the Connection details section of the Connect to a data source page, select Connect to watsonx.data on Cloud Pak for Data and provide these details:
- Hostname or IP address: Find this information in the console under Configurations > Connection information > Instance details.
- Port: The default port number is 443. You can find this information in the console under Configurations > Connection information > Instance details.
- Instance ID: Find this value in the watsonx.data console. Click Instance details from the navigation menu. You can also find this information in the console under Configurations > Connection information > Instance details.
- Instance name: Find the instance name in the Cloud Pak for Data web client home page. Click Services > Instances from the navigation menu.
watsonx.data as a Service
- Hostname or IP address: Find this information in the console under Configurations > Connection information > Instance details.
- Port: The default port number is 443. You can find this information in the console under Configurations > Connection information > Instance details.
- Instance ID: Find this value in the watsonx.data console. Click Instance details from the navigation menu. You can also find this information in the console under Configurations > Connection information > Instance details.
- Instance name: Find this value on the watsonx.ai Service instances page. Click Administration > Services > Service instances. For example, watsonx.data-aaa. Do not use the suggested instance name that is shown in the field.
- CRN: Cloud resource name: Find this value in the watsonx.data console. Click Instance details from the navigation menu. You can also find this information in the console under Configurations > Connection information > Instance details.
Credentials
The credentials vary depending on the deployment type that you choose:
- IBM watsonx.data Developer edition
- IBM watsonx.data on IBM Cloud
- IBM watsonx.data on Red Hat OpenShift
You can also leave the deployment type at its default value, in which case the legacy connection details are shown.
IBM watsonx.data Developer edition
- Username and password: The username and password that are used to log in to the watsonx.data standalone console.
IBM watsonx.data on IBM Cloud
- API key: The API key of the account that has access to the watsonx.data instance on IBM Cloud.
The API key can be generated in the IBM Cloud console.
IBM watsonx.data on Red Hat OpenShift
You must select an authentication method:
- Username and password: The username and password that are used to access Cloud Pak for Data where the watsonx.data instance is located.
- Username and API key: The username and API key that are used to access Cloud Pak for Data where the watsonx.data instance is located.
This authentication method is recommended if Cloud Pak for Data uses an Identity Management Service (IAM), for example, LDAP or SSO. The API key is located in the Profile and settings of the target Cloud Pak for Data cluster. For information on API keys, see Generating API keys for authentication.
Legacy connection details
watsonx.data software
The username and password, or the username and API key, for the watsonx.data instance. The same credentials are also used for the engine.
You must select the authentication method:
- Username and password: The username and password that is used to access Cloud Pak for Data where the watsonx.data instance is located, or the username and password for watsonx.data standalone.
- Username and API key: The username and API key that is used to access Cloud Pak for Data where the watsonx.data instance is located, or the username and password for watsonx.data standalone. This authentication method is recommended if Cloud Pak for Data uses an Identity Management Service (IAM), for example, LDAP or SSO. The API key is located in the Profile and settings of the target Cloud Pak for Data cluster. For information on API keys, see Generating API keys for authentication.
watsonx.data as a Service
The username and password for the watsonx.data instance. The same credentials are also used for the engine.
- Username: The default username is ibmlhapikey_<cloud-account-email-address>, that is, the prefix ibmlhapikey_ followed by the email address of the cloud account.
- Password: The password is the user's API key. To create an API key, see IBM Cloud docs: Creating an API key in the console.
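The default username pattern for watsonx.data as a Service can be sketched as a one-line helper. This is an illustration only: the function name default_username and the sample address are hypothetical, and the "@" check is a rough sanity test rather than real email validation.

```python
USERNAME_PREFIX = "ibmlhapikey_"  # prefix given in this topic


def default_username(cloud_account_email):
    """Build the default username from the cloud account email address."""
    email = cloud_account_email.strip()
    # Rough sanity check only; this does not validate the address.
    if "@" not in email:
        raise ValueError("expected an email address")
    return USERNAME_PREFIX + email
```

The password that goes with this username is the user's API key, not the IBM Cloud account password.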
Certificates
By default, the SSL is enabled option is selected. This setting is recommended for increased security. If you do not use SSL, the data might be exposed to vulnerabilities such as data leakage. Although the database that is hosted in watsonx.data can also have an SSL certificate, the connection goes through the engine.
The SSL certificate must be in PEM format.
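Before pasting a certificate into the SSL certificate field, it can help to confirm that the text has the PEM shape. The sketch below uses only the Python standard library and checks the framing lines and the Base64 body of a single certificate; the function name looks_like_pem_certificate is hypothetical. It does not verify that the certificate is valid or trusted, and it returns False for a file that contains a chain of several certificates.

```python
import base64
import binascii

PEM_HEADER = "-----BEGIN CERTIFICATE-----"
PEM_FOOTER = "-----END CERTIFICATE-----"


def looks_like_pem_certificate(text):
    """Return True if text is framed like a single PEM certificate."""
    stripped = text.strip()
    if not (stripped.startswith(PEM_HEADER) and stripped.endswith(PEM_FOOTER)):
        return False
    # Everything between the framing lines must be Base64 data.
    body = stripped[len(PEM_HEADER):-len(PEM_FOOTER)]
    payload = "".join(body.split())  # drop line breaks and spaces
    if not payload:
        return False
    try:
        base64.b64decode(payload, validate=True)
    except binascii.Error:
        return False
    return True
```

A certificate in DER (binary) format fails this check and would need to be converted to PEM first.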
The SSL certificate information varies depending on the deployment type that you choose:
- IBM watsonx.data Developer edition
- IBM watsonx.data on IBM Cloud
- IBM watsonx.data on Red Hat OpenShift
IBM watsonx.data Developer edition
The SSL certificate is optional.
If SSL is enabled on a watsonx.data instance on Cloud Pak for Data and the certificate is a self-signed certificate, you must enter the certificate in the SSL certificate field.
Ask your watsonx.data administrator if SSL is set up. You can find the SSL certificate in the watsonx.data console under Configurations > Connection information > Instance details.
IBM watsonx.data on IBM Cloud
The SSL certificate is optional.
IBM watsonx.data on Red Hat OpenShift
The SSL certificate is optional.
If SSL is enabled on a watsonx.data instance on Cloud Pak for Data and the certificate is a self-signed certificate, you must enter the certificate in the SSL certificate field.
Ask your watsonx.data administrator if SSL is set up. You can find the SSL certificate in the watsonx.data console under Configurations > Connection information > Instance details.
Engine connection details
Enter the engine connection details
Supported engine versions
For watsonx.data on Cloud Pak for Data version 5.0.3 and later:
- Presto (Java)
- Presto (C++)
For watsonx.data on Cloud Pak for Data version 5.0.2 and earlier:
- Presto (Java)
For watsonx.data as a Service:
- Presto (Java)
- Presto (C++)
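The version rules above amount to a single comparison. The following sketch restates them in code, assuming the Cloud Pak for Data version is given as a dotted numeric string such as 5.0.3; the function names are hypothetical, and passing None stands in for watsonx.data as a Service.

```python
PRESTO_JAVA = "Presto (Java)"
PRESTO_CPP = "Presto (C++)"


def parse_version(version):
    """Turn a dotted version string such as '5.0.3' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def supported_engines(cloud_pak_version=None):
    """List the supported engines; None means watsonx.data as a Service."""
    if cloud_pak_version is None:
        return [PRESTO_JAVA, PRESTO_CPP]
    # Presto (C++) is listed for Cloud Pak for Data 5.0.3 and later.
    if parse_version(cloud_pak_version) >= (5, 0, 3):
        return [PRESTO_JAVA, PRESTO_CPP]
    return [PRESTO_JAVA]
```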
Provide these engine connection details. Find this information in the watsonx.data web console under Configurations > Connection information > Engine and service connection details.
- Engine's hostname or IP address: The hostname or IP address is the value of the Internal host field.
- Engine ID: This value is in the Engine ID field.
- Engine's port: The port number is the value in the Internal host field after the colon (:). The default port number is 8443.
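Because the Internal host field carries both the engine hostname and the port, separated by a colon, the two values can be split mechanically. This is a minimal sketch: the function name split_internal_host and the sample hostname are hypothetical, the fallback to 8443 when no colon is present is an assumption based on the documented default, and bare IPv6 addresses are not handled.

```python
DEFAULT_ENGINE_PORT = 8443  # default engine port stated in this topic


def split_internal_host(internal_host):
    """Split an Internal host value into (hostname, port)."""
    value = internal_host.strip()
    # Split on the last colon; the text after it is the port number.
    host, separator, port = value.rpartition(":")
    if not separator:
        # Assumption: with no explicit port, use the documented default.
        return value, DEFAULT_ENGINE_PORT
    return host, int(port)
```

For instance, a value of engine.example.com:8443 yields the hostname engine.example.com and the port 8443.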
Choose the method for creating a connection based on where you are in the platform
- In a project: Click Assets > New asset > Connect to a data source. See Adding a connection to a project.
- In a catalog: Click Add to catalog > Connection. See Adding a connection asset to a catalog.
- In the Platform assets catalog: Click New connection. See Adding platform connections.
Next step: Add data assets from the connection
Where you can use this connection
You can use the watsonx.data Presto connection in the following workspaces and tools:
Projects
- Data Refinery (watsonx.ai Studio or IBM Knowledge Catalog)
- DataStage (DataStage service). See Connecting to a data source in DataStage.
- Decision Optimization (watsonx.ai Studio and watsonx.ai Runtime)
- Metadata import (IBM Knowledge Catalog)
Catalogs
- Platform assets catalog
- Other catalogs (IBM Knowledge Catalog)
Writing data into watsonx.data
You can ingest data into watsonx.data with DataStage. You must enter the catalog_name, schema_name, and table_name properties. The table_name property is required. You can pass the fully qualified name, catalog_name.schema_name.table_name, into the table_name property.
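The two ways of naming the target table can be illustrated with a short helper. This is a sketch under assumptions, not the connector's actual resolution logic: the function name resolve_table is hypothetical, and identifiers that themselves contain dots are not handled.

```python
def resolve_table(table_name, catalog_name=None, schema_name=None):
    """Return (catalog, schema, table) from the three properties."""
    parts = table_name.split(".")
    if len(parts) == 3:
        # table_name already holds catalog_name.schema_name.table_name.
        return tuple(parts)
    if len(parts) == 1 and catalog_name and schema_name:
        # The catalog and schema come from their own properties.
        return (catalog_name, schema_name, table_name)
    raise ValueError(
        "provide catalog_name and schema_name, or a fully qualified table_name"
    )
```

Either form identifies the same table: separate catalog_name, schema_name, and table_name values, or a single fully qualified table_name.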
The watsonx.data Presto connector creates Iceberg tables directly on the storage that is defined in IBM watsonx.data. Currently, the connector supports writing to the following storage types:
- Amazon S3
- Apache Ozone
- IBM Ceph
- IBM Cloud Object Storage