What's new
Check back each week to learn about new features and updates for Cloud Pak for Data as a Service and services such as watsonx.ai Studio (formerly Watson Studio), watsonx.ai Runtime (formerly Watson Machine Learning), DataStage, and IBM Knowledge Catalog.
Week ending 22 November 2024
Name change for Watson Studio and Watson Machine Learning services
21 November 2024
The following services were renamed:
- Watson Machine Learning is now named watsonx.ai Runtime.
- Watson Studio is now named watsonx.ai Studio.
Some videos, notebooks, and code samples might continue to refer to these services by their former names.
Cloud Pak for Data as a Service is available in the Sydney region
21 November 2024
Cloud Pak for Data as a Service is now generally available in the Sydney data center with watsonx.ai Runtime and watsonx.ai Studio services. When you sign up, you can select Sydney as the preferred region.
Not all of the services are available in the Sydney region yet. For more information about product features that are available in the Sydney region, see Regional availability for services and features.
Enhanced monitoring of metadata enrichment jobs (IBM Knowledge Catalog)
21 November 2024
On the new run metrics dashboard, you can monitor the progress of the individual enrichment tasks for an active metadata enrichment job run. In addition, you can explore run information for completed job runs to identify if and where issues occurred. For more information, see Monitoring runs of enrichment jobs.
Week ending 15 November 2024
Task credentials are now required to deploy assets and run jobs from a deployment space
11 November 2024
To improve the security for running deployment jobs, you must enter your task credentials to deploy the following assets from a deployment space:
- Prompt templates
- AI services
- Models
- Python functions
- Scripts
Additionally, you must enter your task credentials to create the following deployments from your deployment space:
- Online
- Batch
You must also use your task credentials to create and manage deployment jobs from your deployment space.
To learn how to set up your task credentials and generate an API key, see Adding task credentials.
Editor mode for custom properties (IBM Knowledge Catalog)
14 November 2024
When viewing governance artifacts, you can now switch on the editor mode for custom properties. When the Edit values toggle is switched off in the Details section, you can see only those custom properties for which values were defined for the artifact. Switch the editor mode on, and you can see all available custom properties and edit their values. For more information, see Custom properties, relationships, and asset types.
Week ending 8 November 2024
Connect to new data sources with SPSS Modeler
7 November 2024
You can now connect SPSS Modeler to Databricks and Microsoft Azure Synapse Analytics, and SPSS Modeler has read and write access to both data sources. For more information, see Microsoft Azure Databricks connection and Microsoft Azure Synapse Analytics connection.
Week ending 1 November 2024
Deprecation of IBM Runtime 23.1
28 October 2024
IBM Runtime 23.1 is deprecated. Beginning November 21, 2024, you cannot create new notebooks or custom environments by using 23.1 runtimes. Also, you cannot create new deployments with software specifications that are based on the 23.1 runtime. To ensure a seamless experience and to leverage the latest features and improvements, switch to IBM Runtime 24.1.
- For information about changing environments, see Changing notebook environments.
- For details on deployment frameworks, see Managing frameworks and software specifications.
Week ending 25 October 2024
Compare tables in Decision Optimization experiments to see differences between scenarios
23 October 2024
You can now compare tables in a Decision Optimization experiment in either the Prepare data or Explore solution view. This comparison can be useful to see data value differences between scenarios displayed next to each other.
For more information, see Compare scenario tables.
Week ending 18 October 2024
Account resource scoping is enabled by default
17 October 2024
The Resource scope setting for your account is now set to ON by default. However, if you previously set the value for the Resource scope setting to either ON or OFF, the current setting is not changed.
When resource scoping is enabled, you can’t access projects that are not in your currently selected IBM Cloud account. If you belong to more than one IBM Cloud account, you might not see all your projects listed together. For example, you might not see all your projects on the All projects page. You must switch accounts to see the projects in the other accounts.
Week ending 11 October 2024
Analyze Japanese text data in SPSS Modeler with Text Analytics
9 October 2024
You can now use the Text Analytics nodes in SPSS Modeler, such as the Text Link Analysis node and Text Mining node, to analyze text data written in Japanese.
Week ending 4 October 2024
Introducing IBM Manta Data Lineage: a new service that provides data lineage for your data
04 October 2024
IBM Manta Data Lineage is a data lineage service that increases data pipeline transparency so you can determine data accuracy throughout business models and systems. For information about data lineage, see Data lineage.
This service requires the IBM Knowledge Catalog service, and data lineage must be enabled on your IBM Cloud account. See Enable data lineage. The service is available only in the Dallas region.
You can access your imported lineages in the new Data lineage workspace or view lineage for a specific asset through the Catalogs or Projects page.
You can import lineage metadata from the following sources:
- Microsoft Azure SQL Database connection
- Microsoft SQL Server connection
- Microsoft Power BI (Azure) connection
- Snowflake connection
- InfoSphere DataStage
- IBM DataStage for Cloud Pak for Data
For more information about metadata import, see Importing metadata.
Improved Draft tab for governance artifacts (IBM Knowledge Catalog)
3 October 2024
For each artifact type, you can now view all available drafts in the Draft tab. To view it, select the artifact type from the main menu and click Draft. The tab is visible only if you have the required permissions and if any drafts are available. When viewing all your drafts in the tab, you can select multiple drafts and use the bulk actions menu to edit or process them at once. Note that the All drafts page is no longer available from the main menu. For more information, see Managing governance artifacts.
Bulk actions on catalog assets (IBM Knowledge Catalog)
3 October 2024
You can now edit and remove classifications and custom properties for multiple assets in a catalog at the same time.
Automatically updated common properties of data assets (IBM Knowledge Catalog)
3 October 2024
With global asset identification, you can ensure that the common properties of data assets that have the same resource key and reference the same physical resource stay the same even if they're in different projects or catalogs. This way, you can manage such data assets properly and consistently. For more information, see Global asset identification.
Assign user groups as asset members (IBM Knowledge Catalog)
3 October 2024
You can now assign user groups as asset members. Previously, you could add only individual catalog users as asset members.
Upload and update assets in bulk (IBM Knowledge Catalog)
3 October 2024
To upload and update multiple assets in bulk, you can now import and export CSV files with either asset metadata details or asset relationship details, or both. For more information, see Adding and updating assets and asset metadata from CSV files to catalogs.
Availability of watsonx.governance plan in Frankfurt region and deprecation of OpenScale legacy plan
3 October 2024
The watsonx.governance legacy plan to provision Watson OpenScale in the Frankfurt region is deprecated. IBM Watson OpenScale is no longer available for new subscriptions or for provisioning new instances. For OpenScale capabilities, subscribe to the watsonx.governance Essentials plan, which is now available in Frankfurt as well as Dallas.
- To view plan details, see watsonx.governance plans.
- To get started, see Provisioning and launching watsonx.governance.
Notes:
- Existing legacy plan instances will continue to operate and will be supported until the End of Support date, which remains to be determined.
- Existing customers on IBM Watson OpenScale can continue to open support tickets using IBM Watson OpenScale.
Updated environments and software specifications
3 October 2024
The TensorFlow and Keras libraries that are included in IBM Runtime 23.1 are now updated to newer versions. This might have an impact on how code is executed in your notebooks. For details, see Library packages included in watsonx.ai Studio (formerly Watson Studio) runtimes.
Runtime 23.1 will be discontinued in favor of IBM Runtime 24.1 later this year. To avoid repeated disruption, we recommend that you switch to IBM Runtime 24.1 now and use related software specifications for deployments.
- For information about changing environments, see Changing notebook environments.
- For details on deployment frameworks, see Managing frameworks and software specifications.
Use data source definitions to manage and protect data that is accessed from connections
04 October 2024
Data source definitions are a new type of asset that you define based on a connection or connected data asset's endpoints. When you create a data source definition, you can monitor where your data is stored across multiple projects, catalogs, or multi-node data sources. You can also apply the correct protection solution (enforcement engine) based on the data source definition. For details, see Data protection with data source definitions.
These new data source definition features are available only in the Dallas region.
Defining a data source definition with a protection solution (IBM Knowledge Catalog)
04 October 2024
A protection solution is a method of enforcing the data protection rules either in governed catalogs or by a deep enforcement solution.
To configure the platform with a deep enforcement solution, you can create a data source definition to set the data source type. The data source type determines which types of connections the data source definition can be associated with and your available protection solution options. For details, see Protection solutions for data source definition.
These new data source definition features are available only in the Dallas region.
Review and manage data class and term assignments in a spreadsheet (IBM Knowledge Catalog)
04 October 2024
If you prefer to work in a familiar spreadsheet program when you review and update metadata enrichment results, you can now install the Review metadata add-in for Microsoft Excel. Use the spreadsheet template provided with the product in combination with the add-in:
- To download enriched data assets from a specific project and metadata enrichment.
- To review and update suggested and assigned data classes and terms for these data assets.
- To upload the updated data assets to the project.
For more information, see Reviewing and updating enrichment results in an external program.
Week ending 27 September 2024
Removal of Spark 3.3 runtime
23 September 2024
Support for the Spark 3.3 runtime in IBM Analytics Engine will be removed by October 29, 2024, and the default version will be changed to the Spark 3.4 runtime. To ensure a seamless experience and to leverage the latest features and improvements, switch to Spark 3.4.
Beginning October 29, 2024, you cannot create or run notebooks or custom environments by using Spark 3.3 runtimes. Also, you cannot create or run deployments with software specifications that are based on the Spark 3.3 runtime.
- To upgrade your instance to Spark 3.4, see Replace Instance Default Runtime.
- For details on available notebook environments, see Changing the environment of a notebook.
- For details on deployment frameworks, see Managing frameworks and software specifications.
Week ending 20 September 2024
Group data quality rules (IBM Knowledge Catalog)
20 September 2024
You can now group certain types of data quality rules into a single DataStage flow and run them together. For more information, see Grouping rules.
Week ending 13 September 2024
Create batch jobs for SPSS Modeler flows in deployment spaces
10 September 2024
You can now create batch jobs for SPSS Modeler flows in deployment spaces. Flows give you the flexibility to decide which terminal nodes to run each time that you create a batch job from a flow. When you schedule batch jobs for flows, the batch job uses the data sources and output targets that you specified in your flow. The mapping for these data sources and outputs is automatic if the data sources and targets are also in your deployment space. For more information about creating batch jobs from flows, see Creating deployment jobs for SPSS Modeler flows.
For more information about flows and models in deployment spaces, see Deploying SPSS Modeler flows and models.
Week ending 30 August 2024
Change pipeline node shape
30 August 2024
You can now change the appearance of pipeline nodes from the uniform card style to more compact shapes that reflect the type of node. For more information, see Pipelines settings.
Create global parameter sets
30 August 2024
You can now add PROJDEF parameters to your pipeline parameter sets. The parameters can be referenced from both DataStage and Orchestration Pipelines flows at the same project level. For more information, see Configuring global objects for Orchestration Pipelines.
Week ending 23 August 2024
Add user groups as collaborators in projects and spaces
22 August 2024
You can now add user groups as collaborators in projects and spaces if your IBM Cloud account contains IAM access groups. Your IBM Cloud account administrator can create access groups, which are then available as user groups in projects. While creating a project, you must leave the Restrict who can be a collaborator option enabled to add user groups as collaborators. For more information, see Working with IAM access groups.
Support ending for anomaly prediction feature for AutoAI time-series experiments
19 August 2024
The feature to predict anomalies (outliers) in AutoAI time-series model predictions, currently in beta, is deprecated and will be removed on Sep 23, 2024. Standard AutoAI time-series experiments are still fully supported. For details, see Building a time series experiment.
Assign classifications in metadata enrichment (IBM Knowledge Catalog)
22 August 2024
You can now assign classifications to data assets and columns in metadata enrichment, either automatically based on term or data-class assignment or manually in the enrichment results. See Designing metadata enrichment: Assign terms and classifications.
Week ending 16 August 2024
Archive and unarchive projects and spaces
16 August 2024
Projects and spaces are now archived after 90 days of inactivity to preserve resources. To work with such projects or spaces again, unarchive them by opening them directly on the project or space page. Depending on the size of the project or space, unarchiving might take a varied amount of time.
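The 90-day rule can be illustrated with simple date arithmetic. The following is a minimal sketch, not the platform's implementation: the function names are made up for the example, and treating exactly 90 days of inactivity as archived is an assumption about the boundary.

```python
from datetime import date, timedelta

# 90 days of inactivity, as stated in the release note.
INACTIVITY_LIMIT = timedelta(days=90)


def archive_date(last_activity: date) -> date:
    """Earliest date on which an untouched project or space would be archived."""
    return last_activity + INACTIVITY_LIMIT


def is_archived(last_activity: date, today: date) -> bool:
    """True if the inactivity period has reached the limit (boundary is assumed)."""
    return today - last_activity >= INACTIVITY_LIMIT


last_used = date(2024, 8, 16)
print(archive_date(last_used))                   # 2024-11-14
print(is_archived(last_used, date(2024, 10, 1)))  # False
print(is_archived(last_used, date(2024, 12, 1)))  # True
```

A check like this can help estimate which rarely used projects might need to be unarchived before you return to them.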
Configure asset removal
16 August 2024
Now, when you create a new catalog, you can also decide how you want to configure the removal of assets. You can choose to purge the assets automatically either immediately after the removal or 30 days after the removal. For previously created catalogs, you can change asset removal settings on the catalog Settings page.
For more information, see:
Task credentials are now required to run jobs in a deployment space
15 August 2024
To improve the security for running deployment jobs, you must enter your task credentials to run jobs in a deployment space. For more information, see Creating jobs in deployment spaces.
To learn how to set up your task credentials and generate an API key, see Adding task credentials.
Week ending 26 July 2024
Pausing metadata enrichment (IBM Knowledge Catalog)
25 July 2024
You can now pause and resume metadata enrichment job runs. For details, see Pausing and resuming enrichment job runs.
Announcing support for Python 3.11 and R 4.3 frameworks and software specifications on Runtime 24.1
25 July 2024
You can now use IBM Runtime 24.1, which includes the latest data science frameworks based on Python 3.11 and R 4.3, to run Jupyter notebooks and R scripts, and train models. Starting on July 29, you can also run deployments. Update your assets and deployments to use IBM Runtime 24.1 frameworks and software specifications.
- For information on the IBM Runtime 24.1 release and the included environments for Python 3.11 and R 4.3, see Notebook environments.
- For details on deployment frameworks, see Managing frameworks and software specifications.
Enhanced version of Jupyter Notebook editor is now available
25 July 2024
If you're running your notebook in environments that are based on Runtime 24.1, you can use these enhancements to work with your code:
- Automatically debug your code
- Automatically generate a table of contents for your notebook
- Toggle line numbers next to your code
- Collapse cell contents and use side-by-side view for code and output, for enhanced productivity
For more information, see Jupyter notebook editor.
Natural Language Processor transformer embedding models supported with Runtime 24.1
25 July 2024
In the new Runtime 24.1 environment, you can now use natural language processing (NLP) transformer embedding models to create text embeddings that capture the meaning of a sentence or passage to help with retrieval-augmented generation tasks. For more information, see Embeddings.
New specialized NLP models are available in Runtime 24.1
25 July 2024
The following new, specialized NLP models are now included in the Runtime 24.1 environment:
- A model that is able to detect and identify hateful, abusive, or profane content (HAP) in textual content. For more information, see HAP detection.
- Three pre-trained models that are able to address topics related to finance, cybersecurity, and biomedicine. For more information, see Classifying text with a custom classification model.
Extract detailed insights from large collections of texts by using Key Point Summarization
25 July 2024
You can now use Key Point Summarization in notebooks to extract detailed and actionable insights from large collections of texts that represent people’s opinions (such as product reviews, survey answers, or comments on social media). The result is delivered in an organized, hierarchical way that is easy to process. For more information, see Key Point Summarization.
RStudio version update
25 July 2024
To provide a consistent user experience across private and public clouds, the RStudio IDE for Cloud Pak for Data as a Service will be updated to RStudio Server 2024.04.1 and R 4.3.1 on July 29, 2024. The new version of RStudio provides a number of enhancements and security fixes. See the RStudio Server 2024.04.1 release notes for more information. While no major compatibility issues are anticipated, users should be aware of the version changes for some packages described in the following table.
When launching the RStudio IDE from a project after the upgrade, reset the RStudio workspace to ensure that the library path for R 4.3.1 packages is picked up by the RStudio Server.
Week ending 12 July 2024
Tracking data protection rule enforcement decisions
9 July 2024
You can now track enforcement decisions as audit events when the Send policy evaluations to audit logs checkbox is selected from the Managing rule settings page.
Week ending 5 July 2024
Connectors grouped by data source type
05 July 2024
When you create a connection, the connectors are now grouped by data source type so that the connectors are easier to find and select. For example, the MongoDB data source type includes the IBM Cloud Databases for MongoDB and the MongoDB connectors.
In addition, a new Recents category shows the six latest connectors that you used to create a connection.
For instructions, see Adding connections to data sources in a project or Adding connections to data sources in a catalog.
Bulk edits for governance artifact properties
05 July 2024
You can now change the primary or secondary category for multiple governance artifacts at once. Bulk edits are also available when updating relationships. For more information, see Managing governance artifacts.
Setting an assignment threshold for results of relationship analyses (IBM Knowledge Catalog)
05 July 2024
You can now also set a threshold for when results of a relationship analysis are assigned automatically. You can set a project default and override the setting for each analysis run. For details, see Identifying relationships.
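The interplay between a project default and a per-run setting can be pictured as follows. This is a minimal sketch under stated assumptions: the names, the 0.8 default, the 0 to 1 confidence scale, and the "greater than or equal" comparison are all hypothetical and are not taken from the product.

```python
from typing import List, Optional, Tuple

# Hypothetical project-level default; the real default is configured in the product.
PROJECT_DEFAULT_THRESHOLD = 0.8


def effective_threshold(run_override: Optional[float] = None) -> float:
    """Use the per-run value when one is given, otherwise the project default."""
    return PROJECT_DEFAULT_THRESHOLD if run_override is None else run_override


def auto_assigned(
    results: List[Tuple[str, float]], run_override: Optional[float] = None
) -> List[str]:
    """Return the relationships whose confidence meets the effective threshold."""
    threshold = effective_threshold(run_override)
    return [name for name, confidence in results if confidence >= threshold]


# Made-up analysis results: (relationship, confidence).
results = [
    ("orders.customer_id -> customers.id", 0.93),
    ("orders.id -> items.order_id", 0.72),
]

print(auto_assigned(results))                    # project default applies
print(auto_assigned(results, run_override=0.7))  # this run uses a lower threshold
```

With the default, only the first relationship is assigned automatically; with the lower per-run value, both are.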
Changes to Cloud Object Storage Lite plans
01 July 2024
Starting on 1 July 2024, the Cloud Object Storage Lite plan that is automatically provisioned when you sign up for a 30-day trial of Cloud Pak for Data as a Service expires after the trial ends. You can upgrade your Cloud Object Storage Lite instance to the Standard plan with the Free Tier option at any time during the 30-day trial.
Existing Cloud Object Storage service instances with Lite plans that you provisioned prior to 1 July 2024 will be retained until 15 December 2024. You must upgrade your Cloud Object Storage service to a Standard plan before 15 December 2024.
Week ending 21 June 2024
Adding catalog assets to projects
20 June 2024
A new Add catalog assets to projects user permission is available. Now, to add assets to projects, you must have the Add catalog assets to projects permission, the Admin, Editor, or Viewer role in the catalog, and be the asset owner or editor. Users who don't have an existing role with the Manage catalogs or Access catalogs permission must be explicitly granted the Add catalog assets to projects permission.
Cognos Dashboard removal postponed
20 June 2024
Any existing dashboards that you created with the Cognos Dashboards Embedded service will now continue working until 30 September 2024. You can no longer provision an instance of the Cognos Dashboards Embedded service. You can use Cognos Analytics on Cloud On-Demand as a replacement for Cognos Dashboards Embedded. For more information, see IBM Cognos Analytics Pricing Plans.
Task credentials will be required for deployment job requests
19 Jun 2024
To improve security for running deployment jobs, the user requesting the job will be required to provide task credentials in the form of an API key. The requirement will be enforced starting August 15, 2024. See Adding task credentials for details on generating the API key.
Enhanced data enrichment in IBM Knowledge Catalog
20 Jun 2024
In addition to the existing capabilities, metadata enrichment now provides options for semantic and AI-augmented data enrichment:
- Recommend descriptive names for tables and columns based on the collected metadata and a predefined glossary.
- Suggest and assign semantic descriptions for the contents of tables and columns based on the surrounding columns and the context of the tables.
- Complete semantic term assignment for tables and columns.
For details, see Designing metadata enrichments.
These new gen AI based metadata enrichment features are available only in the Dallas region.
IBM Federated Learning Python client change
20 Jun 2024
Federated Learning's Python client library has been merged with the watsonx.ai library. Your code samples must be updated with the newest Python client. See Connecting to the aggregator.
Connect to a new data source in DataStage: IBM Planning Analytics
14 Jun 2024
You can now include data from an IBM Planning Analytics data source in your DataStage flows.
For the full list of DataStage connectors, see Supported data sources in DataStage.
Week ending 7 June 2024
Bulk edits for governance artifacts
7 Jun 2024
You can now make changes to multiple governance artifacts at once when you want to edit tags or stewards. For more information, see Managing governance artifacts.
Changing parent category for individual artifacts
7 Jun 2024
When viewing artifact details, you can now change the parent category by selecting Move to from the three-dot action menu.
Data protection rules no longer enforced in projects
7 June 2024
Data protection rules are now only enforced either in governed catalogs or by a deep enforcement solution. A deep enforcement solution is a protection solution to enforce rules on data that is outside of Cloud Pak for Data when the data source is integrated with one of these services:
- IBM Watson Query
- IBM watsonx.data
Assets that are added into projects from a governed catalog no longer have preview, download or profiling restricted by data protection rules unless you have configured a deep enforcement solution.
You will be reminded of the revised data protection rule enforcement protocols when you:
- Create a data protection rule.
- Copy an asset from a governed catalog into a project.
For details, see Accept revised protocol for enforcing data protection rules.
Managing reports settings
6 June 2024
IBM Cloud account owners or administrators can now manage the reports settings on the Account page. For more information, see Managing your account settings.
Week ending 31 May 2024
IBM Watson Pipelines is now IBM Orchestration Pipelines
30 May 2024
The new service name reflects the capabilities for orchestrating parts of the AI lifecycle into repeatable flows.
Tag projects for easy retrieval
31 May 2024
You can now assign tags to projects to make them easier to group or retrieve. Assign tags when you create a new project or from the list of all projects. Filter the list of projects by tag to retrieve a related set of projects. For more information, see Creating a project.
Connect to a new data source: Milvus
31 May 2024
Use the Milvus connection to store and confirm the accuracy of your credentials and connection details to access a Milvus vector store. For information, see Milvus connection.
Week ending 24 May 2024
Asset user and role
24 May 2024
Updated the asset membership roles for catalogs. Now, users can hold the asset owner, asset editor, or asset viewer role. The asset editor role replaced the asset member role. Now, to complete any asset-related actions, you must be an asset owner or asset editor.
Also, assets might have more than one owner now.
You can change asset user roles on the Access control page of an asset by selecting a role from the Role dropdown menu.
Bulk actions on catalog assets
24 May 2024
You can now edit and remove the business terms, owners or tags on up to 20 catalog assets at a time.
Week ending 10 May 2024
New filters for enrichment results (IBM Knowledge Catalog)
10 May 2024
You can now apply additional filters to your enrichment results:
- Assigned, suggested, or no business terms
- Assigned, suggested, or no data class
Name changes for DataStage connections and connectors
10 May 2024
The following DataStage connections and connectors have new names:
- "Apache Cassandra (optimized)" is now "Apache Cassandra for DataStage".
- "IBM Db2 (optimized)" is now "IBM Db2 for DataStage".
- "IBM Netezza Performance Server (optimized)" is now "IBM Netezza Performance Server for DataStage".
- "Oracle (optimized)" is now "Oracle Database for DataStage".
- "Salesforce.com (optimized)" is now "Salesforce API for DataStage".
- "Teradata (optimized)" is now "Teradata database for DataStage".
Your previous settings for the connections, connectors, and their associated jobs remain the same. Only the connection and connector names have changed.
Week ending 26 April 2024
Name change for the IBM Watson Query connection
26 Apr 2024
The "IBM Watson Query" connection has been renamed to "IBM Data Virtualization". Your previous settings for the connection remain the same. Only the connection name has changed.
Name change for the DataStage IBM Watson Query connector
26 Apr 2024
The DataStage "IBM Watson Query" connector name has changed to "IBM Data Virtualization". This change coincides with the connection name change. Your previous settings for the connection, connector, and the associated jobs remain the same. Only the connection and connector name have changed.
Masking watsonx.data in IBM Knowledge Catalog
26 Apr 2024
You can protect sensitive data in watsonx.data by using masking capabilities of IBM Knowledge Catalog. For more information, see Masking watsonx.data assets in IBM Knowledge Catalog.
Week ending 19 April 2024
Enhanced project list view in catalogs
18 Apr 2024
Now, when you are adding assets from a catalog to a project, you can view more than 100 projects in your project list page and add up to 50 assets at a time to your project. For more information, see Add assets from within the catalog.
Evaluate machine learning deployments in spaces
18 Apr 2024
Configure watsonx.governance evaluations in your deployment spaces to gain insights about your machine learning model performance. For example, evaluate a deployment for bias or monitor a deployment for drift. When you configure evaluations, you can analyze evaluation results and model transaction records directly in your spaces.
For more information, see Evaluating deployments in spaces.
19 Apr 2024
Week ending 12 April 2024
Revised data protection rule enforcement protocol across Cloud Pak for Data
12 Apr 2024
A revised version of the data protection rule enforcement protocol is now in place across Cloud Pak for Data. When you're inside of a governed catalog and click Add to project, information about the new data protection rule enforcement protocol appears. You must acknowledge it to continue.
Cognos Dashboards Embedded service is deprecated
11 Apr 2024
You can no longer provision an instance of the Cognos Dashboards Embedded service. However, any existing dashboards that you created with the Cognos Dashboards Embedded service will continue working until 20 June 2024. You can use Cognos Analytics on Cloud On-Demand as a replacement for Cognos Dashboards Embedded. For more information, see IBM Cognos Analytics Pricing Plans.
Week ending 5 April 2024
Use pivot tables to display data aggregated in Decision Optimization experiments
5 Apr 2024
You can now use pivot tables to display both input and output data aggregated in the Visualization view in Decision Optimization experiments. For more information, see Visualization widgets in Decision Optimization experiments.
Access the list of connection API properties from the user interface
05 Apr 2024
Previously the only way to view the connection properties was to open a new web page at https://dataplatform.cloud.ibm.com/connections/docs. Now you can access the same information from Data > Connectivity. Expand Connection resources, and select Connection properties.
You can use these properties to create connections with the Watson Data API. For example, if you create a connection in a notebook programmatically, you can use this information to identify the properties that you need.
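One way to use the property listing in a notebook is to check your connection settings against it before you send any request. The following is a minimal sketch: the `PROPERTY_SPEC` structure, the property names, and the helper functions are hypothetical stand-ins, so copy the real property names from the Connection properties page for your data source.

```python
# Hypothetical excerpt of a property listing for one data source type.
PROPERTY_SPEC = [
    {"name": "host", "required": True},
    {"name": "port", "required": True},
    {"name": "database", "required": True},
    {"name": "ssl", "required": False},
]


def missing_required(spec, supplied):
    """List required property names that are absent from the supplied settings."""
    return [p["name"] for p in spec if p["required"] and p["name"] not in supplied]


def build_properties(spec, supplied):
    """Return only the known properties, or fail if a required one is missing."""
    missing = missing_required(spec, supplied)
    if missing:
        raise ValueError(
            "Missing required connection properties: " + ", ".join(missing)
        )
    known = {p["name"] for p in spec}
    return {key: value for key, value in supplied.items() if key in known}


settings = {"host": "db.example.com", "port": 50000, "database": "SALES", "note": "x"}
print(missing_required(PROPERTY_SPEC, {"host": "db.example.com"}))
print(build_properties(PROPERTY_SPEC, settings))
```

The resulting dictionary is the kind of value you would then place in the properties section of a connection request, after confirming the exact request format in the API documentation.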
Week ending 22 March 2024
Create dynamic views of connected data (IBM Knowledge Catalog)
21 March 2024
A new type of connected data asset provides filtered access to data from data sources that support SQL queries so you can access only relevant data. In a project, provide an SQL query to create a view of specific columns or rows from one or more tables. You can use these data assets in metadata enrichment and data quality analysis just like any other connected data asset.
For more information, see Adding a dynamic view of connected data to a project.
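To illustrate the kind of query that such a view is built on, the following sketch uses Python's built-in sqlite3 module as a stand-in for an SQL data source. It does not use IBM Knowledge Catalog, and the table, column, and view names are invented for the example. It shows a query that exposes only selected columns and rows from two joined tables.

```python
import sqlite3

# In-memory SQLite database, used only as a stand-in for an SQL data source.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, country TEXT, ssn TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    INSERT INTO customers VALUES (1, 'Ada', 'DE', '111'), (2, 'Bo', 'US', '222');
    INSERT INTO orders VALUES (10, 1, 99.5), (11, 2, 15.0), (12, 1, 20.0);
    """
)

# The query behind the view: specific columns, filtered rows, two tables.
# The sensitive ssn column is left out on purpose.
VIEW_QUERY = """
    SELECT c.name AS customer_name, o.total AS order_total
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    WHERE c.country = 'DE'
"""
conn.execute("CREATE VIEW german_orders AS " + VIEW_QUERY)

rows = conn.execute(
    "SELECT customer_name, order_total FROM german_orders ORDER BY order_total"
).fetchall()
print(rows)  # [('Ada', 20.0), ('Ada', 99.5)]
```

Consumers of the view see only the two selected columns for the filtered rows, which is the same idea as providing filtered access to relevant data.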
Use Delta Lake or Apache Iceberg table formats in the Amazon S3 and the Apache HDFS connectors
22 March 2024
The Amazon S3 and the Apache HDFS connectors now include properties for the Delta Lake and Apache Iceberg table formats. These table formats are integral to data lakes, which provide a centralized repository for managing large data volumes. Data lakes serve as a foundation for collecting and analyzing structured, semi-structured, and unstructured data in its original format for long-term storage and to drive insights and predictions.
The table format property is included in the interaction properties for the supported tools, for example, in the connector Stage properties in DataStage.
Week ending 23 February 2024
Access data from DataStax Enterprise
23 Feb 2024
You can now work with data from DataStax Enterprise.
Week ending 16 February 2024
Case-sensitive codes in reference data sets in IBM Knowledge Catalog
16 Feb 2024
Reference data values consist of at least two columns: code and value. For all new reference data sets the code column is now case-sensitive. When you add values to a new reference data set, the code is saved exactly as you type it. Note that any reference data sets that were created before this change was introduced remain case-insensitive, and any new values added there will be saved in upper case. These reference data sets are marked with a Case-insensitive tag in the UI. For details, see Case-sensitive code.
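The difference between the two behaviors can be illustrated with a minimal sketch. The function names and the dictionary-based data set are hypothetical and are used only for illustration; they are not part of any IBM Knowledge Catalog API.

```python
def stored_code(typed_code: str, case_sensitive: bool) -> str:
    """Return the code as it would be saved (illustrative helper, not a product API).

    case_sensitive=True  models a new reference data set: saved exactly as typed.
    case_sensitive=False models a data set created before the change: saved in upper case.
    """
    return typed_code if case_sensitive else typed_code.upper()


def add_value(data_set: dict, typed_code: str, value: str, case_sensitive: bool) -> None:
    """Add a code/value pair to a dictionary that stands in for a reference data set."""
    data_set[stored_code(typed_code, case_sensitive)] = value


new_set, legacy_set = {}, {}
for code in ("us", "US"):
    add_value(new_set, code, "United States", case_sensitive=True)
    add_value(legacy_set, code, "United States", case_sensitive=False)

print(sorted(new_set))     # two distinct codes
print(sorted(legacy_set))  # one code, stored in upper case
```

In this sketch, "us" and "US" are two separate codes in the new data set, but collapse to a single code in the data set that remains case-insensitive.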
Improved search, filter and sort options for reference data sets in IBM Knowledge Catalog
16 Feb 2024
When you view a list of reference data values, you can use the following methods to find the required values faster:
- Use a search bar to type a query for a code, value or a custom column value.
- Use one of the 6 advanced filter options.
- Use the sorting feature.
The search, filter, and sort options can be combined. For details, see Viewing reference data sets.
Week ending 09 February 2024
New Spark 3.4 environment for running Data Refinery flow jobs
09 Feb 2024
When you select an environment for a Data Refinery flow job, you can now select Default Spark 3.4 & R 4.2, which includes enhancements from Spark.
The Default Spark 3.3 & R 4.2 environment is deprecated and will be removed in a future update.
Update your Data Refinery flow jobs to use the new Default Spark 3.4 & R 4.2 environment. For details, see Compute resource options for Data Refinery in projects.
More task-oriented Decision Optimization documentation
09 Feb 2024
You can now more easily find the right information for creating and configuring Decision Optimization experiments. See Decision Optimization experiments and its subsections.
Pagination view feature to publish assets to a catalog
08 Feb 2024
When you are publishing project assets to a catalog, you can now view 20 catalogs and assets on each page with the pagination view. Previously, assets were displayed in a list. See Publishing assets to a catalog.
Advanced analysis types in metadata enrichment are available in the Frankfurt region (IBM Knowledge Catalog)
09 Feb 2024
Advanced primary key and relationship analysis and advanced profiling are now also available in the Frankfurt region, in addition to the Dallas region.
IBM Cloud Data Engine connection is deprecated
08 Feb 2024
The IBM Cloud Data Engine connection is deprecated and will be discontinued in a future release. See Deprecation of Data Engine for important dates and details.
Week ending 02 February 2024
Save your searches for catalog assets
02 Feb 2024
Each user can now save up to 25 searches within each of their catalogs. The user who saves a search in a catalog is the only user who can view, run, edit, and remove the search. For more information, see Saving searches for catalog assets.
Gallery renamed to Resource hub
02 Feb 2024
The Gallery is renamed to Resource hub. The Resource hub contains sample projects, data sets, and notebooks. See Resource hub.
IBM Cloud Databases for DataStax connection is discontinued
02 Feb 2024
The IBM Cloud Databases for DataStax connection has been removed from Cloud Pak for Data as a Service.
Dremio connection requires updates
02 Feb 2024
Previously the Dremio connection used a JDBC driver. Now the connection uses a driver based on Arrow Flight.
Important: Update the connection properties. Different changes apply to a connection for a Dremio Software (on-prem) instance or a Dremio Cloud instance.
Dremio Software: Update the port number.
The new default port number that is used by Flight is 32010. You can confirm the port number in the dremio.conf file. See Configuring via dremio.conf for information.
Additionally, Dremio no longer supports connections with IBM Cloud Satellite.
Dremio Cloud: Update the authentication method and hostname.
- Log in to Dremio and generate a personal access token. For instructions, see Personal Access Tokens.
- In Cloud Pak for Data as a Service in the Create connection: Dremio form, change the authentication type to Personal Access Token and add the token information. (The Username and password authentication can no longer be used to connect to a Dremio Cloud instance.)
- Select Port is SSL-enabled.
If you use the default hostname for a Dremio Cloud instance, you need to change it:
- Change sql.dremio.cloud to data.dremio.cloud
- Change sql.eu.dremio.cloud to data.eu.dremio.cloud
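If you track connection settings in your own scripts, the changes above could be applied with a small helper along these lines. This is a sketch only: the property keys (host, port, auth_type, username, password, ssl) are hypothetical names chosen for illustration and are not the actual connection property names. The port value and hostnames come from the update instructions in this entry.

```python
FLIGHT_DEFAULT_PORT = 32010  # default Arrow Flight port for Dremio Software; confirm in dremio.conf

# Default Dremio Cloud hostnames: old value -> new value
CLOUD_HOSTNAME_CHANGES = {
    "sql.dremio.cloud": "data.dremio.cloud",
    "sql.eu.dremio.cloud": "data.eu.dremio.cloud",
}


def migrate_dremio_properties(props: dict, is_cloud: bool) -> dict:
    """Return an updated copy of a settings dictionary (keys are hypothetical)."""
    updated = dict(props)
    if is_cloud:
        host = props.get("host")
        # Only the default hostnames change; custom hostnames are left as they are.
        updated["host"] = CLOUD_HOSTNAME_CHANGES.get(host, host)
        # Username and password authentication no longer works for Dremio Cloud.
        updated["auth_type"] = "personal_access_token"
        updated.pop("username", None)
        updated.pop("password", None)
        updated["ssl"] = True  # corresponds to selecting "Port is SSL-enabled"
    else:
        updated["port"] = FLIGHT_DEFAULT_PORT
    return updated


print(migrate_dremio_properties({"host": "sql.eu.dremio.cloud", "username": "u", "password": "p"}, is_cloud=True))
```

The personal access token itself still has to be generated in Dremio and added to the connection; the sketch only marks which settings change.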
Additional analysis types in metadata enrichment (IBM Knowledge Catalog)
31 Jan 2024
Metadata enrichment now provides these additional analysis options:
- Primary key analysis to detect primary keys in your data that uniquely identify each record in a data asset.
  Shallow analysis is automatically included when you select the Profile data enrichment option. Advanced analysis can be run on selected assets from the enrichment results.
- Relationship analysis to identify relationships between data assets or to find overlapping and redundant data in columns.
  Shallow key relationship analysis is run when you select the new Set relationships enrichment option. Advanced analysis can be run on selected assets from the enrichment results.
- Advanced profiling to get more exact results for certain metrics, such as frequency distribution and uniqueness of values within a column.
  Advanced profiling can be run on selected assets from the enrichment results.
Advanced primary key and relationship analysis and advanced profiling require the DataStage service in addition to the IBM Knowledge Catalog service and are available only in the Dallas region.
For more information, see Creating a metadata enrichment asset, Identifying primary keys, Identifying relationships, and Advanced data profiles.
Week ending 26 January 2024
AutoAI supports ordered data for all experiments
25 Jan 2024
You can now specify ordered data for all AutoAI experiments rather than just time series experiments. Specify if your training data is ordered sequentially, according to a row index. When input data is sequential, model performance is evaluated on newest records instead of a random sampling, and holdout data uses the last n records of the set rather than n random records. Sequential data is required for time series experiments but optional for classification and regression experiments.
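The difference between the two holdout strategies can be sketched as follows. This is an illustration of the idea only, not AutoAI's implementation; the function name, the seed parameter, and the list-based data are assumptions made for the example.

```python
import random


def split_holdout(records, n, ordered, seed=0):
    """Split records into (training, holdout) sets (illustrative sketch).

    ordered=True  -> holdout is the last n records (the newest, by row index).
    ordered=False -> holdout is n records chosen at random.
    """
    if not 0 < n < len(records):
        raise ValueError("n must be between 1 and len(records) - 1")
    if ordered:
        return records[:-n], records[-n:]
    rng = random.Random(seed)
    holdout_idx = set(rng.sample(range(len(records)), n))
    training = [r for i, r in enumerate(records) if i not in holdout_idx]
    holdout = [r for i, r in enumerate(records) if i in holdout_idx]
    return training, holdout


rows = list(range(10))  # row index stands in for the sequential order
print(split_holdout(rows, 3, ordered=True))   # holdout is the last three rows
print(split_holdout(rows, 3, ordered=False))  # holdout is three randomly chosen rows
```

With ordered data, the model is always evaluated on records that come after everything it was trained on, which is why this mode is required for time series experiments.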
Set to dark theme
25 Jan 2024
You can now set your Cloud Pak for Data as a Service user interface to dark theme. Click your avatar and select Profile and settings to open your account profile. Then, set the Dark theme switch to on. Dark theme is not supported in RStudio and Jupyter notebooks. For information on managing your profile, see Managing your settings.
Week ending 19 January 2024
View native type information in the details panel for asset columns
19 Jan 2024
Now, you can view both standardized and native data types directly in the column details panel. To view the native type information, click an asset column name from the Overview page of an asset.
New option for rule action precedence (IBM Knowledge Catalog)
18 Jan 2024
Rule action precedence enables you to specify how rules are applied when there are multiple rules with different actions on a data set. You can use the new Hierarchical enforcement option to configure a two-layer evaluation of data protection rules.
- The first layer evaluates the rules for an Allow or Deny action without considering any masking actions. The decision from this first layer must be to allow access to move to the second layer.
- The second layer evaluates the rules for a Transform action.
You can set this option from the user interface or from the access_decision_precedence API.
For more information, see Managing rule settings.
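The two-layer evaluation can be pictured with the following sketch. It is not the product's rule engine. The rule representation, the function name, and in particular the way conflicts inside the first layer are resolved (the deny_wins and default_allow parameters) are assumptions made for illustration, because this entry does not describe them.

```python
def evaluate_hierarchical(matching_rules, deny_wins=True, default_allow=True):
    """Two-layer evaluation sketch.

    matching_rules: list of (action, detail) tuples for the rules that match a
    request, where action is "Allow", "Deny", or "Transform".
    Returns (decision, transforms_to_apply).
    """
    # Layer 1: consider only Allow/Deny rules; masking (Transform) rules are ignored here.
    access_actions = [action for action, _ in matching_rules if action in ("Allow", "Deny")]
    if not access_actions:
        allowed = default_allow                  # assumption: behavior when no access rule matches
    elif deny_wins:
        allowed = "Deny" not in access_actions   # assumption: any Deny rule wins
    else:
        allowed = "Allow" in access_actions      # assumption: any Allow rule wins
    if not allowed:
        return "Deny", []

    # Layer 2: reached only when layer 1 allows access; collect the Transform rules.
    transforms = [detail for action, detail in matching_rules if action == "Transform"]
    return "Allow", transforms


print(evaluate_hierarchical([("Allow", None), ("Transform", "mask email")]))
print(evaluate_hierarchical([("Deny", None), ("Transform", "mask email")]))
```

The point of the sketch is the ordering: a Transform rule is never consulted unless the Allow or Deny decision has already come out in favor of access.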
Store the results of data quality analysis (IBM Knowledge Catalog)
18 Jan 2024
You now have the option to write the output of the predefined data quality checks that are run as part of metadata enrichment to a database. For example, you might want to store this data so that you can use the tables for tracking quality issues and as input to remediation processes. For more information, see Creating a metadata enrichment.
Connect to a new data source in DataStage: Tableau
18 Jan 2024
You can now include data from a Tableau data source in your DataStage flows.
For the full list of DataStage connectors, see Supported data sources in DataStage.
Week ending 12 January 2024
Support for IBM Runtime 22.2 deprecated in watsonx.ai Runtime (formerly Watson Machine Learning)
11 Jan 2024
IBM Runtime 22.2 is deprecated and will be removed on 11 April 2024. Beginning 7 March 2024, you cannot create notebooks or custom environments by using the 22.2 runtimes. Also, you cannot train new models with software specifications that are based on the 22.2 runtime. Update your assets and deployments to use IBM Runtime 23.1 before 7 March 2024.
- To learn more about migrating an asset to a supported framework and software specification, see Managing outdated software specifications or frameworks.
- To learn more about the notebook environment, see Compute resource options for the notebook editor in projects.
- To learn more about changing your environment, see Changing the environment of a notebook.
Week ending 15 December 2023
View data source information in the details panel for catalogs
15 Dec 2023
If you click on an asset from the related items grid, you can view data source information directly in the asset details panel.
Create user API keys for jobs and other operations
15 Dec 2023
Certain runtime operations in Cloud Pak for Data as a Service, such as jobs and model training, require an API key as a credential for secure authorization. With user API keys, you can now generate and rotate an API key directly in Cloud Pak for Data as a Service as needed to help ensure your operations run smoothly. The API keys are managed in IBM Cloud, but you can conveniently create and rotate them in Cloud Pak for Data as a Service.
The user API key is account-specific and is created from Profile and settings under your account profile.
For more information, see Managing the user API key.
New login session expiration and sign out due to inactivity
15 Dec 2023
You are now signed out of IBM Cloud due to session expiration. Your session can expire due to login session expiration (24 hours by default) or inactivity (2 hours by default). You can change the default durations in the Access (IAM) settings in IBM Cloud. For more information, see Set the login session expiration.
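The two expiration conditions combine as a simple "whichever comes first" check. The sketch below uses the default durations stated above; the function and constant names are hypothetical, and the actual enforcement happens in IBM Cloud IAM rather than in your code.

```python
from datetime import datetime, timedelta

LOGIN_SESSION_LIMIT = timedelta(hours=24)  # default login session expiration
INACTIVITY_LIMIT = timedelta(hours=2)      # default inactivity expiration


def session_expired(login_time, last_activity, now,
                    login_limit=LOGIN_SESSION_LIMIT,
                    inactivity_limit=INACTIVITY_LIMIT):
    """Return True if either limit has been reached (illustrative sketch)."""
    return (now - login_time) >= login_limit or (now - last_activity) >= inactivity_limit


login = datetime(2023, 12, 15, 8, 0)
# Active one hour ago, logged in two hours ago: session is still valid.
print(session_expired(login, login + timedelta(hours=1), login + timedelta(hours=2)))
# No activity for two hours: signed out due to inactivity.
print(session_expired(login, login + timedelta(hours=1), login + timedelta(hours=3)))
```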
Access the list of connection API properties
15 Dec 2023
You can now view the full list of the connectors with their individual properties at: https://dataplatform.cloud.ibm.com/connections/docs.
You can use these properties to create connections by using the Watson Data API. For example, if you create a connection in a notebook programmatically, you can use this information to identify the properties that you need.
Organize project assets into folders
14 Dec 2023
You can now create folders in your projects to organize assets. An administrator of the project must enable folders, and administrators and editors can create and manage them. Folders are in beta and are not yet supported for use in production environments. For more information, see Organizing assets with folders (beta).
IBM Cloud Databases for DataStax connector is deprecated
15 Dec 2023
The IBM Cloud Databases for DataStax connector is deprecated and will be discontinued in a future release.
Week ending 08 December 2023
New client properties in Db2 connections for workload management
08 Dec 2023
You can now specify properties in the following fields for monitoring purposes: Application name, Client accounting information, Client hostname, and Client user. These fields are optional and are available for the following connections:
Connect to a new data source in DataStage: Google Looker
08 Dec 2023
You can now include data from a Google Looker data source in your DataStage flows. (You can use this connection for source data only.)
For the full list of DataStage connectors, see Supported data sources in DataStage.
New and enhanced features in Watson Query
08 Dec 2023
The following new and enhanced features are available in Watson Query:
Use IBM Knowledge Catalog data protection rules to filter rows in virtualized tables
You might have a data source that has tables with government, enterprise, and retail client data combined. For example, a billing table might have data for all the customers, where some of the rows are for government clients and some are for nongovernment clients. The type of the client is not indicated in the billing table. Now, you can filter the list of client records by using one of the following techniques.
You can use a separate table to identify customers that are government clients. The IDs from this table can be used to filter out rows from the billing table. When you filter out rows, the masked table does not contain the rows with data of government clients.
You can use a table of blocked customer identifiers as a reference table. Any rows in the billing table whose customer identifier is included in the blocked customer set are filtered out of the result set.
Watson Query supports masking columns in virtualized data based on data protection rules that are defined in IBM Knowledge Catalog. Now, you can create data protection rules to include or exclude rows in your virtualized data to avoid exposing sensitive data.
For more information, see Governing virtual data with data protection rules in Watson Query.
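The effect of row filtering with a reference table can be illustrated with plain Python data. This is a conceptual sketch only: in Watson Query the filtering is defined through data protection rules, not through code like this, and the column names (customer_id, amount) and the sample identifiers are hypothetical.

```python
def filter_blocked_rows(billing_rows, blocked_customer_ids):
    """Return only the billing rows whose customer is not in the blocked set."""
    blocked = set(blocked_customer_ids)
    return [row for row in billing_rows if row["customer_id"] not in blocked]


billing = [
    {"customer_id": "C001", "amount": 120.0},
    {"customer_id": "G042", "amount": 900.0},  # a government client in this example
    {"customer_id": "C007", "amount": 35.5},
]
government_clients = ["G042"]  # stands in for the separate reference table

visible = filter_blocked_rows(billing, government_clients)
print([row["customer_id"] for row in visible])
```

Note that the billing rows themselves carry no client-type column; the reference table is the only place where that information lives, which matches the scenario described above.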
Use advanced data masking on virtualized data
You can now use the advanced data masking options in Watson Query to avoid exposing sensitive data.
For more information about the updated masking behavior, see Masking virtual data in Watson Query.
Improved query performance and enforcement of data protection rules
Watson Query now stores and caches data protection rules from IBM Knowledge Catalog in a policy enforcement point (PEP) cache to avoid evaluating rules every time an object is queried. This cache improves the performance of previously executed queries by reducing the number of calls to IBM Knowledge Catalog to fetch the rules. However, you might notice a delay of up to 10 seconds before newly added or updated data protection rules are applied to queries. You can use the web client to configure PEP cache settings, such as cache size and cache live time.
For more information, see Enabling enforcement of data protection rules in Watson Query.
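The trade-off between fewer rule lookups and a short delay for rule changes is typical of a time-based cache. The following sketch shows the general technique; it is not the Watson Query PEP cache. The class name, the fetch callback, and the choice of a 10-second live time (picked to mirror the delay mentioned above) are assumptions for illustration.

```python
import time


class RuleCache:
    """Minimal time-based cache sketch for rule lookups (illustrative only)."""

    def __init__(self, fetch_rules, ttl_seconds=10.0, clock=time.monotonic):
        self._fetch = fetch_rules   # callable that retrieves the rules for an object
        self._ttl = ttl_seconds     # how long a cached entry stays valid
        self._clock = clock
        self._entries = {}          # object name -> (time cached, rules)

    def get(self, object_name):
        now = self._clock()
        entry = self._entries.get(object_name)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]         # cache hit: no call to the rule source
        rules = self._fetch(object_name)
        self._entries[object_name] = (now, rules)
        return rules


calls = []
cache = RuleCache(lambda name: calls.append(name) or ["mask SSN"])
cache.get("SALES.BILLING")
cache.get("SALES.BILLING")
print(len(calls))  # the second lookup is served from the cache
```

A rule that changes at the source is not seen until the cached entry ages out, which is the kind of delay described in this entry.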
Format and save formatted query access plans for performance tuning
You can now format and save formatted access plans for performance tuning in Watson Query. When you run SQL queries in Watson Query, you can use the web client to format how EXPLAIN information appears when you generate query access plans. You can then run the db2exfmt command from the web client to easily generate and download the EXPLAIN output in text files.
Use wildcard characters to filter your data sources
Now when you create a virtualized table, you can use the following wildcard characters to customize filters to find the data sources that you need:
- % (percent): To represent zero or more characters
- _ (underscore): To represent a single character
For more information, see Filtering data in Watson Query.
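To see how the two wildcard characters behave, the following sketch translates a filter pattern into a regular expression. It only illustrates the matching semantics described above; the function names are hypothetical, and the sketch assumes case-sensitive matching with no escape character, which might differ from the product.

```python
import re


def wildcard_to_regex(pattern):
    """Translate a filter pattern with % and _ into a compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")           # zero or more characters
        elif ch == "_":
            parts.append(".")            # exactly one character
        else:
            parts.append(re.escape(ch))  # everything else matches literally
    return re.compile("".join(parts), re.DOTALL)


def matches(pattern, name):
    """Return True if the whole name matches the wildcard pattern."""
    return wildcard_to_regex(pattern).fullmatch(name) is not None


print(matches("SALES%", "SALES_2023"))  # True: % covers the rest of the name
print(matches("T_1", "TA1"))            # True: _ stands for exactly one character
print(matches("T_1", "TAB1"))           # False: two characters where one is expected
```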
Watson Query users can publish their own virtual objects
Users with the User role in Watson Query can now publish virtual objects that they created to governed catalogs.
For more information, see Publishing virtual data to a catalog with Watson Query.
Manage who can access and perform operations on individual data sources
With data source access restrictions, you can explicitly manage access to individual data source connections that use shared credentials. You can assign users and roles as collaborators for a data source connection. Only those collaborators can access the data source connection. You assign specific privileges to the collaborators to manage the actions that they can perform on the data sources. This enables you to separate privileges from roles, so that some users who are assigned a role such as Manager can access and take action on different data source connections than other Manager users.
For more information, see Data source connection access restrictions in Watson Query.
Query data in Generic S3 and Microsoft Azure Data Lake Storage Gen2 data lakes
You can now connect to Generic S3 and Microsoft Azure Data Lake Storage Gen2 data sources. For more information, see Supported data sources in Watson Query.
Choose your query mode to prioritize either performance or consistency
You can now choose between running queries in Max Pushdown mode or in Max Consistency mode.
- Max Pushdown mode ignores semantic differences between Watson Query and the data source for single-source queries. Therefore, more single-source queries might be fully pushed down to the data source, improving query performance. Query results are consistent with data source semantics for fully pushed down queries in this mode. Max Pushdown mode does not impact multiple-source queries.
- Max Consistency mode follows Watson Query semantics to evaluate whether operations can be pushed down to the data source. If the operation that is executed on the data source generates the same result as Watson Query, the operation can be pushed down. Queries in this mode might be fully pushed down if the remote data source has the same semantics as Watson Query.
Quickly find and virtualize tables with the Explore tab
You can now quickly find the tables that you want to virtualize. On the Virtualize page, you can use the Explore tab to browse through databases, schemas, and available tables in a connected data source. The List tab displays all of the available tables in all of your connected data sources. On the Data sources page, you can filter your data sources to quickly load the reduced list of available tables in the List tab.
For more information, see Creating virtual objects in Watson Query.
Improve statistics collection for virtualized tables by using data sampling
Data sampling improves statistics collection by reducing the resources that you need to collect statistics. When you collect statistics by selecting the Remote query collection method in the web client, a default sampling rate of 20% is used. To optimize statistics collection, select Enable table sampling and choose a sampling rate between 1% and 99%.
If you collect statistics by using the DVSYS.COLLECT_STATISTICS procedure, you can use the TABLESAMPLE option with the remote-query statistics collection type to sample data when you collect statistics. For tips, see Usage notes.
You can also use the DVSYS.COLLECT_STATISTICS procedure to collect statistics for virtualized tables over flat files.
For more information, see the COLLECT_STATISTICS stored procedure in Watson Query.
Use your platform credentials to access Watson Query connections
When you use a platform connection to access Watson Query, you are prompted for your credentials. You can optionally select Use my platform login credentials, rather than entering your personal credentials for the connection. The connection uses your current session JSON Web Token (JWT).
Improvements for data sources in object storage
- You can now create connections and virtualize files for Generic S3 data sources in object storage.
- You can now create virtualized tables from externally compressed CSV or TSV files that are stored in object storage. For more information, see Creating a virtual table from files in object storage.
- You can now virtualize flat files in cloud object storage that contain column headers.
For more information, see Creating a virtualized table from files in cloud object storage in Watson Query.
Predicate pushdown improvements and support for predicate pushdown on more data sources
Predicate pushdown is an optimization that reduces query times and memory usage. This release includes the following improvements to predicate pushdown:
- Queries that include COUNT (DISTINCT) or GROUP BY clauses can now be pushed down with trailing blanks comparison rules for Teradata, Netezza®, Microsoft SQL Server, Db2® for z/OS®, and Db2 Database data sources.
- Queries that include a string comparison operation, such as a GROUP BY or WHERE predicate against CHAR or VARCHAR data, can now be pushed down for the Teradata data source to handle case sensitivity.
- SQL statements with LIKE predicates are now pushed down for: Db2®, SAP HANA, Oracle, PostgreSQL, Apache Hive, MySQL, Microsoft SQL Server, Snowflake, Netezza® Performance Server, and Teradata.
- SQL statements with Fetch clauses are now pushed down for: Db2, Db2 for z/OS, Apache Derby, Oracle, Amazon Redshift, Google BigQuery, and Salesforce.com data sources.
- SQL statements with a string comparison filter are now pushed down for: Db2, Microsoft SQL Server, Teradata, Netezza Performance Server, and Apache Derby data sources.
- SQL statements with OLAP functions are now pushed down for Db2 and Netezza Performance Server data sources.
- The Greenplum data source now supports push down of predicates.
- The MySQL (MySQL Community Edition and MySQL Enterprise Edition) data source now supports push down of predicates.
- The Cloudera Impala data source now supports push down of predicates.
- The Watson Query Manager for z/OS® data source now supports push down of predicates.
For more information, see Supported data sources in Watson Query.
A Watson Query connection is now available in the Platform connections by default
You can add a Watson Query connection from Platform connections to catalogs and projects without manually populating the connection details.
Manage access for multiple users and roles if you are a Manager
As a Watson Query Manager, you can now grant and revoke access for multiple users and roles at the same time.
For more information, see Managing access to virtual objects in Watson Query.
Watson Query managers can now make virtual objects visible to all users
Managers can now choose to give users a more comprehensive view of the content by making existing virtual objects visible from the Virtualized data page. Data access within those objects continues to adhere to Watson Query authorizations and data protection rules. To enable this feature, managers need to disable the Restrict visibility setting from Service settings.
For more information, see Managing visibility of virtual objects in Watson Query.
New caching APIs
Cache entries can be managed through REST APIs that the caching service exposes. These APIs can be invoked by any application. You can use new caching APIs to do the following tasks:
- Create a cache
- List a specific cache
- Delete a cache
- Enable a cache
- Disable a cache
- Refresh a cache
- Edit a cache
The following caching APIs are deprecated:
- List caches
- List a cache
- Fetch the cache storage
For more information, see Caches in the Watson Query 2.0.0 API docs.
New publishing API
You can publish virtualized data to catalogs by using the following API:
The following API is deprecated:
Week ending 1 December 2023
New plans for Watson OpenScale as part of watsonx.governance
1 Dec 2023
Watson OpenScale is now part of watsonx.governance. Provisioning watsonx.governance from the IBM Cloud Catalog installs Watson OpenScale. On Cloud Pak for Data as a Service, Watson OpenScale continues to provide services for evaluating predictive machine learning models. On watsonx, provisioning watsonx.governance extends the governance capabilities of Watson OpenScale to evaluate foundation model assets as well as machine learning assets. You can define AI use cases to address business problems, then track asset data in factsheets to support compliance and governance goals. Watsonx.governance plans and features are available only in the Dallas region. Watson OpenScale legacy plans are available in the Frankfurt region.
- To view plan details, see watsonx.governance plans.
- To get started, see Provisioning and launching watsonx.governance.
IBM Watson Knowledge Catalog is now IBM Knowledge Catalog
1 Dec 2023
IBM Watson Knowledge Catalog is renamed to IBM Knowledge Catalog. Only the name changed; the service offering plans and product capabilities remain the same.
New data sources for metadata import in IBM Knowledge Catalog
1 Dec 2023
You can import metadata to IBM Knowledge Catalog from the following data sources:
- IBM Match 360
- SingleStoreDB
For more information, see Supported data sources for metadata import, metadata enrichment, and data quality rules.
Week ending 17 November 2023
New custom property of type user and user group
17 Nov 2023
You can now create a custom property of type user and user group and assign specific users or user groups to it. For more information, see Creating custom properties.
Multiple sources on either end of a custom relationship type
17 Nov 2023
You can extend your set of custom relationship types by using multiple types on the source and target end. Use multiple artifact, asset, and column types for a more detailed relationship definition. For more information, see Creating custom relationships.
New permissions for data quality in IBM Knowledge Catalog
17 Nov 2023
You can now assign the following permissions to your users to have more control over how data quality is established in IBM Knowledge Catalog:
- Manage data quality assets
- Execute data quality rules
- Drill down to issue details
By default, the new permissions are included in the following roles:
- Administrator
- CloudPak Data Quality Analyst, which is a new role
Update role assignments and any custom roles you might have for users who need to manage data quality definitions and rules and to run data quality rules.
For more information, see User roles and permissions for IBM Knowledge Catalog and watsonx.ai Studio.
Export and import data protection rules
17 Nov 2023
You can now use APIs to export and import data protection rules across multiple instances of Cloud Pak for Data as a Service. The links to glossary artifacts, catalogs, assets, and users are maintained when you export the data protection rules.
For more information, see Migrating data protection rules.
Run DataStage flows in Extract, Load, and Transform (ELT) run mode (Beta)
13 Nov 2023
The ELT process is different from the traditional Extract, Transform, and Load (ETL) process in that it runs the transform part of the process in the target database, which can be more efficient and cost effective. This capability is currently offered in beta and is not supported for production.
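The difference can be made concrete with a small sketch in which an in-memory SQLite database stands in for the target database. This is not how DataStage runs ELT flows; the table names (staging_orders, customer_totals), the columns, and the function name are hypothetical and serve only to show the transform step running as SQL inside the target.

```python
import sqlite3


def run_elt(raw_rows):
    """Extract and load raw rows first, then transform inside the target database."""
    target = sqlite3.connect(":memory:")  # stands in for the target database
    # Load: the raw, untransformed rows go straight into a staging table.
    target.execute("CREATE TABLE staging_orders (customer TEXT, amount REAL)")
    target.executemany("INSERT INTO staging_orders VALUES (?, ?)", raw_rows)
    # Transform: the cleanup and aggregation run as SQL in the target database,
    # instead of in a separate engine before the load (as in ETL).
    target.execute(
        """
        CREATE TABLE customer_totals AS
        SELECT UPPER(customer) AS customer, SUM(amount) AS total
        FROM staging_orders
        GROUP BY UPPER(customer)
        """
    )
    result = target.execute(
        "SELECT customer, total FROM customer_totals ORDER BY customer"
    ).fetchall()
    target.close()
    return result


print(run_elt([("acme", 10.0), ("Acme", 5.0), ("zen", 1.0)]))
```

In an ETL design, the normalization and aggregation would happen before the insert, so only the finished rows would ever reach the target.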
Removal of some predefined relationship types (13 December 2023)
13 Nov 2023
On 13 December 2023, predefined relationship types for asset-asset and asset-artifact relationships that are infrequently used will be removed.
The following relationship types will be affected:
- Defines - Is defined by will be replaced by Contains - Is contained in
- Is owner of - Is owned by will be replaced by Contains - Is contained in
- Has for parent entity - Is relationship child of will be replaced by Is parent of - Is child of
- Is supertype of - Is subtype of will be replaced by Is parent of - Is child of
Here's what you need to do now:
- If you are not using these relationship types, no action is required.
- If you are using these relationship types and agree with the replacement relationship types, no action is required.
- If you are using these relationship types and would like to assign different relationship types, remove the current relationship and create new relationships using other predefined or custom relationship types.
If you have any questions or concerns related to the replacement of these relationship types, you can open a support ticket.
Week ending 10 November 2023
Removal of resource key from the details panel for columns
10 Nov 2023
Resource key was displayed in the details panel at a column level although the information was not applicable for columns. Resource key is now removed from the details panel at a column level. The information is still required at an asset level. For example, the asset resource key might be used in the import lineage mapping CSV file.
Deploy DataStage remote runtime engines locally with DataStage-aaS Anywhere
9 Nov 2023
You can now deploy DataStage remote runtime engines to run data integration jobs on-premises or on any data center or cloud.
The DataStage runtime engine is a containerized offering that is deployed in local environments for enhanced performance and security. Design ETL and ELT pipelines in DataStage and run data integration tasks locally on your engine. Administrators can spin up one or more remote runtime engines. For security, the execution style cannot be reverted to the IBM Cloud serverless runtime after DSaaS Anywhere is enabled for a project, but the IBM Cloud serverless runtime remains available for other projects.
For more information, see DataStage environments.
Announcing support for Python 3.10 and R 4.2 frameworks and software specifications on runtime 23.1
9 Nov 2023
You can now use IBM Runtime 23.1, which includes the latest data science frameworks based on Python 3.10 and R 4.2, to run Jupyter notebooks and R scripts, train models, and run deployments. Update your assets and deployments to use IBM Runtime 23.1 frameworks and software specifications.
- For information on the IBM Runtime 23.1 release and the included environments for Python 3.10 and R 4.2, see Changing notebook environments.
- For details on deployment frameworks, see Managing frameworks and software specifications.
Use Apache Spark 3.4 to run notebooks and scripts
Spark 3.4 with Python 3.10 and R 4.2 is now supported as a runtime for notebooks and RStudio scripts in projects. For details on available notebook environments, see Compute resource options for the notebook editor in projects and Compute resource options for RStudio in projects.
Week ending 27 October 2023
Access data from complex flat files in DataStage
27 Oct 2023
You can now use the Complex Flat File connector in your DataStage flows.
For the full list of DataStage connectors, see Supported data sources in DataStage.
Saving your search query when using the global search bar
27 Oct 2023
You can now save your search criteria for later use. Your saved searches are listed in the drop-down list when you type in the search bar. You can also edit or delete the saved search. See Saving your search.
Connect to more data sources in DataStage
27 Oct 2023
You can now include data from these data sources in your DataStage flows:
- Apache Derby
- IBM Cloud Data Engine
- IBM Cloud Databases for DataStax
- IBM watsonx.data Presto
For the full list of DataStage connectors, see Supported data sources in DataStage.
Use a Satellite Connector to connect to an on-prem database
26 Oct 2023
Use the new Satellite Connector to connect to a database that is not accessible via the internet (for example, behind a firewall). Satellite Connector uses a lightweight Docker-based communication that creates secure and auditable communications from your on-prem environment back to IBM Cloud. For instructions, see Connecting to data behind a firewall.
Secure Gateway is deprecated
26 Oct 2023
IBM Cloud announced the deprecation of Secure Gateway. For information, see the Overview and timeline.
If you currently have connections that are set up with Secure Gateway, plan to use an alternative communication method. In Cloud Pak for Data as a Service, you can use the Satellite Connector as a replacement for Secure Gateway. See Connecting to data behind a firewall.
Use NLS collate in DataStage
27 Oct 2023
You can now collate data with National Language Support in your DataStage flows.
Week ending 20 October 2023
Access lakehouse data with the new IBM watsonx.data Presto connection
20 Oct 2023
You can use the IBM watsonx.data Presto connection to connect to a database in a watsonx.data instance that is deployed on Cloud Pak for Data or IBM Cloud. IBM watsonx.data is an open, hybrid and governed data lakehouse that is optimized by a query engine for all data and AI workloads.
For information, see IBM watsonx.data Presto connection.
Week ending 13 October 2023
Custom enumeration property names translated into your preferred language (IBM Knowledge Catalog)
13 Oct 2023
Custom property owners can now allow custom enumeration type property names to be translated into your preferred language.
The owner of the custom enumeration type property for an asset or column must define the definition of the property before you can choose to view custom enumeration property names in your browser's language. For more information, see Creating custom properties.
Intermediate solutions in Decision Optimization
12 Oct 2023
You can now choose to see a sample of intermediate solutions while a Decision Optimization experiment is running. This can be useful for debugging or to see how the solver is progressing. For large models that take longer to solve, with intermediate solutions you can now quickly and easily identify any potential problems with the solve, without having to wait for the solve to complete. You can configure the Intermediate solution delivery parameter in the Run configuration and select a frequency for these solutions. For more information, see Intermediate solutions and Run configuration parameters.
New Decision Optimization saved model dialog
When you save a model for deployment from the Decision Optimization user interface, you can now review the input and output schema, and more easily select the tables that you want to include. You can also add, modify or delete run configuration parameters, review the environment, and the model files used. All these items are displayed in the same Save as model for deployment dialog. For more information, see Deploying a Decision Optimization model by using the user interface.
Deprecation of profiling of unstructured data (IBM Knowledge Catalog)
10 Oct 2023
As of today, data assets that contain unstructured data can no longer be profiled.
View runtime metrics for your DataStage jobs
9 Oct 2023
You can now view runtime metrics for your DataStage jobs on the canvas and on the job run details page. For more information, see Creating and managing DataStage jobs.
Bulk add keys and attributes to new stages
9 Oct 2023
You can now bulk add keys and attributes to the following stages in your DataStage flows: Sort, Merge, Join, Remove duplicate, Difference, Change capture, Change apply, Combine records, Funnel, Compare, Lookup file set, Write range map, and Bloom filter.
Week ending 6 October 2023
Control the placement of a new column in the Concatenate operation (Data Refinery)
6 Oct 2023
You now have two options to specify the position of the new column that results from the Concatenate operation: As the right-most column in the data set or next to the original column.
Previously, the new column was placed at the beginning of the data set.
Edit the Concatenate operation in any of your existing Data Refinery flows to specify the new column position. Otherwise, the flow might fail.
For information about Data Refinery operations, see GUI operations in Data Refinery.
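As a rough illustration of the two placement options, the following plain Python sketch mimics the behavior on a list of rows. It is not Data Refinery code: the function name, the `position` values, and the choice to treat the first selected column as the "original column" are assumptions made for this example.

```python
def concatenate_columns(rows, columns, new_name, separator="", position="right"):
    """Join the values of `columns` into a new column for every row.

    position="right": the new column becomes the right-most column.
    position="next":  the new column is placed directly after the first
                      selected column (an assumption for this sketch).
    """
    result = []
    for row in rows:
        value = separator.join(str(row[c]) for c in columns)
        if position == "right":
            new_row = dict(row)          # keep the original column order
            new_row[new_name] = value    # append the new column at the end
        elif position == "next":
            new_row = {}
            for key, cell in row.items():
                new_row[key] = cell
                if key == columns[0]:    # insert right after the original column
                    new_row[new_name] = value
        else:
            raise ValueError("position must be 'right' or 'next'")
        result.append(new_row)
    return result


# Hypothetical sample data
rows = [{"first": "Ada", "last": "Lovelace", "id": 1}]
right = concatenate_columns(rows, ["first", "last"], "full", " ", position="right")
beside = concatenate_columns(rows, ["first", "last"], "full", " ", position="next")
```

In this sketch, `right` yields the column order first, last, id, full, while `beside` yields first, full, last, id.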
Week ending 29 September 2023
Use new functions in the expression builder for the Modify stage in DataStage
25 Sept 2023
You can use conversion functions in the expression builder in the Modify stage in your DataStage flows.
Week ending 22 September 2023
Decision Optimization Java models
20 Sept 2023
Decision Optimization Java models can now be deployed in watsonx.ai Runtime (formerly Watson Machine Learning). By using the Java worker API, you can create optimization models with OPL, CPLEX, and CP Optimizer Java APIs. You can now easily create your models locally, package them and deploy them on watsonx.ai Runtime by using the boilerplate that is provided in the public Java worker GitHub. For more information, see Deploying Java models for Decision Optimization.
Week ending 8 September 2023
Reminder: Watson Knowledge Catalog profiling of unstructured data will be discontinued
8 Sept 2023
Profiling of unstructured data assets will no longer be supported starting on October 10, 2023.
Week ending 1 September 2023
Deprecation of comments in notebooks
31 Aug 2023
As of today, you can no longer add comments to a notebook from the notebook action bar. Any existing comments were removed.
Use new environment variable in DataStage
28 Aug 2023
You can now add the environment variable APT_SHOW_METRICS to the flow parameters of your DataStage flows.
Week ending 25 August 2023
Quickly find catalogs with name and date sorting
24 Aug 2023
You can now find catalogs by sorting the list of catalogs on the View all Catalogs page by name or date created. Click on the Name header to sort the catalogs alphabetically by name. Click on the Date created header to sort the catalogs by ascending or descending dates.
Data quality at a glance in IBM Knowledge Catalog
22 Aug 2023
Data quality information has a new home. For each data asset in a catalog or a project, a Data quality page is populated with quality information that comes from predefined data quality checks and data quality rules. You can see the applicable data quality dimensions and the results of individual quality checks. You can drill down into the results for each check or even into the results for each column.
For more information, see Data quality.
Similar information is available from metadata enrichment results.
All data quality analysis is now run in the context of metadata enrichment or data quality rules. When you run profiling from the Profile page in a project or a catalog, data quality is not analyzed anymore and no data quality scores are generated.
Additional cache enhancements available for Watson Pipelines
21 August 2023
More options are available for customizing your pipeline flow settings. You can now exercise greater control over when the cache is used for pipeline runs. For details, see Managing default settings.
Week ending 18 August 2023
Plan name updates for watsonx.ai Runtime (formerly Watson Machine Learning) service
18 August 2023
Starting immediately, plan names are updated for the IBM watsonx.ai Runtime service, as follows:
- The v2 Standard plan is now the Essentials plan. The plan is designed to give your organization the resources required to get started working with foundation models and machine learning assets.
- The v2 Professional plan is now the Standard plan. This plan provides resources designed to support most organizations through asset creation to productive use.
Changes to the plan names do not change your terms of service. That is, if you are registered to use the v2 Standard plan, it will now be named Essentials, but all of the plan details will remain the same. Similarly, if you are registered to use the v2 Professional plan, there are no changes other than the plan name change to Standard.
For details on what is included with each plan, see watsonx.ai Runtime plans. For pricing information, find your plan on the watsonx.ai Runtime plan page in the IBM Cloud catalog.
Connect to more data sources in DataStage
18 Aug 2023
You can now include data from these data sources in your DataStage flows:
- Cloudera Impala
- Presto
For the full list of DataStage connectors, see Supported data sources in DataStage.
Connect to Google BigQuery data with ODBC (DataStage)
18 Aug 2023
The ODBC connection now includes the Google BigQuery data source.
For the full list of data sources that are available for the ODBC connection in DataStage, see ODBC connection.
Week ending 11 August 2023
Use new functions in the DataStage Transformer stage
8 August 2023
- You can now use data masking, encryption, and regex functions in the Transformer stage as part of your DataStage flows.
- You can now drag and drop columns on the Output tab of the Transformer stage.
- You can now bulk edit columns in the Transformer stage from the Input tab.
Deprecation of comments in notebooks
7 August 2023
On 31 August 2023, you will no longer be able to add comments to a notebook from the notebook action bar. Any existing comments that were added that way will be removed.
Week ending 4 August 2023
Custom text analytics template (SPSS Modeler)
4 August 2023
For SPSS Modeler, you can now upload a custom text analytics template to a project. This provides you with more flexibility to capture and extract key concepts in a way that is unique to your context.
Week ending 28 July 2023
Enhanced capabilities for evaluating models with Watson OpenScale
25 July 2023
Use these new features to monitor and evaluate model deployments and interpret results.
Configure deployments with a new guided setup
A new setup wizard is available to help you add deployments to the Watson OpenScale Insights dashboard and provide model details. For more information, see Adding deployments for evaluations.
Configure new drift evaluation to provide more insights
You can configure a new version of the drift evaluation in Watson OpenScale to generate the following new metrics:
- Output drift
- Feature drift
- Model quality drift
For more information, see Configuring drift v2 evaluations.
Understand model performance with model health evaluations
Watson OpenScale now provides new model health evaluations by default to help you understand how efficiently your model processes your transactions. For more information, see Model health monitor evaluation metrics.
Add multi-target prediction models in Watson OpenScale
When you add your deployments in Watson OpenScale, you can now specify multiple prediction columns to provide details about your model's output to configure quality evaluations. For more information, see Providing model details.
Run fairness evaluations with unstructured data
You can now enable fairness evaluations on unstructured data types to identify bias. For more information, see Configuring fairness evaluations.
Week ending 14 July 2023
Manage asset column relationships in a catalog
14 July 2023
Admins can now create and manage asset column relationships in a catalog. Column relationships can be created between columns and assets, columns and artifacts, or between columns.
To add a column relationship, click a column row on the Overview page of an asset. In the side pane, click the Related items overflow menu. Select one of the relationship types from the dropdown to add a relationship.
To learn more about creating relationships, see Asset relationships in a catalog.
Deprecation of the profiling support for unstructured data in IBM Knowledge Catalog
12 July 2023
Profiling of data assets that contain unstructured data, such as Microsoft Word, PDF, HTML, and plain text documents, is deprecated. Support will be discontinued on 10 October 2023. Until then, unstructured data assets of the supported types will continue to be profiled automatically when added to a project or a catalog. Starting on 11 October 2023, newly added unstructured data assets will no longer be profiled. Existing profiles will be available while the respective data assets live in the project or catalog.
Microsoft Azure SQL Database connection supports Azure Active Directory authentication (Azure AD)
14 July 2023
You can now select Active Directory for the Microsoft Azure SQL Database connection. Active Directory authentication is an alternative to SQL Server authentication. With this enhancement, administrators can centrally manage user permissions to Azure. For more information, see Microsoft Azure SQL Database connection.
Week ending 7 July 2023
Switch to IBM watsonx.ai
7 July 2023
If you have the watsonx.ai Studio (formerly Watson Studio) and watsonx.ai Runtime (formerly Watson Machine Learning) services, you now have access to IBM watsonx.ai. You can switch from Cloud Pak for Data as a Service to watsonx and work with foundation models in the Prompt Lab tool or in notebooks.
Updates to watsonx.ai Runtime (formerly Watson Machine Learning) plans
7 July 2023
All watsonx.ai Runtime plans now include foundation model inferencing. Foundation model inferencing is available only on watsonx.ai. You can switch to watsonx.ai and use the new Prompt Lab tool or access foundation models with a notebook. You use the same watsonx.ai Runtime service instance on watsonx.ai as you use on Cloud Pak for Data as a Service.
If you have the watsonx.ai Runtime Lite plan, you can use up to 25,000 tokens for foundation model inferencing per month.
If you have the watsonx.ai Runtime v2 Standard or v2 Professional plan, your account will incur charges when your account users perform foundation model inferencing in the Prompt Lab or in notebooks.
For details on how foundation model inferencing is tracked and billed, see watsonx.ai Runtime plan. For the pricing of foundation model inferencing, find your plan on the watsonx.ai Runtime plan page in the IBM Cloud catalog.
Enhanced Natural Language Processing capabilities in Runtime 23.1
7 July 2023
Runtime 23.1 contains the Watson Natural Language Processing library 4.1 and a new set of pre-trained models. The NLP library contains the following enhancements and updates:
- Many included models are now transformer-based. These models were trained on the Slate large language model (LLM), which was created by IBM. The models are available in two versions:
- Optimized for CPU-only environments
- For environments with GPUs or CPUs
- Many included models for different NLP tasks are now workflow-based instead of block-based, so you can apply the models directly on input text without worrying about preprocessing steps.
NLP includes a Slate foundation model that you can use for fine-tuning your NLP tasks. You can use the Slate model or any transformer-based model from Hugging Face as a base to build your own models with Watson NLP.
All models provided by IBM are now exclusively trained on unbiased data with state-of-the-art filtering for hate, bias, and profanity.
These capabilities are currently available in the following environments:
- NLP Runtime 23.1 on Python 3.10
- GPU V100 Runtime 23.1 on Python 3.10
- GPU 2xV100 Runtime 23.1 on Python 3.10
You can use these environments for NLP processing, but not for general model development. The data science libraries used in these environments are not yet supported by watsonx.ai Runtime (formerly Watson Machine Learning).
For more information, see Watson Natural Language Processing.
Week ending 30 June 2023
Enhanced Data Privacy content in Knowledge Accelerators (IBM Knowledge Catalog)
28 June 2023
The Knowledge Accelerator for Cross Industry now has Data Privacy content that includes a set of classified business terms and data classes to accelerate the discovery and governance of personal information. In addition, sample data privacy policies and rules are available to describe the activities that are related to processing personal information.
The business terms and data classes have classifications to guide the identification of personal information (PI) and sensitive personal information (SPI). You can use metadata enrichment in IBM Knowledge Catalog to assign the business terms to imported data assets to identify assets that contain personal data.
Reporting now available for custom assets (IBM Knowledge Catalog)
28 June 2023
You can now create queries, reports, and dashboards based on custom-defined properties for any asset in a project or in a catalog. You can define new custom properties for assets to extend any provided or custom asset types and then create reports based on these relationships. For example, you can create a report on your data quality rules and artifact relationships to extrapolate the accuracy of your data. For more information, see Setting up reporting.
Reporting improvements for data quality rules (IBM Knowledge Catalog)
28 June 2023
You can now monitor data quality rules in the following ways:
- Receive and manage reports on data quality issues for each data asset in a catalog or a project.
- Monitor ongoing data quality for data assets in projects and catalogs by using reporting for data quality scores and data quality dimensions scores. The data quality score is based on a weighted average from data quality dimension scores. The data quality dimensions scores are based on results from relevant data quality checks.
- For data quality rules that include multiple rule definitions, see the data quality check statistics (results) by rule definition in the BI reporting schema.
For more information, see Data model.
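The weighted-average relationship between dimension scores and the overall data quality score can be sketched in Python as follows. This is only an illustration of the arithmetic: the dimension names, the weights, and the function name are hypothetical, and the actual weighting that IBM Knowledge Catalog applies is not specified here.

```python
def data_quality_score(dimension_scores, weights):
    """Weighted average of data quality dimension scores.

    dimension_scores: mapping of dimension name -> score between 0 and 1
    weights:          mapping of dimension name -> non-negative weight
    Both mappings are assumptions made for this sketch.
    """
    total_weight = sum(weights[name] for name in dimension_scores)
    if total_weight == 0:
        raise ValueError("at least one dimension needs a non-zero weight")
    weighted_sum = sum(score * weights[name] for name, score in dimension_scores.items())
    return weighted_sum / total_weight


# Hypothetical dimension scores and weights
scores = {"completeness": 0.9, "validity": 0.8, "uniqueness": 1.0}
weights = {"completeness": 2.0, "validity": 1.0, "uniqueness": 1.0}
overall = data_quality_score(scores, weights)  # (1.8 + 0.8 + 1.0) / 4, about 0.9
```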
Week ending 23 June 2023
Govern models more effectively with enhancements for AI Factsheets
23 June 2023
AI Factsheets now offers more ways for you to track solutions for business problems, govern a wider range of assets, capture more information with factsheet attachments, and generate improved reports.
Track different model use case solutions with approaches
When you track models in a use case, you can now create one or more approaches to track different methods and model versions for addressing a business problem. For example, you might create two different approaches in a use case to compare how different algorithms affect model performance so you can find the best solution. For details, see Managing model versions in a use case.
Enhanced options for governing external models
You can now use AI Factsheets to govern a wider range of external models, including models developed, deployed, and monitored on a platform other than Cloud Pak for Data as a Service. In addition to more comprehensive metadata tracked for external models, the Python client and API commands provide more features for moving models and deployments to different environments to more accurately track the life cycle for these assets. For details, see Adding an external model to the model inventory.
Exercise more control over attachments
Model inventory administrators can create attachment groups and create attachment definitions so that users can view attachments in a more organized fashion and upload attachments in an approved format. For details, see Adding and managing attachments for factsheets.
Add branding to your AI Factsheets reports
Customize the report templates that you use to create reports from factsheets by adding branding information and a logo. For details, see Generating reports for factsheets and model use cases.
Announcing support for Python 3.10 Spark 3.3 runtime for notebooks (watsonx.ai Studio, formerly Watson Studio)
23 June 2023
Python 3.10 Spark 3.3 is now supported as a runtime for notebooks. Python 3.9 Spark 3.3 is deprecated and will be discontinued on July 20, 2023. Starting on July 6, 2023, you will be restricted from creating notebooks with a Python 3.9 Spark 3.3 environment, but existing notebooks will continue to run until July 30, 2023. Change your notebook environment to use Python 3.10 Spark 3.3 before the deprecated environment is removed. For details on notebook environments, see Compute resource options for the notebook editor in projects.
Week ending 16 June 2023
Coming soon: General availability of time series anomaly prediction in AutoAI experiments
15 June 2023
Create a time series anomaly prediction experiment to train a model that can detect anomalies, or unexpected results, when the model predicts results based on new data. This capability of AutoAI is currently offered in beta, and is not supported for production. Once the feature is generally available and fully supported, training for time series anomaly prediction experiments will consume capacity unit hours (CUH) as part of your watsonx.ai Runtime (formerly Watson Machine Learning) plan.
Customize engine parameters for Decision Optimization experiments (watsonx.ai Studio (formerly Watson Studio))
15 June 2023
You can now add an engine settings file in your Decision Optimization experiment. With this file, you can view and customize the engine parameters that are used to solve your model in a new visual editor. You can also import an engine settings file and search for existing settings.
Week ending 2 June 2023
Manage AI lifecycle events with the cpdctl tool
2 June 2023
You can now manage and automate your assets hosted on Cloud Pak for Data as a Service using the Cloud Pak for Data Command Line Interface tool (cpdctl). Use automatic configuration from IBM Cloud to easily connect with the cpdctl API commands. For details and an example, see these resources:
- IBM Cloud Pak for Data Command Line Interface documentation.
- Exporting space assets for an example of using cpdctl for managing assets.
- IBM cpdctl CLI on IBM Cloud blog post for details about connecting to cpdctl from Cloud Pak for Data as a Service.
Find your catalogs easily with search
1 June 2023
With the updated Catalogs page, you can now search for a catalog by name and see more catalogs on the page for easier scanning.
Week ending 19 May 2023
Reminder: End of support approaching for Runtime 22.1 on Python 3.9 and R 3.6
15 May 2023
IBM Runtime 22.1 on Python 3.9 and R 3.6 environments will be removed on June 15, 2023. You can no longer create new notebooks or create custom environments using the 22.1 runtimes or R 3.6, or train new models with Python 3.9 software specifications. Update your assets and deployments to use IBM Runtime 22.2 on Python 3.10 or R 4.2 before June 15, 2023.
- For details on migrating an asset to a supported framework and software specification, see Managing frameworks and software specifications.
- For details on notebook environments, see Compute resource options for the notebook editor in projects.
- For information on changing your environments, see Changing the environment of a notebook.
- For details on the libraries and packages for R versions, see the CRAN release notes.
Introducing key-value search for advanced users
18 May 2023
Using key:value pairs in the search bar, you can now search within asset and artifact properties, such as the description, tags, custom properties, column names, and many more. See Searching for properties.
Name change for the IBM Cloud Compose for MySQL connection
18 May 2023
The IBM Cloud Compose for MySQL connection was renamed to IBM Cloud Databases for MySQL. Your previous settings for the connection remain the same. Only the connection name has changed.
Discontinued connections
18 May 2023
The following connections are discontinued and have been removed from Cloud Pak for Data as a Service:
- IBM Db2 Event Store
- IBM Db2 Hosted
Renaming data assets also renames file attachments in projects
19 May 2023
When you change the name of data assets with file attachments that you uploaded into the project, the file attachments are also renamed. However, changing the name of data assets imported from catalogs does not rename any attachments. You must update any references to the data asset in code-based assets, like notebooks, to the new data asset name; otherwise, the code-based asset won't run. For more information, see Managing assets in projects.
Week ending 12 May 2023
New UI capabilities for creating custom assets and managing custom properties for columns
11 May 2023
Catalog collaborators with the Admin or Editor role can now complete the following tasks from the web client:
- Create custom assets from the catalog. To add a custom asset, select Custom asset from the Add to catalog drop-down menu.
- Manage custom properties for data asset columns. To manage custom properties, select a column in the Overview of an asset and edit the properties in the side pane.
To learn more about custom properties for data assets, see Custom asset types, properties, and relationships.
Week ending 5 May 2023
Add generated code from the Code snippets pane
4 May 2023
A new Code snippets icon was added to the notebook toolbar. Clicking the icon opens the Code snippets pane, where you can read data from a file or connection that was added to the project. The existing "Insert to code" logic for generating code that loads data to a notebook cell was moved under Read data. The former Find and load data pane can now be used only to upload data to a project. See Loading and accessing data in a notebook.
Week ending 28 April 2023
Orchestration Pipelines now generally available for automating AI lifecycle activities
27 Apr 2023
Orchestration Pipelines provides a graphical interface for orchestrating an end-to-end flow of assets from creation through deployment. Assemble and configure a pipeline that automates the tasks around curating data, then training, deploying, and updating machine learning models. Run a pipeline job in real time or on a schedule. For details on creating pipelines, see Orchestration Pipelines.
New in this update is the ability to create a custom pipeline component to execute a script you write using a Python function. You can use custom components to share reusable scripts between pipelines. You create custom components as project assets and then use them in pipelines you create in that project. For details, see Creating a custom component.
Orchestration Pipelines is offered as a feature of watsonx.ai Studio (formerly Watson Studio). However, you must have service plans for the assets and processes used in a pipeline. For example, to run a DataStage flow in a pipeline, you must have a DataStage service instance. Orchestration Pipelines consumes resources based on the assets and processes used in the pipeline. If your pipeline trains an AutoAI model, your account is charged for the watsonx.ai Runtime (formerly Watson Machine Learning) capacity unit hours (CUH) used for training the model. Likewise, if a pipeline contains a DataStage flow, the execution of that flow within Orchestration Pipelines is charged to your DataStage plan. Running pipeline components and bash scripts consumes watsonx.ai Studio CUH resources. For details on provisioning service instances and plans, see Services and integrations.
Access more data with the new Presto connection
27 Apr 2023
You can now work with data from Presto data sources. For information, see Presto connection.
Week ending 21 April 2023
Drill down into the details of profiling results (IBM Knowledge Catalog)
20 Apr 2023
You can now access detailed profiling information from within a metadata enrichment or from an asset’s Profile tab in a project or a catalog. For each column, view statistical information about the column data, information about data classes, data types and formats, and the frequency distribution of values in the column. For the statistical information, you can also choose between several types of visualizations. To populate these views for an existing profile, update the profile.
For details, see Column-level profile details.
Week ending 14 April 2023
Default Python and CPLEX versions updated (Decision Optimization)
13 Apr 2023
The default Python for Decision Optimization users is now 3.10 and the default CPLEX version is 22.1. These versions are used by default when you create a new experiment. Python 3.9 is deprecated and will soon be removed. To update your environment, see Configuring Environments. To update existing deployed models, see Model deployment.
Enhancements to data quality rules (IBM Knowledge Catalog)
13 Apr 2023
You can now also run data quality rules on data assets from these data sources:
- Amazon S3 (CSV files only)
- Apache Cassandra
- SAP ASE
When you configure a data quality rule with externally managed bindings, you can now select additional content for output links in the associated DataStage flow. For more information, see Creating rules from data quality definitions.
Week ending 7 April 2023
New: Time Series anomaly detection experiment (Beta)
7 Apr 2023
Use AutoAI to train a time series anomaly prediction model that can detect anomalies, or unexpected results, when the model predicts results based on new data. Model candidate pipelines generated by the experiment are ranked according to how well they perform measured by the optimizing metric. Save a model as a notebook to review the code, or save and deploy a model to detect potential anomalies in new data.
Filter your asset activity in a project
6 Apr 2023
In the Assets pane on the Overview tab of a project, you can filter assets by selecting By you or By all using the dropdown. By you lists assets edited by you, ordered by most recent at the top. By all lists assets edited by others and also by you, ordered by most recent at the top.
Upgrade to Spark with R 4.2 in watsonx.ai Studio (formerly Watson Studio)
3 Apr 2023
Spark R 3.6 environments for watsonx.ai Studio are upgraded to R 4.2. All Spark R 3.6 environments are now deprecated and will be removed on 15 June 2023. Starting on 11 May 2023, you can no longer create new notebooks or new Data Refinery flows with Spark R 3.6. Additionally, you will not be able to create new Spark R 3.6 custom environments. At that time, you might need to update some package versions and scripts for your notebooks. You must update your assets and deployments to use Spark with R 4.2 before 15 June 2023.
See Changing the environment for a notebook. For details on the libraries and packages for R versions, see the CRAN release notes.
New Spark with R 4.2 environment for running Data Refinery flow jobs
3 Apr 2023
You can now select Default Spark 3.3 & R 4.2 when you select an environment for a Data Refinery flow job. The new environment uses the same capacity unit hours (CUHs) as the other Default environments.
The Default Spark 3.2 & R 3.6 environment is deprecated and will be discontinued in a future update. Change your Data Refinery flow jobs to use the new Default Spark 3.3 & R 4.2 environment.
For information about environments for Data Refinery, see Compute resource options for Data Refinery in projects.
The environment change affects two GUI operations. If you have existing Data Refinery flows that include these GUI operations, you must update the Data Refinery flow.
- Split
- Tokenize
To update a flow, open it and save it. For details, see Managing Data Refinery flows.
Week ending 31 March 2023
Create custom assets from a catalog
31 Mar 2023
Admins and editors can now create custom assets inside the Catalog UI. To add a new custom asset, select Custom asset from the Add to catalog dropdown menu. To learn more about custom assets, see Custom asset types, properties, and relationships in Adding assets to a catalog (Watson Knowledge Catalog).
Improvements and enhancements in Watson Query
29 Mar 2023
Watson Query has been updated to provide the following capabilities:
- With asynchronous virtualization, you can view the status details of a virtualization job any time on the Virtualized data page. If the virtualized tables are large and the job takes longer, you can work on other tasks, such as virtualizing more tables, while the job finishes.
- With asynchronous publishing and assignments on the Virtualized data page, you can work on other tasks while the publishing and assignment jobs finish.
- You can use jobs in the web client to collect statistics on virtualized tables. For more information, see Collecting statistics in the web client in Watson Query.
- You can view the publishing or assignment history of an object on the Virtualized data page. Click an object row in the list to view its publishing and assignment history in the right side panel of the Virtualized data page.
Week ending 24 March 2023
Federated Learning runs on Mac computers with M-series chips
23 Mar 2023
Run your Federated Learning experiments on M1 Mac and M2 Mac computers in the latest runtime. For requirements, see Set up your system.
Week ending 17 March 2023
Define composite keys in reference data sets (IBM Knowledge Catalog)
17 Mar 2023
You can now specify multiple columns to create a composite key for your reference data sets. Without a composite key, reference data values in a set are identified by a unique string in the code column. A composite key is a combination of the code column and up to 5 custom columns in a reference data set. A composite key is used to uniquely identify each reference data value. With a composite key, the values in the code column no longer need to be unique. Uniqueness is guaranteed only when the values of all the specified columns are combined. For details, see Designing reference data sets.
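The uniqueness rule can be illustrated with a minimal sketch in plain Python. It does not use IBM Knowledge Catalog or its API. The sample rows, the column names, and the find_duplicate_keys helper are hypothetical, and the sketch assumes that the first key column is the code column.

```python
from collections import Counter

# Limit taken from the text above: the code column plus up to 5 custom columns.
MAX_CUSTOM_COLUMNS = 5


def find_duplicate_keys(rows, key_columns):
    """Return the key tuples that occur more than once in rows.

    Assumption for this sketch: key_columns[0] is the code column and the
    remaining entries are custom columns.
    """
    if len(key_columns) - 1 > MAX_CUSTOM_COLUMNS:
        raise ValueError("a composite key allows at most 5 custom columns")
    counts = Counter(tuple(row[col] for col in key_columns) for row in rows)
    return [key for key, count in counts.items() if count > 1]


# Hypothetical reference data in which the code column alone is not unique.
rows = [
    {"code": "01", "country": "US", "description": "Alabama"},
    {"code": "01", "country": "DE", "description": "Schleswig-Holstein"},
    {"code": "02", "country": "US", "description": "Alaska"},
]

duplicates_by_code = find_duplicate_keys(rows, ["code"])
duplicates_by_composite = find_duplicate_keys(rows, ["code", "country"])
```

In this sample, the code value 01 appears twice, so the code column alone does not identify a value. The combination of code and country does.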
Week ending 10 March 2023
Create queries, reports, or dashboards based on custom relationships (IBM Knowledge Catalog)
9 Mar 2023
When you create custom relationships between assets and governance artifacts, you can sync them to IBM Knowledge Catalog Reporting Data Mart, so that you can create reports. For example, you can use the custom relationships reporting to:
- Obtain quality analytics at various levels of granularity (by domain, by metadata, by user, by team)
- Certify the data quality of your data
- Count the number of assets that have a specific privacy property
To learn how to create custom relationships, see Custom properties and relationships for governance artifacts and catalog assets (IBM Knowledge Catalog).
To learn how to create reports, see Setting up reporting for IBM Knowledge Catalog.
Runtime 22.1 on Python 3.9 deprecation for watsonx.ai Studio (formerly Watson Studio) and watsonx.ai Runtime (formerly Watson Machine Learning)
9 Mar 2023
IBM Runtime 22.1 on Python 3.9 is now deprecated and will be removed on June 15, 2023. Starting on May 11, 2023, you can no longer create new notebooks or custom environments that use the 22.1 runtimes. You also cannot train new models with Python 3.9 software specifications. Update your assets and deployments to use IBM Runtime 22.2 on Python 3.10 before June 15, 2023:
- For details on migrating an asset to a supported framework and software specification, see Managing frameworks and software specifications.
- For details on notebook environments, see Compute resource options for the notebook editor in projects.
- For information on changing your environments, see Changing the environment of a notebook.
Run data quality rules on additional data sources (IBM Knowledge Catalog)
9 Mar 2023
You can now run data quality rules on data assets from these data sources:
- IBM Data Virtualization
- Microsoft Azure Data Lake Storage
- Snowflake
New option for binding variables in data quality rules (IBM Knowledge Catalog)
9 Mar 2023
You can now also use job parameters to bind rule variables to data columns and manage those parameters centrally in a project. Thus, you don’t need to update the rules when, for example, you want to change the binding to a different column. See Creating rules from data quality definitions.
Week ending 3 March 2023
Enhancements for AI Factsheets (watsonx.ai Runtime, formerly Watson Machine Learning)
3 March 2023
You can now attach files and images to a factsheet. For details, see Customizing details for a factsheet. Factsheets also display additional Watson OpenScale metrics from explainability and custom monitors. For details, see Viewing factsheets.
Create, store, and share machine learning features (Beta) (watsonx.ai Studio, formerly Watson Studio)
2 March 2023
You can now speed the development of machine learning models by creating and sharing features. You add a feature group to a data asset in a project to identify the features of that data set. You can share the features with your organization by publishing the data asset to a catalog, which acts as a feature store. See Managing feature groups.
Week ending 24 February 2023
Manage custom relationships (IBM Knowledge Catalog)
24 February 2023
Now, you can manage custom relationships between catalog assets and governance artifacts in the Overview page of an asset.
To learn how to create custom relationships, see Custom properties and relationships for governance artifacts and catalog assets (IBM Knowledge Catalog).
Week ending 17 February 2023
Data Refinery Calculate operation works on Date columns
17 Feb 2023
You can now use the Calculate operation on Date data type columns to add or subtract day or month values.
For information about GUI operations, see GUI operations in Data Refinery.
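As a rough illustration of this kind of date arithmetic, the following sketch adds or subtracts days and months with the Python standard library. It is not Data Refinery code. The add_days and add_months helpers are hypothetical, and clamping to the last day of a shorter month is an assumption of this sketch, not documented behavior of the Calculate operation.

```python
import calendar
from datetime import date, timedelta


def add_days(d, days):
    """Shift a date by a number of days. A negative value subtracts."""
    return d + timedelta(days=days)


def add_months(d, months):
    """Shift a date by a number of months. A negative value subtracts.

    Assumption for this sketch: if the target month is shorter, the day is
    clamped to the last day of that month.
    """
    month_index = d.year * 12 + (d.month - 1) + months
    year, zero_based_month = divmod(month_index, 12)
    month = zero_based_month + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


two_weeks_later = add_days(date(2023, 2, 17), 14)
three_months_earlier = add_months(date(2023, 2, 17), -3)
end_of_month_case = add_months(date(2023, 1, 31), 1)
```

Month arithmetic is the part that needs a policy decision, because months differ in length. Check the result for end-of-month dates when you apply the operation to your own data.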
New library to access project assets in watsonx.ai Studio (formerly Watson Studio)
17 Feb 2023
The ibm-watson-studio-lib library contains a set of functions that help you to interact with watsonx.ai Studio projects and project assets. The library can be used in notebooks that are created in the notebook editor and is available for Python and R. It is the successor of the project_lib library. For details, see Using ibm-watson-studio-lib.
"Default Spark 3.2 & R 3.6" environment discontinued (Data Refinery)
17 Feb 2023
The Default Spark 3.2 & R 3.6 environment is no longer available as of February 17, 2023.
If you have any Data Refinery flow jobs set up with the Default Spark 3.2 & R 3.6 environment or a custom environment that uses Spark 3.0, the jobs will fail. Change the environment to Default Spark 3.3 & R 3.6 or Default Data Refinery XS or a custom environment that does not use Spark 3.0.
For information about environments for Data Refinery, see Compute resource options for Data Refinery in projects.
New features for data quality rules (IBM Knowledge Catalog)
16 Feb 2023
These new capabilities are available:
- Use more than one data quality definition in a single data quality rule. In addition, you can include an individual definition more than once to apply the same definition to different columns. For details, see Creating rules from data quality definitions.
- Download rule output as CSV file. If an output table is defined for the rule, you can now also download the rule output as a CSV file from the rule's run history, for example, for use in a spreadsheet program.
- Run rules on data from Amazon Redshift and Greenplum data sources. See Supported data sources for metadata import, metadata enrichment, and data quality rules.
- Export and import data quality assets. When you export project assets to desktop, you can now include data quality assets. See Exporting a project.
Week ending 10 February 2023
Import assets from a project or space into an existing space (watsonx.ai Runtime, formerly Watson Machine Learning)
9 Feb 2023
You can now import a deployment space or a project (in .zip format) into an existing deployment space. You can add assets to the space or update existing assets. For example, you can replace a model with a newer version. For details, see Importing spaces and projects into existing spaces.
Use more macros in DataStage
10 Feb 2023
You can add the DSJobController macro to stage properties or in the transformer functions.
The macro acts as a DataStage function and outputs data without the need for arguments, simplifying the setup of DataStage jobs and flows.
For more information, see Macros.
Week ending 3 February 2023
Use more macros in DataStage
6 Feb 2023
You can add the following macros to stage properties or in the transformer functions:
- DSProjectId
- DSJobRunId
- DSJobId
The macros act as DataStage functions and output data without the need for arguments, simplifying the setup of DataStage jobs and flows.
For more information, see Macros.
Week ending 20 January 2023
Edit input columns in DataStage stages
20 Jan 2023
You can now edit columns through the input tab of a stage in DataStage. Your changes are propagated to the previous stage in the flow.
New options for metadata import (IBM Knowledge Catalog)
19 Jan 2023
To ensure that the target project or catalog of your metadata import doesn't contain stale data, you can now configure the import to clean up data assets that can't be reimported. When the metadata import is rerun, you can choose to delete from the import target the assets that are no longer available in the data source, the assets that were removed from the import scope, or both. See Importing metadata.
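The two cleanup conditions can be pictured as set differences. The following is a minimal sketch in plain Python, not the metadata import implementation. The assets_to_delete helper, its parameter names, and the sample asset names are hypothetical.

```python
def assets_to_delete(target, source, scope,
                     delete_missing=True, delete_out_of_scope=True):
    """Return the assets in the import target that a rerun would clean up.

    target: asset names currently in the import target (project or catalog)
    source: asset names that still exist in the data source
    scope:  asset names that are covered by the current import scope
    """
    stale = set()
    if delete_missing:
        # Assets that are no longer available in the data source.
        stale |= target - source
    if delete_out_of_scope:
        # Assets that were removed from the import scope.
        stale |= target - scope
    return stale


# Hypothetical asset names.
target = {"SALES.ORDERS", "SALES.CUSTOMERS", "HR.EMPLOYEES"}
source = {"SALES.ORDERS", "HR.EMPLOYEES", "HR.PAYROLL"}
scope = {"SALES.ORDERS", "SALES.CUSTOMERS"}

missing_only = assets_to_delete(target, source, scope, delete_out_of_scope=False)
out_of_scope_only = assets_to_delete(target, source, scope, delete_missing=False)
both = assets_to_delete(target, source, scope)
```

In this sample, SALES.CUSTOMERS no longer exists in the data source and HR.EMPLOYEES is outside the import scope, so selecting both options removes both assets from the target.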
Export data from Decision Optimization experiments to your project
18 Jan 2023
You can now export tables to your project from either the Prepare data or Explore solution view in your Decision Optimization experiment. This enables you to reuse your data in other models or services. You can also export data using the Decision Optimization Python client.
See Exporting data from Decision Optimization experiments.
Week ending 13 January 2023
Updated Data fabric use cases
12 Jan 2023
The Data fabric use cases are updated to better reflect how you use our products:
- Data integration: This use case now includes Pipelines.
- Data governance: This use case now includes Match 360.
- AI governance: This use case now focuses on monitoring, maintaining, automating, and governing AI models in production.
- Data Science and MLOps: This new use case explains how to operationalize data analysis and model creation.
Customize the web browser to support your brand
12 Jan 2023
As an administrator, you can add custom product names, logos, and other graphics to customize the branding of the web browser for Cloud Pak for Data as a Service.
Week ending 6 January 2023
Connect to more data sources in DataStage
6 Jan 2023
You can now include data from these data sources in your DataStage flows:
- Dremio
- SingleStoreDB
For the full list of DataStage connectors, see DataStage connectors.