Choosing compute resources for running tools in projects
You use compute resources in projects when you run jobs and most tools. Depending on the tool, you might be able to choose which compute resources the tool's runtime uses.
Compute resources are known as either environment templates or hardware and software specifications. In general, compute resources with larger hardware configurations incur larger usage costs. Many tools in projects use the Watson Studio service for compute resources, but some tools use other services. Each service tracks and bills compute usage separately.
You can choose from multiple runtime configurations for these tools:
- Notebook editor
- Data Refinery
- SPSS Modeler
- DataStage flow editor
- AutoAI
- Decision Optimization experiment
- RStudio IDE
These tools have one runtime configuration that is assigned automatically:
The following tools do not consume compute resources:
- Metadata import
- Master Data configuration
Profiling data assets
Profiling a data asset in a project or a catalog consumes 6 CUH per hour from the IBM Knowledge Catalog service, with a minimum of 0.96 CUH per profiling session. Profiling requires the IBM Knowledge Catalog service.
The runtime for profiling does not appear on the Resource usage page of the Manage tab of the project. You can't track compute usage for profiling.
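Because profiling usage is not shown on the Resource usage page, a rough estimate can help with planning. The following is a minimal sketch that applies the rate and the session minimum stated above. It assumes that usage scales linearly with the session duration, which this topic does not confirm, and the function name `profiling_cuh` is hypothetical.

```python
# Figures taken from this topic.
PROFILING_RATE_CUH_PER_HOUR = 6.0
PROFILING_MIN_CUH_PER_SESSION = 0.96


def profiling_cuh(session_hours: float) -> float:
    """Estimate the CUH for one profiling session.

    Assumption: usage grows linearly with duration and the
    per-session minimum is applied as a floor.
    """
    if session_hours < 0:
        raise ValueError("session_hours must not be negative")
    usage = PROFILING_RATE_CUH_PER_HOUR * session_hours
    return max(usage, PROFILING_MIN_CUH_PER_SESSION)


if __name__ == "__main__":
    for hours in (0.05, 0.16, 0.5):
        print(f"{hours:.2f} h -> {profiling_cuh(hours):.2f} CUH")
```

At 6 CUH per hour, the 0.96 CUH minimum corresponds to 0.16 hours (9.6 minutes), so under this assumption any shorter session is billed the minimum.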
Metadata enrichment
Metadata enrichment requires the IBM Knowledge Catalog service. The CUH rate that metadata enrichment jobs consume from IBM Knowledge Catalog depends on the enrichment objectives that you select.
| Metadata enrichment objectives | Capacity units per hour (CUH) |
|---|---|
| Profile data | 6 |
| Profile data and assign terms | 8 |
When you run metadata enrichment, one or more jobs are started. Each job handles a maximum of 200 tables. When you enrich more than 200 tables at a time, you start multiple jobs. For example, if you run metadata enrichment on 500 tables, you start three jobs. The minimum amount of CUH that is billed for each metadata enrichment job is 0.96 CUH.
Jobs for metadata enrichment with the Expand metadata option or semantic term assignment are limited to 10 tables per job.
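The job limits above can be sketched as a small calculation. This example is illustrative only: the function names are hypothetical, and it assumes that the 0.96 CUH minimum also applies to jobs that are limited to 10 tables, which this topic implies but does not state explicitly.

```python
import math

# Figures taken from this topic.
DEFAULT_TABLES_PER_JOB = 200
LIMITED_TABLES_PER_JOB = 10  # Expand metadata or semantic term assignment
MIN_CUH_PER_JOB = 0.96


def enrichment_job_count(tables: int, limited: bool = False) -> int:
    """Return how many jobs are started for the given number of tables."""
    if tables < 0:
        raise ValueError("tables must not be negative")
    per_job = LIMITED_TABLES_PER_JOB if limited else DEFAULT_TABLES_PER_JOB
    return math.ceil(tables / per_job)


def minimum_billed_cuh(tables: int, limited: bool = False) -> float:
    """Return the lowest possible bill: every job is charged the minimum.

    Assumption: the per-job minimum applies to limited jobs as well.
    """
    return enrichment_job_count(tables, limited) * MIN_CUH_PER_JOB


if __name__ == "__main__":
    print(enrichment_job_count(500))                # 3 jobs, as in the text
    print(enrichment_job_count(500, limited=True))  # 50 jobs
    print(round(minimum_billed_cuh(500), 2))        # 2.88
```

The result of `minimum_billed_cuh` is only a lower bound. Actual consumption depends on how long each job runs.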
The amount of CUH that metadata enrichment consumes depends on the number of tables and the number of columns in those tables. Other factors, such as the structure of the data, can also affect the amount of consumed CUH. For example:
- The three jobs for profiling data for 500 tables with 500 columns might consume a total of approximately 24 CUH.
- The three jobs for profiling data and assigning terms for 500 tables with 500 columns might consume a total of approximately 30 CUH.
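To see how totals like these can arise from the hourly rates, the following sketch multiplies each job's run time by the rate for the selected objective and applies the per-job minimum. The job durations in the example are hypothetical values chosen to reproduce the approximate totals above; this topic does not state actual run times. The function name `enrichment_cuh` is also hypothetical, and linear billing is an assumption.

```python
from typing import Sequence

# Rates and minimum taken from this topic.
ENRICHMENT_RATES_CUH_PER_HOUR = {
    "Profile data": 6.0,
    "Profile data and assign terms": 8.0,
}
MIN_CUH_PER_JOB = 0.96


def enrichment_cuh(objective: str, job_hours: Sequence[float]) -> float:
    """Estimate total CUH for a set of metadata enrichment jobs.

    Assumption: each job is billed linearly at the objective's rate,
    with the per-job minimum applied as a floor.
    """
    rate = ENRICHMENT_RATES_CUH_PER_HOUR[objective]
    return sum(max(rate * hours, MIN_CUH_PER_JOB) for hours in job_hours)


if __name__ == "__main__":
    # Hypothetical durations: three jobs totaling 4 hours and 3.75 hours.
    print(enrichment_cuh("Profile data", [1.5, 1.5, 1.0]))
    print(enrichment_cuh("Profile data and assign terms", [1.25, 1.25, 1.25]))
```

Under these assumptions, approximately 24 CUH at 6 CUH per hour implies about 4 hours of combined job run time, and approximately 30 CUH at 8 CUH per hour implies about 3.75 hours.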
The runtime for metadata enrichment does not appear on the Resource usage page of the Manage tab of the project. You can't track compute usage for metadata enrichment.
Data quality rules
A data quality rule job runs as a DataStage flow with the Default DataStage PX S environment, which consumes 1 CUH per hour and is billed for a minimum of 1 minute. Data quality rules require the IBM Knowledge Catalog and DataStage services.
The runtime for data quality rules appears as a DataStage flow on the Resource usage page of the Manage tab of the project.
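As a rough illustration of the data quality rule rate, the following sketch converts a run time in minutes to CUH. It interprets the 1-minute minimum as a floor on the billed duration, which is an assumption, and the function name `data_quality_rule_cuh` is hypothetical.

```python
# Figures taken from this topic.
DQ_RATE_CUH_PER_HOUR = 1.0
DQ_MIN_BILLED_MINUTES = 1.0


def data_quality_rule_cuh(run_minutes: float) -> float:
    """Estimate CUH for one data quality rule job.

    Assumption: runs shorter than the minimum are billed as one minute.
    """
    if run_minutes < 0:
        raise ValueError("run_minutes must not be negative")
    billed_minutes = max(run_minutes, DQ_MIN_BILLED_MINUTES)
    return DQ_RATE_CUH_PER_HOUR * billed_minutes / 60.0


if __name__ == "__main__":
    for minutes in (0.5, 1, 30):
        print(f"{minutes} min -> {data_quality_rule_cuh(minutes):.4f} CUH")
```

Under this interpretation, the smallest charge for a single rule job is 1/60 CUH, or about 0.017 CUH.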