Glossary

A | B | C | D | E | F | G | H | I | J | M | N | O | P | Q | R | S | T | U | V

A

active metadata
Metadata that is automatically updated based on analysis by machine learning processes. For example, profiling and data quality analysis automatically update metadata for data assets.

active runtime
An instance of an environment that is running to provide compute resources to analytical assets.

algorithm
A formula applied to data to determine optimal ways to solve analytical problems.

asset
An item that contains information about data, other valuable information, or code that works with data.
See also: data asset

AutoAI experiment
An automated training process that considers a series of training definitions and parameters to create a set of ranked pipelines as model candidates.

B

business term
A word or phrase that defines a business concept in a standard way for an enterprise. Terms can be used to enrich the metadata of data assets and to define the criteria of data protection rules.

C

catalog
A repository of assets for an organization to share.
In Watson Knowledge Catalog, assets in catalogs can be governed by data protection rules and enriched by other governance artifacts, such as classifications, data classes, and business terms. Catalogs can store structured and unstructured data, references to data in external data sources, and other types of assets, like machine learning models.

category
In Watson Knowledge Catalog, a collaborative workspace for organizing and managing governance artifacts.

classification
In Watson Knowledge Catalog, a governance artifact that describes the sensitivity level of the data in a data asset.

cleanse
To ensure that all values in a data set are consistent and correctly recorded.

compute resources
The hardware and software resources that are defined by an environment template to run assets in tools.

confusion matrix
A table that provides a detailed numeric breakdown of annotated document sets. The table is used to compare the annotations that were added by a machine learning model to the annotations in the vetted data. The table reports the number of false positives, false negatives, true positives, and true negatives.
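
The four counts in a confusion matrix can be illustrated with a minimal sketch for the binary case. The function name `confusion_counts` and the sample lists are hypothetical, and the labels are simplified to booleans rather than document annotations.

```python
from collections import Counter


def confusion_counts(actual, predicted, positive=True):
    """Count true/false positives and negatives for a binary label."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    counts = Counter()
    for a, p in zip(actual, predicted):
        if p == positive:
            # The model said "positive": correct only if the vetted label agrees.
            counts["TP" if a == positive else "FP"] += 1
        else:
            # The model said "negative": a miss if the vetted label is positive.
            counts["FN" if a == positive else "TN"] += 1
    return {key: counts[key] for key in ("TP", "FP", "FN", "TN")}


# Hypothetical vetted annotations versus model annotations.
vetted = [True, True, False, False, True, False]
model = [True, False, False, True, True, False]
result = confusion_counts(vetted, model)
print(result)
```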

connected data asset
A pointer to data that is accessed through a connection to an external data source.

connection
The information required to connect to a database. The actual information that is required varies according to the DBMS and connection method.

connection asset
An asset that contains information that enables connecting to a data source.

connected folder asset
A pointer to a folder in IBM Cloud Object Storage.

constraint
In Decision Optimization, a condition that must be satisfied by the solution of a problem.
In databases, a relationship between tables.

continuous learning
The automated tasks of monitoring model performance, retraining with new data, and redeploying to ensure prediction quality.

Core ML deployment
The process of downloading a deployment in Core ML format for use in iOS apps.

curate
To create a data asset and prepare it to be published in a catalog. Curation can include enriching the data asset by assigning governance artifacts such as business terms, classification, and data classes, and analyzing the quality of the data in the data asset.

D

data asset
An asset that points to data, for example, to an uploaded file. Connections and connected data assets are also considered data assets.
See also: asset

data class
A governance artifact that categorizes columns in relational data sets according to the type of the data and how the data is used.

data integration
The combination of technical and business processes that are used to merge data from disparate sources into meaningful and valuable information.

data mining
The process of collecting critical business information from a data source, correlating the information, and uncovering associations, patterns, and trends.
See also: predictive analytics

data protection rule
A governance artifact that specifies what data to control and how to control it. A data protection rule contains criteria and an action.

data quality analysis
The analysis of data against the quality dimensions accuracy, completeness, consistency, timeliness, uniqueness, and validity.
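
As a rough illustration of two of these dimensions, the following sketch scores a single column for completeness and uniqueness. The function names, the sample column, and the choice to treat `None` and empty strings as missing are assumptions made for this example, not the product's scoring method.

```python
def _present(values):
    """Return the values that are neither None nor an empty string."""
    return [v for v in values if v is not None and v != ""]


def completeness(values):
    """Fraction of values in the column that are present."""
    if not values:
        return 0.0
    return len(_present(values)) / len(values)


def uniqueness(values):
    """Fraction of the present values that are distinct."""
    present = _present(values)
    if not present:
        return 0.0
    return len(set(present)) / len(present)


# Hypothetical column with one missing value, one empty value, and one duplicate.
emails = ["a@example.com", "b@example.com", None, "a@example.com", ""]
print(completeness(emails), uniqueness(emails))
```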

data quality definition
A data quality definition describes a rule evaluation or condition for data quality rules.

data quality rule
During data quality analysis, a data quality rule assesses data for whether specific conditions are met and identifies records that do not meet the conditions as rule violations.

Data Refinery flow
A set of steps that cleanse and shape data to produce a new data asset.

data science
The analysis and visualization of structured and unstructured data to discover insights and knowledge.

data set
A collection of data, usually in the form of rows (records) and columns (fields) and contained in a file or database table.

data source
A repository, queue, or feed for reading data, such as a Db2 database.

DataStage flow
An ordered set of steps to extract, transform, and load data (ETL).

data table
A collection of data, usually in the form of rows (records) and columns (fields) and contained in a table.

deployment
A model or application package that is available for use.

deployment space
A workspace where models are deployed and deployments are managed.

DOcplex
A Python API for modeling and solving Decision Optimization problems.

E

endpoint URL
A network destination address that identifies resources, such as services and objects. For example, an endpoint URL is used to identify the location of a model or function deployment when a user sends payload data to the deployment.

environment
The compute resources for running jobs.

environment template
A definition that specifies hardware and software resources to instantiate environment runtimes.

environment runtime
An instantiation of the environment template to run analytical assets.

experiment
A model training process that considers a series of training definitions and parameters to determine the most accurate model configuration.
See also: AutoAI experiment

F

feature group
A set of columns of a particular data asset along with the metadata that is used for machine learning.

feature selection
Identifying the columns of data that best support an accurate prediction or score.

feature transformation
In AutoAI, a phase of pipeline creation that applies algorithms to transform and optimize the training data to achieve the best outcome for the model type.

federated learning
The training of a common machine learning model that uses multiple data sources that are not moved, joined, or shared. The result is a better-trained model without compromising data security.

flow
A collection of nodes that define a set of steps for processing data or training a model.

G

Gantt chart
A graphical representation of a project timeline and duration in which schedule data is displayed as horizontal bars along a time scale.

governance artifact
An item that enriches or controls data assets. Governance artifacts include business terms, classifications, data classes, policies, rules, and reference data sets.

governance rule
A governance artifact that provides a natural-language description of the criteria that are used to determine whether data assets are compliant with business objectives.

governance workflow
A task-based process to control the creating, modifying, and deleting of governance artifacts.

governed catalog
A catalog in which data protection rules are enforced.

graphical builder
A tool for creating analytical assets visually rather than by writing code. The builder provides a canvas, an area on which to place objects or nodes that can be connected to create a flow.

H

hyperparameter
In machine learning, a parameter whose value is set before training as a way to increase model accuracy.

hyperparameter optimization (HPO)
The process for setting hyperparameter values to the settings that provide the most accurate model.
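
One simple way to picture this process is an exhaustive grid search. In the sketch below, `validation_accuracy` is a hypothetical stand-in for training and evaluating a real model, and the hyperparameter names and candidate values are assumptions chosen for illustration.

```python
from itertools import product


def validation_accuracy(learning_rate, depth):
    """Stand-in for training a model and scoring it on held-out data."""
    # Hypothetical response surface that peaks at learning_rate=0.1, depth=4.
    return 1.0 - abs(learning_rate - 0.1) - 0.05 * abs(depth - 4)


def grid_search(grid, score_fn):
    """Try every combination in the grid and keep the best-scoring one."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for combo in product(*(grid[name] for name in names)):
        params = dict(zip(names, combo))
        score = score_fn(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


grid = {"learning_rate": [0.01, 0.1, 0.5], "depth": [2, 4, 8]}
best_params, best_score = grid_search(grid, validation_accuracy)
print(best_params, best_score)
```

Grid search is only one strategy. Its cost grows with the product of the candidate list sizes, so larger search spaces usually call for random or guided search instead.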

I

image
A software package that contains a set of libraries.

ingest
To continuously add a high volume of real-time data to a database.

J

job
A separately executable unit of work.

Jupyter notebook
See: notebook

M

machine learning framework
The libraries and runtime for training and deploying a model.

mask
To obfuscate, substitute, or redact data in a column, as defined by data protection rules.

masking flow
A flow that produces permanently masked copies of data.

master data
1. In Match 360, a consolidated view of data from disparate sources.
2. For model training, reference data that remains the same for several jobs on the same model but that can be changed, if necessary.

metadata import
A method of importing metadata that is associated with data assets, including process metadata that describes the lineage of data assets and technical metadata that describes the structure of data assets.

model
1. In a machine learning context, a set of functions and algorithms that are trained and tested on a data set to provide predictions or decisions.
2. In Decision Optimization, a mathematical formulation of a problem to be solved.

model use case
An entry in the Model inventory that tracks the lifecycle of a model from request to production.

ModelOps
A methodology for managing the full lifecycle of an AI model, including training, deployment, scoring, evaluation, retraining, and updating.

MLOps
A methodology that takes a machine learning model from development to production.
See also: ModelOps

N

natural language processing library
A library that provides basic natural language processing functions for syntax analysis and out-of-the-box pre-trained models for a wide variety of text processing tasks.

neural network
A mathematical model for predicting or classifying cases by using a complex mathematical scheme that simulates an abstract version of brain cells. A neural network is trained by presenting it with a large number of observed cases, one at a time, and allowing it to update itself repeatedly until it learns the task.

node
In an SPSS Modeler flow, the graphical representation of a data operation.

notebook
An interactive document that contains executable code, descriptive text for that code, and the results of any code that is run.

notebook kernel
The part of the notebook editor that executes code and returns the computational results.

O

obfuscate
To replace data in a column with similarly formatted values that match the original format. A form of masking.

object storage
A method of storing data, typically used in the cloud, in which data is stored as discrete units, or objects, in a storage pool or repository that does not use a file hierarchy but that stores all objects at the same level.

online deployment
A method of accessing a model or Python code deployment through an API endpoint as a web service to generate predictions online, in real time.

OPL
See: Optimization Programming Language

operational asset
An asset that runs code in a tool or a job.

optimization
The process of finding the most appropriate solution to a precisely defined problem while respecting the imposed constraints and limitations. For example, determining how to allocate resources or how to find the best elements or combinations from a large set of alternatives.
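
The resource allocation case can be sketched as a tiny brute-force search. The project names, costs, values, and budget below are hypothetical. A real Decision Optimization model would be passed to a solver rather than enumerated this way.

```python
from itertools import combinations

# Hypothetical projects: name -> (cost, value).
projects = {"A": (4, 5), "B": (3, 4), "C": (2, 3), "D": (5, 7)}
budget = 7  # Constraint: the total cost of the selection must not exceed this.


def best_selection(projects, budget):
    """Enumerate every subset and keep the most valuable one within budget."""
    best, best_value = (), 0
    names = sorted(projects)
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            cost = sum(projects[name][0] for name in subset)
            value = sum(projects[name][1] for name in subset)
            if cost <= budget and value > best_value:
                best, best_value = subset, value
    return best, best_value


selection, total_value = best_selection(projects, budget)
print(selection, total_value)
```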

Optimization Programming Language (OPL)
A modeling language for expressing model formulations of optimization problems in a format that can be solved by optimization engines such as IBM CPLEX.
See also: model formulation

orchestration
The process of creating an end-to-end flow that can train, run, deploy, test, and evaluate a machine learning model.

P

party
In Federated Learning, an entity that contributes data for training a common model. The data is not moved or combined but each party gets the benefit of the federated training.

payload
The data that is passed to a deployment to get back a score, prediction, or solution.

pipeline
1. In Watson Pipelines, an end-to-end flow of assets from creation through deployment.
2. In AutoAI, a candidate model.

pipeline leaderboard
In AutoAI, a table that shows the list of automatically generated candidate models, as pipelines, ranked according to the specified criteria.

policy
A governance artifact that consists of one or more data protection rules.

predictive analytics
A business process and a set of related technologies that are concerned with the prediction of future possibilities and trends. Predictive analytics applies such diverse disciplines as probability, statistics, machine learning, and artificial intelligence to business problems to find the best action for a specific situation.
See also: data mining

primary category
In Watson Knowledge Catalog, the category that contains the governance artifact. A category is similar to a folder or directory that organizes a user's governance artifacts.
See also: secondary category

profile
The generated metadata and statistics about the content and format of data.

project
A collaborative workspace for working with data and other assets.

publish
To copy an asset into a catalog.

Python
A programming language that is used in data science and AI.

Python function
A function that contains Python code to support a model in production.

Q

quality analysis
See: data quality analysis

R

R
An extensible scripting language that is used in data science and AI that offers a wide variety of analytic, statistical, and graphical functions and techniques.

redact
To replace data values in a column with a string of one repeated character. Redaction hides the sensitive values and their format, and does not preserve referential integrity.
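
A minimal sketch of redaction follows. The function name, the replacement character `X`, and the fixed length of 10 are assumptions for illustration. A fixed length is used so that the output reveals neither the original format nor which rows held equal values.

```python
def redact(values, char="X", length=10):
    """Replace every value with the same string of one repeated character."""
    if len(char) != 1:
        raise ValueError("char must be a single character")
    return [char * length for _ in values]


# Hypothetical column that mixes a phone number and an email address.
column = ["555-1234", "jane@example.com"]
redacted = redact(column)
print(redacted)
```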

reference data set
A governance artifact that defines values for specific types of columns.

refine
To cleanse and shape data.

rule
In Watson Knowledge Catalog, a governance artifact that contains information, criteria, or logic to analyze or protect data. Some rules are enforced and some are informational.
See also: data protection rule, data quality rule, governance rule

runtime environment
The predefined or custom hardware and software configuration that is used to run tools or jobs, such as notebooks.

S

scoring
1. The process of computing how closely the attributes for an incoming identity match the attributes of an existing entity.
2. In machine learning, the process of measuring the confidence of a predicted outcome.

script
A file that contains Python or R scripts to support a model in production.

secondary category
An optional category that references the governance artifact.
See also: primary category

sensitive data
Data that contains information that should not be visible to all users. For example, personally identifiable information or other information that is restricted by privacy regulations.

shape
To customize data by filtering, sorting, or removing columns; joining tables; and performing operations that include calculations, data groupings, hierarchies, and more.

SQL pushback
In SPSS Modeler, the process of performing many data preparation and mining operations directly in the database through SQL code.

structured data
Data that resides in fixed fields within a record or file. Relational databases and spreadsheets are examples of structured data.

substitute
In Watson Knowledge Catalog, to replace data in a column with values that don't match the original format but retain referential integrity.
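
One way to picture substitution is a deterministic hash: equal inputs always map to the same replacement, so joins across tables still line up, while the replacement no longer resembles the original format. This is a sketch under stated assumptions, not the method that Watson Knowledge Catalog uses. The function name, the salt, and the 12-character truncation are hypothetical.

```python
import hashlib


def substitute(value, salt="demo-salt"):
    """Map a value to a fixed-length token that is stable for equal inputs."""
    digest = hashlib.sha256((salt + str(value)).encode("utf-8")).hexdigest()
    # Truncation is for readability only; shorter tokens raise collision risk.
    return digest[:12]


# Hypothetical customer IDs; the first and third are the same customer.
customers = ["C-1001", "C-1002", "C-1001"]
masked = [substitute(value) for value in customers]
print(masked)
```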

T

text classification
A type of model that automatically identifies and classifies text into specified categories.

time series
A set of values of a variable at periodic points in time.

trained model
A model that is ready to be deployed.

training
The initial stage of model building, involving a subset of the source data. The model can then be tested against a different subset for which the outcome is already known.

U

unstructured data
Any data that is stored in an unstructured format rather than in fixed fields. Data in a word processing document is an example of unstructured data.

unsupervised learning
A machine learning approach in which raw, unlabeled data is used to train a system with little to no human effort.

V

virtualize data
To create a virtual table to segment or combine data from one or more tables.

visualization
A graph, chart, plot, table, map, or any other visual representation of data.
