Metadata that is automatically updated based on analysis by machine learning processes. For example, profiling and data quality analysis automatically update metadata for data assets.
An instance of an environment that is running to provide compute resources to analytical assets.
A formula applied to data to determine optimal ways to solve analytical problems.
An item that contains information about data, other valuable information, or code that works with data.
See also: data asset
An automated training process that considers a series of training definitions and parameters to create a set of ranked pipelines as model candidates.
A word or phrase that defines a business concept in a standard way for an enterprise. Terms can be used to enrich the metadata of data assets and to define the criteria of data protection rules.
A repository of assets for an organization to share.
In Watson Knowledge Catalog, assets in catalogs can be governed by data protection rules and enriched by other governance artifacts, such as classifications, data classes, and business terms. Catalogs can store structured and unstructured data, references to data in external data sources, and other types of assets, like machine learning models.
In Watson Knowledge Catalog, a collaborative workspace for organizing and managing governance artifacts.
In Watson Knowledge Catalog, a governance artifact that describes the sensitivity level of the data in a data asset.
To ensure that all values in a data set are consistent and correctly recorded.
The hardware and software resources that are defined by an environment template to run assets in tools.
A table that provides a detailed numeric breakdown of annotated document sets. The table is used to compare the annotations that were added by a machine learning model to the annotations in the vetted data. The table reports the number of false positives, false negatives, true positives, and true negatives.
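Such a table can be sketched in a few lines. The example below is a hypothetical illustration (the function name, span format, and data are invented): it compares a model's annotation spans against vetted human annotations and tallies the agreement counts. True negatives are omitted because counting them requires the full set of candidate spans, which this sketch does not model.

```python
# Hypothetical sketch: compare model annotations with vetted (human)
# annotations for one entity type and tally the counts such a table reports.

def tally(model_spans, vetted_spans):
    """Count true positives, false positives, and false negatives
    for exact (start, end) span matches."""
    model = set(model_spans)
    vetted = set(vetted_spans)
    return {
        "true_pos": len(model & vetted),   # model and human agree
        "false_pos": len(model - vetted),  # model annotated, human did not
        "false_neg": len(vetted - model),  # human annotated, model missed it
    }

counts = tally(model_spans=[(0, 4), (10, 15), (20, 24)],
               vetted_spans=[(0, 4), (20, 24), (30, 35)])
print(counts)  # {'true_pos': 2, 'false_pos': 1, 'false_neg': 1}
```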
connected data asset
A pointer to data that is accessed through a connection to an external data source.
The information required to connect to a database. The actual information that is required varies according to the DBMS and connection method.
An asset that contains information that enables connecting to a data source.
connected folder asset
A pointer to a folder in IBM Cloud Object Storage.
In Decision Optimization, a condition that must be satisfied by the solution of a problem.
In databases, a relationship between tables.
The automated tasks of monitoring model performance, retraining with new data, and redeploying to ensure prediction quality.
Core ML deployment
The process of downloading a deployment in Core ML format for use in iOS apps.
To create a data asset and prepare it to be published in a catalog. Curation can include enriching the data asset by assigning governance artifacts such as business terms, classifications, and data classes, and analyzing the quality of the data in the data asset.
An asset that points to data, for example, to an uploaded file. Connections and connected data assets are also considered data assets.
See also: asset
A governance artifact that categorizes columns in relational data sets according to the type of the data and how the data is used.
The combination of technical and business processes used to merge data from disparate sources into meaningful and valuable information.
The process of collecting critical business information from a data source, correlating the information, and uncovering associations, patterns, and trends.
See also: predictive analytics
data protection rule
A governance artifact that specifies what data to control and how to control it. A data protection rule contains criteria and an action.
data quality analysis
The analysis of data against the quality dimensions of accuracy, completeness, consistency, timeliness, uniqueness, and validity.
data quality definition
A reusable description of a rule evaluation or condition that data quality rules apply to data.
data quality rule
An assessment, applied during data quality analysis, of whether data meets specific conditions; records that do not meet the conditions are identified as rule violations.
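A minimal sketch of the idea, with invented data and function names: a rule is a condition evaluated per record, and records that fail the condition are returned as violations.

```python
# Hypothetical sketch: a data quality rule that flags records
# violating a condition as rule violations.

records = [
    {"id": 1, "email": "a@example.com", "age": 34},
    {"id": 2, "email": "", "age": 29},
    {"id": 3, "email": "c@example.com", "age": -5},
]

def rule_violations(records, condition):
    """Return the records that do NOT meet the condition."""
    return [r for r in records if not condition(r)]

# Condition: email is present and age is a plausible value.
violations = rule_violations(
    records, lambda r: r["email"] and 0 <= r["age"] <= 120
)
print([r["id"] for r in violations])  # [2, 3]
```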
Data Refinery flow
A set of steps that cleanse and shape data to produce a new data asset.
The analysis and visualization of structured and unstructured data to discover insights and knowledge.
A collection of data, usually in the form of rows (records) and columns (fields) and contained in a file or database table.
A repository, queue, or feed for reading data, such as a Db2 database.
An ordered set of steps to extract, transform, and load data (ETL).
A collection of data, usually in the form of rows (records) and columns (fields) and contained in a table.
A model or application package that is available for use.
A workspace where models are deployed and deployments are managed.
A Python API for modeling and solving Decision Optimization problems.
A network destination address that identifies resources, such as services and objects. For example, an endpoint URL is used to identify the location of a model or function deployment when a user sends payload data to the deployment.
The compute resources for running jobs.
A definition that specifies hardware and software resources to instantiate environment runtimes.
An instantiation of the environment template to run analytical assets.
A model training process that considers a series of training definitions and parameters to determine the most accurate model configuration.
See also: AutoAI experiment
A set of columns of a particular data asset along with the metadata that is used for machine learning.
Identifying the columns of data that best support an accurate prediction or score.
In AutoAI, a phase of pipeline creation that applies algorithms to transform and optimize the training data to achieve the best outcome for the model type.
The training of a common machine learning model that uses multiple data sources that are not moved, joined, or shared. The result is a better-trained model without compromising data security.
A collection of nodes that define a set of steps for processing data or training a model.
A graphical representation of a project timeline and duration in which schedule data is displayed as horizontal bars along a time scale.
Governance items that enrich or control data assets. Governance artifacts include business terms, classifications, data classes, policies, rules, and reference data sets.
A governance artifact that provides a natural-language description of the criteria that are used to determine whether data assets are compliant with business objectives.
A task-based process to control the creating, modifying, and deleting of governance artifacts.
A catalog in which data protection rules are enforced.
A tool for creating analytical assets through visual programming. A canvas is an area on which to place objects or nodes that can be connected to create a flow.
In machine learning, a parameter whose value is set before training as a way to increase model accuracy.
hyperparameter optimization (HPO)
The process for setting hyperparameter values to the settings that provide the most accurate model.
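One simple form of this process is an exhaustive grid search over candidate values. The sketch below is illustrative only: the `evaluate` function stands in for training a model and measuring its validation accuracy, and the parameter names are invented.

```python
# Hypothetical sketch: hyperparameter optimization as a grid search.
from itertools import product

def evaluate(lr, depth):
    # Stand-in for training a model and returning validation accuracy;
    # a real evaluation would fit and score an actual model.
    return 1.0 - abs(lr - 0.1) - 0.01 * abs(depth - 6)

grid = {"lr": [0.01, 0.1, 0.5], "depth": [3, 6, 9]}
best = max(
    (dict(zip(grid, values)) for values in product(*grid.values())),
    key=lambda params: evaluate(**params),
)
print(best)  # {'lr': 0.1, 'depth': 6}
```

Real HPO tools typically use smarter strategies than an exhaustive grid, such as random search or Bayesian optimization, but the goal is the same: find the settings that yield the most accurate model.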
A software package that contains a set of libraries.
To continuously add a high volume of real-time data to a database.
A separately executable unit of work.
machine learning framework
The libraries and runtime for training and deploying a model.
To obfuscate, substitute, or redact data in a column, as defined by data protection rules.
A flow that produces permanently masked copies of data.
1. In Match 360, a consolidated view of data from disparate sources.
2. For model training, reference data that remains the same for several jobs on the same model but that can be changed, if necessary.
A method of importing metadata that is associated with data assets, including process metadata that describes the lineage of data assets and technical metadata that describes the structure of data assets.
1. In a machine learning context, a set of functions and algorithms that are trained and tested on a data set to provide predictions or decisions.
2. In Decision Optimization, a mathematical formulation of a problem to be solved.
model use case
A record that tracks the lifecycle of a model from request to production in the Model inventory.
A methodology for managing the full lifecycle of an AI model, including training, deployment, scoring, evaluation, retraining, and updating.
A methodology that takes a machine learning model from development to production.
See also: ModelOps
natural language processing library
A library that provides basic natural language processing functions for syntax analysis and out-of-the-box pre-trained models for a wide variety of text processing tasks.
A mathematical model for predicting or classifying cases by using a complex mathematical scheme that simulates an abstract version of brain cells. A neural network is trained by presenting it with a large number of observed cases, one at a time, and allowing it to update itself repeatedly until it learns the task.
In an SPSS Modeler flow, the graphical representation of a data operation.
An interactive document that contains executable code, descriptive text for that code, and the results of any code that is run.
The part of the notebook editor that executes code and returns the computational results.
To replace data in a column with similarly formatted values that match the original format. A form of masking.
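A toy illustration of format-preserving obfuscation, with an invented function name and seeded randomness for reproducibility: digits are replaced with digits and letters with letters, so the masked value keeps the original's shape.

```python
# Hypothetical sketch: obfuscate a value with similarly formatted
# replacements (digits stay digits, letters stay letters).
import random

def obfuscate(value, seed=0):
    rng = random.Random(seed)  # seeded so the output is reproducible
    out = []
    for ch in value:
        if ch.isdigit():
            out.append(str(rng.randrange(10)))
        elif ch.isalpha():
            out.append(rng.choice("abcdefghijklmnopqrstuvwxyz"))
        else:
            out.append(ch)  # keep separators so the format is preserved
    return "".join(out)

print(obfuscate("555-867-5309"))  # still has the xxx-xxx-xxxx shape
```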
A method of storing data, typically used in the cloud, in which data is stored as discrete units, or objects, in a storage pool or repository that does not use a file hierarchy but that stores all objects at the same level.
A method of accessing a model or Python code deployment through an API endpoint as a web service to generate predictions online, in real time.
See: Optimization Programming Language
An asset that runs code in a tool or a job.
The process of finding the most appropriate solution to a precisely defined problem while respecting the imposed constraints and limitations. For example, determining how to allocate resources or how to find the best elements or combinations from a large set of alternatives.
Optimization Programming Language (OPL)
A modeling language for expressing model formulations of optimization problems in a format that can be solved by CPLEX optimization engines such as IBM CPLEX.
See also: model formulation
The process of creating an end-to-end flow that can train, run, deploy, test, and evaluate a machine learning model.
In Federated Learning, an entity that contributes data for training a common model. The data is not moved or combined but each party gets the benefit of the federated training.
The data that is passed to a deployment to get back a score, prediction, or solution.
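As a rough illustration, a scoring payload is often a JSON document listing field names and rows of values. The structure below is a hypothetical example, not a specific product's API schema; the field names and `input_data` key are invented for illustration.

```python
# Hypothetical sketch: the JSON payload a client might POST to a
# deployment's endpoint URL to get predictions back.
import json

payload = {
    "input_data": [{
        "fields": ["age", "income", "tenure_months"],
        "values": [[34, 52000, 18], [41, 61000, 3]],  # two records to score
    }]
}

body = json.dumps(payload)
# A client would POST `body`, with an authorization token, to the
# deployment's endpoint URL and receive predictions in the response.
print(body[:30])
```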
1. In Watson Pipelines, an end-to-end flow of assets from creation through deployment.
2. In AutoAI, a candidate model.
In AutoAI, a table that shows the list of automatically generated candidate models, as pipelines, ranked according to the specified criteria.
A governance artifact that consists of one or more data protection rules.
A business process and a set of related technologies that are concerned with the prediction of future possibilities and trends. Predictive analytics applies such diverse disciplines as probability, statistics, machine learning, and artificial intelligence to business problems to find the best action for a specific situation.
See also: data mining
In Watson Knowledge Catalog, the category that contains the governance artifact. A category is similar to a folder or directory that organizes a user's governance artifacts.
See also: secondary category
The generated metadata and statistics about the content and format of data.
A collaborative workspace for working with data and other assets.
To copy an asset into a catalog.
A programming language that is used in data science and AI.
A function that contains Python code to support a model in production.
See: data quality analysis
An extensible scripting language, used in data science and AI, that offers a wide variety of analytic, statistical, and graphical functions and techniques.
To replace data values in a column with a string of one repeated character to hide sensitive values, data format, and referential integrity.
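A minimal sketch of redaction, with an invented function name: every value in the column becomes the same fixed-length string of one repeated character, so neither the values nor their original format survive.

```python
# Hypothetical sketch: redact a column by replacing each value with a
# string of one repeated character, hiding values and format alike.
def redact(value, char="X", length=10):
    # The fixed length deliberately discards the original value's format.
    return char * length

emails = ["a@example.com", "long.name@example.org"]
print([redact(e) for e in emails])  # ['XXXXXXXXXX', 'XXXXXXXXXX']
```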
reference data set
A governance artifact that defines values for specific types of columns.
To cleanse and shape data.
In Watson Knowledge Catalog, a governance artifact that contains information, criteria, or logic to analyze or protect data. Some rules are enforced and some are informational.
See also: data protection rule, data quality rule, governance rule
The predefined or custom hardware and software configuration that is used to run tools or jobs, such as notebooks.
1. The process of computing how closely the attributes for an incoming identity match the attributes of an existing entity.
2. In machine learning, the process of measuring the confidence of a predicted outcome.
A file that contains Python or R scripts to support a model in production.
An optional category that references the governance artifact.
See also: primary category
Data that contains information that should not be visible to all users. For example, personally identifiable information or other information that is restricted by privacy regulations.
To customize data by filtering, sorting, and removing columns; joining tables; and performing operations that include calculations, data groupings, hierarchies, and more.
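The operations above can be sketched in plain Python on a small, invented data set: filter rows, drop a column, sort, then group with an aggregate.

```python
# Hypothetical sketch: shaping a small data set — filter rows,
# remove a column, sort, and group with an aggregate.
from collections import defaultdict

rows = [
    {"region": "east", "product": "a", "sales": 120, "internal_id": 7},
    {"region": "west", "product": "b", "sales": 80,  "internal_id": 8},
    {"region": "east", "product": "b", "sales": 40,  "internal_id": 9},
]

# Filter: keep sales >= 50; drop a column; sort by sales descending.
shaped = sorted(
    ({k: v for k, v in r.items() if k != "internal_id"}
     for r in rows if r["sales"] >= 50),
    key=lambda r: r["sales"], reverse=True,
)

# Group: total sales per region.
totals = defaultdict(int)
for r in shaped:
    totals[r["region"]] += r["sales"]
print(dict(totals))  # {'east': 120, 'west': 80}
```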
In SPSS Modeler, the process of performing many data preparation and mining operations directly in the database through SQL code.
Data that resides in fixed fields within a record or file. Relational databases and spreadsheets are examples of structured data.
In Watson Knowledge Catalog, to replace data in a column with values that don't match the original format but retain referential integrity.
A type of model that automatically identifies and classifies text into specified categories.
A set of values of a variable at periodic points in time.
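For illustration, the sketch below builds a small time series of invented monthly values paired with timestamps, then applies a simple 3-point moving average, a common first step when analyzing such data.

```python
# Hypothetical sketch: a time series as values of one variable at
# periodic points in time, here invented monthly temperatures.
from datetime import date

series = [(date(2024, m, 1), temp)
          for m, temp in zip(range(1, 7), [2.1, 3.0, 6.4, 10.2, 14.8, 18.3])]

# A 3-point moving average smooths the periodic observations.
values = [v for _, v in series]
smoothed = [round(sum(values[i:i + 3]) / 3, 2)
            for i in range(len(values) - 2)]
print(smoothed)  # [3.83, 6.53, 10.47, 14.43]
```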
A model that is ready to be deployed.
The initial stage of model building, involving a subset of the source data. The model can then be tested against a different subset for which the outcome is already known.
Any data that is stored in an unstructured format rather than in fixed fields. Data in a word processing document is an example of unstructured data.
A model for deep learning that allows raw, unlabeled data to be used to train a system with little to no human effort.
To create a virtual table to segment or combine data from one or more tables.
A graph, chart, plot, table, map, or any other visual representation of data.