Configuring agents for lineage metadata import

Last updated: Jun 27, 2025

Install and configure Manta agents in the same location or network segment as an external system to extract lineage metadata from that system and visualize it on a lineage graph.

Overview

In most cases, you can access data sources directly from Cloud Pak for Data as a Service. However, direct access is not always possible or optimal. In such cases, you can use Manta agents, which you install in the same location or network segment as the external system from which you want to extract metadata for lineage analysis. The most common use cases are:

You cannot connect directly to an on-premises data source.
You connect to a data source that requires specific third-party tools or libraries and you can't or don't want to install these tools or libraries on Cloud Pak for Data as a Service.
Your data centers are distributed in many geographical locations and you want to avoid delays of data transfer (network latency).

The following list summarizes the steps that are required to import lineage metadata by using the Manta agents:

Download the Manta agent executable files and save them in the target location. These files are compressed in a .zip file. Extract the file.
Register a new agent instance in Manta Data Lineage, and save the configuration file.
Copy the agent instance configuration file to the target location and start the agent.
When you create a metadata import, select the agent from the list.

Note:

Each instance of a data source might require an individual agent instance, depending on the access settings. For example, if you have three instances of IBM Cognos Analytics, you might need to register three agent instances, and configure them independently on each Cognos Analytics instance. Give the agent instances meaningful names so that you can tell which data source instance each agent is connected to.

Supported data sources

You can use the agents with the following data sources:

IBM Cognos Analytics. When you create a metadata import, using agents is the only way to connect to Cognos Analytics. You select agents in the Connection mode option when you create a metadata import.
Microsoft Azure Databricks

Agent status

The agent can have the following statuses:

Online: The agent is configured and connected. It is ready to be used.
Offline: The agent is configured but is not connected at the moment.
Registered: The agent is registered but needs to be configured on the external system. For more information, see Configuring the agent on the external system.

Prerequisites

On the external system, create a dedicated operating system user account to run the agent. The agent executable files and the agent configuration file are stored under this user account. Use Java Runtime Environment (JRE) version 21 or higher.

A dedicated user account on the external system helps keep the data secure. The agent configuration file contains confidential information, including a username and an API key. This file must always be protected so that only authorized users on the external system can access it. Running the agent under a dedicated user account limits access to this confidential data, and if the data is compromised, the impact is limited to one agent instance.
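
One way to keep the configuration file readable only by the dedicated account is to tighten its file permissions. The following is a minimal sketch for Linux, not an official procedure. The function name protect_agent_config is hypothetical, and the file path is passed in by the caller because it depends on where you extract the agent.

```shell
#!/bin/sh
# Sketch: restrict an agent configuration file to its owning user.
# protect_agent_config is a hypothetical helper name, not part of the agent.
# $1: path to the agent configuration file
protect_agent_config() {
    config_file="$1"

    if [ ! -f "$config_file" ]; then
        echo "Configuration file not found: $config_file" >&2
        return 1
    fi

    # Owner may read and write; group and other users get no access.
    chmod 600 "$config_file"
}
```

Run the function as the dedicated user, for example with the path /usr/local/bin/manta-agent/config.json, assuming that the file is named config.json and is stored in the example extraction folder.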

Downloading Manta Agent executable files

Download the Manta Agent executable files from the Fix Central website.

Extract the .zip file in a location where the executable files are allowed. For example, it can be /usr/local/bin/manta-agent on the Linux operating system, and C:/manta-agent on the Windows operating system.

Make sure that you install the latest agent version. For information about how to upgrade the current agent installation, see Updating agent version.

Registering an agent in Manta Data Lineage

To register a new agent, complete these steps in Manta Data Lineage:

Go to Administration > Configurations and settings > Data lineage setup.
On the Manage agents tab, click New agent.
If you already have the Manta agent file on your external system, go to the next step. If not, download it and extract it on the external system.
Define the name for the agent instance. It cannot contain spaces.
Click Register.
Download the configuration file. You will use it to finish configuring the agent on the external system.

Important: Store this file somewhere safe. You cannot recover it. If you lose it, you need to regenerate the API key to create a new configuration file, and update all scripts and applications to use the new API key.

At this point, the agent status is Registered.

Configuring the agent on the external system

To finish the configuration of the agent on the external system, complete these steps:

Copy the agent configuration file to the same location where you extracted the agent executable files.
Run the starting script, which is run.sh or run.bat, depending on your operating system. The script is in the bin folder.

Note: Use an agent configuration file for only one agent installation. If you reuse the same configuration file for more than one agent installation, the older agent is not stopped and keeps running, but it is no longer used. Any metadata import jobs are then run by the agent that was installed later.

At this point, the agent status is Online. It is ready to be used in the metadata import. For more information, see Creating metadata imports.

When the agent is run for the first time, the data folder is created in the location where you extracted the .zip file. The data folder contains log files for the agent, where you can find the agent's status updates and information about ongoing extraction jobs.
In the bin folder, you can find the README.md file with useful information about the agent.
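
The two configuration steps in this section can be sketched as a small shell function for Linux. This is an illustration under assumptions: install_and_start_agent is a hypothetical helper name, both paths are supplied by the caller, and the sketch does not assume whether run.sh returns immediately or keeps running in the foreground.

```shell
#!/bin/sh
# Sketch: finish configuring an agent on Linux and start it.
# install_and_start_agent is a hypothetical helper name.
# $1: folder where the agent .zip file was extracted
# $2: configuration file that was downloaded after registering the agent
install_and_start_agent() {
    agent_home="$1"
    config_file="$2"

    if [ ! -x "$agent_home/bin/run.sh" ]; then
        echo "Start script not found or not executable: $agent_home/bin/run.sh" >&2
        return 1
    fi

    # Step 1: copy the configuration file to the extraction folder.
    cp "$config_file" "$agent_home/" || return 1

    # Step 2: run the starting script from the bin folder.
    "$agent_home/bin/run.sh"
}
```

After the script starts the agent, you can check the Manage agents tab to confirm that the agent status changed to Online.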

Updating agent version

From time to time, you must update the agent to the latest version. When the current agent version is outdated, the agent does not start and the log files contain an error message stating that you must install the latest version.

To update the agent, complete these steps:

Download the latest agent version from the Fix Central website.
Save the agent files in a different location from the previous agent version and extract the new agent files.
Stop the previous agent version by running the shutdown.sh or shutdown.bat script, depending on your operating system.
Create a backup copy of the config.json configuration file of the previous agent, and save it in the new agent folder. Do not move the data folder to the new location.
Delete the entire folder with the previous agent files.
Start the new agent by running the run.sh or run.bat script, depending on your operating system.
Go to Data > Data lineage > Data lineage setup > Manage agents and verify that the status of the new agent is Online.

The agent is updated. You do not need to modify the API key.
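
On Linux, the local part of the update (stopping the old agent, carrying over config.json, deleting the old folder, and starting the new agent) might be scripted as follows. Treat this as a sketch: update_agent is a hypothetical helper name, the new version is assumed to be already downloaded and extracted, and the locations of shutdown.sh in the bin folder and of config.json in the top-level agent folder are assumptions.

```shell
#!/bin/sh
# Sketch of the local update steps on Linux.
# update_agent is a hypothetical helper name.
# $1: folder of the previous agent version
# $2: folder where the new agent version is already extracted
update_agent() {
    old_home="$1"
    new_home="$2"

    if [ ! -d "$old_home" ] || [ ! -d "$new_home" ] || [ "$old_home" = "$new_home" ]; then
        echo "Expected two different, existing agent folders" >&2
        return 1
    fi

    # Stop the previous agent version (script location is an assumption).
    "$old_home/bin/shutdown.sh" || return 1

    # Carry over the configuration file only; the data folder is not moved.
    cp "$old_home/config.json" "$new_home/config.json" || return 1

    # Delete the entire folder with the previous agent files.
    rm -rf "$old_home"

    # Start the new agent.
    "$new_home/bin/run.sh"
}
```

The function stops before deleting anything if the shutdown script or the copy fails, so the previous installation stays intact in that case.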

Regenerating API key

In some cases, you might need to regenerate the API key for an agent, for example, when the agent configuration file is lost. In this case, the API key of the associated Service ID must be regenerated and a new configuration file created.

To regenerate the API key, complete these steps:

Go to Data > Data lineage > Data lineage setup.
On the Manage agents tab, find the agent that you want to update and click it to display the details panel.
Click Regenerate API key.
Download the new configuration file.
On the external system, replace the old configuration file with the new one.
Restart the agent by running the shutdown.sh or shutdown.bat script, and then the run.sh or run.bat script, depending on your operating system.

The old API key is automatically removed.
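
The last two steps, which take place on the external system, could look like the following on Linux. This is a sketch under assumptions: replace_config_and_restart is a hypothetical helper name, the configuration file is assumed to be named config.json in the top-level agent folder, and shutdown.sh is assumed to be in the bin folder next to run.sh.

```shell
#!/bin/sh
# Sketch: swap in a regenerated configuration file and restart the agent.
# replace_config_and_restart is a hypothetical helper name.
# $1: agent installation folder
# $2: newly downloaded configuration file
replace_config_and_restart() {
    agent_home="$1"
    new_config="$2"

    # Replace the old configuration file with the new one.
    cp "$new_config" "$agent_home/config.json" || return 1

    # Restart: stop the running agent, then start it again.
    "$agent_home/bin/shutdown.sh" || return 1
    "$agent_home/bin/run.sh"
}
```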

Removing an agent

To remove an agent, complete these steps, in any order:

On the Manage agents tab in Cloud Pak for Data as a Service, find the agent, open the details panel, and click Delete agent.
On the external system, stop the agent by using the shutdown.sh or shutdown.bat script, and delete the files that you extracted from the .zip file and the configuration file for the agent.

Configuring agent settings in the `setenv` scripts

You can configure the following settings for each agent installation:

Memory settings
The AGENT_JVM_OPTS property controls the Java virtual machine settings for the agent, primarily memory allocation.

Example values:

Linux or macOS operating systems: export AGENT_JVM_OPTS="-Xms1g -Xmx4g -XX:+UseG1GC"
Windows operating system: set "AGENT_JVM_OPTS=-Xms1g -Xmx4g -XX:+UseG1GC"

You can adjust the following parameters for the AGENT_JVM_OPTS property:

-Xms: This parameter sets the initial Java heap size. For example, you can set it to 1g, which means 1 gigabyte.
-Xmx: This parameter sets the maximum Java heap size. If the agent processes large data sources or out of memory errors occur when the agent is run, you might increase the value for this parameter, for example to -Xmx8g or -Xmx16g. Monitor the agent's memory consumption to find an optimal value.
-XX:+UseG1GC: This parameter selects the G1 (Garbage-First) garbage collector, which can provide better performance for applications with larger heap sizes.

Agent extractor memory
The LINEAGE_AGENT_EXTRACTOR_MEMORY property specifies the maximum memory (in megabytes) that the extractor part of the agent can use.

Example values:

Linux or macOS operating systems: export LINEAGE_AGENT_EXTRACTOR_MEMORY=4096
Windows operating system: set "LINEAGE_AGENT_EXTRACTOR_MEMORY=4096"

If this property is not set, the value might be derived from the system memory or a pre-configured internal default. If the agent extracts large or complex data sources and out of memory errors occur, you might increase the value to 8192 for 8 GB, or 16384 for 16 GB. When you adjust the value, check how much memory is allocated to the main agent through the AGENT_JVM_OPTS property, and do not set a value that is higher than the total system memory.
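
Before you raise the extractor memory, a quick arithmetic check can help. The following sketch assumes that the extractor memory is allocated in addition to the agent heap set by -Xmx, which this topic does not state explicitly, so the result is a conservative estimate. All function names are hypothetical, and the total system memory is passed in by the caller.

```shell
#!/bin/sh
# Sketch: check whether agent heap plus extractor memory fits in system memory.
# All function names here are hypothetical helpers.

# Convert a JVM size such as 4g or 512m to megabytes.
size_to_mb() {
    case "$1" in
        *[gG]) echo $(( ${1%?} * 1024 )) ;;
        *[mM]) echo "${1%?}" ;;
        *) echo "Unsupported size: $1" >&2; return 1 ;;
    esac
}

# Print the value of the -Xmx option from a JVM options string.
xmx_from_opts() {
    for opt in $1; do
        case "$opt" in
            -Xmx*) echo "${opt#-Xmx}" ;;
        esac
    done
}

# $1: AGENT_JVM_OPTS value
# $2: LINEAGE_AGENT_EXTRACTOR_MEMORY value in megabytes
# $3: total system memory in megabytes
# Succeeds when the maximum heap plus the extractor memory fits.
fits_in_memory() {
    heap_mb=$(size_to_mb "$(xmx_from_opts "$1")") || return 1
    [ $(( $heap_mb + $2 )) -le "$3" ]
}
```

For example, fits_in_memory "-Xms1g -Xmx4g -XX:+UseG1GC" 4096 16384 succeeds because 4096 MB of heap plus 4096 MB of extractor memory is below 16384 MB. On Linux, one way to find the total system memory in megabytes is the free -m command.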

Agent dictionary batch size
The LINEAGE_AGENT_DICTIONARY_BATCH_SIZE property specifies how many dictionary entries are sent to the central service in a single batch.

Example values:

Linux or macOS operating systems: export LINEAGE_AGENT_DICTIONARY_BATCH_SIZE=1000
Windows operating system: set "LINEAGE_AGENT_DICTIONARY_BATCH_SIZE=1000"

The default value is around 500 or 1000. You might increase the value to 2000 or 5000 when you populate large dictionaries and when there is network latency between the agent and the server. If the memory consumption is too high, you might set a value that is lower than the default value.
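
To see why a larger batch size helps when latency is high, it can be useful to count how many batches a dictionary needs. The following sketch only does the arithmetic (a rounded-up division). The function name batch_count is hypothetical, and it does not model how the agent actually schedules or sends batches.

```shell
#!/bin/sh
# Sketch: number of batches needed to send a dictionary, rounded up.
# batch_count is a hypothetical helper name.
# $1: number of dictionary entries
# $2: value of LINEAGE_AGENT_DICTIONARY_BATCH_SIZE
batch_count() {
    echo $(( ($1 + $2 - 1) / $2 ))
}
```

For a hypothetical dictionary of 1000000 entries, batch_count 1000000 1000 prints 1000, and batch_count 1000000 5000 prints 200. Fewer batches mean fewer round trips to the server, at the cost of more memory per batch.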

Logging level
The LOGGING_LEVEL_COM_IBM_WDP_DATALINEAGE property adjusts the verbosity of the agent's logs, specifically for lineage-related components.

Example values:

Linux or macOS operating systems: export LOGGING_LEVEL_COM_IBM_WDP_DATALINEAGE=DEBUG
Windows operating system: set "LOGGING_LEVEL_COM_IBM_WDP_DATALINEAGE=DEBUG"

You can set this property to one of these values: INFO (default), DEBUG, WARN, ERROR. When you investigate issues or work on issues with IBM support, set this property to DEBUG. In most cases, the default value INFO is sufficient.

Procedure

To modify these settings, complete these steps:

In the agent installation folder, go to the bin folder, and open the setenv script for editing. Depending on your operating system, the script is setenv.sh or setenv.bat.
Uncomment the property that you want to modify, and provide your custom values.
Save your changes.
Restart the agent so that the new settings are applied: run the shutdown.sh or shutdown.bat script, and then the run.sh or run.bat script, depending on your operating system.
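
After you complete the procedure, the uncommented lines in setenv.sh might look like the following sketch, which reuses the example values from this topic. The rest of the file that ships with the agent is not shown, and the values are illustrations rather than recommendations.

```shell
#!/bin/sh
# Sketch of the uncommented properties in setenv.sh (Linux or macOS).
# Values are the example values from this topic; adjust them to your system.

# JVM settings for the main agent: initial heap, maximum heap, G1 collector.
export AGENT_JVM_OPTS="-Xms1g -Xmx4g -XX:+UseG1GC"

# Maximum memory, in megabytes, for the extractor part of the agent.
export LINEAGE_AGENT_EXTRACTOR_MEMORY=4096

# Number of dictionary entries sent to the central service in one batch.
export LINEAGE_AGENT_DICTIONARY_BATCH_SIZE=1000

# Log verbosity for lineage components: INFO (default), DEBUG, WARN, or ERROR.
export LOGGING_LEVEL_COM_IBM_WDP_DATALINEAGE=DEBUG
```

Set the logging level back to INFO after you finish investigating an issue, because INFO is sufficient in most cases.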