Advanced data profiling
Advanced profiling provides more accurate results than regular profiling but takes longer to complete because large amounts of data must be processed.
The DataStage service must be deployed to run advanced profiling.
All operations that are run as part of a metadata enrichment require credentials for secure authorization. Typically, your user API key is used to execute such long-running operations without disruption. If credentials are not available when you try to run advanced profiling, you are prompted to create an API key. That API key is then saved as your task credentials. See Managing the user API key.
If any of the connections to the data sources are locked, you are asked to enter your personal credentials. This is a one-time step that permanently unlocks the connections for you.
To run advanced data profiling on one or more assets:
- Open the metadata enrichment asset.
- On the Assets tab, select assets as required.
- Select Enrich > Run advanced data profiling from the toolbar.
- Optional: Customize settings.
  - Select whether you want to write frequency distribution information to a database table and determine how many distinct values you want to capture.
    Without an output table, the first 100 distinct values are captured and stored internally. You can view and download that information from the Statistics page of a column profile.
    If you choose to write frequency distribution information to a table, enable the External output option. The section is prepopulated with the default enrichment settings. See Advanced profiling settings. You can change the settings as required for this individual advanced profiling run. If you change the output table, you can also set this table as the new default location, thus overwriting the previous default setting.
    You can access this table by using standard database queries or through the detailed column profile. For more information, see Frequency distributions.
  - Select a sampling type. See Creating a metadata enrichment asset.
- Click Run. You are notified when the analysis is complete.
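When the External output option is enabled, the frequency distribution information is written to an ordinary database table, so any SQL client can read it with standard queries. The following Python sketch illustrates such a query. It uses an in-memory SQLite database as a stand-in for the real target database. The table name `freq_dist`, its columns, the helper functions, and the sample rows are all hypothetical and are not the actual layout of the output table. Check the Frequency distributions topic for the real schema before adapting the query.

```python
import sqlite3

# HYPOTHETICAL table and column names, for illustration only. The actual
# layout of the frequency distribution output table may differ.
SCHEMA = """
CREATE TABLE freq_dist (
    asset_name  TEXT,
    column_name TEXT,
    value       TEXT,
    frequency   INTEGER
)
"""


def build_sample_db():
    """Create an in-memory stand-in table filled with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO freq_dist VALUES (?, ?, ?, ?)",
        [
            ("CUSTOMERS", "COUNTRY", "US", 120),
            ("CUSTOMERS", "COUNTRY", "DE", 45),
            ("CUSTOMERS", "COUNTRY", "FR", 45),
            ("CUSTOMERS", "STATUS", "active", 300),
        ],
    )
    return conn


def top_values(conn, asset, column, limit=10):
    """Return (value, frequency) pairs for one column, most frequent first.

    Ties in frequency are broken alphabetically by value so that the
    result order is deterministic.
    """
    cur = conn.execute(
        "SELECT value, frequency FROM freq_dist "
        "WHERE asset_name = ? AND column_name = ? "
        "ORDER BY frequency DESC, value ASC LIMIT ?",
        (asset, column, limit),
    )
    return cur.fetchall()


if __name__ == "__main__":
    conn = build_sample_db()
    for value, freq in top_values(conn, "CUSTOMERS", "COUNTRY", limit=2):
        print(value, freq)
```

Parameter binding (`?`) keeps asset and column names out of the SQL string. The same `SELECT ... ORDER BY ... LIMIT` pattern applies to other relational databases, although the placeholder syntax depends on the database driver.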