Microsoft Azure Databricks lineage configuration
To import lineage metadata from Microsoft Azure Databricks, create a data source definition, a connection, and a metadata import job.
To import lineage metadata for Microsoft Azure Databricks, complete these steps:
- Create a data source definition.
- Create a connection to the data source in a project.
- Create a metadata import.
Creating a data source definition
Create a data source definition. Select Microsoft Azure Databricks as the data source type.
Creating a connection to Microsoft Azure Databricks
Create a connection to the data source in a project. For connection details, see Microsoft Azure Databricks connection.
Creating a metadata import
Create a metadata import. Learn more about the options that are specific to the Microsoft Azure Databricks data source:
Connection mode
You can connect to Microsoft Azure Databricks by using one of the following connection modes:
- Direct connection
- Remote connection with a Manta agent. When an agent is configured, select it from the list. For more information, see Configuring agents for lineage metadata import.
Include and exclude lists
You can include or exclude assets up to the schema level. Provide catalogs and schemas in the format catalog/schema. Each part is evaluated as a regular expression. Assets that are added to the data source later are also included or excluded if they match the conditions specified in the lists. Example values:
- myCatalog/: all schemas in myCatalog
- myCatalog/.*: all schemas in myCatalog
- myCatalog3/mySchema1: mySchema1 from myCatalog3
- myCatalog4/mySchema[1-5]: any schema in myCatalog4 with a name that starts with mySchema and ends with a digit between 1 and 5
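The matching rules for the include and exclude lists can be illustrated with a short Python sketch. It is not the product's implementation. The function names matches and is_selected are hypothetical, and the sketch assumes that an empty schema part means "any schema", that each part must match the whole catalog or schema name, that an empty include list selects everything, and that the exclude list takes precedence over the include list.

```python
import re


def matches(pattern: str, catalog: str, schema: str) -> bool:
    """Check one catalog/schema pair against one 'catalog/schema' pattern."""
    catalog_part, _, schema_part = pattern.partition("/")
    # Assumption: an empty schema part (as in 'myCatalog/') means any schema.
    schema_part = schema_part or ".*"
    # Assumption: each part is a regular expression that must match the
    # whole name, so re.fullmatch is used instead of re.search.
    return (
        re.fullmatch(catalog_part, catalog) is not None
        and re.fullmatch(schema_part, schema) is not None
    )


def is_selected(catalog: str, schema: str, include: list, exclude: list) -> bool:
    """Decide whether a schema would be imported under the given lists."""
    # Assumption: an empty include list selects everything.
    included = not include or any(matches(p, catalog, schema) for p in include)
    # Assumption: the exclude list wins over the include list.
    excluded = any(matches(p, catalog, schema) for p in exclude)
    return included and not excluded


if __name__ == "__main__":
    include = ["myCatalog4/mySchema[1-5]"]
    exclude = ["myCatalog4/mySchema2"]
    for name in ("mySchema1", "mySchema2", "mySchema6"):
        print(name, is_selected("myCatalog4", name, include, exclude))
```

Testing patterns this way before you enter them in the lists can help catch an expression that is broader or narrower than intended.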
External inputs
If you use external Microsoft Azure Databricks dll archives, you can add them in a .zip file as an external input. You can organize the structure of the .zip file as the dll folder with subfolders or archives that represent the workspace structure. The .zip file can have the following structure:
<dll>
  <catalog_name_folder>
    <schema_name_folder>
      <tables>
        <table_name.sql>
      <views>
        <view_name.sql>
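As an illustration, a .zip file with this layout could be assembled with Python's standard zipfile module. This is a minimal sketch under stated assumptions: the function build_external_input and the sample catalog, schema, table, and view names are hypothetical, and the top-level folder is assumed to be named dll literally.

```python
import io
import zipfile
from pathlib import PurePosixPath


def build_external_input(target, scripts: dict) -> None:
    """Write SQL scripts into a .zip file using the folder layout above.

    target  -- a file path or a writable binary file object
    scripts -- maps (catalog, schema, kind, name) to SQL text, where kind
               is 'tables' or 'views' (hypothetical input format)
    """
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as archive:
        for (catalog, schema, kind, name), sql in scripts.items():
            if kind not in ("tables", "views"):
                raise ValueError(f"unsupported object kind: {kind}")
            # Assumption: the top-level folder is literally named 'dll'.
            member = PurePosixPath("dll", catalog, schema, kind, f"{name}.sql")
            archive.writestr(str(member), sql)


# Usage with made-up object names, written to memory instead of disk.
buffer = io.BytesIO()
build_external_input(
    buffer,
    {
        ("sales", "eu", "tables", "orders"): "CREATE TABLE orders (id INT);",
        ("sales", "eu", "views", "open_orders"): "CREATE VIEW open_orders AS SELECT id FROM orders;",
    },
)
names = zipfile.ZipFile(buffer).namelist()
print(names)
```

Passing a file path instead of the in-memory buffer writes the archive to disk so that it can be added as an external input.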
Advanced import options
- Display table lineage: Generate edges between tables for which the column-level lineage information was not found.
Learn more
- Microsoft Azure Databricks connection
- Microsoft Azure Databricks
- Microsoft Azure Databricks documentation