Adding data to a project
After you create a project, the next step is to add data assets to it so that you can work with data. All the collaborators in the project are automatically authorized to access the data in the project.
Assets of different types can have the same name. However, you can't add more than one asset of the same type with the same name.
You can use the following methods to add data assets to projects:
Method | When to use |
---|---|
Add local files | You have data in CSV or similar files on your local system. |
Add Resource hub data sets | You want to use sample data sets. |
Add database connections | You need to connect to a remote data source. |
Add data from a connection | You need one or more tables or files from a remote data source. |
Add a dynamic view | You need a view that contains a subset of the data in one or more tables in a remote data source. |
Import metadata from a connection | You need many tables or files from a remote data source. You want to schedule and rerun the import process. |
Add connected folder assets from IBM Cloud Object Storage | You need a folder in IBM Cloud Object Storage that contains a dynamic set of files, such as a news feed. |
Add catalog assets | You need one or more assets from a catalog. |
Convert files in project storage to assets | You want to convert files that you created in the project into data assets. |
Add local files
You can add a file from your local system as a data asset in a project.
- Required permissions
  You must have the Editor or Admin role in the project.
- Restrictions
  - The file cannot be empty.
  - The file name can't exceed 255 characters.
  - The maximum size for files that you can load with the UI is 5 GB. You can load larger files to a project with APIs.
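The restrictions above can be checked locally before you start an upload. The following is a minimal sketch, not part of the product: the function names are hypothetical, and it assumes that "5 GB" means 5 × 10^9 bytes and that the 255-character limit applies to the file name without its directory path.

```python
import os

# Assumption: "5 GB" is read as 5 * 10**9 bytes; the product may count differently.
MAX_UI_UPLOAD_BYTES = 5 * 10**9
MAX_FILE_NAME_CHARS = 255


def check_upload_restrictions(file_name, size_bytes):
    """Return a list of problems; an empty list means all three checks pass."""
    problems = []
    if size_bytes == 0:
        problems.append("file is empty")
    if len(file_name) > MAX_FILE_NAME_CHARS:
        problems.append("file name exceeds 255 characters")
    if size_bytes > MAX_UI_UPLOAD_BYTES:
        problems.append("file exceeds the 5 GB UI limit")
    return problems


def check_file(path):
    """Hypothetical convenience wrapper that reads the name and size from disk."""
    return check_upload_restrictions(os.path.basename(path), os.path.getsize(path))
```

For example, `check_upload_restrictions("sales.csv", 0)` reports that the file is empty, while a non-empty file with a short name and a size under the limit returns an empty list.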
To add data files to a project:
- From your project's Assets page, click the Upload asset to project icon. You can also click the same icon from within a notebook or canvas.
- In the pane that opens, browse for the files or drag them onto the pane. You must stay on the page until the load is complete.
The files are saved in the object storage that is associated with your project and are listed as data assets on the Assets page of your project.
When you click the data asset name, you can see this information about data assets from files:
- The asset name and description
- The tags for the asset
- The name of the person who created the asset
- The size of the data
- The date when the asset was added to the project
- The date when the asset was last modified
- A preview of the data, for CSV, Avro, Parquet, TSV, Microsoft Excel, PDF, text, JSON, and image files
- A profile of the data, for CSV, Avro, Parquet, TSV, and Microsoft Excel files
You can update the contents of a data asset from a file by adding a file with the same name and format to the project and then choosing to replace the existing data asset.
You can remove the data asset by choosing the Delete option from the action menu next to the asset name. Choose the Prepare data option to refine the data with Data Refinery.
Add Resource hub data sets
You can add data sets from Resource hub to your project:
- In Resource hub, find the card for the data set that you want to add.
- Click the Add to Project icon from the action bar, select the project, and click Add.
Watch this short video to see how to load and analyze public data sets.
Video transcript
Time | Transcript |
---|---|
00:00 | This video shows you how to access public data sets in the Cloud Pak for Data as a Service Gallery. |
00:06 | Start in the Resource Hub and use the filters to see just the data sets. |
00:13 | Here, you'll find some rich data sets for you to use in your analysis. |
00:17 | For example, you can search for "economy" or "population" or "weather" or "jobs". |
00:28 | This looks like an interesting data set. |
00:30 | Open it and preview the data. |
00:34 | From here, you can share the data set on social media, get a direct link to the data set, or download the data set. |
00:45 | You can also copy the data set into a specific project. |
00:52 | Now, navigate to that project. |
00:55 | And on the "Assets" tab, you'll see the data set was added to the data assets section. |
01:01 | Next, add a new notebook. |
01:05 | The title for this notebook will be "Unemployment rates". |
01:09 | Select a runtime environment and a language. |
01:14 | When you're ready, create the notebook. |
01:20 | When the notebook loads, access the data sources and locate the unemployment file. |
01:27 | Click "Insert to code" and choose how you want to insert the data. |
01:33 | The choices in this drop-down box are dependent upon the language used in this notebook. |
01:38 | Notice that the inserted code includes the credentials you'll need to read the data file from the Object Storage instance. |
01:45 | When you run the code, the first five rows display. |
01:50 | Now, you're ready to start analyzing any of the rich data sets in the Resource Hub. |
01:56 | Find more videos in the Cloud Pak for Data as a Service documentation. |
Convert files in project storage to assets
The storage for the project contains the data assets that you uploaded to the project, but it can also contain other files. For example, you can save a DataFrame from a notebook to the project environment storage. You can convert such files in project storage to assets.
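As an illustration of the DataFrame case, the following minimal sketch writes a pandas DataFrame to a CSV file from a notebook cell. The data and the file name are hypothetical, and the file is written to the notebook's current working directory. Whether that location is part of project storage depends on how your notebook runtime is set up, so treat the path as a placeholder.

```python
from pathlib import Path

import pandas as pd

# Hypothetical example data; replace it with the DataFrame from your own analysis.
df = pd.DataFrame({"region": ["north", "south"], "rate": [4.1, 3.7]})

# Hypothetical file name, relative to the notebook's current working directory.
output_path = Path("unemployment_summary.csv")

# Write the DataFrame as CSV without the pandas row index column.
df.to_csv(output_path, index=False)
```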
To convert files in project storage to assets:
- From the Assets tab of your project, click Import asset.
- Select Project files.
- Select the data_asset folder.
- Select the asset and click Import.