Creating a Federated Learning experiment
Learn how to create a Federated Learning experiment to train a machine learning model with data from different users, each with their own data sources.
Cloud open beta
This is a Cloud open beta release and is not supported for use in production environments.
Note: Because Federated Learning is a collaborative approach to training a common model from a set of remote data sources, some of the steps are done by the admin and others by the contributing parties. Unless otherwise noted, the steps in this topic are performed by the admin, who creates the experiment and coordinates the training.
Before you begin
Before you begin, make sure that you have:
- An IBM Watson Studio project.
- An untrained model file, in pickled or `SavedModel` format. The untrained model file is a "blank" slate of your model framework that is trained in the Federated Learning experiment. For example code that generates an untrained model, see Scikit-learn model configuration and Tensorflow 2 model configuration. The model and its generated components must be compressed into a zip file; for an example, see this untrained model file.
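As a sketch of how such a zip file might be produced for a scikit-learn framework, the following pickles an untrained classifier and compresses it. The estimator choice and the file names `model.pkl` and `untrained_model.zip` are illustrative assumptions, not names the product requires:

```python
import pickle
import zipfile

from sklearn.linear_model import SGDClassifier

# An untrained ("blank") scikit-learn model. The estimator and its
# default hyperparameters are placeholders -- use whatever matches
# your planned experiment configuration.
model = SGDClassifier()

# Serialize the untrained model in pickled format.
with open("model.pkl", "wb") as f:
    pickle.dump(model, f)

# Compress the pickle into a zip file to upload in the experiment builder.
with zipfile.ZipFile("untrained_model.zip", "w") as zf:
    zf.write("model.pkl")
```

The zip file is what you upload later under Model specification in the experiment builder.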
Tip: When you build your experiment, you choose your model's framework, fusion method, and hyperparameters, and you must customize a data handler. You can review the options in Choosing your framework, fusion method, and hyperparameters and in Data handler examples.
Overview: Create a Federated Learning experiment
In the following sections, you will:
- Start the Federated Learning aggregator (for admin)
- Connect the remote training parties (for parties)
- Monitor training progress and deploy the trained model (for admin or parties)
Note: The following sections describe creating the Federated Learning experiment from the experiment builder. If you create the experiment programmatically, refer to Considerations for building an experiment using the API for details on setting up the environments and loading the required libraries.
Start the Federated Learning aggregator (for admin)
In this section, you create the Federated Learning experiment and start the aggregator. The process consists of these steps:
- Step 1: Set up the Federated Learning experiment
- Step 2: Create the remote training system
- Step 3: Start the experiment
Step 1: Set up the Federated Learning experiment
In this step, you set up a Federated Learning experiment from an IBM Watson Studio project using the Federated Learning experiment builder.
- From the project, click Add to Project > Federated Learning.
- Name the experiment and add an optional description and tags.
- In the Configure tab, choose the training framework and model type. See Choosing the framework and Choosing the fusion method for a table of frameworks, fusion methods, and their attributes.
- Click Select under Model specification and upload the zip file containing your untrained model.
- In the Define hyperparameters tab, you can choose hyperparameter options available for your framework and fusion method to specify how to tune your model. For details, see Choosing your framework, fusion method, and hyperparameters.
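As a rough illustration only, a hyperparameter selection for an averaging-style fusion method might resemble the following. The parameter names and values here are assumptions for the sketch; the exact set that the builder offers depends on your framework and fusion method:

```python
# Illustrative hyperparameter choices (names are assumptions; consult
# "Choosing your framework, fusion method, and hyperparameters" for the
# options that your framework and fusion method actually expose).
hyperparameters = {
    "rounds": 5,          # number of global fusion rounds
    "epochs": 3,          # local training epochs per round, per party
    "learning_rate": 0.1  # local optimizer learning rate
}
```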
Step 2: Create the remote training system
In this step, you create the remote training system that specifies the participating parties of the experiment.
- From Select remote training system, click Add new systems.

- Complete the fields:
- Name: A name to identify this specific instance of the Remote Training System.
- Description: An optional description for more details of the training system.
- System administrator: The user who has access to run the current Federated Learning experiment.
- Allowed users: Users who are allowed to interface with the current Federated Learning experiment training run. Note: For an admin or a party to appear in this list, they must be collaborators in your project.
- Tags: Associate keywords with the Remote Training System to make it easier to find.
- Click Add. The specific Remote Training System instance is then saved. If you are creating multiple remote training instances, you can repeat these steps.
- Click Add systems to save the remote training system as an asset in the Federated Learning system. Note: You can use a remote training definition for future experiments. For example, in the Select remote training system tab, you can select any Remote Training System that you previously created.
Step 3: Start the experiment
In this step, the Federated Learning experiment generates a party connector script that you must send to the participating parties.
- Click Review and create to view the settings of your current Federated Learning experiment. Then, click Create.

- The Federated Learning experiment will be in `Pending` status while the aggregator is starting. When the aggregator starts, the status changes to `Setup – Waiting for remote systems`.
Connecting the remote training parties (for parties)
In this section, the parties must complete all the steps to configure and run the party connector script to train the model by using their data.
Prerequisites
- Ensure that you are running in a Conda environment. If you are not, run the following code to set up Conda:
```shell
# Create a new conda environment
conda create -n fl_env python=3.7.10

# Install jupyter in the environment
conda install -c conda-forge -n fl_env notebook

# Activate the environment
conda activate fl_env

# Install the Watson Machine Learning client
pip install --upgrade ibm-watson-machine-learning
```
Note: If you are running in a Watson Studio or Cloud Pak for Data project, you do not need to set up Conda.
- Ensure that you have the proper libraries set up with this code:

```shell
pip install \
  tensorflow-cpu==2.4.2 \
  torch==1.7.1 \
  scikit-learn==0.23.2 \
  numpy \
  scipy \
  environs \
  parse \
  websockets==8.1 \
  jsonpickle==1.4.1 \
  pandas \
  pytest \
  pyYAML \
  requests \
  pathlib2 \
  psutil \
  setproctitle \
  tabulate \
  lz4 \
  opencv-python \
  gym \
  cloudpickle==1.3.0 \
  image \
  diffprivlib
```

Note: To better understand how to set up the environment to run Federated Learning, see the Federated Learning API Samples.
- Ensure that your data is loaded and pre-processed. Each party needs to implement a data handler class in their environment, in the same directory as the data set and the party connector script. The following is a general data handler template. For examples of data handlers, see the MNIST data handler example.

```python
# your import statements

from ibmfl.data.data_handler import DataHandler

class MyDataHandler(DataHandler):
    """
    Data handler for your dataset.
    """

    def __init__(self, data_config=None):
        super().__init__()
        self.file_name = None
        if data_config is not None:
            if '<your_data_file_name>' in data_config:
                self.file_name = data_config['<your_data_file_name>']
            # extract other additional parameters from `info` if any.

        # load and preprocess the training and testing data
        self.load_and_preprocess_data()

        # Example:
        # (self.x_train, self.y_train), (self.x_test, self.y_test) = self.load_dataset()

    def load_and_preprocess_data(self):
        """
        Loads and pre-processes local datasets, and updates
        self.x_train, self.y_train, self.x_test, self.y_test.

        # Example:
        # return (self.x_train, self.y_train), (self.x_test, self.y_test)
        """
        pass

    def get_data(self):
        """
        Gets the prepared training and testing data.

        :return: ((x_train, y_train), (x_test, y_test))
            # most built-in training modules expect data returned in this format
        :rtype: `tuple`

        This function should be as brief as possible. Any pre-processing
        operations should be performed in a separate function, not inside
        get_data(), especially computationally expensive ones.

        # Example:
        # X, y = load_somedata()
        # x_train, x_test, y_train, y_test = \
        #     train_test_split(X, y, test_size=TEST_SIZE, random_state=RANDOM_STATE)
        # return (x_train, y_train), (x_test, y_test)
        """
        pass

    def preprocess(self, X, y):
        pass
```

- Log in to the Watson Studio project and click the Federated Learning experiment.

- Click View setup information and download the party connector script for each party by clicking the download ("down arrow") icon. Optional: Alternatively, you can download the `yml` configuration file and send it to the party to run instead of the party connector script. For details of this method, see Party yml configuration.
- Each party must configure the party connector script and provide valid credentials. For example:
```python
from ibm_watson_machine_learning import APIClient
from ibmfl.party.party import Party

wml_credentials = {
    "url": "https://us-south.ml.cloud.ibm.com",
    "apikey": "<API KEY>"
}
wml_client = APIClient(wml_credentials)

party_config = {
    "aggregator": {
        "ip": ""
    },
    "connection": {
        "info": {
            "id": "abcd-efgh-ij8k"
        }
    },
    # Supply the name of the data handler class and the path to it.
    # The info section can be used to pass information to the data handler.
    # For example:
    # "data": {
    #     "info": {
    #         "npz_file": "./example_data/example_data.npz"
    #     },
    #     "name": "MnistSklearnDataHandler",
    #     "path": "example.mnist_sklearn_data_handler"
    # },
    "data": {
        "name": "<data handler>",
        "path": "<path to data handler>",
        "info": {
            # <information to pass to the data handler>
            # For example:
            # "train_file": "./mnist-keras-train.pkl",
            # "test_file": "./mnist-keras-test.pkl"
        },
    },
    "local_training": {
        "name": "FedAvgLocalTrainingHandler",
        "path": "ibmfl.party.training.fedavg_local_training_handler"
    },
    "protocol_handler": {
        "name": "PartyProtocolHandler",
        "path": "ibmfl.party.party_protocol_handler"
    }
}

p = Party(
    config_dict=party_config,
    token="Bearer " + wml_client.wml_token
)
p.start()

while not p.connection.stopped:
    pass
```
- There are a few flags that you can set:
  - You can set `log_level` if you have not installed the `log_config.yaml` file. To see how you can install the `log_config.yaml` file, see Log level configuration. `log_level` can take these options as parameters:
    - "DEBUG": Provides information for diagnosing problems or troubleshooting.
    - "INFO": Standard log level that provides informative messages to confirm that the application is behaving as expected.
    - "WARNING": Logs behaviors that are unexpected or indicative of potential problems that do not affect the application's current run.
    - "ERROR": Logs when the application hits an issue that prevents some of the application's functions from working.
    - "CRITICAL": Reports errors that might prevent the application from running.
    For example: `p = Party(config_dict=party_config, token=auth_token, log_level="INFO")`.
- After configuring and saving the script, each party should run the party connector script with Python from the directory that contains the data handler and the data set, for example: `python <party connector script>.py`.
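To make the data handler template shown earlier concrete, here is a hypothetical handler for a CSV file whose last column is the label. The class name, the `csv_file` config key, and the 80/20 split are all illustrative assumptions; a stub base class stands in for `ibmfl.data.data_handler.DataHandler` when ibmfl is not installed:

```python
import csv

# ibmfl ships with the party environment; fall back to a minimal stub so
# that this sketch can also run standalone.
try:
    from ibmfl.data.data_handler import DataHandler
except ImportError:
    class DataHandler:
        def __init__(self):
            pass

class CsvDataHandler(DataHandler):
    """Hypothetical handler: CSV rows of floats, last column is the label."""

    def __init__(self, data_config=None):
        super().__init__()
        self.file_name = None
        if data_config is not None and "csv_file" in data_config:
            self.file_name = data_config["csv_file"]
        self.load_and_preprocess_data()

    def load_and_preprocess_data(self):
        """Read the CSV and split it 80/20 into train and test sets."""
        with open(self.file_name, newline="") as f:
            rows = [[float(v) for v in row] for row in csv.reader(f) if row]
        split = int(0.8 * len(rows))
        self.x_train = [r[:-1] for r in rows[:split]]
        self.y_train = [r[-1] for r in rows[:split]]
        self.x_test = [r[:-1] for r in rows[split:]]
        self.y_test = [r[-1] for r in rows[split:]]

    def get_data(self):
        # Keep this brief: the heavy lifting happens in load_and_preprocess_data().
        return (self.x_train, self.y_train), (self.x_test, self.y_test)
```

In the party config, the `data.info` section would then carry something like `{"csv_file": "./train.csv"}`, with `data.name` and `data.path` pointing at this class and its module.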
Monitor training progress and deploy the trained model (for admin or parties)
After you have completed the configuration for the Federated Learning experiment and each participating party has run the party connector script from the previous step, the experiment starts training automatically. As the experiment trains, you can check the progress of the experiment with the visual graph and table. After the training is complete, you can deploy the model and test it with the new data.
Check the progress of the experiment
You can see a visualization of the training progress. For each round of training, you can view the four stages of training:
- Sending model: Federated Learning sends the model data to each party.
- Training: Each party trains the model locally on its own data.
- Receiving models: When local training completes, each party sends its results back to the Federated Learning experiment.
- Aggregating: The aggregator combines the results of all the trained data to produce a final model.
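The aggregating stage can be sketched as a weighted average of the weights each party returns (federated averaging, or FedAvg). The parties, weights, and sample counts below are invented for illustration; this is not the product's internal API:

```python
# A minimal sketch of the "Aggregating" stage: weighted federated
# averaging (FedAvg) of the model weights each party sends back.

def fed_avg(party_updates):
    """party_updates: list of (weights, n_samples), where weights is a
    flat list of floats from one party's local training run."""
    total_samples = sum(n for _, n in party_updates)
    n_weights = len(party_updates[0][0])
    return [
        sum(w[i] * n for w, n in party_updates) / total_samples
        for i in range(n_weights)
    ]

# Two parties with unequal data sizes: the larger party counts more.
global_weights = fed_avg([([1.0, 2.0], 100), ([3.0, 4.0], 300)])
print(global_weights)  # [2.5, 3.5]
```

Weighting by sample count keeps parties with more data from being drowned out by smaller ones, which is the usual design choice in FedAvg-style fusion.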
View your results
When the training is complete, a chart displays the model accuracy over each round of training. Hover over a point on the chart for its exact metrics.
A Training rounds table shows details for each training round.

When you are done viewing the results, click Save model to project to save the Federated Learning model to your project.
Deploy your model
After you save your Federated Learning model, you can deploy and score the model like other machine learning models.
See Deploying models.
Considerations for building an experiment using the API
If you build your experiment entirely programmatically, note these requirements and considerations:
- In your code, you must set up and load the Conda environment.
- You must install the required libraries.
You can see how this is done in this Federated Learning sample of building an experiment using the API.