Additional details for Federated Learning implementation

This document provides additional examples of Federated Learning configurations that require extensive code implementation.

Cloud open beta

This is a Cloud open beta and is not supported for use in production environments.

Party yml configuration

Besides the party connector script, the admin can also send the party a yml configuration file as an alternative way to connect to the Federated Learning experiment. These are the steps to connect with the yml file:

  1. Modify the yml file as follows:
     aggregator:
         ip: [CPD_HOSTNAME]/ml/v4/trainings/[TRAINING_ID]
     connection:
         info:
             id: [REMOTE_TRAINING_SYSTEM_ID]
     # Supply the name of the data handler class and the path to it.
     # The info section can be used to pass information to the data handler.
     # For example:
     # data:
     #     name: MnistSklearnDataHandler
     #     path: example.mnist_sklearn_data_handler
     #     info:
     #         npz_file: ./example_data/example_data.npz
     data:
         name: <data handler>
         path: <path to data handler>
         info:
             <information to pass to data handler>
             # For example:
             # train_file: ./mnist-keras-train.pkl
             # test_file: ./mnist-keras-test.pkl
     local_training:
         name: LocalTrainingHandler
         path: ibmfl.party.training.local_training_handler
     protocol_handler:
         name: PartyProtocolHandler
         path: ibmfl.party.party_protocol_handler
    
  2. Have each party run the yml file with this command:
     python -m ibmfl.party.party <config file> <Bearer token> <log_level>
     Note: When the party runs the yml file from the CLI, the -s flag needs to be passed, for example:
     python -m ibmfl.party.party -s <config file> <Bearer token> <log_level>
     where <config file> refers to the party yml file path. More information is available on getting the bearer token and setting the log level.
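Conceptually, the parsed data section of the party yml maps straight onto the data handler's constructor: the framework resolves name and path to a class and hands the info block to it unchanged. The following is only a sketch of that behavior with hypothetical stand-in names (build_data_handler and DemoHandler are not part of the IBM FL API):

```python
# Sketch: how the parsed `data` section of the party yml reaches
# a data handler. All names here are illustrative stand-ins.
data_section = {
    "name": "MnistSklearnDataHandler",             # data handler class name
    "path": "example.mnist_sklearn_data_handler",  # module path to the class
    "info": {
        "npz_file": "./example_data/example_data.npz"  # handed to the handler
    },
}

class DemoHandler:
    """Minimal stand-in for a data handler class."""
    def __init__(self, data_config=None):
        self.file_name = (data_config or {}).get("npz_file")

def build_data_handler(section, registry):
    # The real framework imports the class from `path`; here a
    # registry dict stands in for that dynamic import.
    handler_cls = registry[section["name"]]
    return handler_cls(data_config=section["info"])

handler = build_data_handler(data_section, {"MnistSklearnDataHandler": DemoHandler})
print(handler.file_name)  # ./example_data/example_data.npz
```

The key point is that whatever keys you put under info arrive verbatim as the handler's data_config argument.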

Returning a data generator defined by Keras or TensorFlow 2

The following code example needs to be included as part of get_data to return data in the form of a data generator defined by Keras or TensorFlow 2:

train_gen = ImageDataGenerator(rotation_range=8,
                               width_shift_range=0.08,
                               shear_range=0.3,
                               height_shift_range=0.08,
                               zoom_range=0.08)

train_datagenerator = train_gen.flow(
    x_train, y_train, batch_size=64)

return train_datagenerator
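If it helps to see what a data generator does without the Keras dependency, the batching behavior can be sketched with a plain Python generator over numpy arrays. This illustrates only the iteration contract (endless mini-batches), not the augmentation that ImageDataGenerator performs:

```python
import numpy as np

def batch_generator(x, y, batch_size=64):
    """Yield (x_batch, y_batch) pairs endlessly, mimicking the
    iteration behavior a Keras generator provides to model.fit()."""
    n = len(x)
    while True:  # Keras-style generators loop indefinitely
        idx = np.random.permutation(n)  # reshuffle each epoch
        for start in range(0, n, batch_size):
            sel = idx[start:start + batch_size]
            yield x[sel], y[sel]

# Dummy data shaped like MNIST images with a channel axis.
x_train = np.zeros((256, 28, 28, 1))
y_train = np.zeros(256)

gen = batch_generator(x_train, y_train, batch_size=64)
xb, yb = next(gen)
print(xb.shape)  # (64, 28, 28, 1)
```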

Returning data as numpy arrays

The following is a code example of the MNIST data handler, which returns the data in the format of numpy arrays.

import numpy as np

# imports from ibmfl
from ibmfl.data.data_handler import DataHandler
from ibmfl.exceptions import FLException



class MnistKerasDataHandler(DataHandler):
    """
    Data handler for MNIST dataset.
    """

    def __init__(self, data_config=None, channels_first=False):
        super().__init__()
        self.file_name = None
        # `data_config` loads anything inside the `info` part of the `data` section. 
        if data_config is not None:
            # this example assumes the local dataset is in .npz format, so it searches for it.
            if 'npz_file' in data_config: 
                self.file_name = data_config['npz_file']
        self.channels_first = channels_first
        
        if self.file_name is None:
            raise FLException('No data file name is provided to load the dataset.')
        else:
            try:
                data_train = np.load(self.file_name)
                self.x_train = data_train['x_train']
                self.y_train = data_train['y_train']
                self.x_test = data_train['x_test']
                self.y_test = data_train['y_test']
            except Exception:
                raise IOError('Unable to load training data from path '
                              'provided in config file: ' +
                              self.file_name)
            self.preprocess_data()

    def get_data(self):
        """
        Gets pre-processed mnist training and testing data. 

        :return: training and testing data
        :rtype: `tuple`
        """
        return (self.x_train, self.y_train), (self.x_test, self.y_test)

    def preprocess_data(self):
        """
        Preprocesses the training and testing dataset.

        :return: None
        """
        num_classes = 10
        img_rows, img_cols = 28, 28
        if self.channels_first:
            self.x_train = self.x_train.reshape(self.x_train.shape[0], 1, img_rows, img_cols)
            self.x_test = self.x_test.reshape(self.x_test.shape[0], 1, img_rows, img_cols)
        else:
            self.x_train = self.x_train.reshape(self.x_train.shape[0], img_rows, img_cols, 1)
            self.x_test = self.x_test.reshape(self.x_test.shape[0], img_rows, img_cols, 1)

        print('x_train shape:', self.x_train.shape)
        print(self.x_train.shape[0], 'train samples')
        print(self.x_test.shape[0], 'test samples')

        # convert class vectors to binary class matrices
        self.y_train = np.eye(num_classes)[self.y_train]
        self.y_test = np.eye(num_classes)[self.y_test]
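The handler above expects an .npz archive containing x_train, y_train, x_test, and y_test arrays, with integer class labels (the np.eye one-hot step relies on that). A quick way to produce a compatible file for local testing, sketched here with a temporary directory:

```python
import os
import tempfile

import numpy as np

# Build a tiny .npz file in the layout MnistKerasDataHandler expects.
rng = np.random.default_rng(0)
x_train = rng.integers(0, 256, size=(20, 28, 28), dtype=np.uint8)
y_train = rng.integers(0, 10, size=20)   # integer labels, required for np.eye
x_test = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)
y_test = rng.integers(0, 10, size=5)

path = os.path.join(tempfile.mkdtemp(), "example_data.npz")
np.savez(path, x_train=x_train, y_train=y_train,
         x_test=x_test, y_test=y_test)

# The one-hot conversion in preprocess_data works like this:
one_hot = np.eye(10)[y_train]
print(one_hot.shape)  # (20, 10)
```

The resulting path is what you would supply as npz_file in the info section of the data configuration.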

Scikit-learn model configuration

If you chose Scikit-learn (SKLearn) as the model framework, you need to configure your settings to save the model trained in Federated Learning as a pickle file. Specify your model with one of the following code examples, depending on the model type that you select for SKLearn.

XGBoost classification

# XGBoost classification model
# You can choose your own loss function by changing the content for 'loss'.
# In the following example, we choose 'binary_crossentropy'
# for a binary classification example.
# If you want to train a multiclass classification problem, choose
# 'categorical_crossentropy'.
# You can also choose 'auto' to allow IBM FL to choose the correct loss for you.

spec = {
    'global': {
        'learning_rate': 0.1,
        'loss': 'binary_crossentropy',
        'max_bins': 255,
        'max_depth': None,
        'max_iter': 100,
        'verbose': True,
        'num_classes': 2
    }
}
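To see why 'auto' is convenient, the selection it implies can be sketched as a small helper. This is a hypothetical illustration of the idea, not IBM FL source code:

```python
# Hypothetical sketch of how a framework might resolve loss='auto'
# from the num_classes setting (illustrative only, not IBM FL code).
def resolve_loss(loss, num_classes):
    if loss != 'auto':
        return loss  # an explicit choice is kept as-is
    # Binary problems use binary cross-entropy; multiclass problems
    # use categorical cross-entropy.
    return ('binary_crossentropy' if num_classes == 2
            else 'categorical_crossentropy')

print(resolve_loss('auto', 2))   # binary_crossentropy
print(resolve_loss('auto', 10))  # categorical_crossentropy
```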

XGBoost regression

# XGBoost regression model
# You can choose your own loss function by changing the content for 'loss'.
# In the following example, we choose 'least_squares' for a regression example.
# You can also choose 'auto' to allow IBM FL to choose the correct loss for you.

spec = {
    'global': {
        'learning_rate': 0.1,
        'loss': 'least_squares',
        'max_bins': 255,
        'max_depth': None,
        'max_iter': 100,
        'verbose': True
    }
}

SKLearn classification

# SKLearn classification
# Specify your model. Users need to provide the classes used in classification problems.
# In the example, there are 10 classes.

model = SGDClassifier(loss='log', penalty='l2')
model.classes_ = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])  

# Define the path and save the model as a pickle file

if not os.path.exists(folder_configs):
    os.makedirs(folder_configs)
fname = os.path.join(folder_configs, 'model_architecture.pickle')
with open(fname, 'wb') as f:
    joblib.dump(model, f)

# Generate model spec:
spec = {'model_definition': fname}

SKLearn regression

# Sklearn regression 
# create a sklearn regression model

model = SGDRegressor(loss='huber', penalty='l2')

# specify/create a directory where you want to save the model file

if not os.path.exists(folder_configs):
    os.makedirs(folder_configs)
fname = os.path.join(folder_configs, 'model_architecture.pickle')

# save the model as a pickle file

with open(fname, 'wb') as f:
    pickle.dump(model, f)

SKLearn KMeans

# SKLearn KMeans

def get_model_config(folder_configs, dataset, is_agg=False, party_id=0):

    model = KMeans()

    # Save model
    fname = os.path.join(folder_configs, 'kmeans-central-model.pickle')
    with open(fname, 'wb') as f:
        pickle.dump(model, f)
    # Generate model spec:
    spec = {
        'model_name': 'sklearn-kmeans',
        'model_definition': fname
    }

    model = {
        'name': 'SklearnKMeansFLModel',
        'path': 'ibmfl.model.sklearn_kmeans_fl_model',
        'spec': spec
    }

    return model

TensorFlow 2 model configuration

Here is an example of a TensorFlow 2 model configuration.

img_rows, img_cols = 28, 28
batch_size = 28
input_shape = (batch_size, img_rows, img_cols, 1)
sample_input = np.zeros(shape=input_shape)

class MyModel(Model):
    def __init__(self):
        super(MyModel, self).__init__()
        self.conv1 = Conv2D(32, 3, activation='relu')
        self.flatten = Flatten()
        self.d1 = Dense(128, activation='relu')
        self.d2 = Dense(10)

    def call(self, x):
        x = self.conv1(x)
        x = self.flatten(x)
        x = self.d1(x)
        return self.d2(x)

# Create an instance of the model
model = MyModel()
loss_object = tf.keras.losses.SparseCategoricalCrossentropy(
    from_logits=True)
optimizer = tf.keras.optimizers.Adam()
acc = tf.keras.metrics.SparseCategoricalAccuracy(name='accuracy')
model.compile(optimizer=optimizer, loss=loss_object, metrics=[acc])
model._set_inputs(sample_input)

if not os.path.exists(folder_configs):
    os.makedirs(folder_configs)

model.save(folder_configs)

To save the model as a SavedModel file, you can use a configuration like the following (here 'accuracy' stands in for whatever metrics you want to track):

model = keras.Sequential([
    keras.layers.Dense(16, activation='relu', input_shape=(11,)),
    keras.layers.Dropout(0.5),
    keras.layers.Dense(1, activation='sigmoid')
])
model.compile(optimizer=keras.optimizers.Adam(lr=2e-2),
              loss=keras.losses.binary_crossentropy,
              metrics=['accuracy'])