Additional details for Federated Learning implementation
This document provides additional examples of Federated Learning configurations that require more extensive code implementation.
Cloud open beta
This is a Cloud open beta and is not supported for use in production environments.
- Log level configuration
- Party yml file
- Scikit-learn model configuration
- Tensorflow 2 model configuration
Party yml configuration
Besides the party connector script, the admin can also send each party a yml configuration file that they can run as an alternative way to connect to the Federated Learning experiment. These are the steps to configure and run it:
- Modify the yml file as follows:

aggregator:
  ip: [CPD_HOSTNAME]/ml/v4/trainings/[TRAINING_ID]
connection:
  info:
    id: [REMOTE_TRAINING_SYSTEM_ID]
# Supply the name of the data handler class and path to it.
# The info section may be used to pass information to the
# data handler.
# For example,
# "data": {
#   "name": "MnistSklearnDataHandler",
#   "path": "example.mnist_sklearn_data_handler",
#   "info": {
#     "npz_file": "./example_data/example_data.npz"
#   },
# },
"data": {
  "name": "<data handler>",
  "path": "<path to data handler>",
  "info": {
    <information to pass to data handler>
    # For example:
    # "train_file": "./mnist-keras-train.pkl",
    # "test_file": "./mnist-keras-test.pkl"
  },
},
local_training:
  name: LocalTrainingHandler
  path: ibmfl.party.training.local_training_handler
protocol_handler:
  name: PartyProtocolHandler
  path: ibmfl.party.party_protocol_handler

- Have each party run the yml file with this command: python -m ibmfl.party.party <config file> <Bearer token> <log_level>. Note: When the party runs the yml file from the CLI, the -s flag needs to be passed, as in this command: python -m ibmfl.party.party -s <config file> <Bearer token> <log_level>, where <config file> refers to the party yml file path. More information is available on getting the bearer token and on log level configuration.
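For instance, a party might launch with a command along these lines; the config file name, token variable, and log level value shown here are hypothetical placeholders, not values from your deployment:

# Hypothetical example invocation; substitute your own party yml path, bearer token, and log level.
python -m ibmfl.party.party -s party_config.yml $TOKEN INFO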
Returning a data generator defined by Keras or Tensorflow 2
The following is a code example that needs to be included as part of get_data to return data in the form of a data generator defined by Keras or Tensorflow 2:
# Requires: from tensorflow.keras.preprocessing.image import ImageDataGenerator
train_gen = ImageDataGenerator(rotation_range=8,
                               width_shift_range=0.08,
                               shear_range=0.3,
                               height_shift_range=0.08,
                               zoom_range=0.08)
train_datagenerator = train_gen.flow(
    x_train, y_train, batch_size=64)
return train_datagenerator
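For context, here is a minimal sketch of where such a fragment could live inside a custom data handler; the class name and the x_train/y_train attributes are illustrative assumptions, not part of IBM FL:

from ibmfl.data.data_handler import DataHandler
from tensorflow.keras.preprocessing.image import ImageDataGenerator

class GeneratorDataHandler(DataHandler):
    # Hypothetical handler: loading of self.x_train and self.y_train is omitted here.
    def get_data(self):
        # Return the training data wrapped in a Keras data generator.
        train_gen = ImageDataGenerator(rotation_range=8,
                                       width_shift_range=0.08,
                                       shear_range=0.3,
                                       height_shift_range=0.08,
                                       zoom_range=0.08)
        train_datagenerator = train_gen.flow(
            self.x_train, self.y_train, batch_size=64)
        return train_datagenerator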
Returning data as numpy arrays
The following is a code example of the MNIST data handler, which returns the data in the format of numpy arrays.
import numpy as np

# imports from ibmfl
from ibmfl.data.data_handler import DataHandler
from ibmfl.exceptions import FLException


class MnistKerasDataHandler(DataHandler):
    """
    Data handler for MNIST dataset.
    """

    def __init__(self, data_config=None, channels_first=False):
        super().__init__()
        self.file_name = None
        # `data_config` loads anything inside the `info` part of the `data` section.
        if data_config is not None:
            # this example assumes the local dataset is in .npz format, so it searches for it.
            if 'npz_file' in data_config:
                self.file_name = data_config['npz_file']
        self.channels_first = channels_first

        if self.file_name is None:
            raise FLException('No data file name is provided to load the dataset.')
        else:
            try:
                data_train = np.load(self.file_name)
                self.x_train = data_train['x_train']
                self.y_train = data_train['y_train']
                self.x_test = data_train['x_test']
                self.y_test = data_train['y_test']
            except Exception:
                raise IOError('Unable to load training data from path '
                              'provided in config file: ' +
                              self.file_name)
        self.preprocess_data()

    def get_data(self):
        """
        Gets pre-processed mnist training and testing data.

        :return: training and testing data
        :rtype: `tuple`
        """
        return (self.x_train, self.y_train), (self.x_test, self.y_test)

    def preprocess_data(self):
        """
        Preprocesses the training and testing dataset.

        :return: None
        """
        num_classes = 10
        img_rows, img_cols = 28, 28

        if self.channels_first:
            self.x_train = self.x_train.reshape(self.x_train.shape[0], 1, img_rows, img_cols)
            self.x_test = self.x_test.reshape(self.x_test.shape[0], 1, img_rows, img_cols)
        else:
            self.x_train = self.x_train.reshape(self.x_train.shape[0], img_rows, img_cols, 1)
            self.x_test = self.x_test.reshape(self.x_test.shape[0], img_rows, img_cols, 1)

        print('x_train shape:', self.x_train.shape)
        print(self.x_train.shape[0], 'train samples')
        print(self.x_test.shape[0], 'test samples')

        # convert class vectors to binary class matrices
        self.y_train = np.eye(num_classes)[self.y_train]
        self.y_test = np.eye(num_classes)[self.y_test]
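As a rough local usage sketch, the handler can be instantiated with a data_config dictionary that mirrors the info block of the party configuration; the .npz path below reuses the placeholder from the yml example:

# Hypothetical local test of the data handler; the .npz path is a placeholder.
data_config = {'npz_file': './example_data/example_data.npz'}
data_handler = MnistKerasDataHandler(data_config=data_config)
(x_train, y_train), (x_test, y_test) = data_handler.get_data()
print(x_train.shape, y_train.shape)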
Scikit-learn model configuration
If you chose Scikit-learn (SKLearn) as the model framework, you need to configure your settings so that the model trained in Federated Learning is saved as a pickle file. Specify your model by following the code example that matches the model type you selected for SKLearn, and implement the corresponding methods as part of your model file.
XGBoost classification
# XGBoost classification model
# you can choose your own loss function by changing the content for 'loss'.
# In the following example, we choose `binary_crossentropy`
# for a binary classification example.
# If you want to train a multiclass classification problem, you need to choose
# 'categorical_crossentropy'.
# You can also choose 'auto' to allow IBM FL to choose the correct loss for you.
spec = {
'global': {
'learning_rate': 0.1,
'loss': 'binary_crossentropy',
'max_bins': 255,
'max_depth': None,
'max_iter': 100,
'verbose': True,
'num_classes': 2
}
}
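Following the comments above, a multiclass variant of this spec switches the loss and the class count. A sketch, where the value of num_classes is only an example:

# Sketch of a multiclass classification spec; 'num_classes': 10 is an illustrative value.
spec = {
    'global': {
        'learning_rate': 0.1,
        'loss': 'categorical_crossentropy',
        'max_bins': 255,
        'max_depth': None,
        'max_iter': 100,
        'verbose': True,
        'num_classes': 10
    }
}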
XGBoost regression
# XGBoost regression model
# You can choose your own loss function by changing the content for 'loss'.
# In the following example, we choose 'least_squares' for a regression example.
spec = {
'global': {
'learning_rate': 0.1,
'loss': 'least_squares',
'max_bins': 255,
'max_depth': None,
'max_iter': 100,
'verbose': True
}
}
SKLearn classification
# SKLearn classification
import os

import joblib
import numpy as np
from sklearn.linear_model import SGDClassifier

# Specify your model. Users need to provide the classes used in classification problems.
# In the example, there are 10 classes.
model = SGDClassifier(loss='log', penalty='l2')
model.classes_ = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

# Define the path and save the model as a pickle file.
# folder_configs is the directory where you want to save the model file.
if not os.path.exists(folder_configs):
    os.makedirs(folder_configs)
fname = os.path.join(folder_configs, 'model_architecture.pickle')
with open(fname, 'wb') as f:
    joblib.dump(model, f)

# Generate model spec:
spec = {'model_definition': fname}
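As an optional sanity check (just a sketch), the pickled model definition can be reloaded with joblib before it is used:

# Optional check: reload the saved model definition and inspect it.
loaded_model = joblib.load(fname)
print(type(loaded_model), loaded_model.classes_)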
SKLearn regression
# Sklearn regression
import os
import pickle

from sklearn.linear_model import SGDRegressor

# Create a sklearn regression model
model = SGDRegressor(loss='huber', penalty='l2')

# Specify/create a directory where you want to save the model file (folder_configs)
if not os.path.exists(folder_configs):
    os.makedirs(folder_configs)
fname = os.path.join(folder_configs, 'model_architecture.pickle')

# Save the model as a pickle file
with open(fname, 'wb') as f:
    pickle.dump(model, f)
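By analogy with the classification example above, you would then typically point a model spec at the saved file. This sketch simply mirrors that pattern; confirm the exact fields required for your setup:

# Sketch, mirroring the classification example: reference the pickled model in the spec.
spec = {'model_definition': fname}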
SKLearn Kmeans
# SKLearn Kmeans
import os
import pickle

from sklearn.cluster import KMeans

def get_model_config(folder_configs, dataset, is_agg=False, party_id=0):
    model = KMeans()

    # Save model
    fname = os.path.join(folder_configs, 'kmeans-central-model.pickle')
    with open(fname, 'wb') as f:
        pickle.dump(model, f)

    # Generate model spec:
    spec = {
        'model_name': 'sklearn-kmeans',
        'model_definition': fname
    }
    model = {
        'name': 'SklearnKMeansFLModel',
        'path': 'ibmfl.model.sklearn_kmeans_fl_model',
        'spec': spec
    }

    return model
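A hedged usage sketch of this helper; the folder name and dataset label passed here are placeholders:

# Hypothetical call; 'configs' and 'mnist' are placeholder arguments.
os.makedirs('configs', exist_ok=True)
model_config = get_model_config('configs', 'mnist')
print(model_config['spec']['model_definition'])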
Tensorflow 2 model configuration
Here is an example of a Tensorflow 2 model configuration.
import os

import numpy as np
import tensorflow as tf
from tensorflow.keras import Model
from tensorflow.keras.layers import Conv2D, Dense, Flatten

img_rows, img_cols = 28, 28
batch_size = 28
input_shape = (batch_size, img_rows, img_cols, 1)
sample_input = np.zeros(shape=input_shape)

class MyModel(Model):
    def __init__(self):
        super(MyModel, self).__init__()
        self.conv1 = Conv2D(32, 3, activation='relu')
        self.flatten = Flatten()
        self.d1 = Dense(128, activation='relu')
        self.d2 = Dense(10)

    def call(self, x):
        x = self.conv1(x)
        x = self.flatten(x)
        x = self.d1(x)
        return self.d2(x)

# Create an instance of the model
model = MyModel()

loss_object = tf.keras.losses.SparseCategoricalCrossentropy(
    from_logits=True)
optimizer = tf.keras.optimizers.Adam()
acc = tf.keras.metrics.SparseCategoricalAccuracy(name='accuracy')

model.compile(optimizer=optimizer, loss=loss_object, metrics=[acc])
model._set_inputs(sample_input)

# folder_configs is the directory where the model is saved (defined elsewhere in your script).
if not os.path.exists(folder_configs):
    os.makedirs(folder_configs)
model.save(folder_configs)
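Optionally, and only as a sketch, the saved model can be loaded back to confirm that it was written correctly:

# Optional check: reload the SavedModel directory written above and run a forward pass.
loaded = tf.keras.models.load_model(folder_configs)
print(loaded.predict(sample_input).shape)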
To save the model as a SavedModel file, you can add the following configuration:
from tensorflow import keras

# `metrics` is assumed to be defined earlier in your script, for example metrics = ['accuracy'].
model = keras.Sequential([
    keras.layers.Dense(16, activation='relu', input_shape=(11,)),
    keras.layers.Dropout(0.5),
    keras.layers.Dense(1, activation='sigmoid')])
model.compile(optimizer=keras.optimizers.Adam(learning_rate=2e-2),
              loss=keras.losses.binary_crossentropy,
              metrics=metrics)
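The snippet above builds and compiles the model; writing it out in the SavedModel format would then use a call along these lines, where the target directory name is a placeholder:

# Sketch: save in the TensorFlow SavedModel format; 'saved_model_dir' is a placeholder path.
model.save('saved_model_dir', save_format='tf')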