Abstract
Standard centralized machine learning applications require the participants to upload
their personal data to a central cloud for model training, which significantly compromises
the users' privacy. Federated learning is an emerging privacy-preserving machine
learning paradigm proposed to alleviate this issue. It is inherently a distributed machine
learning framework that enables multiple users to collaboratively train a global model
without sharing their local training data, thus preventing individuals' data from being
revealed.
However, federated learning often consumes far more communication resources than
centralized learning, since model parameters must be repeatedly uploaded and downloaded between the clients and the server during training. To mitigate this issue,
we propose a multi-objective evolutionary federated learning framework that reduces the
communicated model size without a noticeable performance drop, thereby lowering the communication costs of federated learning. In addition, a modified sparse evolutionary training (SET) algorithm is
adopted to improve the encoding scalability and further reduce the model size. This
approach is intrinsically an offline evolutionary optimization framework, however: the models re-initialized at each generation suffer a dramatic performance degradation, which makes
offline optimization infeasible for real-time federated learning systems.
Therefore, we extend our previous offline method into a novel real-time federated evolutionary neural architecture search (NAS) framework named RT-FedEvoNAS.
By real-time (online) federated evolutionary NAS, we mean that the neural network
models are already in use during the search process and that, with the help of the
proposed double-sampling strategy, each client trains only one sub-model per generation. In addition, all the searched neural network models of the final generation are
already well trained and need not be re-trained from scratch, keeping the
communication costs minimal.
Another challenge is that federated learning cannot fully guarantee local privacy:
the training data are still at risk of being disclosed through the uploaded model parameters. To address this concern, homomorphic encryption (HE) is commonly
applied in federated learning by encrypting the model parameters on each client before
sending them to the server. However, most HE-based FL systems are not efficient enough and require a trusted third party for key-pair generation. Therefore, we propose
a distributed additive encryption and quantization federated deep learning framework,
in which the key pairs are generated jointly by the server and the clients without the help of
an extra trusted third party. In addition, a ternary gradient quantization and an aggregation approximation strategy are adopted to simultaneously reduce the communication
and local computational costs.
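The ternary gradient quantization mentioned above can be illustrated with a minimal TernGrad-style sketch. This is an editor's illustration under assumptions, not the thesis implementation: the function name and the unbiased stochastic-rounding variant are assumed, and a real system would quantize per layer and pack the ternary values into two bits each.

```python
import numpy as np

def ternary_quantize(grad, rng=None):
    """Stochastically map each gradient entry to {-s, 0, +s}, where
    s = max|grad|, so that the result equals grad in expectation."""
    rng = np.random.default_rng() if rng is None else rng
    s = np.max(np.abs(grad))
    if s == 0.0:
        return np.zeros_like(grad)
    # Keep entry i with probability |g_i| / s; otherwise send 0.
    keep = rng.random(grad.shape) < np.abs(grad) / s
    return s * np.sign(grad) * keep

g = np.array([0.9, -0.2, 0.05, -0.9])
q = ternary_quantize(g, np.random.default_rng(1))
# Every entry of q lies in {-0.9, 0.0, +0.9}; only the scalar s and
# the ternary signs need to be communicated to the server.
```

Because each client uploads one float plus two bits per parameter instead of a full 32-bit value, the upload volume shrinks by roughly 16x at the cost of added gradient variance.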
Different from the aforementioned methods, which focus on training parametric models in horizontal federated learning, our last work addresses learning non-parametric models in
vertical federated learning. Non-parametric models such as gradient boosting decision trees
(GBDTs) are commonly used in previous work on federated learning over vertically
partitioned data. However, these approaches assume that all the training data labels are
stored on a single (guest) client, which rarely holds in real-world applications.
Therefore, we propose a secure vertical federated learning framework that trains GBDTs
with data labels distributed over multiple devices. A novel secure protocol is proposed
that designates a source client and a split client for each node split, thus preventing both the data features and the labels from being disclosed. Moreover, a partial differential privacy scheme
is introduced that adds Gaussian noise to the leaf weights before sending them to the
source clients, protecting the predictions.
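The noise addition on the leaf weights follows the standard Gaussian mechanism of differential privacy; a minimal sketch is given below. The helper name, its signature, and the use of the classical (epsilon, delta) calibration are the editor's assumptions for illustration, not the exact scheme of the thesis.

```python
import numpy as np

def privatize_leaf_weights(weights, sensitivity, epsilon, delta, rng=None):
    """Gaussian mechanism: perturb GBDT leaf weights before they leave
    the client, giving (epsilon, delta)-differential privacy for the
    released values (hypothetical helper, not the thesis code)."""
    rng = np.random.default_rng() if rng is None else rng
    # Classical calibration of the noise scale to the L2 sensitivity.
    sigma = sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon
    return weights + rng.normal(0.0, sigma, size=np.shape(weights))

leaf_weights = np.array([0.31, -0.12, 0.07])
noisy = privatize_leaf_weights(leaf_weights, sensitivity=1.0,
                               epsilon=1.0, delta=1e-5,
                               rng=np.random.default_rng(0))
```

The source clients then aggregate the noisy leaf weights for prediction, so an honest-but-curious recipient cannot recover the exact label statistics behind any single leaf.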
All the proposed methods are empirically validated on both benchmark datasets and
real-world datasets, and the experimental results show that the approaches introduced
in this thesis are effective for building a secure and efficient federated learning system.