Abstract
Recent advances in Big Data Analytics are primarily driven by innovations in Artificial Intelligence and Machine Learning methods. Given the richness of data sources at the edge and increasing privacy concerns, distributed privacy-preserving machine learning (ML) methods are increasingly becoming the norm for training ML models on federated big data. In a popular approach known as Federated Learning (FL), service providers leverage end-user data to train ML models that improve services such as text auto-completion, virtual keyboards, and item recommendations. FL is expected to grow in importance with the increasing focus on big data, privacy, and 5G/6G technologies. However, FL faces significant challenges such as heterogeneity, communication overheads, and privacy preservation. In practice, training models via FL is time-intensive and, worse, depends on the participation of clients who may not always be available to join the training. Our empirical analysis shows that client availability can significantly impact model quality, which motivates the design of an availability-aware selection scheme. We propose A2FL to mitigate the quality degradation caused by under-representation of the global client population by prioritizing the least available clients. Our results show that, compared to state-of-the-art methods, A2FL improves client diversity during training and hence boosts the quality of the trained model.
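The availability-aware selection idea can be illustrated with a minimal sketch. This is not A2FL's actual algorithm; the function names, the exponential-moving-average availability estimate, and the greedy pick of the least-available online clients are all assumptions made for illustration only.

```python
def select_clients(availability, online, k):
    """Pick k participants from the currently online clients,
    prioritizing those with the lowest historical availability
    (a hypothetical availability-aware selection rule)."""
    # Rank online clients from least to most available so far.
    ranked = sorted(online, key=lambda c: availability[c])
    return ranked[:k]

def update_availability(availability, online, alpha=0.1):
    """Track each client's availability as an exponential moving
    average of whether it was online in the current round."""
    for c in availability:
        seen = 1.0 if c in online else 0.0
        availability[c] = (1 - alpha) * availability[c] + alpha * seen
```

In each training round, the server would call `update_availability` with the set of currently reachable clients and then `select_clients` to form the round's cohort, so that rarely seen clients are preferentially included when they do appear.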