Abstract
The recent many-fold increase in the size of deep neural networks makes
efficient distributed training challenging. Many recent approaches exploit the
compressibility of the gradients and apply lossy compression techniques to
speed up the communication stage of distributed training. Nevertheless,
compression comes at the cost of reduced model quality and extra computation
overhead. In this work, we design an efficient compressor with minimal
overhead. Noting the sparsity of the gradients, we propose to model the
gradients as random variables distributed according to some sparsity-inducing
distributions (SIDs). We empirically validate our assumption by studying the
statistical characteristics of the evolution of gradient vectors over the
training process. We then propose Sparsity-Inducing Distribution-based
Compression (SIDCo), a threshold-based sparsification scheme that enjoys
similar threshold estimation quality to deep gradient compression (DGC) while
being faster thanks to its lower compression overhead. Our extensive evaluation
of popular machine learning benchmarks involving both recurrent neural network
(RNN) and convolutional neural network (CNN) models shows that SIDCo speeds up
training by up to 41.7×, 7.6×, and 1.9× compared to the no-compression
baseline, Top-k, and DGC compressors, respectively.
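
To illustrate the core idea of estimating a sparsification threshold from a fitted sparsity-inducing distribution, below is a minimal sketch in Python. It assumes the absolute gradient values follow an exponential distribution, which is only one example of an SID; the function names (`estimate_threshold`, `sparsify`) and the target ratio `delta` are illustrative and not the paper's actual API or multi-stage fitting procedure.

```python
import numpy as np

def estimate_threshold(grad, delta):
    """Estimate a sparsification threshold from an exponential fit to |grad|.

    If |g| ~ Exp(lambda), then P(|g| > t) = exp(-lambda * t). Setting this
    equal to the target compression ratio `delta` and plugging in the
    maximum-likelihood estimate lambda_hat = 1 / mean(|g|) yields a
    closed-form threshold, avoiding the full sort exact Top-k requires.
    """
    mean_abs = np.abs(grad).mean()      # MLE of 1/lambda for the exponential fit
    return -mean_abs * np.log(delta)    # t such that ~delta of entries exceed it

def sparsify(grad, delta=0.001):
    """Keep (approximately) the delta fraction of largest-magnitude entries."""
    t = estimate_threshold(grad, delta)
    mask = np.abs(grad) > t
    return np.nonzero(mask)[0], grad[mask]

# Usage: compress a synthetic gradient vector to roughly 0.1% of its entries.
g = np.random.laplace(scale=1e-3, size=1_000_000).astype(np.float32)
idx, vals = sparsify(g, delta=0.001)
print(f"kept {idx.size} of {g.size} entries ({idx.size / g.size:.4%})")
```

The sketch shows why a statistical fit can replace sorting: the threshold comes from a single pass over the gradient (computing the mean absolute value), so its cost is far lower than exact Top-k selection while still targeting a chosen compression ratio.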