Optimizing Machine Learning on Apache Spark in HPC Environments

Zhenyu Li; James Davis; Stephen A. Jarvis

doi:10.1109/MLHPC.2018.00006

Conference proceeding

PROCEEDINGS OF 2018 IEEE/ACM MACHINE LEARNING IN HPC ENVIRONMENTS (MLHPC 2018), pp.95-105

01/01/2018

DOI: https://doi.org/10.1109/MLHPC.2018.00006

Abstract

Computer Science

Computer Science, Artificial Intelligence

Computer Science, Theory & Methods

Engineering

Engineering, Electrical & Electronic

Science & Technology

Technology

Machine learning has established itself as a powerful tool for the construction of decision-making models and algorithms through the use of statistical techniques on training data. However, a significant impediment to its progress is the time spent training and improving the accuracy of these models: this is a data- and compute-intensive process, which can often take days, weeks or even months to complete. A common approach to accelerating this process is to employ multiple machines simultaneously, a trait shared with the field of High Performance Computing (HPC) and its clusters. However, existing distributed frameworks for data analytics and machine learning are designed for commodity servers; they do not realize the full potential of an HPC cluster, and thus deny the effective use of a readily available and potentially useful resource. In this work we adapt Apache Spark, a distributed data-flow framework, to support machine learning in HPC environments. There are inherent challenges to using Spark in this context: memory management, communication costs and synchronization overheads all reduce its efficiency. To this end we introduce: (i) the application of MapRDD, a fine-grained distributed data representation; (ii) a task-based all-reduce implementation; and (iii) a new asynchronous Stochastic Gradient Descent (SGD) algorithm using non-blocking all-reduce. We demonstrate up to a 2.6x overall speedup (or an 11.2x theoretical speedup with an Nvidia K80 graphics card), an 82-91% compute ratio, and an 80% reduction in memory usage when training the GoogLeNet model to classify 10% of the ImageNet dataset on a 32-node cluster. We also demonstrate that the new asynchronous SGD achieves a convergence rate comparable to that of the synchronous method. With the increasing use of accelerator cards, larger cluster computers and deeper neural network models, we predict that a further 2x speedup (i.e. a 22.4x accumulated speedup) is obtainable with the new asynchronous SGD algorithm on heterogeneous clusters.
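
The difference between synchronous SGD and an asynchronous variant built on a non-blocking all-reduce can be illustrated with a small single-process simulation. The sketch below is not the authors' Spark implementation. It assumes a toy linear-regression problem, simulates four workers as plain Python lists, and models the non-blocking all-reduce as a one-step delay between launching a gradient reduction and applying its result. All names (`all_reduce_mean`, `local_gradient`, `train`, `make_shards`) and all parameter values are hypothetical.

```python
import random


def all_reduce_mean(vectors):
    """Stand-in for an all-reduce: every worker receives the element-wise mean."""
    n = len(vectors)
    mean = [sum(col) / n for col in zip(*vectors)]
    return [list(mean) for _ in vectors]


def local_gradient(w, batch):
    """Gradient of the mean squared error for y ~ w[0] * x + w[1] on one mini-batch."""
    g0 = g1 = 0.0
    for x, y in batch:
        err = w[0] * x + w[1] - y
        g0 += 2.0 * err * x
        g1 += 2.0 * err
    n = len(batch)
    return [g0 / n, g1 / n]


def train(shards, steps, lr, delay, batch_size=10, seed=1):
    """Data-parallel SGD over simulated workers.

    delay=0 applies each reduced gradient immediately (synchronous SGD).
    delay=1 applies the reduction launched one step earlier, which is an
    assumed model of overlapping communication with the next computation.
    """
    rng = random.Random(seed)
    w = [0.0, 0.0]
    in_flight = []  # reduced gradients that have been launched but not applied
    for _ in range(steps):
        grads = [local_gradient(w, rng.sample(shard, batch_size)) for shard in shards]
        in_flight.append(all_reduce_mean(grads)[0])
        if len(in_flight) > delay:
            reduced = in_flight.pop(0)
            w = [wi - lr * gi for wi, gi in zip(w, reduced)]
    return w


def make_shards(workers=4, per_worker=50, seed=0):
    """Synthetic data from y = 3x + 1 plus small noise, split across workers."""
    rng = random.Random(seed)
    shards = []
    for _ in range(workers):
        shard = []
        for _ in range(per_worker):
            x = rng.uniform(-1.0, 1.0)
            shard.append((x, 3.0 * x + 1.0 + rng.gauss(0.0, 0.01)))
        shards.append(shard)
    return shards


shards = make_shards()
w_sync = train(shards, steps=500, lr=0.1, delay=0)
w_async = train(shards, steps=500, lr=0.1, delay=1)
print("synchronous:", w_sync)
print("one-step delayed:", w_async)
```

In this toy setting both runs should recover weights close to the generating values (3, 1), which mirrors the abstract's claim of comparable convergence only in spirit; it says nothing about the paper's measured speedups.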

Metrics

2 Times Cited - Web of Science

Details

Title: Optimizing Machine Learning on Apache Spark in HPC Environments
Creators: Zhenyu Li - University of Warwick
James Davis - University of Warwick
Stephen A. Jarvis - University of Warwick
Publication Details: PROCEEDINGS OF 2018 IEEE/ACM MACHINE LEARNING IN HPC ENVIRONMENTS (MLHPC 2018), pp.95-105
Publisher: IEEE
Number of pages: 11
Publication Date: 01/01/2018
Grant note: Atos IT Services UK Ltd EP/L016400/1 / EPSRC Centre for Doctoral Training in Urban Science and Progress; UK Research & Innovation (UKRI); Engineering & Physical Sciences Research Council (EPSRC)
Identifiers: 991103791002346; WOS:000462382400010
Academic Unit: President & VC's Office (VC01)
Language: English
Resource Type: Conference proceeding
