Abstract
The sparsely-activated Mixture-of-Experts (MoE) technique enables scaling pre-trained models to trillions of parameters without a proportional increase in computational cost. However, in large-scale cloud computing environments, the dynamic load imbalance caused by the random expert selection of samples poses a major challenge to distributed training efficiency. To address this challenge, we propose BalanceMoE, a lightweight dynamic load-balancing framework with low communication overhead that accelerates MoE model training. BalanceMoE is built on two novel ideas. First, we model a worker-pair-based expert transfer mechanism that balances the cost of communicating expert parameters against the resulting reduction in iteration time. Based on a theoretical analysis, we design a highly lightweight algorithm that obtains a near-optimal load-balancing solution for reducing per-iteration time. Second, we propose a scheme that parallelizes expert computation and transfer, overlapping the parameter communication of transferred experts with the computation of non-transferred experts to further reduce per-iteration training time.
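To illustrate the second idea, the following minimal PyTorch sketch (not the authors' released implementation) overlaps asynchronous broadcasts of transferred experts' parameters with the computation of non-transferred experts. The function and argument names (forward_with_overlap, local_experts, src_ranks, inputs_per_expert) are hypothetical, and the sketch assumes an already-initialized torch.distributed process group.

```python
# Illustrative sketch only: overlap expert-parameter communication with
# computation of non-transferred experts via non-blocking collectives.
import torch
import torch.distributed as dist

def forward_with_overlap(local_experts, transferred_experts, inputs_per_expert, src_ranks):
    # 1. Start non-blocking parameter broadcasts for experts being moved to this worker.
    handles = []
    for expert, src in zip(transferred_experts, src_ranks):
        for p in expert.parameters():
            handles.append(dist.broadcast(p.data, src=src, async_op=True))

    # 2. Compute the non-transferred (local) experts while communication is in flight.
    outputs = [expert(x) for expert, x in zip(local_experts, inputs_per_expert["local"])]

    # 3. Wait for the parameter transfers to finish, then compute the transferred experts.
    for h in handles:
        h.wait()
    outputs += [expert(x) for expert, x in
                zip(transferred_experts, inputs_per_expert["transferred"])]
    return outputs
```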
We implement BalanceMoE on top of the PyTorch framework. Extensive experiments on two clusters demonstrate that, in terms of training speed, BalanceMoE achieves up to 1.26x, 1.79x, and 2.62x speedups over the state-of-the-art SmartMoE, FasterMoE, and FastMoE, respectively. In terms of memory usage, BalanceMoE saves up to 71% and 36% of memory compared to FasterMoE and FastMoE, respectively. In terms of energy consumption, BalanceMoE reduces the energy consumed per training iteration by up to 13% compared to SmartMoE. BalanceMoE's code is available at https://github.com/ZJU-CNLAB/BalanceMoE.