Abstract
Model-merging has emerged as a powerful approach in deep learning, capable of
enhancing model performance without any training. However, the underlying
mechanisms that explain its effectiveness remain largely unexplored. In this
paper, we investigate this technique from three novel perspectives to
empirically provide deeper insights into why and how weight-averaged
model-merging~\cite{wortsman2022soups} works: (1) we examine the intrinsic
patterns captured in the learned model weights, and we are the first to connect
the structured patterns these weights encode with why weight-averaged model
merging can work; (2) we compare averaging on weights with averaging on
features, providing analyses across diverse architectures on multiple
datasets; and (3) we explore the impact of changing parameter magnitudes on the
prediction stability of model merging, revealing that weight averaging acts as
a form of regularization by showing its robustness
across different parameter scales. The code is available at
https://github.com/billhhh/Rethink-Merge.
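
To make the contrast in point (2) concrete, the following is a minimal sketch, not the paper's exact implementation, of the two averaging schemes under the assumption of several independently trained models that share one architecture: weight averaging produces a single merged model by uniformly averaging parameters, whereas feature averaging keeps all models and averages their outputs at inference time. The toy models and tensor shapes below are illustrative only.

\begin{verbatim}
import copy
import torch
import torch.nn as nn


def average_weights(models):
    """Uniform weight-space average (model-soup style) of
    models that share the same architecture."""
    merged = copy.deepcopy(models[0])
    merged_state = merged.state_dict()
    for key in merged_state:
        # Stack the corresponding parameter from every model
        # and take the element-wise mean.
        merged_state[key] = torch.stack(
            [m.state_dict()[key].float() for m in models], dim=0
        ).mean(dim=0)
    merged.load_state_dict(merged_state)
    return merged


def average_features(models, x):
    """Output/feature-space average: run every model on the
    same input and average their outputs."""
    with torch.no_grad():
        outputs = [m(x) for m in models]
    return torch.stack(outputs, dim=0).mean(dim=0)


if __name__ == "__main__":
    # Hypothetical toy models standing in for fine-tuned checkpoints.
    models = [nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 10))
              for _ in range(3)]
    x = torch.randn(4, 16)

    merged = average_weights(models)
    print("weight-averaged output:", merged(x).shape)
    print("feature-averaged output:", average_features(models, x).shape)
\end{verbatim}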