Abstract
With the rapid development of information acquisition technology, multi-modality data, e.g., text, audio, images, and even video, is growing quickly in health care, multimedia retrieval, and scientific research. For clustering, classification, or regression with multi-modality information, it is essential to effectively measure the distance or similarity between objects described by heterogeneous features. Metric learning, which aims to find a task-oriented distance function, is an active topic in machine learning. However, most existing algorithms are inefficient on high-dimensional multi-modality tasks. In this work, we develop an effective and efficient metric learning algorithm for multi-modality data, Efficient Multi-modal Geometric Mean Metric Learning (EMGMML). The proposed algorithm learns a distinct distance metric for each view by minimizing the distance between similar pairs while maximizing the distance between dissimilar pairs. To avoid overfitting, the optimization objective is regularized by the symmetrized LogDet divergence. EMGMML is very efficient because each distance metric has a closed-form solution. Experimental results show that the proposed algorithm outperforms state-of-the-art metric learning methods in both accuracy and efficiency.
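
The abstract does not state the closed-form solution itself. As a rough illustration of why a geometric-mean formulation can be solved without iteration, the sketch below computes a metric for a single view under the assumption that the objective is tr(AS) + tr(A^{-1}D), where S and D are scatter matrices of similar and dissimilar pairs. Its minimizer is the matrix geometric mean of S^{-1} and D. The function names, the `eps` ridge term, and the toy data are all hypothetical; the ridge is only a stand-in to keep the matrices invertible and is not the symmetrized LogDet regularizer used by EMGMML. How EMGMML couples the per-view metrics is not shown.

```python
import numpy as np


def spd_power(M, p):
    """Raise a symmetric positive-definite matrix to the real power p."""
    w, V = np.linalg.eigh(M)
    return (V * w**p) @ V.T


def gmml_metric(X, similar_pairs, dissimilar_pairs, eps=1e-6):
    """Single-view geometric-mean metric (illustrative sketch only).

    X                : (n_samples, n_features) array for one view
    similar_pairs    : iterable of (i, j) index pairs that should be close
    dissimilar_pairs : iterable of (i, j) index pairs that should be far
    eps              : hypothetical ridge term keeping S and D invertible
    """
    d = X.shape[1]
    S = eps * np.eye(d)  # scatter of similar-pair differences
    D = eps * np.eye(d)  # scatter of dissimilar-pair differences
    for i, j in similar_pairs:
        diff = X[i] - X[j]
        S += np.outer(diff, diff)
    for i, j in dissimilar_pairs:
        diff = X[i] - X[j]
        D += np.outer(diff, diff)

    # Geometric mean of S^{-1} and D:
    #   A = S^{-1/2} (S^{1/2} D S^{1/2})^{1/2} S^{-1/2}
    S_half = spd_power(S, 0.5)
    S_inv_half = spd_power(S, -0.5)
    middle = S_half @ D @ S_half
    middle = 0.5 * (middle + middle.T)  # remove round-off asymmetry
    A = S_inv_half @ spd_power(middle, 0.5) @ S_inv_half
    return A, S, D


def mahalanobis_sq(A, x, y):
    """Squared distance (x - y)^T A (x - y) under the learned metric A."""
    diff = x - y
    return float(diff @ A @ diff)


# Toy usage with made-up data and pair labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(12, 3))
similar = [(0, 1), (2, 3), (4, 5), (6, 7)]
dissimilar = [(0, 6), (1, 7), (2, 8), (3, 9)]
A, S, D = gmml_metric(X, similar, dissimilar)
print(mahalanobis_sq(A, X[0], X[1]))
```

Under this assumed objective, the minimizer satisfies A S A = D, which gives a convenient sanity check. The whole computation reduces to a few eigendecompositions per view, which is consistent with the efficiency claim made for the closed-form solution.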