Abstract
Convolutional neural network (CNN) is a popular choice for visual object detection where two sub-nets are often used to achieve object classification and localization separately. However, the intrinsic relation between the localization and classification sub-nets was not exploited explicitly for object detection. In this letter, we propose a novel association loss, namely, the proxy squared error (PSE) loss, to entangle the two sub-nets, thus use the dependency between the classification and localization scores obtained from these two sub-nets to improve the detection performance. We evaluate our proposed loss on the MS-COCO dataset and compare it with the loss in a recent baseline, i.e. the fully convolutional one-stage (FCOS) detector. The results show that our method can improve the AP from 33:8 to 35:4 and AP75 from 35:4 to 37:8, as compared with the FCOS baseline.