Abstract
As human action is a spatial–temporal process, modern action recognition research focuses on exploring more effective motion representations rather than taking only human poses as input. To better model motion patterns, in this paper we exploit rotation information to describe the spatial–temporal variation, thereby enhancing the dynamic appearance and forming a complementary component to the static joint coordinates. Specifically, we represent the movement of the human body with joint units, formed by regrouping each human joint together with its two adjacent bones. The resulting rotation descriptors thus reduce the influence of static coordinate values while focusing on the dynamic movement. The proposed general features can be directly applied to existing CNN-based action recognition methods. Experimental results on the NTU-RGB+D and ICL First Person Handpose datasets demonstrate the advantages of the proposed method.
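To make the joint-unit idea concrete, the following is a minimal sketch of one possible rotation descriptor: the angle between the two bones adjacent to a joint, computed per frame from 3-D joint coordinates. The function names, the `(T, J, 3)` array layout, and the toy shoulder–elbow–wrist skeleton are assumptions for illustration, not the paper's actual implementation.

```python
import numpy as np

def bone_vectors(joints, start, end):
    # joints: (T, J, 3) array of 3-D joint coordinates over T frames
    return joints[:, end] - joints[:, start]

def joint_unit_angle(joints, parent, center, child):
    """Angle between the two bones adjacent to `center` in each frame.

    A joint unit groups one joint with its two adjacent bones
    (parent->center and center->child); the inter-bone angle is one
    simple rotation descriptor that depends only on relative motion,
    not on absolute coordinates.
    """
    b1 = bone_vectors(joints, parent, center)   # (T, 3)
    b2 = bone_vectors(joints, center, child)    # (T, 3)
    cos = np.sum(b1 * b2, axis=-1) / (
        np.linalg.norm(b1, axis=-1) * np.linalg.norm(b2, axis=-1) + 1e-8
    )
    return np.arccos(np.clip(cos, -1.0, 1.0))   # (T,) angles in radians

# Toy skeleton: shoulder (0) - elbow (1) - wrist (2) over two frames.
joints = np.array([
    [[0, 0, 0], [1, 0, 0], [2, 0, 0]],   # frame 0: arm straight
    [[0, 0, 0], [1, 0, 0], [1, 1, 0]],   # frame 1: elbow bent 90 degrees
], dtype=float)
angles = joint_unit_angle(joints, parent=0, center=1, child=2)
```

A sequence of such angles over time (or its frame-to-frame difference) is invariant to the subject's absolute position, which is the sense in which a rotation descriptor suppresses static coordinate values and emphasizes dynamic movement.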