Abstract
Sign languages are visual languages, with vocabularies as rich as their spoken language counterparts. However, current deep learning-based Sign Language Production (SLP) models produce under-articulated skeleton pose sequences from constrained vocabularies, which limits their applicability. To be understandable and accepted by the deaf, an automatic SLP system must be able to generate co-articulated photo-realistic signing sequences for large domains of discourse. In this work, we tackle large-scale SLP by learning to co-articulate between dictionary signs, a method capable of producing smooth signing while scaling to unconstrained domains of discourse. To learn sign co-articulation, we propose a novel Frame Selection Network (FS-NET) that improves the temporal alignment of interpolated dictionary signs to continuous signing sequences. Additionally, we propose SIGNGAN, a pose-conditioned human synthesis model that produces photo-realistic sign language videos directly from skeleton pose. We propose a novel keypoint-based loss function that improves the quality of synthesized hand images. We evaluate our SLP model on the large-scale meineDGS (mDGS) corpus, conducting an extensive user evaluation that shows our FS-NET approach improves the co-articulation of interpolated dictionary signs. Additionally, we show that SIGNGAN significantly outperforms all baseline methods on quantitative metrics, human perceptual studies, and native deaf signer comprehension.

[Figure 1. Photo-Realistic Sign Language Production: Given a spoken language sentence from an unconstrained domain of discourse (a), an initial translation is conducted to a gloss sequence (b). FS-NET next produces a co-articulated continuous skeleton pose sequence from dictionary signs (c), which SIGNGAN generates into a photo-realistic sign language video in a given style (d).]