Abstract
Sign languages are visual languages, with vocabularies as rich as their spoken language counterparts. However, current deep learning-based Sign Language Production (SLP) models produce under-articulated skeleton pose sequences from constrained vocabularies, which limits their applicability. To be understandable and accepted by the deaf, an automatic SLP system must be able to generate co-articulated photo-realistic signing sequences for large domains of discourse. In this work, we tackle large-scale SLP by learning to co-articulate between dictionary signs, a method capable of producing smooth signing while scaling to unconstrained domains of discourse. To learn sign co-articulation, we propose a novel Frame Selection Network (FS-NET) that improves the temporal alignment of interpolated dictionary signs to continuous signing sequences. Additionally, we propose SIGNGAN, a pose-conditioned human synthesis model that produces photo-realistic sign language videos directly from skeleton pose. We propose a novel keypoint-based loss function that improves the quality of synthesized hand images. We evaluate our SLP model on the large-scale meineDGS (mDGS) corpus, conducting an extensive user evaluation that shows our FS-NET approach improves the co-articulation of interpolated dictionary signs. Additionally, we show that SIGNGAN significantly outperforms all baseline methods on quantitative metrics, human perceptual studies, and native deaf signer comprehension.

[Figure 1. Photo-Realistic Sign Language Production: Given a spoken language sentence from an unconstrained domain of discourse (a), an initial translation is conducted to a gloss sequence (b). FS-NET next produces a co-articulated continuous skeleton pose sequence from dictionary signs (c), which SIGNGAN generates into a photo-realistic sign language video in a given style (d).]