Abstract
Machine learning models fundamentally rely on large quantities of
high-quality data. Collecting this data can be challenging due to cost,
scarcity, and privacy restrictions. Signed languages are visual languages used
by the Deaf community and are considered low-resource: sign language datasets
are often orders of magnitude smaller than their spoken language counterparts.
Sign Language Production is the task of generating sign language videos from
spoken language sentences, while Sign Language Translation is the reverse task
of translating sign language videos into spoken language. Here, we propose
leveraging recent advancements in Sign Language Production to augment existing
sign language datasets and enhance the performance of Sign Language Translation
models. For this, we utilize three techniques: a skeleton-based approach to
production, sign stitching, and two photo-realistic generative models, SignGAN
and SignSplat. We evaluate how effectively these techniques improve Sign
Language Translation performance by introducing variation in both the signer's
appearance and the skeletal motion. Our results demonstrate that the proposed
methods can effectively augment existing datasets, improving Sign Language
Translation performance by up to 19% and paving the way for more robust and
accurate translation systems, even in resource-constrained environments.