Taking a Cue From the Human: Linguistic and Visual Prompts for the Automatic Sequencing of Multimodal Narrative
Journal article   Open access  Peer reviewed

KIM STARR, SABINE BRAUN and JALEH DELFANI
Journal of Audiovisual Translation
18/12/2020

Abstract

Keywords: audiovisual translation, computer vision, video description, audio description, machine learning, audiovisual content, accessibility, content description, content retrieval, MeMAD, automatic captioning
CC BY V4.0 Open Access
URL: https://www.jatjournal.org/index.php/jat/article/view/138
Human beings find narrative sequencing in written texts and moving imagery a relatively simple task. Key to the success of this activity is establishing coherence by using critical cues to identify key characters, objects, actions and locations as they contribute to plot development. In the drive to make audiovisual media more widely accessible (through audio description) and media archives more searchable (through content description), computer vision experts strive to automate video captioning in order to supplement human description activities. Existing models for automating video description employ deep convolutional neural networks to encode visual material and extract features (Krizhevsky, Sutskever, & Hinton, 2012; Szegedy et al., 2015; He, Zhang, Ren, & Sun, 2016). Recurrent neural networks then decode the visual encodings and supply a sentence that describes the moving images in a manner mimicking human performance. However, these descriptions are currently “blind” to narrative coherence. Our study examines the human approach to narrative sequencing and coherence creation using the MeMAD [Methods for Managing Audiovisual Data: Combining Automatic Efficiency with Human Accuracy] film corpus, which comprises 500 extracts chosen as stand-alone narrative arcs. We examine character recognition, object detection and temporal continuity as indicators of coherence, using linguistic analysis and qualitative assessments to inform the future development of more narratively sophisticated computer models.
