Abstract
We propose VADER, a spatio-temporal matching, alignment, and change
summarization method to help fight misinformation spread via manipulated
videos. VADER matches and coarsely aligns partial video fragments to candidate
videos using a robust visual descriptor and scalable search over adaptively
chunked video content. A transformer-based alignment module then refines the
temporal localization of the query fragment within the matched video. A
space-time comparator module identifies regions of manipulation between aligned
content, invariant to any changes due to any residual temporal misalignments or
artifacts arising from non-editorial changes of the content. Robustly matching
video to a trusted source enables conclusions to be drawn on video provenance,
enabling informed trust decisions on content encountered.