Abstract
Relighting is a long-standing computer vision problem, with applications ranging
from simple image beautification to domain adaptation, diversification of autonomous
vehicle training data and enhancement of TV and film production. In this
thesis, we focus on the last of these. Motivated by broadcast applications, we explore the
idea of capturing visual data under sub-optimal lighting conditions and then adjusting
it automatically in post-production. Doing so could meaningfully simplify the shooting process,
allow cheaper coverage of smaller or more resource-constrained events and thus
enable wider broadcasting. Driven by recent advances in deep learning
and computer vision, we tackle this problem using powerful generative models.
Capturing high-quality training data at scale and with meaningful complexity is challenging.
Consequently, we strive to create a system with relaxed supervision requirements.
With this in mind, we propose a self-supervised and domain-independent relighting
model. Instead of relying on ground truth, our GAN-based solution exploits the rich
information contained in the input data and learns the desired illumination style from a
collection of unsorted examples rather than a directly aligned reference. This flexible approach
allows us to use a loose definition of lighting style and, potentially, to adapt to styles
present in previously captured material, even material from other environments.
Our preliminary solution performs accurate colour and brightness adjustments but
struggles to manipulate shadows. We therefore address this aspect explicitly in the
next two technical chapters, which investigate the problems of shadow removal and
detection. To perform de-shadowing, we adapt our self-supervised system to the new task
and modify its losses to account for inconsistencies in the benchmark datasets. The
resulting system removes shadowed areas without significant boundary residue,
producing superior visual results.
We explore shadows further in the following technical chapter, where we consider
the common misclassification of dark regions as shadows. To
alleviate this, we create two datasets featuring diverse objects and backgrounds as well as
cast and self-cast shadows. We then use this data to build a 3D-aided shadow-caster
verification system that identifies the sources of real shadows and discourages the detection
of ‘fake’ shadows, i.e. dark or patterned image regions.
In our final technical chapter, we step back from this fragmented approach and
instead consider recent foundation models and the wealth of information and
world understanding they contain. We pair the successful design choices from
the earlier stages of the PhD with lighting-conditioned diffusion models and explore
their applicability and adaptability to the task of relighting.