Abstract
Given an abstract, deformed, ordinary sketch from untrained amateurs like you
and me, this paper turns it into a photorealistic image - just like those shown
in Fig. 1(a), all non-cherry-picked. We differ significantly from prior art in
that we do not dictate an edgemap-like sketch to start with, but aim to work
with abstract free-hand human sketches. In doing so, we essentially democratise
the sketch-to-photo pipeline, "picturing" a sketch regardless of how well you
sketch. Our contribution at the outset is a decoupled encoder-decoder training
paradigm, where the decoder is a StyleGAN trained on photos only. This
importantly ensures that generated results are always photorealistic. The rest
is then all centred around how best to deal with the abstraction gap between
sketch and photo. For that, we propose an autoregressive sketch mapper trained
on sketch-photo pairs that maps a sketch to the StyleGAN latent space. We
further introduce specific designs to tackle the abstract nature of human
sketches, including a fine-grained discriminative loss on the back of a trained
sketch-photo retrieval model, and a partial-aware sketch augmentation strategy.
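To make the decoupled paradigm concrete, below is a minimal, illustrative sketch assuming a frozen, photo-trained StyleGAN generator G and a toy autoregressive mapper; the module names, shapes and loss shown are editorial assumptions, not the exact architecture of this paper.

```python
# Minimal sketch of the decoupled setup, assuming a pre-trained StyleGAN
# generator G (frozen, trained on photos only) and a trainable mapper that
# autoregressively predicts the W+ style vectors; all names and shapes are
# illustrative, not the authors' implementation.
import torch
import torch.nn as nn

class SketchMapper(nn.Module):
    """Encodes a raster sketch and autoregressively emits one W+ style
    vector per StyleGAN layer, each conditioned on the ones before it."""
    def __init__(self, n_layers=18, w_dim=512):
        super().__init__()
        self.backbone = nn.Sequential(              # sketch feature extractor
            nn.Conv2d(1, 64, 3, 2, 1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, 2, 1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(128, w_dim),
        )
        self.rnn = nn.GRUCell(w_dim, w_dim)         # autoregressive core
        self.to_w = nn.Linear(w_dim, w_dim)
        self.n_layers = n_layers

    def forward(self, sketch):
        feat = self.backbone(sketch)                # (B, w_dim)
        h, w_prev, ws = feat, torch.zeros_like(feat), []
        for _ in range(self.n_layers):
            h = self.rnn(w_prev, h)                 # condition on previous w
            w_prev = self.to_w(h)
            ws.append(w_prev)
        return torch.stack(ws, dim=1)               # (B, n_layers, w_dim)

# Training skeleton: only the mapper receives gradients; the photo-trained
# decoder G stays frozen, so generated outputs remain photorealistic, e.g.
# loss = reconstruction(G(w_plus), photo) + fine_grained_retrieval_loss(...)
```

Because G never sees a sketch during training, its outputs stay on the photo manifold; the mapper alone absorbs the sketch-photo abstraction gap.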
Finally, we showcase a few downstream tasks our generation model enables;
amongst them, we show how fine-grained sketch-based image retrieval, a
well-studied problem in the sketch community, can be reduced to an image
(generated) to image retrieval task, surpassing the state of the art. We put
forward generated results in the supplementary for everyone to scrutinise.
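As an illustration of the retrieval reduction above, the following hedged snippet assumes a generate_photo function standing in for the sketch-to-photo model and an off-the-shelf image embedder embed; both are placeholders rather than the paper's exact components.

```python
# A query sketch is first "pictured" into a photo, then retrieval happens
# purely in photo space with any image embedder. `generate_photo` and
# `embed` are assumed stand-ins, not the paper's models.
import torch
import torch.nn.functional as F

@torch.no_grad()
def retrieve(sketch, gallery_photos, generate_photo, embed, top_k=5):
    query_photo = generate_photo(sketch)              # sketch -> photo
    q = F.normalize(embed(query_photo), dim=-1)       # (1, D) query embedding
    g = F.normalize(embed(gallery_photos), dim=-1)    # (N, D) gallery embeddings
    scores = g @ q.T                                  # cosine similarity
    return scores.squeeze(-1).topk(top_k).indices     # best-matching photos
```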