Artificial Intelligence (AI) and Deep Learning (DL) techniques have recently become the foundation of applications such as text-to-image generation, super-resolution, and image inpainting.
Indeed, it is possible to give such models a highly detailed description of an image and receive a realistic picture matching that text as output. They can also upscale an image from a low resolution to a higher one, generating new, realistic high-frequency details. They can even help remove artifacts or undesired objects from an input image. The variety of tasks neural networks can handle seems unlimited. What if these methods could also travel in time?
For example, have you ever wondered what a photograph of yourself would look like if it had been taken fifty or a hundred years ago? What would your favorite actor or actress look like if they had been born in a completely different era? If you are interested in the answers to these questions, keep reading, and you will find out.
Given the recent success of StyleGAN in high-quality face synthesis and editing, many works focus on portrait editing using pre-trained StyleGAN models. However, the available techniques typically manipulate well-defined semantic attributes (e.g., adding or removing a smile or modifying the subject’s age). The idea behind this work is instead to leave unchanged the attributes that constitute a person’s identity while sending the portrait back to the past or back to the future with this AI-based DeLorean.
The main problem in this case is the absence of an appropriate dataset. It is widely known that, even with a perfect neural network model, datasets remain the nightmare of every AI researcher. Unbalanced, insufficient, or unavailable data are well-known issues in the deep learning field, leading to biased or inaccurate results.
To overcome this problem, the researchers created FTT (Faces Through Time), a brand-new dataset with images sourced from Wikimedia Commons (WC), a crowdsourced and open-licensed collection of 50M images. FTT features 26,247 portraits from the 19th to 21st centuries, with roughly 1,900 images per decade on average.
Even with this relatively small dataset, the results are impressive (Figure 1).
But how are these results achieved? The main idea is a parent-child architecture built on StyleGAN, a Generative Adversarial Network (GAN). Rather than training a single model covering all decades, a family of child models is used, one for each decade, to better synthesize the data distribution of each period. However, to preserve the identity and pose of the portrayed person, a parent model is adopted to map this information into a latent-space vector.
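The parent-child layout can be pictured with a toy sketch. Everything in it is a loud assumption made for illustration: `ToyGenerator`, `build_family`, and `travel` are hypothetical names, the linear "generator" is not StyleGAN, and the idea that each child starts as a copy of the parent is a guess that this summary does not confirm. The sketch only shows the bookkeeping: one child per decade, all fed the same latent vector w.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ToyGenerator:
    """Hypothetical stand-in for a generator: output = scale * w + shift."""
    scale: float
    shift: float

    def synthesize(self, w: List[float]) -> List[float]:
        return [self.scale * x + self.shift for x in w]


def build_family(parent: ToyGenerator, decades: List[int]) -> Dict[int, ToyGenerator]:
    """Create one child per decade.

    Assumption: each child starts from the parent's parameters. A real child
    would then be trained on its decade's portraits; here that is faked with
    a small decade-dependent shift.
    """
    family = {}
    for i, decade in enumerate(decades):
        family[decade] = ToyGenerator(scale=parent.scale, shift=parent.shift + 0.1 * i)
    return family


def travel(family: Dict[int, ToyGenerator], w: List[float], target_decade: int) -> List[float]:
    """Render the same latent vector w with the child of the target decade."""
    return family[target_decade].synthesize(w)


parent = ToyGenerator(scale=1.0, shift=0.0)
family = build_family(parent, [1900, 1950, 2000])
w = [0.5, -0.5]  # the shared latent code carrying identity and pose
portrait_1900 = travel(family, w, 1900)
portrait_2000 = travel(family, w, 2000)
```

Because the latent vector is shared, the decade is chosen only by picking which child renders it, which is the intuition behind keeping identity fixed while the style of the period changes.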
The architecture pipeline is presented as follows.
First, a family of StyleGAN models is trained, one for each decade, using adversarial losses and an identity loss on a blended face. This face is the output of the child model, modified to resemble the parent model's output in its colors. The blending is necessary to avoid inconsistency in the identity loss, which is computed from the features of ArcFace, a popular facial recognition model: since ArcFace is trained only on modern-day images, the authors found that it performs poorly on historical images.
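To make the blended-face idea more concrete, here is a minimal sketch under stated assumptions. The summary does not say how the colors are matched or how the identity loss is defined, so this example swaps in two common choices: per-channel mean and standard deviation matching for the colors, and one minus cosine similarity for the loss. `mean_color_embed` is a hypothetical placeholder for a face-recognition embedding such as ArcFace and has nothing to do with the real model.

```python
import math
from typing import Callable, List, Sequence, Tuple

Pixel = Sequence[float]  # (r, g, b) values, roughly in [0, 1]


def channel_stats(image: List[Pixel], c: int) -> Tuple[float, float]:
    """Return (mean, standard deviation) of channel c."""
    values = [p[c] for p in image]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, math.sqrt(var)


def match_colors(child_img: List[Pixel], parent_img: List[Pixel], eps: float = 1e-8) -> List[Pixel]:
    """Assumed blending: give each child channel the parent's mean and std."""
    stats = [(channel_stats(child_img, c), channel_stats(parent_img, c)) for c in range(3)]
    blended = []
    for pixel in child_img:
        new_pixel = []
        for c in range(3):
            (c_mean, c_std), (p_mean, p_std) = stats[c]
            new_pixel.append((pixel[c] - c_mean) / (c_std + eps) * p_std + p_mean)
        blended.append(tuple(new_pixel))
    return blended


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / max(norm_a * norm_b, 1e-12)  # guard against zero vectors


def identity_loss(embed: Callable[[List[Pixel]], Sequence[float]],
                  blended_img: List[Pixel], parent_img: List[Pixel]) -> float:
    """Assumed loss: 1 - cosine similarity between the two embeddings."""
    return 1.0 - cosine_similarity(embed(blended_img), embed(parent_img))


def mean_color_embed(image: List[Pixel]) -> List[float]:
    """Hypothetical placeholder embedding: the average color of the image."""
    return [sum(p[c] for p in image) / len(image) for c in range(3)]


child = [(0.2, 0.2, 0.2), (0.4, 0.4, 0.4)]   # gray, like an old photograph
parent = [(0.6, 0.3, 0.1), (0.8, 0.5, 0.3)]  # modern color rendering
blended = match_colors(child, parent)
loss_without_blend = identity_loss(mean_color_embed, child, parent)
loss_with_blend = identity_loss(mean_color_embed, blended, parent)
```

With this toy embedding, the loss on the blended image is lower than the loss on the raw child output, which mirrors the motivation given above: removing the color gap keeps an embedding tuned to modern photos from penalizing the historical look itself.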
Afterward, each real image is projected onto a vector w on the decade manifold (1960 in the figure above), where a generator G′t is trained to transfer refinement details to all child models. Lastly, a mask is applied to the input image to encourage the model to preserve facial details.
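The masking step can be illustrated with a small sketch. The summary does not specify the loss used with the mask, so the mask-weighted mean squared error below is an assumption, and `masked_l2` is a hypothetical helper rather than anything from the authors' code. Images are flattened to lists of intensities for brevity.

```python
from typing import List


def masked_l2(input_img: List[float], output_img: List[float], mask: List[float]) -> float:
    """Mask-weighted mean squared error between input and output.

    Pixels with mask 0 are ignored; pixels with mask 1 count fully.
    """
    if not (len(input_img) == len(output_img) == len(mask)):
        raise ValueError("image and mask sizes must match")
    total_weight = sum(mask)
    if total_weight == 0:
        return 0.0  # an empty mask constrains nothing
    weighted = sum(m * (a - b) ** 2 for a, b, m in zip(input_img, output_img, mask))
    return weighted / total_weight


input_img = [0.0, 1.0, 0.5, 0.5]
output_img = [0.0, 0.0, 0.5, 1.0]
face_mask = [1.0, 0.0, 1.0, 0.0]  # assumed: 1 marks facial pixels to preserve

face_error = masked_l2(input_img, output_img, face_mask)
full_error = masked_l2(input_img, output_img, [1.0] * 4)
```

In this example the output differs from the input only outside the mask, so the masked error is zero while the unmasked error is not: the model is free to restyle the background and other unmasked regions while being pushed to keep the facial pixels close to the input.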
To summarize, the key contributions are (i) Faces Through Time (i.e., the dataset used to train the neural network) and (ii) a novel architecture for transforming faces across time while preserving identity details. Although it suffers from small biases in the dataset (e.g., few women with short hair at the beginning of the 20th century) that lead to inconsistencies in the output images, this model represents a large improvement over previous works.
This was a summary of Faces Through Time (FTT) and the accompanying method for transforming faces through time. You can find more information in the links below if you want to learn more about it.
This article was written as a research summary by Marktechpost Staff based on the research paper 'What's in a Decade? Transforming Faces Through Time'. All credit for this research goes to the researchers on this project. Check out the paper and project.
Daniele Lorenzi received his M.Sc. in ICT for Internet and Multimedia Engineering in 2021 from the University of Padua, Italy. He is a Ph.D. candidate at the Institute of Information Technology (ITEC) at the Alpen-Adria-Universität (AAU) Klagenfurt. He is currently working in the Christian Doppler Laboratory ATHENA and his research interests include adaptive video streaming, immersive media, machine learning, and QoS/QoE evaluation.