The internet age offers a wide set of tools for tasks that were thought impossible a generation ago. Some of these tasks, often trivial yet malicious, such as manipulating photos or re-creating videos with fake faces, are more common now than ever before. Tools like Adobe’s Photoshop are the go-to software for editing images.
To keep malicious actors at bay, the Adobe Research team, in collaboration with UC Berkeley, developed a system to detect photoshopped faces.
“While such editing operations have helped enable creative expression, if done without the viewer’s knowledge, they can have serious negative implications, ranging from body image issues set by unrealistic standards to the consequences of ‘fake news’ in politics,” wrote the authors regarding the motivation behind their work.
To detect the fake photos, the authors present two models in their paper:
- a global classification model, tasked with predicting whether a face has been warped, and
- a local warp predictor, which can be used to identify where manipulations occur and reverse them
Predicting What Moved Where
Face warping is an interesting problem: such edits are surprisingly hard for people to detect, yet they are commonly used and have wide-reaching implications.
In their paper, the researchers propose an approach based on a convolutional neural network (CNN) carefully trained to detect facial warping modifications in images. As with any deep learning method, collecting enough supervised training data is a challenge. This is especially true for forensics applications, since there are no large-scale datasets of manually created visual fakes.
First, a large dataset of real face images was created by scraping different sources on the internet. Then the Face-Aware Liquify tool in Photoshop, which abstracts facial manipulations into high-level operations such as “increase nose width” and “decrease eye distance”, was scripted directly to generate the manipulated images.
A Dilated Residual Network variant (DRN-C-26), pre-trained on the ImageNet dataset, was used as a base network for local prediction.
The results show that training the flow regression network directly performs poorly. So the researchers first recast the problem as multinomial classification, a reformulation commonly used for regression problems, and then fine-tuned the network for regression.
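To illustrate what recasting regression as multinomial classification can look like, here is a minimal sketch in plain Python. It quantizes a continuous displacement value into one of a fixed set of bins (the class label) and decodes a label back to a displacement. The bin range, the number of bins, and all function names are assumptions made for illustration; the article does not describe the authors' actual discretization.

```python
def make_bin_centers(max_disp, num_bins):
    """Evenly spaced bin centers covering [-max_disp, max_disp] (assumed layout)."""
    step = 2.0 * max_disp / (num_bins - 1)
    return [-max_disp + i * step for i in range(num_bins)]


def flow_to_class(value, centers):
    """Class label = index of the bin center nearest to the displacement.

    Values outside the covered range fall into the first or last bin.
    """
    return min(range(len(centers)), key=lambda i: abs(centers[i] - value))


def class_to_flow(index, centers):
    """Decode a class label back to a (quantized) displacement."""
    return centers[index]


# Hypothetical setup: displacements between -4 and 4 pixels, 9 bins.
centers = make_bin_centers(max_disp=4.0, num_bins=9)
label = flow_to_class(1.3, centers)      # a classifier would be trained on this label
decoded = class_to_flow(label, centers)  # coarse estimate recovered from the label
```

A classifier trained on such labels only recovers displacements up to the bin width, which is one reason a later fine-tuning stage on the continuous values can help.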
To find which parts of an image were warped, and what the image looked like prior to manipulation, an optical flow field from the original image to the warped image is predicted. A flow prediction model was trained to predict this per-pixel warping field.
Usually, optical flow captures the difference between the warped images and the ground truth by considering pixel intensities and other such attributes. For instance, if an object gets brighter with every frame, it can be inferred that the object is not only moving but also coming closer.
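To make the idea of a per-pixel warping field concrete, the sketch below applies a flow field to a tiny grayscale image in order to undo a warp. It assumes a particular convention that the article does not spell out: `flow[y][x] = (dx, dy)` says that the recovered pixel at `(x, y)` should be sampled from the manipulated image at `(x + dx, y + dy)`. Nearest-neighbor sampling and border clamping are simplifications chosen to keep the example short.

```python
def unwarp(image, flow):
    """Resample `image` using a per-pixel flow field (nearest neighbor).

    image: list of rows of pixel values.
    flow:  list of rows of (dx, dy) offsets, same height and width as image.
    """
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx, dy = flow[y][x]
            # Clamp the sampling position so it stays inside the image.
            sx = min(max(int(round(x + dx)), 0), w - 1)
            sy = min(max(int(round(y + dy)), 0), h - 1)
            out[y][x] = image[sy][sx]
    return out


# Toy example: the "manipulated" image is the original shifted right by one pixel.
original = [[1, 2, 3],
            [4, 5, 6]]
warped = [[1, 1, 2],
          [4, 4, 5]]
# A flow of (+1, 0) everywhere samples one pixel to the right, undoing the shift.
flow = [[(1.0, 0.0)] * 3 for _ in range(2)]
recovered = unwarp(warped, flow)
```

In this toy case the last column cannot be recovered because its content was pushed out of the frame, so the sketch repeats the border pixel there.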
For evaluating the classification task, the researchers used ranking-based scores that are not sensitive to the “base rate”, that is, the fraction of fake images: Average Precision (AP) and the Two Alternative Forced Choice (2AFC) score.
Along with demonstrating a technique to identify morphed images, the authors also experimented with image puppeteering, where a video (of a different subject) is used to animate an input image via image warping and the addition of extra details, such as skin wrinkles and texture in the eyes and mouth. The same manipulation detection model was then applied to this data. The results show that despite not being trained on this data, the model makes reasonable predictions.
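As a rough illustration of these two scores, the sketch below computes them from a classifier's "fakeness" scores in plain Python. The 2AFC score is taken here over all real/fake pairs, counting ties as half; that pairing scheme, the function names, and the example scores are assumptions for illustration, not details from the paper.

```python
def two_afc(real_scores, fake_scores):
    """Fraction of real/fake pairs in which the fake image gets the higher score."""
    wins = 0.0
    for f in fake_scores:
        for r in real_scores:
            if f > r:
                wins += 1.0
            elif f == r:
                wins += 0.5  # a tie counts as a coin flip
    return wins / (len(real_scores) * len(fake_scores))


def average_precision(scores, labels):
    """Mean of the precision values at the rank of each fake (label 1) image."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


# Made-up scores: higher means "more likely manipulated".
real = [0.1, 0.4]
fake = [0.35, 0.8]
afc = two_afc(real, fake)
ap = average_precision(real + fake, [0, 0, 1, 1])
```

The 2AFC score depends only on how real and fake images are ordered relative to each other, so no decision threshold has to be chosen.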
Future Of Forensics
Forensics algorithms are often sensitive to operations such as resizing and compression. To make their models more robust, the authors considered more aggressive data augmentation, including resizing methods (bicubic and bilinear), JPEG compression, brightness, contrast, and saturation.
The authors see this work on facial warp detection as an important step toward forensics methods for analyzing images of the human body; extending these approaches to body manipulations and photometric edits such as skin smoothing is an interesting avenue for future work. They also believe that their work will lead to forensics tools that learn without labeled data and that incorporate interactive editing tools into the training process.
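The photometric part of such an augmentation scheme can be sketched in a few lines of plain Python. The example below randomly perturbs brightness, contrast, and saturation of RGB pixels with values in [0, 1]. The factor range, the fixed contrast midpoint, and the function names are assumptions for illustration; resizing and JPEG compression are left out because they would need an imaging library.

```python
import random


def clamp(v):
    """Keep a channel value inside [0, 1]."""
    return max(0.0, min(1.0, v))


def adjust_brightness(pixel, factor):
    """Scale all channels by the same factor."""
    return tuple(clamp(c * factor) for c in pixel)


def adjust_contrast(pixel, factor, midpoint=0.5):
    """Stretch or shrink channels around a fixed midpoint (a simplification)."""
    return tuple(clamp(midpoint + factor * (c - midpoint)) for c in pixel)


def adjust_saturation(pixel, factor):
    """Blend each channel with the pixel's luma; factor 0 gives gray."""
    r, g, b = pixel
    gray = 0.299 * r + 0.587 * g + 0.114 * b
    return tuple(clamp(gray + factor * (c - gray)) for c in pixel)


def random_photometric(pixels, rng, low=0.8, high=1.2):
    """Apply one random brightness, contrast, and saturation change to all pixels."""
    b = rng.uniform(low, high)
    c = rng.uniform(low, high)
    s = rng.uniform(low, high)
    return [adjust_saturation(adjust_contrast(adjust_brightness(p, b), c), s)
            for p in pixels]


rng = random.Random(0)  # seeded so the example is repeatable
augmented = random_photometric([(0.2, 0.4, 0.6), (0.9, 0.1, 0.5)], rng)
```

Drawing one set of factors per image, rather than per pixel, keeps the perturbation consistent across the whole image, which is closer to what a global editing operation would do.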
Read more about this work here.