The world is more connected now than ever before. An opinion can fly across continents at light speed, and a revolution can be sparked by remote players in a matter of hours. And as technology keeps improving, fraudulent players keep discovering new methods to up the ante.
Most platform owners are left to deal with the after-effects of an incident. Identifying perpetrators and discouraging them from carrying out illicit, immoral social engineering requires automated detection techniques.
In new research funded by Google, Microsoft, and the Defense Advanced Research Projects Agency (DARPA), a group of researchers developed a method to protect world leaders against the infamous deep fake.
“With relatively modest amounts of data and computing power, the average person can, for example, create a video of a world leader confessing to illegal activity leading to a constitutional crisis, a military leader saying something racially insensitive leading to civil unrest in an area of military activity, or a corporate titan claiming that their profits are weak leading to global stock manipulation,” said the authors in their paper titled ‘Protecting World Leaders Against Deep Fakes’.
Dawn Of Multimedia Forensics
Deep fakes can be categorised as follows:
- Face-swap, in which the face in a video is automatically replaced with another person’s face
- Lip-sync, in which a source video is modified so that the mouth region is consistent with an arbitrary audio recording
- Puppet-master, in which a target person is animated (head movements, eye movements, facial expressions) by a performer sitting in front of a camera and acting out what they want their puppet to say and do.
The talking style and facial behaviour of a person can vary with the context in which the person is talking. Facial behaviour while delivering a prepared speech, for instance, can differ significantly from that while answering a stressful question during a live interview.
The researchers first collected videos such as weekly addresses, in which a leader such as Barack Obama talks directly to a camera.
In the second experiment, further videos of Obama were collected in significantly different contexts, ranging from an interview in which he looked at the interviewer rather than the camera to a live interview in which he paused significantly more during his answers and tended to look downward contemplatively.
The paper illustrates the tracking with five equally spaced frames from a 250-frame clip, annotated with the results of OpenFace tracking along with the intensity of one action unit, AU01 (eyebrow lift), measured over the clip.
The regularities in each person's facial behaviour were exploited by building soft biometric models of high-profile individuals and then using these models to distinguish between real and fake videos.
The open-source facial behaviour analysis toolkit OpenFace2 was used to extract facial and head movements in a video.
The facial and head movements were tracked, and then the presence and strength of specific action units were extracted. A novelty detection model, a one-class support vector machine (SVM), was built to distinguish an individual from other individuals as well as from comedic impersonators and deep-fake impersonators.
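To make the idea of novelty detection concrete, here is a minimal sketch of a one-class SVM using scikit-learn's `OneClassSVM`. It is not the authors' code: the feature vectors are synthetic stand-ins, the dimensionality is kept small for illustration, and the `gamma` and `nu` values are arbitrary assumptions rather than tuned settings.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

# Synthetic stand-in for per-clip feature vectors of ONE person.
# Real features would come from tracked facial and head movements.
n_features = 6  # illustrative only, far smaller than a real feature vector
target_clips = rng.normal(loc=0.0, scale=1.0, size=(300, n_features))

# gamma sets the width of the Gaussian (RBF) kernel; nu bounds the fraction
# of training clips treated as outliers. Both values here are assumptions.
model = OneClassSVM(kernel="rbf", gamma=0.1, nu=0.05)
model.fit(target_clips)  # trained only on clips of the target person


def is_consistent_with_target(clip_features):
    """Return True if the clip is classified as an inlier (+1)."""
    sample = np.asarray(clip_features, dtype=float).reshape(1, -1)
    return bool(model.predict(sample)[0] == 1)


typical_clip = np.zeros(n_features)       # near the centre of the training data
atypical_clip = np.full(n_features, 8.0)  # far from anything seen in training

print(is_consistent_with_target(typical_clip))
print(is_consistent_with_target(atypical_clip))
```

The point of the one-class setup is that the model only ever sees examples of the real person; anything that falls outside the learned region, whether an impersonator or a deep fake, is flagged as novel.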
The SVM hyper-parameters that control the Gaussian kernel width and the outlier percentage were optimised using 10% of the video clips of random people taken from the original FaceForensics video data set. The SVM was trained on the 190 features extracted from overlapping 10-second clips.
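The sketch below shows one way overlapping clips and a 190-dimensional feature vector could be produced. It rests on assumptions not stated above: that each clip is summarised by the pairwise Pearson correlations between 20 per-frame signals (20 choose 2 = 190), that video runs at 30 frames per second, and that clips are offset by a hypothetical stride. The signals are synthetic sine waves, not real tracking output.

```python
from itertools import combinations
from math import sin, sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def overlapping_clips(n_frames, clip_len, stride):
    """Return (start, end) frame ranges for clips of clip_len frames, stride apart."""
    return [(s, s + clip_len) for s in range(0, n_frames - clip_len + 1, stride)]


def clip_features(signals, start, end):
    """One correlation per unordered pair of signals within the clip."""
    return [pearson(a[start:end], b[start:end]) for a, b in combinations(signals, 2)]


FPS = 30            # assumed frame rate
CLIP_LEN = 10 * FPS  # 10-second clips
STRIDE = 5 * FPS     # hypothetical offset between consecutive clips
N_FRAMES = 20 * FPS  # a 20-second synthetic video

# 20 synthetic per-frame signals standing in for tracked measurements.
signals = [[sin(0.01 * (k + 1) * t + k) for t in range(N_FRAMES)] for k in range(20)]

clips = overlapping_clips(N_FRAMES, CLIP_LEN, STRIDE)
features = [clip_features(signals, start, end) for start, end in clips]

print(len(clips), len(features[0]))
```

Each clip yields one feature vector, so a single video contributes several training samples to the one-class model.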
The results show the efficacy of this approach on a large number of deep fakes of a range of U.S. politicians, including Hillary Clinton, Barack Obama, Bernie Sanders, Donald Trump, and Elizabeth Warren.
The authors set out to demonstrate the following in the paper:
- Show that when individuals speak, they exhibit relatively distinct patterns of facial and head movements
- Show that the creation of all three types of deep fakes tends to disrupt these patterns because the expressions are being controlled by an impersonator (face-swap and puppet-master) or the mouth is decoupled from the rest of the face (lip-sync)
This approach, unlike previous approaches, is resilient to laundering because it relies on relatively coarse measurements that are not easily destroyed, and it is able to detect all three forms of deep fakes. The success of this method at scale depends on the diversity of the data gathered for the targets prone to deep-fake attacks.
A lot of misinformation is in circulation, and it gets worse with the popularity of the entities involved. This virtual wildfire decouples users from the truth, and they usually end up in their own echo chambers. As the attention of the world media shifts towards the elections in the US, there will probably be attempts at foul play, and having readily available tools to vet the data is almost mandatory.
Read the full work here