Efficient face detection and replacement in the creation of simple fake videos
DOI:
https://doi.org/10.15276/aait.06.2023.20
Keywords:
Deepfake, affine transformation, face detection, video processing, alpha channel, binary masks
Abstract
Face detection and facial recognition technologies are among the most intensively studied topics in computer vision, owing to their broad application potential across many industries. These technologies have demonstrated practical applicability in varied contexts such as identifying suspicious individuals in crowded urban spaces, real-time recognition of smartphone owners, creating compelling deepfakes for entertainment applications, and specialized applications that modify the movements of facial features such as the lips or eyes. With current advances in hardware and software, today's infrastructure provides more resources than video streaming requires. As a result, simple face recognition systems can be implemented without high-cost server instances that require specific pre-trained models. This abundance of resources is changing the landscape of face recognition, and this paper discusses the emerging paradigms. The primary focus of this article is an in-depth analysis of the key concepts of face detection in streaming video data using prominent pre-trained models. The models under discussion include HRNet, RetinaFace, Dlib, MediaPipe, and KeyPoint R-CNN. Each of these models has its strengths and weaknesses, and the article discusses them in the context of real-world case studies. This discussion provides insights into the practical applications of these models and the trade-offs involved in using them. Moreover, this paper presents a comprehensive overview of image transformation techniques. It introduces an abstract method for affine image transformation, an important technique in image processing that changes the geometric properties of an image without affecting its pixel intensity.
Additionally, the article discusses image transformation operations executed through the OpenCV library, one of the leading libraries in the field of computer vision, providing a highly flexible and efficient toolset for image manipulation. The culmination of this research is presented as a practical standalone system for image replacement in video. This system leverages the RetinaFace model for inference and employs OpenCV for affine transformations, demonstrating the concepts and technologies discussed in the paper. The work outlined in this article thereby advances the field of face detection and recognition, presenting an innovative approach that makes full use of contemporary hardware and software advances.
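The two building blocks named in the abstract and keywords, affine transformation and mask-based blending, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`estimate_affine`, `apply_affine`, `blend_with_mask`) are hypothetical, the example uses only the Python standard library, and images are represented as small nested lists. In OpenCV, the analogous operations are `cv2.getAffineTransform` (matrix from three point pairs) and `cv2.warpAffine` (applying the matrix to an image).

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Matrix = List[List[float]]


def _det3(m: Sequence[Sequence[float]]) -> float:
    """Determinant of a 3x3 matrix (cofactor expansion along the first row)."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def _solve3(a: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    """Solve the 3x3 linear system a * x = b with Cramer's rule."""
    d = _det3(a)
    if abs(d) < 1e-12:
        raise ValueError("source points are collinear; affine map is undefined")
    solution = []
    for col in range(3):
        replaced = [list(row) for row in a]
        for r in range(3):
            replaced[r][col] = b[r]
        solution.append(_det3(replaced) / d)
    return solution


def estimate_affine(src: Sequence[Point], dst: Sequence[Point]) -> Matrix:
    """Return the 2x3 matrix M that maps three src points onto three dst points."""
    a = [[x, y, 1.0] for x, y in src]
    row_x = _solve3(a, [p[0] for p in dst])  # coefficients producing x'
    row_y = _solve3(a, [p[1] for p in dst])  # coefficients producing y'
    return [row_x, row_y]


def apply_affine(m: Matrix, p: Point) -> Point:
    """Map one point: (x, y) -> (a*x + b*y + tx, c*x + d*y + ty)."""
    x, y = p
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])


def blend_with_mask(fg, bg, mask):
    """Per-pixel blend: mask value 1 keeps the foreground, 0 keeps the background."""
    return [[a * f + (1 - a) * b for f, b, a in zip(f_row, b_row, m_row)]
            for f_row, b_row, m_row in zip(fg, bg, mask)]


# Three landmarks on a source face and their target positions in a frame
# (illustrative coordinates, not taken from the paper).
M = estimate_affine([(0, 0), (1, 0), (0, 1)], [(10, 20), (12, 20), (10, 23)])
centre = apply_affine(M, (0.5, 0.5))

# A binary mask selects which pixels come from the replacement image.
blended = blend_with_mask([[200, 200]], [[50, 50]], [[1, 0]])
```

Three non-collinear point pairs determine an affine map uniquely, which is why three facial landmarks (for example two eyes and the nose tip) are enough to align one face onto another. Using fractional mask values instead of 0 and 1 turns the same blend into alpha compositing with soft edges.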