Face recognition across near-infrared (NIR) and visible (VIS) images has recently gained increasing attention in the computer vision community because traditional VIS face recognition methods generally fail to achieve satisfactory performance under poor illumination. However, training a NIR-VIS face recognition network is prone to overfitting because of a lack of sufficient NIR-VIS data. Previous efforts attempted to build large-scale NIR-VIS face datasets by converting VIS data to NIR images to avoid overfitting. However, such image-to-image translation algorithms are constrained by the amount of data in the source domain and the variety of the generated images.
Fig. 1(a): Multiple VIS (top row) and NIR (bottom row) face images of the same identity produced by DVG-Face.
Recently, unconditional generative models have been used to synthesize heterogeneous face image pairs from noise, achieving state-of-the-art performance by capturing the target NIR-VIS datasets’ numerous intra-class variations, such as poses and lighting, during generation. However, despite accounting for intra-class variety, only one NIR-VIS pair is created per identity, limiting the potential of synthetic face images for NIR-VIS face recognition. When these models generate several NIR-VIS image pairs for a particular identity, identity consistency is poorly maintained, as seen in Fig. 1(a).
Furthermore, the appearance variations of the generated images depend on the target NIR-VIS face recognition datasets, meaning that different facial images must be synthesized to match each target dataset. The generalizability of NIR-VIS face recognition networks suffers from such dataset-specific face synthesis. To address these issues, the researchers present a physically-based facial image generation approach that renders high-quality NIR-VIS facial image pairs from renderable 3D facial assets. By rendering photorealistic 3D facial data, they obtain paired, labeled training data with adjustable identity, pose, expression, and lighting.
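The rendering pipeline described above can be pictured as a parameter-sampling loop: each identity is rendered several times under varied pose, expression, and lighting, with each set of parameters shared by a VIS and an NIR render. The sketch below is purely illustrative, not the authors’ code; the `sample_render_params` helper, the parameter names, and the ranges are all assumptions made for the example.

```python
import random

def sample_render_params(num_identities=10, pairs_per_identity=4, seed=0):
    """Hypothetical sketch: sample one set of rendering parameters per
    NIR/VIS pair, so both renders of a pair share pose, expression,
    and lighting while the identity (the 3D asset) stays fixed."""
    rng = random.Random(seed)
    jobs = []
    for identity in range(num_identities):
        for _ in range(pairs_per_identity):
            params = {
                "identity": identity,
                "yaw": rng.uniform(-45.0, 45.0),    # head pose, degrees
                "pitch": rng.uniform(-15.0, 15.0),
                "expression": rng.choice(["neutral", "smile", "open_mouth"]),
                "light_direction": [rng.uniform(-1.0, 1.0) for _ in range(3)],
            }
            # The same params would be rendered twice: once with the VIS
            # reflectance maps, once with their VIS-to-NIR translation.
            jobs.append(params)
    return jobs
```

Because every parameter except the reflectance domain is shared within a pair, the resulting NIR and VIS images are labeled and pixel-aligned by construction.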
Fig. 1(b): Multiple VIS (top row) and NIR (bottom row) face images of the same identity generated by the method proposed in the paper.
Recent research has proposed methods for recovering high-quality renderable assets from arbitrary facial photographs. In contrast to generative approaches, the rendered identity does not change while other parameters are varied, which considerably facilitates training. However, gathering human rendering assets requires a significant amount of manual labor, either with scanning systems or by artists, so the available datasets are either too small or lack relightable reflectances such as diffuse albedo, specular albedo, and normals. Furthermore, Wood et al. demonstrated that high-quality synthetic face data can be successfully utilized for computer vision tasks such as landmark localization and face parsing.
However, to the authors’ knowledge, no dataset or approach exists for generating renderable 3D faces in both the VIS and NIR domains. Using a state-of-the-art facial reflectance acquisition approach, they build many such facial assets, translate them from VIS to NIR, and then render them under identical conditions to obtain high-quality training data. Because their transformation is applied per pixel on high-resolution reflectance maps, the person’s identity is fully preserved in both NIR and VIS. Figure 1(b) depicts faces created by the proposed approach. As can be observed, their NIR-VIS face generation exceeds Fig. 1(a) in both identity consistency and facial appearance diversity.
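The per-pixel idea can be illustrated with a toy reflectance transform. Everything below is hypothetical: the paper learns its VIS-to-NIR transformation, whereas this sketch just mixes the colour channels with fixed placeholder weights. It only shows why a pixel-wise operation on reflectance maps cannot alter the geometry that carries identity.

```python
import numpy as np

def vis_to_nir_reflectance(albedo_vis, weights=(0.6, 0.3, 0.1)):
    """Hypothetical sketch: convert an RGB (H, W, 3) diffuse-albedo map
    into a single-channel NIR albedo via a fixed per-pixel channel mix.
    The weights are placeholders (red weighted highest, loosely
    reflecting that skin reflects more toward the red/NIR end)."""
    w = np.asarray(weights, dtype=np.float64)
    nir = albedo_vis.astype(np.float64) @ w  # independent per-pixel mix
    return np.clip(nir, 0.0, 1.0)
```

Since only reflectance values change while the 3D shape and texture layout stay untouched, the rendered NIR and VIS images remain exactly aligned for the same subject.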
To aid identity feature learning while reducing the modality discrepancy, an IDentity-based Maximum Mean Discrepancy (ID-MMD) loss is presented. This loss pulls together the feature centroids of the same identity in the NIR and VIS domains, bridging the gap between NIR and VIS images at the domain level. At the same time, the network is encouraged to focus on identity features rather than instance-level facial details such as poses and accessories. The resulting high-quality NIR-VIS facial image dataset is then used, alongside a VIS face recognition dataset, to train the NIR-VIS face recognition network.
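A minimal sketch of what an identity-based MMD loss could look like, assuming a Gaussian-kernel MMD computed between per-identity feature centroids (the paper’s exact kernel and formulation may differ; this is an illustration, not the authors’ implementation):

```python
import numpy as np

def id_mmd_loss(feats_nir, feats_vis, labels_nir, labels_vis, sigma=1.0):
    """Sketch of an identity-based MMD loss: for each identity present in
    both domains, average its features into a centroid per domain, then
    compute a Gaussian-kernel MMD^2 between the two centroid sets."""
    labels_nir = np.asarray(labels_nir)
    labels_vis = np.asarray(labels_vis)
    ids = sorted(set(labels_nir.tolist()) & set(labels_vis.tolist()))
    # One centroid per identity and domain.
    c_nir = np.stack([feats_nir[labels_nir == i].mean(axis=0) for i in ids])
    c_vis = np.stack([feats_vis[labels_vis == i].mean(axis=0) for i in ids])

    def gaussian_kernel(a, b):
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2.0 * sigma ** 2))

    # Standard biased MMD^2 estimate between the two centroid sets.
    return (gaussian_kernel(c_nir, c_nir).mean()
            + gaussian_kernel(c_vis, c_vis).mean()
            - 2.0 * gaussian_kernel(c_nir, c_vis).mean())
```

Because the loss acts on identity centroids rather than on individual samples, instance-level variations such as pose or accessories are averaged out before the domain gap is measured.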
Overall, their primary contributions can be summarized as follows:
- A system capable of creating massive volumes of paired NIR and VIS face images of diverse identities, poses, and lighting conditions, using 3D facial reconstruction and a novel VIS-to-NIR transformation for facial reflectance.
- An IDentity-based Maximum Mean Discrepancy (ID-MMD) loss that bridges the gap between NIR and VIS images by minimizing the modality discrepancy at the domain level and encouraging the network to attend to identity features rather than facial details.
- Extensive experiments on four NIR-VIS face recognition benchmarks show that the proposed technique outperforms state-of-the-art algorithms while requiring no existing NIR-VIS face recognition dataset. With slight fine-tuning on the target NIR-VIS face recognition dataset, their approach outperforms the SOTA even further.
The whole project codebase is freely available on GitHub.
Check out the Paper, Tool, and GitHub link. All credit for this research goes to the researchers on this project.
Aneesh Tickoo is a consulting intern at MarktechPost. He is currently pursuing his undergraduate degree in Data Science and Artificial Intelligence from the Indian Institute of Technology (IIT), Bhilai. He spends most of his time working on projects aimed at harnessing the power of machine learning. His research interests lie in image processing, and he is passionate about building solutions around it. He loves to connect with people and collaborate on interesting projects.