International Journal of Computer Science Applications & Information Technologies (IJCSAIT) Vol.2, No.1 (June 2014)

Automatic Face Tagging for Personal and Group Photos

Sikha O.K1, Rohit S2, Vineet Nair3, Pavithran S4, Dharmik G Kothari5, Ganesh Kumar6

1,2,3,4,5,6 Computer Science, Amrita Vishwa Vidyapeetham, Coimbatore, India
Emails: 1 [email protected], 2 [email protected], 3 [email protected] (Corresponding author), 4 [email protected], 5 [email protected], 6 [email protected]

Abstract

In today's world, almost every person with a computer has a large collection of photographs. The most important issue in managing such huge photo collections is storing them effectively for easy retrieval. Traditionally, photograph collections have been organised by date of creation, size, events, etc. However, recent studies have shown that most people prefer to group their photograph collections based on the persons present in them. This study proposes a system for automatic face tagging in personal and group photos using image processing and computer vision techniques. In recent years, many new algorithms have been proposed to make face tagging an automatic process. This study divides the whole process into several sub-processes: face detection and segmentation, subject clustering, selection of the best image from each cluster, and finally face recognition and tagging. Combining these stages gives rise to a robust face tagging system that can be put to use as per requirements.

Keywords: Face recognition, tagging, group photographs, detection, clustering

I.
Introduction

An automatic system that tags persons in personal and group photos using face recognition techniques is an emerging area of research in computer vision. Recent surveys have shown that most people would prefer to group their personal photo collections based on who is present in them [1], preferably with minimal effort on their part. Thus, a fully automatic system that tags photos with the least user intervention is the need of the hour. Since face tagging is essentially a special application of face recognition, it involves all the stages of a typical face recognition process: face detection, segmentation and subsequently face recognition [2]. In this work, however, subject clustering and an algorithm to select the most suitable image from a cluster are also included. Face detection and segmentation are used to detect the presence of faces and segment them out from the rest of the photo for later processing. Face detection has separate applications of its own, such as auto face capture in digital cameras, or it can serve as the first step of a face recognition process, as in this study [3]. Subject clustering is added to reduce the time required for recognition: once a person has been recognised, he need not be recognised again in subsequent images and can instead be tagged directly. Once subject clustering is done and the best image for recognition has been chosen, the images are supplied to the face recognition and tagging steps, which identify the faces given to them and tag the corresponding persons. Thus, the input to the system is a set of personal photos with one or more persons in each photo, and the output identifies the persons present in each photo. A training database holds the people most likely to occur in the test photos; any person not present in this database will remain untagged.
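The stages described above can be summarised as a pipeline. The sketch below is illustrative only: the function names and data structures are hypothetical placeholders, with each stage stubbed out so that the dataflow between detection, clustering, best-image selection, recognition and tagging is clear.

```python
# Illustrative pipeline skeleton for the proposed face tagging system.
# Each stage is a stub standing in for the techniques of the later sections
# (Viola-Jones detection, LBP clustering, NAE selection, ICA recognition).

def detect_and_segment(photos):
    # Stage 1: detect faces in every photo; return (photo_id, face) pairs.
    return [(pid, face) for pid, faces in photos.items() for face in faces]

def cluster_subjects(faces):
    # Stage 2: group faces of the same person; here grouped by a dummy key.
    clusters = {}
    for pid, face in faces:
        clusters.setdefault(face, []).append((pid, face))
    return list(clusters.values())

def select_best(cluster):
    # Stage 3: pick the most suitable face image from a cluster (e.g. by NAE).
    return cluster[0]

def recognize(face, database):
    # Stage 4: match the face against the training database; None if unknown.
    return database.get(face)

def tag_photos(photos, database):
    faces = detect_and_segment(photos)
    tags = {pid: [] for pid in photos}
    for cluster in cluster_subjects(faces):
        _, best_face = select_best(cluster)
        name = recognize(best_face, database)
        if name is not None:
            # Tag every photo in which this cluster's person appears;
            # the person is recognised only once per cluster.
            for pid, _ in cluster:
                tags[pid].append(name)
    return tags
```

Note how a person absent from the training database yields no name and is simply left untagged, matching the behaviour stated above.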
The rest of this paper is arranged as follows: Section II covers related work in the area of automatic face tagging, Section III the overall architecture of the system, and Section IV the first step in the system, face detection. Subject clustering is discussed in Section V, image quality measures to determine the most suitable image in Section VI, face recognition in Section VII, a backtracking algorithm to tag persons in the original photograph in Section VIII and, finally, conclusions and future enhancements in Section IX.

II. Related work

Using face tagging as an effective method of managing large photo collections is an area of intense research in computer vision and image processing. Recent surveys found that most people prefer to manage their photo collections based on who is present in them [4]. People found it easier to tag the persons present in their collections for easy retrieval; thus effective photo management was the motivation for tagging [5]. Early face tagging systems were not fully automatic: they required the user to manually label all persons present in photographs and then manually confirm whether each person was present in each group photo [6]. Automatic face recognition algorithms have subsequently been developed; however, these apply traditional face recognition algorithms to tag photos [7]. The proposed system differs from such systems in two ways:

1) Traditional face tagging systems are given their inputs one at a time, in a serialized fashion. The proposed system accepts up to four input photographs at the same time.

2) Existing systems do not consider clustering as a part of face tagging, because they do not handle more than one input at a time. The proposed system includes subject clustering, which makes recognition fast and simple.

III. Architecture of the system

The overall architecture of the system is shown in Fig. 1. First, a set of up to four input photographs is given to the system.
This is fed to the face detection step, which detects all the faces in the given set of photographs and passes the segmented faces to subject clustering. Subject clustering takes these segmented faces and groups them by similarity. Image quality measures are then used to select the best image from each cluster, and these images are given to the face recognition step. Face recognition identifies the faces by comparing them to the training database, and finally backtracking methods are used to tag the recognized persons in the original group photographs.

Fig. 1 Architecture diagram of the system

IV. Face Detection

The first step is detecting faces in the photo. Several algorithms exist to detect faces in a given input image; among them, the Viola-Jones (VJ) algorithm has been used since it has a lower false detection rate than other existing algorithms [8]. The proposed system follows two steps to detect a face in the photo: when the input photos are given, the VJ algorithm is applied to detect face candidates, followed by nose detection. If a detected region has a nose-like part inside it, it is classified as a face. The VJ algorithm has two main steps:

i) Feature extraction using rectangular filters similar to Haar features.
ii) Classifying these features as face or non-face using the AdaBoost algorithm [9].

First, feature extraction is done using rectangular filters; here, four rectangle filters and their combinations have been used. Each filter has black and grey regions, and the feature value is computed from the luminance of these regions. To speed up this computation, the Integral Image technique is used. After feature extraction, classification is done using the AdaBoost algorithm, which uses a cascade of classifiers for better accuracy. Fig. 2 shows an example of a cascade classifier.
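The integral-image technique mentioned above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the paper's implementation: `haar_two_rect` is a hypothetical helper that evaluates one two-rectangle Haar-like feature (luminance of the left half minus the right half of a window), which the integral image makes possible with only four array lookups per rectangle.

```python
import numpy as np

def integral_image(img):
    """Summed-area table with a zero row/column prepended, so that the sum
    over img[y0:y1, x0:x1] needs only four lookups (see rect_sum)."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)
    return ii

def rect_sum(ii, y0, x0, y1, x1):
    # Sum of pixels in img[y0:y1, x0:x1] in constant time.
    return ii[y1, x1] - ii[y0, x1] - ii[y1, x0] + ii[y0, x0]

def haar_two_rect(ii, y0, x0, h, w):
    # Hypothetical two-rectangle Haar-like feature: luminance of the left
    # half minus the right half of an (h x w) window at (y0, x0).
    half = w // 2
    left = rect_sum(ii, y0, x0, y0 + h, x0 + half)
    right = rect_sum(ii, y0, x0 + half, y0 + h, x0 + w)
    return left - right
```

On a window whose left half is bright and right half dark, this feature is strongly positive; the AdaBoost cascade thresholds many such features, rejecting non-face windows early.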
Similar steps are applied for nose detection. The combined output is then used to confirm the faces, which are segmented and used later for subject clustering. Fig. 3, Fig. 4 and Fig. 5 show sample outputs of the face detection module.

Fig. 2 Cascade Classifier

Fig. 3 Output of face detection

Fig. 4 Output of nose detection

Fig. 5 Combined output of face and nose detection

V. Subject Clustering

The next step in the automatic face tagging system is subject clustering. Clustering is a data processing method that groups similar data together for efficient processing. Since the face tagging system is applied to sets of four or fewer informal group photographs, it is highly likely that the same people appear in two or more of the photographs [11]. Therefore, it makes sense to implement an algorithm that groups faces of the same person together, i.e. puts them in one cluster. The input to this step is the set of faces detected and segmented from the up to four group photographs; the output is a set of clusters, each containing the faces of one person. This output is then given to the face recognition step. Subject clustering is traditionally not included in a face recognition system, but it is a highly important step in face tagging systems. Consider an example input dataset with four group photographs, in which three people appear three times and two people appear twice. In a normal face recognition application, the segmented faces would be given directly to the recognition step, which would apply the recognition algorithm to every face regardless of whether a face is repeated. Applying the recognition algorithm unnecessarily to duplicate faces decreases the efficiency of the system; instead, it would be enough to give only one face image per person for recognition.
That is, it is sufficient to run recognition on one image from each subject cluster, because all images in a cluster are known to belong to the same person [12]. Hence, subject clustering improves the face tagging system greatly.

The subject clustering algorithm works as follows. Consider a set of segmented faces, where each face i is represented by a feature vector F_i extracted using a facial feature extraction algorithm. The feature extraction algorithm used in this subject clustering system is Local Binary Patterns (LBP): the LBP operator is applied to every face, and the histogram of the resulting LBP image is taken as the feature vector for that face. Since LBP is a local-feature-based algorithm, it gives better results than global-feature-based algorithms such as PCA and LDA [13]. Once F_i has been calculated for each face, the dissimilarity between every pair of faces is calculated using (1):

D_f(F_i, F_j) = Σ_n D(f_{n,i}, f_{n,j})    (1)

where F_i and F_j are the two faces being compared and f_{n,i} and f_{n,j} are the nth feature components of faces i and j respectively. Once D_f(F_i, F_j) has been calculated for every pair of faces, the pair with the least dissimilarity is merged into one cluster. The dissimilarity is then recalculated between every pair, and the least-dissimilar pair is again merged. This step is repeated until the least dissimilarity exceeds a pre-calculated threshold. When the algorithm completes, the result is a set of clusters that is given to the face recognition step. For example, consider the input data set given in Fig. 6. It is first given to the face detection step, which detects and segments the faces; these faces are then given to the subject clustering algorithm, and the final result is a set of clusters as shown in Fig. 7.

Fig.
6 Two group photographs given as input to the face tagging system

Fig. 7 Resultant set of 5 clusters, each containing two faces

VI. Image Quality Measures

There are many methods to find the best image in a set of images [14]. The Normalized Absolute Error (NAE) method has been used in this study: NAE computes an error value for each image in the set and selects the image with the minimum error as the best image [15].

Normalized Absolute Error (NAE)

Given a set of images, as shown in Fig. 8, the algorithm computes an error value for each image. Computing an error requires a reference to compare against; instead of comparing two or more images directly, Gaussian noise is added to each image and the noisy version is compared with the original. From this comparison, NAE is computed as in (2):

NAE = Σ_j Σ_k |x(j,k) − x'(j,k)| / Σ_j Σ_k |x(j,k)|    (2)

where x(j,k) is the original image and x'(j,k) is its noisy version. Repeating this for the whole set gives a set of error values, and the image with the minimum error is selected as the best image, as shown in Fig. 9.

Fig. 8 Set of input images given to find the best image

Fig. 9 Best output image selected from the input images

VII. Face Recognition

Once the best image in the cluster has been selected by the NAE algorithm, face recognition is performed: the selected 'best' image is compared with the database to find the right match. For this, the Independent Component Analysis (ICA) algorithm is used. The main purpose of ICA is to decompose the observed signal into a linear combination of independent components; hence it minimizes both second-order and higher-order dependencies [16]. ICA is considered a generalisation of the most widely used method, PCA (Principal Component Analysis).
Therefore, the PCA algorithm acts as a baseline algorithm [17]. In this section we analyse Architecture I of ICA for image representation. In Architecture I, the face images are considered random variables and the pixel values provide the observations for these variables. The algorithm is as follows. The input face images X (the observed mixtures) are considered to be a linear mixture of statistically independent basis images S, combined by an unknown mixing matrix A, as shown in (3):

X = A S    (3)

Based on these observed mixtures, the ICA algorithm tries to recover the sources via a weight matrix W, as shown in (4):

U = W X = W A S    (4)

Here U is an estimate of the independent source signals. First, PCA is applied to decorrelate the training data so that its covariance is diagonal. It also projects the data into a subspace of dimension m, which controls the number of independent components that will be produced by ICA; hence it reduces the time complexity by eliminating pair-wise (second-order) dependencies [18]. The whitening transform of the data is then given by D^(-1/2) R^T, where D is the diagonal matrix of eigenvalues and R is the matrix of eigenvectors of the covariance matrix. ICA transforms this whitened data into a set of statistically independent images. Images are statistically independent when equation (5) holds:

f_u(u) = Π_i f_{u_i}(u_i)    (5)

where f_u is the probability density function of u. However, there is no closed-form expression that fully satisfies this independence condition and yields the weight matrix W; instead, the Infomax algorithm is used to approximate W so as to maximise independence. The inverse of the weight matrix, W^(-1), can then be interpreted as the source mixing matrix. Mathematically, let R be the p by m matrix containing the first m eigenvectors of a set of n face images.
The rows of the input matrix to ICA are variables and the columns are observations; ICA is performed on R^T. The n by m matrix of ICA coefficients B can then be computed as follows. Let C be the n by m matrix of PCA coefficients:

C = X R    (6)

ICA on R^T produces

U = W R^T    (7)

and the PCA reconstruction of the data is

X = C R^T    (8)

From (7) we get

R^T = W^(-1) U    (9)

Therefore,

X = (C W^(-1)) U = B U    (10)

Here X is the reconstruction of the original data with minimum squared error, and the rows of B = C W^(-1) contain the ICA coefficients of the face images. Fig. 10 shows how the statistically independent images are found, and Fig. 11 and Fig. 12 show test images and the recognized images after running this module.

Fig. 10 Finding statistically independent images

Fig. 11 Test images

Fig. 12 Matched Faces

VIII. Backtracking

To tag all the faces in a group photo, the names of the recognized faces are retrieved from the training database and stored in an array. For every detected face, the original face image is also retrieved from the database and stored in an array. To tag each name at the right face, the co-ordinates of each face are obtained from the face detection module, and every face is paired with its recognized name in a string. The original image is then opened and each name is inserted at the corresponding co-ordinates. This process continues until all the faces have been tagged.

IX. Conclusion and future enhancements

Thus, the given group photographs were processed through all the stages described above. A native training database was created and a set of test cases was prepared. The constraints imposed on the input photos were that each photo should be at least 2000x2000 pixels in size, contain a maximum of 8-10 persons, and that up to four group photographs are given at a time. Under these conditions, the faces were detected and segmented, clustered, the best image was selected for recognition, and finally recognition and tagging were done.
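As a recap, the backtracking step of Section VIII can be sketched as a mapping from recognized cluster names back to detected face co-ordinates. The data structures below are hypothetical placeholders, not the paper's actual arrays; a drawing library would then render each (name, position) pair onto the original photo.

```python
# Illustrative sketch of the backtracking step: names recognized for each
# cluster are written back onto the original photos at the co-ordinates
# recorded by the face detection module.

def backtrack_tags(detections, cluster_members, recognized_names):
    """detections: face_id -> (photo_id, (x, y, w, h)) from face detection.
    cluster_members: cluster_id -> list of face_ids in that cluster.
    recognized_names: cluster_id -> name, or None if the person is unknown.
    Returns photo_id -> list of (name, (x, y)) label positions."""
    tags = {}
    for cid, face_ids in cluster_members.items():
        name = recognized_names.get(cid)
        if name is None:
            continue  # person not in the training database: leave untagged
        for fid in face_ids:
            photo_id, (x, y, w, h) = detections[fid]
            # place the label just below the detected face rectangle
            tags.setdefault(photo_id, []).append((name, (x, y + h)))
    return tags
```

Because recognition is done once per cluster, every occurrence of that person across the input photos is tagged from a single recognition result.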
The future scope of this work is that the face tagging system could be used to group photograph collections automatically and possibly share each photograph automatically with all the persons present in it. Another possible future improvement is a real-time face tagging system that tags photos as soon as they are taken.

References

[1] M. Zhao, Y. W. Teo, S. Liu, T. S. Chua, and R. Jain, "Automatic person annotation of family photo album," in Proc. LNCS Int. Conf. CIVR, 2006, pp. 163-172.
[2] S. J. D. Prince, J. H. Elder, J. Warrell, and F. M. Felisberti, "Tied factor analysis for face recognition across large pose differences," IEEE Trans. Pattern Anal. Mach. Intell., vol. 30, no. 6, pp. 970-984, Jun. 2008.
[3] M. H. Yang, D. J. Kriegman, and N. Ahuja, "Detecting faces in images: A survey," IEEE Trans. Pattern Anal. Mach. Intell., vol. 24, no. 1, pp. 34-58, Jan. 2002.
[4] K. Rodden and K. R. Wood, "How do people manage their digital photographs?" in Proc. ACM Hum. Factors Comput. Syst., 2003, pp. 409-416.
[5] M. Ames and M. Naaman, "Why we tag: Motivations for annotation in mobile and online media," in Proc. ACM Int. Conf. CHI, 2007, pp. 971-980.
[6] B. Suh and B. B. Bederson, "Semi-automatic photo annotation strategies using event based clustering and clothing based subject recognition," Int. J. Interacting Comput., vol. 19, no. 2, pp. 524-544, 2007.
[7] A. K. Jain, A. Ross, and S. Prabhaker, "An introduction to biometric recognition," IEEE Trans. Circuits Syst. Video Technol., vol. 14, no. 1, pp. 4-20, Jan. 2004.
[8] P. Viola and M. Jones, "Rapid object detection using a boosted cascade of simple features," in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2001.
[9] P. Viola and M.
Jones, "Fast and robust classification using asymmetric AdaBoost and a detector cascade," Mitsubishi Electric Research Lab, Cambridge, MA, 2001.
[10] V. Pavlovic and A. Garg, "Efficient detection of objects and attributes using boosting," in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2001.
[11] J. Y. Choi, W. De Neve, Y. M. Ro, and K. N. Plataniotis, "Automatic face annotation in personal photo collections using context-based unsupervised clustering and face information fusion," IEEE Trans. Circuits Syst. Video Technol., 2010.
[12] C. Zhu, F. Wen, and J. Sun, "A rank-order distance based clustering algorithm for face tagging," IEEE, 2010.
[13] T. Ahonen, A. Hadid, and M. Pietikainen, "Face description with local binary patterns: Application to face recognition," IEEE Trans. Pattern Anal. Mach. Intell., vol. 28, no. 12, 2006.
[14] N. Ponomarenko, M. Carli, V. Lukin, K. Egiazarian, J. Astola, and F. Battisti, "Color image database for evaluation of image quality metrics," in Proc. Int. Workshop on Multimedia Signal Processing, Oct. 2008, pp. 403-408.
[15] K.-H. Thung and P. Raveendran, "A survey of image quality measures," in Proc. Int. Conf. Technical Postgraduates (TECHPOS), 14-15 Dec. 2009.
[16] B. A. Draper, K. Baek, M. S. Bartlett, and J. R. Beveridge, "Recognizing faces with PCA and ICA," Computer Vision and Image Understanding, 2003, pp. 115-137.
[17] J. Yang, D. Zhang, and J.-y. Yang, "Is ICA significantly better than PCA for face recognition?" in Proc. IEEE Int. Conf. Computer Vision, 2005, pp. 5499-5505.
[18] D. N. Chandrappa and M. Ravishankar, "Automatic face recognition in a crowded scene using multi-layered clutter filtering and independent component analysis," IEEE, 2012.
[19] M. S. Bartlett, J. R. Movellan, and T. J. Sejnowski, "Face recognition by independent component analysis,"
IEEE Transactions on Neural Networks, vol. 13, no. 6, November 2002.