International Journal of Computer Science Applications & Information Technologies (IJCSAIT)
Vol.2, No.1 (June 2014)
Automatic Face Tagging for Personal and Group Photos
Sikha O.K.1, Rohit S.2, Vineet Nair3, Pavithran S.4, Dharmik G. Kothari5, Ganesh Kumar6
1,2,3,4,5,6 Computer Science, Amrita Vishwa Vidyapeetham, Coimbatore, India
1 [email protected]
2 [email protected] (Corresponding author)
3 [email protected]
4 [email protected]
5 [email protected]
6 [email protected]
Abstract
In today's world, almost every person with a computer has a large collection of photographs. The most important issue in managing such huge photo collections is a way to store them effectively for easy retrieval. Traditionally, photograph collections have been stored according to the date of creation, size, events, etc. However, recent studies have shown that most people prefer to group their photograph collections based on the persons present in them. This study proposes a system for automatic face tagging in personal and group photos using image processing and computer vision techniques. In recent years, many new algorithms have been proposed to make face tagging an automatic process. This study divides the whole process into several sub-processes: face detection and segmentation, subject clustering, selecting the best image within each cluster, and finally face recognition and tagging. Combining all these stages gives rise to a robust face tagging system that can be put to use as per requirements.
Keywords: Face recognition, tagging, group photographs, detection, clustering
I. Introduction
An automatic system that tags persons in personal and group photos using face recognition techniques is an emerging area of research in computer vision. Recent surveys have shown that most people would prefer to group their personal photo collections based on who is present in them [1], preferably with minimal effort on their part. Thus, a fully automatic system that tags photos with the least user intervention is the need of the hour.

Since face tagging is essentially a special application of face recognition, it has all the stages involved in a typical face recognition process: face detection, segmentation and subsequently face recognition [2]. However, in this work, subject clustering and an algorithm to select the most suitable image from a cluster are also included. Face detection and segmentation are used to detect the presence of faces and then segment the faces out from the rest of the photo for later processing. Face detection has separate applications of its own, as in digital cameras for auto face capture, or it can be used as the first step in a face recognition process, as in this study [3]. Subject clustering is added to reduce the time required for recognition: once a person is recognised, he need not be recognised again in subsequent images; instead, he can be tagged directly. Once subject clustering is done and the best image for recognition is chosen, the images are ready to be supplied to the face recognition and tagging steps. These processes identify the faces given to them and tag the corresponding persons. Thus, the input to the system is a set of personal photos with one or more persons in each photo, and the output identifies the persons present in each photo. There is a training database of the people most likely to occur in these test photos, and any person not present in this database remains untagged.
The rest of this paper is arranged as follows: Section II covers related work in the area of automatic face tagging, Section III the overall architecture of the system, and Section IV the first step in the system, face detection. Subject clustering is discussed in Section V, image quality measures to determine the most suitable image in Section VI, face recognition in Section VII, a backtracking algorithm to tag persons in the original photograph in Section VIII and, finally, conclusions and future enhancements in Section IX.
II. Related Work
Using face tagging as an effective method of managing large photo collections is an area of intense research in computer vision and image processing. Recent surveys found that most people prefer to manage their photo collections based on who is present in them [4]. People find it easier to tag the persons present in their photo collections for easy retrieval; thus, effective photo management was the motivation for tagging [5]. Early face tagging systems were not fully automatic: they required the user to manually label all persons present in photographs and then manually confirm whether each person was present in each group photo [6]. Automatic face recognition algorithms have subsequently been developed; however, these use the traditional face recognition algorithms for tagging photos [7]. The proposed system differs from existing work in two ways:

1) Traditional face tagging systems work by supplying inputs to the system in a serialized fashion. The proposed system works by taking up to four input photographs at the same time.

2) Existing systems do not consider clustering as a part of face tagging, because they do not consider cases with more than one input at a time. The proposed system includes subject clustering, which makes recognition fast and simple.
III. Architecture of the System
The overall architecture of the system is shown in Fig. 1. First, a set of up to four input photographs is given to the system and fed to the face detection step. Face detection detects all the faces in the given set of photographs and gives the segmented faces to subject clustering, which groups them according to similarity. Image quality measures are then used to select the best image from each cluster, and these are given to the face recognition step. Face recognition recognizes the faces given to it as input by comparing them to the training database, and finally backtracking is used to tag the recognized persons in the original group photographs.
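To make the data flow concrete, the following Python sketch chains the stages in the order just described. Every helper function here is a hypothetical placeholder named after a later section of this paper, not part of any published implementation.

```python
# Minimal sketch of the pipeline in Fig. 1. All helpers are hypothetical
# placeholders; each corresponds to a later section of this paper.

def tag_photo_set(photos, training_db, max_photos=4):
    """Tag up to `max_photos` group photographs against a training database."""
    photos = photos[:max_photos]

    # Section IV: detect and segment faces, keeping photo index and box.
    detections = []                                   # (photo_idx, bbox, face)
    for idx, photo in enumerate(photos):
        for bbox, face in detect_and_segment_faces(photo):
            detections.append((idx, bbox, face))

    # Section V: group faces of the same person into clusters.
    clusters = cluster_faces([face for _, _, face in detections])

    # Sections VI and VII: pick the best face per cluster (NAE criterion)
    # and recognize only that representative face against the database.
    names = {}
    for cid, members in enumerate(clusters):
        best = select_best_image([detections[i][2] for i in members])
        names[cid] = recognize_face(best, training_db)  # None if unknown

    # Section VIII: backtrack the names onto the original photographs.
    return backtrack_tags(photos, detections, clusters, names)
```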
Fig. 1 Architecture diagram of the system
IV. Face Detection
The first step is detecting faces in the photo. Several algorithms exist to detect faces in a given input image. Among them, the Viola-Jones (VJ) algorithm has been used here since it has a lower false detection rate than other existing algorithms [8]. The proposed system follows two steps to detect a face in a photo: when the input photos are given, the VJ algorithm is applied to detect faces and then noses. If a detected region has a nose-like part inside it, it is classified as a face. The VJ algorithm has two main steps:

i) Feature extraction using rectangular filters similar to Haar features.
ii) Classifying these features as face or non-face using the AdaBoost algorithm [9].

First, feature extraction is done using rectangular filters. Here, four rectangle filters and their combinations have been used. Each filter has black and grey regions, over which the luminance is computed. To speed up this process, the integral image technique is used. After feature extraction, classification is done using the AdaBoost algorithm, which uses a cascade of classifiers for better accuracy. Fig. 2 shows an example of a cascade classifier. Similar steps are applied for nose detection. The combined output is then used to detect faces, and the detected faces are segmented for use in subject clustering later. Fig. 3, Fig. 4 and Fig. 5 show sample outputs of the face detection module.
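As a concrete illustration, the sketch below implements the face-then-nose validation step with OpenCV's pretrained Haar cascade classifiers. The cascade file names and detector parameters are assumptions; in particular, the nose cascade (haarcascade_mcs_nose.xml) ships with some OpenCV distributions rather than all of them.

```python
import cv2

# Hedged sketch of the two-step VJ detection described above: a face
# candidate is kept only if the nose cascade also fires inside it.
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
nose_cascade = cv2.CascadeClassifier("haarcascade_mcs_nose.xml")  # assumed path

def detect_and_segment_faces(photo_bgr):
    """Return (bbox, face_image) pairs for candidates that contain a nose."""
    gray = cv2.cvtColor(photo_bgr, cv2.COLOR_BGR2GRAY)
    results = []
    for (x, y, w, h) in face_cascade.detectMultiScale(gray, 1.1, 5):
        roi = gray[y:y + h, x:x + w]
        # Classify as a face only if a nose-like region is found inside.
        if len(nose_cascade.detectMultiScale(roi, 1.1, 5)) > 0:
            results.append(((x, y, w, h), photo_bgr[y:y + h, x:x + w]))
    return results
```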
Fig. 2 Cascade Classifier
Fig. 3 Output of face detection
Fig. 4 Output of nose detection
Fig. 5 Combined output of Face and Nose detection
V. Subject Clustering
The next step in the automatic face tagging system is subject clustering. Clustering is a data processing method that groups similar data together for efficient processing.

Since the face tagging system is applied to sets of four or fewer informal group photographs, it is highly likely that the same people will appear in two or more of the photographs [11]. Therefore, it makes sense to implement an algorithm that groups faces of the same person together, i.e. puts them in one cluster. The input given to this step is the set of faces detected and segmented from the group photographs. The output from subject clustering is a set of clusters, each containing the faces of one person. This output is then given to the face recognition step.

Subject clustering is traditionally not included in a face recognition system, but it is a highly important step in face tagging systems. Consider an example input dataset with four group photographs, in which three people appear three times and two people appear twice. In a normal face recognition application, the segmented faces are given directly to the recognition step, which applies the recognition algorithm to all faces irrespective of whether a face is repeated. But applying the recognition algorithm unnecessarily to duplicate faces decreases the efficiency of the system. Instead, it is enough to give only one face image per person for recognition. That is, it is sufficient to perform recognition on one image from each subject cluster, because it is known that all images in a cluster belong to the same person [12]. Hence, subject clustering improves the face tagging system greatly. The subject clustering algorithm works in the following way:
Consider a situation with n segmented faces, where face i is represented by its feature vector Fi, extracted using a facial feature extraction algorithm. The feature extraction algorithm used in this subject clustering system is Local Binary Patterns (LBP): the LBP operator is applied to every face, and the histogram of the resulting LBP image is taken as the feature vector for that face. Since LBP is a local feature based algorithm, it gives better results than global feature based algorithms like PCA and LDA [13]. Once Fi has been calculated for each face, the dissimilarity between every pair of faces is calculated using the formula given in (1) below:

Df(Fi, Fj) = Σn D(fni, fnj)    (1)

where Fi and Fj are the two faces between which dissimilarity is calculated, and fni and fnj are the nth feature of face i and face j respectively. Once Df(Fi, Fj) is calculated for every pair of faces, the pair with the least dissimilarity is grouped together in one cluster. The dissimilarity is then recalculated between every pair, and again the pair with the least dissimilarity is grouped together. This step is repeated until the least dissimilarity exceeds a certain pre-calculated threshold. Once the algorithm completes all iterations, the result is a set of clusters that is given to the face recognition step. For example, consider the input dataset given in Fig. 6. This dataset is first given to the face detection step, which detects and segments the faces. These faces are then given as input to the subject clustering algorithm. The final result is a set of clusters, as shown in Fig. 7.
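A sketch of this procedure is given below, using scikit-image's LBP implementation for the feature vectors and an L1 histogram distance for D in equation (1). The uniform-LBP parameters, the single-linkage merging rule and the threshold value are all assumptions made for illustration; the paper does not report them.

```python
import cv2
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_histogram(face_img, size=(100, 100), P=8, R=1):
    """Feature vector F_i: histogram of the uniform LBP image of a face."""
    gray = cv2.cvtColor(face_img, cv2.COLOR_BGR2GRAY)
    gray = cv2.resize(gray, size)
    lbp = local_binary_pattern(gray, P, R, method="uniform")
    hist, _ = np.histogram(lbp, bins=P + 2, range=(0, P + 2), density=True)
    return hist

def cluster_faces(faces, threshold=0.15):
    """Repeatedly merge the closest pair of clusters until the least
    dissimilarity exceeds the threshold, as described for equation (1)."""
    feats = [lbp_histogram(f) for f in faces]
    clusters = [[i] for i in range(len(faces))]   # start with singletons

    def dissim(a, b):
        # D_f(F_i, F_j): L1 distance between feature vectors, with
        # single linkage between clusters (an illustrative choice).
        return min(np.abs(feats[i] - feats[j]).sum() for i in a for j in b)

    while len(clusters) > 1:
        d, ai, bi = min((dissim(a, b), ai, bi)
                        for ai, a in enumerate(clusters)
                        for bi, b in enumerate(clusters) if ai < bi)
        if d > threshold:                         # stop at the threshold
            break
        clusters[ai].extend(clusters[bi])
        del clusters[bi]
    return clusters
```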
Fig. 6 Two group photographs given as input to the face tagging system
Fig. 7 Resultant set of 5 clusters, each having two faces.
VI. Image Quality Measures
There are many methods for finding the best image in a set of images [14]. The Normalized Absolute Error (NAE) method has been used in this study: NAE computes an error value for each image in a set, and the image with the minimum error is selected as the best image [15].

Normalized Absolute Error (NAE)

Given a set of images such as the one shown in Fig. 8, the algorithm computes an error value for each image. Computing an error normally requires two or more images to compare; instead, Gaussian noise is applied to each image and the noisy version is compared with the original, yielding the error shown in (2). Repeating this for the whole set gives a set of error values, and the image with the minimum error value is selected as the best image, as shown in Fig. 9.
$$\mathrm{NAE} = \frac{\sum_i \sum_j \lvert x(i,j) - \hat{x}(i,j) \rvert}{\sum_i \sum_j \lvert x(i,j) \rvert} \tag{2}$$

where x(i, j) is the original image and x̂(i, j) is its noise-degraded version.
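The selection step can be sketched as follows. Since the noise level used for the degraded copy is not specified in the paper, the standard deviation below is an assumed value.

```python
import numpy as np

def nae(original, degraded):
    """Normalized Absolute Error between two equal-sized images, eq. (2)."""
    original = original.astype(np.float64)
    degraded = degraded.astype(np.float64)
    return np.abs(original - degraded).sum() / np.abs(original).sum()

def select_best_image(images, noise_sigma=10.0, seed=0):
    """Compare each image with a Gaussian-noised copy of itself and return
    the image with the minimum NAE. `noise_sigma` is an assumed setting."""
    rng = np.random.default_rng(seed)
    errors = [nae(img, img + rng.normal(0.0, noise_sigma, img.shape))
              for img in images]
    return images[int(np.argmin(errors))]
```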
Fig. 8 Set of input images given to find the best image.
Fig. 9 Best output image selected from the input images.
VII. Face Recognition
Once the best image in each cluster has been selected by the NAE algorithm, face recognition is performed. That is, the selected 'best' image is compared with the database to find the right match. To do so, the Independent Component Analysis (ICA) algorithm is used. The main purpose of using ICA is to decompose the observed signal into a linear combination of independent components. Hence, it minimizes both second order and higher order dependencies [16].

ICA can be considered a generalisation of the most widely used method, Principal Component Analysis (PCA). Therefore, the PCA algorithm acts as a baseline algorithm [17]. In this section, Architecture I of ICA for image representation is analysed. In Architecture I, the face images are considered random variables and the pixel values provide the observations for these variables. The algorithm is as follows. The input face images X (observed mixtures) are considered to be a linear mixture of statistically independent basis images S, combined by an unknown mixing matrix A, as shown in (3):

X = AS    (3)

Based on these observed mixtures, the ICA algorithm tries to find a weight matrix W such that the rows of U in (4) are estimates of the independent source signals:

U = WX = WAS    (4)

PCA is applied first to decorrelate the training data, so that the pairwise covariances of the training data are zero. It also helps in projecting the data into a subspace of dimension m, to control the number of independent components produced by ICA. Hence, it reduces the time complexity by minimising pairwise dependencies [18].
Now, the whitening transform of the data is given by D^(-1/2) R^T, where D is the diagonal matrix of eigenvalues and R is the matrix of eigenvectors of the covariance matrix. ICA transforms this whitened data into a set of statistically independent images. Images are said to be statistically independent when equation (5) holds:

$$f_U(u) = \prod_i f_{U_i}(u_i) \tag{5}$$

where fU is the probability density function of U. However, there is no closed form expression that fully satisfies this independence condition and yields the weight matrix W. Instead, the Infomax algorithm is used to approximate W so as to maximise independence. Thus W^(-1), the inverse of the weight matrix, can be interpreted as the source mixing matrix.
Mathematically, let R be the p by m matrix containing the first m eigenvectors obtained from a set of n face images. The rows of the input matrix X to ICA are variables and the columns are observations, and ICA is performed on R^T. The n by m ICA coefficient matrix B can then be computed as follows. Let C be the n by m matrix of PCA coefficients. Then

C = X * R    (6)

Performing ICA on R^T gives

U = W * R^T    (7)

Also, we know that

X = C * R^T    (8)

From (7) we get

R^T = W^(-1) * U    (9)

Therefore,

X = (C * W^(-1)) * U = B * U    (10)
Here, X is the reconstruction of the original data with minimum squared error, and the rows of B contain the ICA coefficients used for recognition. Fig. 10 shows how the statistically independent images are found. Fig. 11 and Fig. 12 show test images and the recognized images after running this module.
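The following sketch reproduces this computation with scikit-learn. FastICA stands in for the Infomax algorithm used in the text (both estimate the weight matrix W), and the subspace dimension m and the Euclidean nearest-neighbour matching rule are assumptions for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA, FastICA

def train_ica(X, labels, m=20):
    """X: n x p matrix of flattened, equal-sized grayscale faces (one per row).
    Returns everything needed to match a probe face in ICA coefficient space."""
    pca = PCA(n_components=m)
    C = pca.fit_transform(X)              # n x m PCA coefficients, eq. (6)
    R = pca.components_.T                 # p x m eigenvector matrix
    ica = FastICA(n_components=m, random_state=0)
    ica.fit(R)                            # ICA performed on R^T (m signals)
    W = ica.components_                   # m x m weight matrix
    B = C @ np.linalg.inv(W)              # ICA coefficients, B = C W^-1, eq. (10)
    return pca, W, B, labels

def recognize(face_vec, pca, W, B, labels):
    """Nearest-neighbour match of one probe face against the gallery."""
    c = pca.transform(face_vec.reshape(1, -1))   # project onto eigenvectors
    b = c @ np.linalg.inv(W)                     # probe ICA coefficients
    return labels[int(np.argmin(np.linalg.norm(B - b, axis=1)))]
```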
Fig. 10 Finding statistically independent images
Fig. 11 Test images
Fig. 12 Matched Faces
VIII. Backtracking
To tag all the faces in a group photo, the name of each recognized face is retrieved from the training database; these retrieved names are stored in an array. For every detected face, the original face image is also retrieved from the database and stored in an array. The names must be tagged at the right faces, so the coordinates of each face are obtained from the face detection module. Every face is paired with its recognized name as a string. The original image is then opened, and each name is inserted at the corresponding coordinates obtained earlier. This process continues until all the faces are tagged.
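A sketch of this step for a single photograph is shown below, using OpenCV drawing primitives; the font, colour and label offset are illustrative choices.

```python
import cv2

def tag_one_photo(photo_bgr, boxes, names):
    """Draw each recognized name at the face coordinates saved during
    detection. `names[i]` is None for persons absent from the training
    database; those faces remain untagged."""
    tagged = photo_bgr.copy()
    for (x, y, w, h), name in zip(boxes, names):
        cv2.rectangle(tagged, (x, y), (x + w, y + h), (0, 255, 0), 2)
        if name is not None:
            cv2.putText(tagged, name, (x, max(y - 8, 0)),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 255, 0), 2)
    return tagged
```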
IX. Conclusion and Future Enhancements
Thus, the given group photographs pass through all the stages described above. A local training database was created and a set of test cases was prepared. The constraints imposed on the input photos were that each photo should be at least 2000 x 2000 pixels in size, each photograph should contain a maximum of 8-10 persons, and up to four group photographs can be given at a time. Under these conditions, the faces were detected and segmented, clustered, the best image was selected for recognition, and finally recognition and tagging were done. A future application of this work is a face tagging system that groups photograph collections automatically and possibly shares each photograph automatically with all the persons present in it. Another possible future improvement is a real time face tagging system that tags photos as soon as they are taken.
References
[1] M. Zhao, Y. W. Teo, S. Liu, T. S. Chua, and R. Jain, “Automatic person annotation of family photo
album,” in Proc. LNCS Int. Conf. CIVR, 2006, pp. 163–172.
[2] S. J. D. Prince, J. H. Elder, J. Warrell, and F. M. Felisberti, “Tied factor analysis for face recognition
across large pose differences,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 30, no. 6, pp. 970–984,
Jun. 2008.
[3] M. H. Yang, D. J. Kriegman, and N. Ahuja, “Detecting faces in images: A survey,” IEEE Trans.
Pattern. Anal. Mach. Intell., vol. 24, no. 1, pp. 34–58, Jan. 2002.
[4] K. Rodden and K. R. Wood, “How do people manage their digital photographs?” in Proc. ACM Hum.
Factors Comput. Syst., 2003, pp. 409–416.
[5] M. Ames and M. Naaman, “Why we tag: Motivations for annotation in mobile and online media,” in
Proc. ACM Int. Conf. CHI, 2007, pp. 971–980.
[6] B. Suh and B. B. Bederson, “Semi-automatic photo annotation strategies using event based clustering
and clothing based subject recognition,” Int. J. Interacting Comput., vol. 19, no. 2, pp. 524-544, 2007.
[7] A. K. Jain, A. Ross, and S. Prabhaker, “An introduction to biometric recognition,” IEEE Trans. Circuits
Syst. Video Technol., vol. 14, no. 1, pp. 4–20, Jan. 2004.
[8] P. Viola and M. Jones, "Rapid object detection using a boosted cascade of simple features," in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2001.
[9] P. Viola and M. Jones, "Fast and robust classification using asymmetric AdaBoost and a detector cascade," Mitsubishi Electric Research Lab, Cambridge, MA, 2001.
[10] V. Pavlovic and A. Garg, "Efficient detection of objects and attributes using boosting," in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2001.
[11] J. Y. Choi, W. De Neve, Y. M. Ro, and K. N. Plataniotis, "Automatic face annotation in personal photo collections using context-based unsupervised clustering and face information fusion," IEEE Trans. Circuits Syst. Video Technol., 2010.
[12] C. Zhu, F. Wen, and J. Sun, "A rank-order distance based clustering algorithm for face tagging," IEEE, 2010.
[13] T. Ahonen, A. Hadid, and M. Pietikainen, "Face description with local binary patterns: Application to face recognition," IEEE Trans. Pattern Anal. Mach. Intell., vol. 28, no. 12, 2006.
[14] N. Ponomarenko, M. Carli, V. Lukin, K. Egiazarian, J. Astola, and F. Battisti, "Color image database for evaluation of image quality metrics," in Proc. Int. Workshop on Multimedia Signal Processing, Oct. 2008, pp. 403-408.
[15] K.-H. Thung and P. Raveendran, "A survey of image quality measures," in Proc. Int. Conf. Technical Postgraduates (TECHPOS), Dec. 2009.
[16] B. A. Draper, K. Baek, M. S. Bartlett, and J. R. Beveridge, "Recognizing faces with PCA and ICA," Computer Vision and Image Understanding, 2003, pp. 115-137.
[17] J. Yang, D. Zhang, and J.-y. Yang, "Is ICA significantly better than PCA for face recognition?" in Proc. IEEE Int. Conf. Computer Vision, 2005, pp. 5499-5505.
[18] D. N. Chandrappa and M. Ravishankar, "Automatic face recognition in a crowded scene using multi layered clutter filtering and independent component analysis," IEEE, 2012.
[19] M. S. Bartlett, J. R. Movellan, and T. J. Sejnowski, "Face recognition by independent component analysis," IEEE Trans. Neural Networks, vol. 13, no. 6, Nov. 2002.