Pattern Recognition Systems – Lab 10
Linear Discriminant Analysis

1. Objectives

In this lab session we will study the linear discriminant analysis (LDA) method for classifying patterns. The method will be implemented for the case in which the feature space is two-dimensional and there are two classes of patterns. The patterns are a set of 2D points and the features are their horizontal and vertical coordinates. The points are divided into two classes, and for each point we know the class to which it belongs. The objective is to classify the points using the linear discriminant analysis method.

2. Theoretical Background

The purpose of discriminant analysis is to classify objects (people, customers, things, etc.) into one of two or more groups based on a set of features that describe the objects (e.g. gender, age, income, weight, preference score, etc.). In general, we assign an object to one of a number of predetermined groups based on observations made on the object.

If we can assume that the groups are linearly separable, we can use the linear discriminant analysis (LDA) model. Linearly separable means that the groups can be separated by a linear combination of the features that describe the objects. If there are only two features, the separators between the groups of objects are lines; if there are three features, the separator is a plane; and if the number of features (i.e. independent variables) is greater than three, the separators become hyperplanes.

The objective of LDA is to perform dimensionality reduction while preserving as much of the class discriminatory information as possible. Assume we have a set of D-dimensional samples {x(1), x(2), …, x(N)}, N1 of which belong to class ω1 and N2 to class ω2. We seek to obtain a scalar y by projecting the samples x onto a line:

    y = w^T x

Of all the possible lines we would like to select the one that maximizes the separability of the scalars. This is illustrated for the two-dimensional case in the figures below.

In order to find a good projection vector, we need to define a measure of separation between the projections. The mean vector of each class in the x space and in the projected y space is:

    μi = (1/Ni) Σ_{x∈ωi} x        μ̃i = (1/Ni) Σ_{y∈ωi} y = (1/Ni) Σ_{x∈ωi} w^T x = w^T μi

We could then choose the distance between the projected means as our objective function:

    J(w) = |μ̃1 − μ̃2| = |w^T (μ1 − μ2)|

However, the distance between the projected means is not a very good measure, since it does not take into account the standard deviation within the classes. The next figure illustrates this issue.

The solution proposed by Fisher is to maximize a function that represents the difference between the means, normalized by a measure of the within-class scatter. For each class we define the scatter, an equivalent of the variance, as

    s̃i² = (1/Ni) Σ_{y∈ωi} (y − μ̃i)²

where the quantity s̃1² + s̃2² is called the within-class scatter of the projected examples. The Fisher linear discriminant is defined as the linear function w^T x that maximizes the criterion function:

    J(w) = (μ̃1 − μ̃2)² / (s̃1² + s̃2²)

Therefore, we will be looking for a projection where examples from the same class are projected very close to each other and, at the same time, the projected means are as far apart as possible.

In order to find the optimum projection w*, we need to express J(w) as an explicit function of w. We define a measure of the scatter in the multivariate feature space x, the scatter matrices:

    Si = (1/Ni) Σ_{x∈ωi} (x − μi)(x − μi)^T
    SW = S1 + S2

where SW is called the within-class scatter matrix.

The scatter of the projection y can then be expressed as a function of the scatter matrices in feature space x:

    s̃i² = w^T Si w      ⇒      s̃1² + s̃2² = w^T SW w

Similarly, the difference between the projected means can be expressed in terms of the means in the original feature space:

    (μ̃1 − μ̃2)² = (w^T μ1 − w^T μ2)² = w^T (μ1 − μ2)(μ1 − μ2)^T w = w^T SB w

The matrix SB = (μ1 − μ2)(μ1 − μ2)^T is called the between-class scatter matrix. Note that, since SB is the outer product of two vectors, its rank is at most one.

We can finally express the Fisher criterion in terms of SW and SB as:

    J(w) = (w^T SB w) / (w^T SW w)

To find the maximum of J(w) we take the derivative with respect to w and equate it to zero:

    (w^T SW w) SB w − (w^T SB w) SW w = 0

Dividing by w^T SW w:

    SB w − J(w) SW w = 0

Solving the generalized eigenvalue problem (SW⁻¹ SB w = J w) yields:

    w* = argmax J(w) = SW⁻¹ (μ1 − μ2)

Limitations of LDA:
- LDA produces at most C − 1 feature projections (C is the number of classes).
- If the classification error estimates establish that more features are needed, some other method must be employed to provide those additional features.
- LDA is a parametric method, since it assumes unimodal Gaussian likelihoods.
- If the distributions are significantly non-Gaussian, the LDA projections will not be able to preserve any complex structure of the data, which may be needed for classification.
- LDA will fail when the discriminatory information is not in the mean but rather in the variance of the data.

3. Numerical example

Compute the linear discriminant projection for the following two-dimensional dataset:

    X1 = (x1, x2) = {(4,1), (2,4), (2,3), (3,6), (4,4)}
    X2 = (x1, x2) = {(9,10), (6,8), (9,5), (8,7), (10,8)}

The class statistics are:

    μ1 = [3.00, 3.60]^T        μ2 = [8.40, 7.60]^T
    S1 = [ 0.80  −0.40 ;  −0.40  2.64 ]        S2 = [ 1.84  −0.04 ;  −0.04  2.64 ]

The between- and within-class scatter matrices are:

    SB = (μ1 − μ2)(μ1 − μ2)^T = [ 29.16  21.60 ;  21.60  16.00 ]
    SW = S1 + S2 = [ 2.64  −0.44 ;  −0.44  5.28 ]

The LDA projection is then obtained directly by:

    w* = SW⁻¹ (μ1 − μ2) = [−2.20, −0.94]^T
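To check these numbers, the following is a minimal sketch in plain C++ (standard library only, independent of the DIBLook framework) that recomputes the class means, the scatter matrices S1 and S2, the within-class scatter SW and the projection w* = SW⁻¹(μ1 − μ2) for the dataset of the numerical example. The type and function names (Vec2, Mat2, mean, scatter) are illustrative choices, not part of any required API.

#include <cstdio>
#include <vector>

struct Vec2 { double x, y; };
struct Mat2 { double a, b, c, d; };                       // [a b; c d]

// Mean vector of a set of 2D points.
static Vec2 mean(const std::vector<Vec2>& pts) {
    Vec2 m{0.0, 0.0};
    for (const Vec2& p : pts) { m.x += p.x; m.y += p.y; }
    m.x /= pts.size(); m.y /= pts.size();
    return m;
}

// Si = (1/Ni) * sum over the class of (x - mu_i)(x - mu_i)^T
static Mat2 scatter(const std::vector<Vec2>& pts, const Vec2& mu) {
    Mat2 s{0.0, 0.0, 0.0, 0.0};
    for (const Vec2& p : pts) {
        double dx = p.x - mu.x, dy = p.y - mu.y;
        s.a += dx * dx;  s.b += dx * dy;
        s.c += dy * dx;  s.d += dy * dy;
    }
    double n = static_cast<double>(pts.size());
    s.a /= n; s.b /= n; s.c /= n; s.d /= n;
    return s;
}

int main() {
    std::vector<Vec2> c1 = {{4,1}, {2,4}, {2,3}, {3,6}, {4,4}};
    std::vector<Vec2> c2 = {{9,10}, {6,8}, {9,5}, {8,7}, {10,8}};

    Vec2 mu1 = mean(c1), mu2 = mean(c2);
    Mat2 s1 = scatter(c1, mu1), s2 = scatter(c2, mu2);
    Mat2 sw{s1.a + s2.a, s1.b + s2.b, s1.c + s2.c, s1.d + s2.d};

    // w* = SW^-1 (mu1 - mu2), using the closed-form inverse of a 2x2 matrix.
    double det = sw.a * sw.d - sw.b * sw.c;
    double dx = mu1.x - mu2.x, dy = mu1.y - mu2.y;
    double wx = ( sw.d * dx - sw.b * dy) / det;
    double wy = (-sw.c * dx + sw.a * dy) / det;

    std::printf("mu1 = (%.2f, %.2f)  mu2 = (%.2f, %.2f)\n", mu1.x, mu1.y, mu2.x, mu2.y);
    std::printf("SW  = [%.2f %.2f; %.2f %.2f]  det(SW) = %.4f\n", sw.a, sw.b, sw.c, sw.d, det);
    std::printf("w*  = [%.2f, %.2f]^T\n", wx, wy);        // expected: [-2.20, -0.94]^T
    return 0;
}

Compiled with any C++11 (or newer) compiler, the program prints w* ≈ [−2.20, −0.94]^T, matching the result above; the same sequence of steps (means, S1, S2, SW, determinant, inverse, w*) is what the grading checklist in Section 5 asks for.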
4. Exercises

You are given an image (8 bits/pixel) containing a set of 2D points belonging to two classes (red and blue). The palette of the image is modified such that position 1 holds the color red and position 2 holds the color blue. Apply the LDA algorithm (Fisher's solution) to find the linear projection vector that maximizes the separability of the scalars (the projected features). Draw the line defined by the components of the projection vector; a short sketch of this drawing step is given after the remarks below.

Remarks & details (implementation in DIBLook):
- If the computed projection vector is w* = [wx1, wx2]^T, where x1 and x2 are the coordinates of the 2D points, then the equation of the line that defines this projection in the feature space and passes through the center of the image is:

      x2 = (wx2 / wx1) * (x1 − dwWidth/2) + dwHeight/2

  The value dwWidth is the width and dwHeight is the height of the image containing the set of 2D points.
- Consider that feature x1 is on the horizontal axis and feature x2 on the vertical axis.
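The following is a minimal, framework-agnostic sketch of the drawing step. It only evaluates the line equation from the first remark at the left and right borders of the image; the two resulting endpoints would then be passed to whatever line-drawing routine is available in DIBLook (that routine is not shown here). The helper name ProjectionLineEndpoints and the 256x256 image size are illustrative assumptions, not DIBLook identifiers.

#include <cstdio>

struct PointI { int x1, x2; };

// Endpoints of the line x2 = (wx2/wx1) * (x1 - dwWidth/2) + dwHeight/2
// evaluated at the left and right borders of the image.
// Assumes wx1 != 0 (non-vertical line); clipping to the image bounds is not handled.
static void ProjectionLineEndpoints(double wx1, double wx2,
                                    int dwWidth, int dwHeight,
                                    PointI& left, PointI& right) {
    double slope = wx2 / wx1;
    left.x1  = 0;
    left.x2  = static_cast<int>(slope * (0 - dwWidth / 2.0) + dwHeight / 2.0);
    right.x1 = dwWidth - 1;
    right.x2 = static_cast<int>(slope * ((dwWidth - 1) - dwWidth / 2.0) + dwHeight / 2.0);
}

int main() {
    // Example values only: w* from the numerical example and an assumed 256x256 image.
    double wx1 = -2.20, wx2 = -0.94;
    int dwWidth = 256, dwHeight = 256;

    PointI a, b;
    ProjectionLineEndpoints(wx1, wx2, dwWidth, dwHeight, a, b);
    std::printf("line endpoints: (%d, %d) -> (%d, %d)\n", a.x1, a.x2, b.x1, b.x2);
    return 0;
}

Remember that x1 is the horizontal (column) coordinate and x2 the vertical (row) coordinate, so the two endpoints map directly to pixel positions in the image.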
5. Grading

• Open the image, find and store the points that belong to the red class and to the blue class. – 1p
• Compute the mean vectors for each class. – 1p
• Compute the scatter matrices S1 and S2. – 1p
• Compute SW, the within-class scatter matrix. – 1p
• Find the determinant of the within-class scatter matrix. – 1p
• Find the inverse of the within-class scatter matrix. – 1p
• Compute the solution w*. – 2p
• Draw the line corresponding to w*. – 1p
• Granted points – 1p

6. References

[1] Ricardo Gutierrez-Osuna, Introduction to Pattern Analysis, Texas A&M University.
[2] Kardi Teknomo, Linear Discriminant Analysis – Tutorial, http://people.revoledu.com/kardi/tutorial/LDA/LDA.html