Pattern recognition systems – Lab 10
Linear Discriminant Analysis
1. Objectives
In this lab session we will study the linear discriminant analysis method for
classifying patterns. The method will be implemented in the case when the feature
space is two-dimensional and two classes of patterns exist. The patterns are
considered to be a set of 2D points and the features are their horizontal and vertical
coordinates. It is also known that these points are divided into two classes and that, for
each point, we know to which class it belongs. The objective is to classify them using the
linear discriminant analysis method.
2. Theoretical Background
The purpose of Discriminant Analysis is to classify objects (people, customers, things,
etc.) into one of two or more groups based on a set of features that describe the
objects (e.g. gender, age, income, weight, preference score, etc.). In general, we
assign an object to one of a number of predetermined groups based on observations
made on the object.
If we can assume that the groups are linearly separable, we can use a linear discriminant
model (LDA). Linearly separable means that the groups can be separated by a linear
combination of the features that describe the objects. If there are only two features, the
separators between groups of objects are lines. If the number of features is three, the
separator is a plane, and if the number of features (i.e. independent variables) is greater
than three, the separators become hyper-planes.
The objective of LDA is to perform dimensionality reduction while preserving as
much of the class discriminatory information as possible. Assume we have a set of
D-dimensional samples {x(1), x(2), …, x(N)}, N1 of which belong to class ω1, and N2 to class
ω2. We seek to obtain a scalar y by projecting the samples x onto a line:

y = w^T x
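For concreteness, projecting the first sample of the numerical example in Section 3, x = (4, 1), onto an arbitrary (not yet optimal) direction w = [1 1]^T gives:

y = w^T x = \begin{bmatrix} 1 & 1 \end{bmatrix} \begin{bmatrix} 4 \\ 1 \end{bmatrix} = 4 + 1 = 5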
Of all the possible lines we would like to select the one that maximizes the
separability of the scalars. This is illustrated for the two-dimensional case in the
following figures.
In order to find a good projection vector, we need to define a measure of separation
between the projections. The mean vector of each class in the x and y feature spaces is:

\mu_i = \frac{1}{N_i} \sum_{x \in \omega_i} x
\qquad\qquad
\tilde{\mu}_i = \frac{1}{N_i} \sum_{y \in \omega_i} y = \frac{1}{N_i} \sum_{x \in \omega_i} w^T x = w^T \mu_i

We could then choose the distance between the projected means as our objective
function:

J(w) = |\tilde{\mu}_1 - \tilde{\mu}_2| = |w^T (\mu_1 - \mu_2)|
However, the distance between the projected means is not a very good measure since
it does not take into account the standard deviation within the classes. The next figure
illustrates this issue.
The solution proposed by Fisher is to maximize a function that represents the
difference between the means, normalized by a measure of the within-class scatter.
For each class we define the scatter, an equivalent of the variance, as

\tilde{s}_i^2 = \frac{1}{N_i} \sum_{y \in \omega_i} (y - \tilde{\mu}_i)^2

where the quantity \tilde{s}_1^2 + \tilde{s}_2^2 is called the within-class scatter of the projected
examples.
The Fisher linear discriminant is defined as the linear function w^T x that maximizes
the criterion function:

J(w) = \frac{(\tilde{\mu}_1 - \tilde{\mu}_2)^2}{\tilde{s}_1^2 + \tilde{s}_2^2}
Therefore, we will be looking for a projection where examples from the same class
are projected very close to each other and, at the same time, the projected means are
as far apart as possible.
In order to find the optimum projection w*, we need to express J(w) as an explicit
function of w. We define a measure of the scatter in the multivariate feature space x,
the scatter matrices:
S_i = \frac{1}{N_i} \sum_{x \in \omega_i} (x - \mu_i)(x - \mu_i)^T

S_W = S_1 + S_2
where SW is called the within-class scatter matrix.
The scatter of the projection y can then be expressed as a function of the scatter
matrices in feature space x:

\tilde{s}_i^2 = \frac{1}{N_i} \sum_{x \in \omega_i} (w^T x - w^T \mu_i)^2 = w^T S_i w
\qquad\qquad
\tilde{s}_1^2 + \tilde{s}_2^2 = w^T S_W w

Similarly, the difference between the projected means can be expressed in terms of
the means in the original feature space:

(\tilde{\mu}_1 - \tilde{\mu}_2)^2 = (w^T \mu_1 - w^T \mu_2)^2 = w^T (\mu_1 - \mu_2)(\mu_1 - \mu_2)^T w = w^T S_B w

The matrix S_B = (\mu_1 - \mu_2)(\mu_1 - \mu_2)^T is called the between-class scatter. Note that, since SB is the outer
product of two vectors, its rank is at most one.
We can finally express the Fisher criterion in terms of SW and SB as

J(w) = \frac{w^T S_B w}{w^T S_W w}

To find the maximum of J(w) we take the derivative with respect to w and equate it to zero:

\frac{d}{dw} \left[ \frac{w^T S_B w}{w^T S_W w} \right] = 0
\;\Rightarrow\;
(w^T S_W w)\, S_B w - (w^T S_B w)\, S_W w = 0

Dividing by w^T S_W w:

S_B w - J(w)\, S_W w = 0

Solving the generalized eigenvalue problem (S_W^{-1} S_B w = J w) yields:

w^* = \arg\max_w \frac{w^T S_B w}{w^T S_W w} = S_W^{-1} (\mu_1 - \mu_2)
Limitations of LDA:
- LDA produces at most C-1 feature projections (C is the number of classes)
- If the classification error estimates establish that more features are needed,
some other method must be employed to provide those additional features;
- LDA is a parametric method since it assumes unimodal Gaussian likelihoods
- If the distributions are significantly non-Gaussian, the LDA projections will
not be able to preserve any complex structure of the data, which may be
needed for classification
- LDA will fail when the discriminatory information is not in the mean but
rather in the variance of the data
3. Numerical example
Compute the Linear Discriminant projection for the following two-dimensional
dataset:
X1=(x1,x2)={(4,1),(2,4),(2,3),(3,6),(4,4)}
X2=(x1,x2)={(9,10),(6,8),(9,5),(8,7),(10,8)}
The class statistics are:

\mu_1 = \begin{bmatrix} 3.0 \\ 3.6 \end{bmatrix}
\qquad
\mu_2 = \begin{bmatrix} 8.4 \\ 7.6 \end{bmatrix}
\qquad
S_1 = \begin{bmatrix} 0.80 & -0.40 \\ -0.40 & 2.64 \end{bmatrix}
\qquad
S_2 = \begin{bmatrix} 1.84 & -0.04 \\ -0.04 & 2.64 \end{bmatrix}

The between- and within-class scatter matrices are:

S_B = (\mu_1 - \mu_2)(\mu_1 - \mu_2)^T = \begin{bmatrix} 29.16 & 21.60 \\ 21.60 & 16.00 \end{bmatrix}
\qquad
S_W = S_1 + S_2 = \begin{bmatrix} 2.64 & -0.44 \\ -0.44 & 5.28 \end{bmatrix}

The LDA projection is then obtained directly by:

w^* = S_W^{-1} (\mu_1 - \mu_2) = [-2.20 \;\; -0.94]^T
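The result can be checked with a short standalone program. Below is a minimal C++ sketch (illustrative only and not tied to DIBLook; Point2, Mat2, mean and scatter are names introduced here) that computes the class means, the scatter matrices with the same 1/Ni normalization used above, SW, its inverse and w* for the two point sets of this example; it should print w* ≈ [-2.20 -0.94].

#include <cstdio>
#include <vector>

// Illustrative sketch: verifies the numerical example above.
// Uses the same 1/Ni normalization for the scatter matrices as in the text.
struct Point2 { double x1, x2; };
struct Mat2  { double a, b, c, d; };   // [[a, b], [c, d]]

static Point2 mean(const std::vector<Point2>& pts) {
    Point2 m{0.0, 0.0};
    for (const Point2& p : pts) { m.x1 += p.x1; m.x2 += p.x2; }
    m.x1 /= pts.size(); m.x2 /= pts.size();
    return m;
}

static Mat2 scatter(const std::vector<Point2>& pts, const Point2& m) {
    Mat2 s{0.0, 0.0, 0.0, 0.0};
    for (const Point2& p : pts) {
        double dx = p.x1 - m.x1, dy = p.x2 - m.x2;
        s.a += dx * dx; s.b += dx * dy;
        s.c += dy * dx; s.d += dy * dy;
    }
    double n = static_cast<double>(pts.size());   // 1/Ni normalization
    s.a /= n; s.b /= n; s.c /= n; s.d /= n;
    return s;
}

int main() {
    std::vector<Point2> c1 = {{4,1},{2,4},{2,3},{3,6},{4,4}};
    std::vector<Point2> c2 = {{9,10},{6,8},{9,5},{8,7},{10,8}};

    Point2 m1 = mean(c1), m2 = mean(c2);
    Mat2 s1 = scatter(c1, m1), s2 = scatter(c2, m2);
    Mat2 sw{ s1.a + s2.a, s1.b + s2.b, s1.c + s2.c, s1.d + s2.d };

    // Invert the 2x2 within-class scatter matrix.
    double det = sw.a * sw.d - sw.b * sw.c;
    Mat2 swInv{ sw.d / det, -sw.b / det, -sw.c / det, sw.a / det };

    // w* = SW^-1 (mu1 - mu2)
    double dx1 = m1.x1 - m2.x1, dx2 = m1.x2 - m2.x2;
    double w1 = swInv.a * dx1 + swInv.b * dx2;
    double w2 = swInv.c * dx1 + swInv.d * dx2;

    std::printf("mu1 = [%.2f %.2f], mu2 = [%.2f %.2f]\n", m1.x1, m1.x2, m2.x1, m2.x2);
    std::printf("SW  = [[%.2f %.2f], [%.2f %.2f]], det = %.4f\n", sw.a, sw.b, sw.c, sw.d, det);
    std::printf("w*  = [%.2f %.2f]\n", w1, w2);   // expected about [-2.20 -0.94]
    return 0;
}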
4. Exercises
You are given an image (8 bit/pixel) containing a set of 2D points belonging to two
classes (red and blue). The palette of the image is modified such that position 1 holds
the color red and position 2 holds the color blue.
Apply the LDA algorithm (Fisher solution) to find the linear projection vector that
maximizes the separability of the scalars (projected features). Draw the line that is
defined by the projection vector components.
Remarks & details (implementation in DIBLook):
- If the computed projection vector is w* = [wx1, wx2]^T, where x1 and x2 are the
coordinates of the 2D points, then the equation of the line that defines this projection
in the feature space and passes through the center of the image is:
x2 = (wx2/wx1)*(x1 - dwWidth/2) + dwHeight/2.
The value dwWidth is the width and dwHeight is the height of the image containing
the set of 2D points.
- Consider that feature x1 is on the horizontal axis and feature x2 on the vertical axis.
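As an illustration of the line equation from the remarks above, the following C++ sketch computes two endpoints of the projection line through the image center; the values of w1, w2, width and height are placeholders, and the actual drawing of the segment (e.g. by setting pixels in the DIBLook image buffer) is left out because it is framework-specific.

#include <cmath>
#include <cstdio>

// Sketch only: computes two endpoints of the line defined by w* = [w1 w2]^T,
// passing through the image center, using the formula from the remark above:
//   x2 = (w2 / w1) * (x1 - width / 2) + height / 2
// The endpoints may fall outside [0, height) and should be clipped before drawing.
int main() {
    double w1 = -2.20, w2 = -0.94;        // example projection vector (assumed)
    int width = 256, height = 256;        // example image size (assumed)

    if (std::fabs(w1) < 1e-9) {
        // Nearly vertical projection line: x1 is constant at the image center.
        std::printf("vertical line at x1 = %d\n", width / 2);
        return 0;
    }

    double slope = w2 / w1;
    // Endpoints at the left and right borders of the image.
    double y0 = slope * (0 - width / 2.0) + height / 2.0;
    double y1 = slope * ((width - 1) - width / 2.0) + height / 2.0;

    std::printf("P0 = (0, %.1f), P1 = (%d, %.1f)\n", y0, width - 1, y1);
    return 0;
}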
5. Grading
• Open the image, find and store the points that belong to the red class and to
the blue class (see the sketch after this list). – 1p
• Compute the mean vectors for each class. – 1p
• Compute the scatter matrices S1 and S2. – 1p
• Compute SW, the within-class scatter matrix. – 1p
• Find the determinant of the within-class scatter matrix. – 1p
• Find the inverse of the within-class scatter matrix. – 1p
• Compute the solution w*. – 2p
• Draw the line corresponding to w*. – 1p
• Granted points – 1p
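A possible shape for the first grading item is sketched below; this is not DIBLook code but a generic loop over a row-major buffer of 8-bit palette indices, where index 1 marks red points and index 2 marks blue points as in the exercise statement. The buffer name pixels and the helper collectPoints are assumptions for illustration.

#include <cstdint>
#include <vector>

struct Point2 { double x1, x2; };

// Sketch: collect the 2D points of the two classes from an 8 bit/pixel image
// whose palette maps index 1 to red and index 2 to blue (see the exercise).
// 'pixels' is assumed to be a row-major buffer of width * height palette indices;
// in DIBLook the buffer layout (e.g. row padding, bottom-up rows) must be handled
// according to the framework before using a loop like this one.
void collectPoints(const uint8_t* pixels, int width, int height,
                   std::vector<Point2>& redClass, std::vector<Point2>& blueClass) {
    for (int row = 0; row < height; ++row) {
        for (int col = 0; col < width; ++col) {
            uint8_t idx = pixels[row * width + col];
            // Feature x1 = horizontal coordinate, x2 = vertical coordinate.
            if (idx == 1)      redClass.push_back({ (double)col, (double)row });
            else if (idx == 2) blueClass.push_back({ (double)col, (double)row });
        }
    }
}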
6. References
[1] Ricardo Gutierrez-Osuna, Introduction to Pattern Analysis, Texas A&M
University.
[2] Kardi Teknomo, Linear Discriminant Analysis – tutorial,
http://people.revoledu.com/kardi/tutorial/LDA/LDA.html