Pattern Recognition MM7

- Feature evaluation
  - What to consider when choosing features
  - Is a feature robust?
  - How many samples do we need to represent a feature (mean and covariance)?
  - Is the feature normally distributed?
- Break
- Dimensionality reduction of the feature space

Number of samples
- How many samples do we need to describe a feature for a class?
  1) Scientific Table
  2) Variance analysis

Number of samples: Scientific Table
- Distribution-free tolerance limits
- Which number of samples, N, is required to ensure that βp % of the population lies within the min. and max. values, with a confidence of βt %?
- K. Diem and C. Lentner, Scientific Tables, Ciba-Geigy Ltd., 1975
- Typically: 95 % for both => 93 samples
- Often the samples are normally distributed => fewer samples are required

Number of samples: Variance analysis
- Plot the variance as a function of N
- Choose N so that the variance is stable
- Do this for each feature in each class

Is a feature normally distributed?
- If the features defining a class are normally distributed, then the Bayes classifier reduces to the Mahalanobis distance
- In practice it is often assumed that all features are normally distributed, but how do we test this?
  1) Histogram inspection
  2) Goodness of fit

Is a feature normally distributed? 1) Histogram inspection
- Matlab: normplot

Is a feature normally distributed? 2) Goodness of fit (χ²-test)
- Idea: compare the data with a perfect normal distribution
- Algorithm:
  a) Divide the data into k intervals (k as small as possible)
     - Choose k so that fi > 1 for all i and fi > 5 for 80 % of the k intervals
     - Choose k so that each interval has approximately the same probability
  b) Compare the measured data with the expected data
     - Error measure: T
     - T is χ²-distributed
  c) If T < THα => normally distributed at significance level α (see a statistical table)

What to remember
- Feature evaluation
  - Robustness (invariant wrt. the application)
  - Number of samples
    - Scientific Table
    - Variance analysis
  - Normally distributed (Bayes' rule)
    - Histogram inspection (qualitative analysis)
    - Goodness of fit (statistical analysis)

Break

Method for reducing the dimensionality of the feature space

Reduce the number of features
- Why?
  - "The curse of dimensionality"
  - Visualization
  - Remove "noise" (10 dependent features + 1 independent)
  - Faster processing
- How?
  - If features are correlated => redundancy
  - Remove the redundancy
- Methods
  - Hierarchical dimensionality reduction
  - Principal Component Analysis (PCA)

Methods
- Unsupervised
- Ignore that the samples come from different classes
- Reduce the dimensionality (compression)

Hierarchical dimensionality reduction
- Correlation matrix
- Algorithm (a MATLAB sketch follows after the merge options below):
  1) Calc. the correlation matrix
  2) Find max Ckl, k ≠ l
  3) Merge features Fk and Fl
  4) Save the merged feature as Fk
  5) Delete Fl
  6) Stop or go to 1)
- Stop criterion:
  - Max Ckl is too small
  - Number of dimensions is OK
  - Others…

Merge features
- Keep Fk and delete Fl
- (Fk + Fl) / 2
- (w1·Fk + w2·Fl) / 2
- Others…
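The slides only list the steps of the hierarchical reduction, so here is a minimal MATLAB sketch of one possible implementation. It assumes the samples are stored in an N-by-d matrix X (one row per sample, one column per feature), uses the simple average (Fk + Fl)/2 as the merge rule, and stops when the largest off-diagonal correlation drops below a threshold maxCorr; the function name, the variable names and the threshold are illustrative and not part of the original slides.

function X = hier_reduce(X, maxCorr)
  % Hierarchical dimensionality reduction - a minimal sketch (assumptions above).
  % X: N-by-d data matrix, one row per sample, one column per feature.
  while size(X, 2) > 1
    C = abs(corrcoef(X));               % 1) correlation matrix (absolute values)
    C(logical(eye(size(C)))) = 0;       %    ignore the diagonal (Ckk = 1)
    [cMax, idx] = max(C(:));            % 2) find max Ckl, k ~= l
    if cMax < maxCorr                   %    stop criterion: max Ckl is too small
      break;
    end
    [k, l] = ind2sub(size(C), idx);
    X(:, k) = (X(:, k) + X(:, l)) / 2;  % 3)+4) merge Fk and Fl, save the result as Fk
    X(:, l) = [];                       % 5) delete Fl
  end                                   % 6) go to 1)
end

Any of the other merge rules from the slide can be swapped in at the merge line, and a target number of dimensions can be used as an alternative stop criterion.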
Principal Component Analysis (PCA)
- Combine features into new features, and then ignore some of the new features!
- PCA is used a lot, especially when you have many dimensions
- Basic idea: features with a large variance separate the classes better
- If both features have large variances – then what?
- Transform the feature space so that we get large variances and no correlation!
- Variance = information!

PCA – Transform
[Figure: samples in the (x1, x2) plane with the principal directions y1 and y2 drawn through the data.]
- Ignore y2 without losing information when classifying
- y1 and y2 are the principal components

PCA – How to
1. Collect data (x)
2. Calc. the covariance matrix Cx
   - Matlab: Cx = cov(x);
3. Solve the eigenvalue problem => A and Cy
   - Matlab: [Evec, Eval] = eig(Cx);
4. Transform x => y: y = A(x − µ)
5. Analyze (PCA)
   1. M-method
   2. J-measure

(A complete MATLAB sketch of these steps follows below.)
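The two Matlab calls above can be tied together into a complete, runnable sketch. It assumes the samples are stored row-wise in X, as cov expects; the synthetic data, the sorting of the eigenvectors by eigenvalue and the 95 %-of-variance rule in step 5 are illustrative additions (the M-method and J-measure from the slides are not spelled out here).

rng(0);                                           % reproducible, synthetic example data
X  = randn(200, 3) * [2 0 0; 1 1 0; 0 0.5 0.1];   % 1) collect data (three correlated features)
mu = mean(X, 1);                                  %    mean of each feature
Cx = cov(X);                                      % 2) covariance matrix
[Evec, Eval] = eig(Cx);                           % 3) eigenvalue problem
[lambda, order] = sort(diag(Eval), 'descend');    %    sort components by variance (eigenvalue)
A  = Evec(:, order)';                             %    transformation matrix: eigenvectors as rows
Y  = (A * (X - mu)')';                            % 4) transform: y = A (x - mu) for every sample
explained = cumsum(lambda) / sum(lambda);         % 5) analyze the new features
m  = find(explained >= 0.95, 1);                  %    keep e.g. 95 % of the variance (illustrative)
Yreduced = Y(:, 1:m);                             %    ignore the remaining principal components

Because A is built from the eigenvectors of Cx, the covariance matrix of the transformed data is diagonal: the new features are uncorrelated and their variances are the sorted eigenvalues.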
What to remember – Feature reduction
- Unsupervised
- Hierarchical dimensionality reduction
  - Correlation matrix
  - Merge features or delete features
- Principal Component Analysis (PCA)
  - Combine features into new features
  - Ignore some of the new features
  - VARIANCE = INFORMATION
  - Transform the feature space using the eigenvectors of the covariance matrix => uncorrelated features!
  - Analyze
    - M-method
    - J-measure

X-tra slides

Is a feature normally distributed? 2) Skewness and Kurtosis
- One feature and one class
- A distribution's i'th moment, m_i, can be expressed as

  m_i = \frac{1}{N} \sum_{j=1}^{N} (x_j - \mu)^i

  where N is the number of samples, x_j is sample number j, and µ is the mean value of the feature

Is a feature normally distributed? 2) Skewness and Kurtosis
- The methods are not used so much any more, but they can be seen in older reports/papers
- BUT they do describe general aspects of a distribution AND can be used as features!

AAU signs billion-kroner deal with GE Healthcare (12 Oct 2005)
Aalborg University (AAU) has entered into a licence and production agreement with GE Healthcare that will generate revenue of between DKK 0.5 and 1 billion. The licence concerns a new invention that makes it easier to detect the heart condition Long QT syndrome, which affects millions of people worldwide every year. A group of students from the Department of Health Science and Technology (Institut for Sundhedsteknologi) developed the measuring device, and the department will receive one third of the money from the agreement. AAU receives another third, while the three students and a number of teachers share the final third of the amount.

Million-kroner deal for Aalborg University (21 Oct 2005)
Three newly graduated engineers from Aalborg University's biomedical engineering programme have patented a method for diagnosing a dangerous heart disease. One of the world's largest suppliers of hospital equipment, General Healthcare, has signed a multi-million agreement to use the technology. The Minister of Science, Helge Sander, calls the agreement the largest ever between a university and a private company, writes Ingeniøren.

Methods where we DO use the class information
- Supervised
- Use the class information and reduce the dimensionality
- Methods:
  - SEPCOR
  - Linear Discriminant Methods

SEPCOR
- Inspired by: hierarchical dimensionality reduction
- A method to choose the X best (most discriminative) features
- Idea: combine hierarchical dimensionality reduction with class information
- SEPCOR = separability + correlation
- Principle:
  - Calc. a measure of how good (discriminative) each feature xi is wrt. classification: the variability measure V(xi)
  - Keep the most discriminative features that have a low correlation with the other features

SEPCOR – Variability measure
- V(xi) = (the variance of the class mean values on xi) / (the mean value of the class variances on xi)
- V(xi) large => good feature wrt. classification
  - That is: a large numerator and a small denominator
[Figure: two classes plotted against x1 and x2; along x1 the class means nearly coincide (V ≈ 1), while along x2 they are well separated (V >> 1).]
- V(x1) < V(x2) => x2 is the better feature

SEPCOR – The algorithm
1. Make a list with the features ordered by their V-value
2. Repeat until we have the desired number of features or the list is empty:
   1. Remove and store the feature with the largest V-value
   2. Find the correlation between the removed feature and all the other features in the list
   3. Ignore all features with a correlation bigger than MAXCOR

Linear Discriminant Methods
- Transform data to a new feature space
- Linear transform (rotation): y = A x
- The transform is defined so that classification becomes as easy as possible => info = discriminative power
- Fisher Linear Discriminant method
  - Map data to one dimension
- Multiple Discriminant Analysis
  - Map data to an M-dimensional space

Fisher Linear Discriminant
- Idea: map the data onto a line, y
  - The orientation of the line is defined so that the classes are as separated as possible
  - Transform: y = w^T x, where w is the direction of the line y
- PCA: w is defined as the 1st eigenvector of the covariance matrix (show the problem; see the example below)

Fisher Linear Discriminant – Example
[Figure: 4 classes in 2D, projected onto the PCA direction and onto the Fisher direction for comparison.]
- Transformation: y = w^T x
- Find w

Fisher Linear Discriminant
- Transform: y = w^T x
- Find w so that the following criterion function is maximal:

  J(w) = (the variance of the means) / (the mean of the variances)

- For two classes:

  J(w) = \frac{(\tilde{m}_1 - \tilde{m}_2)^2}{\tilde{s}_1^2 + \tilde{s}_2^2}, \qquad \text{line direction} = \arg\max_w J(w)

  where \tilde{m}_i is the mean of the i'th class, D_i, mapped onto w, and \tilde{s}_i^2 is the variance of the i'th class, D_i, mapped onto w.
- (A MATLAB sketch of this two-class case is given at the end of these notes.)

Multiple Discriminant Analysis
- The generalized Fisher Linear Discriminant method
- N classes
- Mapped into an M-dimensional space (M < N)
- E.g. 3 points will span a plane
[Figure: example with 3 classes in 3D mapped into two different sub-spaces.]

What to remember
- Feature reduction where we use the class information
  - Discriminative power = information
- SEPCOR (ignore some of the features)
  - Hierarchical dimensionality reduction (correlation) + the variability measure:
    - the variance of the means / the mean of the variances
- Linear Discriminant Methods (make new features and ignore some)
  - Fisher Linear Discriminant (map onto a line)
    - Transform: y = w^T x, w is the direction of the line y
    - Variability measure
  - Multiple Discriminant Analysis
    - The generalized Fisher Linear Discriminant method
    - N classes
    - Map the data into an M-dimensional space (M < N)
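To make the two-class Fisher criterion J(w) above concrete, here is a minimal MATLAB sketch. It uses the standard closed-form maximizer w ∝ Sw^(-1)(m1 − m2), where Sw is the sum of the two within-class scatter matrices; this closed form is the textbook solution and is not derived on the slides, and the synthetic data and variable names are illustrative.

rng(0);                                   % reproducible, synthetic example data
X1 = randn(100, 2);                       % class 1 samples (one row per sample)
X2 = randn(100, 2) + [3 1];               % class 2 samples, shifted mean
m1 = mean(X1, 1)';   m2 = mean(X2, 1)';   % class means as column vectors
S1 = (X1 - m1')' * (X1 - m1');            % within-class scatter, class 1
S2 = (X2 - m2')' * (X2 - m2');            % within-class scatter, class 2
Sw = S1 + S2;
w  = Sw \ (m1 - m2);                      % standard closed-form maximizer of J(w)
w  = w / norm(w);                         % w is the direction of the projection line
y1 = X1 * w;   y2 = X2 * w;               % map every sample onto the line: y = w' * x
J  = (mean(y1) - mean(y2))^2 / (var(y1) + var(y2));   % the two-class criterion from the slide

The same idea, generalized to N classes and M projection directions via the within- and between-class scatter matrices, is what Multiple Discriminant Analysis builds on.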