Stable Feature Selection via Dense Feature Groups
Lei Yu, Chris Ding, Steven Loscalzo
Knowledge Discovery and Data Mining (KDD) 2008
Coffeetalk, Thursday 20 March 2014

Stable feature selection
[Figure: heat map of a data matrix, samples (rows, 10–100) by features (columns, 100–500)]
• Often: MANY features, but correlated
• Need: interpretation. Which GROUPS of features are important?

Representation of features
[Figure: features plotted as points in the sample space, e.g. feature 17 as a point in the (sample 1, sample 2) plane]
• Each feature is an object
• Cluster the points (= objects = features)

Mean shift
• Start at a point, y_1 = x_17
• Weigh your neighbors with a (Gaussian) kernel of width h
• Update your position until convergence:

      y_{j+1} = Σ_{i=1}^{n} x_i K((y_j − x_i)/h) / Σ_{i=1}^{n} K((y_j − x_i)/h),   j = 1, 2, ...

• Store the cluster center
• Repeat for all objects

[Excerpt from the paper: Algorithm 1, DGF (Dense Group Finder), was introduced in [26] to find dense feature groups from data. DGF works by kernel density estimation [24] and the iterative mean shift procedure on each of the features, in the sample space defined by the original samples. When the mean shift process converges, nearby features are gathered into feature groups (G1, G2, G3, ...) and returned. The main part of DGF is the iterative mean shift procedure for all features, which locates a density peak by initializing the mean at a given feature x_i and using the other features in the local neighborhood (determined by a kernel bandwidth h) to shift the mean to a denser location; the update above determines the sequence of successive means under the kernel K. The algorithm has time complexity O(λn²m), where λ is the number of iterations for the mean shift procedure to converge. Details of the choice of kernel function K and bandwidth h are given in the paper.]
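The mean-shift update above can be sketched in a few lines of NumPy. This is a minimal illustration with a Gaussian kernel, not the authors' code; the function and variable names are my own:

```python
import numpy as np

def mean_shift(y0, X, h, max_iter=200, tol=1e-6):
    """Shift a starting point y0 to a local density mode of the rows of X.

    Each row of X is one "object" (in DGF: one feature, described by its
    values over the samples); h is the kernel bandwidth.
    """
    y = np.asarray(y0, dtype=float)
    for _ in range(max_iter):
        # Gaussian kernel weights K((y - x_i) / h) for every object x_i
        w = np.exp(-0.5 * np.sum((X - y) ** 2, axis=1) / h ** 2)
        # Kernel-weighted mean = the y_{j+1} update from the slide
        y_next = (w[:, None] * X).sum(axis=0) / w.sum()
        if np.linalg.norm(y_next - y) < tol:  # converged to a mode
            break
        y = y_next
    return y_next
```

Started from each object in turn, points lying in the same dense region converge to (numerically) the same mode, which is how nearby features end up in one group.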
Dense Group Finder
[Figure: data matrix, samples (10–100) by features (100–300), with density modes marked]
• Consider the features as objects x_i
• Choose a characteristic distance h
• Find the modes y_j of the density of the features (mean shift procedure)
• Group the features belonging to one mode into G_k
• Optionally: remove all features with ‖x_i − y_j‖ > h

Dense group finder
[Figure: feature groups G1, G2, G3 as clusters of points in the sample space]
• After mean shift: found feature groups G_i
• Do group feature selection: Dense Relevant Attribute Group Selector (DRAGS) (only find the groups, based on stability; no feature combiner)
• Do classification: only use ONE feature from each group

Results
Table 2: Average accuracies (%, with standard deviation ±) produced by DRAGS, Kmeans, and RFE, for k = 4–50 selected features.

| Data  | Clf | Method | k=4       | k=6       | k=8       | k=10      | k=20     | k=30     | k=40      | k=50     |
|-------|-----|--------|-----------|-----------|-----------|-----------|----------|----------|-----------|----------|
| Colon | SVM | DRAGS  | 80.5±9.6  | 82.4±9.6  | 83.5±9.5  | 83.8±9.2  | 84.9±6.9 | 86.3±6.8 | 87.3±4.9  | 87.1±5.6 |
| Colon | SVM | RFE    | 64.8±3.3  | 65.5±4.1  | 68.9±5.1  | 68.3±5.6  | 76.1±5.3 | 78.6±5.1 | 80.2±6.7  | 81.9±7.6 |
| Colon | SVM | Kmeans | 77.6±10.7 | 79.3±8.7  | 81.7±7.7  | 82.7±7.0  | 84.2±6.3 | 84.2±5.8 | 81.9±8.2  | 82.2±7.8 |
| Colon | 1NN | DRAGS  | 74.1±10.7 | 77±8.9    | 75.1±8.1  | 74.4±9.3  | 74.6±7.4 | 75.2±7.8 | 77.3±7.5  | 77.9±7.3 |
| Colon | 1NN | RFE    | 59.9±6.2  | 64±5.8    | 67.2±3.4  | 66.9±5.9  | 73±4.6   | 74.8±5.8 | 75.9±5.1  | 80.3±4.9 |
| Colon | 1NN | Kmeans | 76.1±9.0  | 76.1±7.1  | 75.4±7.9  | 76.3±6.6  | 79.6±9.0 | 78.2±9.1 | 78.1±11.7 | 75.5±8.7 |
| Leuk. | SVM | DRAGS  | 92.8±3.8  | 92.2±3.6  | 92.9±4    | 93.2±4.2  | 95±3.5   | 96.2±3   | 96.7±3.4  | 97.1±3.5 |
| Leuk. | SVM | RFE    | 78.2±5    | 85.5±3.7  | 87.3±5.5  | 89.9±4.9  | 95.3±2.9 | 95.5±2.7 | 96.9±0.9  | 97.5±0.9 |
| Leuk. | SVM | Kmeans | 89.4±5.8  | 91.8±4.3  | 92.9±4.1  | 93.6±4.3  | 95.6±4.0 | 95.4±3.5 | 95.5±3.6  | 97.3±3.3 |
| Leuk. | 1NN | DRAGS  | 93.5±4    | 92.1±4.1  | 90.6±4.8  | 90.8±4.3  | 92.4±3.6 | 92.4±5   | 94.3±3.5  | 95.6±4.5 |
| Leuk. | 1NN | RFE    | 77.7±4.5  | 83.1±4.5  | 86.2±5    | 86.6±5.1  | 91.6±3   | 93.2±3.5 | 94.9±1.9  | 94.9±2.9 |
| Leuk. | 1NN | Kmeans | 90.2±5.4  | 90.6±6.2  | 91.3±5.7  | 93.3±4.9  | 93.8±4.3 | 94.1±4.1 | 93.7±3.7  | 95.1±4.2 |
| Lung  | SVM | DRAGS  | 98.5±1.9  | 99±1.3    | 98.9±1.3  | 98.7±1.8  | 98.8±1.6 | 98.9±1.4 | 99±1.3    | 99.1±1.1 |
| Lung  | SVM | RFE    | 90.3±1.6  | 93.2±1.6  | 95.7±1.3  | 96.4±1.7  | 98.8±0.7 | 99.4±0.6 | 99.5±0.3  | 99.4±0.5 |
| Lung  | SVM | Kmeans | 94.9±3.2  | 96.0±2.8  | 97.1±2.5  | 97.4±2.5  | 97.6±2.6 | 97.8±2.1 | 98.3±1.8  | 98.4±2.0 |
| Lung  | 1NN | DRAGS  | 98.4±1.4  | 99±1.3    | 98.7±1.2  | 98.9±1.3  | 98.6±1.7 | 98.7±1.7 | 98.6±1.9  | 98.7±2   |
| Lung  | 1NN | RFE    | 91.9±3.1  | 93.6±2.2  | 95±1.7    | 95.2±2    | 97.7±1.1 | 97.6±0.9 | 98.4±0.5  | 98.4±0.8 |
| Lung  | 1NN | Kmeans | 95.4±3.9  | 95.6±3.0  | 96.3±2.6  | 96.8±2.5  | 97.1±2.2 | 97.6±2.1 | 97.6±2.0  | 97.9±1.9 |
| Pro.  | SVM | DRAGS  | 86.6±4.6  | 88.6±4.9  | 90.5±5.5  | 89.8±5.9  | 89.1±6.4 | 89.9±5.4 | 91.3±4.6  | 91.7±3.9 |
| Pro.  | SVM | RFE    | 71.1±3.6  | 74.7±5    | 78.7±3.8  | 79.7±3.1  | 85.4±4.3 | 87.4±2.7 | 89.4±2.5  | 90.2±2.4 |
| Pro.  | SVM | Kmeans | 83.1±6.8  | 85.1±7.3  | 85.3±7.2  | 85.7±7.0  | 88.7±6.0 | 88.0±5.9 | 87.4±5.4  | 88.1±5.2 |
| Pro.  | 1NN | DRAGS  | 84.2±6.6  | 86.2±4.7  | 87.2±5.6  | 86.5±6.4  | 85.1±6.1 | 82.5±6.6 | 82.7±6.7  | 83.8±5.6 |
| Pro.  | 1NN | RFE    | 64.3±4.8  | 70.7±4.6  | 74.7±3.2  | 77.4±4.2  | 79.8±4.3 | 83.2±5.1 | 83.9±3.6  | 84.6±2.8 |
| Pro.  | 1NN | Kmeans | 76.8±6.9  | 76.8±7.6  | 78.3±7.0  | 80±6.8    | 83.1±6.0 | 81.4±6.2 | 83.6±5.3  | 82.9±6.5 |
| Lym.  | SVM | DRAGS  | 82.4±8.5  | 86±11.3   | 94.3±10.7 | 94.8±10.3 | 99.2±1.8 | 98.3±3.9 | 97.5±4.8  | 98.6±4.5 |
| Lym.  | SVM | RFE    | 87.5±3.7  | 95.8±4    | 97.3±3.1  | 98±3.1    | 99±2.4   | 99.3±1.7 | 99.5±1.5  | 99.5±1.5 |
| Lym.  | SVM | Kmeans | 87.9±9.5  | 91.9±9.4  | 94.4±7.6  | 94.4±7.7  | 97.3±3.6 | 97.6±3.7 | 98.2±2.9  | 98.4±2.8 |
| Lym.  | 1NN | DRAGS  | 91.3±6    | 96.3±4.5  | 99±1.9    | 98.9±2.4  | 99.4±1.6 | 99±2.9   | 97.9±4.1  | 98.6±3.8 |
| Lym.  | 1NN | RFE    | 95.2±3.8  | 97.7±2.9  | 97.3±3.7  | 98±3.1    | 98.8±2.5 | 98.8±2.2 | 99.2±1.9  | 99±2     |
| Lym.  | 1NN | Kmeans | 93.3±7.2  | 95.0±5.0  | 95.0±6.6  | 95.5±4.6  | 97.3±3.6 | 97.3±3.8 | 98.1±3.2  | 98.5±2.5 |
| SRB.  | SVM | DRAGS  | 86.3±10.2 | 94.1±5.3  | 96.5±3.9  | 97±3.6    | 99.5±1.9 | 99.2±1.8 | 99.2±1.8  | 99.4±1.6 |
| SRB.  | SVM | RFE    | 56.8±10.5 | 70.8±9.9  | 81.3±8.8  | 89.5±6.5  | 95.1±3.6 | 97±3.4   | 97.3±3.7  | 97.9±3.2 |
| SRB.  | SVM | Kmeans | 61.4±15.3 | 72.7±16.1 | 79.0±15.6 | 85.4±11.3 | 90.7±6.3 | 93.0±5.5 | 93.9±5.9  | 93.1±6.8 |

Conclusions
• Very simple idea (but is it new??)
• Focus on the stability of the clustering
• Cool thing: the width parameter h is set to the average k-nearest-neighbor distance!
• Strange thing: why only take ONE feature from each group? (no averaging/PCA/...?)
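Putting the pieces together, a toy sketch of a DGF-style pipeline: treat features as points, pick the bandwidth from k-nearest-neighbor distances, run mean shift from every feature, group by converged mode, and keep one representative per group. This is my own simplified reconstruction, not the authors' code: the exact k-NN bandwidth rule, the mode-merging tolerance of h/2, and all names are assumptions.

```python
import numpy as np

def mean_shift(y0, X, h, max_iter=200, tol=1e-6):
    """Shift y0 to a density mode of the rows of X (Gaussian kernel)."""
    y = np.asarray(y0, dtype=float)
    for _ in range(max_iter):
        w = np.exp(-0.5 * np.sum((X - y) ** 2, axis=1) / h ** 2)
        y_new = (w[:, None] * X).sum(axis=0) / w.sum()
        if np.linalg.norm(y_new - y) < tol:
            break
        y = y_new
    return y_new

def knn_bandwidth(X, k=10):
    """h = average distance to the k-th nearest neighbor (one possible
    reading of the slides' 'average k-NN distance' rule)."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    return np.sort(D, axis=1)[:, k].mean()  # column 0 is the point itself

def dense_groups(X, h):
    """Run mean shift from every feature; features whose modes coincide
    (within h/2, an assumed merge tolerance) form one group G_k."""
    modes, labels = [], []
    for x in X:
        m = mean_shift(x, X, h)
        for g, c in enumerate(modes):
            if np.linalg.norm(m - c) < 0.5 * h:
                labels.append(g)
                break
        else:  # no existing mode is close: open a new group
            modes.append(m)
            labels.append(len(modes) - 1)
    return np.array(labels), np.array(modes)

def representatives(X, labels, modes):
    """Pick ONE feature per group: the one closest to the group's mode
    (mirroring the slides' 'only use ONE feature from each group')."""
    reps = []
    for g, m in enumerate(modes):
        idx = np.flatnonzero(labels == g)
        reps.append(int(idx[np.argmin(np.linalg.norm(X[idx] - m, axis=1))]))
    return reps
```

On synthetic data with a few well-separated clusters of "features" this recovers one group per cluster and one representative feature per group; it does not include the stability-based group selection that DRAGS adds on top.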