Master mention Informatique — Spécialité Imagerie (IMA)
UE RFIA — Cours 3
Coordinator: Matthieu Cord — LIP6 / UPMC Paris 6

Schedule — ROOM CHANGES: 08/10: barre 23/24, room 205; lab session (TP) on 29/10: building 31, room 212

This week: Image representation
1. Introduction to Bag of Words
2. Dictionary computation
3. Coding of local descriptors
4. Image signature computation: pooling
5. Whole recognition pipeline
6. Beyond BoW representation

Bag of Words representation
• Model used to represent images for categorization: the "Bag of Words"
• The BoW model is computed from a BoF (Bag of Features)
• Note: raw sets of local signatures are not a scalable representation

Bag of Words (BoW) model: basic explanation with textual representation and color indexing
[Illustration: two text documents — one on visual perception (the messages that reach the brain from our eyes; Hubel, Wiesel) and one on China's forecast trade surplus ($90bn–$100bn, Commerce Ministry) — each reduced to the bag of words it contains.]
Comparing 2 documents using visual / color / word occurrences
Slide credit: L. Fei-Fei

Bag of Visual Words (BoW) model
Image → local features → Bag of Visual Words (histogram)
BoW: a histogram over a visual dictionary
• Which dictionary?
• How to compute the histogram?

Plan: 3 steps to build a BoW representation (dictionary, coding, pooling)

Step 1: dictionary computation
1. Extraction of local features (patterns / visual words) in the images
   • training dataset, in classification
   • image dataset, in retrieval
2. Clustering of the feature space
Extraction → Clustering

Step 1: dictionary computation
• Many algorithms for clustering:
  • K-Means
  • Vector Quantization
  • Gaussian Mixture Models
  • …

Clustering with k clusters
Input: a set of n points (x_j), j = 1, …, n, in R^d
Goal: find k points that give a good approximation of the n input points (k << n), i.e. that minimize the mean squared error
For fixed k, the exact problem has complexity O(n^{kd+1} log n)
Many strategies exist to approximate this global optimization problem

k-means algorithm:
Init: choose k centers w_j by sampling k points in R^d
1. (Re)assign each point x_i to the cluster j whose center w_j is closer to x_i than the center of any other cluster
2. Move each w_j to the barycenter of all the points assigned to cluster j
3. Go back to step 1 if some points changed clusters during the last iteration
Output: the set of the final k cluster centers {c_j = w_j}

K-means: why is it successful?
[Figure © L. Bottou]

Clustering
• K-means:
  • Pros
    • Simplicity
    • Convergence (to a local minimum)
  • Cons
    • Memory-intensive
    • Depends on the choice of K
    • Sensitive to initialization
    • Sensitive to artifacts (outliers)
    • Limited to spherical clusters
    • Clusters concentrate in areas with a high density of points (alternative: radius-based methods)
• K-means is widely used in practice

Clustering
• Uniform / K-means / radius-based:
  • Radius-based clustering assigns all features within a fixed similarity radius r to one cluster.

Visual words
Extraction → Clustering → Dictionary formation
The cluster centers form the dictionary of visual words.
Dictionary examples
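To make Step 1 concrete, here is a minimal NumPy sketch of the k-means loop described above (init, reassign, move centers, stop when assignments are stable), used to cluster local descriptors into a visual dictionary. The random data, descriptor dimension, dictionary size and function name are illustrative assumptions, not the course's reference settings.

```python
# Minimal sketch of the k-means loop above, applied to local descriptors
# to produce a visual dictionary. Toy data stands in for real descriptors.
import numpy as np

def kmeans_dictionary(X, k, n_iter=100, seed=0):
    """X: (n, d) local descriptors. Returns the (k, d) cluster centers = visual words."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()  # init: sample k points
    assign = np.full(len(X), -1)
    for _ in range(n_iter):
        # Step 1: (re)assign each point to its closest center
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)  # (n, k) squared distances
        new_assign = d2.argmin(axis=1)
        if np.array_equal(new_assign, assign):  # Step 3: stop if no point changed cluster
            break
        assign = new_assign
        # Step 2: move each center to the barycenter of the points assigned to it
        for j in range(k):
            members = X[assign == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return centers

# Toy usage with random vectors standing in for SIFT-like descriptors:
X = np.random.default_rng(1).normal(size=(2000, 64)).astype(np.float32)
dictionary = kmeans_dictionary(X, k=50)   # the centers are the visual words
print(dictionary.shape)                   # (50, 64)
```

In practice, for large descriptor sets, a library implementation such as scikit-learn's (MiniBatch)KMeans would typically be used instead of this didactic loop.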
Step 2: BoW image signature
• For each image:
  • For each local feature: find the closest visual word
  • Increase the corresponding bin in the histogram
• Image signature (global index):
  • a vector (histogram)
  • its dimension = dictionary size
  • each term represents the likelihood of observing this visual word

Projection onto the dictionary
§ Original BoW strategy: hard assignment/coding
§ Find the closest cluster for each feature
§ Assign a fixed weight (e.g. 1)

Notations:
• Image data: X = { x_j ∈ R^d }, j ∈ {1, …, N}
• Centers: C = { c_m }, m ∈ {1, …, M}
• Coding: f : R^d → R^M, x_j ↦ f(x_j) = α_j = { α_{m,j} }, m ∈ {1, …, M}

Hard coding: f = f_Q assigns a constant weight to the closest center:
f_Q(x_j)[m] = \begin{cases} 1 & \text{if } m = \arg\min_{k \in \{1,\dots,M\}} \|x_j - c_k\|^2 \\ 0 & \text{otherwise} \end{cases}

Aggregating projections => global image index
§ Global index: the image's likelihood of containing each visual word
§ Several strategies to aggregate the projections: pooling
g : R^N → R,  α_m = { α_{m,j} }, j ∈ {1, …, N}  ↦  g(α_m) = z_m

Aggregating projections => global image index
§ Sum pooling: the initial BoW strategy (just counting occurrences of words in the document)
Classical BoW = hard coding + sum pooling
1. Find the closest cluster for each feature
2. Assign a fixed weight (e.g. 1) to this cluster

Aggregating projections => global image index
§ BoW sum pooling:
z_m = g(\alpha_m) = \sum_{j=1}^{N} \alpha_{m,j} = \sum_{j=1}^{N} f_Q(x_j)[m]
    = \sum_{j=1}^{N} \begin{cases} 1 & \text{if } m = \arg\min_{k \in \{1,\dots,M\}} \|x_j - c_k\|^2 \\ 0 & \text{otherwise} \end{cases}

Aggregating projections => global image index
Alternative:
§ Max pooling: keep the maximum projection value for each visual word
§ Relevant for sparse / soft coding: limits the effect of noise
§ (Partially) justified by bio-inspired models (cortex)
z_m = g(\alpha_m) = \max_{j=1,\dots,N} \alpha_{m,j}
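Following on from the formulas above, here is a minimal sketch of hard coding followed by sum pooling (the classical BoW histogram) and max pooling. The toy data, the function names and the final L1 normalization are illustrative assumptions; C is a random stand-in for a learned dictionary (e.g. the output of the k-means sketch earlier).

```python
# Minimal sketch of classical BoW (hard coding + sum pooling) and of max pooling.
import numpy as np

def hard_code(X, C):
    """Hard assignment f_Q: (N, d) descriptors -> (M, N) 0/1 coding matrix alpha."""
    d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1)   # (N, M) squared distances
    nearest = d2.argmin(axis=1)                            # index of the closest visual word
    alpha = np.zeros((len(C), len(X)))
    alpha[nearest, np.arange(len(X))] = 1.0                # weight 1 on the closest center
    return alpha

def sum_pooling(alpha):
    """z_m = sum_j alpha_{m,j}: the classical BoW histogram (word counts)."""
    return alpha.sum(axis=1)

def max_pooling(alpha):
    """z_m = max_j alpha_{m,j}: keep the strongest activation of each word."""
    return alpha.max(axis=1)

# Toy usage: 500 local descriptors projected onto a 100-word dictionary.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 64)).astype(np.float32)
C = rng.normal(size=(100, 64)).astype(np.float32)    # stand-in for the learned dictionary
alpha = hard_code(X, C)
bow = sum_pooling(alpha)            # signature, dimension = dictionary size
bow = bow / bow.sum()               # optional L1 normalization of the signature
print(bow.shape, max_pooling(alpha).shape)   # (100,) (100,)
```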
Representation
1. Feature detection & representation
2. Codewords dictionary
3. Image representation
© Fei-Fei, Fergus, Torralba

Learning and Recognition
4. Category models (and/or) classifiers
   • Generative methods: graphical models
   • Discriminative methods: SVM

Representation + Learning and Recognition (full pipeline)
1. Feature detection & representation → 2. Codewords dictionary → 3. Image representation → 4. Category models (and/or) classifiers → category decision
© Fei-Fei, Fergus, Torralba

TD (tutorial) on the "Visual Word Ambiguity" article

General questions:
• What topics does the article study?
• Originality: what are its contributions?
• Explain: visual word ambiguity and visual word plausibility
• List the proposed methods

Part 3: representation models
• Notations: state what CB(), w, V, D and r_i denote.
• Which image representation does equation (1) correspond to? What is the size of the vector representing the image?
• Explain the role of K. Which property is required of K? Is it a similarity or a distance?
• Which type of kernel K is chosen in the article?
• Which local features are chosen, and with which distance?
• Is sigma an important parameter? Why? How is it set?
• Why do we speak of soft assignment?
• Explain the differences between KCB(), UNC() and PLA(); comment on figures 2 and 3
• Why does UNC() not allow plausibility to be modelled?
• Could a different sigma be used for each cluster? What consequences would that have? How would they be set in that case?
• Does the vocabulary size matter?
• What links do the authors draw between the dimension of the feature space and the level of ambiguity? Comment.

Part 4: experiments
• How are the datasets and the evaluation protocols organized?
• Are there comparisons (fig. 6):
  • against standard methods?
  • between the different methods?
• What conclusion(s)?

Projection onto the dictionary
§ Second BoW strategy: soft assignment
  § Kernel codebook: absolute weight
  § Plausibility: absolute weight on the 1-NN (nearest word) only
  § Uncertainty: relative weight

Visual Word Ambiguity
J.C. van Gemert, C.J. Veenman, A.W.M. Smeulders, J.M. Geusebroek, PAMI 2010

Soft coding: kernel codebook
f_{Kernel}(x_j)[m] = K(d(x_j, c_m))

Soft coding: uncertainty
f_{Unc}(x_j)[m] = \frac{K(d(x_j, c_m))}{\sum_{k=1}^{M} K(d(x_j, c_k))}

Soft coding: plausibility
f_{Plau}(x_j)[m] = \begin{cases} K(d(x_j, c_m)) & \text{if } m = \arg\min_{k \in \{1,\dots,M\}} \|x_j - c_k\|^2 \\ 0 & \text{otherwise} \end{cases}

(A minimal Python sketch of these three coding functions is given at the end of this document.)

Projection onto the dictionary
§ Soft vs. hard assignment/coding
§ The gain of soft over hard coding is not that large
§ Uncertainty is clearly the best strategy
§ Semi-soft assignment: an excellent tradeoff

Projection onto the dictionary
§ Another approach: sparse coding
§ Approximate each local feature as a linear combination of a subset of words from the dictionary: x_i ≈ D α_i
§ α_i are the weight vectors, D is the dictionary (the matrix whose columns are the centers of C)
§ Each x_i should be represented using only a small number of visual words => sparsity

Projection onto the dictionary
§ Sparse coding vs. VQ (hard assignment)
§ SC (Sparse Coding): most entries of α_i are 0
§ LLC (Locality-constrained Linear Coding): the words used to represent a feature must be close to it (locality)
§ Are these criteria, which minimize a reconstruction error, actually relevant for the image classification purpose?

Aggregating projections => global image index
§ Next?
§ Avoiding clustering?
  • Kernel similarity on bags of local features
    => Pyramid Match Kernel [Grauman 05]
  • Exploiting spatial image information
    => Spatial Pyramid Matching [Lazebnik 06]
  • Better represent the coding/pooling association: work on the whole matrix of cluster/descriptor dependencies => combine spatial pooling and sparse coding
  • Work on new descriptors (bio-inspired, learned)
  • Train the dictionaries (supervised training)
Three directions: work on the local descriptors, work on the dictionary, work on the coding/pooling.
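As announced in the soft-coding section, here is a minimal sketch of the three coding variants of van Gemert et al. (kernel codebook, uncertainty, plausibility). The Gaussian kernel, the value of sigma, the function names and the toy data are illustrative assumptions; the article's own kernel and parameter choices should be taken from the text itself.

```python
# Minimal sketch of the three soft-coding variants: kernel codebook, uncertainty,
# plausibility. A Gaussian kernel on Euclidean distances is assumed for K.
import numpy as np

def gaussian_kernel(dist, sigma=1.0):
    """K(d) = exp(-d^2 / (2 sigma^2))."""
    return np.exp(-(dist ** 2) / (2.0 * sigma ** 2))

def soft_code(X, C, sigma=1.0, mode="uncertainty"):
    """X: (N, d) descriptors, C: (M, d) visual words. Returns alpha of shape (M, N)."""
    dist = np.sqrt(((X[:, None, :] - C[None, :, :]) ** 2).sum(-1))   # (N, M) distances d(x_j, c_m)
    K = gaussian_kernel(dist, sigma)                                  # (N, M) kernel values
    if mode == "kernel":          # kernel codebook: absolute weight K(d(x_j, c_m))
        alpha = K
    elif mode == "uncertainty":   # relative weight: normalize over all M words
        alpha = K / K.sum(axis=1, keepdims=True)
    elif mode == "plausibility":  # absolute weight, but only on the nearest word
        alpha = np.zeros_like(K)
        nearest = dist.argmin(axis=1)
        alpha[np.arange(len(X)), nearest] = K[np.arange(len(X)), nearest]
    else:
        raise ValueError(mode)
    return alpha.T                # (M, N), same layout as the hard-coding sketch

# Toy usage: soft-coded descriptors, then sum pooling into the image signature.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 64)).astype(np.float32)
C = rng.normal(size=(100, 64)).astype(np.float32)
z = soft_code(X, C, sigma=5.0, mode="uncertainty").sum(axis=1)   # (100,) signature
print(z.shape)
```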