
Master's degree in Computer Science (mention Informatique)
Specialty: IMA (Imaging)
Course unit (UE): RFIA
Lecture 3
Resp.: Matthieu Cord —LIP6 / UPMC Paris 6
Schedule
ROOM CHANGES:
08/10: barre 23/24, room 205
Lab session (TP) on 29/10: building 31, room 212
This week:
Image representation
1.  Introduction to Bag of Words
2.  Dictionary computation
3.  Coding of local descriptors
4.  Image signature computation: pooling
5.  Whole recognition pipeline
6.  Beyond BoW representation
Bag of Words representation
•  Model to represent images for categorization: "Bag of Words"
•  BoW model computed from BoF (Bag of Features)
•  Note: the raw set of local signatures is not a scalable representation
Bag of Words (BoW) model: basic explanation with textual
representation and color indexing
[Figure: two example text documents, one about visual perception (highlighted words: sensory, brain, visual, perception, eyes, nerve, image, Hubel, Wiesel) and one about China's trade surplus (highlighted words: China, trade, surplus, commerce, value); each document is summarized by the occurrences of its salient words.]
Comparing 2 docs using visual/color/word occurrences
Slide credit L. Fei-Fei
Bag of Visual Words (BoW) Model
[Diagram: image → local features → Bag of Visual Words]
BoW: histogram over a visual dictionary
•  Which dictionary?
•  How to compute the histogram?
Plan
3 steps for BoWs
1.  Introduction to Bag of Words
2.  Dictionary computation
3.  Coding of local descriptors
4.  Image signature computation: pooling
5.  Whole recognition pipeline
6.  Beyond BoW representation
Step 1: dictionary computation
1.  Extraction of local features (pattern/visual words) in images
•  Training dataset in classification
•  Image dataset in retrieval
2.  Clustering of feature space
[Diagram: extraction → clustering]
Step 1: dictionary computation
•  Many algorithms for clustering:
•  K-Means
•  Vector Quantization
•  Gaussian Mixture Models
•  …
Clustering with k clusters
Input: a set of n points (xj)j=1..n in Rd
Goal: find k points that give a good approximation of the n input points
(k << n), i.e. minimizing the mean squared error
For fixed k and d, solving the problem exactly has complexity $O(n^{kd+1} \log n)$
Many strategies exist to approximate this global optimization problem
k-means algorithm:
Init: sample k points wj in Rd as the initial centers
1.  (Re)assign each point xi to the cluster j whose center wj is closer to xi
than any other center
2.  Move each wj to the barycenter of all the points assigned to cluster j
3.  If some points changed clusters during the last iteration, go to step 1
Output: the set of the final k cluster centers {cj = wj}
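A minimal NumPy sketch of the k-means loop described above (the function name and the fixed iteration cap are illustrative choices, not part of the course material):

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Minimal k-means on an (n, d) array X; returns (k, d) centers and labels."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Init: sample k of the n input points as the initial centers w_j
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    labels = np.full(len(X), -1)
    for _ in range(n_iter):
        # Step 1: (re)assign each point to its closest center
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)  # (n, k)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):  # Step 3: stop when no point changed cluster
            break
        labels = new_labels
        # Step 2: move each center to the barycenter of its assigned points
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers, labels
```

The (n, k) distance matrix built at each iteration also illustrates the memory cost listed among the cons on the next slides.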
K-means: why is it successful?
© L. Bottou
Clustering
•  K-means:
•  Pros:
•  Simplicity
•  Convergence (to a local minimum)
•  Cons:
•  Memory-intensive
•  Depends on the choice of K
•  Sensitive to initialization
•  Sensitive to artifacts
•  Limited to spherical clusters
•  Clusters concentrate in high-density areas of the feature space (alternatives: radius-based methods)
•  K-means is widely used in practice
Clustering
•  Uniform / K-means / radius-based:
•  Radius-based clustering assigns all features within a fixed similarity radius r to the same cluster (see the sketch below).
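A possible sketch of such a radius-based (leader-style) clustering, assuming Euclidean distance and a greedy single pass over the features; the exact variant used in practice may differ:

```python
import numpy as np

def radius_clustering(X, r):
    """Greedy radius-based clustering: each feature joins the nearest existing
    cluster whose center lies within radius r, otherwise it seeds a new cluster."""
    X = np.asarray(X, dtype=float)
    centers, labels = [], []
    for x in X:
        if centers:
            d = np.linalg.norm(np.asarray(centers) - x, axis=1)
            j = int(d.argmin())
            if d[j] <= r:          # within the radius of an existing cluster
                labels.append(j)
                continue
        centers.append(x)          # new cluster seeded at this feature
        labels.append(len(centers) - 1)
    return np.asarray(centers), np.asarray(labels)
```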
[Diagram: extraction → clustering → dictionary formation → visual words]
Cluster centers = dictionary of visual words
Dictionary examples
Plan
1.  Introduction to Bag of Words
2.  Dictionary computation
3.  Coding of local descriptors
4.  Image signature computation: pooling
5.  Whole recognition pipeline
6.  Beyond BoW representation
Step 2: BoW image signature
•  For each image:
•  For each local feature: find the closest visual word
•  Increase the corresponding bin of the histogram
•  Image signature (global index):
•  Vector (histogram)
•  Vector dimension = dictionary size
•  Each entry represents the likelihood of observing the corresponding visual word
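A minimal sketch of this signature computation, assuming descriptors and dictionary are given as NumPy arrays; the final L1 normalization is one common way of turning counts into the per-word likelihoods mentioned above:

```python
import numpy as np
from scipy.spatial.distance import cdist

def bow_signature(descriptors, dictionary):
    """descriptors: (N, d) local features of one image; dictionary: (M, d) visual words.
    Returns the M-bin BoW histogram of the image."""
    # Closest visual word for each local feature
    nearest = cdist(descriptors, dictionary).argmin(axis=1)            # (N,)
    # Count how many features fall on each word (bins 0..M-1)
    hist = np.bincount(nearest, minlength=len(dictionary)).astype(float)
    return hist / hist.sum()   # L1 normalization -> per-word "likelihoods"
```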
Projection to dictionary
§  Original BoW strategy: hard assignment/coding
§  Find the closest cluster for each feature
§  Assign a fixed weight (e.g. 1)
Notations:
•  Image data: $X = \{x_j\}$, $x_j \in \mathbb{R}^d$, $j \in \{1, \dots, N\}$
•  Centers: $C = \{c_m\}$, $m \in \{1, \dots, M\}$
•  Coding: $f : \mathbb{R}^d \to \mathbb{R}^M$, $x_j \mapsto f(x_j) = \alpha_j = \{\alpha_{m,j}\}$, $m \in \{1, \dots, M\}$

Hard coding: $f = f_Q$ assigns a constant weight to the closest center:
$$f_Q(x_j)[m] = \begin{cases} 1 & \text{if } m = \underset{k \in \{1,\dots,M\}}{\arg\min} \|x_j - c_k\|^2 \\ 0 & \text{otherwise} \end{cases}$$
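The hard-coding function $f_Q$ above, written as a small sketch that returns a one-hot M-dimensional vector (the helper name is illustrative):

```python
import numpy as np

def hard_coding(x, centers):
    """f_Q(x): one-hot M-vector with a 1 at the index of the closest center."""
    m = ((centers - x) ** 2).sum(axis=1).argmin()   # argmin_k ||x - c_k||^2
    alpha = np.zeros(len(centers))
    alpha[m] = 1.0
    return alpha
```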
Plan
1.  Introduction to Bag of Words
2.  Dictionary computation
3.  Coding of local descriptors
4.  Image signature computation: pooling
5.  Whole recognition pipeline
6.  Beyond BoW representation
Aggregating projections => global image index
§  Global index: likelihood of each visual word in the image
§  Several strategies to aggregate the projections: pooling
$$g : \mathbb{R}^N \to \mathbb{R}, \quad \alpha_m = \{\alpha_{m,j}\}_{j \in \{1,\dots,N\}} \mapsto g(\alpha_m) = z_m$$
Aggregating projections => global image index
§  Sum pooling: initial BoW strategy (just counting occurrences of words in the document)
Classical BoW = hard coding + sum pooling:
1.  Find the closest cluster for each feature
2.  Assign a fixed weight (e.g. 1) to this cluster
Aggregating projections => global image index
§  BoW Sum pooling:
$$z_m = g(\alpha_m) = \sum_{j=1}^{N} \alpha_{m,j} = \sum_{j=1}^{N} f_Q(x_j)[m] = \sum_{j=1}^{N} \begin{cases} 1 & \text{if } m = \underset{k \in \{1,\dots,M\}}{\arg\min} \|x_j - c_k\|^2 \\ 0 & \text{otherwise} \end{cases}$$
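A compact sketch of classical BoW as hard coding followed by sum pooling, built on the notations above (the coding matrix is stored as (N, M), i.e. one row per descriptor):

```python
import numpy as np
from scipy.spatial.distance import cdist

def bow_sum_pooling(descriptors, centers):
    """Classical BoW: hard coding of each descriptor, then sum pooling over them."""
    N, M = len(descriptors), len(centers)
    A = np.zeros((N, M))                                   # coding matrix, A[j, m] = alpha_{m,j}
    nearest = cdist(descriptors, centers).argmin(axis=1)   # closest center per descriptor
    A[np.arange(N), nearest] = 1.0                         # hard codes f_Q(x_j)
    z = A.sum(axis=0)                                      # sum pooling: z_m = sum_j alpha_{m,j}
    return z                                               # = word-count histogram
```

With hard codes, the sum-pooled vector z is exactly the word-count histogram of Step 2.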
Aggregating projections => global image index
Alternative:
§  Max pooling: keep the maximum value of the projection for each
visual word
§  Relevant for sparse / soft coding: limits noise effects
§  (Partially) justified by bio-inspired models (cortex)
$$z_m = g(\alpha_m) = \max_{j=1..N} \alpha_{m,j}$$
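The corresponding max-pooling operator, assuming the same (N, M) coding-matrix layout as in the previous sketch:

```python
import numpy as np

def max_pooling(A):
    """A: (N, M) coding matrix with A[j, m] = alpha_{m,j}; returns z with z_m = max_j alpha_{m,j}."""
    return A.max(axis=0)
```

With hard codes, z_m only indicates whether word m occurs at least once, which is why max pooling is mostly paired with soft or sparse codes.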
Plan
1.  Introduction to Bag of Words
2.  Dictionary computation
3.  Coding of local descriptors
4.  Image signature computation: pooling
5.  Whole recognition pipeline
6.  Beyond BoW representation
Representation / Learning and Recognition
[Pipeline diagram © Fei-Fei, Fergus, Torralba: 1. feature detection & representation → 2. codewords dictionary → 3. image representation → 4. category models (and/or) classifiers. Generative methods: graphical models; discriminative methods: SVM]
[Second diagram © Fei-Fei, Fergus, Torralba: the same pipeline at recognition time, ending with the category decision]
Plan
1.  Introduction to Bag of Words
2.  Dictionary computation
3.  Coding of local descriptors
4.  Image signature computation: pooling
5.  Whole recognition pipeline
6.  Beyond BoW representation
TD (tutorial session) on the Visual Word Ambiguity article
General questions:
•  What are the topics studied in the article?
•  Originality: what are the contributions?
•  Explain: visual word ambiguity and visual word plausibility
•  List the proposed methods
Part 3: representation models
•  Notations: state what CB(), w, V, D, ri represent.
•  Which image representation does equation (1) correspond to? What is the size of the vector representing the image?
•  Explain the role of K; which property is required for K? Is it a similarity or a distance? Which type of kernel (K) is chosen in the article?
•  Which local features are chosen, and with which distance?
•  Is sigma an important parameter? Why? How is it set?
•  Why do we speak of soft assignment?
•  Explain the differences between KCB(), UNC() and PLA(); comment on figures 2 and 3
•  Why does UNC() not allow modelling plausibility?
•  Could we consider a different sigma for each cluster? What consequence would that have? How would they be set in that case?
•  Is the vocabulary size important?
•  What links do the authors draw between the dimension of the feature space and the level of ambiguity? Comment.
Part 4: experiments
•  How are the datasets and the evaluation protocols organized?
•  Are there comparisons (fig. 6):
•  With standard methods?
•  Between the different methods?
•  What conclusion(s)?
Projection => dictionary
§  Second BoW strategy: soft assignment
§  Kernel codebook: absolute weight
§  Uncertainty: relative weight
§  Plausibility: absolute weight on the nearest visual word (1-NN)
Visual Word Ambiguity
J.C. van Gemert, C.J. Veenman, A.W.M.
Smeulders, J.M. Geusebroek
PAMI 2010
Soft coding: kernel codebook
$$f_{Kernel}(x_j)[m] = K(d(x_j, c_m))$$
Soft coding: uncertainty
$$f_{Unc}(x_j)[m] = \frac{K(d(x_j, c_m))}{\sum_{k=1}^{M} K(d(x_j, c_k))}$$
Soft coding: plausibility
$$f_{Plau}(x_j)[m] = \begin{cases} K(d(x_j, c_m)) & \text{if } m = \underset{k \in \{1,\dots,M\}}{\arg\min} \|x_j - c_k\|^2 \\ 0 & \text{otherwise} \end{cases}$$
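A sketch of the three soft codings above, assuming a Gaussian kernel $K(d) = \exp(-d^2 / 2\sigma^2)$ on the Euclidean distance (the normalization constant of the Gaussian used in the paper is dropped, and sigma is a bandwidth to be tuned):

```python
import numpy as np
from scipy.spatial.distance import cdist

def soft_codes(X, C, sigma=1.0):
    """X: (N, d) descriptors, C: (M, d) centers.
    Returns the kernel-codebook, uncertainty and plausibility codings."""
    D = cdist(X, C)                              # distances d(x_j, c_m), shape (N, M)
    K = np.exp(-D ** 2 / (2 * sigma ** 2))       # Gaussian kernel values K(d(x_j, c_m))
    f_kernel = K                                 # kernel codebook: absolute weight
    f_unc = K / K.sum(axis=1, keepdims=True)     # uncertainty: weight relative to all words
    f_plau = np.zeros_like(K)                    # plausibility: weight only on the 1-NN word
    rows = np.arange(len(X))
    nearest = D.argmin(axis=1)
    f_plau[rows, nearest] = K[rows, nearest]
    return f_kernel, f_unc, f_plau
```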
Projection => dictionary
§  Soft vs hard assignment/coding
§  The gain of soft over hard coding is not that large
§  Uncertainty is certainly the best strategy
§  Semi-soft assignment: excellent tradeoff
Projection => dictionary
§  Other approach: sparse coding (see the sketch below)
§  Approximation of each local feature as a linear combination
of a subset of words from the dictionary: xi ~ D αi
§  αi weight vectors, D dictionary (matrix whose columns are the centers of C)
§  Each xi should be represented using only a small number
of visual words => sparsity
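A hedged sketch of sparse coding using scikit-learn's SparseCoder to approximate each descriptor as a sparse linear combination of dictionary words; the solver, the L1 penalty weight and the row normalization are illustrative choices, not necessarily those of the methods discussed next:

```python
import numpy as np
from sklearn.decomposition import SparseCoder

def sparse_codes(X, D, lam=1.0):
    """X: (N, d) local descriptors; D: (M, d) dictionary, one visual word per row.
    Returns A: (N, M) with x_i ~ A[i] @ D and most entries of A[i] equal to 0."""
    D = D / np.linalg.norm(D, axis=1, keepdims=True)        # unit-norm atoms (common practice)
    coder = SparseCoder(dictionary=D,
                        transform_algorithm="lasso_lars",   # L1-penalized least squares
                        transform_alpha=lam)                # sparsity/reconstruction tradeoff
    return coder.transform(X)
```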
Projection => dictionary
§  Sparse coding vs VQ (hard assignment)
§  SC: Sparse Coding: most entries of αi are 0
§  LLC: Locality-constrained Linear Coding: the words representing a feature
must be close to it (locality)
§  Are these criteria, which minimize a reconstruction error, relevant for
image classification purposes?
Aggregating projections => global image index
§  Next?
§  Avoiding clustering?
§  Kernel similarity on bags of local features:
⇒ Pyramid Match Kernel [GRAUMAN 05]
•  Exploiting spatial image information
⇒ Spatial Pyramid Matching [LAZEBNIK 06] (see the sketch at the end of this section)
•  Better represent the coding/pooling association: work on the whole
matrix of cluster/descriptor dependencies => combine spatial
pooling and sparse coding
•  Work on new descriptors (bio-inspired, learned)
•  Train the dictionaries (supervised training)
[Research directions: work on local descriptors, work on the dictionary, work on coding/pooling]
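A hedged sketch of the spatial-pyramid idea of [LAZEBNIK 06] referenced above: compute one BoW histogram per cell of a coarse-to-fine grid and concatenate them. The level weighting and the histogram-intersection kernel of the published method are omitted, and the grid sizes and function names are illustrative:

```python
import numpy as np
from scipy.spatial.distance import cdist

def spatial_pyramid_bow(descriptors, positions, dictionary, img_w, img_h, grids=(1, 2, 4)):
    """descriptors: (N, d) local features; positions: (N, 2) their (x, y) image coordinates.
    Concatenates one BoW histogram per cell of 1x1, 2x2 and 4x4 spatial grids."""
    nearest = cdist(descriptors, dictionary).argmin(axis=1)   # hard assignment per feature
    M = len(dictionary)
    hists = []
    for g in grids:
        # Cell indices of every feature at this pyramid level (g x g cells)
        cx = np.minimum((positions[:, 0] / img_w * g).astype(int), g - 1)
        cy = np.minimum((positions[:, 1] / img_h * g).astype(int), g - 1)
        for i in range(g):
            for j in range(g):
                in_cell = (cx == i) & (cy == j)
                hists.append(np.bincount(nearest[in_cell], minlength=M).astype(float))
    return np.concatenate(hists)   # dimension M * sum(g*g for g in grids)
```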