Large Scale Classification and Search

Distance Metric Learning
for Large Margin Nearest Neighbor (LMNN) classification
Presenter: Yen-Cheng Chou
Image downloaded from Kilian Q. Weinberger (the first author) website,
http://www.cse.wustl.edu/~kilian/code/lmnn/lmnn.html
The cake is for his 30th birthday.
How do you classify these images?
It depends on your purpose
You will get the same results for both problems if you use kNN with Euclidean distance.
(This is not what you want!)
Can we find a better distance measurement for kNN?
For foreground classification
Intuition! Similar objects should be close
For background
classification
Metric Learning:
a way to make similar objects closer
• Metric = distance function
• Design a better distance function (for our purpose)
• For foreground object classification, we want
  D(images with different foregrounds) > D(images with the same foreground)
LMNN: a metric learning method for kNN
Intuition for robust kNN classification
• A sample should share the same label as its (k) neighbors
• Samples with different labels should be widely separated
Goal of LMNN
• Design a good distance function such that:
  1. Similarly-labeled predefined neighbors are close.
  2. Differently-labeled samples are outside the neighborhood.
(Figure legend: the sample; its similarly-labeled predefined neighbors; similarly-labeled samples that are not predefined neighbors; differently-labeled samples)
Example of LMNN
Result of LMNN
Model of LMNN
cost function, optimization, testcase classification
Distance Function
• Define the distance function by
  D_M(x_i, x_j) = (x_i − x_j)ᵀ M (x_i − x_j), with M positive semidefinite
• Learn the best M through an optimization problem based on this distance function
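As a sketch, the distance for a given M can be computed directly; with M = I it reduces to the squared Euclidean distance (function and variable names here are my own):

```python
import numpy as np

def mahalanobis_sq(xi, xj, M):
    """D_M(xi, xj) = (xi - xj)^T M (xi - xj)."""
    d = np.asarray(xi, dtype=float) - np.asarray(xj, dtype=float)
    return float(d @ M @ d)

xi, xj = np.array([1.0, 2.0]), np.array([3.0, 1.0])
print(mahalanobis_sq(xi, xj, np.eye(2)))            # 5.0 (squared Euclidean)
print(mahalanobis_sq(xi, xj, np.diag([2.0, 0.5])))  # 8.5: axes re-weighted by M
```

Learning M amounts to choosing how each feature direction is weighted in this quadratic form.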
Cost Function of LMNN
• Define a loss function that balances our two goals, then minimize it:
  • Similarly-labeled predefined neighbors should be close.
  • Differently-labeled samples should be outside the neighborhood.
Loss Function
• Similarly-labeled predefined neighbors should be close (pull term):
  ε_pull(M) = Σ_{i, j⇝i} D_M(x_i, x_j)
  where j ⇝ i means x_j is a predefined (target) neighbor of x_i.
Loss Function
• Differently-labeled samples should be outside the neighborhood (push term):
  ε_push(M) = Σ_{i, j⇝i} Σ_l (1 − y_il) [1 + D_M(x_i, x_j) − D_M(x_i, x_l)]_+
  where y_il = 1 iff x_i and x_l share the same label, and [z]_+ = max(z, 0).
• Total loss: ε(M) = (1 − μ) ε_pull(M) + μ ε_push(M).
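A minimal NumPy sketch of the full loss, assuming target neighbors are given as per-sample index lists and μ weights the two terms (names and the toy data are my own):

```python
import numpy as np

def lmnn_loss(X, y, M, target_neighbors, mu=0.5):
    """Sketch of the LMNN loss: a pull term over predefined (target)
    neighbors plus a hinged push term over differently-labeled samples."""
    def dist(i, j):
        d = X[i] - X[j]
        return d @ M @ d

    pull = push = 0.0
    for i, neighbors in enumerate(target_neighbors):
        for j in neighbors:                  # similarly-labeled target neighbor
            pull += dist(i, j)
            for l in range(len(X)):
                if y[l] != y[i]:             # the (1 - y_il) factor
                    # hinge: sample l must be farther than j by a margin of 1
                    push += max(0.0, 1.0 + dist(i, j) - dist(i, l))
    return (1.0 - mu) * pull + mu * push

# Toy example: two nearby same-label points and one distant impostor,
# so the hinge terms are all inactive and only the pull term remains.
X = np.array([[0.0], [0.1], [5.0]])
y = np.array([0, 0, 1])
print(lmnn_loss(X, y, np.eye(1), target_neighbors=[[1], [0], []]))
```

Minimizing this over M (subject to M ⪰ 0) is exactly the optimization problem on the following slides.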
Illustration of the ideal metric
• Similarly-labeled predefined neighbors should be close.
• Differently-labeled samples should be outside the neighborhood.
Image downloaded from http://en.wikipedia.org/wiki/File:Lmnn.png
Formulate as a Convex Optimization Problem
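Following Weinberger et al.'s formulation, the loss above can be written as a semidefinite program by introducing slack variables ξ_ijl for the hinge terms:

```latex
\begin{aligned}
\min_{M,\;\xi}\quad & (1-\mu)\sum_{i,\;j\rightsquigarrow i} D_M(x_i,x_j)
  \;+\; \mu \sum_{i,\;j\rightsquigarrow i,\;l} (1-y_{il})\,\xi_{ijl} \\
\text{s.t.}\quad & D_M(x_i,x_l) - D_M(x_i,x_j) \;\ge\; 1 - \xi_{ijl} \\
& \xi_{ijl} \ge 0, \qquad M \succeq 0
\end{aligned}
```

Since D_M is linear in the entries of M, the objective and constraints are linear and the feasible set (PSD matrices, nonnegative slacks) is convex.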
Testcase Classification
• Use the learned distance function D_M in kNN: classify a test point by majority vote among its k nearest neighbors under D_M.
Energy-based classification
• Apply every possible label to the test case
• Compute its loss
• Choose the label with the smallest loss
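The kNN rule under the learned metric can be sketched as follows (function names and toy data are my own):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_test, M, k=3):
    """kNN classification under a learned Mahalanobis metric M (a sketch)."""
    diffs = X_train - x_test
    dists = np.einsum('nd,de,ne->n', diffs, M, diffs)  # D_M to every training point
    nearest = np.argsort(dists)[:k]
    votes = Counter(y_train[nearest].tolist())
    return votes.most_common(1)[0][0]

# Toy data: two clusters; the test point sits inside the label-0 cluster.
X_train = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([0.5, 0.5]), np.eye(2)))  # 0
```

Energy-based classification would instead plug each candidate label into the LMNN loss for the test point and keep the label with the smallest value.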
Advantages of LMNN
• Focuses on (local) predefined neighbors:
  • similarly-labeled predefined neighbors should be close
  • differently-labeled samples should be outside the neighborhood
• Fewer constraints:
  • does not require constraints over all pairs
  • only requires the differently-labeled data to be far enough away (outside the neighborhood), not as far as possible
• Formulated as a convex optimization problem, so a globally optimal solution exists.
Two views of LMNN
• Distance metric function: D_M(x_i, x_j) = (x_i − x_j)ᵀ M (x_i − x_j)
• Transformation onto another space: writing M = LᵀL, D_M equals the squared Euclidean distance after mapping x ↦ Lx.
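The equivalence of the two views can be checked numerically; this sketch picks an arbitrary linear map L:

```python
import numpy as np

rng = np.random.default_rng(0)
L = rng.normal(size=(2, 3))   # a linear map R^3 -> R^2
M = L.T @ L                   # the induced (PSD) metric matrix

xi, xj = rng.normal(size=3), rng.normal(size=3)
d = xi - xj
view1 = d @ M @ d                       # view 1: Mahalanobis distance under M
view2 = np.sum((L @ xi - L @ xj) ** 2)  # view 2: Euclidean distance after x -> Lx
assert np.isclose(view1, view2)         # the two views agree
```

So learning M is equivalent to learning a linear embedding and running ordinary Euclidean kNN in the embedded space.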
Model Comparison
LMNN versus PCA
Left image: downloaded from http://en.wikipedia.org/wiki/File:Lmnn.png
Right image: cropped from Principal Component Analysis (PCA) slides of Mario Savvides
LMNN versus MMC
• MMC penalizes all differently-labeled samples.
• MMC constrains the sum of distances within the same label to be strictly less than 1.
LMNN versus POLA
• POLA places constraints over every similarly-labeled pair and every differently-labeled pair, not just over predefined neighbors.
LMNN versus SVM
Extension
Multi-pass LMNN
…
Multi-metric LMNN
Learn a separate metric for each label
Discussion & Conclusion
How to define a good cost function for metric learning?
• Purpose: for kNN? SVM? De-noising?
• Constraints:
  • too weak? (bad performance)
  • too strong? (hard to achieve)
Computation Issues
• Closed-form solution?
• Convex optimization?
Well-designed cost function for LMNN
• Purpose: for kNN
• Constraints over:
  • similarly-labeled predefined neighbors
  • differently-labeled samples, which only need to be outside the neighborhood!
• Well designed for convex optimization
VisualRank:
Applying PageRank to Large-Scale
Image Search
Yushi Jing, Member, IEEE, and Shumeet Baluja, Member, IEEE
Slides: Jonathan Chao
Summary
—  There are not many image-based search engines around.
—  Image-based search methods have been proposed, but their effectiveness on large-scale search is still uncertain.
—  Thus, VisualRank is proposed: an efficient computation of similarities among large numbers of images.
Summary
—  Why VisualRank?
—  How is it done?
—  What is the result?
—  Further work?
Text-Based Search Methods
—  Due to the ease of text search, commercial image search engines often borrow techniques from text search.
—  Instead of using the images themselves as sources, the search engines take the text of web pages that contain certain key words and display the images from those pages.
Text-Based Search Methods
—  However, such methods can display misleading results due to ambiguous wording.
—  Sometimes this causes inconvenience for users.
Text-Based Search Methods
Key Word: Coca Cola
Text-Based Search Methods
Image-Based Search
—  Instead, the distribution of visual similarities is analyzed.
—  "Visual themes" and their strength in images are used to create a visual ranking system.
Image-Based Search
VisualRank:
—  an end-to-end system that improves
Google image search results
—  emphasis on robust and efficient
computation of image similarities
—  applicable to a large number of queries
and images.
VisualRank Methodology
—  The image collection is represented as a graph with vertices and edges.
VisualRank Methodology
—  Eigenvector Centrality:
—  Principal eigenvector of a square stochastic adjacency matrix, constructed from the weights of the edges in the graph.
—  Equivalently, the ranking scores correspond to the likelihood of travelling from one (random) vertex to another (random walk).
VisualRank Methodology
—  Visual hyperlinks: visual similarity in initial results.
—  If images A & B have a visual hyperlink, there exists
a probability for a user to jump from A to B.
—  The importance of an image is measured by how many visual hyperlinks point to it.
VisualRank Methodology
—  If an image is important, the images it points to are also important.
—  VisualRank is defined as VR = S* × VR
—  S* is the column-normalized adjacency matrix S.
VisualRank Methodology
—  By repeatedly multiplying VR by S*, the dominant
eigenvector of S* stands out.
—  Does VR converge?
—  S* needs to be aperiodic & irreducible.
—  Aperiodicity: generally true in all cases.
—  Irreducibility: guaranteed by introducing a damping
factor to the equation.
VisualRank Methodology
—  The new equation becomes: VR = d · S* × VR + (1 − d) p
—  d: scalar damping factor.
—  p: a damping vector, which can be uniform or non-uniform.
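The damped iteration can be sketched in a few lines of NumPy; the toy similarity matrix and uniform p here are illustrative assumptions:

```python
import numpy as np

def visual_rank(S, d=0.85, iters=100):
    """Power iteration for VR = d * S_star @ VR + (1 - d) * p (a sketch).
    S is a symmetric image-similarity matrix; p is taken to be uniform."""
    S_star = S / S.sum(axis=0, keepdims=True)  # column-normalize: stochastic matrix
    n = S.shape[0]
    p = np.full(n, 1.0 / n)
    vr = np.full(n, 1.0 / n)
    for _ in range(iters):
        vr = d * (S_star @ vr) + (1 - d) * p
    return vr

# Toy example: image 0 is similar to both other images, so it ranks highest.
S = np.array([[1.0, 0.8, 0.6],
              [0.8, 1.0, 0.1],
              [0.6, 0.1, 1.0]])
scores = visual_rank(S)
print(scores.argmax())  # 0
```

The damping term keeps the chain irreducible, so the iteration converges regardless of the sparsity pattern of S.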
VisualRank Methodology
—  Measurement of visual similarity
—  Local features are used to obtain more stable information that is robust to changes in:
—  rotation
—  transformation
—  scale
VisualRank Methodology
—  Local Feature extractors:
—  SIFT
—  Standard SIFT
—  Difference of Gaussian (DoG) interest point detector
—  Builds a pyramid of scaled images by iteratively applying a Gaussian filter to the original image
—  Adjacent images are subtracted to create the DoG images
—  Interest points located at the local extrema of 2D image space and scale space are selected.
—  Orientation histogram feature representation
—  The gradient map around each interest point is computed and then divided into sub-regions, from which the orientation histograms are computed.
VisualRank Methodology
—  At the end, a 128-D vector is created:
—  4 × 4 grid of orientation histograms
—  8 bins each
—  concatenated into one vector (4 × 4 × 8 = 128)
—  Visual similarity is defined as the number of local features shared between two images divided by their average number of interest points.
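The similarity score follows directly from that definition; this sketch simplifies "shared features" to exact set intersection (the real system matches 128-D SIFT descriptors approximately, and the feature names below are made up):

```python
def visual_similarity(features_a, features_b):
    """Shared local features divided by the average number of interest
    points in the two images (a simplified sketch)."""
    shared = len(set(features_a) & set(features_b))
    avg_points = (len(features_a) + len(features_b)) / 2.0
    return shared / avg_points if avg_points else 0.0

# Two images sharing 2 of their 4 features each:
a = ["f1", "f2", "f3", "f4"]
b = ["f2", "f4", "f5", "f6"]
print(visual_similarity(a, b))  # 0.5
```

These pairwise scores populate the similarity matrix S used by the VisualRank iteration.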
VisualRank Methodology
Web-Scale Visual Rank
—  Infeasible to generate the S matrix with billions of
images.
—  Precluster web images based on metadata
—  Or use existing commercial search engine to create
such cluster
—  E.g., the top-N results from Google Image Search can form a cluster within which visual similarity is computed.
Web-Scale Visual Rank
—  Hashing Local Features
—  Using hash table to store the features is a more
efficient way.
—  Locality Sensitive Hashing
—  approximate near-neighbor search (return all points within range r of the query)
Web-Scale Visual Rank
(Summary)
1.  Local features are generated.
2.  A collection of hash tables is created, each with its own set of hash functions. Each descriptor is indexed into each of the hash tables.
3.  Descriptors that share the same key in more than a certain number of hash tables are considered a match.
Web-Scale Visual Rank
(Summary)
4.  Matched features are regrouped by the images they are associated with. A 4-D histogram stores the "votes," and the histogram with the most votes is used to compute the similarity score.
5.  A pair of images is considered a match if they share more than C descriptors (C = 3, an empirical result).
6.  Given the similarity matrix S, results can be generated using VisualRank (VR).
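Steps 2–3 can be sketched as follows; the random-hyperplane hash family and all parameter values here are illustrative assumptions, not the paper's exact scheme:

```python
import numpy as np

def lsh_match(descs_a, descs_b, n_tables=6, n_bits=8, min_tables=3, seed=0):
    """Count descriptor pairs that share a hash key in at least `min_tables`
    of `n_tables` hash tables (random-hyperplane hashing sketch)."""
    rng = np.random.default_rng(seed)
    dim = descs_a.shape[1]
    agree = np.zeros((len(descs_a), len(descs_b)), dtype=int)
    for _ in range(n_tables):
        planes = rng.normal(size=(n_bits, dim))           # this table's hash functions
        keys_a = [tuple(planes @ d > 0) for d in descs_a]  # n_bits-bit key per descriptor
        keys_b = [tuple(planes @ d > 0) for d in descs_b]
        for i, ka in enumerate(keys_a):
            for j, kb in enumerate(keys_b):
                if ka == kb:
                    agree[i, j] += 1
    return int((agree >= min_tables).sum())

# Identical descriptor sets collide in every table, so at least the
# 3 diagonal pairs count as matches.
descs = np.random.default_rng(1).normal(size=(3, 16))
print(lsh_match(descs, descs))
```

In the full pipeline, images whose descriptor-match count exceeds C then contribute an entry to the similarity matrix S.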
Experimental Results
VisualRank vs. Google on Product Query:
Experimental Results
VisualRank vs. Other Related Works on
Product Query:
Experimental Results
VisualRank vs. Google on Landmark Query:
Future Work
—  Labeling and ranking unlabeled images from personal image collections
—  Studying the relationship between image similarity and "likelihood for transition" more extensively
—  Extending this work to other domains that are not amenable to traditional text-based analysis, including video and audio analysis.