Online video seeds for temporal window objectness

Online Video SEEDS
Dr. Michael Van den Bergh
SEEDS: Superpixels Extracted via Energy-Driven Sampling (ECCV 2012)
What are superpixels?
• grouping pixels based on similarity (color)
• speeds up segmentation
• objects are made up of a small number of superpixels
Existing superpixel methods
gradual addition of cuts
• high accuracy
• very slow (contradictory, since superpixels are meant to speed up segmentation)
• e.g. Entropy Rate Superpixels (Liu et al.)
Existing superpixel methods
growing from centers
• faster
• reduced accuracy (local minima + stray labels)
• still not fast enough
• e.g. SLIC Superpixels (Achanta et al.)
New approach: SEEDS Superpixels
[Figure: initialization, largest block update, medium block update, smallest block update, pixel-level update]
• initialize with rectangular boundaries
• gradually refine boundaries
• SEEDS: Superpixels Extracted via Energy-driven Sampling - ECCV 2012
Advantages of SEEDS
• faster than growing centers
    • only needs to evaluate at the boundaries
    • highly efficient evaluation using color histograms (1 memory lookup)
• accuracy matches or exceeds state-of-the-art
    • avoids local minima
    • optimization only evaluates valid partitionings
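FH: 0.7035, 0.7746, 0.8537, 0.9034 (last row of a table whose header did not survive)

Achievable Segmentation Accuracy (ASA)
superpixels   SEEDS (15Hz)   SLIC (5Hz)   ERS (1Hz)   FH
50            0.9406         0.9064       0.932       0.9042
100           0.9579         0.935        0.951       0.9453
200           0.9676         0.9531       0.964       0.9598
400           0.9749         0.9676       0.972       0.9699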
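As a reading aid for the ASA column, here is a minimal sketch of the metric as it is usually defined: every superpixel takes the ground-truth label it overlaps most, and ASA is the fraction of pixels that end up correct. The function name and the toy 4 x 4 label grids are hypothetical and not from the talk.

```python
from collections import Counter, defaultdict


def achievable_segmentation_accuracy(superpixels, ground_truth):
    """Fraction of pixels labelled correctly when each superpixel is
    assigned the ground-truth segment it overlaps most."""
    overlap = defaultdict(Counter)  # superpixel id -> counts per ground-truth id
    total = 0
    for sp_row, gt_row in zip(superpixels, ground_truth):
        for sp, gt in zip(sp_row, gt_row):
            overlap[sp][gt] += 1
            total += 1
    best = sum(max(counts.values()) for counts in overlap.values())
    return best / total


# Hypothetical toy data: two ground-truth segments, four superpixels.
ground_truth = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
superpixels = [
    [0, 0, 0, 1],
    [0, 0, 0, 1],
    [2, 2, 3, 3],
    [2, 2, 3, 3],
]
asa = achievable_segmentation_accuracy(superpixels, ground_truth)
print(asa)  # 14 of 16 pixels are recoverable -> 0.875
```

Superpixel 0 straddles the ground-truth boundary, so two of its six pixels cannot be labelled correctly; the other superpixels are pure.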
Advantages of SEEDS
[Figure: three plots, Boundary Recall, Undersegmentation Error, and Achievable Segmentation Accuracy, each against the number of superpixels (50, 100, 200, 400). Methods compared: SEEDS (15Hz), SLIC (5Hz), Entropy Rate (1Hz), Felzenszwalb and Huttenlocher]
Advantages of SEEDS
• faster than state-of-the-art
• accuracy matches or exceeds state-of-the-art
• control over run-time
    • whenever the algorithm is stopped, a valid partitioning is available
    • state-of-the-art accuracy at 30 Hz (single core)
• control over superpixel shape
    • one or more priors can be applied during boundary updating
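The "control over run-time" point can be illustrated with a toy hill-climbing loop. This is a sketch under stated assumptions, not the SEEDS implementation: it works on a 1D row of color-bin indices, moves segment boundaries one pixel at a time (the pixel-level update only), and uses a sum of squared normalized histogram bins as the energy. The names `energy` and `anytime_refine` are hypothetical.

```python
import time
from collections import Counter


def energy(pixels, bounds):
    """Sum over segments of the squared, normalized color-histogram bins."""
    total = 0.0
    for start, end in zip(bounds[:-1], bounds[1:]):
        hist = Counter(pixels[start:end])
        size = end - start
        total += sum((n / size) ** 2 for n in hist.values())
    return total


def anytime_refine(pixels, num_segments, max_sweeps=100, time_budget_s=None):
    """Start from equal-width segments and move boundaries while the energy
    improves. `bounds` is a valid partition whenever the loop is stopped."""
    n = len(pixels)
    bounds = [i * n // num_segments for i in range(num_segments + 1)]
    deadline = None if time_budget_s is None else time.monotonic() + time_budget_s
    best = energy(pixels, bounds)
    for _ in range(max_sweeps):
        improved = False
        for k in range(1, num_segments):
            for step in (-1, 1):
                if deadline is not None and time.monotonic() >= deadline:
                    return bounds, best  # out of time: return current partition
                candidate = bounds[:]
                candidate[k] += step
                if not (candidate[k - 1] < candidate[k] < candidate[k + 1]):
                    continue  # a segment would become empty
                e = energy(pixels, candidate)
                if e > best:
                    bounds, best = candidate, e
                    improved = True
        if not improved:
            break  # converged
    return bounds, best


pixels = [0, 0, 0, 0, 0, 1, 1, 1]
bounds, value = anytime_refine(pixels, num_segments=2)
print(bounds, value)  # [0, 5, 8] 2.0
```

Because only boundary moves are evaluated and every accepted move is itself a partition, stopping early (for example with `time_budget_s=0.001`) still yields a usable result, only with a lower energy.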
(b) SEEDS with 3 × 3 smoothing prior
(b) SEEDS with compactness prior
(b) SEEDS with edge prior (snap to edges)
(b) SEEDS with combined prior (3 × 3 smoothing + compactness + snap to edges)
Advantages of SEEDS
• faster
• more accurate
• control over run-time
• control over shape
• temporal
Online Video SEEDS
Video SEEDS
[Figure: frame 0, frame 1, frame 2, current frame; steps: initialization, block-updates, propagation, pixel-updates]
Figure 2. Overview of the Video SEEDS algorithm: The superpixel labels are propagated at an intermediary step of block-level [...]
Video SEEDS
[Figure: block hierarchy at t=0 and t=1; labels: initialization, layer 3 (blocks), layer 2 (blocks), layer 1 (pixels)]
Figure 4. Efficient updating at different block sizes.
Video SEEDS
• superpixels per frame
• superpixel rate (time)
[Figure: Termination and Creation of superpixels along the time axis, up to the current frame]
Figure 5. Termination and creation of superpixels.

[...] fulfill the constraint of number of superpixels per frame (Sec. 4.1). The candidates to be a new superpixel are blocks of pixels of the same size as the superpixel candidates to terminate, which are part of an existing superpixel. Let $B_n^t \subset A_n^{t:0}$ and $B_m^t \subset A_m^{t:0}$ be blocks of superpixels that are candidates to create a new superpixel. To evaluate which is the best candidate to form a new superpixel in the hill-climbing optimization, we use the following proposition.

Proposition 3. Let $|A_n^{t:0}| \approx |A_m^{t:0}|$, $|B_n^t| \ll |A_n^{t:0}|$, $|B_m^t| \ll |A_m^{t:0}|$, and that both $B_n^t$ and $B_m^t$ have histograms concentrated into one bin. Then,

$$H(s^{(m)}) \ge H(s^{(n)}) \iff \mathrm{int}\left(c_{B_m^t},\, c_{A_m^{t:0} \setminus B_m^t}\right) \le \mathrm{int}\left(c_{B_n^t},\, c_{A_n^{t:0} \setminus B_n^t}\right). \qquad (4)$$

Here $s^{(m)}$ and $s^{(n)}$ are read as the partitions in which the new superpixel is created from $B_m^t$ and $B_n^t$, respectively.

The proposition states that under the assumptions that the temporal superpixels are of similar size, the energy is maximized when creating a new superpixel from a block of pixels that intersects the least with its current superpixel.

The decision if a new superpixel should appear is governed by the superpixel rate parameter.

Iterations. We can stop the optimization for a frame at any time and obtain a valid partition. We should expect a higher value of the energy function if we let the hill-climbing do more iterations, until convergence. We can fix the allowed time to run per frame, or set it on-the-fly, depending on the application. In principle, the algorithm can run for an in[...]
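A small numeric sketch of how Proposition 3 can be read: among candidate blocks, pick the one whose color histogram intersects least with the rest of its current superpixel. The histogram intersection is taken here as the sum of bin-wise minima of normalized histograms; the helper names and the toy pixel lists are hypothetical.

```python
from collections import Counter


def normalized_histogram(bins, num_bins):
    """Normalized color histogram of a list of color-bin indices."""
    counts = Counter(bins)
    size = len(bins)
    return [counts[j] / size for j in range(num_bins)]


def intersection(hist_a, hist_b):
    """Histogram intersection: sum of the bin-wise minima."""
    return sum(min(a, b) for a, b in zip(hist_a, hist_b))


def pick_block_for_new_superpixel(candidates, num_bins):
    """candidates: list of (block_pixels, rest_of_superpixel_pixels).
    Returns the index of the block with the smallest intersection."""
    scores = [
        intersection(normalized_histogram(block, num_bins),
                     normalized_histogram(rest, num_bins))
        for block, rest in candidates
    ]
    best = min(range(len(candidates)), key=scores.__getitem__)
    return best, scores


# Candidate n: the block has the dominant color of its superpixel.
candidate_n = ([0, 0], [0, 0, 0, 0, 0, 0, 1, 1])
# Candidate m: the block has a color that is rare in its superpixel.
candidate_m = ([2, 2], [0, 0, 0, 0, 0, 0, 0, 2])

best, scores = pick_block_for_new_superpixel([candidate_n, candidate_m], num_bins=3)
print(best, scores)  # 1 [0.75, 0.125]
```

Both blocks are small relative to their superpixels and concentrated in one bin, matching the assumptions of the proposition; block m intersects least, so it is the better seed for a new superpixel.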
Video SEEDS
[Figure: three plots against the Number of Supervoxels (200 to 900): 3D Undersegmentation Error, 3D Boundary Recall, and Explained Variation. Methods compared: GBH (t=∞), StreamGBH (t=10), StreamGBH (t=1), StreamGB (t=1), Video SEEDS (t=1), Meanshift]
Figure 7. Comparison of our online video superpixels method to the state-of-the-art (s-o-a). For the first plot, lower is better, and for the
second and third, higher is better.
• Chen Xiph.org benchmark
• t=∞ means the entire video is analyzed
• t=1 means it is online (not streaming)
• we are at 30Hz, they are at 0.25 Hz

6.1. Evaluation of Online Video SEEDS
We report results of the online video superpixels on the Chen Xiph.org benchmark [3] using the metrics proposed by [16]. The videos contain moving objects and are recorded with an uncontrolled camera. We use the standard metrics for evaluating temporal superpixels. The 3D [...]
Objectness Measure for Temporal Windows. We define a temporal window as a sequence of temporally connected bounding boxes, one per frame, which aim to surround an object in video. It can be thought of as a rectangular-shaped tube along the time axis (illustrated in Fig. 1 bottom).
The video is divided into overlapping shots of a predefined
length, and for each shot all temporal windows are consid-
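To make the two notions above concrete, here is a minimal sketch of a temporal window as one bounding box per consecutive frame, and of cutting a video into overlapping shots. The class name, the box layout, and the `stride` parameter are assumptions for illustration; the excerpt only says the shots overlap and have a predefined length.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), assumed layout


@dataclass
class TemporalWindow:
    """A tube: one bounding box per consecutive frame."""
    start_frame: int
    boxes: List[Box]

    def frames(self):
        return range(self.start_frame, self.start_frame + len(self.boxes))


def overlapping_shots(num_frames, shot_length, stride):
    """Return (start, end) frame ranges of fixed length; shots overlap
    whenever stride < shot_length."""
    shots = []
    start = 0
    while start + shot_length <= num_frames:
        shots.append((start, start + shot_length))
        start += stride
    return shots


shots = overlapping_shots(num_frames=10, shot_length=4, stride=2)
tube = TemporalWindow(start_frame=2, boxes=[(0, 0, 10, 10), (1, 0, 11, 10)])
print(shots)                # [(0, 4), (2, 6), (4, 8), (6, 10)]
print(list(tube.frames()))  # [2, 3]
```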
SEEDS in OpenCV
Randomized SEEDS
(b) SEEDS with 3 × 3 smoothing prior
(b) SEEDS with compactness prior
(b) SEEDS with edge prior (snap to edges)
(b) SEEDS with combined prior (3 × 3 smoothing + compactness + snap to edges)
Randomness Injection
[Figure: labels; multiple SEEDS samples; Randomized SEEDS; objectness score]
Figure 6. Different samples of randomized SEEDS segmentations of the same frame and with the same accuracy are combined. In the randomized SEEDS, we show the average of the different samples. The objectness score is computed as the sum of the distances to the common superpixel boundaries.
[...] noise to the evaluation of the exchanges of pixels in the hill-climbing, i.e. in Eq. (2). This is, [...]
Objectness Measure for Still Images. We use O to represent the intersection of several superpixel samples of ran[...]
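The idea of combining several samples can be sketched as follows, assuming that "common superpixel boundaries" means boundary pixels present in every sample. That reading, the 4-neighbor boundary test, and the function names are assumptions; the distance-based score from the Figure 6 caption is not reproduced here.

```python
def boundary_mask(labels):
    """True where a pixel's right or lower neighbor has a different label."""
    h, w = len(labels), len(labels[0])
    mask = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            right = x + 1 < w and labels[y][x] != labels[y][x + 1]
            down = y + 1 < h and labels[y][x] != labels[y + 1][x]
            mask[y][x] = right or down
    return mask


def common_boundaries(samples):
    """Boundary pixels shared by all label maps (samples of the same frame)."""
    masks = [boundary_mask(s) for s in samples]
    h, w = len(masks[0]), len(masks[0][0])
    return [[all(m[y][x] for m in masks) for x in range(w)] for y in range(h)]


# Two hypothetical samples: both agree on the vertical boundary,
# only sample_a also has a horizontal one.
sample_a = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [2, 2, 1, 1],
]
sample_b = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
common = common_boundaries([sample_a, sample_b])
for row in common:
    print(row)
```

Only the vertical boundary survives the intersection, which is the behavior one would want if boundaries that appear in every sample are more likely to be object contours.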
Temporal Video Objectness
[...] Gemma Roig1 Xavier Boix1 Santiago Manen1 Luc Van Gool1,2
1 [...] Zürich, Switzerland   2 KU Leuven, Belgium
{[...]e,boxavier,gemmar,vangool}@vision.ee.ethz.ch
[Paper abstract, clipped in the source; legible fragments include "running at 30 fps on" and "orders of magnitude"]
Tubes of Bounding Boxes
Figure 1. Top: Video SEEDS provide temporal superpixel tubes. Bottom: Randomized SEEDS efficiently produce multiple label hypotheses per frame. Based on these, a Video Objectness mea[...]
Tubes of Bounding Boxes
Figure 10. Example of the highest ranked temporal window rendered at different frames in the video.
[...] video objectness score (3D edge) there is an improvement in accuracy because the score is updated over time.
Temporal Video Objectness (SEEDS)
[Figure: three plots of Detection Rate against the number of windows (log scale)]
(a) Objectness on still images: baselines (# windows, 10^0 to 10^3)
    gPb (auc: 0.473)
    Canny (auc: 0.408)
    Randomized SEEDS - 1 sample (auc: 0.428)
    Randomized SEEDS - 5 samples (auc: 0.475)
(b) Objectness on still images: s-o-a (# windows, 10^0 to 10^3)
    Objectness [1] (auc: 0.490)
    van de Sande [16]
    Feng et al. [7] (auc: 0.475)
    Rahtu et al. [12] (auc: 0.3680)
    Randomized SEEDS (auc: 0.475)
(c) Video Objectness: temporal window performance (# temporal windows (tubes), 10^0 to 10^4)
    3D edge - 5 samples (auc: 0.652)
    3D edge - 1 sample (auc: 0.523)
    only propagation - 5 samples (auc: 0.628)
    only propagation - 1 sample (auc: 0.309)
Figure 9. Comparison of the objectness measure with sampling superpixels on PASCAL VOC07 to (a) baselines, (b) s-o-a, and (c) evaluation of video objectness on the Chen dataset.
[...] temporal windows (tubes of bounding boxes) that contain object candidates. Finally, our experiments have shown that both the video superpixel and objectness algorithms match s-o-a offline methods in terms of accuracy, but at much higher speeds.
Thank You.