Online Video SEEDS
Dr. Michael Van den Bergh

SEEDS
Superpixels Extracted via Energy-Driven Sampling
ECCV 2012

What are superpixels?
• grouping pixels based on similarity (color)
• speeds up segmentation
• objects are made up of a small number of superpixels

Existing superpixel methods: gradual addition of cuts
• high accuracy
• very slow (contradictory)
• e.g. Entropy Rate Superpixels (Liu et al.)

Existing superpixel methods: growing from centers
• faster
• reduced accuracy (local minima + stray labels)
• still not fast enough
• e.g. SLIC Superpixels (Achanta et al.)

New approach: SEEDS Superpixels
[Figure panels: initialization, largest block update, medium block update, smallest block update, pixel-level update]
• initialize with rectangular boundaries
• gradually refine boundaries
• SEEDS: Superpixels Extracted via Energy-driven Sampling - ECCV 2012
sday, September 18, 12

Advantages of SEEDS
• faster than growing centers
  • only needs to evaluate at the boundaries
  • highly efficient evaluation using color histograms (1 memory lookup)
• accuracy matches or exceeds state-of-the-art
  • avoids local minima
  • optimization only evaluates valid partitionings

FH: 0.7035  0.7746  0.8537  0.9034

Achievable Segmentation Accuracy (ASA) by number of superpixels
ASA    SEEDS (15Hz)   SLIC (5Hz)   ERS (1Hz)   FH
50     0.9406         0.9064       0.932       0.9042
100    0.9579         0.935        0.951       0.9453
200    0.9676         0.9531       0.964       0.9598
400    0.9749         0.9676       0.972       0.9699

Advantages of SEEDS
[Plots: Boundary Recall, Undersegmentation Error, and Achievable Segmentation Accuracy versus number of superpixels (50, 100, 200, 400), comparing SEEDS (15Hz), SLIC (5Hz), Entropy Rate (1Hz), and Felzenszwalb and Huttenlocher]

Advantages of SEEDS
• faster than state-of-the-art
• accuracy matches or exceeds state-of-the-art
• control over run-time
  • whenever the algorithm is stopped, a valid partitioning is available
  • state-of-the-art accuracy at 30 Hz (single core)
• control over superpixel shape
  • one or more priors can be applied during boundary updating

(b) SEEDS with 3 × 3 smoothing prior
(b) SEEDS with compactness prior
(b) SEEDS with edge prior (snap to edges)
(b) SEEDS with combined prior (3 × 3 smoothing + compactness + snap to edges)

Advantages of SEEDS
• faster
• more accurate
• control over run-time
• control over shape
• temporal

Online Video SEEDS

Video SEEDS
[Figure panels: frame 0, frame 1, frame 2, current frame; steps: initialization, block-updates, propagation, pixel-updates]
Figure 2.
Overview of the Video SEEDS algorithm: the superpixel labels are propagated at an intermediary step of the block-level updates.

Video SEEDS
[Figure panels: t=0: initialization, layer 3 (blocks), layer 2 (blocks), layer 1 (pixels); t=1: initialization, layer 2 (blocks), layer 1 (pixels)]
Figure 4. Efficient updating at different block sizes.

Video SEEDS
• superpixels per frame
• rate (time)

… fulfill the constraint of number of superpixels per frame (Sec. 4.1). The candidates to become a new superpixel are blocks of pixels of the same size as the candidates to terminate, which are part of an existing superpixel. Let $B_n^t \subset A_n^{t:0}$ and $B_m^t \subset A_m^{t:0}$ be blocks of superpixels that are candidates to create a new superpixel. To evaluate which is the best candidate to form a new superpixel in the hill-climbing optimization, we use the following proposition.

Proposition 3. Let $|A_n^{t:0}| \approx |A_m^{t:0}|$, $|B_n^t| \ll |A_n^{t:0}|$, $|B_m^t| \ll |A_m^{t:0}|$, and let both $B_n^t$ and $B_m^t$ have histograms concentrated into one bin. Then,

$$H(s^{(n)}) \geq H(s^{(m)}) \iff \mathrm{int}(c_{B_n^t}, c_{A_n^{t:0} \setminus B_n^t}) \leq \mathrm{int}(c_{B_m^t}, c_{A_m^{t:0} \setminus B_m^t}). \qquad (4)$$

The proposition states that, under the assumption that the temporal superpixels are of similar size, the energy is maximized when creating a new superpixel from a block of pixels that intersects the least with its current superpixel. The decision whether a new superpixel should appear is governed by the superpixel rate parameter.

[Figure panels: Termination, Creation; axis: time; marker: current frame]
Figure 5. Termination and creation of superpixels.

Iterations. We can stop the optimization for a frame at any time and obtain a valid partition. We should expect a higher value of the energy function if we let the hill-climbing do more iterations, until convergence. We can fix the allowed time to run per frame, or set it on-the-fly, depending on the application. In principle, the algorithm can run for an in…

Video SEEDS
[Plots: 3D Undersegmentation Error, 3D Boundary Recall, and Explained Variation versus Number of Supervoxels (200 to 900), comparing GBH (t=∞), StreamGBH (t=10), StreamGBH (t=1), StreamGB (t=1), Video SEEDS (t=1), and Meanshift]
Figure 7. Comparison of our online video superpixels method to the state-of-the-art (s-o-a). For the first plot, lower is better, and for the second and third, higher is better.
• Chen Xiph.org benchmark
6.1.
Evaluation of Online Video SEEDS
We report results of the online video superpixels on the Chen Xiph.org benchmark [3] using the metrics proposed by [16]. The videos contain moving objects and are recorded with an uncontrolled camera. We use the standard metrics for evaluating temporal superpixels. The 3D …
• t=∞ means the entire video is analyzed
• t=1 means it is online (not streaming)
• we are at 30Hz, they are at 0.25 Hz

Objectness Measure for Temporal Windows. We define a temporal window as a sequence of temporally connected bounding boxes, one per frame, which aim to surround an object in video. It can be thought of as a rectangular-shaped tube in the time axis (illustrated in Fig. 1 bottom). The video is divided into overlapping shots of a predefined length, and for each shot all temporal windows are consid…

SEEDS in OpenCV

Randomized SEEDS
(b) SEEDS with 3 × 3 smoothing prior
(b) SEEDS with compactness prior
(b) SEEDS with edge prior (snap to edges)
(b) SEEDS with combined prior (3 × 3 smoothing + compactness + snap to edges)
Randomness Injection

… noise to the evaluation of the exchanges of pixels in the hill-climbing, i.e. in Eq. (2). This is, …
Objectness Measure for Still Images. We use O to represent the intersection of several superpixel samples of ran…

[Figure panels: Randomized SEEDS labels, multiple SEEDS samples, objectness score]
Figure 6. Different samples of randomized SEEDS segmentations of the same frame and with the same accuracy are combined. In the randomized SEEDS, we show the average of the different samples. The objectness score is computed as the sum of the distances to the common superpixel boundaries.
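The combination of samples described in the Figure 6 caption can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: label maps are plain nested lists, a pixel counts as a boundary when its right or lower neighbour carries a different label, the "average" is the per-pixel boundary frequency, and the "common" boundaries are those present in every sample. The names `boundary_map` and `combine_samples` are hypothetical, and the distance-based objectness score itself is not computed here.

```python
from typing import List, Tuple

LabelMap = List[List[int]]


def boundary_map(labels: LabelMap) -> List[List[int]]:
    """Mark pixels whose right or lower neighbour has a different label."""
    h, w = len(labels), len(labels[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            right = x + 1 < w and labels[y][x] != labels[y][x + 1]
            below = y + 1 < h and labels[y][x] != labels[y + 1][x]
            out[y][x] = 1 if (right or below) else 0
    return out


def combine_samples(samples: List[LabelMap]) -> Tuple[List[List[float]], List[List[int]]]:
    """Return (average boundary map, common boundary map) over all samples."""
    maps = [boundary_map(s) for s in samples]
    h, w, n = len(maps[0]), len(maps[0][0]), len(maps)
    # Per-pixel fraction of samples that place a boundary here.
    average = [[sum(m[y][x] for m in maps) / n for x in range(w)] for y in range(h)]
    # Assumption: "common" means the boundary appears in every sample.
    common = [[1 if all(m[y][x] for m in maps) else 0 for x in range(w)]
              for y in range(h)]
    return average, common


# Two toy "randomized" segmentations of the same 3 x 4 frame.
sample_a = [[0, 0, 1, 1],
            [0, 0, 1, 1],
            [0, 0, 1, 1]]
sample_b = [[0, 0, 1, 1],
            [0, 0, 0, 1],
            [0, 0, 1, 1]]

average, common = combine_samples([sample_a, sample_b])
for row in common:
    print(row)
```

In this toy case the two samples disagree only in the middle row, so the common boundary map keeps the vertical edge in the first and last rows, while the average map records a frequency of 0.5 where only one sample places a boundary.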
Temporal Video Objectness
Gemma Roig1  Xavier Boix1  Santiago Manen1  Luc Van Gool1,2
1 …Zürich, Switzerland   2 KU Leuven, Belgium
{…e,boxavier,gemmar,vangool}@vision.ee.ethz.ch
Tubes of Bounding Boxes
Figure 1. Top: Video SEEDS provide temporal superpixel tubes. Bottom: Randomized SEEDS efficiently produce multiple label hypotheses per frame. Based on these, a Video Objectness mea…
Figure 10. Example of the highest ranked temporal window rendered at different frames in the video.
… video objectness score (3D edge) there is an improvement in accuracy because the score is updated over time. Also, …

Temporal Video Objectness (SEEDS)
[Plots: Detection Rate versus # windows for (a) and (b); Detection Rate versus # temporal windows (tubes) for (c)]
(a) Objectness on still images: baselines
• gPb (auc: 0.473)
• Canny (auc: 0.408)
• Randomized SEEDS - 1 sample (auc: 0.428)
• Randomized SEEDS - 5 samples (auc: 0.475)
(b) Objectness on still images: s-o-a
• Objectness [1] (auc: 0.490)
• van de Sande [16]
• Feng et al. [7] (auc: 0.475)
• Rahtu et al. [12] (auc: 0.3680)
• Randomized SEEDS (auc: 0.475)
(c) Video Objectness: temporal window performance
• 3D edge - 5 samples (auc: 0.652)
• 3D edge - 1 sample (auc: 0.523)
• only propagation - 5 samples (auc: 0.628)
• only propagation - 1 sample (auc: 0.309)
Figure 9. Comparison of the objectness measure with sampling superpixels on PASCAL VOC07 to (a) baselines, (b) s-o-a, and (c) evaluation of video objectness on the Chen dataset.

… temporal windows (tubes of bounding boxes) that contain object candidates. Finally, our experiments have shown that both the video superpixel and objectness algorithms match s-o-a offline methods in terms of accuracy, but at much higher speeds.

Thank You.
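Backup: the superpixel-creation rule from the Video SEEDS slides (Proposition 3) can be illustrated with a short Python sketch. It picks the candidate block whose color histogram intersects the least with the remainder of its current superpixel. This is a sketch under assumptions, not the authors' code: histograms are raw bin counts in plain lists, the intersection is taken as the sum of bin-wise minima of the normalized histograms, and the names `normalize`, `intersection`, and `best_creation_candidate` are hypothetical. The toy data roughly follows the proposition's assumptions (superpixels of similar size, small blocks, block histograms concentrated in one bin).

```python
from typing import Dict, List, Sequence, Tuple

Histogram = Sequence[float]


def normalize(hist: Histogram) -> List[float]:
    """Scale bin counts so they sum to 1 (an all-zero histogram stays all-zero)."""
    total = float(sum(hist))
    if total == 0.0:
        return [0.0 for _ in hist]
    return [count / total for count in hist]


def intersection(hist_a: Histogram, hist_b: Histogram) -> float:
    """Histogram intersection: sum of bin-wise minima of the normalized histograms."""
    return sum(min(a, b) for a, b in zip(normalize(hist_a), normalize(hist_b)))


def best_creation_candidate(candidates: Dict[str, Tuple[Histogram, Histogram]]) -> str:
    """Return the key of the block that intersects least with the rest of its superpixel.

    Each value is (block_counts, superpixel_counts); the superpixel counts
    include the pixels of the block itself.
    """
    def score(name: str) -> float:
        block, parent = candidates[name]
        # Histogram of the superpixel without the block (A \ B), as bin counts.
        rest = [p - b for p, b in zip(parent, block)]
        return intersection(block, rest)

    return min(candidates, key=score)


# Toy example with 3 color bins per histogram.
candidates = {
    # Block is all in bin 2, while the rest of its superpixel is mostly bin 0.
    "block_in_n": ([0, 0, 10], [90, 5, 15]),
    # Block is all in bin 0, just like most of its superpixel.
    "block_in_m": ([10, 0, 0], [80, 10, 10]),
}
winner = best_creation_candidate(candidates)
print(winner)  # prints "block_in_n"
```

The block in superpixel n looks least like its surroundings, so under this rule it would be the better seed for a new superpixel. Whether a new superpixel is created at all is a separate decision governed by the superpixel rate parameter.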