P-2A-23. - ECCV 2016

Deep Learning of Local RGB-D Patches for
3D Object Detection and 6D Pose Estimation
Wadim Kehl¹, Fausto Milletari¹, Federico Tombari¹,², Slobodan Ilic¹,³, Nassir Navab¹
1. CAMP Chair, Technical University of Munich, Germany
2. Computer Vision Lab (DISI), University of Bologna, Italy
3. Siemens AG, Research & Technology Center, Munich, Germany
Overview
We present a method that matches local patches from an RGB-D scene against synthetic local object patches to allow for multi-instance and multi-object detection. To facilitate matching, we regress their descriptors using a CNN. We produce state-of-the-art results while being scalable and operating at around 2 Hz.
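The matching step can be illustrated with a minimal sketch. It assumes that each synthetic patch is stored as a (descriptor, vote) pair and that a scene descriptor retrieves its k nearest neighbours by Euclidean distance, keeping only those below a distance threshold. The function name `match_patch`, the parameters `k` and `max_distance`, and the toy 2-D descriptors are illustrative assumptions, not values from the poster.

```python
import math


def match_patch(scene_descriptor, codebook, k=3, max_distance=1.0):
    """Return the votes of the k nearest codebook entries within max_distance.

    codebook: list of (descriptor, vote) pairs (hypothetical layout).
    """
    # Rank all synthetic entries by Euclidean distance in descriptor space.
    ranked = sorted(
        ((math.dist(scene_descriptor, desc), vote) for desc, vote in codebook),
        key=lambda pair: pair[0],
    )
    # Keep at most k neighbours, and only those that are close enough.
    return [vote for dist, vote in ranked[:k] if dist <= max_distance]


# Toy codebook with 2-D descriptors; real CNN features would be much longer.
codebook = [
    ((0.0, 0.0), "vote A"),
    ((1.0, 0.0), "vote B"),
    ((5.0, 5.0), "vote C"),
]
matches = match_patch((0.1, 0.0), codebook, k=2, max_distance=0.5)
print(matches)  # ['vote A']
```

Tightening `max_distance` trades recall for fewer spurious votes; with a looser threshold of 1.0 the same query would also return "vote B".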
Network architecture & training
Here we employ a convolutional autoencoder (CAE) with which we minimize a reconstruction error ||x − y||. It has been trained on many real scene patches and can properly reconstruct inputs from unseen data.
Local patches & votes
From synthetic views we extract scale-invariant patches and store each together with a 6D vote and its CNN feature in a codebook.
patchsize = (m / z) · f
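As a worked example of the patch-size relation patchsize = (m / z) · f, the sketch below assumes that m is the metric patch extent in metres, z the depth of the patch centre in metres, and f the focal length in pixels. These interpretations, the function name `patch_size_pixels` and the numbers are assumptions for illustration. The point is that the pixel window shrinks with depth so that it always covers the same metric extent, which is what makes the patches scale-invariant.

```python
def patch_size_pixels(m, z, f):
    """Side length in pixels of a patch that covers m metres at depth z."""
    if z <= 0:
        raise ValueError("depth must be positive")
    return (m / z) * f


# Hypothetical values: 5 cm patches seen with a 500 px focal length.
for depth in (0.5, 1.0, 2.0):
    print(depth, patch_size_pixels(0.05, depth, 500.0))
```

Doubling the depth halves the pixel size, so a patch cropped at this size and resized to a fixed resolution looks the same regardless of the distance to the camera.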
[Figure: Constrained voting for different parameter values: 15, 7, 5]
[Figure: Training set examples]
[Figure: Reconstructions for different F]
Vote filtering
Results
Similar to […], we conduct a three-stage filtering where we run mean shift first on the image plane, then in translational space and finally in quaternion space to remove most spurious votes.
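The three-stage vote filtering can be sketched as successive mean-shift passes, each one keeping only the votes that support the dominant mode of the previous stage. This is a simplified illustration under several assumptions: a flat kernel, a single dominant mode per stage (a multi-instance detector would keep several modes), quaternions treated as 4-D vectors after a sign flip into one hemisphere, and hypothetical names and bandwidths (`dominant_mode`, `filter_votes`, `bw_pixels`, `bw_metres`, `bw_quat`). None of these details are taken from the poster.

```python
import math


def _mean(points):
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def dominant_mode(points, bandwidth, max_iter=50):
    """Indices of the points supporting the best-supported mean-shift mode
    (flat kernel, seeded once at every point)."""
    best = []
    for seed in points:
        mode = seed
        for _ in range(max_iter):
            near = [p for p in points if math.dist(p, mode) <= bandwidth]
            if not near:
                break
            shifted = _mean(near)
            converged = math.dist(shifted, mode) < 1e-9
            mode = shifted
            if converged:
                break
        support = [i for i, p in enumerate(points)
                   if math.dist(p, mode) <= bandwidth]
        if len(support) > len(best):
            best = support
    return best


def filter_votes(votes, bw_pixels, bw_metres, bw_quat):
    """votes: list of (uv, t, q) tuples holding the projected image position,
    the 3-D translation and a unit quaternion (hypothetical layout)."""
    keep = list(range(len(votes)))
    for stage, bandwidth in enumerate((bw_pixels, bw_metres, bw_quat)):
        if not keep:
            break
        points = [votes[i][stage] for i in keep]
        if stage == 2:
            # q and -q encode the same rotation: flip into one hemisphere.
            ref = points[0]
            points = [q if sum(a * b for a, b in zip(q, ref)) >= 0
                      else tuple(-a for a in q) for q in points]
        keep = [keep[j] for j in dominant_mode(points, bandwidth)]
    return [votes[i] for i in keep]


votes = [
    ((100.0, 100.0), (0.00, 0.00, 1.0), (1.0, 0.0, 0.0, 0.0)),
    ((101.0, 99.0), (0.01, 0.00, 1.0), (1.0, 0.0, 0.0, 0.0)),
    ((99.0, 101.0), (0.00, 0.01, 1.0), (-1.0, 0.0, 0.0, 0.0)),  # same rotation, opposite sign
    ((300.0, 50.0), (0.00, 0.00, 1.0), (1.0, 0.0, 0.0, 0.0)),   # wrong image location
    ((100.0, 100.0), (0.00, 0.00, 2.0), (1.0, 0.0, 0.0, 0.0)),  # wrong translation
    ((100.0, 101.0), (0.00, 0.00, 1.0), (0.0, 1.0, 0.0, 0.0)),  # wrong rotation
]
kept = filter_votes(votes, bw_pixels=5.0, bw_metres=0.05, bw_quat=0.2)
print(len(kept))  # 3
```

Running the cheap 2-D stage first discards most votes before the translation and rotation stages are evaluated, which is one plausible reason for ordering the stages this way.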
Sequence        LineMOD  LC-HF  Our approach
Camera (377)    0.589    0.394  0.383
Coffee (501)    0.942    0.891  0.972
Joystick (838)  0.846    0.549  0.892
Juice (556)     0.595    0.883  0.866
Milk (288)      0.558    0.397  0.463
Shampoo (604)   0.922    0.792  0.910
Total (3164)    0.740    0.651  0.747

Stage           Runtime (ms)
Scene sampling  0.03
CNN regression  477.3
k-NN & voting   61.4
Vote filtering  1.6
Verification    130.5
Total           670.8
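The runtime figures can be checked with a few lines of arithmetic. The sketch below only uses the per-stage timings listed in the runtime table; the variable names are illustrative. It confirms that the stages add up to the listed total, which corresponds to roughly 1.5 frames per second, and that CNN regression accounts for about 71% of the time.

```python
# Per-stage runtimes in milliseconds, copied from the runtime table.
stage_ms = {
    "Scene sampling": 0.03,
    "CNN regression": 477.3,
    "k-NN & voting": 61.4,
    "Vote filtering": 1.6,
    "Verification": 130.5,
}

total_ms = sum(stage_ms.values())
rate_hz = 1000.0 / total_ms
share = {name: ms / total_ms for name, ms in stage_ms.items()}

print(f"total: {total_ms:.1f} ms, about {rate_hz:.1f} Hz")
for name, fraction in share.items():
    print(f"{name}: {fraction:.1%}")
```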
Object   Us     [15]   [30]
ape      98.1   53.3   85.5
bvise    94.8   84.6   96.1
bowl     100    -      -
cam      93.4   64.0   71.8
can      82.6   51.2   70.9
cat      98.1   65.6   88.8
cup      99.9   -      -
driller  96.5   69.1   90.5
duck     97.9   58.0   90.7
eggb     100    86.0   74.0
glue     74.1   43.8   67.8
holep    97.9   51.6   87.5
iron     91.0   68.3   73.5
lamp     98.2   67.5   92.1
phone    84.9   56.3   72.8
References
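[1] Tejani, A., Tang, D., Kouskouridas, R., Kim, T.-K.: Latent-Class Hough Forests for 3D Object Detection and Pose Estimation. In: ECCV (2014)
[2] Hinterstoisser, S., Lepetit, V., Ilic, S., Holzer, S., Bradski, G., Konolige, K., Navab, N.: Model Based Training, Detection and Pose Estimation of Texture-Less 3D Objects in Heavily Cluttered Scenes. In: ACCV (2012)
[3] Aldoma, A., Tombari, F., Di Stefano, L., Vincze, M.: A Global Hypothesis Verification Framework for 3D Object Recognition in Clutter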