Deep Learning of Local RGB-D Patches for 3D Object Detection and 6D Pose Estimation

Wadim Kehl¹, Fausto Milletari¹, Federico Tombari¹,², Slobodan Ilic¹,³, Nassir Navab¹

1. CAMP Chair, Technical University of Munich, Germany
2. Computer Vision Lab (DISI), University of Bologna, Italy
3. Siemens AG, Research & Technology Center, Munich, Germany

Overview
We present a method that matches local patches from an RGB-D scene against synthetic local object patches, which allows multi-instance and multi-object detection. To facilitate matching, we regress patch descriptors with a CNN. The method produces state-of-the-art results while remaining scalable and running at around 2 Hz.

Network architecture & training
We employ a convolutional autoencoder (CAE) trained to minimize the reconstruction error ||x − y|| between an input patch x and its reconstruction y. It was trained on many real scene patches and properly reconstructs inputs from unseen data.
Figure: Training set examples.
Figure: Reconstructions for different F.

Local patches & votes
From synthetic views we extract scale-invariant patches and store each one, together with a 6D vote and its CNN feature, in a codebook.
Figure: Scale-invariant patch extraction, patch size = m·f / z.
Figure: Constrained voting for different values (15, 7, 5).

Vote filtering
Similar to […], we conduct a three-stage filtering in which we run mean shift first on the image plane, then in translational space, and finally in quaternion space to remove most spurious votes.

Results

Sequence          LineMOD   LC-HF   Our approach
Camera (377)      0.589     0.394   0.383
Coffee (501)      0.942     0.891   0.972
Joystick (838)    0.846     0.549   0.892
Juice (556)       0.595     0.883   0.866
Milk (288)        0.558     0.397   0.463
Shampoo (604)     0.922     0.792   0.910
Total (3164)      0.740     0.651   0.747

Stage             Runtime (ms)
Scene sampling    0.03
CNN regression    477.3
k-NN & voting     61.4
Vote filtering    1.6
Verification      130.5
Total             670.8

Object    Us      [15]    [30]
ape       98.1    53.3    85.5
bvise     94.8    84.6    96.1
bowl      100     -       -
cam       93.4    64.0    71.8
can       82.6    51.2    70.9
cat       98.1    65.6    88.8
cup       99.9    -       -
driller   96.5    69.1    90.5
duck      97.9    58.0    90.7
eggb      100     86.0    74.0
glue      74.1    43.8    67.8
holep     97.9    51.6    87.5
iron      91.0    68.3    73.5
lamp      98.2    67.5    92.1
phone     84.9    56.3    72.8

References
[1] Tejani, Tang, Kouskouridas, Kim: Latent-Class Hough Forests for 3D Object Detection and Pose Estimation.
[2] Hinterstoisser, Lepetit, Ilic, Holzer, Bradski, Konolige, Navab: Model Based Training, Detection and Pose Estimation of Texture-Less 3D Objects in Heavily Cluttered Scenes.
[3] Aldoma, Tombari, Di Stefano, Vincze: A Global Hypothesis Verification Framework for 3D Object Recognition in Clutter.
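A minimal sketch of the scale-invariant patch extraction described under Local patches & votes, in Python with NumPy. It assumes that the figure label means patch size = m·f / z, with m the metric patch size, f the focal length in pixels and z the depth at the patch centre. It also assumes that patches are resampled to a fixed resolution with nearest-neighbour sampling. The names `patch_size_px` and `extract_patch`, the 32-pixel output size and the channel layout (depth in metres in channel 3) are hypothetical choices, not the authors' implementation.

```python
import numpy as np


def patch_size_px(metric_size: float, focal_px: float, depth: float) -> int:
    """Side length in pixels of a square of side `metric_size` (metres) seen at
    `depth` (metres) by a pinhole camera with focal length `focal_px` (pixels).
    Assumed reading of the poster's 'patch size = m*f / z'."""
    if depth <= 0:
        raise ValueError("depth must be positive (missing depth cannot be sampled)")
    return max(1, int(round(metric_size * focal_px / depth)))


def extract_patch(rgbd: np.ndarray, u: int, v: int, metric_size: float,
                  focal_px: float, out_px: int = 32) -> np.ndarray:
    """Crop a patch of fixed metric size centred at pixel (u, v) of an
    H x W x 4 RGB-D image (hypothetical layout: depth in metres in channel 3)
    and resample it to out_px x out_px with nearest-neighbour sampling."""
    h, w = rgbd.shape[:2]
    s = patch_size_px(metric_size, focal_px, float(rgbd[v, u, 3]))
    half = s // 2
    # Clamp the crop window to the image borders.
    top, left = max(0, v - half), max(0, u - half)
    bottom, right = min(h, v - half + s), min(w, u - half + s)
    crop = rgbd[top:bottom, left:right]
    # Nearest-neighbour resampling to a fixed resolution.
    rows = np.linspace(0, crop.shape[0] - 1, out_px).round().astype(int)
    cols = np.linspace(0, crop.shape[1] - 1, out_px).round().astype(int)
    return crop[np.ix_(rows, cols)]
```

The pixel size shrinks as depth grows. Under this assumption, a patch at 2 m covers the same metric area as one at 1 m, and resampling gives both the same resolution. This is what makes their descriptors comparable across scales.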
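The codebook and the k-NN & voting stage can be illustrated with a brute-force sketch. Each synthetic patch stores three things: its descriptor, an offset from the patch centre to the object centre, and the object rotation as a unit quaternion. A scene patch retrieves its k nearest descriptors, and each neighbour casts a vote at scene patch centre + stored offset. The `Codebook` and `Vote` names and this vote parameterization (offset in camera coordinates, rotation copied unchanged) are assumptions made for illustration. A large codebook would call for an approximate nearest-neighbour index rather than the linear scan used here.

```python
from dataclasses import dataclass, field
from typing import List, NamedTuple

import numpy as np


class Vote(NamedTuple):
    object_id: int
    translation: np.ndarray  # hypothesised object centre (camera frame)
    rotation: np.ndarray     # unit quaternion (w, x, y, z), copied from the codebook
    distance: float          # descriptor distance of the match


@dataclass
class Codebook:
    """Hypothetical codebook: one entry per synthetic patch."""
    features: List[np.ndarray] = field(default_factory=list)
    offsets: List[np.ndarray] = field(default_factory=list)    # patch centre -> object centre
    rotations: List[np.ndarray] = field(default_factory=list)
    object_ids: List[int] = field(default_factory=list)

    def add(self, feature, offset, rotation, object_id: int) -> None:
        self.features.append(np.asarray(feature, dtype=float))
        self.offsets.append(np.asarray(offset, dtype=float))
        self.rotations.append(np.asarray(rotation, dtype=float))
        self.object_ids.append(object_id)

    def vote(self, scene_feature, scene_center, k: int) -> List[Vote]:
        """Brute-force k-NN on descriptors; each neighbour casts one 6D vote."""
        if not self.features:
            return []
        feats = np.stack(self.features)
        dists = np.linalg.norm(feats - np.asarray(scene_feature, dtype=float), axis=1)
        nearest = np.argsort(dists)[:k]
        center = np.asarray(scene_center, dtype=float)
        return [Vote(self.object_ids[i], center + self.offsets[i],
                     self.rotations[i], float(dists[i])) for i in nearest]
```

Usage: fill the codebook offline from the synthetic views. Then call `vote` once per sampled scene patch and pool all returned votes for the filtering stage.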
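The three-stage vote filtering can be sketched with flat-kernel mean shift. Votes are first clustered by their projection onto the image plane, the survivors by their 3D translation, and the remaining ones by their quaternion. This simplified version keeps only the strongest mode at each stage, so it handles a single instance. It treats quaternions as 4D vectors after flipping them to w ≥ 0 to respect q ≡ −q. The bandwidths and camera intrinsics (`focal`, `cx`, `cy`) are hypothetical parameters, and this is not the authors' exact procedure.

```python
import numpy as np


def mean_shift_mode(points: np.ndarray, bandwidth: float, iters: int = 50) -> np.ndarray:
    """Flat-kernel mean shift started from every point; returns the mode
    with the most points within `bandwidth`."""
    best_mode, best_count = points[0], -1
    for start in points:
        x = start.astype(float)
        for _ in range(iters):
            near = points[np.linalg.norm(points - x, axis=1) <= bandwidth]
            new_x = near.mean(axis=0)
            if np.allclose(new_x, x):
                break
            x = new_x
        count = int(np.sum(np.linalg.norm(points - x, axis=1) <= bandwidth))
        if count > best_count:
            best_mode, best_count = x, count
    return best_mode


def inliers(points: np.ndarray, bandwidth: float) -> np.ndarray:
    """Indices of the points within `bandwidth` of the strongest mode."""
    mode = mean_shift_mode(points, bandwidth)
    return np.flatnonzero(np.linalg.norm(points - mode, axis=1) <= bandwidth)


def filter_votes(t: np.ndarray, q: np.ndarray, focal: float, cx: float, cy: float,
                 bw_px: float = 20.0, bw_t: float = 0.05, bw_q: float = 0.2) -> np.ndarray:
    """t: (N, 3) vote translations in metres, q: (N, 4) unit quaternions (w, x, y, z).
    Returns the indices of the votes that survive all three stages."""
    idx = np.arange(len(t))
    # Stage 1: image plane (pinhole projection of the voted object centres).
    uv = np.stack([focal * t[:, 0] / t[:, 2] + cx,
                   focal * t[:, 1] / t[:, 2] + cy], axis=1)
    idx = idx[inliers(uv[idx], bw_px)]
    # Stage 2: translational space.
    idx = idx[inliers(t[idx], bw_t)]
    # Stage 3: quaternion space; q and -q encode the same rotation, so flip to
    # w >= 0 (a simplification that is discontinuous near w = 0).
    qc = np.where(q[:, :1] < 0, -q, q)
    idx = idx[inliers(qc[idx], bw_q)]
    return idx
```

With a tight cluster of consistent votes plus a few scattered outliers, only the clustered votes survive the image-plane stage. The later stages then confirm them in translation and rotation.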