Understanding Structure-from-Motion (SfM) Dr. Neil Smith, VCC 2014 1. Feature Detection 2. Feature Matching 3. Bundle Adjustment (Camera Pose and Intrinsics) 4. Multi-View Stereo Dense Reconstruction (Common Final Output) Following slides adapted from Computer Vision courses taught at Cornell (Snavely) and University of Washington (Szelinski). Richard Szeliski, Computer Vision: Algorithms and Applications, 2010 Feature Detection What Features might we choose to find correspondences? From Szleski 2010: fig. 4.2 Feature Detection Which of the three sample windows would have the most distinguishable features for matching? Which of the three sample windows would have the most distinguishable features? From Szleski 2010: fig. 4.2 Feature Detection Patch A: Texture-less, nothing to match Patch B: Gradient Change (Border between snow and sky) -Can only match the curves, ambiguity over which feature is which. Called the Aperture Problem Patch C: Multiple Gradients (“corner-like”). From Szleski 2010: fig. 4.3 Feature detection Local measure of feature uniqueness • How does the window change when you shift it? • Shifting the window in any direction causes a big change “flat” region: no change in all directions “edge”: no change along the edge direction “corner”: significant change in all directions Slide adapted from Darya Frolova, Denis Simakov, Weizmann Institute. Feature detection: the math Consider shifting the window W by (u,v) • how do the pixels in W change? • compare each pixel before and after by summing up the squared differences (SSD) • Compute an Auto-correlation surface • this defines an SSD “error” of E(u,v): I is Image W is Window x,y Window position (u; v) Displacement Vector W Feature detection: the math Consider shifting the window W by (u,v) • how do the pixels in W change? • compare each pixel before and after by summing up the squared differences (SSD) • Compute an Auto-correlation surface • this defines an SSD “error” of E(u,v): W The second moment matrix The surface E(u,v) is locally approximated by a quadratic form. Corner detection: the math xmin xmax Eigenvalues and eigenvectors of H • Define shift directions with the smallest and largest change in error • xmax = direction of largest increase in E • λmax = amount of increase in direction xmax • xmin = direction of smallest increase in E • λmin = amount of increase in direction xmin Feature detection summary Here’s what you do • • • • • Compute the gradient at each point in the image Create the H matrix from the entries in the gradient Compute the eigenvalues. Find points with large response (λ- > threshold) Choose those points where λ- is a local maximum as features The Harris operator λ- is a variant of the “Harris operator” for feature detection • • • • • • • The trace is the sum of the diagonals, i.e., trace(H) = h11 + h22 Very similar to λ- but less expensive (no square root) Called the “Harris Corner Detector” or “Harris Operator” Emphasizes the large gradients in all directions Remains orientation invariant Lots of other detectors, this is one of the most popular Major Problem: Very Sensitive to Changes in Image Scale Difference of Gaussian (DoG) • An approximation of the Laplacian of Gaussian which has a strong positive responses for dark blobs (extrema/features) • LOG/DOG Blob Detector: Finds points regarded as a bright (dark) blob if the value at this point is greater (smaller) than the value in all its 26 neighbours. Identifies Extrema. • The subtraction of one blurred version of an original image from another, less blurred version of the original • convolves the original grayscale images with Gaussian kernels having differing standard deviations Invariance Suppose we are comparing two images I1 and I2 • I2 may be a transformed version of I1 • What kinds of transformations are we likely to encounter in practice? We’d like to find the same features regardless of the transformation • This is called transformational invariance • Most feature methods are designed to be invariant to – Translation, 2D rotation, scale • They can usually also handle – Limited 3D rotations (SIFT works up to about 60 degrees) – Limited affine transformations (some are fully affine invariant) – Limited illumination/contrast changes How to achieve invariance Need both of the following: 1. Make sure your detector is invariant • Harris is invariant to translation and rotation • Scale is trickier – common approach is to detect features at many scales using a Gaussian pyramid (e.g., MOPS) – More sophisticated methods find “the best scale” to represent each feature (e.g., SIFT) 2. Design an invariant feature descriptor • A descriptor captures the information in a region around the detected feature point • The simplest descriptor: a square window of pixels – What’s this invariant to? • Let’s look at some better approaches… Scale Invariant Feature Transform Search for DoG features (Extrema) that remain stable across scales Downsample and compare From Lowe 2004 Scale Invariant Feature Transform Orientation Assignment: • • • • Take 16x16 square window around detected feature Compute edge orientation (angle of the gradient - 90°) for each pixel Throw out weak edges (threshold gradient magnitude) Create histogram of surviving edge orientations 2π 0 angle histogram Adapted from slide by David Lowe SIFT descriptor Full version • Divide the 16x16 window into a 4x4 grid of cells (2x2 case shown below) • Compute an orientation histogram for each cell • 16 cells * 8 orientations = 128 dimensional descriptor Adapted from slide by David Lowe Properties of SIFT Extraordinarily robust matching technique • Can handle changes in viewpoint – Up to about 60 degree out of plane rotation • Can handle significant changes in illumination – Sometimes even day vs. night (below) • Fast and efficient—can run in real time • Lots of code available – http://people.csail.mit.edu/albert/ladypack/wiki/index.php/Known_implementations_of_SIFT Feature descriptors We know how to detect good points Next question: How to match them? ? Feature descriptors We know how to detect good points Next question: How to match them? ? Lots of possibilities (this is a popular research area) • Simple option: match square windows around the point • State of the art approach: SIFT – David Lowe, UBC http://www.cs.ubc.ca/~lowe/keypoints/ Image Matching Feature matching Given a feature in I1, how to find the best match in I2? Need a Matching Strategy 1. Use Feature Description Vectors to measure distance between Descriptors 2. Set a threshold (maximum distance) to eliminate bad matches but not so low that you miss matches. Feature distance How to define the difference between two features f1, f2? • • Simple approach: L2 distance, ||f1 - f2 || can give good scores to ambiguous (incorrect) matches f1 f2 I1 I2 Feature distance How to define the difference between two features f1, f2? • Better approach: ratio distance = ||f1 - f2 || / || f1 - f2’ || • f2 is best SSD match to f1 in I2 • f2’ is 2nd best SSD match to f1 in I2 • gives large values for ambiguous matches f2' f1 I1 I2 f2 Evaluating the results How can we measure the performance of a feature matcher? 50 75 200 feature distance True/false positives 50 75 true match 200 false match feature distance Problem: High Threshold = False Positives (Incorect matches) Low Threshold = False Negatives (Miss matches) Solution: Model Performance of Matching at varying Thresholds using a Receiving operating characteristic curve (ROC) Evaluating the results How can we measure the performance of a feature matcher? ROC 1 curve (“Receiver Operator Characteristic”) 0.7 # true positives # correctly matched features (positives) “recall” true positive rate 0 0.1 false positive rate 1 # false positives # incorrectly matched features (negatives) 1 - “precision” Feature distance State-of-the-Art Techniques: 1. Matching Nearest Neighbor Features with Adaptable Threshold 2. Nearest Neighbor Distance Ratio -Compare NN distance to second NN 3. Verify Matches using Random Sampling i.e. RANSAC Feature distance From Szleski 2010: fig. 4.25 Bundle Adjustment X4 X1 X3 X2 X5 minimize g(R, T, X) X7 non-linear least squar X6 p1,1 p1,2 Camera 1 R1,t1 p1,3 Camera 3 Camera 2 R2,t2 R3,t3 Bundle Adjustment • What makes this non-linear minimization hard? • • • • many more parameters: potentially slow poorer conditioning (high correlation) potentially lots of outliers gauge (coordinate) freedom CSE 576, Spring 2008 Structure from Motion 31 Bundle Adjustment • Minimize sum of squared reprojection errors: predicted observed image locationimage location indicator variable: is point i visible in image j ? • Minimizing this function is called bundle adjustment – Optimized using non-linear least squares, Levenberg-Marquardt e.g. Multi-view Stereo Input: calibrated images and sparse points from SfM Output: Dense Point Cloud or Mesh Figures by Carlos Hernandez Stereo: basic idea error depth Multi-view Stereo Different Approaches: Figures from Furukawa and Ponce 2007 A. Patch Based: PMVS B. Merge Stereo Depth Maps (SURE) -Advantages – Can work with full resolution image. Not dependent on original sparse point cloud. -Disadvantage –When can’t achieve good alignment Merging Depth Maps: Temple Model input image 16 (ring) 317 images 47 images (hemisphere) ground truth model Goesele, Curless, Seitz, 2006 Michael Goesele Multi-view Stereo Figures from Furukawa and Ponce 2007 Patch Based Method: Expand from sparse points taking a patch of pixels (3D rectangle, with normal). Then Filter From Reference Image R, compute score of photometric discrepancy…want only small g(p) Multi-view Stereo The SfM Project 1. Capture a part of KAUST campus using provided cameras 2. Reconstruct Sparse and Dense Point Cloud using VisualSfM and PMVS 3. Report on: -Timings -Images Used in Reconstruction of Model/Models -Things you could do to improve results or timings -What was reconstructed well, what was not reconstructed 4. Extra Credit: Compare results with Kinect Scan Calibrate your Camera using Matlab Toolbox and Calibration Chart then rerun SfM and compare results Try a wide angle 8mm lens instead of standard lens/GoPro/Video Try a different MVS method (SURE or CMPMVS) Try Creating a 360 Panorama SfM in Practice Proper acquisition plays a major role in the success of the Bundle Adjustment SfM in Practice 1. Sharp in focus images 2. Not too dark or shadowed images 3. Need to have features in image (White walls-bad) 4. Fixed Focal Length use motion rather than zoom 5. 60-80% Overlap between images 6. Wide Field of View helps 7. Good Initial Pair 8. No sharp changes in Angle 9. Increase capture in difficult areas 10. Close the loop VisualSfM What is VisualSfM - Open Source SfM GUI developed by Changchang Wu to run: 1. SiftGPU – A gpu accelerated SIFT detector and matcher. - >10X -Runs inside OpenGL/CUDA shaders (GPU memory) -Processes Gaussian Pyramids and DOG in parallel 2. Multicore Bundler – A GPU accelerated Bundler -30X Faster, One thread per camera 3. Many advanced and experimental features Installing/Using VisualSfM 1. Transfer Package to computer 2. Run from Command Line 3. Use sample pictures for running reconstruction
© Copyright 2025 ExpyDoc