Interactive Feature Tracking using K-D Trees and Dynamic Programming
Aeron M. Buchanan, [email protected] (Engineering, University of Oxford, UK)
Andrew W. Fitzgibbon, [email protected] (Microsoft Research, Cambridge, UK)
Overview

A faster than real-time tracker:
• with a human in the loop, which is modelled by the system,
• a comprehensive track definition allowing optimal interactive tracking with massive automatic preprocessing.

Pipeline: automatic preprocessing (detection: filter jet creation, K-D tree searches) followed by interactive optimization (dynamic programming) and refinement (sub-pixel precision), running at >100fps.

Interactivity

Extremely simple and intuitive user-guided process:
1. A single user click generates a track through the entire sequence at over 100fps.
2. Further clicks lock down keyframe features and improve the track.

Comparison

Prior State of the Art:
• including Mean-shift, KLT, WSL, etc.,
• do not model the possibility of failure,
• fail even on ‘easy’ sequences, due to
  - “Track-to-first”: appearance change is not modelled,
  - “Track-to-previous”: the track is lost through drift,
• require the user to continually monitor and restart,
• time taken to track a sequence: >100% of sequence length.
Instead, User-in-Loop Batch Tracking:
• the user specifies keyframe features, one at a time,
• the optimal track over the entire sequence is generated,
• optimizing over
  - smoothness of trajectory,
  - similarity to keyframe features,
  - smoothness of appearance,
  - summed occlusion penalties,
• tuning parameters can be updated interactively,
• time taken to track a sequence: 40% of sequence length.

Our approach overcomes the traditional conflict between:
1. Small search region: tracking is fast per frame, but every frame must be used and recovery after occlusions is impossible.
2. Large search region: tracking is slow per frame (though subsampling allows speed-ups); recovery after occlusions is possible, but mismatches lead to bad tracks.
Details

Input: image stream, I(x, y, t); keyframe features, Q = {(xi, yi, ti, qi)}; tuning parameters, µ1, µ2.
Output: optimal track, T = {(xt, yt, pt)}, minimizing:

E(T) = \sum_t \Big[ \underbrace{\mu_1 \lVert x_t - x_{t-1} \rVert}_{\text{trajectory smoothness}} + \underbrace{\mu_2 \lVert p_t - p_{t-1} \rVert}_{\text{appearance smoothness}} + \underbrace{\min_i \lVert q_i - p_t \rVert}_{\text{keyframe similarity}} \Big]

with pt being the vectorized appearance of the feature at xt in frame t and qi the appearance of the i-th keyframe feature. Note: the norms are robust, to handle occlusions and multichannel data.
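For concreteness, the three terms of E(T) can be evaluated directly. The following is a minimal Python sketch under stated assumptions: a truncated-L2 norm stands in for the unspecified robust norm, the occlusion penalty is omitted, and array shapes are illustrative.

# Minimal sketch of the track energy E(T); the truncated-L2 "robust" norm is an
# assumption, and occlusion handling is left out.
import numpy as np

def robust_norm(v, cap=1.0):
    """Truncated L2 norm: the ordinary norm, clipped at `cap` (assumed form)."""
    return min(np.linalg.norm(v), cap)

def track_energy(xs, ps, keyframes, mu1, mu2, cap=1.0):
    """xs: (T,2) positions; ps: (T,D) appearance vectors; keyframes: (K,D)."""
    E = 0.0
    for t in range(1, len(xs)):
        E += mu1 * robust_norm(xs[t] - xs[t - 1], cap)   # trajectory smoothness
        E += mu2 * robust_norm(ps[t] - ps[t - 1], cap)   # appearance smoothness
    for t in range(len(xs)):
        E += min(robust_norm(q - ps[t], cap) for q in keyframes)  # keyframe similarity
    return E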
Immense speed-ups, taking this optimization to interactive rates, are achieved by separating detection from tracking:
• Fast detection is facilitated by converting each image to a 16D point cloud, using convolution with a filter bank (obtained via PCA of natural image patches), and storing it in a K-D tree (sketched below).
• The optimal track is efficiently extracted using dynamic programming (the Viterbi algorithm) from the small set of candidate matches for each frame selected by the detection stage (see the sketch after the note on occlusions).
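A minimal sketch of the detection stage, assuming NumPy/SciPy: random 7×7 filters stand in for the PCA-derived filter bank, the per-pixel 16-D filter responses form the point cloud, and a single cKDTree query returns candidate matches. All names, sizes and the number of candidates are illustrative.

# Sketch of the detection stage: per-pixel 16-D filter-jet descriptors stored in a
# K-D tree, queried with a keyframe descriptor to obtain candidate matches.
# Random filters are stand-ins for the PCA filter bank.
import numpy as np
from scipy.ndimage import convolve
from scipy.spatial import cKDTree

def filter_jet(image, filters):
    """Convolve the image with each filter; return an (H*W, len(filters)) descriptor array."""
    responses = [convolve(image, f, mode='nearest') for f in filters]
    return np.stack(responses, axis=-1).reshape(-1, len(filters))

H, W = 240, 320
image = np.random.rand(H, W)                          # placeholder frame
filters = [np.random.randn(7, 7) for _ in range(16)]  # stand-in for the PCA filter bank

descriptors = filter_jet(image, filters)
tree = cKDTree(descriptors)                           # one tree per frame, built in preprocessing

keyframe_descriptor = descriptors[H // 2 * W + W // 2]   # e.g. descriptor at a clicked pixel
dists, idx = tree.query(keyframe_descriptor, k=20)       # candidate matches for this frame
candidate_xy = np.column_stack((idx % W, idx // W))      # back to (x, y) pixel coordinates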
Occlusions are incorporated by treating them as a special type of feature.
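The dynamic-programming extraction with an occlusion state could look like the sketch below: a Viterbi pass over per-frame candidate lists, where an extra "occluded" state carries a fixed penalty. The flat occlusion costs and the omission of the appearance-smoothness term are simplifications for brevity, not the authors' exact formulation; each candidate's appearance cost plays the keyframe-similarity role.

# Viterbi-style dynamic programming over per-frame candidate matches, with an
# extra "occluded" state so the track can skip frames where the feature is hidden.
import numpy as np

def viterbi_track(candidates, mu1, occlusion_penalty, occlusion_motion_cost=0.0):
    """candidates: list over frames of lists of (xy: np.array(2), appearance_cost: float)."""
    # Append the occlusion state (position None) to every frame.
    states = [frame + [(None, occlusion_penalty)] for frame in candidates]

    cost = [np.array([app for _, app in states[0]])]
    back = []
    for t in range(1, len(states)):
        prev_cost, cur = cost[-1], states[t]
        c = np.empty(len(cur))
        b = np.empty(len(cur), dtype=int)
        for j, (xy, app) in enumerate(cur):
            # Transition cost: motion term between real positions, or a flat
            # cost into/out of the occlusion state.
            trans = np.array([
                mu1 * np.linalg.norm(xy - pxy) if xy is not None and pxy is not None
                else occlusion_motion_cost
                for pxy, _ in states[t - 1]
            ])
            total = prev_cost + trans + app
            b[j] = int(np.argmin(total))
            c[j] = total[b[j]]
        cost.append(c)
        back.append(b)

    # Backtrack the minimum-cost path; None marks occluded frames.
    j = int(np.argmin(cost[-1]))
    path = [states[-1][j][0]]
    for t in range(len(states) - 2, -1, -1):
        j = back[t][j]
        path.append(states[t][j][0])
    return path[::-1]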
Refinement automatically updates the track trajectory to sub-pixel precision. This is required because the image locations of
the data points in the K-D trees are necessarily quantized.
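The refinement method is not detailed here; one common option, assumed in this sketch, is a parabolic fit of the matching cost around the quantized optimum, applied per axis.

# Assumed refinement: fit a parabola to the cost at offsets -1, 0, +1 around the
# integer-pixel optimum and move the track point to the parabola's vertex.
import numpy as np

def subpixel_offset(c_minus, c_zero, c_plus):
    """Vertex of the parabola through the three costs, clamped to [-0.5, 0.5]."""
    denom = c_minus - 2.0 * c_zero + c_plus
    if denom <= 0:                       # flat or non-convex: keep the integer location
        return 0.0
    return float(np.clip(0.5 * (c_minus - c_plus) / denom, -0.5, 0.5))

# Example: costs sampled at x-1, x, x+1 around the quantized track position.
dx = subpixel_offset(2.0, 1.0, 1.5)      # refined x = x_int + dx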
Results

Example sequences tracked at 320×240 (200fps) and 360×270 (135fps).
Comparison against Mean-shift and Colour KLT, shown at selected frames.