home.hiroshima

Toru Tamaki, Miho Abe, Bisser Raytchev, Kazufumi Kaneda
19th Nov. 2010
Contribution of this talk
 Fast GPU implementations of registration
algorithms for 3D point sets.
 Softassign [Gold et al., 1998]
 EM-ICP [Granger et al., 2002]
 (Weighted) Horn’s method [Horn, 1987]
 So, what is “registartion” ?
What is “Registration” or “Alignment” ?
Image registration
A set of images
3D registration algorithm
 Input
 Two point sets: 𝑋 and 𝑌
 Output
 Rotation matrix 𝑅
 Translation vector 𝒕
𝑅 and 𝒕
X
Y
Algorithms for registration
Horn’s method
ICP (Iterative closest point)
• Corresponding point sets are given.
• Estimate R and t.
•
•
•
•
Unknown correspondence.
Fast, standard.
Easily fail due to local minimum.
A lot of variants follow.
Registration
algorithm
Softassign
EM-ICP
• Unknown correspondence.
• Robust.
• Very slow because of iterations.
• Unknown correspondence.
• Robust.
• Very slow because of iterations.
Algorithms for registration
Horn’s method
ICP (Iterative closest point)
• Corresponding point sets are given.
• Estimate R and t.
•
•
•
•
Unknown correspondence.
Fast, standard.
Easily fail due to local minimum.
A lot of variants follow.
Registration
algorithm
Softassign
EM-ICP
• Unknown correspondence.
• Robust.
• Very slow because of iterations.
• Unknown correspondence.
• Robust.
• Very slow because of iterations.
Horn’s method: correspondence is known.
𝒙1𝑇
𝑇
𝒙2
⋮
𝒚1𝑇
𝑇
𝒚2
⋮
𝑋
𝑌
Known correspondence
X
Y
𝒙1 = (𝑥1𝑥 , 𝑥1𝑦 , 𝑥1𝑧 )𝑇
Unknown correspondence
?
X
Y
Horn’s method: correspondence is known.
𝒙1
𝑇
𝒙2
⋮
𝑇
3
𝒚1
𝑇
𝒚2
⋮
𝑇
𝑆
𝑋
𝑌
𝑋
=
𝑋
𝑌
𝑌
4
𝐾 =
𝑋−𝒙
𝒙
𝑌−𝒚
𝒚
Compute centers
1
Centering
2
5
Computer 1st Eigenvector 𝒒
: quaternion 𝑞
Convert 𝑞 to 𝑅
𝒕 = 𝒙 − 𝑅𝒚
Algorithms for registration
Horn’s method
ICP (Iterative closest point)
• Corresponding point sets are given.
• Estimate R and t.
•
•
•
•
Unknown correspondence.
Fast, standard.
Easily fail due to local minimum.
A lot of variants follow.
Registration
algorithm
Softassign
EM-ICP
• Unknown correspondence.
• Robust.
• Very slow because of iterations.
• Unknown correspondence.
• Robust.
• Very slow because of iterations.
ICP: correspondence is unknown.
𝒙1𝑇
𝑇
𝒙2
⋮
𝒚1𝑇
𝑇
𝒚2
⋮
𝒚𝑖
𝑋
𝑌
𝑌∗
𝒚𝑖
Find closest
(nearest) point
to 𝒙1 in 𝑌
Put the point
to 𝑌 ∗
ICP: correspondence is unknown.
𝒙1𝑇
𝑇
𝒙2
⋮
𝒚1𝑇
𝑇
𝒚2
⋮
𝒚𝑖
𝒚𝑗
⋮
Horn’s method
with 𝑋 and 𝑌 ∗
𝒚𝑗
𝑋
𝑌
𝑌∗
Estimate 𝑅 and 𝒕
Find closest
(nearest) point
to 𝒙1 in 𝑌
Put the point
to 𝑌 ∗
ICP: correspondence is unknown.
𝒙1𝑇
𝑇
𝒙2
⋮
𝒚1𝑇
𝑇
𝒚2
⋮
𝒚𝑖
𝒚𝑗
Horn’s method
with 𝑋 and 𝑌 ∗
⋮
𝒚𝑗
𝑋
𝑅𝑌 + 𝒕
𝑌∗
Estimate 𝑅 and 𝒕
Repeat
Find closest
(nearest) point
to 𝒙1 in 𝑌
Put the point
to 𝑌 ∗
Fast, but easy to fail
due to hard correspondence.
Algorithms for registration
Horn’s method
ICP (Iterative closest point)
• Corresponding point sets are given.
• Estimate R and t.
•
•
•
•
Unknown correspondence.
Fast, standard.
Easily fail due to local minimum.
A lot of variants follow.
Registration
algorithm
Softassign
EM-ICP
• Unknown correspondence.
• Robust.
• Very slow because of iterations.
• Unknown correspondence.
• Robust.
• Very slow because of iterations.
Softassign: soft correspondence.
𝒚𝑗
𝑌
Weighted
Horn’s method
with 𝑋 and 𝑌
𝑀
𝒙𝑖
𝑋
𝑚𝑖𝑗
Each row and column
should be normalized to 1
by Shinkhorn iterations
𝑚𝑖𝑗 = ||𝒙𝑖 − 𝑅𝒚𝑗 + 𝒕 ||
Estimate 𝑅 and 𝒕
Repeat
Shinkhorn iterations
𝑀
𝑚𝑖𝑗
sum up to 1
sum up to 1
sum up to 1
⋮
Each row and column
should be normalized to 1
by Shinkhorn iterations
sum up to 1
Repeat row and column
normalization until converge.
Shinkhorn iterations
𝑀
𝑚𝑖𝑗
Each row and column
should be normalized to 1
by Shinkhorn iterations
sum up to 1
⋮
sum up to 1
sum up to 1
sum up to 1
Repeat row and column
normalization until converge.
Shinkhorn.GPU (row normalization)
Using sgemv of CUBLAS
𝑀
1
1
1
⋮
Each row and column
should be normalized to 1
by Shinkhorn iterations
𝑹𝑀
𝟏
Shinkhorn.GPU (row normalization)
Using CUDA kernel
𝑀
Row-wise
division
Each row and column
should be normalized to 1
by Shinkhorn iterations
𝑹𝑀
Column normalization is done
by the same way.
Weighted Horn’s method
Normal version
Weighted version
3
3
𝑆
=
𝑋
𝑌
𝑆
=
𝑋
𝑀
Using CUBLAS sgemv twice.
𝑌
Centering.GPU (weighted version)
CUDA
kernel
∗
∗
𝑋
1
1
1
⋮
CUBLAS
sasum
Weighted
sum
𝑋
𝟏
𝑹𝑀
CUBLAS
sasum
∗
𝑹𝑀
Weighted
center
𝒙
𝟏
Same as for 𝒚
Pipeline of Softassing.GPU
𝑌
𝑌
𝑋
𝑋
𝑀
Compute 𝑀 with CUDA kernel
Shinkhorn.GPU
𝒙 ,𝒚
𝑅 and 𝒕
Solve
Eigenvalue
problem
𝐾
𝑆
𝑆=
Centering.GPU
𝑋
𝑀
𝑌
Weighted Horn’s method
Algorithms for registration
Horn’s method
ICP (Iterative closest point)
• Corresponding point sets are given.
• Estimate R and t.
•
•
•
•
Unknown correspondence.
Fast, standard.
Easily fail due to local minimum.
A lot of variants follow.
Registration
algorithm
Softassign
EM-ICP
• Unknown correspondence.
• Robust.
• Very slow because of iterations.
• Unknown correspondence.
• Robust.
• Very slow because of iterations.
EM-ICP: soft correspondence.
Pseudo correspondence 𝑋′
𝒙𝑗
𝑋
Weighted
Horn’s method
with 𝑋′ and 𝑌
𝐴
𝒚𝑖
𝑑𝑖𝑗
𝒙′𝑖
𝑋′
𝑌
Estimate 𝑅 and 𝒕
Each row is normalized once.
𝑑𝑖𝑗 = ||𝒙𝑗 − 𝑅𝒚𝑖 + 𝒕 ||
Repeat
Row normalization on GPU
Using sgemv of CUBLAS
𝐴
1
1
1
⋮
Not normalized yet.
𝑪
𝟏
Row normalization on GPU
Using CUDA kernel
𝐴
Row-wise
division
+
sqrt
Now normalized.
𝑪
Computing weights
Using sgemv of CUBLAS
𝐴
1
1
1
⋮
Now normalized.
𝝀
𝟏
Pseudo correspondence
CUBLAS
sgemv
𝐴
𝑋
Now normalized.
Centering: same with Softassing.GPU
𝑋′
Weighted Horn’s method
Weighted version (not efficient)
3
𝜆1
𝑆
=
0
𝜆2
𝑋′
0
⋱
𝑌
Weighted version (2 steps)
3
CUDA
kernel
CUBLAS
sgemm
∗
𝑋′
𝑆
𝝀
𝑋
=
𝑋’
𝑌
Pipeline of EM-ICP.GPU
𝑌
𝑌
𝑋
𝑋
𝐴
Compute 𝐴with CUDA kernel
Row normalization on GPU
𝒙 ,𝒚
𝑅 and 𝒕
Solve
Eigenvalue
problem
𝐾
𝑆
∗
𝑋′
Centering.GPU
𝑆
=
𝑋′
𝑌
𝑋
𝝀
2 step weighted Horn’s method
Computing time over different number of points
GPU: GeForce8800GT
CPU: Intel Core2 Quad + OpenMP (4 cores)
Successfully aligned
5000 points less than
7 seconds.
Slightly fast,
but failed.
Summary
 Implemented 3D registration algorithms on a
GPU are:
 Softassign,
 EM-ICP,
 Weighted Horn’s method.
 EM-ICP.GPU is
 able to align 5000 points within 7 seconds,
 60 times faster than EM-ICP.CPU,
 more robust than ICP.CPU.
 Code, binary, and movies are available at:

http://home.hiroshima-u.ac.jp/tamaki/study/cuda_softassign_emicp/
Limitations
 Number of points
 Should be less than 8000 for GeForce8800GT with
512MB memory.
 More memory, more points.
 Stopping condition
 requires to store whole matrix 𝑀 or 𝐴, and
compare with previous ones: inefficient.
 Hence, currently, number of iterations is fixed.
Algorithms for registration
Horn’s method
ICP (Iterative closest point)
• Corresponding point sets are given.
• Estimate R and t.
•
•
•
•
Unknown correspondence.
Fast, standard.
Easily fail due to local minimum.
A lot of variants follow.
Registration
algorithm
Softassign
EM-ICP
• Unknown correspondence.
• Robust.
• Very slow because of iterations.
• Unknown correspondence.
• Robust.
• Very slow because of iterations.