Systematic Evaluation Methodology for Fingerprint-Image Quality Assessment Techniques
J. Hämmerle-Uhl, M. Pober, A. Uhl
Presenter: Christof Kauba
Department of Computer Sciences, University of Salzburg
May 30th, 2014
Overview
1 Introduction & Motivation
2 The StirMark Toolkit
3 Experiments
4 Conclusion
Introduction & Motivation
Fingerprint recognition robustness and sample quality
Sample image quality impacts recognition accuracy
Influencing factors: skin conditions (e.g., dryness, moisture, dirt, cuts and bruises, ageing), sensor type and condition (e.g., dirt, noise, size), user cooperation, crime scene preservation
Benchmarking frameworks: FVC, BioSecure, SFinGe, StirMark [1]
Essential: indices to reliably determine fingerprint image quality under various circumstances, but how can such proposed indices be assessed?
Here: we propose a standardised tool to assess the correlation between image quality indices and recognition accuracy on fingerprint image data representing a wide range of real-world acquisition conditions and quality levels (simulated by StirMark [1]).
[1] J. Hämmerle-Uhl, M. Pober, A. Uhl, "Towards Standardised Fingerprint Matching Robustness Assessment: The StirMark Toolkit – Cross-Database Comparisons with Minutiae-based Matching", In Proceedings of the 1st ACM Workshop on Information Hiding and Multimedia Security (IH&MMSec'13), pp. 111-116, Montpellier, France, June 17-19, 2013.
The StirMark Toolkit
Basic idea: the StirMark Benchmark is a generic benchmark test for evaluating the robustness of digital image watermarking methods; many of the systematic errors it introduces into the data can be interpreted as specific fingerprint acquisition conditions (a minimal code sketch follows the list below).
Additive noise: actual dust on the fingerprint contact area, sensor noise, or a grainy surface from which a latent fingerprint has been lifted
Median Cut filtering: simulates blur in fingerprint images, e.g. smudgy fingerprints (too much moisture)
Remove lines and columns: sensor errors; sweep sensors in particular can be affected by line removal (real examples are shown below)
Rotation: an omnipresent challenge in fingerprint recognition
Stretching: a higher force applied when pressing the finger onto the contact area; in forensics, a soft or flexible surface
Shearing: simulates a setting where the applied pressing force is not perpendicular to the contact area
Random distortions: model e.g. unevenly distributed pressure or a latent fingerprint scanned from an uneven surface
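The following is a minimal Python sketch of how such acquisition conditions can be approximated at image level. It uses only Pillow and NumPy and is not the StirMark tool itself; file names and parameter values are placeholders.

import numpy as np
from PIL import Image, ImageFilter

def add_noise(img, sigma=15.0):
    # Additive Gaussian noise: stands in for sensor noise or dust.
    arr = np.asarray(img, dtype=np.float32)
    noisy = arr + np.random.normal(0.0, sigma, arr.shape)
    return Image.fromarray(np.clip(noisy, 0, 255).astype(np.uint8))

def median_blur(img, size=9):
    # Median filtering: stands in for smudgy / blurred prints.
    return img.filter(ImageFilter.MedianFilter(size=size))

def remove_rows(img, step=40):
    # Drop every step-th row: stands in for sweep-sensor line errors.
    arr = np.asarray(img)
    keep = [r for r in range(arr.shape[0]) if r % step != 0]
    return Image.fromarray(arr[keep, :])

def rotate(img, angle=-15):
    # In-plane rotation of the probe image.
    return img.rotate(angle, resample=Image.BILINEAR, fillcolor=255)

if __name__ == "__main__":
    probe = Image.open("fingerprint.png").convert("L")  # placeholder path
    for name, fn in [("noise", add_noise), ("median", median_blur),
                     ("rml", remove_rows), ("rot", rotate)]:
        fn(probe).save("probe_" + name + ".png")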
StirMark Examples
(a) Noise (level 15)
(b) Median Cut Filter (size 9)
(c) Rotation of −15°
(d) Stretching (d = 1.350)
(e) Shearing (b = c = 0.20)
(f) Random Dist. (lrnddist 4.2)
Real Examples
(a) Missing lines
(b) Warping effects
Figure: Examples of distortions from actual acquisition problems.
Fingerprint quality indices assessment strategy
Generate a large corpus of test data exhibiting various quality
levels
Start with an available dataset of the target sensor (or multiple
sensors of interest)
Apply StirMark image manipulations of different types in various
intensities
Conduct fingerprint recognition experiments on these data with
different types of feature extraction / matching algorithms
Correlate recognition result parameters (e.g. EER) to the quality index values at the different manipulation intensities
In this manner, specific strengths and weaknesses of candidate quality indices can be identified (a sketch of the data-collection step follows this list)
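A minimal sketch of the data-collection step described above; compute_eer() and quality_index() are hypothetical placeholders for a concrete matcher protocol and a concrete quality measure, not functions from any named library.

def collect_lists(gallery, probes_by_manipulation, compute_eer, quality_index):
    # probes_by_manipulation: {(manipulation_type, strength): [probe images]}
    # Returns, per manipulation type, two lists ordered by increasing strength:
    # the EER values and the corresponding mean quality-index values.
    eer_lists, quality_lists = {}, {}
    for (mtype, strength) in sorted(probes_by_manipulation):
        probes = probes_by_manipulation[(mtype, strength)]
        eer = compute_eer(gallery, probes)          # recognition accuracy
        mean_q = sum(quality_index(p) for p in probes) / len(probes)
        eer_lists.setdefault(mtype, []).append(eer)
        quality_lists.setdefault(mtype, []).append(mean_q)
    return eer_lists, quality_lists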
Types of Fingerprint Matchers
Correlation-based matchers: use the fingerprint images in their entirety; the global ridge and furrow structure of a fingerprint is decisive. Images are correlated at different rotational and translational alignments.
Ridge feature-based matchers: deal with the overall ridge and furrow structure of the fingerprint, yet in a localised manner. Characteristics like local ridge orientation or local ridge frequency are used.
Minutiae-based matchers: the set of minutiae within each fingerprint is determined and stored as a list, each minutia being represented (at least) by its location and direction. The matching process then basically tries to establish an optimal alignment between the minutiae sets (a toy pairing sketch follows below).
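A toy illustration of the minutiae-pairing idea, assuming the two minutiae sets are already aligned (real minutiae matchers such as bozorth3 additionally search over rotations and translations); the thresholds are arbitrary placeholder values.

import math

def pair_minutiae(probe, gallery, dist_thresh=15.0, angle_thresh=math.radians(20)):
    # Each minutia is a tuple (x, y, theta). Returns a match score in [0, 1]:
    # the fraction of minutiae that can be greedily paired within the thresholds.
    used, pairs = set(), 0
    for (px, py, pt) in probe:
        best, best_d = None, dist_thresh
        for j, (gx, gy, gt) in enumerate(gallery):
            if j in used:
                continue
            d = math.hypot(px - gx, py - gy)
            dtheta = abs((pt - gt + math.pi) % (2 * math.pi) - math.pi)
            if d < best_d and dtheta < angle_thresh:
                best, best_d = j, d
        if best is not None:
            used.add(best)
            pairs += 1
    return pairs / max(len(probe), len(gallery), 1)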
Experimental Settings: Data & Recognition Software
DB1 - DB3 from the FVC2004 data are used in a verification setting, employing the evaluation protocol as specified by FVC
Fingerprint matching software
3 minutiae-based schemes [2], including the "NIST Biometric Image Software" (NBIS) package (mindtct and bozorth3). Due to the similarity of the results, only the average EER is given.
Phase-only correlation (POC), custom implementation. First, the normalised cross spectrum (or cross-phase spectrum) of the DFTs of the two images is computed. The POC is then obtained by taking the inverse DFT of the normalised cross spectrum (a minimal sketch follows the reference below).
Fingercode (FC), custom implementation. A Gabor filter bank is applied to the orientation image, resulting in a "Ridge Feature Map" which is translationally and rotationally aligned for matching.
EER is used to assess recognition accuracy
[2] J. Hämmerle-Uhl, M. Pober, A. Uhl, "Towards Standardised Fingerprint Matching Robustness Assessment: The StirMark Toolkit – Cross-Feature Type Comparisons", In Proceedings of the 14th IFIP International Conference on Communications and Multimedia Security (CMS'13), pp. 3-17, Magdeburg, Germany, Springer Lecture Notes in Computer Science, vol. 8099, Sept. 25-26, 2013.
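A minimal NumPy sketch of the phase-only correlation score described above; the actual custom implementation may differ in details such as windowing, band-limiting, or how the correlation peak is evaluated.

import numpy as np

def poc_score(img1, img2, eps=1e-12):
    # img1, img2: equally sized 2-D grayscale arrays. Returns the POC peak value.
    F1 = np.fft.fft2(img1)
    F2 = np.fft.fft2(img2)
    cross = F1 * np.conj(F2)
    r = cross / (np.abs(cross) + eps)   # normalised cross-phase spectrum
    poc = np.real(np.fft.ifft2(r))      # phase-only correlation surface
    return float(poc.max())             # high peak = similar images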
Experimental Settings: StirMark Settings & Quality Indices
StirMark settings: we use 12 types of StirMark manipulations, each with 3 to 10 intensity levels, for a total of 91 different manipulations per image.
Quality Indices
nfiq: part of the NIST Biometric Image Software (NBIS) package. It relies on information produced by the minutiae detector mindtct and basically conducts a neural net-based classification of the minutiae vector into one of five overall fingerprint quality classes.
SpatDom: based on determining the block-wise clarity of ridge and furrow orientation in the spatial domain. Per foreground block, the gradient vectors of the gray-level intensities are used to build a covariance matrix, based on which a normalised coherence measure is computed; this is then combined over all foreground blocks in a weighted sum (a simplified sketch follows below).
FreqDom: image quality is defined in terms of energy concentration within a specific frequency band containing the ridge frequency, measured in terms of entropy.
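A simplified sketch in the spirit of the SpatDom coherence computation, not the exact index: foreground segmentation and the quality-driven block weighting are omitted here, and every block counts equally.

import numpy as np

def block_coherence(block):
    # Normalised coherence of one block, derived from the 2x2 covariance
    # (structure tensor) matrix of the gray-level gradient vectors.
    gy, gx = np.gradient(block.astype(np.float64))
    j11, j22, j12 = np.sum(gx * gx), np.sum(gy * gy), np.sum(gx * gy)
    trace = j11 + j22
    if trace == 0:
        return 0.0
    spread = np.sqrt((j11 - j22) ** 2 + 4 * j12 ** 2)   # eigenvalue spread
    return spread / trace    # 1 = clearly oriented ridges, 0 = isotropic block

def spatdom_like_quality(img, block=32):
    # img: 2-D grayscale array; returns the unweighted mean block coherence.
    h, w = img.shape
    scores = [block_coherence(img[r:r + block, c:c + block])
              for r in range(0, h - block + 1, block)
              for c in range(0, w - block + 1, block)]
    return float(np.mean(scores)) if scores else 0.0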
Experimental Settings: Determining Quality Indices’
Reliability
With respect to matching StirMark manipulated fingerprint images,
the enrolled gallery image is taken in its original
(non-manipulated) version, while the probe image involved in
matching is the manipulated one.
Thus, for each matching scheme, for each manipulation type, and
for each manipulation strength we are able to compute the
corresponding EER, overall 91 EER values per database per
matching scheme, arranged in lists of increasing manipulation
strength.
We generate quality indices for all involved images and generate
mean values per manipulation type and intensity level. From these
data, we generate lists of quality mean values, ordered by
manipulation intensity as well.
Finally, we compute Spearman’s rank order correlation per
manipulation type, per quality measure, per database, and per
fingerprint matcher.
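A sketch of this final correlation step using SciPy; the two lists below are purely illustrative placeholders (not measured results), ordered by increasing manipulation strength as described above.

from scipy.stats import spearmanr

eers          = [0.021, 0.034, 0.052, 0.081, 0.129]   # placeholder EER values
quality_means = [4.1, 3.8, 3.2, 2.7, 2.1]             # placeholder quality means

rho, _ = spearmanr(eers, quality_means)
print("Spearman rank correlation: %.2f" % rho)         # -1.00 for these lists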
Result Selection Strategy
We screen the results for quality indices where correlation is close to
zero (low) for some data sets or matcher types, while it is clearly
positive or negative (high) for other conditions.
The existence of such results indicates that quality indices need to be assessed for each data set and matching scheme separately, while their absence allows drawing data- and matcher-independent conclusions.
Results: Minutiae-based schemes vs. FC on DB1
[Bar charts of the mean Spearman correlation coefficient per StirMark manipulation type for the quality indices nfiq, SpatDom and FreqDom.]
(a) Minutiae-based matchers
(b) Fingercode (FC)
Figure: Mean correlation for DB1.
−→ significant inter-matcher variability is observed for nfiq (for rml)
Results: FC vs. POC on DB1
[Bar charts of the mean Spearman correlation coefficient per StirMark manipulation type for nfiq, SpatDom and FreqDom.]
(a) Fingercode (FC)
(b) Phase-only Correlation (POC)
Figure: Mean correlation for DB1.
−→ FreqDom exhibits inter-matcher variability (for noise)
Results: Minutiae-based vs. FC on DB2
[Bar charts of the mean Spearman correlation coefficient per StirMark manipulation type for nfiq, SpatDom and FreqDom.]
(a) Minutiae-based
(b) FC
Figure: Mean correlation for DB2.
−→ obvious inter-matcher variability for nfiq
−→ comparing minutiae-based and FC schemes on DB1 & DB2, significant inter-data variability is found for all three matching types
Results: Minutiae-based vs. FC on DB3
[Bar charts of the mean Spearman correlation coefficient per StirMark manipulation type for nfiq, SpatDom and FreqDom.]
(a) Minutiae-based
(b) FC
Figure: Mean correlation for DB3.
−→ inter-data set variability: nfiq results correspond better to DB2; the behaviour on DB1 is much different.
−→ inter-matcher variability: for SpatDom, low correlation values are seen with FC matching for medianCut, convMean and convGauss, while high values are obtained for the minutiae-based matchers.
Conclusion
Lessons learnt:
We observe significant inter-data set variability as well as cases of
significant inter-matcher variability with respect to rank order
correlation between recognition accuracy and fingerprint image
quality.
Such effects are found for all three example fingerprint quality
indices.
−→ we show that fingerprint image quality indices need to be assessed separately for different data sets (or sensors) and different fingerprint matching schemes, which can be done efficiently with the proposed methodology
−→ a general-purpose fingerprint quality index that is reliably applicable to any sensor / matching scheme combination does not seem to exist so far
Thank you for your attention!
Questions?