Unlocking Smart Phone through Handwaving Biometrics

JOURNAL OF LATEX CLASS FILES, VOL. 6, NO. 1, JULY 2014
Lei Yang, Member, IEEE, Yi Guo, Student Member, IEEE, Xuan Ding, Member, IEEE,
Jinsong Han, Member, IEEE, Yunhao Liu, Senior Member, IEEE,
Cheng Wang, Member, IEEE, Changwei Hu
Abstract—Screen locking/unlocking is important for modern smart phones: it prevents unintentional operations and protects
personal data. Once the phone is locked, the user must take a specific action or provide some secret information to unlock
it. The existing unlocking approaches can be categorized into four groups: motion, password, pattern, and fingerprint. These
approaches do not serve smart phones well because of weak security, high cost, or poor usability. We collect 200 users’
handwaving actions with their smart phones and make an appealing observation: the waving pattern of a person is unique,
stable, and distinguishable. In this paper, we propose OpenSesame, which employs users’ waving patterns for locking/unlocking.
The key feature of our system lies in using four fine-grained statistical features of handwaving to verify users. Moreover, we utilize
a support vector machine (SVM) for accurate and fast classification. Our technique is robust and compatible across different brands
of smart phones, without the need for any specialized hardware. Results from comprehensive experiments show that the mean false
positive rate of OpenSesame is around 15%, while the false negative rate is lower than 8%.
Index Terms—Smart Phone, Security, Privacy, Authentication, Accelerometer
1 INTRODUCTION
Nowadays, smart phones are no longer devices used only
to call or text others. They have become prevalent and
far more powerful. Acting as pocket PCs, smart phones
can handle complicated tasks such as sending/receiving
e-mails, shopping, and mobile payment. A screen locker is
a fundamental utility that prevents unauthorized use of
a smart phone. For example, Apple iPhones and Android
phones lock themselves automatically after being idle for
a short time. This protects the privacy of users and
prevents unintentional operations.
Classical screen lockers were proposed long ago.
(1) The most widely used one is Slide-to-Unlock. The user
unlocks the phone by sliding a finger along a defined
trajectory. This method is too simple to protect the
user’s privacy. (2) The PIN, the most common method on
traditional digital devices, is widely adopted for
unlocking smart phones. However, due to the relatively
small screen and frequent unlocking requests, it is inconvenient
to set a long and complex PIN on a phone. For example,
only four digits are allowed for the unlocking PIN
in the iPhone’s default setting. Such a short and simple PIN can
often be easily guessed [1], [2]. (3) The user can pre-define a
graphical password, such as connecting at least 4 circles shown on
the screen. Similar to the PIN, simple graphical passwords
are easy to peek at and guess, while complex patterns
may confuse the user and cause inconvenience.
• Lei Yang, Xuan Ding, Yunhao Liu are with the School of Software,
Tsinghua University, Beijing, China. E-mail: {young, xuan}@tagsys.org,
[email protected].
• Yi Guo is with Department of Computer Science and Engineering, Hong
Kong University of Science and Technology, Hong Kong. Email: yi@tagsys.
• Jinsong Han is with the Department of Computer Science and Technology,
Xi’an Jiaotong University, Shaanxi, China. Email: [email protected].
• Cheng Wang is with the Department of Computer Science and Technology,
Tongji University, and with the Key Laboratory of Embedded System
and Service Computing, Ministry of Education, Shanghai, China. E-mail:
[email protected].
• Changwei Hu is with Shaanxi Broadcast & TV Network Intermediary
(Group) Co., LTD.
To enhance security as well as flexibility, many
biometric authentication methods [3], [4] have been introduced
for screen lockers. The secrets of these methods cannot be easily
spied on and reproduced, since they identify the user based on
his/her natural features. Biometric measures are grouped into
two main categories [5]: physiological biometrics and behavior
biometrics.
Physiological biometrics leverage the physiological features
of human beings to identify the user, including recognition
of face [6], voice [7], fingerprint [8], ear [6], and so on.
However, we find three problems. (i) The performance of these
solutions is heavily influenced by external factors. For example,
face acquisition by the camera is severely affected by
illumination, which makes it fail to identify the user at night.
Similarly, it is hard to distinguish the voice from
ambient interference in an extremely noisy environment, such as
a subway or restaurant. An authentication method must
adapt to all kinds of conditions. (ii) Unlocking is a
very frequent operation, so its energy consumption should
be carefully considered. It is well known that the camera is
one of the notorious energy killers [9] in smart phones.
(iii) Current mainstream smart phones lack the required
hardware, such as a fingerprint scanner.
Behavior biometrics, the other category of biometric measures, identify
the user based on behavior features, such as gesture [10], [11], typing behavior
[6], [12], mouse movement [13], tapping behavior [14], or
gait [15]. However, these methods either cannot be adopted in
smart phones or are not suitable for unlocking them. For
example, in order to recognize the gait pattern, the user has
to walk first for the smart phone to figure out whether he/she
is valid [15]. It also appears odd and inconvenient for a user to
perform the motion of answering a phone call just
to check his/her emails [11]. (More discussion and comparison
with these works are presented in Section 5.)
In this paper, we observe that different users wave their
smart phones in distinct ways. For example, some
people tend to wave their smart phones vigorously, while
others wave gently. This makes the waving
speed and frequency differ considerably among users. The
waving range and the way of twisting the wrist also differ
from user to user. These patterns derive from the user’s physical
features and habits. For example, users with longer arms
wave faster and wider than those with shorter arms. Some
people are accustomed to ending their waving action with a wrist
twist, while others like to begin with one.
Moreover, gender, age, and occupation also greatly affect
the features of waving actions. On the other hand, we also
observe that when a user waves his smart phone, he always
waves in a similar way. This is because, without intentional
changes, a person tends to follow his habits once
they are developed.
Based on the above observations, we propose a handwaving
biometric-based approach, called OpenSesame, to unlock the
smart phone. Compared with the existing methods, our approach
has two major advantages. (i) It is difficult to
forge. With our approach, the authentication process is based
on the features of the user’s habits and motions, which are
much harder for unauthorized users to obtain. Even if an
unauthorized user happens to peek at the user’s waving
action, it is still difficult to imitate, since there are
many distinct but invisible differences between waving actions.
For example, users apply different strength when waving or
twisting. (ii) It is simple and convenient. Our approach
frees users from remembering a large number of passwords or
complicated patterns for unlocking their phones while preserving
security. All the user needs to do is naturally wave
the phone for 1 or 2 seconds.
However, it is challenging to mine the unique patterns
from the user’s handwaving action. First, we should choose
appropriate sensors to monitor the user’s waving action. The
sensor should be low-cost, easy to deploy widely,
and energy-efficient. After careful comparison, we use the
3-axis accelerometer. Second, the main difficulty is to extract
stable but unique features from the user’s waving action. We
project the collected waving data into A-Space and then utilize
four waving functions for feature extraction. Furthermore, we
employ the support vector machine (SVM) for accurate and
fast classification. We develop a prototype of the handwaving
unlocking system, termed OpenSesame, and implement it on
three mainstream smart phones. We collect the handwaving
traces from 200 volunteers using our app. After comprehensive
experiments and tests, the results demonstrate that OpenSesame
can accurately verify users via their handwaving with
low latency.
The remainder of the paper is structured as follows. We
characterize the handwaving with a large number of real users’
traces in Section 2. The system design is presented in Section
3 and the experiment results are evaluated in Section 4. We
introduce the related work in Section 5. Finally, Section 6
concludes the paper.
2 WAVING CHARACTERIZATION
In this section, we introduce the sensor used for waving
sensing, real trace collection, and analysis on the data.
2.1 Waving Sensing
To characterize the user’s waving actions precisely, it is
necessary to select appropriate sensors. With the tremendous growth of
MEMS technology, many powerful sensors are built into
today’s smart phones, such as the camera, microphone, proximity
sensor, accelerometer, gyroscope, and magnetic sensor.
In our system, the selected sensor should be able to depict
the handwaving. In addition, it should be energy-efficient,
stable, cheap, and compatible for wide deployment in most
kinds of smart phones. Obviously, the first three sensors cannot
capture the phone’s motion. The gyroscope is attractive
because it is designed for measuring or maintaining orientation,
based on the principles of angular momentum. Unfortunately,
this kind of sensor is not standard equipment in most
smart phones due to its high price. The magnetic sensor is
usually used as a compass, but it tends to be interfered with by
metal objects in special environments, such as inside a car
or subway.
In our approach, we finally select the 3-axis accelerometer
as our feature detecting sensor. The accelerometer allows
smart phones to detect the motion performed on them. The
accelerometer in smart phones measures the acceleration of
the phone relative to freefall. A value of 1 indicates that the
phone is experiencing 1 g of acceleration. 1 g is the
acceleration due to gravity, which the phone experiences when
it is stationary. The accelerometer measures the acceleration
of the phone along three different axes: X, Y, and Z. Examples
of the collected data are shown in Figure 1.
2.2 Data Collection
To investigate the uniqueness of handwaving, we collect
waving action data from 200 distinct smart phone users.
Each user is asked to shake the smart phone for
more than 10 seconds and to repeat this three times. Note that
there is no special restriction on the user’s waving actions. He
can shake the smart phone arbitrarily in each trial. Indeed, we
aim to gain insight into the handwaving action itself rather than
a particular motion pattern.
The data is collected in two sampling modes: fast and
normal. In the fast mode, the accelerometer samples
every 10 to 20 milliseconds, depending on the rate at which the
acceleration value changes. The traces of 100 users are collected
in this mode. In the normal mode, the sampling interval is 200
milliseconds, and the traces of 100 users are sampled. Clearly, the
normal sampling mode of the accelerometer loses some data
but saves energy. We will compare these two modes in the
evaluation section.
All the raw waving actions are recorded as a sequence of
tuples represented as (xt , yt , zt ), where x, y, and z denote the
acceleration along the x-axis, y-axis, and z-axis respectively,
and t denotes the time. As a result, we collect 600 files in total,
containing 389,373 raw tuples.
2.3 Waving Measurement
To show the uniqueness of handwaving intuitively, we display
four users’ traces in Figure 1. The traces are illustrated in a
3-D acceleration space, A-Space for short, where the raw tuples
(xt , yt , zt ) are connected in time order. Figure 1(a)
and Figure 1(b) are both generated from two trials of the same volunteer.
We can see that the two shapes are very similar. The last three
figures come from three other persons. Figure 1(c) is plotted
as a circle, Figure 1(d) resembles a river, while the shape in
Figure 1(e) is a crescent. From these figures, we
can observe that the handwaving biometrics are unique to a
certain user. A given user produces very similar shapes in
different trials. Moreover, different users have clearly different
results.
The challenge here is how to measure the handwaving
represented in A-Space. We should transform the A-Space
representation into a parameterized and comparable feature
vector. For this purpose, we define the waving function to
measure the global geometric properties of the waving shapes,
which is formally given by:
$$f = S(A) \qquad (1)$$
where $A = \{(x_{t_0}, y_{t_0}, z_{t_0}), (x_{t_1}, y_{t_1}, z_{t_1}), \cdots, (x_{t_n}, y_{t_n}, z_{t_n})\}$.
A is the set of raw waving tuples collected between $t_0$ and $t_n$.
The waving function considers A as input and outputs a feature
vector f . A good waving function should have the following
properties:
• Efficiency: Since the waving function will be executed on the
smart phone, it should be simple enough to run fast and
efficiently.
• Invariance: Most of the time, the smart phone is used
in mobile environments. The waving function should be
insensitive to changes in the position or direction of the smart
phone.
• Robustness: Although the waving data generated by one
person is similar, it always contains noise, and the
sampling time is variable. Hence, the waving function
should be robust to noise and other small perturbations in the
waving data.
To meet the above requirements, we propose four
waving functions, S1 , S2 , S3 , S4 , as follows:
• S1 : The centroid C is computed first, and then two
random points A and B in the A-Space are chosen. The
angle ∠ACB formed by these three points is measured.
The selection of random points is repeated N times.
As a result, N angles are output, and the corresponding PDF
of these angles is reported as the feature vector.
• S2 : This waving function is similar to S1 . The difference
is that all three points are randomly selected.
One of the three angles formed by these
points is recorded. As a result, the corresponding PDF
of these angles is given as the feature vector.
• S3 : While both S1 and S2 concentrate on the angle
parameter, the other two waving functions, S3 and S4 ,
focus on the distances among the points. S3 randomly
selects N points and calculates the Euclidean distance
between the centroid and each of these N points. Finally,
the corresponding PDF of the distances is calculated as the
feature vector.
• S4 : S4 randomly selects N pairs of points and calculates
the Euclidean distance of each pair. The PDF of these
distances is the feature vector.
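As an illustration only, the four waving functions described above can be sketched in Python as follows. This is a minimal sketch and not the OpenSesame implementation: the function names, the default of 20 bins, the choice of the second sampled point as the angle vertex in S2, and the normalization of distances by their sample maximum are all assumptions made for the example.

```python
import math
import random


def centroid(points):
    """Mean of a list of (x, y, z) tuples."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(3))


def distance(a, b):
    """Euclidean distance between two (x, y, z) tuples."""
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in range(3)))


def angle(a, c, b):
    """Angle ACB in degrees, measured at vertex c (0.0 if a or b equals c)."""
    u = [a[k] - c[k] for k in range(3)]
    v = [b[k] - c[k] for k in range(3)]
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    cos = sum(x * y for x, y in zip(u, v)) / (nu * nv)
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))


def pdf(values, bins, upper):
    """Discretized PDF: fraction of values in each of `bins` equal bins on [0, upper]."""
    if upper <= 0.0:
        upper = 1.0  # assumption: degenerate input falls into the first bin
    counts = [0] * bins
    for v in values:
        counts[min(int(v / upper * bins), bins - 1)] += 1
    return [c / len(values) for c in counts]


def s1(points, n, bins=20, rng=random):
    """S1: angles ACB with C the centroid and A, B random points."""
    c = centroid(points)
    vals = [angle(rng.choice(points), c, rng.choice(points)) for _ in range(n)]
    return pdf(vals, bins, 180.0)


def s2(points, n, bins=20, rng=random):
    """S2: one angle among three random points (assumed vertex: the second)."""
    vals = []
    for _ in range(n):
        a, c, b = rng.sample(points, 3)
        vals.append(angle(a, c, b))
    return pdf(vals, bins, 180.0)


def s3(points, n, bins=20, rng=random):
    """S3: distances between the centroid and random points."""
    c = centroid(points)
    vals = [distance(c, rng.choice(points)) for _ in range(n)]
    return pdf(vals, bins, max(vals))  # assumption: scale by the largest sample


def s4(points, n, bins=20, rng=random):
    """S4: distances between random pairs of points."""
    vals = [distance(*rng.sample(points, 2)) for _ in range(n)]
    return pdf(vals, bins, max(vals))  # assumption: scale by the largest sample
```

Each function returns a list of bin probabilities that sums to 1. Passing a seeded `random.Random` instance as `rng` makes a run repeatable.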
The results of the above four waving functions are demonstrated
in Figure 2, with the four users’ waving data shown
in Figure 1 as input. From the figures, we can see that all four waving
functions behave well. These four waving functions are chosen
mostly for their simplicity and invariance. First, they
are fast to compute, easy to understand, and simple to turn into
distributions. Despite their simplicity, we find that these
general-purpose waving functions are fairly distinguishable. Second, they
are robust, because the probability that noisy points are selected is
very low and hence their performance is not affected.
Third, these four functions are invariant to rotation and translation,
because neither angles nor distances depend on the direction
or position of the waving.
2.4 Waving Matching
Keep in mind that our goal is to determine whether the
screen should be unlocked, given a waving action
and the pre-defined one. We formalize the similarity of two
waving actions by means of the distance between their feature
vectors. Since the feature vectors are PDFs,
we divide the whole range of the PDF into discrete bins and
calculate the average value for each bin. As a result,
the discretized PDF, f = [p1 , p2 , · · · , pn ], is considered the
feature vector, where pi denotes the probability of falling into
the ith bin.
Definition 1 (Similarity): Given two arbitrary feature vectors,
$f_1 = [p_1, p_2, \cdots, p_n]$ and $f_2 = [q_1, q_2, \cdots, q_n]$, their
similarity is defined as
$$D(f_1, f_2) = \sum_{i=1}^{n} |p_i - q_i|$$
where $D(f_1, f_2) \in [0, 2]$. A smaller similarity value means the two
features are closer, and vice versa.
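The similarity of Definition 1 is the sum of absolute differences between two discretized PDFs. A minimal sketch follows; the function name `similarity` is an assumption made for the example.

```python
def similarity(f1, f2):
    """Sum of absolute bin differences between two discretized PDFs."""
    if len(f1) != len(f2):
        raise ValueError("feature vectors must have the same number of bins")
    return sum(abs(p - q) for p, q in zip(f1, f2))


print(similarity([0.5, 0.5], [0.5, 0.5]))    # 0.0: identical features
print(similarity([0.5, 0.5], [0.25, 0.75]))  # 0.5
print(similarity([1.0, 0.0], [0.0, 1.0]))    # 2.0: no overlapping bins
```

The last call shows why the upper bound is 2: when the two PDFs share no bins, each one contributes its full probability mass of 1 to the sum.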
We select 6 users randomly, and each user conducts 3 trials.
The waving function S4 is employed here to measure the handwaving.
As a result, there are 3×6 = 18 features after applying
S4 . Their similarities are plotted as a visualized similarity matrix
in Figure 3. In the matrix, the darkness of each element (i, j)
reflects the computed similarity
between the ith and jth features. Darker elements represent
better matches (smaller distances), while lighter elements indicate
worse matches. The matrix is symmetric.
Fig. 1: 3-D Acceleration Space. Panels: (a) User 1 (Test 1), (b) User 1 (Test 2), (c) User 2, (d) User 3, (e) User 4. Axes: X, Y, and Z acceleration (m/s^2).
Fig. 2: Probability Density Functions with Variant Waving Functions. Panels: (a) S1, (b) S2, (c) S3, (d) S4. Axes: Interval versus PDF (%); curves for User 1 Test 1, User 1 Test 2, User 2, User 3, and User 4.
Fig. 3: Similarity Matrix for 6 distinct users with 3 trials
Definition 2 (Self-similarity): The self-similarity is the distance
between two feature vectors extracted from two handwaving actions
generated by the same user. In particular, if the two features
come from the same waving instance, they are equal and their
similarity equals zero.
Obviously, the elements lying on the diagonal are the
darkest because their distances equal 0. For each user, there
are 3 × 3 = 9 elements for self-similarity measurement. From
the figure, we can see that the self-similarity always maintains
an acceptable darkness and is clearly distinguishable from other
users’ features.
3 OPENSESAME
In this section, we present our unlocking method for smart
phone called OpenSesame.
3.1 Overview
OpenSesame consists of five components: sensing, filter,
fetcher, classifier, and matcher.
• Sensing: This component simply records the user’s
handwaving action data.
• Filter: In practice, we find that there always exist some
silent periods in which no waving, or only very low-level sensing
data, is detected. For better feature extraction, we use the filter
component to remove the silent periods.
• Fetcher: The filtered raw tuples are fed into the fetcher
component, in which the four waving functions are applied
to extract the waving features.
• Classifier: To discriminate between authorized and unauthorized
users, the Support Vector Machine (SVM) is
employed in our system for classification.
• Matcher: In the last component, the extracted feature is
used to determine whether it matches the pre-defined one.
3.2 Filter
Figure 4(a) shows 12 seconds of data acquisition. We find three
special periods in which the waving values are too low to be
detected. We regard such periods as silent periods. A
silent period may exist in the initial stage before the user
shakes his smart phone, or in the final stage after the user
stops waving. It may also be observed in the
intermediate stage when the user pauses unexpectedly.
Since silent periods seriously affect the accuracy of
OpenSesame, we must filter out the data captured during these
periods. The ith raw tuple with composed acceleration value
Ai is removed if it satisfies the inequality:
$$\sum_{x=i-b}^{i+b} \left( A_x - \frac{\sum_{y=i-b}^{i+b} A_y}{2b+1} \right)^2 < \alpha, \qquad (2)$$
where b is called the tolerant static period, representing the
number of acceleration points on each side used to determine the
stability of an acceleration point, and α is the threshold for
filtering the silent points. The data filtered by our algorithm is
illustrated in Figure 4(b).
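The silent-period test of Equation (2) can be sketched as below. This is an illustrative sketch under stated assumptions: the composed acceleration value is taken to be the Euclidean magnitude of (x, y, z), which the text does not spell out, and windows that run past either end of the trace are clipped, so the mean near the borders is taken over fewer than 2b + 1 points. The function names are hypothetical.

```python
import math


def magnitudes(tuples):
    """Composed acceleration per tuple (assumed: Euclidean norm of x, y, z)."""
    return [math.sqrt(x * x + y * y + z * z) for x, y, z in tuples]


def stability(a, i, b):
    """Sum of squared deviations from the window mean over [i - b, i + b]."""
    lo, hi = max(0, i - b), min(len(a), i + b + 1)  # clipped at the borders
    window = a[lo:hi]
    mean = sum(window) / len(window)
    return sum((v - mean) ** 2 for v in window)


def filter_silent(tuples, b, alpha):
    """Drop every tuple whose stability value is below the threshold alpha."""
    a = magnitudes(tuples)
    return [t for i, t in enumerate(tuples) if stability(a, i, b) >= alpha]
```

With a trace made of a still segment, a waving segment, and another still segment, only the waving segment and the few still points whose windows reach into it survive the filter.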
Fig. 4: Comparing the features before and after filtering. Panels: (a) Before filtering, with the initial, intermediate, and final stages marked; (b) After filtering. Axes: Time (ms) versus Acceleration.
3.3 Fetcher
After the filter component, we need to generate the feature
vector of the user’s handwaving action. According to Section
2.3, the field set of the acceleration points can be treated as
one single input of the waving function, and the waving function
can be applied to this input to generate the feature vector.
However, using the field set as an input has two shortcomings.
First, the number of acceleration points in a field set is large,
usually more than 1000. In order to generate a representative
feature vector for the waving action data, an extremely large
number of feature vectors is required. In this way, the system
overhead is high and affects the normal operation of the smart
phone. Second, to unlock the smart phone, the user is required
to shake his smart phone long enough to generate the same amount
of waving data. However, it is inconvenient to ask the user to
shake the smart phone for such a long time to generate
more than 1000 acceleration points each time he wants to
unlock his phone. Therefore, the number of acceleration points
selected as an input needs to be reduced.
According to our observation, the waving action of a user
always shows the property of repetition. In fact, the input
waving action can be regarded as a series of small repeating
waving actions which are very similar. Therefore, we can
select a continuous sequence of acceleration points of a
reasonable length as the input to the waving function. Feature
vectors can be generated from these small inputs with low data
loss.
We generate the feature vectors as follows. We first select
a window of size w, where w is much smaller than
the size of the field set of data. From the field set of
data, we select an acceleration point $P_k$ and form the input
from the subsequence of w consecutive acceleration points
$\{P_k, P_{k+1}, P_{k+2}, \ldots, P_{k+w-1}\}$. Then we apply the waving
function to this input and deliver the resulting PDF as the feature
vector that describes the waving action.
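The windowing step can be sketched as follows. This is a hedged illustration: the helper names `windows` and `extract_features` and the `step` parameter are assumptions, and `waving_fn` stands for any of the four waving functions applied to one window.

```python
def windows(points, w, step=1):
    """Yield each run of w consecutive points starting at k = 0, step, 2*step, ..."""
    for k in range(0, len(points) - w + 1, step):
        yield points[k:k + w]


def extract_features(points, w, waving_fn, step=1):
    """Apply a waving function to every window and collect the feature vectors."""
    return [waving_fn(win) for win in windows(points, w, step)]


# Toy usage with integers standing in for acceleration points and a
# stand-in "waving function" that just reports the window's first element.
print(extract_features(list(range(10)), w=4, waving_fn=lambda win: win[0], step=3))
# [0, 3, 6]
```

A trace shorter than w yields no windows, so the caller has to decide whether to reject such a trace or to keep recording.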
3.4 Classifier
The feature classifier is designed to generate a standard for
discriminating between authorized and unauthorized users using the
feature vectors of the input waving action data. In OpenSesame,
the support vector machine, SVM for short, is selected
as the classifier. The SVM classifier is used to classify a group
of linearly inseparable training tuples into two classes. A training
tuple for SVM input is denoted as {v, y}, where v is the
attribute vector describing the attributes of the training
tuple, and y is the label of the training tuple, which represents
the actual class it belongs to. The basic idea of SVM is to
transform the attribute vectors of the training tuples into a
higher-dimensional space in which the training tuples become
linearly separable. Then the training tuples can be separated into
two classes by a hyperplane. The SVM classifier classifies the training
tuples based on this hyperplane, attempting to put training
tuples with the same label into the same class. A classification
model is then generated to describe the classification standard.
Given an unclassified tuple, the SVM classifier uses the generated
classification model to predict which class the tuple
most probably belongs to.
In OpenSesame, the label of a training tuple is either +1
or −1. When y = +1, the tuple is generated from the class of
unauthorized users. On the contrary, y = −1 means the tuple
belongs to the authorized user’s class. The attribute vector v is
generated from the feature vector obtained in Section 3.3.
The attribute vector can be represented as [a1 , a2 , ..., an ]T .
Here, ai is the ith property of the training tuple, which represents
the ith value in the feature vector. By feeding a sufficient number
of training tuples into the SVM classifier, a classification
model can be obtained to verify the authentication data of the user.
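To make the training tuple {v, y} concrete, the sketch below writes one tuple as a line of the plain-text data format read by LIBSVM tools, `<label> <index>:<value> ...`, with attribute indices starting at 1. The helper name `to_libsvm_line` and the six-significant-digit formatting are assumptions made for the example. The label convention follows the text: −1 for the authorized user and +1 for unauthorized users.

```python
def to_libsvm_line(label, attributes):
    """Format one training tuple {v, y} as '<label> 1:a1 2:a2 ... n:an'."""
    if label not in (+1, -1):
        raise ValueError("label must be +1 (unauthorized) or -1 (authorized)")
    pairs = " ".join(f"{i}:{a:.6g}" for i, a in enumerate(attributes, start=1))
    return f"{label:+d} {pairs}"


# A three-bin feature vector from the authorized user.
print(to_libsvm_line(-1, [0.25, 0.5, 0.25]))
# -1 1:0.25 2:0.5 3:0.25
```

Writing every bin, including zeros, keeps the mapping between bin index and attribute index explicit; the format also allows zero-valued attributes to be omitted.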
3.5 Matcher
The matcher component runs when the user activates
the authentication interface of OpenSesame and wants to
unlock the smart phone. The user shakes the smart phone to
input his waving action as the authentication data. Feature
vectors of the input waving action are generated and used to
verify whether the user is the authorized user. If so, the access
query is accepted and the smart phone is unlocked. If not, the
access query is denied and the smart phone stays locked.
The most important requirement is that the feature matching
phase has to be completed within a short time period, say 1 or 2
seconds. The reason is that users always expect the unlocking
process to be fast and convenient. If the feature matching
time is long, the inconvenience outweighs the security of our
approach, and users may decide to give up our system. To
reduce the response time, two aspects need to be considered.
The first issue is to reduce the number of repeated attempts
during authentication. This can be achieved by reducing the
false negative rate of authentication, which is
discussed in the experiment section. The second issue is to
reduce the waving time in the matcher component. As
designed in the fetcher component, by using a small waving
function input with window size w, the waving time can be
reduced to the time needed to collect w acceleration points.
Since w is much smaller than the size of the field set of
acceleration points, the waving time can be reduced
to a tolerable range.
Fig. 5: The UI of OpenSesame. Panels: (a) Screen locking, (b) Access denied, (c) Access accepted.
Normally, since the waving time is short, we assume that
there is no pause in the middle of the waving action, which reduces
the complexity of the filter. To detect the initial recorded feature
point, we calculate the real-time stability of the ith point $P_i$ as
$$\sum_{x=i-b}^{i} \left( A_x - \frac{\sum_{y=i-b}^{i} A_y}{b+1} \right)^2 .$$
Once the real-time stability
value is greater than the threshold, acceleration point $P_i$ is set
to be the initial point, and the waving action detection terminates
when the acceleration point sequence $\{P_i, P_{i+1}, \ldots, P_{i+w}\}$
is recorded. Applying the same waving function to this sequence,
we generate the predict tuple with attribute
vector [a1 , a2 , ..., an ]T . By inputting this predict tuple into the
SVM classifier with the classification model delivered by the
classifier component, the SVM classifier decides which class
the input tuple most likely belongs to. When the input tuple
is classified into the authorized user set, the authentication
succeeds and the smart phone is unlocked. Otherwise,
the smart phone requires another authentication attempt.
4 IMPLEMENTATION AND EVALUATION
In this section, we present the implementation of OpenSesame
and evaluate its performance.
4.1 Implementation App
We implement OpenSesame on Android-based smart phones.
The version of the Android system is 2.3.3. The app is developed
with the Android SDK using Java SE. Figure 5 shows the GUI of
our app. With this app, the user’s handwaving data is collected
and analyzed by the smart phone. Specifically, the interfaces
shown in Figure 5(b) and Figure 5(c) are used to notify the user
whether the unlocking attempt succeeds. We use the open source
library LIBSVM [16] to perform the SVM
classification. LIBSVM is an integrated software for support vector
classification. The version we used is LIBSVM-3.12. During
our experiments, we use the default kernel function (Gaussian
Radial Basis Function) and find the best setting of the parameters
Cost and γ for the kernel function via cross-validation
when generating the training model.
4.2 Metrics
We evaluate OpenSesame in terms of the authentication
accuracy. The authentication accuracy is measured via the
following metrics:
• False Negative Rate (FNR): The probability that an
authorized user is treated as an unauthorized user. This
rate is the ratio of the number of incorrect authentications
of an authorized user to the number
of his authentication attempts.
• True Positive Rate (TPR): The probability that an authorized
user is successfully verified. This rate is the ratio of the
number of correct authentications of an authorized
user to the number of his authentication attempts.
• False Positive Rate (FPR): The probability that an unauthorized
user is treated as an authorized user. This rate
is the ratio of the number of incorrect authentications
of an unauthorized user to the number of his
authentication attempts.
Note that FNR and TPR relate to the convenience
of our system, that is, whether the authorized
user can unlock the smart phone in a single try.
The FPR reflects the security of OpenSesame, that is, whether
unauthorized users are denied when they try to unlock the smart phone.
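The three metrics can be computed from per-attempt outcomes as sketched below. The function name `rates` and the boolean encoding (True means the phone was unlocked) are assumptions made for the example; the numbers in the usage are invented for illustration and are not experimental results.

```python
def rates(authorized_attempts, unauthorized_attempts):
    """Return (FNR, TPR, FPR) from lists of booleans, True = phone unlocked."""
    n_auth = len(authorized_attempts)
    n_unauth = len(unauthorized_attempts)
    tpr = sum(authorized_attempts) / n_auth            # correct accepts
    fnr = (n_auth - sum(authorized_attempts)) / n_auth  # wrong rejects
    fpr = sum(unauthorized_attempts) / n_unauth         # wrong accepts
    return fnr, tpr, fpr


# Illustrative outcomes only: 9 of 10 owner attempts succeed,
# 3 of 20 attempts by other users are wrongly accepted.
fnr, tpr, fpr = rates([True] * 9 + [False], [True] * 3 + [False] * 17)
print(fnr, tpr, fpr)  # 0.1 0.9 0.15
```

Because every attempt by the authorized user is either accepted or rejected, FNR and TPR always sum to 1, so reporting one of them determines the other.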
4.3 Experiment Setup
To investigate the uniqueness of handwaving, we collect
waving action data from 200 distinct smart phone users.
The subjects producing these datasets are randomly selected
in different public places, including a railway station, a university
library, and a city park. Three smart phones of different brands
are used to collect the waving action data.
Each user is asked to follow these instructions: The user first
randomly selects one of the three smart phones we provide and holds
this smart phone, which is running our data collection app, in
his accustomed way. Then he pushes the ‘start’ button on
the screen and waves the smart phone until the smart phone
plays a hint sound. This waving process lasts
for more than 10 seconds. The user repeats the above action
three times to complete the data collection. Note that there
is no special restriction on the user’s waving actions. He can wave
the smart phone arbitrarily in each trial. Indeed, we aim to
gain insight into the handwaving action itself rather than a
particular motion pattern.
Fig. 6: Impact of waving functions. (a) False Negative Rate; (b) False Positive Rate. Both panels are plotted for the shaking functions S1 to S4.
Fig. 7: Impact of window size. (a) False Negative Rate; (b) False Positive Rate. Both panels are plotted against the window size (0 to 50) for 100, 200, and 400 training tuples.
Overall, 389,373 raw tuples are captured from the 200 distinct
users, an average of 1,947 raw tuples per user. Each user
performs the handwaving for three trials, and each trial
lasts 10 to 20 seconds. For each user, the training data are
extracted from the first two trials, while the testing data are
taken from the last one, so there is no overlap between the
training data and the testing data. The classification is based
on self and non-self discrimination. For a given user, the
training data is composed of negative samples belonging to
this user and an equal number of positive ones from other
users.
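The split described above can be sketched as follows. This is only an illustration under stated assumptions: the data layout (a dictionary mapping each user to three trials), the function name `build_sets`, and the random draw of other users' samples are hypothetical, and the samples are called "self" and "non-self" here to stay neutral about the labeling.

```python
import random


def build_sets(trials_by_user, target, rng):
    """Return (self_train, non_self_train, test) for one target user.

    trials_by_user maps a user id to a list of three trials; each trial is
    a list of samples. This layout is an assumption made for illustration.
    """
    first, second, third = trials_by_user[target]
    self_train = list(first) + list(second)   # first two trials: training
    test = list(third)                        # last trial: testing only

    # Pool the training trials of every other user.
    pool = []
    for user, trials in trials_by_user.items():
        if user != target:
            pool.extend(trials[0])
            pool.extend(trials[1])

    # Draw as many non-self samples as there are self samples.
    k = min(len(self_train), len(pool))
    non_self_train = rng.sample(pool, k)
    return self_train, non_self_train, test


# Tiny synthetic example: 3 users, 3 trials each, 2 samples per trial.
data = {
    u: [[(u, t, i) for i in range(2)] for t in range(3)]
    for u in ("alice", "bob", "carol")
}
self_train, non_self_train, test = build_sets(data, "alice", random.Random(0))
```

Keeping the third trial out of both training lists is what guarantees that training and testing data do not overlap.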
4.4
Impact of Waving Functions
There are four waving functions to parameterize the A-Space
representation of handwaving. In this experiment, we select 30
users’ handwaving data and fix the window size at 50 tuples.
Figure 6 plots the FNR and FPR for the four waving functions.
From Figure 6(a), we observe that the average FNRs using
S1 and S2 are around 20%, while the values are below 10%
using S3 and S4. A similar observation holds for FPR,
as shown in Figure 6(b). This shows that the distance-based
waving functions (S3 and S4) perform better than the
angle-based ones (S1 and S2). We further focus on the
distance-based waving functions. S3 and S4 have close FNRs
and FPRs. However, the variance of S4 is smaller than that of
S3, which means S4 is more stable than S3.
4.5
Impact of SVM
Window size is an important factor. To capture enough
windows, we require the users to shake their phones for an
acceptable period of time. A large window size prolongs
the waving time needed for unlocking and seriously affects
the user experience, whereas a small window size degrades the
identification accuracy. We vary the window size from 5
to 50 in increments of 5 and employ S4 for testing.
The result is shown in Figure 7. The average FNR decreases
from 20% to 8% and the average FPR falls from 42% to
18% as the window size increases. This shows that a larger
window helps improve the accuracy, because more raw tuples
are extracted in a larger window and the user’s handwaving
is better characterized.
Fig. 8: Sampling mode. True positive rate under the fast and normal accelerometer sampling modes.
The number of training tuples also affects the accuracy. As
illustrated in Figure 7, the FNR is reduced by roughly half,
i.e. from 15% to 8%, when the window size is 50. This
reduction is even more pronounced with small window sizes.
On the other hand, the average FPR drops by only 5 percentage
points, from 20% to 15%, when the window size is 50. This
shows that FPR is less sensitive to the number of training
tuples.
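To make the trade-off between window size and waving time concrete, the sketch below groups a stream of raw tuples into windows. It assumes non-overlapping windows and drops a trailing partial window; both choices, and the function name, are assumptions for illustration, since the windowing procedure is not restated in this section.

```python
def split_into_windows(tuples, window_size):
    """Group raw accelerometer tuples into non-overlapping windows.

    Non-overlapping windows and dropping the trailing partial window are
    assumptions made for this sketch.
    """
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    n_full = len(tuples) // window_size
    return [tuples[i * window_size:(i + 1) * window_size]
            for i in range(n_full)]


# A placeholder stream of 1,947 identical tuples (x, y, z in m/s^2).
stream = [(0.0, 0.0, 9.8)] * 1947
windows_50 = split_into_windows(stream, 50)
windows_5 = split_into_windows(stream, 5)
```

Under these assumptions, a stream of 1,947 tuples yields 38 full windows of 50 tuples but 389 windows of 5 tuples, which is why a larger window requires the user to wave for longer to produce the same number of windows.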
4.6
Impact of Sampling Rate
Accelerometers in smart phones support different sampling
modes, and the data collected can differ considerably between
modes. In this experiment, we test OpenSesame in both the
fast sampling mode and the normal sampling mode. As shown in
Figure 8, it reaches a high average accuracy of 90% in the
fast sampling mode, while the figure is 55% in the normal
sampling mode. Losing part of the waving data at a low
sampling rate is the major reason for the poor performance.
Fig. 9: User’s motion. False negative rate and false positive rate (%) versus speed (m/s).
Fig. 10: Phone Brand. False negative rate for Smart Phone A, Smart Phone B, and Smart Phone C.
Fig. 11: Impact of phone’s orientation. (a) A-Space representation; (b) Feature PDF via S4; (c) CDFs of TPR. Postures shown: standing, lying, on the side.
4.7
Impact of User Motion
As mentioned before, our approach should be insensitive to the
user’s motion, because smart phones are mainly used in mobile
environments, and user motion clearly introduces noise. In this
experiment, we examine the relationship between the speed of
the user’s motion and the accuracy. Five kinds of user motion
are considered: stationary, walking slowly, walking fast,
running, and riding in a vehicle. The result is shown in
Figure 9. As the speed grows from 0 m/s to 5 m/s, the FNR
stays around 11%, with a standard deviation of 2.0%. This
indicates that the motion of users has a very limited effect
on our approach. The FPR is likewise stable as the speed
increases, at around 15% with a standard deviation of 2.5%.
Figure 9 also shows that the FNR increases slightly, by about
7%, when the user’s speed increases from 0 m/s to 5 m/s. This
is understandable, because faster motion increases the
vibration of the smart phone and leads to more noise. Overall,
however, motion has a very limited effect on the accuracy.
4.8
Impact of Phone Diversity
Nowadays, there are plenty of smart phone brands, such
as iPhone, MOTO, SAMSUNG, HTC, etc. To promote
OpenSesame to smart phone users, one crucial issue is whether
it can be well adapted to different brands of phones. The most
influential factor that differs across smart phones is the type
of accelerometer equipped. Different types of accelerometers
have different levels of sensitivity, so the waving data they
collect are not equivalent.
In this experiment, three different brands of phones are
tested. Ordered from low to high accelerometer sensitivity,
they are Phone A, Phone B, and Phone C. 40 sets of trials are
tested on each smart phone and the FNR is reported in
Figure 10. From the figure, we can see that Phone C achieves
the lowest FNR and Phone A has the worst value. This is
because a more sensitive accelerometer collects more
fine-grained data, which reflects the features of the waving
action more completely. The average FNRs of the three smart
phones are all below 10%, which is acceptable in practice.
Therefore, OpenSesame can be well adapted to different brands
of smart phones.
4.9
Impact of Smart Phone’s Orientation
Although a user’s waving habit may stay the same, the user’s
posture when waving the smart phone can change the
orientation of the phone. In this section, we evaluate
OpenSesame under different postures. In this experiment,
three user postures are tested:
• Standing: waving the phone while standing on the ground.
We consider standing to be the normal posture.
• Lying: waving the phone while lying on a bed. The waving
orientation is rotated 90 degrees upward.
• On-the-side: waving the phone while lying on the user’s
left side. The waving orientation is rotated 90 degrees to
the left.
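Why a distance-based feature can tolerate the 90-degree rotations in the postures above can be sketched as follows. The example assumes that a change of posture acts as one fixed rotation applied to every acceleration tuple, and it uses distances between consecutive tuples as a simple stand-in for a distance-based feature; it does not reproduce the definitions of S3 or S4. All sample values are made up.

```python
import math


def rotate_90_about_x(p):
    """Rotate a 3-D point by 90 degrees about the x-axis."""
    x, y, z = p
    return (x, -z, y)


def rotate_90_about_z(p):
    """Rotate a 3-D point by 90 degrees about the z-axis."""
    x, y, z = p
    return (-y, x, z)


def consecutive_distances(points):
    """Euclidean distance between each pair of consecutive points."""
    return [math.dist(a, b) for a, b in zip(points, points[1:])]


# Made-up acceleration tuples (m/s^2), for illustration only.
standing = [(0.0, 9.8, 0.0), (3.0, 8.0, 1.0), (6.0, 4.0, 2.0), (2.0, 7.0, 0.5)]
lying = [rotate_90_about_x(p) for p in standing]
on_side = [rotate_90_about_z(p) for p in standing]

d_standing = consecutive_distances(standing)
d_lying = consecutive_distances(lying)
d_on_side = consecutive_distances(on_side)
```

The raw tuples differ between the three lists, but the distance sequences are identical, because a rotation preserves Euclidean distances. Any statistic built only on such distances is therefore unchanged by the rotation.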
The results are shown in Figure 11. Figure 11(a) illustrates
the A-Space representations of the waving data for the
three postures. Intuitively, the three traces are similar,
each shaped like a crescent, but with different orientations.
Our approach should be insensitive to such rotation. We
transform the waving data from A-Space to a feature PDF,
shown in Figure 11(b), by means of waving function S4. As we
expected, the difference among the three PDFs is very slight.
In detail, the distances between the standing posture (the
normal posture) and the lying and on-the-side postures are
0.172 and 0.173, respectively. We believe these distances are
small enough for the trials to be treated as coming from an
identical user.
Furthermore, we conduct one trial in the standing posture and
store the resulting feature vector in our smart phone. Then we
attempt to unlock the smart phone in the three postures,
repeating each posture for 30 trials. The CDF of accuracy is
displayed in Figure 11(c). For the standing posture, 20% of
the trials have an accuracy rate lower than 90%, while 20% of
the trials in the lying and on-the-side postures have accuracy
rates lower than 86% and 76%, respectively. Meanwhile, 20% of
the lying-posture trials and 45% of the on-the-side trials
have accuracy rates higher than 90%. This experiment
demonstrates that our approach is insensitive to phone
orientation.
Fig. 12: Distances. CDF (%) of the distance to the identical user and of the distances to distinct users.
Fig. 14: The normalized similarities under four cases (C1 to C4) using DTW and OpenSesame. (a) OpenSesame; (b) DTW.
4.10
Discrimination
We consider OpenSesame’s capability of discriminating
among different users. One user’s trace is selected and its
similarity to the traces of other users is calculated. The
result is shown in Figure 12. From the figure, we can see that
the self-similarity distance is approximately bounded by 0.3,
and 90% of the self distances are lower than 0.25. In
contrast, the distances between the given user and the others
are clearly larger: only 8% of them are lower than 0.2, and
about 20% are larger than 1.0. Hence, both the discrimination
of distinct users and the recognition of identical users can
be achieved.
Although the percentage of small distances between distinct
users’ features is low, it may still affect the accuracy of
OpenSesame, so it is necessary to find the reason for failures
in discriminating distinct users’ features. In Figure 13,
three users’ A-Space representations are randomly selected.
For each row, the top four similar users’ A-Space
representations are listed. From these figures, we find that
similar A-Space representations lead to small distances
between users’ features. For extremely close A-Space
representations, such as the second figure in the first row,
the distance is very small, e.g. 0.074. With higher
dissimilarity of A-Space representations, for instance the
last figure, the distance is larger, e.g. 0.147. Since the
A-Space representation reflects the waving action on the smart
phone, we can conclude that for users with similar waving
habits, the probability of failure for OpenSesame increases.
Fortunately, referring to Figure 12, this probability is
low, and OpenSesame therefore performs well as expected.
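To illustrate how two distance distributions of the kind discussed above translate into acceptance rates, the sketch below counts the fraction of distances that fall under a cut-off. The distance values and the 0.25 cut-off are made up for illustration; OpenSesame itself classifies with an SVM, so this simple thresholding is only a stand-in.

```python
def fraction_below(distances, threshold):
    """Fraction of distances strictly below the threshold (empirical CDF)."""
    if not distances:
        raise ValueError("distances must not be empty")
    return sum(1 for d in distances if d < threshold) / len(distances)


# Made-up distances, for illustration only (not measured data).
self_distances = [0.05, 0.08, 0.10, 0.12, 0.15, 0.18, 0.20, 0.22, 0.24, 0.28]
other_distances = [0.15, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85, 0.95, 1.10, 1.30]

threshold = 0.25  # hypothetical cut-off
genuine_accept_rate = fraction_below(self_distances, threshold)
impostor_accept_rate = fraction_below(other_distances, threshold)
```

With these made-up values, 9 of the 10 self distances but only 1 of the 10 distances to other users fall below the cut-off. The small overlap between the two distributions is what produces false positives for users with similar waving habits.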
4.11
Comparison with DTW
Dynamic Time Warping (DTW) is a well-established
technique from speech processing, used to measure
the similarity between two temporal sequences that may
vary in time or speed. The advantage of DTW is that it deals
well with the misalignment of points in the temporal
sequences. However, DTW is only suitable for the case in which
the user must wave his/her smart phone along a fixed, secret,
and pre-defined movement, whereas we want users to be able to
shake their phones with freer movements according to their
daily habits. In this situation, DTW has the following two
major technical limitations compared with our waving
functions. First, the data acquired from the accelerometer
highly depend on the smart phone’s orientation. To maintain a
similar shaking sequence for DTW identification, the users
have to keep the same orientation as in training. Second, DTW
cannot deal with the existence of noise, blur, cracks, and
dust in the shaking data. The four waving functions we propose
are based on statistics and are able to address the above
issues well.
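For reference, the textbook dynamic-programming form of DTW can be sketched as below. It is a generic illustration on one-dimensional sequences with an absolute-difference cost, not the exact DTW configuration used in the comparison; the function name and the sample sequences are assumptions.

```python
def dtw_distance(a, b):
    """Classic DTW between two 1-D sequences with absolute-difference cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] is the best alignment cost of a[:i] and b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a advances, b repeats
                                 cost[i][j - 1],      # b advances, a repeats
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]


# Made-up one-axis readings, for illustration only.
trained = [1.0, 2.0, 3.0]
slower = [1.0, 2.0, 2.0, 3.0]          # same movement, different speed
reversed_axis = [-x for x in trained]  # same movement, orientation flipped

same_motion = dtw_distance(trained, slower)
flipped = dtw_distance(trained, reversed_axis)
```

The slower copy of the same movement aligns at zero cost, which shows the tolerance to speed variation. Flipping the sign of the axis leaves a large cost, which is the orientation dependence described in the first limitation.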
Fig. 13: Top 4 Similar A-Space Points and the Distances to the Reference A-Space Points. Three groups of 3-D A-Space plots (axes X, Y, Z in m/s^2); the panel labels give the distances: (a) 0.074, (b) 0.114, (c) 0.161, (d) 0.162; (a) 0.103, (b) 0.115, (c) 0.117, (d) 0.120; (a) 0.118, (b) 0.128, (c) 0.139, (d) 0.147.
To further compare the performance of DTW and OpenSesame, we let the user perform the following four trials:
• Case 1 (C1): The user waves his/her smart phone as
trained.
• Case 2 (C2): The user waves the smart phone in the way
he/she is used to, which need not be the same as trained.
• Case 3 (C3): The user waves his/her smart phone as
trained, but the orientation of the smart phone is reversed.
• Case 4 (C4): A second user waves the same smart phone
according to his own habit.
The normalized similarities using DTW and the waving functions
are shown in Figure 14. We observe that: (1) With OpenSesame,
the normalized similarities from Case 1 to Case 3 are almost
all below 0.5, showing that however the user waves his/her
phone, the self-similarity stays at an acceptable level.
When the user changes in Case 4, the similarity exceeds the
threshold of 0.5, resulting in an unlocking rejection. (2)
With DTW, when the user waves his/her smart phone not as
trained, even just by reversing the orientation, the
normalized similarities are much higher than that of Case 1.
In summary, the system can distinguish different users well
whether using OpenSesame or DTW. However, DTW requires that
the user wave his/her smart phone as trained.
4.12 Usability
We also conducted some field trials with our prototype to
evaluate the usability of our system. We invited about 10
college students to install our system and unlock their
phones through OpenSesame. We measured the overall time
they took to unlock the screen and asked for their feedback on
our prototype. Different phone models were used in the
experiments, including HTC One, Xiaomi 2, Nexus 5, Huawei
C8815, and Sony Xperia.
Fig. 15: Time consumption. Unlocking time (seconds) for User1 to User10.
First, we collect the average and standard deviation of the
time taken to unlock the smart phones. Unlocking takes less
than 3 seconds for 6 of the 10 users. Compared with
slide-to-unlock or a PIN (taking about one second), OpenSesame
does not speed up unlocking. However, the benefits come from
(1) the simplified user interface, as users do not need to
take off their gloves to use the touch screen or remember
complex passwords, and (2) security, which is also improved to
some extent.
Second, we ask the volunteers to fill in questionnaires
covering learning curve, user-friendliness, security, and
accessibility. Each volunteer gives a score ranging from 1 to
5 for each item. The results are shown in Table 1. We can see
that the users find our solution very easy to use and
intuitive, with almost no learning curve. This is the
key value of OpenSesame, which we think is even more
important than speed improvement. However, the users have
some concerns about security. This is reasonable because each
new technology needs time to be accepted. We believe that
these concerns will gradually disappear as OpenSesame becomes
more widely accepted.
TABLE 1: Trial Experience
Item              mean    stdev
Learning curve    1.2     0.5
User-friendly     4.8     1.5
Security          3.6     1.1
Accessibility     4.8     1.3
5
RELATED WORK
This section reviews the related work.
Accelerometer based Authentication: In work parallel to
ours, Conti et al. propose using the movement the user
performs when answering a phone call to authenticate the user
of a smartphone, relying on two kinds of components in smart
phones, the accelerometer and orientation sensors [11]. Their
work has the following three major technical limitations
compared to ours. First, their method in fact depends heavily
on the phone’s trajectory, which is related to the phone’s
movement parameters, such as the start position, end position,
orientation, and velocity. Once the system learns the
trajectory when the user picks up the phone from the pocket
and moves it to his/her ear, other trajectories, like the
movement from a desktop to the ear, will be rejected. In
contrast, our method concentrates on the human’s inherent
characteristics, like arm length and wrist size, which differ
among users and lead to waving differences. Thus, we do not
need the user to perform specific movements. Second, our four
waving functions are designed to be invariant to changes in
the position or direction of the smart phone. The user
can wave the phone starting or ending at arbitrary positions.
Importantly, our approach tolerates the existence of noise,
blur, cracks, and dust in the shaking data. Therefore, our
approach gives the user much more freedom than theirs.
Third, their method needs orientation sensors, which are
not supported by all smartphones, especially low-grade mobile
phones.
The second line of work parallel to ours identifies users
based on a secret movement pattern measured by the
accelerometer [17], [18]. Liu et al. aim at identifying users
based on a secret movement pattern [17], e.g. moving the phone
as if to draw an ‘8’ in the air, where the ‘8’ is the secret.
Similarly, Okumura et al. ask the tester to grasp the device
in the same way and shake it simply up and down in the
direction of the y-axis 5 times continuously [18]. As with
traditional methods like PIN or password, an adversary might
spy on the movement, replay it, and gain access to the phone
and its data. Importantly, the above methods have not been
evaluated in real-world scenarios, while ours is verified
with 200 distinct users.
Touch based Authentication: These works [10], [19], [20]
utilize the unique interaction between the user and the touch
screen to identify users. Sae-Bae et al. propose using
the timing of performing five-finger gestures on multi-touch
capable devices for authentication [10]. De Luca et al.
propose using the timing of drawing the password on
Android-based touch screen phones for authentication [20].
Shahzad et al. propose to utilize the correlations among
predefined ‘gestures’, i.e. touch trajectories, for
authentication [19]. These works require users to perform the
gestures with their fingers, which brings the following
two major limitations compared to our work. First, their
methods require users to use more than two fingers of a hand
to perform the predefined gestures, which is very inconvenient
on the small touch screens of smart phones. Second, most smart
phones employ capacitive touch screens that only recognize
a bare finger without gloves. Taking off one’s gloves to
answer a phone outdoors in winter is a poor experience.
Our method, shaking the smart phone, is much more
user friendly.
Keystrokes based Authentication: These works propose to
identify users based on their typing behavior [12], [14], [21].
These methods are mainly proposed for devices with physical
keyboards and are inapplicable to smart phones. In addition,
they have low accuracy on touch screens, because typing
behavior is difficult to model when most people use the same
finger to type all keys on the on-screen keyboard of a
smart phone.
Gait based Authentication: Several methods
[15], [22], [23] utilize the accelerometer in smart
phones to authenticate users based on their gaits. Their
accuracy is sensitive to the type of surface, such as grass,
road, snow, wet surfaces, and slippery surfaces. They are also
inapplicable to unlocking a smart phone, since it is infeasible
to make the user walk first so that he/she can be recognized
from his/her walking pattern.
6
CONCLUSION
In this paper, we propose a novel behavioral biometric-based
authentication approach for smart phones called OpenSesame.
We design four waving functions to capture the unique pattern
of a user’s handwaving actions. By applying an SVM classifier,
the smart phone can accurately verify the authorized user from
the pattern of the handwaving action. Experimental results
based on 200 distinct users’ handwaving actions show that
OpenSesame reaches a high level of security and robustness
and achieves a good user experience.
ACKNOWLEDGEMENT
This work is supported in part by the NSFC program under
Grant No. 61190110 and No. 61125202. The research of
Jinsong Han is supported by the NSFC program under Grant
No. 61373175, the Specialized Research Fund for the Doctoral
Program of Higher Education under Grant No. 20130201120016,
and the Fundamental Research Funds for the Central
Universities of China under Project No. 2012jdgz02 (Xi’an
Jiaotong University).
REFERENCES
[1] D. Florencio and C. Herley, “A large-scale study of web password habits,” in Proc. of ACM WWW, 2007.
[2] J. Bonneau, “The science of guessing: analyzing an anonymized corpus of 70 million passwords,” in Proc. of IEEE Security and Privacy (SP), 2012.
[3] H.-A. Park, J. W. Hong, J. H. Park, J. Zhan, and D. H. Lee, “Combined authentication-based multilevel access control in mobile application for daily life service,” IEEE Transactions on Mobile Computing, 2010.
[4] N. Ben-Asher, N. Kirschnick, H. Sieger, J. Meyer, A. Ben-Oved, and S. Möller, “On the need for different security methods on mobile phones,” in Proc. of ACM HCI, 2011.
[5] R. V. Yampolskiy and V. Govindaraju, “Behavioural biometrics: a survey and classification,” International Journal of Biometrics, 2008.
[6] A. H. Akkermans, T. A. Kevenaar, and D. W. Schobben, “Acoustic ear recognition for person identification,” in IEEE Workshop on Automatic Identification Advanced Technologies, 2005.
[7] A. Jain, L. Hong, and Y. Kulkarni, “A multimodal biometric system using fingerprint, face and speech,” in Proc. of Audio- and Video-based Biometric Person Authentication, 1999.
[8] P. J. Phillips, A. Martin, C. L. Wilson, and M. Przybocki, “An introduction to evaluating biometric systems,” Computer, vol. 33, no. 2, pp. 56–63, 2000.
[9] R. LiKamWa, B. Priyantha, M. Philipose, L. Zhong, and P. Bahl, “Energy characterization and optimization of image sensing toward continuous mobile vision,” in Proc. of ACM MobiSys, 2013.
[10] N. Sae-Bae, K. Ahmed, K. Isbister, and N. Memon, “Biometric-rich gestures: a novel approach to authentication on multi-touch devices,” in Proc. of ACM CHI, 2012.
[11] M. Conti, I. Zachia-Zlatea, and B. Crispo, “Mind how you answer me!: transparently authenticating the user of a smartphone when answering or placing a call,” in Proc. of ACM ASIACCS, 2011.
[12] F. Monrose, M. K. Reiter, and S. Wetzel, “Password hardening based on keystroke dynamics,” International Journal of Information Security, 2002.
[13] N. Zheng, A. Paloski, and H. Wang, “An efficient user verification system via mouse movements,” in Proc. of ACM CCS, 2011.
[14] E. Miluzzo, A. Varshavsky, S. Balakrishnan, and R. R. Choudhury, “Tapprints: your finger taps have fingerprints,” in Proc. of ACM MobiSys, 2012.
[15] D. Gafurov, K. Helkala, and T. Søndrol, “Biometric gait authentication using accelerometer sensor,” Journal of Computers, vol. 1, no. 7, pp. 51–59, 2006.
[16] C.-C. Chang and C.-J. Lin, “Libsvm: a library for support vector machines,” ACM Transactions on Intelligent Systems and Technology (TIST), vol. 2, no. 3, p. 27, 2011.
[17] J. Liu, L. Zhong, J. Wickramasuriya, and V. Vasudevan, “User evaluation of lightweight user authentication with a single tri-axis accelerometer,” in Proc. of ACM MobiHCI, 2009.
[18] F. Okumura, A. Kubota, Y. Hatori, K. Matsuo, M. Hashimoto, and A. Koike, “A study on biometric authentication based on arm sweep action with acceleration sensor,” in Proc. of IEEE ISPACS, 2006.
[19] M. Shahzad, A. X. Liu, and A. Samuel, “Secure unlocking of mobile touch screen devices by simple gestures: You can see it but you can not do it,” in Proc. of ACM MobiCom, 2013.
[20] A. De Luca, A. Hang, F. Brudy, C. Lindner, and H. Hussmann, “Touch me once and i know it’s you!: implicit authentication based on touch screen patterns,” in Proc. of ACM CHI, 2012.
[21] S. Zahid, M. Shahzad, S. A. Khayam, and M. Farooq, “Keystroke-based user identification on smart phones,” in Recent Advances in Intrusion Detection. Springer, 2009, pp. 224–243.
[22] J. R. Kwapisz, G. M. Weiss, and S. A. Moore, “Cell phone-based biometric identification,” in Proc. of IEEE BTAS. IEEE, 2010, pp. 1–7.
[23] J. Mantyjarvi, M. Lindholm, E. Vildjiounaite, S.-M. Makela, and H. Ailisto, “Identifying users of portable devices from gait pattern with accelerometers,” in Proc. of IEEE ICASSP, 2005.
Lei Yang received the B.S. degree from the School
of Software and the Ph.D. degree from the Department of
Computer Science and Engineering at Xi’an
Jiaotong University, Shaanxi, China. He is currently a
postdoctoral fellow in the School of Software at Tsinghua
University, Beijing, China. His research interests
include RFID, pervasive computing, network security, and
smart home. He is a member of the
IEEE Computer Society and the ACM.
Yi Guo received his B.S. degree of Electrical
and Computer Engineering from Shanghai Jiao
Tong University, Shanghai, China, in 2011. He
is currently a Ph.D. student in Department of
Computer Science and Engineering, Hong Kong
University of Science and Technology. His research interests include radio frequency identification (RFID) and pervasive computing. He is
a student member of the IEEE and the ACM.
Xuan Ding received his B.S. degree in the
School of Software and Ph.D. degree in the
Department of Computer Science and Technology from Tsinghua University, Beijing, China. He
is currently a postdoc fellow in the School of
Software at Tsinghua University. His research
interests include RFID, Social Network, and Security & Privacy. He is a member of the IEEE and
the ACM.
Jinsong Han is currently an associate professor at Xi’an
Jiaotong University. He received
his Ph.D. degree from Hong Kong University of
Science and Technology. He has published a
number of research papers in highly recognized
journals and conferences, including IEEE TPDS,
IEEE TKDE, IEEE INFOCOM, IEEE ICNP, etc.
His research interests include pervasive computing,
distributed systems, and wireless networks.
He is a member of the IEEE and the ACM.
Yunhao Liu received the B.S. degree in automation from Tsinghua University, Beijing, China, in
1995, and the M.S. and Ph.D. degrees in computer science and engineering from Michigan
State University, in 2003 and 2004, respectively.
Yunhao is now Changjiang Professor at School
of Software and Tsinghua National Lab for Information Science and Technology, Tsinghua University, China.
Cheng Wang received his Ph.D. degree in Department of Computer Science at Tongji University in 2011. Currently, he is a research professor
of Computer Science at Tongji University. His
research interests include wireless networking,
mobile social networks, and cloud computing.
Changwei Hu received his B.S. degree from
the Department of Electronic Engineering at
Xi’an University of Posts & Telecommunications,
Shaanxi, China. He is currently an engineer at
the Shaanxi Broadcast & TV Network Intermediary
(Group) Co., LTD. His research interests
include smart home and end-use products.
