Extracting Mobility Statistics from Indexed

Extracting Mobility Statistics from
Indexed Spatio-Temporal Datasets
Yoshiharu Ishikawa
Yuichi Tsukamoto
Hiroyuki Kitagawa
University of Tsukuba
August 30, 2004
STDBM 2004 at Toronto
Outline




Background and objectives
Markov transition probability
Indexing method for moving trajectories
Proposed methods




naïve algorithm
CSP-based algorithm
Experimental results
Conclusions
Background

Moving object databases



Research issues



stores and manages information on a huge number of
moving objects
supports queries on moving trajectories and/or
moving status
spatio-temporal indexes
extraction of statistics (e.g., selectivities)
Statics in spatio-temporal databases


used for query optimization
also useful in mobility analysis
Our Approach



Objective: extracting mobility statistics from spatiotemporal databases
Target: trajectory data indexed using R-trees
Statistics to be extracted:Markov transition probability



target space is decomposed in cells
estimating transition probabilities between cells using the
indexed trajectory data
Features


search problem is formalized as constraint satisfaction problem
(CSP)
efficient processing using R-trees
Outline




Background and objectives
Markov transition probability
Indexing method for moving trajectories
Proposed methods




naïve algorithm
CSP-based algorithm
Experimental results
Conclusions
Markov Transition Probability (1)


Assumption: target space is decomposed in cells
Example 1: What is the estimated probability that an
object currently in cell c0 moves in cell c1 in a unit time
later?
c0
c1
A
t =τ

A
t =τ+1
First-order Markov transition probability Pr(c1|c0)
Markov Transition Probability (2)

Example 2: What is the probability that an object
which moves from c0 to cell c1 in a unit time
moves to cell c2 in the next unit time?
c0
c1
A
A
c2
A
t =τ


t =τ+1
t =τ+2
Second-order transition probability Pr(c2|c0, c1)
Extension to order-n Markov transition
probability Pr(cn|c0, …, cn-1) is easy
Markov Transition Probability

Conventional technique in traffic data analysis


Special kind of association rules



Upton & Fingleton, 1989 [13]
probability corresponds to the confidence factor
difference: existence of order
Usage

trajectory estimation


estimates where a moving object moves to in the next
period
simulation of movement status

given status of moving objects at t = , we can estimate the
change of the status at t =  + 1,  + 2, …
Assumptions

Movement patterns obeys stationary process


Cell decomposition




movement tendency does not change as time passes
each cell is a rectangle
cell size is arbitrary: non-uniform decomposition is
allowed
cell decomposition can be specified dynamically
Unit time length


unit time can be specified as arbitrary length (e.g.,
one minuite, 10 minuites, …)
but a unit time length should be a multiple of
sampling time length
Formalization of Probability (1)


Target data: trajectory data from t = 0 to t = T
Definition of first-order Markov transition probability
T 1
P r(c1 | c0 ) 
 | objs(c , t )  objs(c , t  1) |
t 0
0
T 1
 | objs(c , t ) |
t 0



1
0
objs(ci, t): set of objects which were in cell ci at t
denominator: no. of objects which were in cell c0 at arbitrary t (0
≤ t ≤ T  1)
numerator: no. of objects each of which contained in
denominator and moved cell c1 at t + 1
Formalization of Probability (2)

Definition of order-n Markov Transition
Probability
T 1
P r(cn | c0 ,  , cn 1 ) 
n
|

 i 0 objs(ci , t  i) |
t 0
T 1
n 1
|

 i 0 objs(ci , t  i) |
t 0


denominator: no. of objects each of which was in cell
c0 at t (0 ≤ t ≤ T  1), in cell c1 at t + 1, …, and in cell cn
 1 at t + n  1
numerator: no. of objects each of which is contained
in Dominator and moved cell cn at t + n
Generalized Transition Probability
Estimation Problem (1)
Given n + 1 cell sets
C0  {c0,1,, c0, |C0 |},, Cn  {cn,1,, cn, |Cn|},
for each of arbitrary cell combinations
(c0 ,, cn )  C0  Cn ,
output Pr(cn|c0,…,cn-1)

Derives transition probability according to the
specified cell sets at once
Generalized Transition Probability
Estimation Problem (2)

Example: Given C0 = {c0, c1}, C1 = {c1, c2}, C2 =
{c1, c2, c3}, estimate second-order probabilities
c0
c1
c2
c3

Algorithm outputs 12 probabilities Pr(c1|c0, c1), Pr(c2|c0,
c1), …, Pr(c3|c1, c2)
Outline




Background and objectives
Markov transition probability
Indexing method for moving trajectories
Proposed methods




naïve algorithm
CSP-based algorithm
Experimental results
Conclusions
Indexing Methods for Trajectories


R-tree-based approach is assumed
Point-based representation: trajectories is
represented as a set of points


(d+1)-dimension R-tree is used (e.g., 3D R-tree)
incorporating temporal dimension
(d +1)-D R-tree-based Representation
x
x
root
b
1
5
6
3
B
a
4
c
2
0 1
A
0 1
2
3
4
5
6
2
3
4
5
6
7
root
7 8
(=T)
a
Sampling-based representation
1
2
b
3
c
4
5
6
8
(=T)
Outline




Background and objectives
Markov transition probability
Indexing method for moving trajectory data
Proposed methods




naïve algorithm
CSP-based algorithm
Experimental results
Conclusions
Naïve Algorithm (1)


Based on the definition of the Markov transition
probability
Example: Estimating Pr(c2|c0, c1)







Determine objs(c0, ) and objs(c1,  + 1) using the R-tree
 objs(ci, t): the set of objects which were in cell ci at time t
Take intersection of two sets; the cardinality of the intersection
is added to Scount
If the intersection is not empty objs(c2,  + 2) is determined using
the R-tree
Take intersection of objs(c0, ), objs(c1,  + 1) , objs(c2,  + 2); the
cardinality of the result is added to Qcount
This process is repeated for each  (0 ≤  ≤ T – n)
Calculate Pr(c2|c0, c1) based on Scount, Qcount
No. of search on R-tree is proportional to T
Naïve Algorithm (2)
Example: estimation of Prc2 | c0 , c1 
x
Qcount += 1
Output =
Qcount
Scount
cell c2
cell c1
cell c0
0
1
2
3
4
Scount += 1
5
6
7
Scount += 1
8
(=T)
No. of search
on R-tree
is proportional
to T
Outline




Background and objectives
Markov transition probability
Indexing method for moving trajectories
Proposed methods




naïve algorithm
CSP-based algorithm
Experimental results
Conclusions
Basic Idea (1)

Estimation of Pr(cn|c0, …, cn-1) based on three steps:
1.
2.
3.

Count the no. of objects which were in c0, …, cn-1 at each
unit time using an R-tree
Count the no. of objects which were in c0, …, cn at each
unit time using an R-tree
Compute Pr(cn|c0, …, cn-1) by [result of step 2] / [result of
step 1]
Benefits

step 1 & 2 can be processed using the same algorithm


algorithm for step 1 is given by setting n → n – 1
requires only two searches on R-tree
Basic Idea (2)
Example: estimation of Pr(c2|c0, c1)
x
Step 1: count
objects
which moved from
c0 to c1 within a
cell unit time
c2
Step 2: count objects
that moved as
cell c , c , c at each
c1 0 1 2
unit time
cell Step 3: compute
c0 probability
Qcount = 1
Pr(c2|c0, c1) = ―――――
Scount = 2
0
1
2
3
4
5
6
7
8 (= T )
Counting Using R-tree (1)



How can we compute no. of objects which were
in c0, …, cn at each unit time?
Idea: the problem is formalized as a constraint
satisfaction problem (CSP)
An object satisfying the constraint fulfills the
following constraints for some 





it was in cell c0 at t = 
it was in cell c1 at t =  + 1
…
it was in cell cn at t =  + n
Search objects that satisfy all n + 1 constraints
Counting Using R-tree (2)

Effective use of R-tree is necessary

We extend the CSP solution search method
using R-trees (Papadias et al, VLDB’98) [7]

considers spatial constraints


search CSP solutions from the root to leaves



Example: find all spatial objects x, y, z that satisfy
overlap(x, y) and north(y, z)
Use of pruning and backtracks
Reduce search space using constraints
enumerates all solutions with one R-tree access
Example of Counting (1)
x
root
For C0 = {c1}, C1 = {c1, c2},
C2={c2}, derive
probabilities for (C0, C1, C2)
b
1
5
6
3
c2 Derive two probabilities at
a
once
4
c
2
0
1
c1
2
3
4
5
6
7
(=T)
Pr(c2|c1, c1): the probability that
an object which have moved
as c1c1 next moves to c2
 Pr(c2|c1, c2)

8
Example of Counting (2)
x
root
R-tree
b
1
root
5
6
3
c2
a
4
1
c1
2
3
4
b
c
c
2
0
a
5
6
7
(=T)
8
1
2
3
4
5
6
Pruning Method (1)
Pruning condition 1:
Movement between two R-tree
nodes which do not temporary
consecutive is impossible
x
b
c
a
Candidates can be deleted
0 1
2
3
4
5
6
7
8
(=T)
Example:
- movement such as a b and
b c are allowed
- movement a c is impossible
Pruning Method (2)
x
Pruning condition 2:
Trajectory is not contained
in the target cell
cell c1
0 1
2
3
4
5
6
7
8
(=T)
Example: When we are
counting for c1  c1, we
should consider only nodes
that overlaps with c1
Pruning Method (3)
x
Pruning condition 3:
If [max distance an object
can move] < [distance between
MBRs] then an object cannot
move from a node to next node
1
distance
between
MBRs
2
0 1
2
3
4
5
6
7
8
(=T)
Query Processing Example
x
tree
level
=2
root
a
cell c2
b
root
Targets:
c1  c1  c2
c1  c2  c2
root
c
cell c1
pruning
pruning
t
tree
level
=1
1
2
cell c2
cell c1
pruning
tree
level
=0
cell c2
cell c1
backtrack
AnThere
objectisthat
no
moved
as
objects that
c1 
c1 as
c2
moved
is cfound
and
1  c1  c2
counted
c1  c2  c2
Outline




Background and objectives
Markov transition probability
Indexing method for moving trajectory data
Proposed methods




Naïve algorithm
CSP-based algorithm
Experimental results
Conclusions
Dataset (1)


Generated using the moving object simulator
made by Brinkoff [1]
Simulates car movement situation on actual city
road network






Oldenburg city, Germany (about 2.5km x 2.8km)
no. of initial moving objects: 5
5 objects are created in a minute
on average 100 objects are moving in the map at a
time
data is generated for T = 1000 minutes
120K points are stored in 3-D R-tree
Dataset (2)
Example for
estimating using 3 x 3 cells
c0
0
c3
0.183
c6
0.04
c1
0.081 c4
0.348 c7
0.10
c2
0.08
c5
0.01
c8
0.02
Experimental Result (1)
1
0.9
0.8
0.7
0.6
0.5
0.4
0.3
0.2
0.1
0
T (minute)
00
0
T=
1
00
T=
9
00
T=
8
00
T=
7
00
T=
6
00
T=
5
00
T=
4
00
T=
3
T=
2
00
Naïve
CSP
00

T=
1

Map is decomposed into 30 x 30 cells
First-order Markov transition probabilities
Randomly 3 x 3 cells are selected
Ellapsed Time (second)

Experimental Result (2)
8
7 Naïve
6 CSP
5
4
3
2
1
T (minute)
00
0
T=
1
00
T=
9
00
T=
8
00
T=
7
00
T=
6
00
T=
5
00
T=
4
00
T=
3
00
T=
2
00
0
T=
1

Estimation of second-order transition probabilities
Other parameters are same to the former case
Ellapsed Time (second)

Experimental Result (3)
120
100
Naïve
CSP
80
60
40
20
T (minute)
T=
90
0
T=
10
00
T=
80
0
T=
70
0
T=
60
0
T=
50
0
T=
40
0
T=
30
0
T=
20
0
0
T=
10
0

Estimation of third-order transition probabilities
Other parameters are similar to the former case
Ellapsed Time (second)

Experimental Result (4)
The case when CSP-based approach is not effective
 Target space is decomposed into 20 x 20 cells
 Estimation of second-order transition probabilities
25
20
Since cell
decomposition
is coarse, the
pruning cannot
reduce
candidates
Naïve
CSP
15
10
5
T (minute)
T=
90
0
T=
10
00
T=
80
0
T=
70
0
T=
60
0
T=
50
0
T=
40
0
T=
30
0
T=
20
0
0
T=
10
0
Ellapsed Time (second)

Conclusions and Future Work

Conclusions


mobility statistics based on Markov transition
probability
proposals of two algorithms




naïve approach
CSP-based approach
CSP-based approach effectively utilizes R-tree
structure
Future Work


adaptive cell decompositions
extension to non-stationary Markov transitions