
Integration of Spatial Information Sources Based on Source Description Framework
Yoshiharu Ishikawa, Gihyong Ryu, and Hiroyuki Kitagawa
University of Tsukuba
Background

Spatial information sources: an emerging class of information sources on the Internet
- information sources that provide region- or location-oriented information
- they support mobile users with GPS receivers and hand-held devices
Need for technology to integrate spatial information sources
- description of spatial information sources that takes their contents into consideration
- efficient and effective query planning and processing

Popular approach for information integration: the well-known wrapper-mediator approach
Wrapper
- encapsulates the details of each information source
- provides an abstract, uniform view of the source
Mediator
- selects appropriate information sources for a given query
- performs query planning and processing
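The division of labor between wrappers and the mediator can be sketched as follows. This is only an illustration: the class names `Wrapper` and `Mediator`, the `topics` attribute, and the in-memory rows are assumptions made for the example, not part of the framework described in these slides.

```python
from typing import Callable, Dict, List, Set

Record = Dict[str, object]

class Wrapper:
    """Hides one source behind a uniform fetch() interface (hypothetical)."""
    def __init__(self, name: str, topics: Set[str], rows: List[Record]):
        self.name = name
        self.topics = topics   # kinds of records the source holds
        self._rows = rows      # stands in for source-specific access code

    def fetch(self, predicate: Callable[[Record], bool]) -> List[Record]:
        return [row for row in self._rows if predicate(row)]

class Mediator:
    """Selects the wrappers relevant to a query and merges their answers."""
    def __init__(self, wrappers: List[Wrapper]):
        self.wrappers = wrappers

    def query(self, topic: str,
              predicate: Callable[[Record], bool]) -> List[Record]:
        results: List[Record] = []
        for wrapper in self.wrappers:
            if topic in wrapper.topics:          # source selection
                results.extend(wrapper.fetch(predicate))
        return results

restaurants = Wrapper("A", {"restaurant"},
                      [{"name": "Manchu", "category": "Chinese"}])
scores = Wrapper("C", {"evaluation"}, [{"name": "Manchu", "score": 3.0}])
mediator = Mediator([restaurants, scores])
good = mediator.query("evaluation", lambda r: r["score"] >= 2.5)
```

Here source selection is a simple topic lookup; the source descriptions introduced later replace it with a comparison of query conditions against each source's contents.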
Our Objective and Approaches

Objective: development of a spatial information integration framework for location-aware information services
- to provide useful location-oriented information services to mobile users
Our approach (1): development of a description method to represent spatial information sources
- based on the source description framework
- describes the contents and the service of the source
Our approach (2): development of query planning and processing methods that effectively utilize source descriptions
- considers the heterogeneity of the underlying information sources
- makes effective use of the query processing power of each information source
Motivating Example (1)

Query received by the mediator: show the top-20 nearest restaurants such that
- they are within 1000 meters of the current position
- their evaluation score is at least 2.5 stars
[Figure: restaurants numbered 1 to 7 and a circle of radius 1000m]
Information Source A: provides restaurant info for a specific area
- contains information on restaurants within the rectangular area rectA
- given a name or an address, it returns the matching restaurants
[Figure: rectangular area rectA]
Motivating Example (2)

Information Source B: supports spatial conditions to query restaurant info
- returns restaurants within the specified circular area (results are ordered by distance)
- accepts an additional condition on restaurant category, e.g., category = “Chinese”
Information Source C: provides restaurant evaluation scores
- given a restaurant name, it returns the evaluation score
select *
from Source-C
where name = “Manchu”

name     score
Manchu   3.0
Source Description Framework (1)

Source Description Framework: a formal framework to specify meta-information for an information source
- proposed in Information Manifold [Levy et al 96]
A source description consists of:
- Contents Description: describes the contents of the source in terms of the global schema
- Capability Description: describes the types of queries that the source can support
Our approach (1): incorporates the notion of spatial data types into the source description framework, then represents
- spatial information
- spatial queries
Our approach (2):
- allows the specification of top-N query capability in capability descriptions
Source Description Framework (2)

Data model
- a global schema is written in the relational data model enhanced with spatial data types
- a global schema specifies a virtual database: each information source is (partially) mapped into the schema
relation Restaurant {
  name string;
  category string;
  address string;
  location point;
}
relation Evaluation {
  name string;
  score real;
}
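For readers who prefer a programming-language view, the two global relations could be mirrored as record types. This is a minimal sketch: the `Point` type with planar `x`/`y` coordinates and the sample address are assumptions, since the slides only state that `location` has the spatial type `point`.

```python
from dataclasses import dataclass
from typing import NamedTuple

class Point(NamedTuple):
    """Assumed planar representation of the spatial type 'point'."""
    x: float
    y: float

@dataclass(frozen=True)
class Restaurant:
    name: str
    category: str
    address: str
    location: Point

@dataclass(frozen=True)
class Evaluation:
    name: str
    score: float

# Sample tuples; the address and coordinates are made up for illustration.
manchu = Restaurant("Manchu", "Chinese", "1-1 Example St.", Point(120.0, 340.0))
manchu_eval = Evaluation("Manchu", 3.0)
```

The two relations share only the `name` attribute, which is why the later example query joins them on `r.name = s.name`.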

Query language: monoid comprehension [Fegaras&Maier95]
- a declarative query language
- an extension of list comprehension (used in functional programming) to multiple collection types (e.g., bag, set)
- basic form: M{E | Q1, Q2, ..., Qn}
  - M: the collection type of the evaluation result of the form
  - E: an allowable expression
  - Qi: a generator (with the form v ← V) or a filter
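Python comprehensions give a rough feel for the basic form M{E | Q1, ..., Qn}: the brackets choose the collection type M, `for` clauses act as generators, and `if` clauses act as filters. The sketch below is an analogy only; the sample data is invented, and a Python list is used as a stand-in for a bag.

```python
names = ["Manchu", "Sakura"]
evaluations = [("Manchu", 3.0), ("Bistro", 2.0), ("Sakura", 3.0)]

# set{ s.score | s <- evaluations, s.score >= 2.5 }: duplicates collapse
as_set = {score for (_, score) in evaluations if score >= 2.5}

# bag{ ... } with the same generator and filter: duplicates are kept
as_bag = [score for (_, score) in evaluations if score >= 2.5]

# Two generators plus one filter behave like a join on the name
joined = {(n, score) for n in names
          for (m, score) in evaluations if n == m}
```

The same generator and filter produce `{3.0}` as a set but `[3.0, 3.0]` as a bag, which is the point of making the collection type explicit in the form.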
Examples of Source Description

Source description for information source A:
- it provides information on restaurants within the region rectA
- it can receive a name or an address as a query condition
Source A
contents description:
  contents: SA ⊆ set{r | r ← Restaurant, in(r.location, rectA)}
capability description:
  input/output: < > → SA
  filters: <n: string> → name = n,
           <a: string> → address = a
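One way to read the description of Source A is as a small interface contract. The sketch below assumes the contents description is a containment (every stored restaurant lies in rectA, but not every restaurant in rectA need be stored); the class `SourceA`, the rectangle encoding, and the sample rows are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax), assumed

@dataclass(frozen=True)
class Restaurant:
    name: str
    category: str
    address: str
    location: Point

def inside(p: Point, rect: Rect) -> bool:
    """Plays the role of in(r.location, rectA); 'in' is a Python keyword."""
    xmin, ymin, xmax, ymax = rect
    return xmin <= p[0] <= xmax and ymin <= p[1] <= ymax

class SourceA:
    def __init__(self, rect_a: Rect, known: List[Restaurant]):
        # contents: every stored row satisfies in(r.location, rectA)
        self.rows = [r for r in known if inside(r.location, rect_a)]

    def query(self, name: Optional[str] = None,
              address: Optional[str] = None) -> List[Restaurant]:
        # input/output: no required input, output drawn from SA
        # filters: optional equality conditions on name and address
        return [r for r in self.rows
                if (name is None or r.name == name)
                and (address is None or r.address == address)]

rect_a = (0.0, 0.0, 1000.0, 1000.0)
source_a = SourceA(rect_a, [
    Restaurant("Manchu", "Chinese", "addr-1", (300.0, 400.0)),
    Restaurant("Harbor", "Seafood", "addr-4", (2000.0, 0.0)),  # outside rectA
])
```

Note that the interface offers no spatial filter: a distance condition on restaurants from Source A has to be evaluated by the mediator after the rows are returned.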

Source description for information source B:
- it receives the query point and the allowable maximal distance
- it returns ordered results
- it can receive category as an additional filtering condition
Source B
contents: SB ⊆ set{r | r ← Restaurant}
input/output: <q: point, m: real> →
  sorted[d]{x | x ← SB, d ≡ dist(x.location, q), d ≤ m}
  (the result is sorted by distance values)
filters: <c: string> → category = c
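The capability of Source B (required inputs q and m, distance-ordered output, optional category filter) can be imitated with a single function. This is a sketch under assumptions: Euclidean distance on planar coordinates, an invented function name `source_b`, and made-up sample rows.

```python
import math
from typing import List, NamedTuple, Optional, Tuple

Point = Tuple[float, float]

class Restaurant(NamedTuple):
    name: str
    category: str
    location: Point

def dist(a: Point, b: Point) -> float:
    """Euclidean distance; the slides do not fix a particular metric."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def source_b(rows: List[Restaurant], q: Point, m: float,
             category: Optional[str] = None) -> List[Restaurant]:
    # Required input: query point q and maximal distance m.
    # Optional filter: category = c.
    hits = [(dist(r.location, q), r) for r in rows
            if category is None or r.category == category]
    within = [(d, r) for d, r in hits if d <= m]
    within.sort(key=lambda pair: pair[0])    # sorted[d]
    return [r for _, r in within]

rows = [
    Restaurant("Manchu", "Chinese", (300.0, 400.0)),   # 500 m from origin
    Restaurant("Sakura", "Japanese", (0.0, 100.0)),    # 100 m
    Restaurant("Harbor", "Seafood", (2000.0, 0.0)),    # 2000 m
]
nearby = [r.name for r in source_b(rows, (0.0, 0.0), 1000.0)]
chinese = [r.name for r in source_b(rows, (0.0, 0.0), 1000.0, "Chinese")]
```

Because the output is already ordered by distance, a mediator can obtain a top-N prefix from such a source without sorting the rows itself.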
Query Processing (1)

Example query: retrieve the name and address of restaurants such that
- they are within 1000 meters of mypos (the current position of the user)
- their evaluation scores are larger than or equal to 2.5 stars
- they are among the 20 nearest
Q = head[20](
  sorted[d]{<name: r.name, address: r.address> |
    r ← Restaurant, s ← Evaluation,
    r.name = s.name, s.score ≥ 2.5,
    d ≡ dist(r.location, mypos), d ≤ 1000})
Step 1: an access target description query AQ is generated:
- AQ specifies the information required to process query Q
AQ = set{r#s | r ← Restaurant, s ← Evaluation,
  r.name = s.name, s.score ≥ 2.5,
  dist(r.location, mypos) ≤ 1000}
Step 2: subqueries QR and QS are extracted:
QR = set{r | r ← Restaurant,
  dist(r.location, mypos) ≤ 1000}
QS = set{s | s ← Evaluation, s.score ≥ 2.5}
AQ = set{r#s | r ← QR, s ← QS,
  r.name = s.name}
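The decomposition in Steps 1 and 2 can be traced on a toy database. The sketch below evaluates QR, QS, AQ, and finally Q over in-memory lists; the tuples, coordinates, and Euclidean `dist` are assumptions made for the example, and lists stand in for sets.

```python
import math

# (name, address, location); all values are invented sample data
restaurants = [
    ("Manchu", "addr-1", (300.0, 400.0)),   # 500 m from mypos
    ("Sakura", "addr-2", (0.0, 100.0)),     # 100 m
    ("Bistro", "addr-3", (0.0, 900.0)),     # 900 m, but low score
    ("Harbor", "addr-4", (2000.0, 0.0)),    # too far
]
evaluations = [("Manchu", 3.0), ("Sakura", 2.5),
               ("Bistro", 2.0), ("Harbor", 4.0)]
mypos = (0.0, 0.0)

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

# Step 2: subqueries over single relations
QR = [r for r in restaurants if dist(r[2], mypos) <= 1000]
QS = [s for s in evaluations if s[1] >= 2.5]

# AQ joins the subquery results on the name (r#s becomes a pair)
AQ = [(r, s) for r in QR for s in QS if r[0] == s[0]]

# Q: order by distance, project name and address, keep the first 20
ordered = sorted(AQ, key=lambda pair: dist(pair[0][2], mypos))
Q = [{"name": r[0], "address": r[1]} for r, _ in ordered][:20]
```

AQ carries the whole matching tuples, while the ordering, projection, and `head[20]` of Q are applied on top of it afterwards.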
Query Processing (2)

Step 3: target information sources are determined for each subquery
- For example, source A may contain required information for QR and becomes a target information source if SA ∩ QR ≠ ∅ (SA has appeared in the source description for source A).
- This condition is equivalent to: the condition
  in(r.location, rectA) ∧ (dist(r.location, mypos) ≤ 1000)
  is satisfiable
Step 4: a query plan is generated for each combination of information sources: for example,
PA,C = set{x#y | x ← IterC(score ≥ 2.5),
  y ← IterA(name = x.name),
  dist(y.location, mypos) ≤ 1000}
Step 5: the final integration plan is generated based on the subqueries over the information sources
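Steps 3 and 4 can be illustrated together. For an axis-aligned rectangle and a circular range, the satisfiability test of Step 3 reduces to checking whether the rectangle point nearest to mypos lies within the radius; the plan PA,C then becomes a nested iteration. The rectangle encoding, the function names, and the sample rows are assumptions for this sketch.

```python
import math
from typing import Iterator, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax), assumed

def dist(a: Point, b: Point) -> float:
    return math.hypot(a[0] - b[0], a[1] - b[1])

def rect_meets_circle(rect: Rect, center: Point, radius: float) -> bool:
    """True iff some point p has in(p, rect) and dist(p, center) <= radius."""
    xmin, ymin, xmax, ymax = rect
    nearest = (min(max(center[0], xmin), xmax),
               min(max(center[1], ymin), ymax))
    return dist(nearest, center) <= radius

mypos: Point = (0.0, 0.0)
rect_a: Rect = (200.0, 200.0, 1500.0, 1500.0)
source_a_rows = [("Manchu", (300.0, 400.0)), ("Harbor", (1400.0, 1400.0))]
source_c_rows = [("Manchu", 3.0), ("Harbor", 4.0), ("Bistro", 2.0)]

def iter_c(min_score: float) -> Iterator[Tuple[str, float]]:
    """Stands in for IterC(score >= 2.5)."""
    return (s for s in source_c_rows if s[1] >= min_score)

def iter_a(name: str) -> Iterator[Tuple[str, Point]]:
    """Stands in for IterA(name = x.name), using source A's name filter."""
    return (r for r in source_a_rows if r[0] == name)

plan_a_c = []
if rect_meets_circle(rect_a, mypos, 1000.0):       # Step 3: A is a target
    plan_a_c = [(x, y) for x in iter_c(2.5)        # Step 4: plan P(A, C)
                for y in iter_a(x[0])
                if dist(y[1], mypos) <= 1000.0]
```

In this plan the distance condition is checked by the mediator, since Source A only accepts name or address filters, whereas the name lookup is pushed down to Source A.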
