Ch t 17 O tli Chapter 17 Outline

Chapter 17
Database File
Indexing
Techniques, BT
Trees,
and
d B+Trees
Copyright © 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley
Ch t 17 Outline
Chapter
O tli
„
„
„
„
„
„
Type of Single-Level Ordered Indexes
Multilevel Indexes
Dynamic Multilevel Indexes Using B-Trees
and
d B+-Trees
T
Indexes
de es on
o Multiple
u t p e Keys
eys
Other Types of Indexes
Some general Issues Concerning Indexing
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
I d
Indexes
as Access
A
P
Paths
th
„
„
The idea behind an ordered index access structure is
similar to that behind the index used in a textbook.
A single-level
i l l
l iindex
d is
i an auxiliary
ili
fil
file that
th t makes
k it more
efficient to search for a record in the data file
„
The index
inde is usually
s all specified on one field of the file
file,
called an indexing field (or indexing attribute),
(although it could be specified on several fields)
„
One form of an index is a file of entries <field value,
pointer to record
record>,, which is ordered by field value
„
The index is called an access path on the field
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
I d
Indexes
as Access
A
Paths
P th (cont’d.)
(
t’d )
„
„
„
The index file usually occupies considerably less disk
blocks than the data file because its entries are much
smaller
A binary search on the index yields a pointer to the file
record
Indexes can also be characterized as dense or
sparse
„
„
A dense index has an index entry for every search key
value (and hence every record) in the data file
file.
A sparse (or nondense) index, on the other hand,
has index entries for only some of the search values
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
Indexes as Access Paths (cont’d.)
(
)
„
„
Example:
p Given the following
g data file
EMPLOYEE(NAME, SSN, ADDRESS, JOB,
SAL, ... )
Suppose that:
„
„
record size R=150 bytes
y
r=30000 records
block size B=512 bytes
y
Then, we get:
g
„
„
blocking factor Bfr= B div R= 512 div 150= 3
records/block
number of file blocks b= (r/Bfr)= (30000/3)= 10000
blocks
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
Indexes as Access Paths (cont’d.)
(
)
„
For an index on the SSN field
field, assume the field size
VSSN=9 bytes, assume the record pointer size PR=7
bytes Then:
bytes.
„
„
„
„
index entry size RI=(VSSN+ PR)=(9+7)=16 bytes
index blocking factor BfrI= B div RI= 512 div 16= 32
entries/block
number of index blocks bI = (r/ BfrI) = (30000/32)
(30000/32)=938
938
blocks
binaryy search needs log
g2bI = log
g2938 = 10 block
accesses
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
Indexes as Access Paths (cont’d.)
(cont d.)
„
For an index on the SSN field
field, assume the field size
VSSN=9 bytes, assume the record pointer size PR=7
bytes Then (cont’d
bytes.
(cont d.):
):
„
This is compared to an average linear search cost of:
•
„
(b/2)= 10000/2
(b/2)
10000/2= 5000 block accesses
If the file records are ordered, the binary search cost
would be:
•
log2b= log210000= 14 block accesses
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
T
Types
off Single-Level
Si l L
l Indexes
I d
„
Primary Index
„
„
Clustering Indexes
„
„
on the ordering nonkey field of a file
Secondary Indexes
„
„
on the ordering key field of an ordered file
on any
y nonordering
g field of a file
A file can have at most one physical ordering field,
so it can have at most one primary index or one
clustering index, but not both
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
Pi
Primary
Index
I d
„
„
„
„
Primary Index
Defined on an ordered data file
The data file is ordered on a key field
Includes one index entryy for each block in the data
file; the index entry has the key field value for the first
record in the block, which is called the block anchor
„
„
A similar scheme can use the last record in a block.
Ap
primary
y index is a nondense ((sparse)
p
) index,, since
it includes an entry for each disk block of the data file
and the keys of its anchor record rather than for
every search value.
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
Pi
Primary
Index
I d (cont’d)
(
t’d)
„
The index file for a primary index occupies a much
smaller space than does the data file for two reasons:
„
„
There are fewer index entries than there are records in
the data file
E h iindex
Each
d entry
t iis ttypically
i ll smaller
ll iin size
i th
than a d
data
t
record because it has only two fields
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
Primary
Index on the
Ordering
Key Field
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
Pi
Primary
Index
I d Example
E
l
„
Example 1: suppose that we have an ordered file with r=30
r=30,000
000
records stored on a disk block size B=1,024 bytes. File records
are fixed size and are unspanned, with record length R=100
bytes.
„
„
„
„
„
What is the blocking factor for the file? bfr= ⎣B/R⎦ =
⎣1024/100⎦=10, b= ⎡r/bfr ⎤ = ⎡30000/10⎤ =3000
How manyy block accesses are needed to retrieve a record using
ga
binary search based on the ordered key field? ⎡log
⎡ 23000⎤=12
⎤
Suppose that the ordering key field of the file is V=9 bytes long, a
block pointer is P=6 bytes long, and a primary index is constructed
for the file
file. What is the blocking factor for the index? bfri =⎣B/Ri⎦
=⎣1024/15⎦ =68
What is the total number of blocks needed to store the index?
bi=⎡r
⎡ i/bfri⎤⎤=⎡3000/68⎤=45
⎡
⎤
How many block accesses are needed to retrieve a record using
the index if a binary search is applied to the index file?
⎡log245⎤+1=7
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
P bl
Problems
off Primary
Pi
Indexes
I d
„
The major problem with a primary index – as with any
ordered file – is insertion and deletion of records
„
Inserting a record into its correction requires not only to
move records to make space for the new record, but also
change
g some index entries
„
„
„
Using an ordered overflow file can reduce the problem
Use a linked list of overflow records for each block in the data
fil records
file;
d within
ithi each
h bl
block
k and
d itits overflow
fl
lilinked
k d lilistt can
be sorted to improve retrieval time
Deleting
g a record can be handled using
g deletion makers
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
Cl t i Index
Clustering
I d
„
„
„
„
Defined on an ordered data file
The data file is ordered on a non-key field unlike primary
i d
index,
which
hi h requires
i
th
thatt th
the ordering
d i fifield
ld off th
the d
data
t fil
file
have a distinct value for each record
Includes one index entry for each distinct value of the field;
the index entry points to the first data block that contains
eco ds with that
a field
e d value
a ue
records
It (Figure 17.3) is another example of nondense index
g
where Insertion and Deletion is relativelyy straightforward
with a clustering index
„
By reserving a whole block (or a cluster of contiguous blocks) for
each
h value
l off the
th cluster
l t field
fi ld
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
A
Clustering
Index
Example
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
Cl t i Index
Clustering
I d (cont’d.)
(
t’d )
„
To alleviate the record insertion and deletion problems
„
„
Reserve a whole block (or a cluster of contiguous blocks)
for each value of the clustering field; all records with that
value are placed in the block (or block cluster) (see Figure
17.3)
An index is somewhat similar to the directory structures
used for extensible hashing.
„
„
Both are searched to find a pointer to data block containing
the desired record
A index
An
i d search
h uses th
the values
l
off th
the search
h fifield
ld it
itself,
lf
whereas a hash directory search uses the hash value
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
Another
Clustering
Index
Example
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
Secondary Index
„
„
„
A secondary index provides a secondary means of
accessing a file for which some primary access
already
l d exists
i t
The data file can be ordered, unordered, or hashed.
The secondary index may be on a field which is a
candidate key and has a unique value in every record,
or a non-key
k with
ith d
duplicate
li t values
l
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
Secondary Index (cont’d
(cont d.))
„
The index is an ordered file with two fields
„
„
„
„
The first field is of the same data type as some nonordering
d i field
fi ld off the
th data
d t file
fil that
th t is
i an indexing
i d i fifield
ld
The second field is either a block pointer or a record
pointer
There can be many secondary indexes (and hence,
indexing fields) for the same file
Includes one entry for each record in the data file;
hence it is a dense index
hence,
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
Example
off a Dense
D
Secondary
Index
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
S
Secondary
d
Index
I d (cont’d.)
(
t’d )
„
Example 2: consider the file in Example 1 with rr=30,000
30 000
fixed length records of size R=100 bytes stored o a disk
block size B=1024 bytes. The file has b=3000 blocks as
calculated
l l t d iin E
Example
l 1
1.
„
„
„
„
How many number of block accesses required for doing a
linear search on the file? b/2=3000/2=1500
Suppose that a dense secondary index is constructed on a
norordering key field of the file that is V=9 bytes long and a
block pointer is P=6 bytes
y
long.
g What is the blocking
g factor
for the index? bfri = ⎣1024/15⎦=
⎣
⎦ 68
What is the total number of blocks needed to store the index?
bi=⎡ri/bfri⎤=⎡30000/68⎤=442
How many block accesses are needed to retrieve a record
using the index if a binary search is applied to the index file?
⎡log
g2442⎤+1=10
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
S
Secondary
d
Index
I d (cont’d.)
(
t’d )
„
Options for implementing a secondary index on a nonkey,
nonordering field of a file
„
Option 1:
„
„
„
Include several index entries (duplicate index entries) with the
( ) value – one for each record
same K(i)
This would be a dense index
Option 2: (nondense)
„
„
Have variable-length records for the index entries, with a
repeating field for the pointer
Keep a list of pointers <P(i
<P(i,1),
1) …,P(i,k)>
P(i k)> in the index entry for
K(i), where one pointer to each block that contains a record
whose index field value is K(i)
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
S
Secondary
d
Index
I d (cont’d.)
(
t’d )
„
O ti 3:
Option
3 more commonly
l used
d
„
„
„
„
„
„
„
Keep the index entries themselves at fixed length and have a single
entry for each index field value
Create an extra level of indirection to handle the multiple pointers
This would be a nondense index
The block p
pointer P(i)
( ) in each index entry
y <K(i),
( ), P(i)>
() p
points to a
block of record pointers that lead to one of the records with value
K(i) (see Figure 17.5)
If the K(i)
( ) occurs in manyy records that cannot fit in a single
g disk block,
a cluster or linked list of blocks is used
Retrieval via the index requires one or more additional block
accesses
A secondary index provides a logical ordering on the records by the
index field.
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
Example
off
Secondary
Index
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
Properties of Index Types
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
Multi-Level Indexes
„
Because a single-level index is an ordered file, we
can create a primary index to the index itself;
„
„
„
IIn this
thi case, the
th original
i i l iindex
d fil
file iis called
ll d the
th firstfi t
level index and the index to the index is called the
second-level
second
level index
We can repeat the process, creating a third, fourth, ...,
top level until all entries of the top level fit in one disk
block
A multi-level index can be created for anyy type
yp of
first-level index (primary, secondary, clustering) as
long as the first-level index consists of more than one
disk block
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
Multi-Level Indexes (cont
(cont’d
d.))
„
„
„
„
The binary search requires approximately (log2bi) block
accesses for an index bi block because it reduces the part
of index file that we continue to search by a factor of 2
The idea behind a multilevel index is to reduce the part
of index file that we continue to search by bfri, the
blocking factor for index, which is larger than 2
The
e value
a ue o
of b
bfri iss called
ca ed the
e fan-out
a out o
of the
e multilevel
u e e
index and is denoted as fo (bfri=fo)
Searching
g a multilevel index requires
q
approximately
pp
y
(logfobi) block accesses for an index bi block
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
M lti L
Multi-Level
l Indexes
I d
(
(cont’d.)
t’d )
„
„
„
A multilevel index considers the index file, now refer to the
first (or base) level of the a multilevel index, as an
ordered file with a distinct value for each K(i)
Hence, we can create a primary index for the first level;
this index to the first level is called the second level of
the multilevel index
The
e seco
second
d level
e e has
as o
one
ee
entry
y for
o eac
each b
block
oc o
of the
e first
s
level. If the first level has r1 entries and the blocking factor
for the index is bfri=fo, then the first level needs ⎡(r1/fo)⎤
blocks, which is the number of entries r2 needed at the
second level of the index
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
M lti L
Multi-Level
l Indexes
I d
(
(cont’d.)
t’d )
„
„
„
We can repeatt the
W
th process for
f the
th second
d level.
l
l The
Th third
thi d
level, which the primary index of the second level, has an
entry
y for each second-level block,, so the number of thirdlevel entries is r3= ⎡(r2/fo)⎤
A second level is required only if the first level needs
more than
th one bl
block
k off di
diskk storage,
t
and,
d similarly,
i il l a thi
third
d
level is required only if the second level needs more than
one
o
eb
block
oc
We can repeat the process until all the entries of some
index level t fit in a single block. This block at the tth level
i called
is
ll d th
the top
t index
i d llevel.
l Th
Thus, 1
1≤(r
( 1/fo
/f t) and,
d h
hence, t
= ⎡logfor1⎤
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
M lti L
Multi-Level
l Indexes
I d
(
(cont’d.)
t’d )
„
„
The multilevel
Th
l il
l scheme
h
can be
b used
d on any type off iindex,
d
whether
h h iit iis
primary, clustering, or secondary – as long as the first-level index has
distinct values for K(i) and fixed-length entries.
Example 3: Suppose that the dense secondary index of Example 2 is
converted into a multilevel index.
„ What is the fan
fan-out
out fo for the multilevel index? fo=⎣1024/15⎦ =68
„ What are the number of first-level block b1, the number of secondlevel b2, and the number of third-level block b3? b1=
⎡ 1/fo⎤=⎡30000/68⎤=442,
⎡r
⎤ ⎡
⎤
b2=⎡b
⎡ 1/fo⎤=⎡442/68⎤=7,
⎤ ⎡
⎤
b3= ⎡7/68⎤=1
⎡
⎤
„ What is the number of the top level of the index t? t=3
„ How many block accesses are needed to retrieve a record using
the multilevel index? t+1=3+1=4
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
A Two-Level
Two Level
Primary
I d
Index
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
Multi-Level Indexes (cont
(cont’d
d.))
„
„
We can have
W
h
a multilevel
l il
l primary
i
iindex,
d
which
hi h would
ld b
be nondense.
d
Such a multi-level index is a form of search tree
„
„
„
However, insertion and deletion of new index entries is a severe problem
because every level of the index is an ordered file
A common file organization is an ordered file with a multilevel primary
index on its ordering key field
field. Such an organization is called indexed
sequential file.
„ The insertion can be handled by some form of overflow file and
recreate the index during the file reorganization.
reorganization
„ IBM ISAM (indexed Sequential Access Method): two-level index
(cylinder, track)
To retain the benefits of using multilevel indexing while reducing
index insertion and deletion problem, a multilevel index that leaves
some space in each of its blocks for inserting entries can be adopted.
„ This is called a dynamic multilevel index and is often implemented
by using B-tree and B+-tree.
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
Dynamic Multilevel Indexes Using
BT
B-Trees
and
d B+-Trees
B T
Most multi-level
M
lil
l iindexes
d
use B
B-tree or B+-tree
B
d
data
structures
because of the insertion and deletion problem
„ This leaves space
p
in each tree node ((disk block)) to allow for new
index entries
These data structures are variations of search trees that allow
efficient insertion and deletion of new search values
values.
„ A search tree is a special type of tree that is used to guide the search
for a record, given the value of record field.
„ The multilevel index can be though of as a variation of search tree
(slightly different); each node in the multilevel index can have as
many as fo pointers and fo key values, where fo is the index fan-out
„ The index field (or search field) in each node guide us to the next
node, until we reach the data file block that contains the required
record.
eco d
„
„
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
Dynamic Multilevel Indexes Using
BT
B-Trees
and
d B+-Trees
B T
(contd.)
(
td )
„
„
„
In B-Tree and B+-Tree data structures, each node
corresponds to a disk block
E h node
Each
d iis kkeptt b
between
t
h
half-full
lf f ll and
d completely
l t l ffullll
An insertion into a node that is not full is quite efficient
„
„
„
„
If a node
d is
i ffullll th
the iinsertion
ti causes a split
lit iinto
t ttwo nodes
d
Splitting may propagate to other tree levels
A deletion
d l ti iis quite
it efficient
ffi i t if a node
d d
does nott b
become lless
than half full
If a deletion causes a node to become less than half full
full, it
must be merged with neighboring nodes
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
A Tree Data Structure that Shows
an Unbalanced
U b l
d Tree
T
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
Dynamic Multilevel Indexes Using
BT
B-Trees
and
d B+-Trees
B T
(contd.)
(
td )
„
Search Trees
„
„
A search tree of order p is a tree such that each node
contains at most p
p−1
1 search values and p pointers in the
order <P1, K1, P2, K2,…,Pq-1,Kq-1,Pq>, where q≤p; each Pi is
a pointer to a child node (or a null pointer); and Ki is a
search value from some ordered set of values
Two constraints must hold
„
„
Within each
Withi
h node,
d K1 < K2 < … < Kq-1
For all values X in the subtree pointed at by Pi, we have Ki-1< X
< Ki for 1< i < q; X < Ki for i=1;; and Ki-1
q
i 1< X for i=q
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
A Node in a Search Tree with
Pointers to Subtrees Below It
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
Bt
B-tree
Structures
St
t
„
AB
B-tree
tree of order p can be defined as follo
follows:
s
„ Each internal node in the B-tree (Figure 14.10a) is of the form
<P1, <K1, Pr1>, P2, <K2, Pr2>, …, <Kq-1, Prq-1>, Pq>
where q ≤ p. Each Pi is a tree pointer to another node in the B-tree.
Each Pri is a data pointer to the record whose search key field value
is equal to Ki
„ Within each node
node, K1 < K2 < … < Kq-1
„ For all search key field values X in the subtree pointed by Pi, we have
Ki-1< X < Ki for 1 < i < q; X < Ki for i=1; and Ki-1 < X for i=q
„ Each node has at most p tree pointers
„ Each node, except the root and leaf nodes, has at least ⎡(p/2)⎤ tree
pointers. The root node has at least two tree pointers unless it is the
only node in the tree
„ A node with q tree pointers, q ≤ p, has q-1 search field values
„ All leaf nodes are at the same level. Leaf nodes have the same
structure as internal nodes
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
B-tree
B
tree Structures
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
Dynamic Multilevel Indexes Using
BT
B-Trees
and
d B+-Trees
B T
(contd.)
(
td )
„
„
„
„
Mostt implementations
M
i l
t ti
off a dynamic
d
i multilevel
ltil
l iindex
d use a
variation of B-tree data structure called B+-tree.
In B+-tree,
B+-tree data pointers are stored only at the leaf nodes
of the tree; hence the structure of leaf nodes differs from
the structure of internal node
The leaf nodes have an entry for every value of the
search field, among with a data pointer to the record (or to
the block that contains this record) if the search field is a
key field.
For a nonkeyy search field, the pointer
p
p
points to a block
containing pointers to the data file records, creating an
extra level of indirection
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
B t
B+-tree
Structures
St
t
(cont’d.)
(
t’d )
„
The structure
Th
t t
off the
th internal
i t
l nodes
d off a B+-tree
B t
off order
d P
is
„
„
„
„
„
„
Each internal node is of the form
<P1, K1, P2, K2, …, Pq-1, Kq-1, Pq> where q ≤ p. Each Pi is a tree
pointer
Within each internal node, K1 < K2 < … < Kq-1
For all search key field values X in the subtree pointed by Pi, we
have
Ki-1< X ≤ Ki for 1 < i < q; X ≤ Ki for i=1; and Ki-1 < X for i=q
Each internal node has at most p tree pointers
E h iinternal
Each
t
l node,
d exceptt th
the roott and
d lleaff nodes,
d
h
has att lleastt
⎡(p/2)⎤ tree pointers. The root node has at least two tree pointers if
it is an internal node
An internal node with q pointers, q ≤ p, has q-1 search field values
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
B t
B+-tree
Structures
St
t
(cont’d.)
(
t’d )
„
The structure of the leaf nodes of a B+-tree of order p is
„
„
„
„
„
Each leaf node is of the form
<<K1, Pr1>,
> <K2, Pr2>,
> …, <Kq-1, Prq-1>,
> Pnext>
where q ≤ p. Each Pri is a data pointer, and Pnex points to the next
leaf node of the B+-tree
Within each leaf node, K1 < K2 < … < Kq-1 , q ≤ p
Each Pri is a data pointer that points to the record whose search
field value is Ki or to a file block containing the record (or to a
block of record pointers that point to records whose search field
value is Ki if the search field is not a key)
Each leaf node has at least ⎡(p/2)⎤
⎡
⎤ values
All leaf nodes are at the same level
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
Difference between B-tree
B tree and B+tree
„
„
„
In a B-tree, pointers to data records exist at all levels
of the tree
In a B+-tree, all pointers to data records exists at the
leaf level nodes
leaf-level
A B+-tree can have less levels (or higher capacity of
search values) than the corresponding B
B-tree
tree
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
The Nodes of a B+-tree
tree
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
I d
Indexes
on Multiple
M lti l Keys
K
„
„
If a certain combination of attributes is used very
frequently, it is advantageous to set up an access
structure to provide efficient access by a key value that is
a combination of those attributes
Example: an EMPLOYEE file
„ Consider a query “List the employees in department
g is 59”,, where both DNO and
number 4 whose age
AGE are nonkey attributes
„ The search strategies
g
can be
„
Assume DNO has an index, but AGE does not, access the
records having DNO=4 using the index then select from
among them those records that satisfy AGE=59
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
I d
Indexes
on Multiple
M lti l Keys
K
(cont’d.)
(
t’d )
„
„
„
„
„
Alt
Alternately,
t l AGE has
h an index,
i d
but
b t DNO d
does not,
t access th
the records
d
having AGE=59 using the index then select from among them those
records that satisfy DNO=4
If both DNO and AGE have indexes; use both index to retrieve a set
of records (or a set of pointers) and an intersection of these set of
records (or pointers) yields those records that satisfy both condition
However, if the set of records that meet each condition
However
(DNO=4 or AGE=59) individually are large, yet only a few
records satisfy the combined condition, then none of above
is a very efficient technique for the given search request
We can consider the combination <DNO, AGE> or <AGE,
DNO> as a search key made up of multiple attributes
We refer the keys containing multiple attributes as
composite keys
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
Ordered Index on Multiple
Att ib t
Attributes
„
„
„
In general, if an index is created on attributes <A1, A2, …,
An>, the search key values are tuples with n values: <v1,
v2, …, vn>
„ E.g., <DNO, AGE> = <4, 59>
A lexicographic ordering of these tuples values
establishes an order on this composite search key
„ E.g.,
E g <3
<3, n> proceeds <4
<4, m> for any values of m and n
An index on a composite key of n attributes work similarly
to any single attribute index
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
P titi
Partitioned
d Hashing
H hi
„
„
„
Partitioned hashing is an extension of static external
hashing that allows access on multiple keys. It is suitable
only for equality comparison; rang queries are not
supported.
t d
In partitioned hashing, for a key consisting of n
components,
p
, the hash function is designed
g
to p
produce a
result with n separate hash addresses. The bucket
address is a concatenation of these n addresses
„ Example: DNO=4 has a hash value “100”
100 and
AGE=59 has a hash value “10101”, then the bucket
address will be 10010101
Th partitioned
The
titi
dh
hashing
hi can b
be easily
il extended
t d d tto any
number of attributes, but cannot handle range queries on
any of the component attributes
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
G id Files
Grid
Fil
„
„
„
„
„
If we wantt to
t access a file
fil on two
t
keys,
k
we can construct
t t
a grid array with one linear scale (or dimension) for each
g
14.14))
of the search attributes ((see Figure
The scales are made in a way as to achieve a uniform
distribution of that attribute
Each cell of the grid array points to some bucket address
where the records corresponding to the cell are stored
The method is particularly useful for range queries that
would map into a set of cells corresponding to a group of
g the linear scales.
values along
The grid file concept may be applied to any number of
search keys
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
Example of a Grid Array on DNO
and
d AGE Attributes
Att ib t
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
Using Hashing and Other Data
St
Structures
t
as Indexes
I d
„
„
„
It is also possible to create access structure similar to
indexes that are based on hashing
Th index
The
i d entries
t i <K,
<K P> can b
be organized
i d as a
dynamically expandable hash file; searching for an entry
uses the hash search algorithm on K
„ E.g., extensible hashing (directory) and linear hashing
Once an entry is found
found, the pointer P is used to locate the
corresponding record in the data file
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
L i l versus Physical
Logical
Ph i l Indexes
I d
„
Physical
Ph
i l iindex
d h
has th
the di
disadvantage
d
t
th
thatt th
the pointer
i t mustt b
be
changed if the record is moved to another disk location
„
„
To remedy this situation, we can use a structure called a logical
index,, whose index entries are of the form <K,, Kp>
„
„
„
e.g., a primary file organization based on linear hashing or
extensible hashing
Each entry has one value K for the secondary indexing field
matched with the value Kp of the field used for the primary
file organization
By searching the secondary index K, we first locate Kp and
use Kp to access the record through the primary file
organization
Logical indexes introduce an additional level of indirection between
the access structure and the data
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
Di
Discussion
i
„
„
„
„
„
An index can be created and discarded dynamically
A secondary index is usually created to avoid physical
ordering
g of the records in the data file
It is more expensive and much harder to create primary
indexes and clustering indexes dynamically because of
physical ordering on the indexing field
It is common to use an index to enforce a key constraint
on an attribute
A file
fil that
h h
has a secondary
d
iindex
d on every one off iits fifields
ld
is often called a fully inverted file.
„ Because all indexes are secondary, new records are
inserted at the end of the file; therefore the data file
itself is an unordered (heap) file
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
Summary
„
Types of Single-level Ordered Indexes
„
„
„
„
„
„
Primary Indexes
Clustering Indexes
Secondary
y Indexes
Multilevel Indexes
Dynamic Multilevel Indexes Using B
B-Trees
Trees
and B+-Trees
I d
Indexes
on Multiple
M lti l K
Keys
Copyright © 2011 Ramez Elmasri and Shamkant Navathe