Data Modeling and Database Design

Data Modeling and Database Design
Minder Chen, Ph.D.
[email protected]
Team
Team number
Specialty
Customer
member
is a member of
Employee
Division
Division number
Division name
Division address
belongs to
Customer
Customer
Customer
Customer
Customer
Customer
Employee number
First name
Last name
Employee function
Employee salary
subcontract
staffed by
is assigned to
Project
Task
Task name
Task cost
number
name
address
activity
telephone
fax
contains
Project number
Project name
Project label
Start date
End date
Data Modeling and Database Design Course Outline
• INTRODUCTION
– Introduction to Data Modeling
– Database Development Life Cycle Overview
• ENTITY AND RELATIONSHIP
– Develop the Subject Area Diagram
– Develop Preliminary Data Model: Entity & Relationship
Identification
• ATTRIBUTES AND SUBTYPES
– Attributes Identification and Definition
– Develop Fully Attributed Data Model
– Identifiers
– Data Modeling Exercise
– Partitioning and Entity Subtypes
• NORMALIZATION
– Normalization
– Normalization Exercise
– De-normalization
• DATA MODEL EVALUATION AND MAPPING TO RELATIONAL DBMS
– Refine a Data Model: Analysis and Simplification
– Transform to Physical Data Base Design
• PowerDesigner: Data Architect
• Pysical DB Design and Data Warehouse DB Design
© Minder Chen, 1993~2002
Data Modeling - 2 -
References
Data Modeling and Database Design
1. Batini, Ceri, Navathe, Conceptual Database Design, Redwood City, CA: The
Benjamin/Cummings Publishing Company, Inc., 1992.
2. Teorey, T. J., Database Modeling and Design: The Entity-Relationship Approach,
Morgan Kaufmann Publishers, Inc., 1990.
3. Thomas A. Bruce, Designing Quality Databases with IDEF1X Information
Models, Dorset House Publishing, NY: New York, 1991.
4. Texas Instruments, A Guide to IE Using IEF, 2nd edition, Part No. 2739756-0001,
1990.
5. Martin, James, Information Engineering Book II: Planning and Analysis,
Prentice-Hall Inc., 1989.
6. Dave Ensor, Ian Stevenson, Oracle Design, O'Reilly & Associates, 1997
7. Rob Gillette, etc., Physical Database Design for Sybase SQL Server, Prentice
Hall, 1995.
8. Ralph Kimball, The Data Warehouse Toolkit, Wiley, 1996.
JAD References
1. August, J. H.. Joint Application Design: The Group Session Approach to
System Design. Englewood Cliffs, NY, Prentice Hall, Inc., 1991.
2. Wood, J. and Silver, D. Joint Application Design: How to Design Quality
Systems in 40% Less Time. New York, NY, John Wiley & Sons, 1989.
3. Andrews, D. C. and Leventhal, N. S., Fusion: Integrating IE, CASE, and JAD: A
Handbook for Reengineering the Systems Organization, Englewood Cliffs, NJ:
Yourdon Press, 1993.
© Minder Chen, 1993~2002
Data Modeling - 3 -
Data Modeling and Database Design: INTRODUCTION
• Systems Development Life Cycle (SDLC)
in a Client/Server Environment
• Introduction to Data Modeling
• Database Development Life Cycle
Overview
© Minder Chen, 1993~2002
Data Modeling - 4 -
Rationales for Data Modeling
• Data is the foundation of modern information
systems enabled by data base technologies.
• Data in an organization exist and can be described
independently of how these data are used.
• Data should be managed as a corporate-wide
resource.
• The types of data used in an organization do not
change very much.
• Data have certain inherent properties which lead to
correct structuring.
• If we structure data according to their inherent
properties, the structure (i.e., data models) will be
stable.
© Minder Chen, 1993~2002
Data Modeling - 5 -
History of Data Modeling
• Importance of Entity-Relationship Modeling Technique
–
–
–
–
–
Database
Data modeling and enterprise-wide data
Data quality
Data updating and accessing tools and procedure
Data sharing culture
• ER modeling technique was first developed by Peter Chen
in 1976
–
–
–
A conceptual/logical data modeling tool
A user-oriented approach
A graphic-based method
• ER modeling technique is the major data modeling method
in Information Engineering and is widely supported by
most of CASE tools.
• Data modeling is the foundation of most database-centered
transaction processing systems and data warehouse
systems
© Minder Chen, 1993~2002
Data Modeling - 6 -
CSC Development Strategies
HIGH
• RE-CREATE new business process &
systems from scratch
Risk
Long Term Reward
Short Term Costs
Degree of Change
• RE-ENGINEER business process &
systems
• RE-DESIGN current systems
• RE-HOST current systems
LOW
© Minder Chen, 1993~2002
• RE-IMAGE current systems
Data Modeling - 7 -
Distribution of Business Function (Logic)
Presentation
Space
Presentation Presentation
Service
Logic
Function
Logic
Data
Service
Data
Space
Server
Client
•
•
•
•
•
Data
Logic
Presentation logic
Local input validation
Output production logic
Local peripheral drivers
Performance critical processing
•
•
•
Functions that access data
on the server
Functions that need input
from multiple users
Functions that coordinate
the work of several user
Issues:
•
•
•
•
•
© Minder Chen, 1993~2002
Distribution of data
Platform-specific capabilities and interoperability
Connectivity capabilities/platform
Frequency of change to codes
Configuration management
Data Modeling - 8 -
C/S Development Methodology
SDLC
C/S
Architecture
performance =>
rules=>
Conceptual
Analysis
Logical
Design
Physical
Design
Work
Flow
Form
Sequences
Forms,
Screens
Application
Logic
Process
Flow
Object
Interaction
Model
Programs,
Procedures
Information
& Data Base
Data
Model
Database
Schema
Tables,
Indexes
User
Interface
Source: David Vaskevitch, Client/Server Strategies, IDG Books, 1993.
Data Modeling - 9 © Minder Chen, 1993~2002
Client/Server Application Development Methodology
Requirements
Information
& Data Base
Processes
Behavior
Workflow
User Interface
Architecture
Application
Design and
Development
Source: David Vaskevitch, Client/Server Strategies, IDG Books, 1993.
© Minder Chen, 1993~2002
Data Modeling - 10 -
Data Modeling (Data Base Design) Process
Information Requirements
Conceptual
DB Design
A conceptual DB schema is a highlevel description of the database,
independent of the particular DBMS.
Conceptual (Enterprise) DB Schema
Logical
DB Design
A logical DB schema is a description of
the structure of the database that can be
processed by a DBMS: relational, network,
or hierarchical.
Logical DB Schema
Physical
DB Design
Physical DB Schema
A physical DB schema is a description of
the implementation of the database in
external memory; it describes the storage
structures and access methods used in
order to effectively access and maintain
data.
Source: Batini, C., Ceri, S., and Navathe, S. B., Conceptual Database Design: An EntityRelationship Approach, The Benjamin/Cummings Publishing Company, Inc., 1992.
Data Modeling - 11 © Minder Chen, 1993~2002
Multiple Perspectives
We use
this data
DATA
ONE
BUSINESS
ACTIVITY
HIRE
EMPLOYEE
PAY
EMPLOYEE
......
......
....
....
EMPLOYEE
© Minder Chen, 1993~2002
We do
these things
PROMOTE
EMPLOYEE
FIRE
EMPLOYEE
Data Modeling - 12 -
Data Model (Entity Relationship Diagram)
Member
Order
sells;
is sold on
Product
placed by;
places
Member
is enrolled under;
applies to
established by;
established
generates;
generated by
is featured in;
features
© Minder Chen, 1993~2002
Agreement
Promotion
sponsors;
is sponsored by
Club
Data Modeling - 13 -
Entity Relationship Diagram: Subject Area and Entity Type
• Subject Area and Subject Area
Diagram
• Entity Types
• Entity Instances
• Finding Entity Types
• Evaluating Entity Types
© Minder Chen, 1993~2002
Data Modeling - 14 -
Subject Area (Submodel)
• A natural area of interest to the business that is centered
on a major resource, inputs, outputs, or activity of the
business.
• It contains a set of entity types.
• We start the data modeling in the ISP stage by identifying
subject areas with names and descriptions.
• In BAA stage, subject areas are used to as high level
grouping of entity types.
• Naming: a subject area is a noun in plural form and often
has the name as the central entity type in the subject area.
• Examples:
Projects
Project Member
Task
Project
© Minder Chen, 1993~2002
Data Modeling - 15 -
Subject Area Diagram
Customers
Products
Raw-materials
Orders
Suppliers
Buyers
Purchase
Orders
Sales-persons
Legends
: Subject Area
: Association
© Minder Chen, 1993~2002
Data Modeling - 16 -
Entity Types
• Definition:
– An entity is an object or event, real or abstract, about
which we would like to store data. Entity is the
abbreviation of entity type. It represent a set of
entity instances which can be described by the
same set of attribute types. The value of the same
attribute for each entity instance may be different.
• Identifying Entity Types
– What information is required by the business?
– Things that are of interest to the business that need
to be remembered in order to manage and track
them.
– Things belong to the same entity type have common
characteristics.
© Minder Chen, 1993~2002
Data Modeling - 17 -
Naming Entity Types
• The name of each entity is in singular form
–
–
–
–
a noun
an adjective + a noun
a noun + a noun => (noun string)
an adjective + a noun + a noun
• Examples
– Customer, Customer Order, Product, Hourly Employee, Project,
Department, Unfilled Customer Order
•
•
•
•
Be clear and concise
Avoid abbreviation
Be consist with user’s terminology
Identify synonyms
–
–
–
–
Customer
Product
Supplier
Teacher
Client
Merchandise
Vendor
Faculty
• Use one name as the official name and document others
as aliases
© Minder Chen, 1993~2002
Data Modeling - 18 -
Exercise: Entity Type Naming
• Courses
• Department
• Customer Order
• PO
© Minder Chen, 1993~2002
Data Modeling - 19 -
Properties of Entity Types
•
•
•
•
•
•
•
•
Name
Description
Identifier
Properties: Estimated number (Max., Min.,
Average) of entity instances
Expected growth rate of entity instances
Subject Area in which the Entity Type
resides
Attributes that describe the Entity Types
Examples of entity type instances
© Minder Chen, 1993~2002
Data Modeling - 20 -
Definition of an Entity Type
• A poor definition of Customer: Anyone
that buys something from the company.
– Can employees be a customer?
– Can a leasor be a customer?
– If the company sold a subsidiary to another
company, does the new owner consider a
customer?
• Good definition should be:
–
–
–
–
–
Compatible
Precise
Concise
Clear
Complete
© Minder Chen, 1993~2002
Data Modeling - 21 -
Good Definition
• Compatible
– Customer: An ORGANIZATION that purchase
PRODUCTs for personal use.
– Distributor: An ORGANIZATION that purchase
PRODUCTs for resale.
• Precision:
– With appropriate qualifiers
– Example: An ORGANIZATION is considered to have
purchase a PRODUCT when we receive a valid
PURCHASE ORDER from it.
• Complete
– ORGANIZATION, PRODUCT, PURCHASE ORDER
need to be defined.
• Concise and Clear
– Use modular definition
© Minder Chen, 1993~2002
Data Modeling - 22 -
Example of Entity Type Descriptions
Entity Type
Description
Customer
Information about all persons or organizations who
purchases
Product
All goods manufactured and sold
Raw-material
Components used to manufacture Products.
Supplier
Vendors of Raw Materials.
Buyer
Company personnel responsible for purchasing
Raw-Materials from Suppliers
© Minder Chen, 1993~2002
Data Modeling - 23 -
Entity Type and Entity Instance (Occurrence)
Entity Types
Vendor
Employee
Course
Department
© Minder Chen, 1993~2002
Entity Instance
ABC Co.
John Smith
Intro. to IE
Marketing Department
Data Modeling - 24 -
Exercise: Entity Types or Entity Instances?
• Maryland
• Organization Unit
• Customer
• President
• Bill Clinton
• Department of Commerce
• Address
© Minder Chen, 1993~2002
Data Modeling - 25 -
Finding Entity Types
•
•
•
•
•
•
Interviews with users
JAD workshops
Business forms
Reports
Computer files using reverse engineering
Operation manuals
© Minder Chen, 1993~2002
Data Modeling - 26 -
Where to Look for an Entity Type?
• Tangible or Intangible Things
– The nouns that are used to describe the problem domain will often
correspond to the major Entity Types of the system, at least at a
high level.
– Examples: Product, Sensor, and Employee, Department, and Sale
Office.
• Resources
– Any resources that an organization needs to manage should be
represented as an Entity Type. Information assists the efficient
and effective use of other resources through improved decision.
– Examples: Inventory, Machine, Bank Account, and Customer.
• Roles Played
– Roles can be played by persons or organizational units.
– Examples: Customers, Managers, and Account representatives.
• Events
– Events are incidents that occur at points in time. An event often
involved an interaction between two Entity Types or an action that
changes the status of an Entity Type.
– Examples: Sale, Delivery, and Registration of a motor vehicle.
© Minder Chen, 1993~2002
Data Modeling - 27 -
BIAIT:
Business Information Analysis and Integration Technique
•
•
•
•
Analysis of Orders
Ordered entities can be a thing, a space, or a skill.
View the order from supplier side.
If an organization receives no orders, it has no reason
for existing.
• An organization unit can receive multiple types of
orders.
• 4 questions about the Supplier:
–
–
–
–
Billing (Cash)?
Deliver Late (Immediate)?
Profile customer?
Negotiate price (Fixed)?
• 3 questions about the Ordered Entity:
– Rented (Sold)?
– Tracked?
– Made to order (Stock)?
Source: Carlson, W. M., "BIAIT: Business Information Analysis and Integration Technique The New Horizon," Data Base, Vol. 10, No. 4, 1979, pp. 3-9.
© Minder Chen, 1993~2002
Data Modeling - 28 -
Criteria for Evaluating an Entity Type
• Need to be remembered by the information system in order
to be functional.
• Can be operated on: CREATE, READ, UPDATE, DELETE.
• Has a set of operations/services that always apply to
change the status of each occurrence of an Entity Type.
• Carry a set of attributes that always apply to describe each
occurrence of an Entity Type.
• Have at least one relationship with other entity type.
• Exist more than one entity occurrence (instance) in an
Entity Type.
• Have at least a unique identifier.
• Domain-based requirements: Something that the system
must have in order to operate. These may be clearly
specified in the problem description or known from subject
matter experts.
© Minder Chen, 1993~2002
Data Modeling - 29 -
Entity Relationship Modeling and Diagramming
• Relationships
• Entity Relationship Diagramming
Notation
• Attributes
• Identifiers
• Partitioning and Entity Subtypes
© Minder Chen, 1993~2002
Data Modeling - 30 -
Relationship (Type)
• Definition
– A Relationship Type is an association among Entity
Types. It indicates that there is a business
relationship between these Entity Types.
– Relationship Membership is the participation of an
Entity Type in a Relationship.
– In IE, a Relationship Type can involve only two Entity
Types (binary relationship). Some other modeling
techniques allow n-ary relationships.
• Examples
–
–
–
–
CUSTOMER places ORDER
ORDER is placed by CUSTOMER
EMPLOYEE works on PROJECT
PROJECT has project member EMPLOYEE
© Minder Chen, 1993~2002
Data Modeling - 31 -
Paring (Relationship Instance)
• Relationship paring is a pair of Entity Instances of two
Entity Types associated by a Relationship Type between
these two Entity Types.
Entity Types
Entity Instance
Student
Student#1
Student#2
Course
Course#A
Course#B
Course#C
Course#D
Relationship
Student
takes
Course
© Minder Chen, 1993~2002
Relationship Paring
Student#1 takes Course#A
Student#1 takes Course#B
Student#1 takes Course#D
Student#2 takes Course#A
Student#2 takes Course#C
Student#2 takes Course#D
Data Modeling - 32 -
Relationship Instances Grouping
• Definition: A collection of pairings of
a Relationship Membership in which
an Entity Instance is involved.
• Examples:
– Student#1 takes Course#A, #B, and #D
– Student#2 takes Course#A, #C, and #D
– Course#A is taken by Student#1 and
Student#2
© Minder Chen, 1993~2002
Data Modeling - 33 -
Relationship Cardinality
One-to-One
E1
E2
E1
E2
1:1
One-to-Many
1:M
Many-to-Many
E1
E2
M:N
© Minder Chen, 1993~2002
Data Modeling - 34 -
Relationship Cardinality
• The number of Entity Instances involved in the Relationship
Instances Grouping in a Relationship Type.
• Three Forms of Cardinality
1. One-to-one (1:1)
DEPARTMENT has MANAGER
Each DEPARTMENT has one and only one MANAGER
Each MANAGER manages one and only one DEPARTMENT
2. One-to-many (1:m)
CUSTOMER places ORDER
Each CUSTOMER sometimes (95%) place one or more ORDERs
Each ORDER always is placed by exactly one CUSTOMER
3. Many-to-many (m:n)
INSTRUCTOR teaches COURSE
Each INSTRUCTION teaches zero, one, or more COURSEs
Each COURSE is taught by one or more INSTRUCTORs
© Minder Chen, 1993~2002
Data Modeling - 35 -
Entity Relationship Diagram (ERD): Notations
Graphical Notations
Cardinality
indicator
zero
one
many
relationship-description
Entity-X
reversed-relation-description
Entity-Y
min max
Translate into two structured statements
Each Entity-X relationship-description cardinality-indicator (one-or-many) Entity-Y
Each Entity-Y reversed-relationship-description (zero-or-one) Entity-Y
Example
is-managed-by
Department
© Minder Chen, 1993~2002
manages
Manager
Data Modeling - 36 -
Optionality of Relationship Memberships
• Whether all entity instances of both entity
types need to participate in relationship
pairing.
• Optionality:
– Mandatory
– Optional
• Example:
– CUSTOMER membership is optional
– ORDER membership is mandatory
places
CUSTOMER
© Minder Chen, 1993~2002
is placed by
ORDER
Data Modeling - 37 -
Relationship Statements
Cardinality
indicator
one
one or more
Graphical Notations
places
CUSTOMER
is placed by
Optionality
indicator
ORDER
zero (sometimes)
one (always)
Each Entity X optionality relationship cardinality Entity Y
Each CUSTOMER sometimes places one or more ORDER.
Each ORDER always is placed by one CUSTOMER.
© Minder Chen, 1993~2002
Data Modeling - 38 -
Defining Relationships
• Name
• Description
• Property
– Cardinality volumes
– Optionality percentage: % of Entity Type X's
instances pairing with Entity Type's Y's
instances
– Transferability: A relationship is transferable if
an entity instance can change its pairing within
the same relationship.
» TRANSFERABLE: An EMPLOYEE can change to a
different DEPARTMENT.
» NON-TRANSFERABLE: An ORDER cannot be
transferred to another CUSTOMER.
© Minder Chen, 1993~2002
Data Modeling - 39 -
ERD: More Examples
(a)
Customer
Product
(b)
places
Order
belongs-to
is-contained-in
contains
manages
Employee
is-managed-by
works-for
Parallel
Relationship
Project
has-project-members
is-consists-of
(c)
Part
contained-in
© Minder Chen, 1993~2002
Involuted or Looped
Relationship
Data Modeling - 40 -
ERD: Alternative Notations
places
Customer
Order
belongs-to
Alternative Notations:
places
Customer
Customer
Customer
© Minder Chen, 1993~2002
belongs-to Order
places
1
belongs-to Order
places
M
Order
Data Modeling - 41 -
Identifying Relationships
• Association between entity types
• Entity types that are used on the
same forms or documents.
• A description in a business document
that has a verb that relates two entity
types
– has
– consists of
– uses
© Minder Chen, 1993~2002
Data Modeling - 42 -
Attributes
• Definition
– Characteristics that could be used to describe Entity Types and
Relationship Types. However, in IE, relationship types are not
allowed to have attributes.
• Naming Conventions:
– Names that have business meaning
– Don't use abbreviation or possessive case, e.g., PN and
Customer's name
– Don't include entity type name because IEF will prefix the attribute
name with entity type name automatically
– Use standard format:
Entity Type Name
(Qualifiers)
Domain Name
Customer
Name
Employee
Starting
Date
• Examples
–
–
–
–
Customer has customer name, address, and telephone number
Product has quantity-on-hand, weight, volume, color, and name.
Employee has SSN, salary, and birthday.
Employee-works-for-project has percentage-of-time, starting-date.
© Minder Chen, 1993~2002
Data Modeling - 43 -
Attributes: Notations
Student
Student ID
Student Name
Birth date
Student ID
Course no.
enrollment
Employee
Employee number
First name
Last name
Employee function
Employee salary
Student
studentID
name
phone
Birth date
Student(Student ID, Student Name, Birth Date)
Finding Attributes:
Attributes are identified progressively during BAA phase.
• Data Analysis
• Activity Analysis
• Interaction Analysis
• Current Systems Analysis
© Minder Chen, 1993~2002
Data Modeling - 44 -
Attribute Value
• Definition
– Attribute Values are instances of Attributes used to describe
specific Entity Instances
• Examples
–
–
–
–
–
Customer Number: 011334
Customer Name: Minder Chen
State: VA
Order Total: $23,000
Sale tax: $250
• An attribute of an entity type should have only one value
at any given time. (No repeating group)
• Avoid using complex coding scheme for an attribute.
For example: PART Number: X-XXX-XXX
Part Type
© Minder Chen, 1993~2002
Material
Sequence Number
Data Modeling - 45 -
Type & Instance
OBJECT TYPE
Entity Type
Entity
Entity Type
OCCURRENCE
Entity Instance
Entity Instance
Entity
Relationship (Type)
Pairing (Relationship Instance)
Attribute (Type)
(Attribute) Value
© Minder Chen, 1993~2002
Data Modeling - 46 -
Attribute Source Categories
• Basic
– Definition: An Attribute Value that cannot be deduced
or calculated.
– Examples: Student name and Birthday
• Derived
– Definition: The Attribute Value can be calculated or
deduced from relationship Groupings or from the
values of other Attributes. The value of a Derived
Attribute changes constantly.
– Examples: Student Age, Account Balance, Number of
courses taken.
• Designed
– Definition: The Attribute is created to overcome the
system constraints. The value of a Designed
Attribute does not change.
– Examples: Student ID, Course number.
© Minder Chen, 1993~2002
Data Modeling - 47 -
Data Types
© Minder Chen, 1993~2002
Data Modeling - 48 -
Properties of Attributes
•
•
•
•
•
•
•
Name
Description
Attribute Source Category: Basic, Derived, Designed
Domain or data type: Text, Number, Date, Time, Timestamp
Optionality: Mandatory or optional
Length and/or precision
Permitted Values (Legal Values)
–
–
Ranges
A set of values (Code Table)
• Default value or algorithm
Tools such as PowerBuilder has additional properties for
table’s columns called extended attributes
– Validation Rule
– Editing Format
– Reporting Format
© Minder Chen, 1993~2002
– Column Heading
– Form Label
– Code Table
Data Modeling - 49 -
Composite Attribute
• Definition:
• Example:
– Telephone Number =
Area code + Exchange + Extension
• There is no support of composite attribute
type most of CASE tools. In such case,
an composite attribute must be stored as
an entity type.
© Minder Chen, 1993~2002
Data Modeling - 50 -
Domain
• A collection of values which can be taken by one
or more attributes.
• Date is the domain for Ordered Date, Student's
Birthday, Employee Starting Date.
• A used defined domain can have customized
validation rules and formats.
• CASE tools such as IEF only supports the
following basic domains:
–
–
–
–
–
Text
Number
Date
Time
Timestamp
© Minder Chen, 1993~2002
Data Modeling - 51 -
Identifiers
• The identifier of an entity type is a set of
attributes and/or relationships whose
values can uniquely identify an entity.
• Entity types should have one identifier.
• Identifiers may consist of
– A single attribute: Student ID
– A set of attributes: Students ID + Course ID
– An attribute and a relationship membership
(implemented as a foreign Key): Order Item No +
Order Has Order Item
© Minder Chen, 1993~2002
Data Modeling - 52 -
Identifying Relationship
product
customer
places
Symbol for
Identifying Relationship
is ordered by
ORDERS
is placed by
contains
has
order
is part of
© Minder Chen, 1993~2002
order
item
Data Modeling - 53 -
Data Modeling Case Study
The following is description by a pharmacy owner:
"Jack Smith catches a cold and what he suspects is a
flu virus. He makes an appointment with his family
doctor who confirm his diagnosis. The doctor
prescribes an antibiotic and nasal decongestant
tablets. Jack leaves the doctor's office and drives to
his local drug store. The pharmacist packages the
medication and types the labels for pill bottles. The
label includes information about customer, the doctor
who prescribe the drug, the drug (e.g., Penicillin),
when to take it, and how often, the content of the pill
(250 mg), the number of refills, expiration date, and the
date of purchase."
Please develop a data model for the entities and relationships
within the context of pharmacy. Also develop a definition
for "prescription". List all your underlying assumptions
used in your data models.
© Minder Chen, 1993~2002
Data Modeling - 54 -
Data Modeling Case Study
Given the following narrative description of entities and
their relationships, prepare a draft entity relationship
diagram (ERD). Be sure any reasonable assumptions that
you are making.
Burger World Distribution Center serves as a supplier
to 45 Burger World franchises. You are involved with
a project to build a database system for distribution.
Each franchise submits a day-by-day projection of
sales for each of Burger World's menu products - the
products listed on the menu at each restaurant - for
the coming month. All menu product require
ingredients and/or packaging items. Based on
projected sales for the store, the system must
generate a day-by-day and ingredients need and then
collapse those needs into one-per-week purchase
requisitions and shipments.
© Minder Chen, 1993~2002
Data Modeling - 55 -
Data Modeling Process
• List entity types
• Create relationships
– Pick a central entity type
– Work around the neighborhood
» Add entity types to the diagram
» Build relationships among them
– Determine cardinalities of relationships
• Find/Create identifiers for each entity type
• Add attributes to the entity type in the data
model
• Analyze and revise the data model
© Minder Chen, 1993~2002
Data Modeling - 56 -
Classifying Attribute and Partitioning
• An Entity Subtype A collection of Entities of the same
type to which a narrower definition and additional
Attributes and Relationships apply. An Entity Subtype
inherits (retains) all the Attributes and Relationships of its
parent Entity Type.
• Classifying Attribute: An attribute of the Base Entity Type
whose values partition the Entity Instances into
Subtypes.
• Partitioning: A basis for subdividing one entity type into
subtypes. The process of dividing an Entity Type into
several Subtypes based on a Classifying Attribute is
called Partitioning.
• The Classifying Attribute is recorded as a property of the
Partitioning and it appears on the diagram.
© Minder Chen, 1993~2002
Data Modeling - 57 -
Characteristics of Partitioning
• Optionality:
– Mandatory: Every Entity instances of the Entity Type
must fall into one of the Subtype categories.
– Optional: Not every Entity instances of the Entity
Type must fall into one of the Subtype categories.
• Entity Life Cycle: The states through which an
Entity Type can pass are used for Partitioning.
• Enumeration:
– Fully enumerated
– Not fully enumerated
• Classifying Attributes and Values
– Classifying Attribute: Type
– D: Domestic Subtype
– F: Foreign Subtype
© Minder Chen, 1993~2002
Data Modeling - 58 -
Partitioning and Entity Subtype: Notation
ATTRIBUTE:
Employee ID
Name
Birthday
ATTRIBUTE:
Teaching Quality Indicator
Employee
Type
Lecturer
Staff
Teaches
Seminar
Status
Wage
Hourly
© Minder Chen, 1993~2002
Data Modeling - 59 -
Alternative Notations for Subtypes
IDEF1X
PowerDesigner
Complete Category
All categories shown
employeeID
name
phone
Account
Account Number
Name
employee type
full-time-emp
employeeID (FK)
salary
© Minder Chen, 1993~2002
part-time-emp
employeeID (FK)
hourly-rate
Savings
Rate
Checking
Fees
Data Modeling - 60 -
Entity Subtype Partitioning
Life Cycle Partitioning
Order
Order Status
Taken
Scheduled
Shipped
Billed
Paid
© Minder Chen, 1993~2002
Data Modeling - 61 -
Normalization
• A data base is a model or an image of the
reality.
• Logical Data Base Design is a process of
modeling and capturing the end-user
views of an application domain and
synthesis them into a data base structure.
• Normalization is a logical data base design
method.
• The basis for normalization is the
functional dependencies among attributes
in a table.
© Minder Chen, 1993~2002
Data Modeling - 62 -
SQL Terminology
Column
Product Table
Row
p_no
product_name
quantity
price
101
Color TV
24
500
201
B&W TV
10
250
202
PC
5
2000
Create a table in SQL
CREATE TABLES
(p_no
CHAR(5) NOT NULL,
product_name
CHAR(20),
quantity
SMALLINT,
price
DECIMAL(10, 2));
© Minder Chen, 1993~2002
Data Modeling - 63 -
SQL Terminology
Set Theory
Relational DB
File
Example
Relation
Table
File
Product_table
Attribute
Column
Data item
Product_name
Tuple
Row
Record
Product_101's info.
Domain
Pool of legal values
Data type
DATE
© Minder Chen, 1993~2002
Data Modeling - 64 -
SQL Principles
• The result of a SQL query is always a table (View
or Dynamic Table)
• Rows in a table are considered to be unordered
• Dominate the markets since late 1980s
• Can be used in interactive programming
environments
• Provide both data definition language (DDL) and
data manipulation language (DML)
• A non-procedural language
• Can be embedded in 3GL:
– Embedded SQL
– Dynamic SQL
© Minder Chen, 1993~2002
Data Modeling - 65 -
SQL: Data Definition Language (DDL)
CREATE
DROP
ALTER
© Minder Chen, 1993~2002
TABLE
VIEW
INDEX
DATABASE
TABLE
Data Modeling - 66 -
SQL: Introduction
• A relational data base is perceived by its users
as a collection of tables
• E. F. Codd 1969
• Dominate the markets since late 1980s
• Strengths:
– Simplicity
– End-user orientation
– Standardization
– Value-based instead of pointer-based
– Endorsed by major computer companies
• Most CASE products support the development
of relational data base centered applications
© Minder Chen, 1993~2002
Data Modeling - 67 -
SQL: Data Manipulation Language (DML)
SELECT
UPDATE
INSERT
DELETE
p_no
101
201
202
product_name
Color TV
B&W TV
PC
quantity
24
10
5
price
500
250
2000
The Generic Form of the SELECT Statement
SELECT [DISTINCT] column(s)
FROM table(s)
[WHERE conditions]
[GROUP BY column(s) [HAVING condition]]
[ORDER BY column(s)]
© Minder Chen, 1993~2002
Data Modeling - 68 -
Database Table
• The following code retrieves only the Last Name and the
Employee ID where the Employee ID is greater than 5. The
records are retrieved in descending order.
SELECT LastName, EmployeeID
FROM Employees
WHERE EmployeeID > 5
ORDER BY EmployeeID DESC
© Minder Chen, 1993~2002
Data Modeling - 69 -
WHERE Clause
• WHERE: Use the Where clause to limit the
selection. The # symbol indicates literal date
values.
SELECT * FROM Employees
WHERE LastName = "Smith"
SELECT Employees.LastName FROM Employees
WHERE Employees.State in ('NY','WA')
SELECT OrderID FROM Orders
WHERE OrderDate BETWEEN #01/01/93# AND
#01/31/93#
© Minder Chen, 1993~2002
Data Modeling - 70 -
Keys
• A key, also called identifier, is an Attribute or a
Composite Attribute that can be used to
uniquely identify an instance of an entity type.
• Examples:
Entity Type
Key
Warehouse
Product
Student
Ship
Warehouse Number
Product Number
Student ID or SSN
Name and Port of Registration
Stock of Product
Product Number and Warehouse No.
© Minder Chen, 1993~2002
Data Modeling - 71 -
Types of Key
• Primary Key: A unique key is an attribute or a set
of attributes that has been used by the DBMS as
the identifier of a table.
• Candidate (Alternative) Key: An attribute or a set
of attributes that could have been used as the
primary key of a table.
• Secondary (Index) Key: An attribute or a set of
attributes that has been used to construct the
data retrieval index.
• Concatenated (Combined or Composite) Key: A
set of attributes that has been used as the key.
• Foreign Key: An attribute or a set of attributes
that is used as the primary key in another table.
© Minder Chen, 1993~2002
Data Modeling - 72 -
Purposes of Normalization
• Avoid maintenance problems such as
Update .
• Insert: There may be no place to insert new
information.
• Delete: Some important information will be
lost by deletion.
• Update: Inconsistency may occur because
of the existence of data redundancy.
• Provide maximum flexibility to meet future
information needs by keeping tables
corresponding to object types in their
simplified forms.
© Minder Chen, 1993~2002
Data Modeling - 73 -
A Common Sense Approach to Normalization
• Don't rush to put all the information in one
table.
• Create a table to correspond to a class of
a simple object type that should exist by
itself, i.e., "one fact in one place."
• Include common fields (links) as ways of
joining information from several related
tables.
• Avoid redundancy by using links to
retrieve data from related tables.
© Minder Chen, 1993~2002
Data Modeling - 74 -
Normalization Theory
• Normalization is a process of systematically
breaking a complex table into simpler ones.
• It is built around the concept of normal forms.
• A relation is in a particular normal form if it
satisfies a specific set of constraints such as
dependencies among attributes in the relation.
• For x is an integer and x > 1,
if a relation is in x-NF than it is in (x-1)-NF.
• Higher order normal forms are usually more
desirable than lower order normal forms.
• Normalization process usually starts from
complex relations which are usually drawn
from some existing documents such as
business forms.
© Minder Chen, 1993~2002
Data Modeling - 75 -
A Business Form
© Minder Chen, 1993~2002
Data Modeling - 76 -
An Informal Example of Normalization
• A CUSTOMER ORDER contains the following
information:
–
–
–
–
–
–
–
–
OrderNo
OrderDate
CustNo
CustAddress
CustType
Tax
Total
one or more than one Order-Item which has
»
»
»
»
»
ProductNo
Description
Quantity
UnitPrice
Subtotal.
© Minder Chen, 1993~2002
Data Modeling - 77 -
Solution
Unnormalized table
(OrderNo, OrderDate, CustNo, CustAddress, CustType, Tax, Total,
1{ProductNo, Description, Quantity, UnitPrice,Subtotal}n)
Remove repeating group
1st NF
(OrderNo, ProductNo, Description, Quantity, UnitPrice, Subtotal)
Remove partial FD
2nd NF
(OrderNo, OrderDate, CustNo, CustAddress, CustType, Tax, Total)
Remove transitive FD
(OrderNo, ProductNo, Quantity, UnitPrice, Subtotal)
(ProductNo, Description, UnitPrice)
3rd NF
© Minder Chen, 1993~2002
(OrderNo, OrderDate, CustNo, Tax, Total)
(CustNo, CustAddress, CustType)
Data Modeling - 78 -
Unnormalized Form
• A relation that has multi-valued attributes (repeating
groups).
• Normalization Process: Remove Multi-value Attributes
• If an unnormalized relation R has a primary key K and a
multi-value attribute M, the normalization process is:
– The multi-value attribute M should be removed from R.
– A new relation will be created with (K,M) as the primary key of
the relation.
– There may be some other attributes associated with this new
relation.
– R will then be at least in 1NF.
• Example: An Employee relation has an attribute
language-spoken. For some employees there may be
more than one language that they can speak.
EMP (employeeID, empName, empAddress, (language1, language2, ...))

EMP (employeeID, empName, empAddress)
EMP-LANGUAGE (employeeID, language, skillLevel)
© Minder Chen, 1993~2002
Data Modeling - 79 -
How Do You Remove the Repeating Groups?
CREATE TABLE MEM_CONDITION (
MEMBER#
VARCHAR2(12)
CASE#
VARCHAR2(16)
DIAG_ARRAY_1
VARCHAR2(6)
DIAG_ARRAY_2
VARCHAR2(6)
DIAG_ARRAY_3
VARCHAR2(6)
DIAG_ARRAY_4
VARCHAR2(6)
DIAG_ARRAY_5
VARCHAR2(6)
DIAG_EX_ARRAY_1
VARCHAR2(2)
DIAG_EX_ARRAY_2
VARCHAR2(2)
DIAG_EX_ARRAY_3
VARCHAR2(2)
DIAG_EX_ARRAY_4
VARCHAR2(2)
DIAG_EX_ARRAY_5
VARCHAR2(2)
DRUG_ARRAY_1
VARCHAR2(12)
DRUG_ARRAY_2
VARCHAR2(12)
DRUG_ARRAY_3
VARCHAR2(12)
DRUG_ARRAY_4
VARCHAR2(12)
DRUG_ARRAY_5
VARCHAR2(12)
LC_ARRAY_1
VARCHAR2(4)
LC_ARRAY_2
VARCHAR2(4)
LC_ARRAY_3
VARCHAR2(4)
LC_ARRAY_4
VARCHAR2(4)
LC_ARRAY_5
VARCHAR2(4)
MEM_REVIEW
VARCHAR2(4)
OP#
VARCHAR2(4)
© Minder
Chen, 1993~2002
NOT
NOT
NOT
NOT
NOT
NOT
NOT
NOT
NOT
NOT
NOT
NOT
NOT
NOT
NOT
NOT
NOT
NOT
NOT
NOT
NOT
NOT
NOT
NOT
NULL,
NULL,
NULL,
NULL,
NULL,
NULL,
NULL,
NULL,
NULL,
NULL,
NULL,
NULL,
NULL,
NULL,
NULL,
NULL,
NULL,
NULL,
NULL,
NULL,
NULL,
NULL,
NULL,
NULL,
Data Modeling - 80 -
Functional Dependency
• Notation: R.X => R.Y
• Definition: Attribute Y of Relation R is
functionally dependent on the
Attribute X of Relation R when there
is each value of R.Y associated with
no more than one value of R.X. R.X
and R.Y may be composite attributes.
• Description:
– R .Y is functionally dependent on R.X
– R.X functionally determines R.Y
© Minder Chen, 1993~2002
Data Modeling - 81 -
Full & Partial Dependency
• R.A => R.B
• If B is not functionally dependent on
any subset of A (other than A itself), B
is fully dependent on A in R.
• If B is functionally dependent on a
subset of A (other than A itself), B is
partially dependent on A in R.
© Minder Chen, 1993~2002
Data Modeling - 82 -
First Normal Form (1NF)
• A relation R is in the first normal form (1NF) if and only if all
attributes of any tuple in R contain only atomic values.
• Normalization Process:
– Remove Partial Functional Dependencies
– If R is in 1NF and has a composite primary key (K1,K2), an attribute
P is functionally dependent on K1 (K1 => P) (i.e., P is partially
dependent on (K1, K2)), the normalization process is:
– The attribute P should be removed from R and a new relation will
be created with K1 as the primary key and P as a non-key attribute.
– A relation that is in 1NF and not in 2NF must have a composite
primary key.
• Example
– Supplier-Part relation has attributes supplier#, part#, qty, city,
distance, where (supplier#, part#) is the key.
– City is partially dependent on supplier#.
SUPPLIER-PART (supplier#, part#, qty, city, distance)

SUPPLIER-PART (supplier#, Part#, qty)
SUPPLIER (supplier#, city, distance)
© Minder Chen, 1993~2002
Data Modeling - 83 -
Non-loss Decomposition
• Normalization is a reduction (decomposition)
process that replaces a relation by suitable
projections. Each of the projection is a new
relation that is in a further normalized form than
the original relation. The collection of
projections is equivalent to the original relation.
• The original relation can always be recovered by
taking the natural join of these projections.
• Any information that can be derived from the
original relation can also be derived from the
further normalized relations. The converse is not
true.
• The process is reversible because no
information is loss in the reduction process.
© Minder Chen, 1993~2002
Data Modeling - 84 -
Transitive Dependency
In a relation R,
if R.A =>R.B and R.B => R.C
then attribute C is said to be transitively
dependent on attribute A.
© Minder Chen, 1993~2002
Data Modeling - 85 -
Second Normal Form (2NF)
• A relation R is in the second normal form (2NF) if and
only if it is in 1NF and every non-key attribute is fully
dependent on the primary key.
• Normalization Process: Remove Transitive
Dependencies
• If R is in 2NF and has two non-key attributes A1 and A2
where A2 is functionally dependent on A1 (A1 => A2).
The A2 should be removed from R and a new relation
will be created with A1 as the primary key and A2 as a
non-key attribute.
• Example
– Supplier relation has attributes supplier#, city, distance, where
supplier# is the key and distance to a supplier can be
determined by the city of the supplier.
SUPPLIER (supplier#, city, distance, quality_level)

SUPPLIER (Supplier#, city, quality_level)
CITY-DISTANCE (city, distance)
© Minder Chen, 1993~2002
Data Modeling - 86 -
Third Normal Form (3NF)
• A relation R is in the third normal form (3NF) if
and only if the non-key attributes (if there is any)
are fully dependent on the primary key of R (i.e.,
R is in its 2NF) and are mutually independent.
• Heuristic to Check Whether a Relation Is in 3NF
– All the non-key attributes (which are not multi-value
attributes) are dependent on the (primary) key, the
whole key, and nothing but the key.
Explanation
• All the non-key attributes have atomic value and dependent on the key
(1NF - No multi-value attribute),
• the whole key, (2NF - No Partially Functional Dependency)
• and nothing but the key (3NF - No Transitive Functional Dependency)
© Minder Chen, 1993~2002
Data Modeling - 87 -
Normalization Process
Unnormalized Form
B
A
C
D
E
F
G
H
remove repeating groups
1NF
F
A
G
B
A
H
remove partial dependencies
2NF
C
D
E
remove transitive dependencies
3NF
3NF
3NF
A
F
G
F
© Minder Chen, 1993~2002
H
D
3NF
A
B
C
E
D
Data Modeling - 88 -
Normalization: Pros and Cons
• Pros
– Reduce data redundancy & space required
– Enhance data consistency
– Enforce data integrity
– Reduce update cost
– Provide maximum flexibility in responding ad hoc queries
• Cons
– Many complex queries will be slower because joins have to be
performed to retrieve relevant data from several normalized
tables
– Programmers/users have to understand the underlying data
model of an database application in order to perform proper
joins among several tables
– The formulation of multiple-level queries is a nontrivial task.
© Minder Chen, 1993~2002
Data Modeling - 89 -
Join Two Tables
SELECT Categories.CategoryName, Products.ProductName
FROM Categories, Products
WHERE Products.CategoryID = Categories.Category ID
© Minder Chen, 1993~2002
Data Modeling - 90 -
Tables in Relational DB
• Identify Primary Keys and Foreign Keys in the
following Tables!!!
ID
ID
ID
© Minder Chen, 1993~2002
Data Modeling - 91 -
Join Tables
SELECT Orders.OrderID, Orders.CustID,
LastName, Firstname, Orders.ItemID, Description
FROM
Customer, Orders, Inventory
WHERE Customer.CustID = Orders.CustID AND
Orders.ItemID = Inventory.ItemID
ORDER BY CustID, Orders.ItemID
ID
© Minder Chen, 1993~2002
ID
Data Modeling - 92 -
Foreign Keys & Primary Keys in a Sample Access Database
© Minder Chen, 1993~2002
Data Modeling - 93 -
An Example of a Complex Query
Please list name and phone number of customers
who have ordered product number 007.
SELECT customer_name, customer_phone
FROM customer
WHERE customer_number IN
SELECT customer_number
FROM order
WHERE order_no IN
SELECT order_no
FROM orderItem
WHERE product_number = 007
© Minder Chen, 1993~2002
Data Modeling - 94 -
Denormalization
• The process of intentionally backing away from
normalization to improve performance. Denormalization
should not be the first choice for improving performance
and should only be used for fine tuning a database for a
particular application.
• Requirements
– Prior normalization
– Knowledge of data usage
• Benefits
–
–
–
–
Minimize the need for joins
Reduce number of tables
Reduce number of foreign keys
Reduce number of indices
• Knowledge of Data Usage
–
–
–
–
–
How often are two data items needed together
How many rows are involved
How volatile is denormalized data
How important is visibility of data to users
What is the minimum response time and frequency of an query
© Minder Chen, 1993~2002
Data Modeling - 95 -
De-normalization: An Example
JOIN
R1
R2
Denormalization
R1 * R 2
R2
• Where:
– R1 (ProductNo, SupplierNo, Price)
– R2 (SupplierNo, Name, Address, Phone)
– R1*R2 (ProductNo, SupplierNo, Name, Address, Phone, Price)
• R2 should be kept to prevent data loss.
• Data redundancy in R1*R2 and R2 could cause potential
data inconsistency problems if the redundant data in
these two tables are not maintained properly.
© Minder Chen, 1993~2002
Data Modeling - 96 -
Data Model Refinement and Transformation
•
•
•
•
•
Data Model Refinement
Associative Entity Type
Removing Many-to-Many Relationships
Keys
Transformation to Relational Databases
© Minder Chen, 1993~2002
Data Modeling - 97 -
Refinement of a Data Model: Analysis and Simplification
•
•
•
•
•
•
•
Isolated Entity Type
Solitary Entity Type
One-to-One Relationship
Redundant Relationship
Multi-Valued Attributes
Attribute with Attributes
Many-to-Many Relationship
© Minder Chen, 1993~2002
Data Modeling - 98 -
Isolated Entity Type
• An Entity Type that does not participate in a
Relationship.
• Since every Entity Type should participate in at
least one Relationship, there exist two
alternatives:
– Identify a relevant Relationship
– Remove the Entity Type from the model
© Minder Chen, 1993~2002
Data Modeling - 99 -
Solitary Entity Type
• An Entity Type that has only one Entity Instance.
Examples: Computer Center, Sales Tax, and Current
Order Number. Solitary Entity Types may be too
restrictive.
• Alternatives:
– Introduce another Entity Type with a wider scope.
– Computer Center ==> Organization Unit
– Define it as an Attribute of an Entity Type.
– Sales Tax ==> Sales Tax of Order
– Define it as a data element in an parameter table. A parameter
table has only one row.
– Current Order Number ==> Current Order Number of Parameter
Table
© Minder Chen, 1993~2002
Data Modeling - 100 -
Evaluate One-to-One Relationship
• It may be an unnecessary relationship between
two Entity Types if they have the same attribute
and relationships (i.e., they are identical).
• It should be then combined into one Entity Type.
Maybe Incorrect
Purchase
Request
becomes
has request
Purchase
Order
Correct
Purchase
Order
© Minder Chen, 1993~2002
Data Modeling - 101 -
Redundant Relationship
Is this relationship redundant?
has ordered
product
customer
is ordered by
places
ORDERS
is placed by
contains
has
order
is part of
order
item
Differences in timing of an entity type in its life cycle:
• Implemented as separate entity types or use subtypes
• Use value of attributes or additional attributes to differentiate them
© Minder Chen, 1993~2002
Data Modeling - 102 -
Redundant Relationship
Redundant
Product
stocks
Warehouse
is held as
holds
Stock
contains
is held in
Non-redundant
Product
is contained in
contains
Order Line
is contained in
contains
Order
is placed by
is contained in
contains
© Minder Chen, 1993~2002
Order History
places
is contained in
contains
Customer
Data Modeling - 103 -
Multi-Valued Attribute
• Definition
–
An Attribute that may have more than one value at a time is called a
multi-valued attribute.
• Solution:
–
Create an Entity Type for the multi-valued attribute
• Example:
–
Languages spoken by an Employee
–
–
Employee(ID, Name, Phone, Languages)
Employee(111, “John Smith”, 201-999-8888, (English, Chinese))
–
–
Employee(ID, Name, Phone)
Employee(111, “John Smith”, 210-999-8888)
–
–
–
Employee_language(ID, Language)
Employee_language(111, English)
Employee_language(111, Chinese)
© Minder Chen, 1993~2002
Data Modeling - 104 -
Attribute with Attributes
• An Attribute that can be described by other
Attributes is called an attribute with
attributes.
• Example:
– College Degree by an Employee
– (John Smith has a College Degree in Computer
Sciences from George Mason University)
• Solution:
– Create an Entity Type to avoid an Attribute with
Attributes.
– Add new attributes to the existing Entity Type.
© Minder Chen, 1993~2002
Data Modeling - 105 -
Associative Entity Type
• An Associative Entity Type is an Entity Type
whose existence is meaningful only if it
participates in several (>=2) Relationship Types
at the same time.
• Associative Entity Types are often introduced to
represent additional information in many-tomany Relationships or to decompose a many-tomany Relationship into two one-to-many
Relationships.
• Associative Entity Types are also used to
represent n-ary Relationships in a binary data
model.
© Minder Chen, 1993~2002
Data Modeling - 106 -
Remove Many-to-Many Relationship
Given
Order
contains
belongs-to
Product
Why?
• There is no place to attach Attributes that are required to describe a
many-to-many Relationship.
• It is difficult to translate many-to-many Relationships into relational
tables automatically.
How?
A many-to-many relationship can be decomposed into two
one-to-many Relationships by creating an Associative Entity
Type between the existing two Entity Types.
has
contains
Order
belongs to
© Minder Chen, 1993~2002
Order Line
Product
is contained in
Data Modeling - 107 -
Remove Many-to-Many Relationships: Exercises
Remove the many-to-many relationship from the
following ER diagrams
(a)
Product
(b)
Student
(c)
has-sources
offers
takes
is-taken-by
Supplier
Course
consists-of
Part
is-contained-in
© Minder Chen, 1993~2002
Data Modeling - 108 -
Bills of Material
A
Part
consists-of
is-a-component-in
C
B
D
E
D
3
1
Product Structure
1
2
F
2
2
Product-Structure(Parent Part No, Child Part No, Quantity)
A
A
B
B
C
C
© Minder Chen, 1993~2002
B
C
D
E
D
F
2
1
1
3
2
2
Data Modeling - 109 -
Using an Associative Entity Type to Represent an N-ary Relationship
involved in
product usage
involved in
product usage
Product
Project
involved in
product usage
Supplier
Product Usage is an Associative Entity Type for a 3-ary Relationship.
is used in
Product
uses
Product Usage
Project
supplies
Supplier
© Minder Chen, 1993~2002
Data Modeling - 110 -
Translate Data Models to Relational Tables
Given
has
contains
Order
Order Line
belongs to
Key: Order#
Attribute:
Order date
Customer ID
Sale Person ID
is contained in
Key: Order#+Product#
Attribute:
Quantity
Unit Price
Product
Key: Product#
Attribute:
Description
Qty-on-hand
Unit Price
Relational Tables Created
CREATE TABLE ORDER
(OrderNo
CHAR(10)
OrderDate
DATE,
CustomerID
CHAR(10),
SalePersonID CHAR(10));
© Minder Chen, 1993~2002
NOT NULL,
Data Modeling - 111 -
Transformation of Data Models to Relational Database Tables
• The entire, or part of, a data (entity-relationship)
model can be translated into a normalized
database design.
• Objects Created
– At most one relational database
– One or more relations (tables)
– Data structures (DDL) representing the elements
(attributes) and the primary key of each relation
– Data type of each data elements
© Minder Chen, 1993~2002
Data Modeling - 112 -
Heuristics of Transformation
• A table is created for each Entity Type in the ER diagram.
• A table is created for each multi-valued attribute.
• Relationship Types are implemented as tables or as foreign
keys in other tables.
• Many-to-many relationship types are translated into tables.
• Foreign keys are used for implementing one-to-one and
one-to-many Relationship Types.
• For one-to-many Relationship Types, the foreign key is
placed in the table that represents the Entity Type on the
"many" end of the Relationship Type.
• For identifying one-to-many Relationship Types, the PK of
the "one" table migrate to the "many" table as a FK and the
FK is also part of the PK of the "many" table.
• For non-identifying one-to-many Relationship Types, the PK
of the "one" table migrate to the "many" table as a FK and
the FK is a non-key attribute of the "many" table.
© Minder Chen, 1993~2002
Data Modeling - 113 -
PowerDesign: Data Architect
Generation/Reverse Engineering:
CDM, PDM
Generation & Reverse Engineering:
Database Structure
Triggers & Stored Procedures
Generation & Reverse Engineering:
Extended Attributes
Database Structure
Target
4GL Tool
http://www.powersoft.com/
Target DBMS
© Minder Chen, 1993~2002
Data Modeling - 114 -
PowerDesigner
© Minder Chen, 1993~2002
Data Modeling - 115 -
A Sample Conceptual Data Model
Team
Team number
Speciality
Division
Division number
Division name
Division address
Is membersupervises
of
Uses
Employee
Employee number
First name
Last name
Employee function
Employee salary
Conceptual Data Model
Project : Management
Model : Project Management
Author : User Version 6.x 7/21/98
Customer
Customer number
Customer name
Customer address
Customer activity
Customer telephone
Customer fax
Subcontract
Activity
Start date
End date
Is manager of
Project
Project number
Project name
Project label
Material
Material number
Material name
Material type
composes
composed of
© Minder Chen, 1993~2002
Task
Task name
Task cost
Participate
Start date
End date
Data Modeling - 116 -
Notations
Entity
Employee
Employee number
First name
Last name
Employee function
Employee salary
Relationship
Employee
Employee number
First name
Last name
Employee function
Employee salary
Div ision
Div ision number
Div ision name
Div ision address
One-to-many
© Minder Chen, 1993~2002
Data Modeling - 117 -
More on Relationships
Employee
Employee number
First name
Last name
Employee function
Employee salary
is a member of
Team
member
Team number
Specialty
Many-to-many cardinality
Project
Project number
Project name
Project label
Task
Task name
Task cost
A project 'contains’ one or more tasks, and a task's
existence is dependent on the project.
© Minder Chen, 1993~2002
Data Modeling - 118 -
Advanced Concepts
Account
Account Number
Name
Material
Material number
Material name
Material type
composes
Savings
Rate
composed of
Checking
Fees
Employee
Subtype
Employee number
First name
Last name
Employee function
Employee salary
Reflexive relationship
© Minder Chen, 1993~2002
Data Modeling - 119 -
Define Entities
© Minder Chen, 1993~2002
Data Modeling - 120 -
Define Attributes
© Minder Chen, 1993~2002
Data Modeling - 121 -
Check Parameters
© Minder Chen, 1993~2002
Data Modeling - 122 -
Relationship Definition
© Minder Chen, 1993~2002
Data Modeling - 123 -
Dependent (Identifying Relationship)
• Check the box to
indicate a
dependent
relationship. "One
to many" and
"mandatory" are
automatically
chosen as the
cardinality and
optionality.
• At the physical data
model level, the
parent entity type's
primary key (PK) will
become part of the
dependent child
entity type's PK. It
is also a foreign key.
© Minder Chen, 1993~2002
Data Modeling - 124 -
Inheritance (Super-Type and Sub-Type)
© Minder Chen, 1993~2002
Data Modeling - 125 -
Generate Physical Data Model
© Minder Chen, 1993~2002
Data Modeling - 126 -
Physical Data Model
Conceptual
Data Model
Transformation
Physical
Data Model
Division
Division number
belongs to
Division name
Division address
Employee
Employee number
First name
Last name
Employee function
Employee salary
Do not define FK
as an attribute.
DIVISION
DIVNUM <pk>
DIVNAME
DIVADDR
DIVNUM = DIVNUM
EMPLOYEE
EMPNUM
<pk>
DIVNUM
<fk>
EMPFNAM
EMPLNAM
EMPFUNC
EMPSAL
DIVNUM automatically migrates as a foreign key.
© Minder Chen, 1993~2002
Data Modeling - 127 -
Dependent Relationship
Project
Conceptual
Data Model
Transformation
Project number
Project name
Project label
Physical
Data Model
PROJECT
PRONUM
<pk>
CUSNUM
<fk>
EMPNUM
<fk>
ACTBEG
ACTEND
PRONAME
PROLABL
© Minder Chen, 1993~2002
PRONUM = PRONUM
Task
Task name
Task cost
TASK
PRONUM
<pk,fk>
TSKNAME <pk>
ACTBEG
ACTEND
TSKCOST
Data Modeling - 128 -
Physical Data Model
Physical Data Model
Project:Management
Model :Project Management
Author :User Version6.x 7/21/98
DIVISION
DIVNUM <pk>
DIVNAME
DIVADDR
TEAM
TEANUM <pk>
TEASPE
TEANUM = TEANUM
MEMBER
TEANUM <pk,fk>
EMPNUM <pk,fk>
CUSTOMER
CUSNUM <pk>
CUSNAME
CUSADDR
CUSACT
CUSTEL
CUSFAX
EMPLOYE_MATERIAL
MATERIAL.MATNAME
char(30)
PROJ.EMPLOYEE.EMPNUM
numeric(5)
PROJ.EMPLOYEE.EMPFNAM char(30)
PROJ.EMPLOYEE.EMPLNAM char(30)
PROJ.EMPLOYEE.EMPFUNC char(30)
MATERIAL
PROJ.EMPLOYEE
USED
DIVNUM = DIVNUM
CUSNUM = CUSNUM
EMPNUM = EMPNUM
USED
MATNUM <pk,fk>
EMPNUM <pk,fk>
EMPNUM = EMP_EMPNUM
EMPNUM = EMPNUM
MATNUM = MATNUM
MATERIAL
MATNUM <pk>
MATNAME
MATTYPE
MATNUM = CPD_MATNUM
TNUM = CPN_MATNUM
COMPOSE
CPD_MATNUM <pk,fk>
CPN_MATNUM <pk,fk>
© Minder Chen, 1993~2002
EMPLOYEE
EMPNUM
<pk>
EMP_EMPNUM <fk>
DIVNUM
<fk>
EMPFNAM
<ak>
EMPLNAM
<ak>
EMPFUNC
<ak>
EMPSAL
PROJECT
PRONUM <pk>
CUSNUM <fk>
EMPNUM = EMPNUM EMPNUM
<fk>
ACTBEG
ACTEND
PRONAME
PROLABL
EMPNUM = EMPNUM
PARTICIPATE
PRONUM <pk,fk> PRONUM = PRONUM
TSKNAME <pk,fk> TSKNAME = TSKNAME
EMPNUM <pk,fk>
PARBEG
PAREND
PRONUM = PRONUM
TASK
PRONUM <pk,fk>
TSKNAME <pk>
ACTBEG
ACTEND
TSKCOST
Data Modeling - 129 -
References (Relationships at the Physical Data Model)
© Minder Chen, 1993~2002
Data Modeling - 130 -
Referential Integrity
• The arrow is
pointing from
the table
containing
the foreign
key to the
table where
the foreign
key is used
as a primary
key.
© Minder Chen, 1993~2002
Data Modeling - 131 -
Deletion Rules
• Update Constraints
• Delete Constraints
–None
–Restrict
–Cascade
–Set null
–Set Default
© Minder Chen, 1993~2002
Data Modeling - 132 -
Generation of Oracle SQL DLL
-- ============================================================
-- Database name: PROJECT
-- DBMS name:
ORACLE Version 8
-- Created on: 7/21/98 8:59 PM
-- ============================================================
-- ============================================================
-- Table: DIVISION
-- ============================================================
create table ADMIN.DIVISION
(
DIVNUM
numeric(5)
not null
constraint CKC_DIVNUM_DIVISION check (DIVNUM >= '1'),
DIVNAME char(30)
not null,
DIVADDR char(80)
null ,
constraint PK_DIVISION primary key (DIVNUM)
)
/
© Minder Chen, 1993~2002
Data Modeling - 133 -
Referential Integrity
alter table PROJ.EMPLOYEE
add constraint FK_EMPLOYEE_CHIEF_EMPLOYEE foreign key (EMP_EMPNUM)
references PROJ.EMPLOYEE (EMPNUM)
/
alter table PROJ.EMPLOYEE
add constraint FK_EMPLOYEE_BELONGS_T_DIVISION foreign key (DIVNUM)
references ADMIN.DIVISION (DIVNUM)
/
alter table PROJ.PROJECT
add constraint FK_PROJECT_SUBCONTRA_CUSTOMER foreign key (CUSNUM)
references PROJ.CUSTOMER (CUSNUM)
/
alter table PROJ.PROJECT
add constraint FK_PROJECT_IS_RESPON_EMPLOYEE foreign key (EMPNUM)
references PROJ.EMPLOYEE (EMPNUM)
/
alter table PROJ.TASK
add constraint FK_TASK_BELONGS_T_PROJECT foreign key (PRONUM)
references PROJ.PROJECT (PRONUM)
/
© Minder Chen, 1993~2002
Data Modeling - 134 -
Physical Database Design Activities
Define Tables & Columns
Define Keys
Identify Critical Transactions
Add Columns:
Manipulate Tables:
Add Tables:
• Redundant columns
• Derived data columns
• Collapse tables
• Supertypes & subtypes
• Derived data
tables
Handle Integrity Issues:
• Row uniqueness & Domain restrictions
• Referential integrity & Generate sequence numbers
• Derived and redundant data
Controlling Access
Source: Gillete, Rob, etc., Physical
Database Design for Sybase SQL
Server, Prentice Hall, 1995.
© Minder Chen, 1993~2002
Manage Objects:
• Sizes
• Placement
Data Modeling - 135 -
Architecture of Data Warehouse
Data Warehouse
Metadata
Info. Directory
Corporate
Operational
Database
Data
Replication
& Cleansing
Summarized
Derived
Informational
Database
Detailed
Past
Data Bridging/
Transformation
© Minder Chen, 1993~2002
•
•
•
•
•
End User
Access and
OLAP frontend Tools
Projecte
Current d
Data extraction
Data filtering
Table joining
Translation
Re-Formatting
•
•
•
•
EIS
DSS
Report Writers
Spreadsheets
Data Modeling - 136 -
Operational vs. Informational Databases
Characteristics
Operational Database
Informational Database
Data
Content
Current value
Archival data, summarized
data, calculated data
Data
organizations
Application by application
Subject areas across
enterprise
Data
Volatility
Dynamic
Static until refreshed
Data
normalization
Fully normalized for
transaction processing
Joined views suitable for
business analysis
Access
frequency
High
Low - Medium
Data
Update
Updated on a record and field
basis
Access only;
no direct update
Usage
Highly structured
transaction processing
Highly unstructured, heuristic
or analytical processing
Response
Time
Sub-second to 2-3 seconds
Several seconds to minutes
© Minder Chen, 1993~2002
Data Modeling - 137 -
Excel Pivot
Table Wizard
Relational View
Multidimensional View
© Minder Chen, 1993~2002
Data Modeling - 138 -
Dimensional Model
Product
•
•
•
•
•
Key
Name
Description
Size
Price
Promotion
•
•
•
•
Key
Description
Discount
Media
Market Region
Sale
Product Key
Market Key
Promotion Key
Time Key
•
•
•
•
Dollars
Units
Price
Cost
•
•
•
•
•
Key
Description
District
Region
Demographics
Time
•
•
•
•
Key
Weekday
Holiday
Fiscal
Region
Product
© Minder Chen, 1993~2002
Time
Data Modeling - 139 -
Modeling a Data Warehouse
• MDM: Multidimensional Modeling
– A logical model of business information
– Easy to understand
– Applicable to relational and multidimensional
databases
– Extremely useful for analysis
– A tried-and-tested techniques
• Why?
– An OLTP (On-Line Transaction Process) design of an
order processing system may have dozens or
hundreds of tables. It becomes difficult for
business managers to understand the design in
order to analyze the data.
© Minder Chen, 1993~2002
Data Modeling - 140 -
Approach
• Designed around numeric data:
–
–
–
–
values
counts
weights
occurrence
• An example of a MDM problem statement:
– "What is my profitability by customer over
time, by organization?"
© Minder Chen, 1993~2002
Data Modeling - 141 -
The Classic Star Schema
Market Dimension
Market ID
description
region
state
district
city
Product Dimension
Product ID
description
supplier ID
brand
color
size
© Minder Chen, 1993~2002
Each dimension is described by its own table
and the facts are arranged in a single large
table with a concatenated primary key
comprises the individual keys of each
dimension.
Fact Table
Market ID
Product ID
Period ID
dollars
units
price
Period Dimension
Period ID
description
year
quarter
month
current flag
resolution
sequence
Data Modeling - 142 -
Snow Flake Structure
Brand
Brand identifier <pk> int
Brand name
char(30)
Customer
Customer identifier
<pk>
Customer name
Customer address
Customer activity
Customer phone number
Customer fax number
Brand identifier = Brand identifier
Product
Product identifier
<pk>
Brand identifier
<fk>
Product description
Product category
Product unit price
int
int
char(80)
char(30)
int
int
char(30)
char(80)
char(80)
char(12)
char(12)
Year identifier
Year name
Year
<pk> int
char(30)
Customer identifier = Customer identifier
Year identifier = Year identifier
Product identifier = Product identifier
Quarter
Quarter identifier
<pk> int
Year identifier
<fk> int
Quarter name
char(10)
Sale
Time identifier
Customer identifier
Store identifier
Product identifier
Sale total
Sale revenu
Country
Country identifier <pk> int
Country name
char(80)
<fk>
<fk>
<fk>
<fk>
int
int
int
int
real
real
Quarter identifier = Quarter identifier
Month
Month identifier
<pk> int
Quarter identifier
<fk> int
Month name
char(10)
Country identifier = Country identifier
Region
Region identifier
<pk> int
Country identifier <fk> int
Region name
char(30)
Time identifier = Time identifier
Region identifier = Region identifier
Store
Store identifier
Region identifier
Store name
Store address
Store manager
Store phone number
Store FAX number
Store financial services type
Store photo services type
© Minder Chen, 1993~2002
Month identifier = Month identifier
Store identifier = Store identifier
<pk> int
<fk> int
char(50)
char(80)
char(30)
char(20)
char(20)
char(10)
char(10)
Week
Week identifier
<pk> int
Month identifier
<fk> int
Week name
char(30)
Week number in year
int
Week identifier = Week identifier
Day
Week identifier
<fk>
Time identifier
<pk>
Date
Day of week
Day number in month
int
int
datetime
char(30)
int
Data Modeling - 143 -
Steps to Build MDM
• Pick a business subject area
– Weekly sales reports, monthly financial
statements, insurance claim costs.
• Asking six fundamental questions:
– What business process is being modeled?
– At what level of detail (granularity) is "active"
analysis conducted?
– What do the measures have in common (the
"dimensions")?
– What are the dimensions' attributes?
– Are the attributes stable or variable over time
and is their "cardinality" bounded or
unbounded?
© Minder Chen, 1993~2002
Data Modeling - 144 -
Issues
• Active analysis
– Mechanical manipulation: Pivoting, Drilling
down, Graphing
– Agent-based manipulation: Alert reporting,
exception reporting
– Workflow manipulation: Publishing,
distributing documents.
• Cardinality means "how many"
– A relational database usually has "unbounded"
cardinality
– A multidimensional database usually has
"bounded" cardinality. Complete
reorganization is needed to change cardinality.
© Minder Chen, 1993~2002
Data Modeling - 145 -
dept_id = parent_id
dept
dept_id
int
parent_id
int
name
varchar(255)
description
text
date_changed datetime
pfid = pfid
product_variant
sku
int
pfid
varchar(30)
attribute0 tinyint
attribute1 tinyint
attribute2 tinyint
attribute3 tinyint
attribute4 tinyint
A Data Model for an Electronic Commerce Application
dept_id = dept_id
pfid = pfid
product_family
pfid
varchar(30)
dept_id
int
manufacturer_id int
name
varchar(255)
short_description varchar(255)
long_description text
image_filename
varchar(255)
intro_date
datetime
date_changed
datetime
list_price
int
monogramable
tinyint
shopper
sku = sku
shopper_id char(32)
order_id = order_id
created
datetime
name
varchar(235)
password
varchar(20)
street
varchar(50)
city
varchar(50)shopper_id = shopper_id
state
varchar(30)
receipt
zip
varchar(15)
order_id
char(26)
country
varchar(20)
shopper_id
char(32)
phone
varchar(16)
total
int
email
varchar(50)
status
tinyint
date_entered
datetime
date_changed
datetime
marshalled_receipt image
shopper_id = shopper_id
basket
shopper_id
char(32)
date_changed
datetime
marshalled_order image
© Minder Chen, 1993~2002
product_attribute
pfid
varchar(30)
attribute_id
tinyint
attribute_index tinyint
attribute_value varchar(20)
pfid = pfid
pfid = pfid
promo_cross
pfid
varchar(30)
related_pfid varchar(30)
description
varchar(255)
pfid = pfid
promo_upsell
pfid
varchar(30)
related_pfid varchar(30)
description
varchar(255)
receipt_item
pfid
varchar(30)
sku
int
order_id
char(26)
row_id
int
quantity
int
adjusted_price int
promo_price
promo_name
varchar(255)
promo_type
int
promo_description
text
promo_rank
int
active
int
date_start
datetime
date_end
datetime
shopper_all
int
shopper_column
varchar(64)
shopper_op
varchar(2)
shopper_value
varchar(64)
cond_all
int
cond_column
varchar(64)
cond_op
varchar(2)
cond_value
varchar(64)
cond_basis
char(1)
cond_min
int
award_all
int
award_column
varchar(64)
award_op
varchar(2)
award_value
varchar(64)
award_max
int
disjoint_cond_award int
disc_type
char(1)
disc_value
realData Modeling
- 146 -
Attribute 0 of pfid 14 is size and
the attribute value 1 is Grande
and 2 is Tall and 3 is Short
© Minder Chen, 1993~2002
Data Modeling - 147 -
Web-based Build-To-Order Application
© Minder Chen, 1993~2002
Data Modeling - 148 -
Data Model for Build-To-Order Application
© Minder Chen, 1993~2002
Data Modeling - 149 -
http://www.oracle.com/tools/jdeveloper/documents/jsptwp/index.html?content.html
Auction Web
Site's Data Model
© Minder Chen, 1993~2002
Data Modeling - 150 -