Introduction to Database Principles

2012-2013 (2)
BI203: Introduction to Database Principles
http://cbb.sjtu.edu.cn/course/database
MaoyingWu (吴茂英)
[email protected]
October 1, 2015
Course Syllabus
Terminologies and Concepts
 Data (数据)
 Database (数据库)
 Database Management System (DBMS,数据库管理系统)
 Database Administrator (DBA,数据库管理员)
 Database System (DBS,数据库系统)
DBMS: 数据库管理系统
hardware
Operating
system
DBMS/compi
ler
Application
developer
Application
DBS: 数据库系统
user
user
Applications
Applications
developer
Database
Administrator
DBMS
Operating
system
Database
user
Textbook and Reference
① Abraham Silberschatz, Henry F. Korth, S. Sudarshan.
Database System Concepts. McGraw-Hill Companies.
② 杨冬青,马秀莉,唐世渭。数据库系统概念。机
械工业出版社
③ Jeffrey D. Ullman, Jennifer Wildom. A First Course in
Database Systems. Dept. of CS, Stanford University
④ 王珊,萨师煊。数据库系统概论。高等教育出版
社
Frequently-used DBMS
 Oracle
 MySQL
 PostgreSQL
 Microsoft SQL Server
 IBM DB2
Course Outline
 Ch1. Introduction
Section 1: Relational database (关系型数据库)
 Ch2. Relational model (关系模型)
 Ch3. SQL (结构化查询语言)
 Ch4. Advanced SQL
 Ch5. Other Relational Query Languages
Section 2: Database design (数据库设计)
 Ch6. Database Design: Entity-Relationship Model (实体-关系模型)
 Ch7. Relational Database Design (关系数据库设计)
 Ch8. Application Design (应用程序设计)
Course Outline (cont.)
Section 3: Data storage、query (数据存储与查询)
 Ch9. Storage and File Structure (存储与文件结构)
 Ch10. Indexing and Hashing (索引与哈希)
 Ch11. Query Processing (查询处理)
 Ch12. Query Optimization (查询优化)
Section 4: Transaction Management (事务管理)
 Ch13. Transaction (事务概念)
 Ch14. Concurrency Control (并发控制)
 Ch15. Recovery system (系统恢复)
Section 5: Object-Based Databases and XML (面向对象数据库)
 Ch16. Object-Based Databases
 Ch17. XML
 Ch18. Data mining (数据挖掘)
Schedule
 Lecture + Discussion
32 hours
 Computer Labs
16 hours
 Group-based Project
> 10 hours
 Final Exam (written)
2 hours
Scoring Policy
 Assignments
10%
 Computer Lab Reports
20%
 Group projects
 Final
10%
60%
Evaluation form for reports
 0: Not Done
 1: Incomplete
 2: Late Complete
 3: Needs Improvement
 4: Complete
 5: Well Done
Chapter 1: Introduction
-- Introduction to database principles
MaoyingWu ([email protected])
23-Feb-13
Outlines
 Database History
 Why database
 Database Architecture
 Data Models
 Conceptual Model
 Logic Model
 Physical Model
 Data Storage and Queries
 Transaction Management
 Database Users and Administrators
History of Database Systems
 1950s and early 1960s
 Magnetic tapes for storage
 provide only sequential access
 Punched cards for input
 Late 1960s and 1970s:
 Hard disks allow direct access to data
 Network and hierarchical data models
 Ted Codd: the relational data model
 ACM Turing Award for this work
 IBM Research begins System R prototype
 UC Berkeley begins Ingres prototype
 High-performance transaction processing
History (cont.)
 1980s:
 Relational prototypes evolve into commercial systems
 SQL becomes industrial standard
 Parallel and distributed database systems
 Object-oriented database systems
 1990s:
 Large decision support and data-mining applications
 Large multi-terabyte data warehouses
 Emergence of Web commerce
 2000s:
 XML and Xquery standards
 Automated database administration
Why database systems?
 Drawbacks of early-days file-based data storing
 Data redundancy and inconsistency
 Difficulties in accessing the data
 Data isolation – multiple files and formats
 Integrity problems
 integrity constraints become “buried” in program codes rather than being
stated explicitly
 hard to add new constraints or change existing ones
 Atomicity of updates
 Concurrent access by multiple users
 Security problems
The solution
 The Database system (DBS)
Database Architecture
Architecture levels
 Physical level (物理层)
 describes how a record (e.g., customer) is stored.
 Logical level (逻辑层)
 describes data stored in database, and the relationships among the data.
type customer = record
customer_id: string;
customer_name: string;
customer_street: string;
customer_city: integer;
end;
 View level (可视层)
 application hide details of data types. Views can also hide information (such as an
employee’s salary) for security purposes.
DBMS Components
Data Models (数据模型)
 Data model is the abstract of real-world data characteristics,
which include
 data definition (数据描述、定义)
 data organization (数据组织)
 data manipulation (数据操作)
real
world
Information World
Conceptual Model
Computer World
DBMS data model
Data Models: Components
 Data structure (数据结构)
 the static characteristic (静态特征) of the data
 set of object type (描述对象类型的集合)
 Data manipulation (数据操作)
 dynamic characteristics (动态特征)
 Query/modify (insert/delete/update)
 Integrity constraints (完整性约束)
 entity integrity (实体完整性)
 referential integrity (参照完整性)
Data Models (数据模型)
 Concept:A collection of tools for describing




Data (数据)
Data relationships (关系)
Data semantics (语义)
Data constraints (约束)
 Known Models:
 Relational model (关系模型)
 Entity-Relationship (E-R) model (数据库设计)
 Object-based models (Object-oriented and Object-relational, 面向
对象、对象关系模型)
 Semi-structured model (XML)
 Older: Network model, hierarchical model
Relational Model (关系模型)
 Relation (关系): a 2-dimensional table to represent data (关系)
 Attributes (属性): names for the columns of a relation
 (title, year, length, genre)
 Schema (关系模式): name of a relation + set of attributes
 Movies(title, year, length, genre)
 Tuple (元组): a row of a relation
 (StarWars, 1977, 124, sciFi)
 Domain (域): the elementary type associated with each attribute of a relation
 Movies(title:string, year:integer, length:integer, genre:string)
Keys of relations (键/码)
 Many constraints for relational models
 key constraint is one kind of constraint
 a set of attributes forms a key for a relation if we do not
allow two tuples in a relation instances to have the same
values in all the attributes of the key
 Examples:
 Movies(title, year, length, genre)
Database Schema: Movies
Relation terminology
Jargon (专业术语)
Vulgo (俗称)
Relation name (关系名)
Table name (表名)
Schema (关系模式)
Table header (表头/表描述)
Relation (关系)
2-d table (二维表)
Tuple (元组)
Row (行)
Attribute (属性)
Column (列)
Attribute name (属性名)
Column name (列名)
Attribute value (属性值)
Column value (列值)
Domain (域)
Column type (列类型)
In-class Exercise 1
Two relations in a banking database: accounts and
customers
Indicate the following:
acctNo
type
Balance
 attributes of each relation
12345
savings
12000
23456
checking
1000
 tuples of each relation
34567
savings
25
 components of one tuple from each

firstName
lastName
id
account

Robbie
Banks
901-222
12345

Lena
Hand
805-333
23456

Lena
Hand
805-333
34567
relation
relation schema for each relation
database relation schema
a suitable domain for each relation
considering the order, how many ways
to present each of the two relation
instances?
DML (数据操作语言)
Data Manipulation Language
 Language for accessing and manipulating the data organized
by appropriate data model
 aka, query language (查询语言)
 Two classes of languages
 procedural (过程性): specifies what data is required and how to get
these data
 declarative (声明性): specifies what data is required without
knowing how to get these data
 SQL: most widely used query language
DDL (数据定义语言)
Data-defintion Language
 Specified notation for defining the database schema
 Example:
create table account(
account_no
char(10),
balance integer)
 DDL compiler generates a set of tables stored in a data dictionary (数据字典)
 Data dictionary contains metadata (元数据)
 Database schema
 Data storage and definition language
 specifies the storage structure and access methods (存储结构和访问方法)
 Integrity constraints
 domain constraints (类型约束)
 referential integrity (参考约束)
 assertions (要求约束)
 Authorization (权限)
SQL (结构化查询语言)
Structural Query Language
 Principal language to describe and manipulate relational
databases
 SQL-99 standard
 Two aspects:
 Data-Definition language (DDL) for declaring database schemas
 Data-Manipulation language (DML) for querying and modifying
SQL: 3 kinds of relations
 tables (表):relations stored in the database allowing
modification as well queries
 views (视图): relations defined by a computation,
constructed when needed
 temporary tables (临时表): constructed by SQL
processor when performing queries.
Database Design (数据库设计)
The process of designing the general structure of the database
 Conceptual design (概念设计):real world =>conceptual
model
 business decision: what attributes is required?
 computational decision: what relation schemas? how to
distribute the attributes among various relation schemas?
 Logical design (逻辑设计):decision on the database
schema.
 Physical design (物理设计): Decision on the physical layout
of the database
E-R Model (实体-关系模型)
The Entity-Relation Model
 Models the enterprise as a collection of entities (实体) and
relationships (关系)
 Entity: a “thing” or “object” in the enterprise that is
distinguishable from other objects
 Relationship: an association among several entities
Object-relational model
对象关系模型
 Extend the relational data model by including object orientation
and structs to deal with added data types.
 Allow attributes of tuples to have complex types, including nonatomic values such as nested relations.
 Preserve relational foundations, in particular the declarative access
to data, while extending modeling power.
 Provide upward compatibility with existing relational languages.
XML (可扩展标记语言)
Extensible markup language
 WWW Consortium (W3C)
 Originally as a document markup language
 The ability to specify new tags, and to create nested tag
structures
 great way to exchange data
 Basis for all new-generation data interchange formats.
A wide variety of tools is available for parsing, browsing and
querying XML documents/data
XML
Extensional Markup Language
Storage management (存储管理)
 Storage manager
A module that provides the interface between the low-level data
stored in the database and the application and queries submitted to
the system.
 Responsibilities:
 Interaction with the file manager
 Efficient storing, retrieving and updating of data
 Issues:
 Storage access
 File organization
 Indexing and hashing
Query processing: 3 steps
 Parsing and translation (解读与翻译)
 Optimization (优化)
 Evaluation (评价)
Query processing (cont.)
 Alternative ways of evaluating a given query
 Equivalent expressions
 Different algorithms for each operation
 Cost difference between a good and a bad way of evaluating a
query can be enormous
 Need to estimate the cost of operations
 Depends critically on statistical information about relations
which the database must maintain
 Need to estimate statistics for intermediate results to compute
cost of complex expressions
Transaction Management
事务管理
 Transaction (事务): A collection of operations that
performs a single logical function in a database application
 Transaction-management component (事务管理组
件) ensures consistent (correct) state of database in case of
system failures (e.g., power failures and operating system
crashes) and transaction failures.
 Concurrency-control manager (并发控制管理器)
controls the interaction among the concurrent transactions,
to ensure the consistency of the database.
Database Architecture
数据库架构
The architecture is greatly influenced by the underlying
computer system
 Centralized (集中式)
 Client/Server (C/S, 客户端/服务器)
 Parallel (multi-processor, 并行)
 Distributed (分布式)
Database users (数据库用户)
Users are distinguished by their way to interact with the database
system
 Application Programmers (应用程序员) – interact with
system through DML calls
 Specialized users (专业用户) - write specialized applications
that do not fit into the traditional data processing framework
 Naïve users (普通用户) – invoke one of the permanent
applications that have been written previously
 Persons accessing the database over the web, bank tellers, clerical staff
Database Administrator
数据库管理员
 Responsible for coordinating all the activities of the database
system
 good understanding of the enterprises’ information resources and
needs
 Tasks include:
 schema definition (模式描述)
 storage structure (存储结构) and access method definition (访问方





法描述)
schema and physical organization and modification (物理层面)
authorities grants (权限分配)
specifying integrity constraints (完整性约束)
acting as liaison with users (与其他用户的联系纽带)
monitoring performance and responding to changes in need (性能检
测、应对需求和危机)
Learning strategies
 Know how => know why
 适当的不求甚解
After-class assignment 1
 Install MySQL on your own computer
 Linux: from source package or LAMP
 Windows: from rebuilt package or XAMP
 Learn some basic knowledge on HTML/PHP and B/S
development
 Review of Perl/Python/C programming skills