2012-2013 (2) BI203: Introduction to Database Principles http://cbb.sjtu.edu.cn/course/database MaoyingWu (吴茂英) [email protected] October 1, 2015 Course Syllabus Terminologies and Concepts Data (数据) Database (数据库) Database Management System (DBMS,数据库管理系统) Database Administrator (DBA,数据库管理员) Database System (DBS,数据库系统) DBMS: 数据库管理系统 hardware Operating system DBMS/compi ler Application developer Application DBS: 数据库系统 user user Applications Applications developer Database Administrator DBMS Operating system Database user Textbook and Reference ① Abraham Silberschatz, Henry F. Korth, S. Sudarshan. Database System Concepts. McGraw-Hill Companies. ② 杨冬青,马秀莉,唐世渭。数据库系统概念。机 械工业出版社 ③ Jeffrey D. Ullman, Jennifer Wildom. A First Course in Database Systems. Dept. of CS, Stanford University ④ 王珊,萨师煊。数据库系统概论。高等教育出版 社 Frequently-used DBMS Oracle MySQL PostgreSQL Microsoft SQL Server IBM DB2 Course Outline Ch1. Introduction Section 1: Relational database (关系型数据库) Ch2. Relational model (关系模型) Ch3. SQL (结构化查询语言) Ch4. Advanced SQL Ch5. Other Relational Query Languages Section 2: Database design (数据库设计) Ch6. Database Design: Entity-Relationship Model (实体-关系模型) Ch7. Relational Database Design (关系数据库设计) Ch8. Application Design (应用程序设计) Course Outline (cont.) Section 3: Data storage、query (数据存储与查询) Ch9. Storage and File Structure (存储与文件结构) Ch10. Indexing and Hashing (索引与哈希) Ch11. Query Processing (查询处理) Ch12. Query Optimization (查询优化) Section 4: Transaction Management (事务管理) Ch13. Transaction (事务概念) Ch14. Concurrency Control (并发控制) Ch15. Recovery system (系统恢复) Section 5: Object-Based Databases and XML (面向对象数据库) Ch16. Object-Based Databases Ch17. XML Ch18. Data mining (数据挖掘) Schedule Lecture + Discussion 32 hours Computer Labs 16 hours Group-based Project > 10 hours Final Exam (written) 2 hours Scoring Policy Assignments 10% Computer Lab Reports 20% Group projects Final 10% 60% Evaluation form for reports 0: Not Done 1: Incomplete 2: Late Complete 3: Needs Improvement 4: Complete 5: Well Done Chapter 1: Introduction -- Introduction to database principles MaoyingWu ([email protected]) 23-Feb-13 Outlines Database History Why database Database Architecture Data Models Conceptual Model Logic Model Physical Model Data Storage and Queries Transaction Management Database Users and Administrators History of Database Systems 1950s and early 1960s Magnetic tapes for storage provide only sequential access Punched cards for input Late 1960s and 1970s: Hard disks allow direct access to data Network and hierarchical data models Ted Codd: the relational data model ACM Turing Award for this work IBM Research begins System R prototype UC Berkeley begins Ingres prototype High-performance transaction processing History (cont.) 1980s: Relational prototypes evolve into commercial systems SQL becomes industrial standard Parallel and distributed database systems Object-oriented database systems 1990s: Large decision support and data-mining applications Large multi-terabyte data warehouses Emergence of Web commerce 2000s: XML and Xquery standards Automated database administration Why database systems? Drawbacks of early-days file-based data storing Data redundancy and inconsistency Difficulties in accessing the data Data isolation – multiple files and formats Integrity problems integrity constraints become “buried” in program codes rather than being stated explicitly hard to add new constraints or change existing ones Atomicity of updates Concurrent access by multiple users Security problems The solution The Database system (DBS) Database Architecture Architecture levels Physical level (物理层) describes how a record (e.g., customer) is stored. Logical level (逻辑层) describes data stored in database, and the relationships among the data. type customer = record customer_id: string; customer_name: string; customer_street: string; customer_city: integer; end; View level (可视层) application hide details of data types. Views can also hide information (such as an employee’s salary) for security purposes. DBMS Components Data Models (数据模型) Data model is the abstract of real-world data characteristics, which include data definition (数据描述、定义) data organization (数据组织) data manipulation (数据操作) real world Information World Conceptual Model Computer World DBMS data model Data Models: Components Data structure (数据结构) the static characteristic (静态特征) of the data set of object type (描述对象类型的集合) Data manipulation (数据操作) dynamic characteristics (动态特征) Query/modify (insert/delete/update) Integrity constraints (完整性约束) entity integrity (实体完整性) referential integrity (参照完整性) Data Models (数据模型) Concept:A collection of tools for describing Data (数据) Data relationships (关系) Data semantics (语义) Data constraints (约束) Known Models: Relational model (关系模型) Entity-Relationship (E-R) model (数据库设计) Object-based models (Object-oriented and Object-relational, 面向 对象、对象关系模型) Semi-structured model (XML) Older: Network model, hierarchical model Relational Model (关系模型) Relation (关系): a 2-dimensional table to represent data (关系) Attributes (属性): names for the columns of a relation (title, year, length, genre) Schema (关系模式): name of a relation + set of attributes Movies(title, year, length, genre) Tuple (元组): a row of a relation (StarWars, 1977, 124, sciFi) Domain (域): the elementary type associated with each attribute of a relation Movies(title:string, year:integer, length:integer, genre:string) Keys of relations (键/码) Many constraints for relational models key constraint is one kind of constraint a set of attributes forms a key for a relation if we do not allow two tuples in a relation instances to have the same values in all the attributes of the key Examples: Movies(title, year, length, genre) Database Schema: Movies Relation terminology Jargon (专业术语) Vulgo (俗称) Relation name (关系名) Table name (表名) Schema (关系模式) Table header (表头/表描述) Relation (关系) 2-d table (二维表) Tuple (元组) Row (行) Attribute (属性) Column (列) Attribute name (属性名) Column name (列名) Attribute value (属性值) Column value (列值) Domain (域) Column type (列类型) In-class Exercise 1 Two relations in a banking database: accounts and customers Indicate the following: acctNo type Balance attributes of each relation 12345 savings 12000 23456 checking 1000 tuples of each relation 34567 savings 25 components of one tuple from each firstName lastName id account Robbie Banks 901-222 12345 Lena Hand 805-333 23456 Lena Hand 805-333 34567 relation relation schema for each relation database relation schema a suitable domain for each relation considering the order, how many ways to present each of the two relation instances? DML (数据操作语言) Data Manipulation Language Language for accessing and manipulating the data organized by appropriate data model aka, query language (查询语言) Two classes of languages procedural (过程性): specifies what data is required and how to get these data declarative (声明性): specifies what data is required without knowing how to get these data SQL: most widely used query language DDL (数据定义语言) Data-defintion Language Specified notation for defining the database schema Example: create table account( account_no char(10), balance integer) DDL compiler generates a set of tables stored in a data dictionary (数据字典) Data dictionary contains metadata (元数据) Database schema Data storage and definition language specifies the storage structure and access methods (存储结构和访问方法) Integrity constraints domain constraints (类型约束) referential integrity (参考约束) assertions (要求约束) Authorization (权限) SQL (结构化查询语言) Structural Query Language Principal language to describe and manipulate relational databases SQL-99 standard Two aspects: Data-Definition language (DDL) for declaring database schemas Data-Manipulation language (DML) for querying and modifying SQL: 3 kinds of relations tables (表):relations stored in the database allowing modification as well queries views (视图): relations defined by a computation, constructed when needed temporary tables (临时表): constructed by SQL processor when performing queries. Database Design (数据库设计) The process of designing the general structure of the database Conceptual design (概念设计):real world =>conceptual model business decision: what attributes is required? computational decision: what relation schemas? how to distribute the attributes among various relation schemas? Logical design (逻辑设计):decision on the database schema. Physical design (物理设计): Decision on the physical layout of the database E-R Model (实体-关系模型) The Entity-Relation Model Models the enterprise as a collection of entities (实体) and relationships (关系) Entity: a “thing” or “object” in the enterprise that is distinguishable from other objects Relationship: an association among several entities Object-relational model 对象关系模型 Extend the relational data model by including object orientation and structs to deal with added data types. Allow attributes of tuples to have complex types, including nonatomic values such as nested relations. Preserve relational foundations, in particular the declarative access to data, while extending modeling power. Provide upward compatibility with existing relational languages. XML (可扩展标记语言) Extensible markup language WWW Consortium (W3C) Originally as a document markup language The ability to specify new tags, and to create nested tag structures great way to exchange data Basis for all new-generation data interchange formats. A wide variety of tools is available for parsing, browsing and querying XML documents/data XML Extensional Markup Language Storage management (存储管理) Storage manager A module that provides the interface between the low-level data stored in the database and the application and queries submitted to the system. Responsibilities: Interaction with the file manager Efficient storing, retrieving and updating of data Issues: Storage access File organization Indexing and hashing Query processing: 3 steps Parsing and translation (解读与翻译) Optimization (优化) Evaluation (评价) Query processing (cont.) Alternative ways of evaluating a given query Equivalent expressions Different algorithms for each operation Cost difference between a good and a bad way of evaluating a query can be enormous Need to estimate the cost of operations Depends critically on statistical information about relations which the database must maintain Need to estimate statistics for intermediate results to compute cost of complex expressions Transaction Management 事务管理 Transaction (事务): A collection of operations that performs a single logical function in a database application Transaction-management component (事务管理组 件) ensures consistent (correct) state of database in case of system failures (e.g., power failures and operating system crashes) and transaction failures. Concurrency-control manager (并发控制管理器) controls the interaction among the concurrent transactions, to ensure the consistency of the database. Database Architecture 数据库架构 The architecture is greatly influenced by the underlying computer system Centralized (集中式) Client/Server (C/S, 客户端/服务器) Parallel (multi-processor, 并行) Distributed (分布式) Database users (数据库用户) Users are distinguished by their way to interact with the database system Application Programmers (应用程序员) – interact with system through DML calls Specialized users (专业用户) - write specialized applications that do not fit into the traditional data processing framework Naïve users (普通用户) – invoke one of the permanent applications that have been written previously Persons accessing the database over the web, bank tellers, clerical staff Database Administrator 数据库管理员 Responsible for coordinating all the activities of the database system good understanding of the enterprises’ information resources and needs Tasks include: schema definition (模式描述) storage structure (存储结构) and access method definition (访问方 法描述) schema and physical organization and modification (物理层面) authorities grants (权限分配) specifying integrity constraints (完整性约束) acting as liaison with users (与其他用户的联系纽带) monitoring performance and responding to changes in need (性能检 测、应对需求和危机) Learning strategies Know how => know why 适当的不求甚解 After-class assignment 1 Install MySQL on your own computer Linux: from source package or LAMP Windows: from rebuilt package or XAMP Learn some basic knowledge on HTML/PHP and B/S development Review of Perl/Python/C programming skills
© Copyright 2024 ExpyDoc