Dimensional Fact Model Stuttgart, 26/11/2014 Stefano Cazzella @StefanoCazzella http://caccio.blogdns.net http://bimodeler.com stefano.cazzella{at}gmail.com BI ACADEMY Launch@Germany - Stuttgart, 26/11/2014 - Stefano Cazzella 1 Complexity in SE and IS development The art of programming is the art of organizing complexity, of mastering multitude and avoiding its bastard chaos as effectively as possible. – Edsger Dijkstra, “Notes on Structured Programming” BI ACADEMY Launch@Germany - Stuttgart, 26/11/2014 - Stefano Cazzella 2 Project Layers Business • User requirements • Conceptual model Design • Technical choices • Logical model Build • Tecnology • Physical model BI ACADEMY Launch@Germany - Stuttgart, 26/11/2014 - Stefano Cazzella 3 Civil Engineering Example Business Design Build What the client wants The technical blueprint The desired building BI ACADEMY Launch@Germany - Stuttgart, 26/11/2014 - Stefano Cazzella 4 Model-driven engineering Model transformation • Business centric • No tecnical details PIM PSM • Tecnical design • System architecture • Tecnical deliverables • System realization Build Model transformation BI ACADEMY Launch@Germany - Stuttgart, 26/11/2014 - Stefano Cazzella 5 Project Layers for Data Mart Business • DFM Design Build Dimensional Fact Model • Relational model • DBMS specific DDL BI ACADEMY Launch@Germany - Stuttgart, 26/11/2014 - Stefano Cazzella 6 Why Dimensional Fact Model ? 1 4 Formal language à well-specified syntax and an unequivocally interpretation (semantic) based on a sound algebraic definition 2 Simple and effective graphical notation (representation) 3 Specifically defined to represent multi-dimensional models Does not imply any technical/implementation choice BI ACADEMY Launch@Germany - Stuttgart, 26/11/2014 - Stefano Cazzella 7 DFM Notation Compendium BI ACADEMY Launch@Germany - Stuttgart, 26/11/2014 - Stefano Cazzella 8 Data Mart building process Business user’s needs Technical specifications + = Requirements definition Deployment Implementation strategy Model transformation Multidimensional data model (Dimensional Fact Model) Data Mart Model transformation Logical data model (Relational model: tables, columns, etc.) BI ACADEMY Launch@Germany - Stuttgart, 26/11/2014 - Stefano Cazzella Phisical data model (DDL with indexes, partions, etc.) 9 Data Mart building process Business user’s needs Technical specifications + = Requirements user’s needs definition Formalize in aImplementation conceptual (business-centric) model, thenstrategy … Model transformtransformation it in a logical … model technical specification … Data Mart Deployment Model transformation integrating Multidimensional data in model Phisical data model … and transformLogical it again a physical model data model that realizes the (Relational businessmodel: requirements (DDL with indexes, (Dimensional Fact Model) tables, columns, etc.) partions, etc.) BI ACADEMY Launch@Germany - Stuttgart, 26/11/2014 - Stefano Cazzella 10 Business - From requisite to DFM • Context: weblog analytics - the analysis of the visits of several web sites belonging to different domains (eg. Google Analytics) • Requisite: monitoring and analyzing the number of visits and their monthly and daily average duration for each page of the websites, or each domain, + distributed by the geographic þ Domain definition þ Aggregation rules region of the IP of the visitors. þ Optional dependencies BI ACADEMY Launch@Germany - Stuttgart, 26/11/2014 - Stefano Cazzella 11 Design choice Reference ROLAP model: • Star-schema (denormalized dimension table) • Snow-flake (hierarchies implemented by tables in 3NF) Hierarchy implementation strategy (for every dimension) • Use natural key (the dimension attribute à PK column) • Use surrogate key (add a new column with no business meaning) • Use slow-changing dimension (SCD) of type 2 • Use implicit dimension (no dimension table, only a column in the fact table) Domain ßà Data type association • Text à VARCHAR(250) ; Currency à NUMBER(9,2) ; etc. Standard naming conventions and abbreviations • Table name prefix (D for Dimensions, F for Facts) ; Number à NBR ; etc. BI ACADEMY Launch@Germany - Stuttgart, 26/11/2014 - Stefano Cazzella 12 Transform DFM in a Relational Model Surrogate key SCD-2 Start date End date Model transformation Technical design choices: Fact grain • Reference ROLAP model à star-schema • Hierarchy Viewerà use surrogate key • Hierarchy Page à SCD – Type 2 BI ACADEMY Launch@Germany - Stuttgart, 26/11/2014 - Stefano Cazzella 13 Build choice Choice the DBMS • SqlServer – Oracle – Hive / Hadoop Generate constraints? • Generate unique keys / primary keys / integrity constraints (foreign keys) Add specific indexes • Add clustered indexes / column-store indexes / bitmap indexes / etc. Define table partitions • Organize fact tables in partitions (by hash, value, range, etc.) Distribute data over multiple volumes • Define file groups / tablespaces for tables, partitions, indexes BI ACADEMY Launch@Germany - Stuttgart, 26/11/2014 - Stefano Cazzella 14 Phisical model and DDL (1) Implementation choices & best practice: • • • • DBMS à SQL Server Fact F_VISITS partitioned by year Column-store index on day and duration 2 distinct file groups for tables and indexes Partition scheme and functions File groups Columnstore index BI ACADEMY Launch@Germany - Stuttgart, 26/11/2014 - Stefano Cazzella 15 Phisical model and DDL (2) Implementation choices & best practice: • • • • DBMS à Oracle Fact F_VISITS partitioned by year Bitmap index on viewer dimension 2 distinct table spaces for tables and indexes Table spaces Table partitions Bitmap index BI ACADEMY Launch@Germany - Stuttgart, 26/11/2014 - Stefano Cazzella 16 BI Modeler • In order to apply a model-driven approach, BI Project teams need a software tool to: þ Manage (draw) all the models - DFM, relational, etc. þ Support (and drive) the model transformation process • There was (are) no many tools able to do that so, in 2006 I started working on the development of … http://bimodeler.com BI ACADEMY Launch@Germany - Stuttgart, 26/11/2014 - Stefano Cazzella 17 DEMO Create a DFM about SALES from scratch Define the fact schema and its measures Transform a DFM in a relational data model Define an implementation strategy for Hierarchies Add physical properties to the relational model Choose a DBMS Add some dimensions / hierarchies Define and associate domains to attributes and measures Associate Data type to domains Apply a naming convention Create partitions Create indexes Generate DDL BI ACADEMY Launch@Germany - Stuttgart, 26/11/2014 - Stefano Cazzella 18
© Copyright 2024 ExpyDoc