Dimensional Fact Model

Dimensional Fact Model
Stuttgart, 26/11/2014
Stefano Cazzella @StefanoCazzella
http://caccio.blogdns.net
http://bimodeler.com
stefano.cazzella{at}gmail.com
BI ACADEMY Launch@Germany - Stuttgart, 26/11/2014 - Stefano Cazzella
1
Complexity in SE and IS development
The art of programming is the art of organizing
complexity, of mastering multitude and avoiding
its bastard chaos as effectively as possible.
– Edsger Dijkstra, “Notes on Structured Programming”
BI ACADEMY Launch@Germany - Stuttgart, 26/11/2014 - Stefano Cazzella
2
Project Layers
Business
• User requirements
• Conceptual model
Design
• Technical choices
• Logical model
Build
• Tecnology
• Physical model
BI ACADEMY Launch@Germany - Stuttgart, 26/11/2014 - Stefano Cazzella
3
Civil Engineering Example
Business
Design
Build
What the
client wants
The
technical
blueprint
The desired
building
BI ACADEMY Launch@Germany - Stuttgart, 26/11/2014 - Stefano Cazzella
4
Model-driven engineering
Model transformation
• Business
centric
• No tecnical
details
PIM
PSM
• Tecnical
design
• System
architecture
• Tecnical
deliverables
• System
realization
Build
Model transformation
BI ACADEMY Launch@Germany - Stuttgart, 26/11/2014 - Stefano Cazzella
5
Project Layers for Data Mart
Business • DFM
Design
Build
Dimensional
Fact
Model
• Relational model
• DBMS specific DDL
BI ACADEMY Launch@Germany - Stuttgart, 26/11/2014 - Stefano Cazzella
6
Why Dimensional Fact Model ?
1
4
Formal language à well-specified syntax and an unequivocally
interpretation (semantic) based on a sound algebraic definition
2
Simple and effective graphical notation (representation)
3
Specifically defined to represent multi-dimensional models
Does not imply any technical/implementation choice
BI ACADEMY Launch@Germany - Stuttgart, 26/11/2014 - Stefano Cazzella
7
DFM Notation Compendium
BI ACADEMY Launch@Germany - Stuttgart, 26/11/2014 - Stefano Cazzella
8
Data Mart building process
Business user’s needs
Technical specifications
+
=
Requirements
definition
Deployment
Implementation
strategy
Model
transformation
Multidimensional
data model
(Dimensional Fact Model)
Data Mart
Model
transformation
Logical data model
(Relational model:
tables, columns, etc.)
BI ACADEMY Launch@Germany - Stuttgart, 26/11/2014 - Stefano Cazzella
Phisical data model
(DDL with indexes,
partions, etc.)
9
Data Mart building process
Business user’s needs
Technical specifications
+
=
Requirements
user’s
needs
definition
Formalize
in aImplementation
conceptual
(business-centric) model, thenstrategy
…
Model
transformtransformation
it in a logical
…
model
technical specification …
Data Mart
Deployment
Model
transformation
integrating
Multidimensional
data in
model
Phisical data model
… and transformLogical
it again
a physical model
data model
that realizes the (Relational
businessmodel:
requirements (DDL with indexes,
(Dimensional Fact Model)
tables, columns, etc.)
partions, etc.)
BI ACADEMY Launch@Germany - Stuttgart, 26/11/2014 - Stefano Cazzella
10
Business - From requisite to DFM
•  Context: weblog analytics - the
analysis of the visits of several
web sites belonging to different
domains (eg. Google Analytics)
•  Requisite: monitoring and
analyzing the number of visits
and their monthly and daily
average duration for each page
of the websites, or each domain,
+
distributed by the geographic
þ Domain definition
þ Aggregation rules
region of the IP of the visitors.
þ Optional dependencies
BI ACADEMY Launch@Germany - Stuttgart, 26/11/2014 - Stefano Cazzella
11
Design choice
Reference ROLAP model:
• Star-schema (denormalized dimension table)
• Snow-flake (hierarchies implemented by tables in 3NF)
Hierarchy implementation strategy (for every dimension)
• Use natural key (the dimension attribute à PK column)
• Use surrogate key (add a new column with no business meaning)
• Use slow-changing dimension (SCD) of type 2
• Use implicit dimension (no dimension table, only a column in the fact table)
Domain ßà Data type association
• Text à VARCHAR(250) ; Currency à NUMBER(9,2) ; etc.
Standard naming conventions and abbreviations
• Table name prefix (D for Dimensions, F for Facts) ; Number à NBR ; etc.
BI ACADEMY Launch@Germany - Stuttgart, 26/11/2014 - Stefano Cazzella
12
Transform DFM in a Relational Model
Surrogate key
SCD-2
Start date
End date
Model
transformation
Technical design choices:
Fact grain
•  Reference ROLAP model à star-schema
•  Hierarchy Viewerà use surrogate key
•  Hierarchy Page à SCD – Type 2
BI ACADEMY Launch@Germany - Stuttgart, 26/11/2014 - Stefano Cazzella
13
Build choice
Choice the DBMS
• SqlServer – Oracle – Hive / Hadoop
Generate constraints?
• Generate unique keys / primary keys / integrity constraints (foreign keys)
Add specific indexes
• Add clustered indexes / column-store indexes / bitmap indexes / etc.
Define table partitions
• Organize fact tables in partitions (by hash, value, range, etc.)
Distribute data over multiple volumes
• Define file groups / tablespaces for tables, partitions, indexes
BI ACADEMY Launch@Germany - Stuttgart, 26/11/2014 - Stefano Cazzella
14
Phisical model and DDL (1)
Implementation choices & best practice:
• 
• 
• 
• 
DBMS à SQL Server
Fact F_VISITS partitioned by year
Column-store index on day and duration
2 distinct file groups for tables and indexes
Partition scheme and functions
File groups
Columnstore index
BI ACADEMY Launch@Germany - Stuttgart, 26/11/2014 - Stefano Cazzella
15
Phisical model and DDL (2)
Implementation choices & best practice:
• 
• 
• 
• 
DBMS à Oracle
Fact F_VISITS partitioned by year
Bitmap index on viewer dimension
2 distinct table spaces for tables and
indexes
Table spaces
Table partitions
Bitmap index
BI ACADEMY Launch@Germany - Stuttgart, 26/11/2014 - Stefano Cazzella
16
BI Modeler
•  In order to apply a model-driven approach, BI Project teams
need a software tool to:
þ Manage (draw) all the models - DFM, relational, etc.
þ Support (and drive) the model transformation process
•  There was (are) no many tools able to do that so, in 2006 I
started working on the development of …
http://bimodeler.com
BI ACADEMY Launch@Germany - Stuttgart, 26/11/2014 - Stefano Cazzella
17
DEMO
Create a
DFM about
SALES from
scratch
Define the fact schema and its measures
Transform a
DFM in a
relational
data model
Define an implementation strategy for Hierarchies
Add physical
properties
to the
relational
model
Choose a DBMS
Add some dimensions / hierarchies
Define and associate domains to attributes and measures
Associate Data type to domains
Apply a naming convention
Create partitions
Create indexes
Generate DDL
BI ACADEMY Launch@Germany - Stuttgart, 26/11/2014 - Stefano Cazzella
18