Data Vault

Business Analytics For All
Unlocking the Value within
the Data Vault
BA4All – Insight Session April 29th 2014
Guy Van der Sande – Vincent Greslebin
Fifthplay : Architecture
Data Warehouse
Smart Homes Platform
Gebruikers
ETL
Data
Vault
Data
mart
Marketing &
SSC
Dag - 1
-
Controle data kwaliteit
Toepassing business
rules
Aggregatie
Filtering
Utility Portal
Facility Portal
Fifthplay : Why Data Vault ?
• Pattern based design which allows agility to take place
• Easy to add new data sources making it future proof. This allows
Fifthplay to stay innovative
• Large volume of data
• Build up history that is not available in the operational system
• Possibility of performing analysis on raw data (cfr quality checks)
• Development speed (Pilot : 37 working days)
Data Vault ?
Data Vault ?
Data Vault ?
The Data Vault is a detail oriented, historical tracking and
uniquely linked set of normalized tables that support one or
more functional areas of business. It is a hybrid approach
encompassing the best of breed between 3rd normal form
(3NF) and star schema.
The design is flexible, scalable, consistent and adaptable to
the needs of the enterprise.
Standard architecture
 The centerpiece of the Enterprise Data Warehouse
 History is build-up
 Granularity as ‘detailed’ as possible
 No use of business rules
 Use of business keys that are horizontal in nature and
provide visibility across lines of business
 A new layer which has the benefits of the RAW Data
Vault, but with the business data embedded
 In the Business Data Vault the data has been altered,
cleansed and changed to meet the business rules
 Downstream of the raw data vault
 Starting point for Master Data Management
 Metadata is absolutely vital
Component parts of the Data Vault model
The Data Vault Model exists of 3 basic entity types
• Hubs : contains a unique list of business keys
• Links : associations across or between business keys
• Satellites : holds descriptive data (about the business key) over time
Component parts - Hub
• Represents a Core Business Concept
• Is formed around the Business Key of this concept
• Is established the first time a new instance of that
business key is introduced
• Must be 1:1 with a single instance
• Consists of the business key, a sequence id, a load
date/time stamp and a record source.
Component parts - Link
• Represents a natural business relationship between business keys
• Is established the first time this new unique association is presented
• Can represent an association between several Hubs and sometimes other
Links.
• maintains a 1:1 relationship with the unique and specific business defined
association between that set of keys.
• Consists of the sequence ids from the Hubs and Links
• Contains sequence id, a load date/time stamp and a
• record source.
Component parts - Satellite
• The Satellite contains the descriptive information
(context) for a business key.
• A Satellite can only describe one key (Hub or a Link).
• The Satellite is the only construct that manages time
slice data (data warehouse historical tracking of
values over time).
Data Vault – Why ?
Dimension 1
Dimension 2
Fact
Dimension 3
Dimension 4
Data Vault – Why ?
Dimension 1
Dimension 2
Fact
Dimension 3
Dimension 4
Data Vault – Why ?
Fact
Dimension 1
Dimension 2
Fact
Dimension 5
Dimension 3
Dimension 4
Data Vault – Why ?
DV
DM
Data Vault – Why ?
S
S
S
S
H
L
S
H
DV
H
H
DM
Data Vault – Why ?
S
S
S
S
H
L
S
H
DV
H
H
DM
Dimension
Fact
Data Vault – How did we do
it with Fifthplay ?
Fifthplay Raw Data Vault Architecture
Legend
Hub
HubSatHomeAreaManagerAddress
PK
SeqSatHomeAreaManagerA
ddress
Link
SeqHomeAreaManager
Satellite
LoadDateTime
LoadEndDateTime
HubSatCustomer
HubSatServicePartner
LinkSatHomeAreaManagerCity
RecordSource
PK
PK
SeqSatServicePartner
SeqSatCustomer
SeqCustomer
SeqServicePartner
PK
HomeAreaManagerAddress
Line1
LoadDateTime
LoadDateTime
LoadEndDateTime
LoadEndDateTime
HomeAreaManagerAddress
Line2
PK
SeqSatHomeAreaManagerCi
ty
LinkSatCountryCity
PK
SeqSatCountryCity
SeqSatHomeAreaManager
SeqHomeAreaManagerCity
SeqHomeAreaManager
LoadDateTime
LoadDateTime
LoadEndDateTime
LoadDateTime
LoadEndDateTime
RecordSource
LoadEndDateTime
RecordSource
HAMCityAddressLine1
RecordSource
HomeAreaManagerMode
HAMCityAddressLine2
CountryCityRegion
HomeAreaManagerArchitec
ture
HAMCityPhoneNumber
CountryCityState
HubSatCountry
SeqCountryCity
PK
RecordSource
RecordSource
RecordSource
HomeAreaManagerCountry
CustomerLanguage
ServicePartnerCustomerCon
tact
LoadEndDateTime
HomeAreaManagerProvince
CustomerLastName
ServiucePartnerEmail
LoadDateTime
HomeAreaManagerState
CustomerFirstName
ServicePartnerWebPage
LinkHomeAreaManagerCity
HubCustomer
PK
SeqServicePartnerCustomer
SeqServicePartner
CountryName
LinkCustomerHomeAreaManager
LinkServicePartnerCustomer
HubServicePartner
SeqCustomerHomeAreaMan
ager
SeqCustomer
PK
SeqCity
PK
SeqCountryCity
PK
SeqHomeAreaManager
CityPostalCode
SeqCountry
SeqCity
CityName
SeqCity
LoadDateTime
LoadDateTime
LoadDateTime
RecordSource
RecordSource
RecordSource
HomeAreaManagerNumber
LoadDateTime
LoadDateTime
LoadDateTime
LoadDateTime
RecordSource
RecordSource
RecordSource
SeqCountry
CountryIsoCode
SeqHomeAreaManager
LoadDateTime
RecordSource
SeqHomeAreaManagerCity
SeqHomeAreaManager
SeqCustomer
CustomerID
SeqCustomer
LoadDateTime
LinkCountryCity
HubCountry
PK
PK
SeqServicePartner
ServicePartnerID
HubCity
HubHomeAreaManager
PK
PK
SeqSatCountry
SeqCountry
HomeAreaManagerCityNam
e
CustomerEmail
ServicePartnerCode
PK
HubSatHomeAreaManager
HomeAreaManagerPostalCo
de
RecordSource
RecordSource
HubSatSmartPlug
LinkSatSmartPlugApplianceEnergyL
ogType
LinkCustomerDeviceGroup
PK
PK
SeqSatSmartPlug
PK
SeqCustomerDeviceGroup
SeqSatSmartPlugApplianceE
nergyLogType
SeqSmartPlug
SeqSmartPlugApplianceEner
gyLogType
SeqCustomer
LoadDateTime
LinkHomeAreaManagerSmartPlug
SeqDeviceGroup
LoadDateTime
LoadEndDateTime
LoadDateTime
RecordSource
LinkDeviceGroupSmartPlug
PK
HubSatDeviceGroup
PK
HubDeviceGroup
SeqSatDeviceGroup
SeqDeviceGroup
LoadDateTime
PK
PK
SeqDeviceGroupSmartPlug
SeqHomeAreaManagerSmar
tPlug
SeqHomeAreaManager
SmartPlugDisplayName
SeqDeviceGroup
SeqSmartPlug
SmartPlugManufacturer
SeqSmartPlug
LoadDateTime
SmartPlugModel
LoadDateTime
RecordSource
SmartPlugIsGenerator
RecordSource
SeqDeviceGroup
SmartPlugHasSchedule
LinkDeviceSubGroupSmartPlug
LoadDateTime
RecordSource
RecordSource
DeviceGroupName
LoadDateTime
HubSmartPlug
PK
HubSatAppliance
RecordSource
HubAppliance
EnergyLogDateTime
EnergyLogValue
PK
EnergyLogValueUnit
SmartPlugHasChildren
DeviceGroupID
LoadEndDateTime
LoadEndDateTime
RecordSource
SeqDeviceSubGroupSmartPl
ug
PK
LinkSmartPlugApplianceEnergyLogT
ype
PK
SeqAppliance
ApplianceID
LoadDateTime
LoadDateTime
LoadEndDateTime
RecordSource
RecordSource
ApplianceCategory
SeqSmartPlug
HubEnergyLogType
SeqDeviceGroup
SmartPlugID
SeqAppliance
PK
SeqEnergyLogType
SeqSmartPlug
LoadDateTime
SeqEnergyLogType
EnergyLogName
LoadDateTime
LoadDateTime
RecordSource
RecordSource
LoadDateTime
RecordSource
SeqSatAppliance
SeqAppliance
SeqSmartPlugApplianceEner
gyLogType
SeqSmartPlug
DeviceGroupDescription
PK
RecordSource
Fifthplay Raw Data Vault Architecture
Legend
Hub
LinkSatSmartPlugApplianceEnergyL
ogType
PK
Link
SeqSatSmartPlugApplianceE
nergyLogType
Satellite
SeqSmartPlugApplianceEner
gyLogType
LoadDateTime
LoadEndDateTime
HubSatAppliance
RecordSource
HubAppliance
EnergyLogDateTime
EnergyLogValue
PK
EnergyLogValueUnit
LinkSmartPlugApplianceEnergyLogT
ype
HubSmartPlug
PK
PK
LoadDateTime
RecordSource
SeqAppliance
ApplianceID
LoadDateTime
LoadDateTime
LoadEndDateTime
RecordSource
RecordSource
ApplianceCategory
HubEnergyLogType
SeqSmartPlug
SeqAppliance
SeqSatAppliance
SeqAppliance
SeqSmartPlugApplianceEner
gyLogType
SeqSmartPlug
SmartPlugID
PK
PK
SeqEnergyLogType
SeqEnergyLogType
EnergyLogName
LoadDateTime
LoadDateTime
RecordSource
RecordSource
Fifthplay : Data Vault – lessons learned
• Don’t stop with data vault; A combination with classic
dimensional Kimball-methodology is advised
• Be creative; get out of your comfort zone, dare to walk
the thine line
• While setting up the data vault, operational issues
where discovered early in the process
• ETL-development goes very quickly because of the
typical pattern design of the data vault;
Data Vault – What’s next ?
History and what’s next ?
Relational modeling
(E.F.Codd)
1960
1970
• Barry Devlin and
Dr Kimball
release
“Business Data
Warehouse”
• Bill Inmon
popularizes Data
Warehousing
• Dr Kimball
popularizes Star
Schema
Bill Inmon began
discussing Data
Warehousing
1980
1990
Dan Linstedt begins
R&D on Data Vault
Modeling
Dan Linstedt
releases first 5
articles on Data
Vault Modeling
2000
2010
2012 : Dan Linstedt
announces Data
Vault 2.0
2013 : Dan Linstedt
releases Data Vault
2.0 specs
Thank You
“In the Data Warehousing/BI world, we
should store the data as it stands on the
source system and interpret it on the
way out to the data marts. This is
absolutely critical to remember.”
Dan Linstedt
[email protected]
@BICC_at_USG
http://www.linkedin.com/company/usgprofessionalsbe
+32 3 231 94 84
www.usgict.be
https://www.facebook.com/usgictbe