Business Analytics For All Unlocking the Value within the Data Vault BA4All – Insight Session April 29th 2014 Guy Van der Sande – Vincent Greslebin Fifthplay : Architecture Data Warehouse Smart Homes Platform Gebruikers ETL Data Vault Data mart Marketing & SSC Dag - 1 - Controle data kwaliteit Toepassing business rules Aggregatie Filtering Utility Portal Facility Portal Fifthplay : Why Data Vault ? • Pattern based design which allows agility to take place • Easy to add new data sources making it future proof. This allows Fifthplay to stay innovative • Large volume of data • Build up history that is not available in the operational system • Possibility of performing analysis on raw data (cfr quality checks) • Development speed (Pilot : 37 working days) Data Vault ? Data Vault ? Data Vault ? The Data Vault is a detail oriented, historical tracking and uniquely linked set of normalized tables that support one or more functional areas of business. It is a hybrid approach encompassing the best of breed between 3rd normal form (3NF) and star schema. The design is flexible, scalable, consistent and adaptable to the needs of the enterprise. Standard architecture The centerpiece of the Enterprise Data Warehouse History is build-up Granularity as ‘detailed’ as possible No use of business rules Use of business keys that are horizontal in nature and provide visibility across lines of business A new layer which has the benefits of the RAW Data Vault, but with the business data embedded In the Business Data Vault the data has been altered, cleansed and changed to meet the business rules Downstream of the raw data vault Starting point for Master Data Management Metadata is absolutely vital Component parts of the Data Vault model The Data Vault Model exists of 3 basic entity types • Hubs : contains a unique list of business keys • Links : associations across or between business keys • Satellites : holds descriptive data (about the business key) over time Component parts - Hub • Represents a Core Business Concept • Is formed around the Business Key of this concept • Is established the first time a new instance of that business key is introduced • Must be 1:1 with a single instance • Consists of the business key, a sequence id, a load date/time stamp and a record source. Component parts - Link • Represents a natural business relationship between business keys • Is established the first time this new unique association is presented • Can represent an association between several Hubs and sometimes other Links. • maintains a 1:1 relationship with the unique and specific business defined association between that set of keys. • Consists of the sequence ids from the Hubs and Links • Contains sequence id, a load date/time stamp and a • record source. Component parts - Satellite • The Satellite contains the descriptive information (context) for a business key. • A Satellite can only describe one key (Hub or a Link). • The Satellite is the only construct that manages time slice data (data warehouse historical tracking of values over time). Data Vault – Why ? Dimension 1 Dimension 2 Fact Dimension 3 Dimension 4 Data Vault – Why ? Dimension 1 Dimension 2 Fact Dimension 3 Dimension 4 Data Vault – Why ? Fact Dimension 1 Dimension 2 Fact Dimension 5 Dimension 3 Dimension 4 Data Vault – Why ? DV DM Data Vault – Why ? S S S S H L S H DV H H DM Data Vault – Why ? S S S S H L S H DV H H DM Dimension Fact Data Vault – How did we do it with Fifthplay ? Fifthplay Raw Data Vault Architecture Legend Hub HubSatHomeAreaManagerAddress PK SeqSatHomeAreaManagerA ddress Link SeqHomeAreaManager Satellite LoadDateTime LoadEndDateTime HubSatCustomer HubSatServicePartner LinkSatHomeAreaManagerCity RecordSource PK PK SeqSatServicePartner SeqSatCustomer SeqCustomer SeqServicePartner PK HomeAreaManagerAddress Line1 LoadDateTime LoadDateTime LoadEndDateTime LoadEndDateTime HomeAreaManagerAddress Line2 PK SeqSatHomeAreaManagerCi ty LinkSatCountryCity PK SeqSatCountryCity SeqSatHomeAreaManager SeqHomeAreaManagerCity SeqHomeAreaManager LoadDateTime LoadDateTime LoadEndDateTime LoadDateTime LoadEndDateTime RecordSource LoadEndDateTime RecordSource HAMCityAddressLine1 RecordSource HomeAreaManagerMode HAMCityAddressLine2 CountryCityRegion HomeAreaManagerArchitec ture HAMCityPhoneNumber CountryCityState HubSatCountry SeqCountryCity PK RecordSource RecordSource RecordSource HomeAreaManagerCountry CustomerLanguage ServicePartnerCustomerCon tact LoadEndDateTime HomeAreaManagerProvince CustomerLastName ServiucePartnerEmail LoadDateTime HomeAreaManagerState CustomerFirstName ServicePartnerWebPage LinkHomeAreaManagerCity HubCustomer PK SeqServicePartnerCustomer SeqServicePartner CountryName LinkCustomerHomeAreaManager LinkServicePartnerCustomer HubServicePartner SeqCustomerHomeAreaMan ager SeqCustomer PK SeqCity PK SeqCountryCity PK SeqHomeAreaManager CityPostalCode SeqCountry SeqCity CityName SeqCity LoadDateTime LoadDateTime LoadDateTime RecordSource RecordSource RecordSource HomeAreaManagerNumber LoadDateTime LoadDateTime LoadDateTime LoadDateTime RecordSource RecordSource RecordSource SeqCountry CountryIsoCode SeqHomeAreaManager LoadDateTime RecordSource SeqHomeAreaManagerCity SeqHomeAreaManager SeqCustomer CustomerID SeqCustomer LoadDateTime LinkCountryCity HubCountry PK PK SeqServicePartner ServicePartnerID HubCity HubHomeAreaManager PK PK SeqSatCountry SeqCountry HomeAreaManagerCityNam e CustomerEmail ServicePartnerCode PK HubSatHomeAreaManager HomeAreaManagerPostalCo de RecordSource RecordSource HubSatSmartPlug LinkSatSmartPlugApplianceEnergyL ogType LinkCustomerDeviceGroup PK PK SeqSatSmartPlug PK SeqCustomerDeviceGroup SeqSatSmartPlugApplianceE nergyLogType SeqSmartPlug SeqSmartPlugApplianceEner gyLogType SeqCustomer LoadDateTime LinkHomeAreaManagerSmartPlug SeqDeviceGroup LoadDateTime LoadEndDateTime LoadDateTime RecordSource LinkDeviceGroupSmartPlug PK HubSatDeviceGroup PK HubDeviceGroup SeqSatDeviceGroup SeqDeviceGroup LoadDateTime PK PK SeqDeviceGroupSmartPlug SeqHomeAreaManagerSmar tPlug SeqHomeAreaManager SmartPlugDisplayName SeqDeviceGroup SeqSmartPlug SmartPlugManufacturer SeqSmartPlug LoadDateTime SmartPlugModel LoadDateTime RecordSource SmartPlugIsGenerator RecordSource SeqDeviceGroup SmartPlugHasSchedule LinkDeviceSubGroupSmartPlug LoadDateTime RecordSource RecordSource DeviceGroupName LoadDateTime HubSmartPlug PK HubSatAppliance RecordSource HubAppliance EnergyLogDateTime EnergyLogValue PK EnergyLogValueUnit SmartPlugHasChildren DeviceGroupID LoadEndDateTime LoadEndDateTime RecordSource SeqDeviceSubGroupSmartPl ug PK LinkSmartPlugApplianceEnergyLogT ype PK SeqAppliance ApplianceID LoadDateTime LoadDateTime LoadEndDateTime RecordSource RecordSource ApplianceCategory SeqSmartPlug HubEnergyLogType SeqDeviceGroup SmartPlugID SeqAppliance PK SeqEnergyLogType SeqSmartPlug LoadDateTime SeqEnergyLogType EnergyLogName LoadDateTime LoadDateTime RecordSource RecordSource LoadDateTime RecordSource SeqSatAppliance SeqAppliance SeqSmartPlugApplianceEner gyLogType SeqSmartPlug DeviceGroupDescription PK RecordSource Fifthplay Raw Data Vault Architecture Legend Hub LinkSatSmartPlugApplianceEnergyL ogType PK Link SeqSatSmartPlugApplianceE nergyLogType Satellite SeqSmartPlugApplianceEner gyLogType LoadDateTime LoadEndDateTime HubSatAppliance RecordSource HubAppliance EnergyLogDateTime EnergyLogValue PK EnergyLogValueUnit LinkSmartPlugApplianceEnergyLogT ype HubSmartPlug PK PK LoadDateTime RecordSource SeqAppliance ApplianceID LoadDateTime LoadDateTime LoadEndDateTime RecordSource RecordSource ApplianceCategory HubEnergyLogType SeqSmartPlug SeqAppliance SeqSatAppliance SeqAppliance SeqSmartPlugApplianceEner gyLogType SeqSmartPlug SmartPlugID PK PK SeqEnergyLogType SeqEnergyLogType EnergyLogName LoadDateTime LoadDateTime RecordSource RecordSource Fifthplay : Data Vault – lessons learned • Don’t stop with data vault; A combination with classic dimensional Kimball-methodology is advised • Be creative; get out of your comfort zone, dare to walk the thine line • While setting up the data vault, operational issues where discovered early in the process • ETL-development goes very quickly because of the typical pattern design of the data vault; Data Vault – What’s next ? History and what’s next ? Relational modeling (E.F.Codd) 1960 1970 • Barry Devlin and Dr Kimball release “Business Data Warehouse” • Bill Inmon popularizes Data Warehousing • Dr Kimball popularizes Star Schema Bill Inmon began discussing Data Warehousing 1980 1990 Dan Linstedt begins R&D on Data Vault Modeling Dan Linstedt releases first 5 articles on Data Vault Modeling 2000 2010 2012 : Dan Linstedt announces Data Vault 2.0 2013 : Dan Linstedt releases Data Vault 2.0 specs Thank You “In the Data Warehousing/BI world, we should store the data as it stands on the source system and interpret it on the way out to the data marts. This is absolutely critical to remember.” Dan Linstedt [email protected] @BICC_at_USG http://www.linkedin.com/company/usgprofessionalsbe +32 3 231 94 84 www.usgict.be https://www.facebook.com/usgictbe
© Copyright 2024 ExpyDoc