Bryn Davies

Data Quality, Master Data Management & Data Governance
Addressing the challenges using Oracle Enterprise Data Quality (EDQ)
Presented by:
Bryn Davies
Managing Director
InfoBluePrint (Pty) Ltd
Introducing InfoBluePrint
InfoBluePrint is a company dedicated to helping businesses to optimally manage their critical business information as a valuable corporate asset.
We are uniquely and exclusively focused on DATA and we supply specialized Data Quality and Information Management Services.
www.infoblueprint.co.za
Our Focus Areas
Data Migrations - ECTL
© 2012 InfoBluePrint
Example Business Areas Requiring a Data Quality Focus
• Customer Relationship Management
• Customer Centricity & Single View
• Contactability
• Online/Digital Strategy
• Governance, Risk and Compliance
• POPI
• Watchlist Screening
• Supplier & Procurement Management
• Marketing & Sales
• Human Resources
• Business Intelligence
• EPM
• System Consolidations, New Systems, Data Migrations
Data Categorised
DATA
• Transactional Data
• Master Data – provides context to transactional data
• Characteristic Data – defines individual characteristics of the master entity being defined
• Reference Data – defines fixed value domain data for classification
Example:
Name: B.T. Davies
Address: 25 Short St
Town: CT
Region: West
ID: 6012315053081
Status: Active
Level: Gold
Amount: R251.87
Common Data Problems – Customer Information
• Typically key data for people and businesses such as:
– Name, title
– ID related information (ID no, company registration no, VAT no)
– Address/location information (physical, postal, delivery, billing)
– Contact details/types (address, phone, cell, email, website)
– Status (eg. active vs inactive)
– Product/Service type/class
– Marketing attributes eg. LSM, demographics, psychographics
• Inconsistent spelling, formatting and structure
• Incorrect
• Out of date
• Missing
• Duplication of customers within and across different silos
– Account Centric rather than Customer Centric
– Lack of cross-correlation and hierarchies: inability to achieve single view, householding, reliable segmentation
Consequences of Unreliable Customer Information
Impossible to create a Single View of Customer
Unable to communicate
Wrong/inappropriate communication
Ineffective marketing
Unreliable reports
Poor decisions
Wrong decisions!
Inefficiency – wasted time and effort doing “scrap and rework”
Frustrated employees
Issue resolution takes longer – costs more
Frustrated customers
Complex, difficult new system roll-outs (over time/budget due to data issues)
Never ending “data clean-ups”
Legal and compliance problems (eg. PoPI, Sanctions/PEP, FATCA…)
InfoBluePrint’s 10 Key Points for Data Quality
1. (Quality) Data is an Asset
2. Data is a Product
3. Quality is Defined by the Data Customer
4. Data Quality is not just about Dirty Data
5. Data Quality Measurement is Mandatory
6. Don’t do DQ without DQ Software
7. Data Quality is Ultimately the Responsibility of Business
8. Get the First DQ Project Right First Time
9. Understand the Human Element
10. Doing Nothing will Cost you the Most
A Database is like a Lake*
• The water is the data
• The streams represent business processes feeding the database
• Factories upstream are sources of pollutant
• Information users drink the water
*Analogy courtesy Tom Redman
A Data Quality Management Framework for Success
Introducing Oracle Enterprise Data Quality (EDQ)
DQ-Based Solutions
Business Solutions
Domain Knowledge
Pre-Built Solutions
• Any scope – components to end-to-end solutions
• Any pre-built/reusable item
• Processes, methods
• Knowledge, reference data
• Application integration
Data Quality Platform
• Complete range of DQ capabilities
• Best-of-breed capabilities for party and product data
• Easy to use, intuitive
• Open, tunable, flexible
Customer-delivered
Partner-delivered
Oracle-delivered
Enterprise Data Quality
• Governance
• Dashboards
• Match/Merge
• Product Data Extensions
• Party Data Extensions
• Standardization
• Profile and Audit
Introducing Oracle Enterprise Data Quality (EDQ)
Enterprise DQ Platform
Common Access/UI
Govern: Monitor effectiveness & resolve problems
• Process metrics
• Quality metrics
• Case Management
• Remediation
Match: Identify & merge duplicates
• Party (individuals, households) match
• Semantic (category) match
• Entity match
• Statistical match
• Match review
• Merge/survivorship
Standardize: Drive conformance to standards
• Global parse
• Category parse
• Extract
• Transform
• Substitute
• Address verification & geocoding
• Enrich
• Classify
Profile: Quickly understand data content
• Statistics
• Patterns
• Phrases
• Duplicates
• Completeness
• Max/min values
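The Profile capabilities listed above (statistics, patterns, duplicates, completeness, max/min values) can be illustrated outside EDQ with a minimal Python sketch. This is not EDQ's implementation; the function names `pattern_of` and `profile` and the sample postcodes are assumptions made purely for illustration.

```python
import re
from collections import Counter


def pattern_of(value):
    """Map letters to 'A' and digits to '9', keeping punctuation (e.g. 'CT-12' -> 'AA-99')."""
    return re.sub(r"[0-9]", "9", re.sub(r"[A-Za-z]", "A", value))


def profile(values):
    """Return simple profiling measures for one column of string values."""
    present = [v for v in values if v not in (None, "")]
    counts = Counter(present)
    return {
        "count": len(values),
        # Share of rows that actually carry a value.
        "completeness": len(present) / len(values) if values else 0.0,
        "distinct": len(counts),
        # Values that occur more than once.
        "duplicates": sorted(v for v, n in counts.items() if n > 1),
        "min": min(present) if present else None,
        "max": max(present) if present else None,
        # Most common character patterns first.
        "patterns": Counter(pattern_of(v) for v in present).most_common(),
    }


# Hypothetical postcode column with a gap, a null and a malformed value.
postcodes = ["8001", "8001", "7700", "", None, "77A0"]
report = profile(postcodes)
```

A rare pattern such as `99A9` in a column dominated by `9999` is the kind of finding that drill-down would then examine against the real data.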
Oracle EDQ – Denoise, Parse, Standardise
Name: Dr Ellen Van Der Heijde
Title: Dr
First: Ellen
Last: Van Der Heijde
Gender: Female
Name: Mr RJ & Mrs FB MacDonald
Title: Mr
First: R
Middle: J
Last: MacDonald
Gender: Male
Title: Mrs
First: F
Middle: B
Last: MacDonald
Gender: Female
• Standardize, Transform and Parse
• Split names and name elements
• Identify individuals and businesses
• Derive additional attributes
Name: Jalila Abdul-Alim (Do Not Call)
First: Jalila
Last: Abdul-Alim
Gender: Female
Note: Do Not Call
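The parsing examples above can be approximated with a small Python sketch. It does not show how EDQ's processors work internally. The helper names (`parse_name`, `_parse_person`) and the two lookup tables are hypothetical and deliberately tiny, so gender is derived only for the titles and first names listed.

```python
import re

# Hypothetical lookup tables; a real solution would use much larger reference data.
TITLE_GENDER = {"mr": "Male", "mrs": "Female", "ms": "Female", "dr": None}
FIRST_NAME_GENDER = {"ellen": "Female", "jalila": "Female"}


def _parse_person(text):
    """Split one person's name into title, first, middle, last and gender."""
    tokens = text.split()
    person = {"title": None, "first": None, "middle": None, "last": None}
    if tokens and tokens[0].rstrip(".").lower() in TITLE_GENDER:
        person["title"] = tokens.pop(0).rstrip(".")
    if tokens:
        head = tokens.pop(0)
        # Treat a short all-capitals token such as "RJ" as a run of initials.
        if head.isalpha() and head.isupper() and len(head) <= 3:
            person["first"] = head[0]
            person["middle"] = head[1:] or None
        else:
            person["first"] = head
    person["last"] = " ".join(tokens) or None
    gender = TITLE_GENDER.get((person["title"] or "").lower())
    if gender is None and person["first"]:
        gender = FIRST_NAME_GENDER.get(person["first"].lower())
    person["gender"] = gender
    return person


def parse_name(raw):
    """Denoise a free-text name field and return one dict per person found."""
    # Denoise: move a parenthetical remark such as "(Do Not Call)" into a note.
    note_match = re.search(r"\(([^)]*)\)", raw)
    note = note_match.group(1).strip() if note_match else None
    cleaned = re.sub(r"\([^)]*\)", " ", raw)
    people = [_parse_person(part) for part in cleaned.split("&") if part.strip()]
    # In "Mr RJ & Mrs FB MacDonald" the surname is shared by both people.
    shared_last = next((p["last"] for p in reversed(people) if p["last"]), None)
    for p in people:
        p["last"] = p["last"] or shared_last
        p["note"] = note
    return people


couple = parse_name("Mr RJ & Mrs FB MacDonald")
flagged = parse_name("Jalila Abdul-Alim (Do Not Call)")
```

Calling `parse_name("Dr Ellen Van Der Heijde")` yields a single person with the multi-word surname kept intact, because everything after the first name is treated as the last name.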
Oracle EDQ – Match, Merge, Enrich
Title: Mr
First: Robert
Last: Fulmar
Gender: Male
DoB: 12/05/1978
Phone: 555-120-1329
Address:
9405 Main St
Fairfax
Virginia
22030
First: Bob
Last: Fulmar
Gender: Male
Email: [email protected]
Title: Dr
First: R
Last: Fulmer
DoB: 01/01/1978
Email: [email protected]
Address:
9407 Main Street
Fairfax
VA
22031-4001
Title: Dr
First: Robert
Last: Fulmar
Gender: Male
DoB: 12/05/1978
Email: [email protected]
Phone: 555-120-1329
Address:
9407 Main St
Fairfax
VA
22031-4001
• Match & Merge data from disparate sources
• Create ‘best’ record based on survivorship rules
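As a rough illustration of matching and survivorship, the sketch below compares records on a fuzzy surname score plus first initial, then builds a 'best' record field by field. The match rule, the 0.8 threshold, the nickname table and the survivorship rule (most frequent non-empty value, ties going to the longest value) are all assumptions chosen for this example. The slide does not state which rules EDQ applied, and its merged record clearly uses richer ones.

```python
from collections import Counter
from difflib import SequenceMatcher

# Hypothetical nickname table; real matching would rely on reference data.
NICKNAMES = {"bob": "robert"}


def canonical_first(name):
    """Lower-case a first name and expand a known nickname."""
    name = name.lower()
    return NICKNAMES.get(name, name)


def is_match(a, b, threshold=0.8):
    """Assumed rule: similar surnames and the same (canonical) first initial.

    Both records are expected to carry non-empty 'first' and 'last' values.
    """
    surname_score = SequenceMatcher(None, a["last"].lower(), b["last"].lower()).ratio()
    same_initial = canonical_first(a["first"])[0] == canonical_first(b["first"])[0]
    return surname_score >= threshold and same_initial


def merge(records):
    """Assumed survivorship: most frequent non-empty value, ties to the longest."""
    best = {}
    for field in sorted({f for r in records for f in r}):
        values = [r[field] for r in records if r.get(field)]
        if values:
            counts = Counter(values)
            best[field] = max(counts, key=lambda v: (counts[v], len(v)))
    return best


# Cut-down versions of the three source records shown above.
records = [
    {"first": "Robert", "last": "Fulmar", "phone": "555-120-1329"},
    {"first": "Bob", "last": "Fulmar"},
    {"first": "R", "last": "Fulmer"},
]
best_record = merge(records)
```

With these rules "Fulmar" survives because it appears twice, and "Robert" survives as the longest of three equally frequent first names.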
EDQ Roles
Executives & Stakeholders
Business Analysts
Data Analysts
Developers
Data Stewards
Match Reviewers
Managers & Executives
Main EDQ Console, Focused on the User
Key Feature: Pre-built Processors
• Comprehensive DQ Functionality with a Single User Interface and Repository
Immediate drill-down to examine real data
Drill-down to see actual data values and determine required rules, standards etc.
EDQ Inbuilt Case Management for Governance
Review and resolve exceptions from the DQ process
Usage
• Cases/alerts are assigned to a work queue and given a priority
• Data specialists sign in and review/resolve issues
• Management reports allow monitoring of work queues and productivity
• Helpful for
o One-time cleanse/migration
o Ongoing governance program
Features
• Hierarchical Case/alert functionality
• Configurable Workflows
• Automatic prioritization of cases/alerts
• Timers
• Email Notification Support
• Comprehensive audit trail
• Immediate ad-hoc reporting
Example Dashboard
Batch and Online Data Quality Deployment
CRM
Asset Management
Planned Maintenance
Billing
Clean your data – then keep it clean in real-time
Service
Finance
Realtime DQ with EDQ Web Services
Enforce common DQ standards across the enterprise
Applications
App 1
App 2
App 3
Any EDQ process may be called as a real-time web service
Call any process from any application to:
1. Enforce common standards
2. Minimize architectural changes
Common Services: Library of enterprise standard DQ services
InfoBluePrint’s Generic EDQ Processes for SA Party Data
• Party Type Classification: Natural Person vs Juristic Entity (Individual vs Organisation)
– “Consumer” vs “Business” override rules can be incorporated
• Sub-Type Classification: SA ID, Temp Visa, Private Co., Trust, NGO, Medical, School etc.
• “Secondary Location” Parsing and Derivation of Missing Data
• Parsing, Cleansing, Standardising:
– Name, Legal, Trading, Maiden
– ID, Co Reg
– Addresses – all classes
– Telephone – all classes
– Email
– Banking details
– And more
• Householding – various categorisations (eg. name, address, email, banking etc.)
• PAMSS Data Preparation & Processing
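One concrete case of ID parsing and derivation of missing data is the 13-digit SA ID number, which embeds the date of birth (YYMMDD), a gender sequence (0000–4999 female, 5000–9999 male), a citizenship digit and a Luhn check digit. The Python sketch below shows that derivation under those rules. The function names and the returned field names are hypothetical, and it is not a description of InfoBluePrint's EDQ processes.

```python
from datetime import datetime


def luhn_valid(digits):
    """Standard Luhn check: double every second digit from the right."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def parse_sa_id(id_number):
    """Return derived attributes for a valid SA ID number, or None if invalid."""
    s = id_number.strip()
    if len(s) != 13 or not (s.isascii() and s.isdigit()):
        return None
    try:
        # The first six digits must form a real calendar date (YYMMDD).
        datetime.strptime(s[:6], "%y%m%d")
    except ValueError:
        return None
    if not luhn_valid(s):
        return None
    return {
        "birth_yymmdd": s[:6],
        "gender": "Female" if int(s[6:10]) < 5000 else "Male",
        "citizen": s[10] == "0",  # 0 = SA citizen, 1 = permanent resident
    }


# Illustrative number only, chosen because it passes the check digit test.
derived = parse_sa_id("8001015009087")
```

The two-digit year means the century cannot be derived from the number alone, so the sketch returns the birth date as the raw YYMMDD string rather than guessing.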
Challenges with SA Address Data
• No Address Standards in SA (SANS1883 pending)
• SAPO data is generally unreliable
• PAMSS is very basic and used only for bulk mailing discounts
• Postcode system has very low granularity
• Postcode system highly ambiguous – no distinctions in hierarchies of city/town/suburb
• Informal addresses are plentiful
• Multiple languages used
• No National Address Database (NAD) – several commercial versions available at a cost – varying degrees of reliability
• Several PAMSS and “Address Cleansing” vendors – varying degrees of reliability, mostly offsite and not integrated into your environment
InfoBluePrint EDQ Address Classification
Invalid & Intl
• Strip for IDs, Names, Tel Nos.
• Invalid
• International (classify and parse)
BOX
• Classify Primary Indicator
• Classify Secondary Indicator
• Classify Language
BUILDING
• Classify Primary Indicator (if Null)
• Classify Language
FARM
• Classify Primary Indicator (if Null)
• Classify Language
STREET
• Classify Primary Indicator (if Null)
• Classify Language
CORNER OF
• Classify Primary Indicator (if Null)
• Classify Language
PLOT/ERF/SITE
• Classify Primary Indicator (if Null)
• Classify Secondary Indicator
• Classify Language
OTHERS
• Add Primary Indicator (if Null)
• Default language to English
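The classification above can be pictured as an ordered set of keyword rules that assign each address line a class and a language. The Python sketch below is only an illustration of that idea. The keyword lists, the rule order and the function name `classify_address` are assumptions, covering a handful of English and Afrikaans indicators rather than InfoBluePrint's actual rule set.

```python
import re

# (class, language, pattern) checked in order; the first match wins.
# CORNER OF is tested before STREET because corner addresses also contain street words.
RULES = [
    ("BOX", "English", r"\bp\.?\s*o\.?\s*box\b|\bprivate bag\b"),
    ("BOX", "Afrikaans", r"\bposbus\b|\bprivaatsak\b"),
    ("CORNER OF", "English", r"\bcorner of\b|\bcnr\b"),
    ("CORNER OF", "Afrikaans", r"\bhoek van\b|\bh/v\b"),
    ("BUILDING", "English", r"\b(building|centre|flat)\b"),
    ("BUILDING", "Afrikaans", r"\b(gebou|woonstel)\b"),
    ("FARM", "English", r"\bfarm\b"),
    ("FARM", "Afrikaans", r"\bplaas\b"),
    ("PLOT/ERF/SITE", "English", r"\b(plot|erf|site)\b"),
    ("STREET", "English", r"\b(street|st|road|rd|avenue|ave|drive)\b"),
    ("STREET", "Afrikaans", r"\b(straat|weg|laan)\b"),
]


def classify_address(line):
    """Return (address class, language) for one free-text address line."""
    for address_class, language, pattern in RULES:
        if re.search(pattern, line, flags=re.IGNORECASE):
            return address_class, language
    # No indicator found: class OTHERS, language defaulted to English.
    return "OTHERS", "English"


samples = [
    "PO Box 123, Cape Town",
    "Posbus 45",
    "25 Short St",
    "h/v Kerk en Lang Straat",
    "next to the clinic",
]
results = [classify_address(s) for s in samples]
```

Informal addresses such as the last sample fall through to OTHERS, which is where manual review or enrichment would typically be needed.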
Data Enrichment in SA
• Many data suppliers – generally:
– Marketing (list brokers)
– Credit Bureaus (sell data that they collect for various purposes)
• Some data suppliers do not have legal sources
• Many claims of attributes available prove to be false (low population)
• Varying degrees of reliability, especially with respect to currency of data
• Careful consideration is required as most supply on a subscription basis
• Some bureaus offer a service to manually collect and/or validate missing data
• Due diligence on SA data suppliers available as part of our service
Data Governance
• Data Governance is not about governing data – it is about governing the people and processes that touch the data
• Data Governance is not a product, a service or a project – it is a formal organisational programme
“Management is the decisions you make. Governance is the structure for making them.” (CIO Magazine)
Law & Order
Data Governance
LAW: The system of rules and procedures for governing data.
ORDER: Automation for monitoring & enforcing the rules and procedures to use and protect the data.
PEOPLE
PROCESS
TECHNOLOGY
Data Governance: Organisational Model
“Data Governance is the exercise of authority and control (planning, monitoring
and enforcement) over the management of data assets” (DAMA DMBoK)
Data Governance Steering Committee (DG Steerco)
• Approves strategy and direction
• Resolves escalated issues
• Co-exists with other strategic Steercos
Data Governance Council (DG Council)
• Approve enterprise data definitions
• Formulate data governance program decisions
• Ratify principles, standards, policies & processes
• Strategic issue resolution
• Encourage and facilitate change
Data Governance Office
• The face of data governance across the enterprise
• Implements strategic data governance transformation
• Incorporated within the Data Governance Council
Data Steward Teams
• Point of contact for daily data issues
• Subject matter experts
• Supplies data stewards
• Day to day consumers of data
EDQ and Data Governance
Data Governance Capabilities for Data Stewards & Stakeholders
• Data Flow Explorer
• Sources
• Quality KPIs
• Case & Issue Management
• Exception Review
• Data Flows
• Metadata Management
• Business Glossary
• Targets
Oracle OpenWorld 2014
Master Data Management
DQ Spans MDM and Data Integration
[Diagram: Enterprise Data Quality spanning Oracle MDM and Oracle Data Integration. Labels:]
Applications: Oracle BI/EPM, Oracle Applications, 3rd Party Applications, Custom Applications
Data Services / Information Management: Transaction Processing Services, Business Intelligence Services, Content Management Services, Collaboration Services, Web and Event Services, SOA
Oracle MDM: Customer Hub, Product Hub, Supplier Hub, Financial Hub, Site Hub
Enterprise Data Quality: Profiling, Standardization, Match/Merge, Watchlist Screening
Oracle Data Integration: ETL/E-LT, Data Federation, Replication, Transformation, Synchronization
Storage: OLTP System, Data Warehouse/Data Mart, OLAP Cube
MDM Strategy Development
MDM Scope
• Business Goals
• Data Types
• Processing Requirements
MDM Business Solution
• Solution Functional Components
• Solution Patterns
• Integration Requirements
MDM Roadmap
• Functional Component Dependencies
• Business Benefit Realisation
MDM Technical Solution
• Technology
• Implemented Solution
MDM Scope
MDM Solution
[Diagram: MDM solution architecture. Labels:]
Front Ends: Customers, Inter Office Email, Business, WEB, Admin
Master Data Direct Update and Enquiry – CRM Front End (Admin)
eRA Master Data
Applications: NEW (Operational), LEGACY (Operational), NEW (Analytical), LEGACY (Analytical)
Application Integration Adapters, Data Integration Adapters
DATA MIGRATION (Initial): Extraction, Cleansing, Augmentation, Load
ENABLERS (Ongoing): Data Mapping, Matching, Merging
DATA ALIGNMENT (Ongoing): eRA Data To Master, eRA Data From Master, Synchronisation, Broadcast Services
SUPPORT SERVICES (Ongoing): Backup & Recovery, Business Continuity
Critical Interdependencies
DQ needs:
DG because:
- DG provides the structures for the preventive part of DQ, eg. people/process
- DG drives out metadata issues
- DG provides direction and policies required to manage the data
MDM because:
- MDM provides the technology platform for persisted quality data
- MDM forces enterprise view of Data Quality
- MDM drives common data models and hierarchies (eg. party)
DG needs:
DQ because:
- DQ provides the framework, processes and artefacts for measuring and managing data improvement
- DQ provides supporting artefacts and processes
- DQ monitoring is a dashboard for Data Governance effectiveness
MDM because:
- MDM provides a physical data DMZ
- MDM drives new roles and responsibilities for data
- MDM provides a technical platform to support Data Governance
MDM needs:
DQ because:
- Initial migration must take on quality master data (and external data)
- Consistency in format/value/rules is required
- DQ of hub data must be controlled and known!
DG because:
- DG resolves people & process issues for MDM
- DG drives ownership and stewardship
- DG forces preventive measures to be in place
Examples of What’s Needed to Get Data Under Control
People
• Data Governance: DG Steerco, Data Council, Data Stewards, DQ Forums
• Data Quality: Data Quality Specialists, Data Analysts, DQ Tools Skills
• MDM: Data Architect, Data Modellers, MDM Tools Skills
Process, Practices, Artefacts
• Data Governance: Master Data Inventory, Data Steward Matrix, Data Policies, Data issues (rules for identification, categorisation, prioritisation), Issue Resolution Workflows
• Data Quality: Business Rules, DQ Assessments, DQ Improvement Processes & Systems, DQ Monitoring
• MDM: MDM Models, MDM Hierarchies, Validations, System of Entry vs System of Record, Workflows
Technology
• Data Governance: DG Policy Admin, Metadata Repository, Workflow Tool
• Data Quality: Rules Repository, Data Quality Tool – batch and realtime
• MDM: MDM Hub(s), Data Integration, Data Quality
www.infoblueprint.co.za
[email protected]
http://www.oracle.com/technetwork/middleware/oedq/overview/index.html
InfoBluePrint Clients
A Data Quality Management Framework for Success
• To manage Data Quality properly requires both corrective and preventive actions. This approach provides us with:
− A top-down, prevention-focused approach to define and implement the management and practices required to ensure sustainability.
− A bottom-up, correction-based approach for situations where identified problems need to be addressed.
• Before starting Correction activities it is important to put in place the supporting governance and processes that will enable improvements:
– Effective Prioritisation
– Adequate management
– Appropriate monitoring and reporting
– Consistent corrective action
• Ongoing Monitoring is required to highlight:
– Improvements after correction activities
– Incidence of new issues identified for the first time
InfoBluePrint Data Quality Improvement (DQI) - Process
InfoBluePrint DQ Assessment – Measure Data Quality