THE DATA LAKE DREAM
Edd Dumbill • @edd
[email protected] • svds.com/StrataNY2014
© 2014 SILICON VALLEY DATA SCIENCE LLC. ALL RIGHTS RESERVED.
WHAT IS A DATA LAKE?
A scalable, accessible repository of data (in its natural or processed state)
[Diagram: DW, Analytics, Hadoop]
CONVENTIONAL DATA STRATEGY
“WHAT YOU DO TO DATA”
CLEAN
VALIDATE
CONTROL
PROTECT
MODERN DATA STRATEGY
“WHAT YOU DO WITH DATA”
ATTRACT NEW CUSTOMERS
TARGET VIP CUSTOMERS
AUTOMATE
[Chart labels: growth potential; uncertainty; big data applications; well-understood systems]
TOWARDS THE “DATA LAKE” — Step 1
DW
TOWARDS THE “DATA LAKE” — Step 2
DW
Hadoop
Analytics
TOWARDS THE “DATA LAKE” — Step 3
DW
Hadoop
Analytics
TOWARDS THE “DATA LAKE” — Step 4
DW
Analytics
Hadoop
UP vs. OUT — Enterprise Edition
Increasing cost per unit of capability from scale-up architectures causes rationing of resources. Only the most valuable use cases are pursued.
Different use cases put different demands on the data infrastructure.
[Chart: cost in US Dollars against Data Resource Usage, with curves for Scale-up cost and Scale-out cost and use cases UC1 to UC5 marked]
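As a rough illustration of the rationing argument on this slide, the sketch below compares a scale-up cost curve, where each extra unit of capability costs more than the last, with a linear scale-out curve, and lists which use cases fit a fixed budget under each. The cost functions, the budget, and the per-use-case resource figures are all hypothetical numbers chosen for illustration; none of them come from the slides.

```python
# Illustrative only: the cost curves, budget, and usage figures below are
# assumptions made for this sketch, not data from the talk.

def scale_up_cost(units: float) -> float:
    """Hypothetical scale-up cost: cost per unit rises with capability."""
    return 10.0 * units ** 2


def scale_out_cost(units: float) -> float:
    """Hypothetical scale-out cost: roughly constant cost per unit."""
    return 40.0 * units


# Hypothetical resource demand (arbitrary units) for use cases UC1..UC5.
USE_CASES = {"UC1": 1.0, "UC2": 2.0, "UC3": 4.0, "UC4": 6.0, "UC5": 8.0}


def affordable(cost_fn, budget: float) -> list:
    """Return the use cases whose individual cost fits within the budget."""
    return [name for name, units in USE_CASES.items() if cost_fn(units) <= budget]


if __name__ == "__main__":
    budget = 350.0  # hypothetical per-use-case budget
    print("scale-up :", affordable(scale_up_cost, budget))
    print("scale-out:", affordable(scale_out_cost, budget))
```

With these made-up figures the larger use cases drop out under the scale-up curve but still fit under the scale-out curve, which is the effect the slide describes. Treating each use case against the budget on its own is a simplification.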
THE DATA VALUE CHAIN
DRAW VALUE FROM YOUR STRATEGIC DATA ASSETS
Discover
Ingest
Process
Persist
Integrate
Analyze
Expose
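The seven stages above can be read as an ordered pipeline. The sketch below is one minimal way to model that in Python. The `Stage` enum mirrors the stage names as they appear in the text, while `run_chain`, the handler functions, and the sample records are hypothetical names invented for illustration.

```python
from enum import Enum
from typing import Any, Callable, Dict


class Stage(Enum):
    """Stages of the data value chain, in the order they appear in the text."""
    DISCOVER = 1
    INGEST = 2
    PROCESS = 3
    PERSIST = 4
    INTEGRATE = 5
    ANALYZE = 6
    EXPOSE = 7


def run_chain(data: Any, handlers: Dict[Stage, Callable[[Any], Any]]) -> Any:
    """Pass data through each stage in order; stages without a handler are skipped."""
    for stage in Stage:  # Enum iteration follows definition order
        handler = handlers.get(stage)
        if handler is not None:
            data = handler(data)
    return data


if __name__ == "__main__":
    # Hypothetical handlers: drop empty records, then count what remains.
    handlers = {
        Stage.PROCESS: lambda rows: [r for r in rows if r],
        Stage.ANALYZE: lambda rows: {"count": len(rows)},
    }
    print(run_chain(["a", "", "b"], handlers))
```

Keeping the stages in a single ordered type means a handler can be added or swapped for one stage without touching the others.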
BUILD FOR EXPERIMENTS
•  Make it cheap
   •  Failure as a feature
   •  Ask good questions
•  Make it quick
   •  Both learning and adaptation
   •  Enable the feedback loop
•  Don’t break things
   •  Make operations a platform for innovation
   •  APIs, platforms, simulation
THE EXPERIMENTAL ENTERPRISE
Data science allows us to observe our
experiments and respond to the
changing environment.
We need to both support investigative
work and build a solid layer for
production.
The foundation of the experimental
enterprise focuses on making
infrastructure readily accessible.
Edd Dumbill
[email protected]
@edd
@SVDataScience
Yes, we’re hiring!
[email protected]
Want these slides? Go to:
svds.com/StrataNY2014