Meet the Walkers!

Meet the Walkers!
Accelerating Index Traversals for In-Memory Databases"
Onur Kocberber"
Boris Grot, Javier Picorel, Babak Falsafi,"
Kevin Lim, Parthasarathy Ranganathan "
Our World is Data-Driven!"
Data resides in huge databases"
Data!
–  Most frequent task: find data"
"
Indexes used for fast data lookup"
" –  Rely on pointer-intensive data structures"
"
Index!
"
Indexing efficiency is critical "
–  Many requests, abundant parallelism "
–  Power-limited hardware"
"
"
"
Need high-throughput and energy-efficient index lookups"2"
Index Lookups on General-Purpose Cores "
Index Lookups!
•  Data in memory"
•  Inherent parallelism"
OoO Cores!
•  Pointer-chasing ! Low MLP"
•  Limited OoO inst. window"
–  One lookup at a time"
Index Lookups"
"
OoO Core"
Inst. Window"
Memory"
OoO cores are ill-matched to indexing"
3"
Widx: an Indexing Widget"
Specialized: Custom HW for index lookups"
–  Switch fewer transistors per lookup"
"
Parallel: Multiple lookups at once"
–  Extract parallelism beyond the OoO exec. window"
"
Programmable: Simple RISC cores"
–  Target a wide range of DBMSs"
3x higher throughput, 5.5x energy reduction vs. OoO"
4"
Outline"
"
• 
• 
• 
• 
• 
• 
Introduction"
Indexing in database systems"
Indexing inefficiencies in modern processors"
Widx"
Evaluation highlights"
Summary"
"
5"
Modern Databases & Indexing"
Indexes are essential for all database operators"
–  Data structures for fast data lookup"
Hash index: fundamental index structure"
"
"
"
"
"
"
"
Hash Index"
K!
0"
R"
T"
1"
C"
Q"
2"
E"
K"
Walk"
Dominant operation: join via hash index"
"
"
6"
Join via Hash Index"
Join: findon
theindex
matching
values
in Ainand
B"
Lookup
for every
entry
A ""
Hash
B" Index on B"
A"
Join"
Walk"
7"
How Much Time is Spent Indexing?!
Measurement on Xeon 5670 CPU with 100GB Dataset"
% of Execution Time"
Index "
Scan "
Sort & Join"
Other"
100"
75"
50"
25"
0"
2" 3" 5" 7" 8" 9" 11"13"14"15"17"18"19"20"21"22" 5" 37"40"43"46"52"64"81"82"
TPC-H"
TPC-DS"
Indexing is the biggest contributor to execution time"
8"
Dissecting Index Lookups"
Hash: Avg. 30% time of each lookup "
–  Computationally intensive, high cache locality"
Walk: Avg. 70% time of each lookup"
–  Trivial computation, low cache locality"
Next lookup: Inherently parallel"
–  Beyond the inst. window capacity"
OoO Core"
Inst. Window"
9"
Outline"
• 
• 
• 
• 
• 
• 
"
Introduction"
Indexing in Database Systems"
Indexing inefficiencies in modern processors"
Widx"
Evaluation highlights"
Summary"
10"
Roadmap for !
Efficient and High-Throughput Indexing"
"
1.  Specialize"
–  Customize hardware for hashing and walking"
"
2.  Parallelize"
–  Perform multiple index lookups at a time"
3.  Generalize"
–  Use a programmable building block"
"
"
11"
Step 1: Specialize"
Design a dedicated unit for hash and walk"
–  Hash: compute hash values from a key list"
–  Walk: access the hash index and follow pointers"
Hash"
General-purpose"
OoO"
Walk"
Specialized"
hash and walk hardware" 12"
Decoupled"
Decoupled & Parallel"
Time"
Serial"
Step 2: Parallelize"
H"
W
H"
W
H"H"
W
W
Step 3: Generalize"
Widx unit"
Widx unit: common building block for hash and walk"
–  Two-stage RISC core"
–  Custom ISA"
"
Widx units are programmable"
–  Execute functions written in Widx ISA"
–  Support limitless number of data structure layouts "
hash( )"
H"
"
"
"
walk( )"
"
"
"
"
W
14"
Putting it all together: Widx"
When Widx runs, core goes idle"
"
"
Widx"
"
Hash"
Walkers"
W
W
H"
OoO"
Core"
Result "
Producer"
P"
Widx"
W
W
Simple, parallel hardware"
MMU"
L1"
Programming Model"
"
Development"
Write code for each unit "
and compile for Widx ISA"
Execution"
Communicate "
Load the code"
query-specific inputs"
" Hash"
hash (arg1, arg2, …)"
{……}"
"
DBMS"
Index"
Code"
Walk"
"
H"
W
P"
walk (arg1, arg2, …)"
{……}"
"
Res. Produce"
"
emit (arg1, arg2, …)"
{……}"
"
16"
Outline"
• 
• 
• 
• 
• 
• 
"
Introduction"
Indexing in Database Systems"
Indexing inefficiencies in modern processors"
Widx"
Evaluation highlights"
Summary"
17"
Methodology"
Flexus simulation infrastructure [Wenisch '06]"
Benchmarks"
–  TPC-H on MonetDB"
–  TPC-DS on MonetDB"
–  Dataset: 100GB"
Area and Power"
– 
– 
– 
– 
– 
"
uArch Parameters"
–  Core Types"
•  OoO: 4-wide, 128-entry ROB"
•  In-order: 2-wide"
–  Frequency: 2GHz"
–  L1 (I & D): 32KB"
–  LLC: 4MB"
Synopsys Design Compiler"
Technology node: TSMC 40 nm, std. cell"
Frequency: 2GHz"
Widx Area: 0.24mm2"
Widx Power: 0.3W "
18"
0"
TPC-H"
3x higher indexing throughput"
qry82"
qry64"
qry52"
qry40"
OoO"
qry37"
qry5"
qry22"
qry20"
6"
qry19"
qry17"
qry11"
qry2"
Indexing Speedup"
Widx Performance"
Widx"
5"
4"
3"
2"
1"
TPC-DS"
19"
2.5"
2.5"
2"
2"
2"
1.5"
1"
1.5"
1"
0.5"
0.5"
0"
0"
Energy-Delay"
2.5"
Energy"
Runtime"
Widx Efficiency"
1.5"
1"
0.5"
0"
5.5x reduction in indexing energy vs. OoO"
20"
Conclusions"
Indexing is essential in modern DBMSs"
"
Modern CPUs spend significant time in index lookups"
–  Not efficient & fall short of extracting parallelism"
"
Widx: Specialized widget for index lookups"
–  Efficient, parallel & programmable "
3x higher throughput, 5.5x energy reduction vs. OoO"
21"
Thanks!"
22"