Provisioned key-value storage with Libra

PrincetonUniversity
From application requests to Virtual IOPs:
Provisioned key-value storage with Libra
David Shue* and Michael J. Freedman
(*now at Google)
Shared Cloud
Tenant A
Tenant B
Tenant C
VM VM VM
VM VM VM
VM VM VM
Shared Cloud Storage
Tenant A
Tenant B
Tenant C
VM VM VM
VM VM VM
VM VM VM
Block Storage
Key-Value
Storage
SQL Database
Unpredictable Shared Cloud Storage
Tenant A
Tenant B
Tenant C
VM VM VM
VM VM VM
VM VM VM
Key-Value
Key-Value
Block Storage
SQL Database
Storage
Storage
Disk IO-bound Tenants
SSD-backed storage
Provisioned Shared Key-Value Storage
ReservationA ReservationB ReservationC
Tenant A
Tenant B
Tenant C
VM VM VM
VM VM VM
VM VM VM
Application Requests
(1KB normalized)
GET/s
PUT/s
Shared Key-Value Storage
Low-level IO
SSD
SSD
IOPs
BW
SSD
SSD
SSD
Libra Contributions
•Libra IO Scheduler
- Provisions low-level IO allocations for app-request reservations
-
w/ high utilization.
Supports arbitrary object distributions and workloads.
• 2 key mechanisms
- Track per-tenant app-request resource profiles.
- Model IO resources with Virtual IOPs.
6
Related Work
Storage
AppWork
Media
Type
requests Conserving
Maestro
Block
N
N
HDD
mClock
Block
N
Y
HDD
FlashFQ
Block
N
Y
SSD
Y
N
SSD
DynamoDB Key-Value
7
Provisioned Distributed Key-Value Storage
Global Reservation Problem
[Pisces OSD1 ’12]
ReservationA
ReservationB
Tenant A
Tenant B
VM VM VM
VM VM VM
Local Reservation Problem
Storage
Storage
Shared Key-Value
Storage
...
Node 1
Node N
Provisioned Distributed Key-Value Storage
demand
ReservationA
9
ReservationB
Tenant A
Tenant B
VM VM VM
VM VM VM
Storage
Node 1
Storage
Node N
data partitions
...
Provisioned Distributed Key-Value Storage
10
ReservationA
ReservationB
Storage
Node 1
Storage
Node N
...
Provisioned Distributed Key-Value Storage
11
ReservationB
ResA1 ResB1 A
Reservation
ReservationB
ResAn ResBn A
Reservation
Storage
Node 1
Storage
Node N
...
Provisioned Local Key-Value Storage
GET 1001100
GET K
Key-Value Protocol
Retrieve K
Persistence Engine
Read l337
Libra
IO IO
Scheduler
Scheduler
IO operation
Physical Disk
12
Libra Design
How much IO
to consume?
Libra IO
Scheduler
DRR
GET
blah
Libra
Provisioning
Policy
blah
Physical Disk
13
Reservation
Distribution
Policy
PUT
Persistence Engine
How much IO to
provision?
Provisioning App-request Reservations is Hard
IO Amplification
14
Track tenant
app-request
resource profiles
1 KB PUT
≥ 1 KB Write
IO Interference
Underestimate
Variable IOprovisionable
throughput IO
Non-linear IO
Performance
Model
Non-linear
IO with
cost
Virtual
per IOPs
KB
Workload-dependent IO Amplification
PUT K,V3
V2
V3
M
A
index
V3
H
V
index
F
O
V0
index
25
20
15
10
5
0
B
8K
GET/PUT Request Size
12
KB
64
KB
32
KB
16
B
O
8K
A
index
B
V3
4K
B
1K
COMPACT
V1
15
PUT write IO
30
IO Throughput (kop/s)
FLUSH PUT
LevelDB (LSM-Tree)
Workload-dependent IO Amplification
PUT K,V3
V2
V3
M
A
index
V3
H
V
index
F
O
V0
index
PUT write IO
25
20
15
10
5
0
B
8K
GET/PUT Request Size
12
KB
64
KB
32
KB
16
B
O
8K
A
index
B
V3
4K
B
1K
COMPACT
V1
16
FLUSH write IO
30
IO Throughput (kop/s)
FLUSH PUT
LevelDB (LSM-Tree)
Workload-dependent IO Amplification
PUT K,V3
V2
V3
M
A
index
V3
H
V
index
O
V0
index
COMPACT read IO
25
FLUSH write IO
COMPACT write IO
PUT write IO
20
15
10
5
0
B
8K
GET/PUT Request Size
12
KB
64
KB
32
KB
16
B
O
8K
A
index
B
V3
4K
B
17
F
30
1K
COMPACT
V1
IO Throughput (kop/s)
FLUSH PUT
LevelDB (LSM-Tree)
Workload-dependent IO Amplification
M
A
index
H
V
index
F
O
index
FLUSH write IO
20
GET read IO
PUT write IO
15
10
5
0
B
8K
GET/PUT Request Size
12
KB
64
KB
32
KB
16
B
8K
B
18
index
25
COMPACT write IO
4K
K
COMPACT read IO
B
Z
30
1K
G
IO Throughput (kop/s)
GET K
Libra Tracks App-request IO Consumption to
Determine IO Allocations
IO
Tenant A
500 IO/s
lah
Track IO Compute app-request Provision IO
IO profiles
consumption
allocations
FLUSH
19
Tenant A
5 IO units
COMPACT
Per-GET 1
+
Per-PUT 6
x
x
GET
80
PUT
25 PUT
1
0.5 FLUSH
50
lah
PUT
5 GET 5
100
Libra
Provisioning
Policy
70
=
IO
500
Unpredictable IO Interference
Die-level parallelism, low
latency IOPs
Shared-controller and bus
contention
Erase-before-write
overhead
FTL and read-modify-write
garbage colleciton
20
Write IOP Size (KB)
1:1 Pure Read/Pure Write
256
128
64
32
16
8
4
2
1
100
90
80
70
60
50
1
2
4
8
16 32 64 128 256
Read IOP Size (KB)
Pct of Ideal Throughput
4 read/4 write tenants
Unpredictable IO Interference
21
256
128
64
32
16
8
4
2
1
1
2
4
8
256
128
64
32
16
8
4
2
1
16 32 64 128 256
1
50:50 Read/Write Ratio
256
128
64
32
16
8
4
2
1
1
2
4
8
256
128
64
32
16
8
4
2
1
16 32 64 128 256
1
75:25 Read/Write Ratio
100
90
80
70
60
50
2
4
8
16 32 64 128 256
25:75 Read/Write Ratio
100
90
80
70
60
50
2
4
8
Read IOP Size (KB)
16 32 64 128 256
Pct of Ideal Throughput
Write IOP Size (KB)
1:1 Pure Read/Pure Write
Libra Underestimates IO Capacity to Ensure
Provisionable Throughput
1
0.8
0.6
0.4
0.2
Provisionable IO limit
Pct of Read/Write Experiments
Provisionable IO throughput = floor(workloads)
(18 Kop/s)
0
15
22
75:25 Read/Write
50:50 Read/Write
25:75 Read/Write
1:1 Pure Read/Pure Write
20
25
30
35
Normalized IO Throughput
40
45
Libra Underestimates IO Capacity to Ensure
Provisionable Throughput
1
0.8
0.6
0.4
0.2
Provisionable IO limit
Pct of Read/Write Experiments
Provisionable IO throughput = floor(workloads)
(18 Kop/s)
0
15
23
75:25 Read/Write
75:25 = 4K
75:25 = 32K
75:25 = 256K
50:50 Read/Write
25:75 Read/Write
1:1 Pure Read/Pure Write
20
25
30
35
Normalized IO Throughput
40
45
Libra Underestimates IO Capacity to Ensure
Provisionable Throughput
1
0.8
0.6
0.4
0.2
Provisionable IO limit
Pct of Read/Write Experiments
Provisionable IO throughput = floor(workloads)
(18 Kop/s)
0
15
24
75:25 = 256K
50:50 = 256K
25:75 = 256K
1:1 Pure Read/Pure Write
20
25
30
35
Normalized IO Throughput
40
45
Non-linear IO Performance
IO Bandwidth
200
30
150
100
25
20
15
10
50
0
Max IOP/s
35
IOP (kop/s)
Bandwidth (MB/s)
40
Max BW
250
5
1
2
4
8
16
32
IOP Size (KB)
25
IOP Throughput
64 128 256
0
1
2
4
8
16
32
IOP Size (KB)
64 128 256
Non-linear IO Performance
IO Bandwidth
200
30
150
100
Read Rand
Read Seq
Write Rand
Write Seq
50
0
Max IOP/s
35
1
2
4
8
16
32
IOP Size (KB)
64 128 256
IOP (kop/s)
Bandwidth (MB/s)
40
Max BW
250
26
IOP Throughput
25
20
15
10
5
0
1
2
4
8
16
32
IOP Size (KB)
64 128 256
IOP throughput curves, as shown in Figure 6. Libra calculates the cost model in terms of VOPs-per-byte by dividing
the max IOP throughput by the achieved (read or write) IOP
throughput, normalized by IOP size.
5. Im
Libra ex
underlyi
interpose
i.e., pers
tem calls
wrapper
cations t
IO calls
context.
in ⇠2000
Libra
inter-task
tines allo
ping out
compact
256
Libra us
exhausti
operatio
location
resource
Libra Uses Virtual IOPs to Model IO Resources
Max-IOP
VOPCPB (IOP-size) =
Achieved-IOP(IOP-size) ⇥ IOP-size
IO throughput
in VOP/s
conLibra
IO Cost is
Model
40
3.5
stant for the pure
read/write
throughput
curves.
Max Read
Read IO cost
35
3
Max Write
Write IO cost
Internally,
Libra’s
scheduler
threads
perform
distributed
Unifies
IO
cost
into
a
single
Half Read IO
30
2.5
Half
Write
IO (DDRR)
deficit
round
robin
[21]
to efficiently schedule parmetric
25
2 corresponds to the SSD
allel
IO
requests
(up
to
32,
which
20Captures non-linear IO
1.5 the scheduler computes
queue depth). For each IO operation,
15
performance
the number of VOPs consumed 1
10
5Provides
0
27
1
2
IO insulation
Virtual IOP Cost (op/KB)
IOPs (kop/s)
InThroughput
this model,atthe
IOP
1/2 maximum
Max VOPs
0.5
VOPcost (IOP-size) = VOPCPB (IOP-size)
0
4
8 16 32 64 128 256
1
2
4
⇥ IOP-size
8
16
32
64 128
and deducts this amount from the associated tenant’s VOP
IOP Size (KB)
IOP Size (KB)
2 equal-allocation
tenants
allocation to enforce resource limits and track resource conIO Insulation
= 1/2DDRR
Max Read/Write
sumption.
incurs minimal inter-thread synchronization and schedules IO tasks in constant time to efficiently
achieve fair sharing and isolation in a work-conserving fashion. In general, virtual-time [12] and round-robin [29] based
IOP throughput curves, as shown in Figure 6. Libra calculates the cost model in terms of VOPs-per-byte by dividing
the max IOP throughput by the achieved (read or write) IOP
throughput, normalized by IOP size.
5. Im
Libra ex
underlyi
interpose
i.e., pers
tem calls
wrapper
cations t
IO calls
context.
in ⇠2000
Libra
inter-task
tines allo
ping out
compact
256
Libra us
exhausti
operatio
location
resource
Libra Uses Virtual IOPs to Model IO Resources
Max-IOP
VOPCPB (IOP-size) =
Achieved-IOP(IOP-size) ⇥ IOP-size
InThroughput
this model,at the
IOP
1/2 maximum
Max VOPs
IO throughput
in VOP/s
conLibra
IO Cost is
Model
3.5
stant for Libra
the Read
pureIOread/write
throughput
curves. Read IO cost
Model
Libra Write
IO Model
3
Write IO cost
Internally,
Libra’s
scheduler
threads
perform
distributed
Max Read
2.5to efficiently schedule parMax [21]
Write (DDRR)
deficit round robin
2 corresponds to the SSD
allel IO requests (up to 32, which
1.5 the scheduler computes
queue depth). For each IO operation,
the number of VOPs consumed 1
Virtual IOP Cost (op/KB)
40
IOPs (kop/s)
35
30
25
20
15
10
5
0
28
1
2
0.5
VOPcost (IOP-size) = VOPCPB (IOP-size)
0
4
8 16 32 64 128 256
1
2
4
⇥ IOP-size
8
16
32
64 128
and deducts this amount from the associated tenant’s VOP
IOP Size (KB)
IOP Size (KB)
allocation to enforce resource limits and track resource conDDRR incurs
minimal inter-thread synchroniza2sumption.
equal-allocation
tenants
tion and=schedules
tasks in constant time to efficiently
IO Insulation
1/2 Max IO
Read/Write
achieve fair sharing and isolation in a work-conserving fashion. In general, virtual-time [12] and round-robin [29] based
Libra Design
Persistence Engine
Update tenant VOP
allocations
blah
Charge tenant IOPs
based on VOP cost
Libra IO
Scheduler
Libra
Provision VOPs within
Provisioning
provisionable limit
Policy
blah
Physical Disk
29
Track app-request
VOP consumption
Evaluation
• Does Libra's IO resource model achieve accurate
resource allocations?
• Does Libra's IO threshold make an acceptable tradeoff
of performance for predictability in a real storage stack?
• Can Libra ensure per-tenant app-request reservations
while achieving high utilization?
30
Libra Achieves Accurate IO Allocations
Read-Write IOP Throughput Ratio
Throughput Ratio
1.2
1
0.8
Interference-free Ideal
Read Tenants
Write Tenants
even
0.6
0.4
0.2
0
W1 W4 W8 W1 W3 W6 W1 W2
KB KB KB 6KB 2KB 4KB 28K 56K
B
B
Read 1 KB
Throughput Ratio = Actual / Expected (IO Insulation)
31
Libra Achieves Accurate IO Allocations
Read-Write IOP Throughput Ratio
1
Interference-free Ideal
Read Tenants
Write Tenants
0.8
0.6
0.4
Read-Write IOP Throughput Ratio
1.2
0.2
Throughput Ratio
Throughput Ratio
1.2
1
Read Tenants
Write Tenants
0.8
0.6
0.4
0.2
0
0
W1 W4 W8 W1 W3 W6 W1 W2
KB KB KB 6KB 2KB 4KB 28K 56K
B
B
R 1KB R 4KB R 8KB R 16K R 32K R 64K R 128 R 256
KB
KB
B
B
B
Write
1-256
KB
Throughput Ratio = Actual / Expected (IO Insulation)
32
Libra Achieves Accurate IO Allocations
Read IO Cost Models
1
0.8
0.6
0.4
0.2
0
1
2
4
8 16 32 64 128 256
IOP Size (KB)
33
3.5
VOP Cost (op/KB)
VOP Cost (op/KB)
1.2
Write IO Cost Models
3
libra
constant
fixed
2.5
2
1.5
1
0.5
0
1
2
4
8 16 32 64 128 256
IOP Size (KB)
Libra Achieves Accurate IO Allocations
Read IO Cost Models
1
0.8
0.6
0.4
0.2
0
1
2
4
8 16 32 64 128 256
IOP Size (KB)
34
3.5
VOP Cost (op/KB)
VOP Cost (op/KB)
1.2
Write IO Cost Models
3
libra
constant
fixed
linear
2.5
2
1.5
1
0.5
0
1
2
4
8 16 32 64 128 256
IOP Size (KB)
Libra Achieves Accurate IO Allocations
Min-Max Ratio =
Min Throughput Ratio / Max Throughput Ratio
Accuracy (MMR)
IOP Insulation Accuracy
1
0.9
0.8
0.7
0.6
0.5
0.4
0.3
0.2
0.1
0
w
Accuracy (MMR)
rr
Read-Write
Read-Read
Write-Write
w
rw
Read-Write
Read-Read
Write-Write
Virtual IOP Allocation Accuracy
1
0.9
0.8
0.7
0.6
0.5
0.4
0.3
0.2
0.1
0
constant
w
linear
w
35
rr
rw
libra
fixed
Libra Trades-off Nominal IO Throughput For
Predictability
25:75 GET-PUT, Variance 4K
256
256
256
30
128
128
128
28
64
64
64
26
32
32
32
16
16
16
8
8
8
4
4
4
20
2
2
2
18
1
1
2
4
8 16 32 64 128 256
GET Request Size (KB)
36
50:50 GET-PUT, Variance 4K
1
1
2
4
8
16 32 64 128
GET Request Size (KB)
1
24
22
1
2
4
8 16 32 64 128 256
GET Request Size (KB)
16
VOP (kop/s)
PUT Request Size (KB)
75:25 GET-PUT, Variance 4K
Libra Trades-off Nominal IO Throughput For
Predictability
Unprovisionable Throughput As a
Percentage of Total Throughput
Percentile
Workload 10th
50th
80th
All
99:1
1.6%
30.5%
40.5%
45.8%
25:75
1.4%
14.9%
25.0%
34.7%
1:99
0.7%
12.2%
19.5%
28.1%
< 10th percentile covered by SLA and higher-level policies
37
Libra Achieves App-request Reservations
0.5x
1.5x
Read Heavy
Libra Normalized GET (1KB)
Throughput (kreq/s)
7
6
5
5
Throughput (kreq/s)
Fully provisioned
allocations
4
3
3
2
1
1
0
0
No Profile Normalized GET (1KB)
5
4
3
2
Work-conserving
consumption of
unprovisioned
resources
1
0
100
Libra Normalized PUT (1KB)
4
2
6
Write Heavy
7
6
7
38
Mixed
7
No Profile Normalized PUT (1KB)
6
5
4
3
2
1
150
200
Time (s)
250
0
300 100
150
200
Time (s)
250
300
Libra Achieves App-request Reservations
Read Heavy
Libra Normalized GET (1KB)
Throughput (kreq/s)
7
6
5
5
4
4
3
3
2
2
1
1
0
0
Throughput (kreq/s)
No Profile Normalized GET (1KB)
7
6
6
5
5
4
4
3
3
2
2
1
1
0
100
150
200
Time (s)
250
Write Heavy
Libra Normalized PUT (1KB)
7
6
7
39
Mixed
No Profile Normalized PUT (1KB)
0
300 100
150
200
Time (s)
250
300
Conclusion
• Libra IO Scheduler
-
Provisions IO allocations for app-request reservations w/ high utilization.
Supports arbitrary object distributions and workloads.
• 2 key mechanisms
-
Track per-tenant app-request resource profiles.
Model IO resources with Virtual IOPs.
• Evaluation
-
40
Achieves accurate low-level IO allocations.
Provisions the majority of IO resources over a wide range of workloads
Satisfies app-request reservations w/ high utilization.