PrincetonUniversity From application requests to Virtual IOPs: Provisioned key-value storage with Libra David Shue* and Michael J. Freedman (*now at Google) Shared Cloud Tenant A Tenant B Tenant C VM VM VM VM VM VM VM VM VM Shared Cloud Storage Tenant A Tenant B Tenant C VM VM VM VM VM VM VM VM VM Block Storage Key-Value Storage SQL Database Unpredictable Shared Cloud Storage Tenant A Tenant B Tenant C VM VM VM VM VM VM VM VM VM Key-Value Key-Value Block Storage SQL Database Storage Storage Disk IO-bound Tenants SSD-backed storage Provisioned Shared Key-Value Storage ReservationA ReservationB ReservationC Tenant A Tenant B Tenant C VM VM VM VM VM VM VM VM VM Application Requests (1KB normalized) GET/s PUT/s Shared Key-Value Storage Low-level IO SSD SSD IOPs BW SSD SSD SSD Libra Contributions •Libra IO Scheduler - Provisions low-level IO allocations for app-request reservations - w/ high utilization. Supports arbitrary object distributions and workloads. • 2 key mechanisms - Track per-tenant app-request resource profiles. - Model IO resources with Virtual IOPs. 6 Related Work Storage AppWork Media Type requests Conserving Maestro Block N N HDD mClock Block N Y HDD FlashFQ Block N Y SSD Y N SSD DynamoDB Key-Value 7 Provisioned Distributed Key-Value Storage Global Reservation Problem [Pisces OSD1 ’12] ReservationA ReservationB Tenant A Tenant B VM VM VM VM VM VM Local Reservation Problem Storage Storage Shared Key-Value Storage ... Node 1 Node N Provisioned Distributed Key-Value Storage demand ReservationA 9 ReservationB Tenant A Tenant B VM VM VM VM VM VM Storage Node 1 Storage Node N data partitions ... Provisioned Distributed Key-Value Storage 10 ReservationA ReservationB Storage Node 1 Storage Node N ... Provisioned Distributed Key-Value Storage 11 ReservationB ResA1 ResB1 A Reservation ReservationB ResAn ResBn A Reservation Storage Node 1 Storage Node N ... Provisioned Local Key-Value Storage GET 1001100 GET K Key-Value Protocol Retrieve K Persistence Engine Read l337 Libra IO IO Scheduler Scheduler IO operation Physical Disk 12 Libra Design How much IO to consume? Libra IO Scheduler DRR GET blah Libra Provisioning Policy blah Physical Disk 13 Reservation Distribution Policy PUT Persistence Engine How much IO to provision? Provisioning App-request Reservations is Hard IO Amplification 14 Track tenant app-request resource profiles 1 KB PUT ≥ 1 KB Write IO Interference Underestimate Variable IOprovisionable throughput IO Non-linear IO Performance Model Non-linear IO with cost Virtual per IOPs KB Workload-dependent IO Amplification PUT K,V3 V2 V3 M A index V3 H V index F O V0 index 25 20 15 10 5 0 B 8K GET/PUT Request Size 12 KB 64 KB 32 KB 16 B O 8K A index B V3 4K B 1K COMPACT V1 15 PUT write IO 30 IO Throughput (kop/s) FLUSH PUT LevelDB (LSM-Tree) Workload-dependent IO Amplification PUT K,V3 V2 V3 M A index V3 H V index F O V0 index PUT write IO 25 20 15 10 5 0 B 8K GET/PUT Request Size 12 KB 64 KB 32 KB 16 B O 8K A index B V3 4K B 1K COMPACT V1 16 FLUSH write IO 30 IO Throughput (kop/s) FLUSH PUT LevelDB (LSM-Tree) Workload-dependent IO Amplification PUT K,V3 V2 V3 M A index V3 H V index O V0 index COMPACT read IO 25 FLUSH write IO COMPACT write IO PUT write IO 20 15 10 5 0 B 8K GET/PUT Request Size 12 KB 64 KB 32 KB 16 B O 8K A index B V3 4K B 17 F 30 1K COMPACT V1 IO Throughput (kop/s) FLUSH PUT LevelDB (LSM-Tree) Workload-dependent IO Amplification M A index H V index F O index FLUSH write IO 20 GET read IO PUT write IO 15 10 5 0 B 8K GET/PUT Request Size 12 KB 64 KB 32 KB 16 B 8K B 18 index 25 COMPACT write IO 4K K COMPACT read IO B Z 30 1K G IO Throughput (kop/s) GET K Libra Tracks App-request IO Consumption to Determine IO Allocations IO Tenant A 500 IO/s lah Track IO Compute app-request Provision IO IO profiles consumption allocations FLUSH 19 Tenant A 5 IO units COMPACT Per-GET 1 + Per-PUT 6 x x GET 80 PUT 25 PUT 1 0.5 FLUSH 50 lah PUT 5 GET 5 100 Libra Provisioning Policy 70 = IO 500 Unpredictable IO Interference Die-level parallelism, low latency IOPs Shared-controller and bus contention Erase-before-write overhead FTL and read-modify-write garbage colleciton 20 Write IOP Size (KB) 1:1 Pure Read/Pure Write 256 128 64 32 16 8 4 2 1 100 90 80 70 60 50 1 2 4 8 16 32 64 128 256 Read IOP Size (KB) Pct of Ideal Throughput 4 read/4 write tenants Unpredictable IO Interference 21 256 128 64 32 16 8 4 2 1 1 2 4 8 256 128 64 32 16 8 4 2 1 16 32 64 128 256 1 50:50 Read/Write Ratio 256 128 64 32 16 8 4 2 1 1 2 4 8 256 128 64 32 16 8 4 2 1 16 32 64 128 256 1 75:25 Read/Write Ratio 100 90 80 70 60 50 2 4 8 16 32 64 128 256 25:75 Read/Write Ratio 100 90 80 70 60 50 2 4 8 Read IOP Size (KB) 16 32 64 128 256 Pct of Ideal Throughput Write IOP Size (KB) 1:1 Pure Read/Pure Write Libra Underestimates IO Capacity to Ensure Provisionable Throughput 1 0.8 0.6 0.4 0.2 Provisionable IO limit Pct of Read/Write Experiments Provisionable IO throughput = floor(workloads) (18 Kop/s) 0 15 22 75:25 Read/Write 50:50 Read/Write 25:75 Read/Write 1:1 Pure Read/Pure Write 20 25 30 35 Normalized IO Throughput 40 45 Libra Underestimates IO Capacity to Ensure Provisionable Throughput 1 0.8 0.6 0.4 0.2 Provisionable IO limit Pct of Read/Write Experiments Provisionable IO throughput = floor(workloads) (18 Kop/s) 0 15 23 75:25 Read/Write 75:25 = 4K 75:25 = 32K 75:25 = 256K 50:50 Read/Write 25:75 Read/Write 1:1 Pure Read/Pure Write 20 25 30 35 Normalized IO Throughput 40 45 Libra Underestimates IO Capacity to Ensure Provisionable Throughput 1 0.8 0.6 0.4 0.2 Provisionable IO limit Pct of Read/Write Experiments Provisionable IO throughput = floor(workloads) (18 Kop/s) 0 15 24 75:25 = 256K 50:50 = 256K 25:75 = 256K 1:1 Pure Read/Pure Write 20 25 30 35 Normalized IO Throughput 40 45 Non-linear IO Performance IO Bandwidth 200 30 150 100 25 20 15 10 50 0 Max IOP/s 35 IOP (kop/s) Bandwidth (MB/s) 40 Max BW 250 5 1 2 4 8 16 32 IOP Size (KB) 25 IOP Throughput 64 128 256 0 1 2 4 8 16 32 IOP Size (KB) 64 128 256 Non-linear IO Performance IO Bandwidth 200 30 150 100 Read Rand Read Seq Write Rand Write Seq 50 0 Max IOP/s 35 1 2 4 8 16 32 IOP Size (KB) 64 128 256 IOP (kop/s) Bandwidth (MB/s) 40 Max BW 250 26 IOP Throughput 25 20 15 10 5 0 1 2 4 8 16 32 IOP Size (KB) 64 128 256 IOP throughput curves, as shown in Figure 6. Libra calculates the cost model in terms of VOPs-per-byte by dividing the max IOP throughput by the achieved (read or write) IOP throughput, normalized by IOP size. 5. Im Libra ex underlyi interpose i.e., pers tem calls wrapper cations t IO calls context. in ⇠2000 Libra inter-task tines allo ping out compact 256 Libra us exhausti operatio location resource Libra Uses Virtual IOPs to Model IO Resources Max-IOP VOPCPB (IOP-size) = Achieved-IOP(IOP-size) ⇥ IOP-size IO throughput in VOP/s conLibra IO Cost is Model 40 3.5 stant for the pure read/write throughput curves. Max Read Read IO cost 35 3 Max Write Write IO cost Internally, Libra’s scheduler threads perform distributed Unifies IO cost into a single Half Read IO 30 2.5 Half Write IO (DDRR) deficit round robin [21] to efficiently schedule parmetric 25 2 corresponds to the SSD allel IO requests (up to 32, which 20Captures non-linear IO 1.5 the scheduler computes queue depth). For each IO operation, 15 performance the number of VOPs consumed 1 10 5Provides 0 27 1 2 IO insulation Virtual IOP Cost (op/KB) IOPs (kop/s) InThroughput this model,atthe IOP 1/2 maximum Max VOPs 0.5 VOPcost (IOP-size) = VOPCPB (IOP-size) 0 4 8 16 32 64 128 256 1 2 4 ⇥ IOP-size 8 16 32 64 128 and deducts this amount from the associated tenant’s VOP IOP Size (KB) IOP Size (KB) 2 equal-allocation tenants allocation to enforce resource limits and track resource conIO Insulation = 1/2DDRR Max Read/Write sumption. incurs minimal inter-thread synchronization and schedules IO tasks in constant time to efficiently achieve fair sharing and isolation in a work-conserving fashion. In general, virtual-time [12] and round-robin [29] based IOP throughput curves, as shown in Figure 6. Libra calculates the cost model in terms of VOPs-per-byte by dividing the max IOP throughput by the achieved (read or write) IOP throughput, normalized by IOP size. 5. Im Libra ex underlyi interpose i.e., pers tem calls wrapper cations t IO calls context. in ⇠2000 Libra inter-task tines allo ping out compact 256 Libra us exhausti operatio location resource Libra Uses Virtual IOPs to Model IO Resources Max-IOP VOPCPB (IOP-size) = Achieved-IOP(IOP-size) ⇥ IOP-size InThroughput this model,at the IOP 1/2 maximum Max VOPs IO throughput in VOP/s conLibra IO Cost is Model 3.5 stant for Libra the Read pureIOread/write throughput curves. Read IO cost Model Libra Write IO Model 3 Write IO cost Internally, Libra’s scheduler threads perform distributed Max Read 2.5to efficiently schedule parMax [21] Write (DDRR) deficit round robin 2 corresponds to the SSD allel IO requests (up to 32, which 1.5 the scheduler computes queue depth). For each IO operation, the number of VOPs consumed 1 Virtual IOP Cost (op/KB) 40 IOPs (kop/s) 35 30 25 20 15 10 5 0 28 1 2 0.5 VOPcost (IOP-size) = VOPCPB (IOP-size) 0 4 8 16 32 64 128 256 1 2 4 ⇥ IOP-size 8 16 32 64 128 and deducts this amount from the associated tenant’s VOP IOP Size (KB) IOP Size (KB) allocation to enforce resource limits and track resource conDDRR incurs minimal inter-thread synchroniza2sumption. equal-allocation tenants tion and=schedules tasks in constant time to efficiently IO Insulation 1/2 Max IO Read/Write achieve fair sharing and isolation in a work-conserving fashion. In general, virtual-time [12] and round-robin [29] based Libra Design Persistence Engine Update tenant VOP allocations blah Charge tenant IOPs based on VOP cost Libra IO Scheduler Libra Provision VOPs within Provisioning provisionable limit Policy blah Physical Disk 29 Track app-request VOP consumption Evaluation • Does Libra's IO resource model achieve accurate resource allocations? • Does Libra's IO threshold make an acceptable tradeoff of performance for predictability in a real storage stack? • Can Libra ensure per-tenant app-request reservations while achieving high utilization? 30 Libra Achieves Accurate IO Allocations Read-Write IOP Throughput Ratio Throughput Ratio 1.2 1 0.8 Interference-free Ideal Read Tenants Write Tenants even 0.6 0.4 0.2 0 W1 W4 W8 W1 W3 W6 W1 W2 KB KB KB 6KB 2KB 4KB 28K 56K B B Read 1 KB Throughput Ratio = Actual / Expected (IO Insulation) 31 Libra Achieves Accurate IO Allocations Read-Write IOP Throughput Ratio 1 Interference-free Ideal Read Tenants Write Tenants 0.8 0.6 0.4 Read-Write IOP Throughput Ratio 1.2 0.2 Throughput Ratio Throughput Ratio 1.2 1 Read Tenants Write Tenants 0.8 0.6 0.4 0.2 0 0 W1 W4 W8 W1 W3 W6 W1 W2 KB KB KB 6KB 2KB 4KB 28K 56K B B R 1KB R 4KB R 8KB R 16K R 32K R 64K R 128 R 256 KB KB B B B Write 1-256 KB Throughput Ratio = Actual / Expected (IO Insulation) 32 Libra Achieves Accurate IO Allocations Read IO Cost Models 1 0.8 0.6 0.4 0.2 0 1 2 4 8 16 32 64 128 256 IOP Size (KB) 33 3.5 VOP Cost (op/KB) VOP Cost (op/KB) 1.2 Write IO Cost Models 3 libra constant fixed 2.5 2 1.5 1 0.5 0 1 2 4 8 16 32 64 128 256 IOP Size (KB) Libra Achieves Accurate IO Allocations Read IO Cost Models 1 0.8 0.6 0.4 0.2 0 1 2 4 8 16 32 64 128 256 IOP Size (KB) 34 3.5 VOP Cost (op/KB) VOP Cost (op/KB) 1.2 Write IO Cost Models 3 libra constant fixed linear 2.5 2 1.5 1 0.5 0 1 2 4 8 16 32 64 128 256 IOP Size (KB) Libra Achieves Accurate IO Allocations Min-Max Ratio = Min Throughput Ratio / Max Throughput Ratio Accuracy (MMR) IOP Insulation Accuracy 1 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0 w Accuracy (MMR) rr Read-Write Read-Read Write-Write w rw Read-Write Read-Read Write-Write Virtual IOP Allocation Accuracy 1 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0 constant w linear w 35 rr rw libra fixed Libra Trades-off Nominal IO Throughput For Predictability 25:75 GET-PUT, Variance 4K 256 256 256 30 128 128 128 28 64 64 64 26 32 32 32 16 16 16 8 8 8 4 4 4 20 2 2 2 18 1 1 2 4 8 16 32 64 128 256 GET Request Size (KB) 36 50:50 GET-PUT, Variance 4K 1 1 2 4 8 16 32 64 128 GET Request Size (KB) 1 24 22 1 2 4 8 16 32 64 128 256 GET Request Size (KB) 16 VOP (kop/s) PUT Request Size (KB) 75:25 GET-PUT, Variance 4K Libra Trades-off Nominal IO Throughput For Predictability Unprovisionable Throughput As a Percentage of Total Throughput Percentile Workload 10th 50th 80th All 99:1 1.6% 30.5% 40.5% 45.8% 25:75 1.4% 14.9% 25.0% 34.7% 1:99 0.7% 12.2% 19.5% 28.1% < 10th percentile covered by SLA and higher-level policies 37 Libra Achieves App-request Reservations 0.5x 1.5x Read Heavy Libra Normalized GET (1KB) Throughput (kreq/s) 7 6 5 5 Throughput (kreq/s) Fully provisioned allocations 4 3 3 2 1 1 0 0 No Profile Normalized GET (1KB) 5 4 3 2 Work-conserving consumption of unprovisioned resources 1 0 100 Libra Normalized PUT (1KB) 4 2 6 Write Heavy 7 6 7 38 Mixed 7 No Profile Normalized PUT (1KB) 6 5 4 3 2 1 150 200 Time (s) 250 0 300 100 150 200 Time (s) 250 300 Libra Achieves App-request Reservations Read Heavy Libra Normalized GET (1KB) Throughput (kreq/s) 7 6 5 5 4 4 3 3 2 2 1 1 0 0 Throughput (kreq/s) No Profile Normalized GET (1KB) 7 6 6 5 5 4 4 3 3 2 2 1 1 0 100 150 200 Time (s) 250 Write Heavy Libra Normalized PUT (1KB) 7 6 7 39 Mixed No Profile Normalized PUT (1KB) 0 300 100 150 200 Time (s) 250 300 Conclusion • Libra IO Scheduler - Provisions IO allocations for app-request reservations w/ high utilization. Supports arbitrary object distributions and workloads. • 2 key mechanisms - Track per-tenant app-request resource profiles. Model IO resources with Virtual IOPs. • Evaluation - 40 Achieves accurate low-level IO allocations. Provisions the majority of IO resources over a wide range of workloads Satisfies app-request reservations w/ high utilization.
© Copyright 2024 ExpyDoc