Dominant Resource Fairness in Cloud Computing Systems with Heterogeneous Servers

Wei Wang, Baochun Li, Ben Liang
Department of Electrical and Computer Engineering
University of Toronto
April 30, 2014


Introduction

- Cloud computing systems exhibit unprecedented heterogeneity
  - Server specifications
  - Resource demand profiles of computing tasks

Wei Wang, Department of Electrical and Computer Engineering, University of Toronto


Heterogeneous servers

Table I: Configurations of servers in one of Google's clusters [2], [3].
CPU and memory units are normalized to the maximum server (the 1.00/1.00 row).

  Number of servers   CPUs   Memory
  6732                0.50   0.50
  3863                0.50   0.25
  1001                0.50   0.75
   795                1.00   1.00
   126                0.25   0.25
    52                0.50   0.12
     5                0.50   0.03
     5                0.50   0.97
     3                1.00   0.50
     1                0.50   0.06


Heterogeneous resource demand

[Figure 1: CPU and memory demands of tasks in a 2000-node Hadoop cluster
at Facebook over one month (October 2010). Each bubble's size is
logarithmic in the number of tasks in its region.]
(Figure from Ghodsi et al., NSDI'11)


How should resources be allocated fairly and efficiently?


State-of-the-Art Resource Allocation Mechanisms

Single-resource abstraction

- Partition a server's resources into slots
  - E.g., a slot = (1 CPU core, 2 GB RAM)
- Allocate resources to users at the granularity of slots
  - Hadoop Fair Scheduler & Capacity Scheduler
  - Dryad Quincy scheduler
- Ignores the heterogeneity of both server specifications and demand profiles


Dominant Resource Fairness (DRF)

- Dominant resource
  - The one that requires the most allocation share
- For example
  - A cluster: (9 CPUs, 18 GB RAM)
  - Job of user 1: (1 CPU, 4 GB RAM)
  - Job of user 2: (3 CPUs, 1 GB RAM)
- DRF allocation
  - Equalize the dominant share each user receives
  - 3 jobs for user 1: (3 CPUs, 12 GB)
  - 2 jobs for user 2: (6 CPUs, 2 GB)
  - Equalized dominant share = 2/3
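The DRF allocation above can be reproduced with a short progressive-filling sketch: repeatedly grant one task to the user currently holding the smallest dominant share, until the next task no longer fits. This is illustrative Python with our own names and structure, not the authors' implementation:

```python
# Progressive filling for DRF on the slide's example. Illustrative only.
capacity = {"cpu": 9.0, "mem": 18.0}
demands = {
    "user1": {"cpu": 1.0, "mem": 4.0},
    "user2": {"cpu": 3.0, "mem": 1.0},
}

used = {"cpu": 0.0, "mem": 0.0}
tasks = {u: 0 for u in demands}

def dominant_share(user):
    # Largest fraction of any pooled resource held by this user.
    d = demands[user]
    return max(tasks[user] * d[r] / capacity[r] for r in capacity)

while True:
    # Grant the next task to the user with the smallest dominant share.
    user = min(demands, key=dominant_share)
    d = demands[user]
    if any(used[r] + d[r] > capacity[r] for r in capacity):
        break  # the next task no longer fits in the pool
    for r in capacity:
        used[r] += d[r]
    tasks[user] += 1

print(tasks)  # {'user1': 3, 'user2': 2}
```

Both users end at a dominant share of 2/3, matching the slide.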
Why DRF?

- Addresses demand heterogeneity
- Highly attractive allocation properties [Ghodsi11]
  - Pareto optimality
  - Envy-freeness
  - Truthfulness
  - Sharing incentive
  - and more…


However…

- DRF assumes an all-in-one resource model
  - The entire resource pool is modeled as one super computer
- It ignores the heterogeneity of servers
  - The allocation depends only on the total amount of resources
- It may lead to an infeasible allocation


An infeasible DRF allocation

- The same example
  - A cluster: (9 CPUs, 18 GB)
  - Job of user 1: (1 CPU, 4 GB)
  - Job of user 2: (3 CPUs, 1 GB)
- DRF allocation
  - 3 jobs for user 1: (3 CPUs, 12 GB)
  - 2 jobs for user 2: (6 CPUs, 2 GB)

[Fig. 1: A system of two heterogeneous servers, Server 1 (1 CPU, 14 GB)
and Server 2 (8 CPUs, 4 GB), in which user 1 can schedule at most two
tasks, each demanding 1 CPU and 4 GB of memory.]
- User 1 can schedule at most 2 jobs!


A quick fix of DRF: per-server DRF

- For each server, allocate its resources to all users using DRF
- However, per-server DRF may lead to an arbitrarily inefficient allocation
  - See the paper for details


Can the attractiveness of DRF extend to a heterogeneous environment?
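The infeasibility of the all-in-one DRF allocation can be checked with a first-fit packing sketch over the two servers from the figure. The helper and its name are ours, for illustration only:

```python
# First-fit packing of user 1's DRF tasks onto the two actual servers.
servers = [
    {"cpu": 1.0, "mem": 14.0},   # Server 1
    {"cpu": 8.0, "mem": 4.0},    # Server 2
]
task = {"cpu": 1.0, "mem": 4.0}  # user 1's per-task demand

def place(task, servers):
    """Put the task on the first server with enough remaining room."""
    for s in servers:
        if all(s[r] >= task[r] for r in task):
            for r in task:
                s[r] -= task[r]
            return True
    return False

scheduled = 0
while place(task, servers):
    scheduled += 1

print(scheduled)  # 2: user 1's third DRF task fits on neither server
```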
The ambiguity of dominant resource

- The same example
  - A cluster: (9 CPUs, 18 GB)
  - Job of user 1: (1 CPU, 4 GB)
  - Server 1: (1 CPU, 14 GB); Server 2: (8 CPUs, 4 GB)
- How should the dominant resource be defined?
  - For server 1, the dominant resource is CPU
  - For server 2, the dominant resource is memory
  - For the entire resource pool, the dominant resource is memory


Our answer: DRFH

- A generalization of the DRF mechanism to Heterogeneous environments
- Equalizes every user's global dominant share
- Retains almost all the attractive allocation properties of DRF
  - Pareto optimality
  - Envy-freeness
  - Truthfulness
  - Weak sharing incentive
  - and more…
- Easy to implement


DRFH Allocation

A global view of dominant resource

- Global dominant resource
  - The one that requires the maximum allocation share of the entire resource pool
- The same example: a cluster of Server 1 (1 CPU, 14 GB) and Server 2
  (8 CPUs, 4 GB); job of user 1: (1 CPU, 4 GB)
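The ambiguity is easy to see in code: the dominant resource of user 1's task flips depending on which pool it is measured against. A sketch with our own names:

```python
# The dominant resource of user 1's task, relative to three reference pools.
task = {"cpu": 1.0, "mem": 4.0}
pools = {
    "server 1": {"cpu": 1.0, "mem": 14.0},
    "server 2": {"cpu": 8.0, "mem": 4.0},
    "whole pool": {"cpu": 9.0, "mem": 18.0},
}

def dominant_resource(demand, cap):
    # The resource with the largest demand-to-capacity ratio.
    return max(cap, key=lambda r: demand[r] / cap[r])

dominant = {name: dominant_resource(task, cap) for name, cap in pools.items()}
print(dominant)  # {'server 1': 'cpu', 'server 2': 'mem', 'whole pool': 'mem'}
```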
Memory is thus user 1's global dominant resource.


DRFH allocation

- Given users' demands and an allocation A, the global dominant share
  user i receives is

    G_i(A_i) = Σ_{l∈S} G_il(A_il) = Σ_{l∈S} min_{r∈R} { A_ilr / d_ir },    (3)

  where A_ilr is the allocation share of resource r user i receives on
  server l, and d_ir is user i's normalized demand for resource r
- Key intuition: max-min fairness on the global dominant shares, subject
  to the per-server resource constraints:

    max_A  min_{i∈U} G_i(A_i)
    s.t.   Σ_{i∈U} A_ilr ≤ c_lr,   ∀l ∈ S, r ∈ R,    (4)

  where c_lr is the total availability of resource r on server l
- DRFH aims to equalize the global dominant share across all users
- Lemma 1: an allocation A_il is non-wasteful if and only if
  A_il = g_il · d_i for some g_il; in that case, g_il = G_il(A_il) is the
  global dominant share user i receives on server l
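Eq. (3) can be sketched in a few lines for the two-server example. We normalize demands by total capacity and scale the demand vector so its dominant component is 1, so that min_r A_ilr / d_ir is exactly a dominant share; the names and this normalization convention are ours:

```python
# Global dominant share of user 1 when it holds one task's worth of
# resources on each of the two servers. Illustrative only.
capacity = {"cpu": 9.0, "mem": 18.0}
per_task = {"cpu": 1.0, "mem": 4.0}          # user 1's task demand (raw units)

# Demand as a fraction of the pool, scaled so max_r d_ir = 1.
frac = {r: per_task[r] / capacity[r] for r in capacity}
dominant = max(frac.values())                 # 4/18: memory is dominant
d = {r: frac[r] / dominant for r in frac}

# User 1's allocation on each server: one task's worth per server.
raw_alloc = [{"cpu": 1.0, "mem": 4.0}, {"cpu": 1.0, "mem": 4.0}]
A = [{r: a[r] / capacity[r] for r in a} for a in raw_alloc]

# G_i(A_i) = sum over servers of min_r A_ilr / d_ir
G = sum(min(Al[r] / d[r] for r in d) for Al in A)
print(G)  # 2 tasks, each worth a 2/9 dominant (memory) share: G = 4/9
```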
Since Ail = Wei Wang, Department of Electrical and Computer Engineering, University of Toronto 18 DRFH Properties Wei Wang, Department of Electrical and Computer Engineering, University of Toronto Fairness property Wei Wang, Department of Electrical and Computer Engineering, University of Toronto 20 Fairness property DRFH is envy-free No user can schedule more computing tasks by taking the other’s resource allocation No one will envy the other’s allocation Wei Wang, Department of Electrical and Computer Engineering, University of Toronto 20 Fairness property DRFH is envy-free No user can schedule more computing tasks by taking the other’s resource allocation No one will envy the other’s allocation DRFH is truthful No user can schedule more computing tasks by misreporting its resource demand Strategic behaviours are commonly seen in real system [Ghodsi11] Wei Wang, Department of Electrical and Computer Engineering, University of Toronto 20 Fairness property DRFH is envy-free No user can schedule more computing tasks by taking the other’s resource allocation No one will envy the other’s allocation DRFH is truthful No user can schedule more computing tasks by misreporting its resource demand Strategic behaviours are commonly seen in real system [Ghodsi11] Wei Wang, Department of Electrical and Computer Engineering, University of Toronto 20 Resource utilization DRFH is Pareto optimal No user can schedule more tasks without decreasing the number of tasks scheduled for the others No resource that could be utilized to serve a user is left idle Wei Wang, Department of Electrical and Computer Engineering, University of Toronto 21 of many different ways to evenly divide the total availability. In general, any equal partition that allocates an equal share of every resource can be used as a benchmark. This allows us to be the tot We are relax the sharing incentive definition. We first define an equal on A. Si Equal partition partition as follows. 
- Definition 3 (Equal partition): Allocation A is an equal partition if
  it divides every resource evenly among all n users, i.e.,

    Σ_{l∈S} A_ilr = 1/n,   ∀r ∈ R, i ∈ U,

  where A_ilr is the allocation share of resource r user i receives on
  server l
- The per-server equal partition is one such partition
- Service isolation: every user holds a 1/n share of every resource
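The equal-partition condition is a one-line check: summed over servers, each user must hold exactly a 1/n fraction of every resource. A sketch with an illustrative per-server split of our own choosing, not one from the paper:

```python
# Check the equal-partition condition for a two-user, two-server split.
n = 2
# A[user][server][resource]: allocated fraction of the pool's resource.
A = {
    "user1": [{"cpu": 0.10, "mem": 0.30}, {"cpu": 0.40, "mem": 0.20}],
    "user2": [{"cpu": 0.25, "mem": 0.25}, {"cpu": 0.25, "mem": 0.25}],
}

def is_equal_partition(A, n):
    # sum over servers of A_ilr must equal 1/n for every user and resource.
    return all(
        abs(sum(Al[r] for Al in servers) - 1.0 / n) < 1e-9
        for servers in A.values()
        for r in ("cpu", "mem")
    )

print(is_equal_partition(A, n))  # True: each user holds half of each resource
```

Note that the per-server split need not be even; only the per-resource totals matter.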
Weak sharing incentive

- Definition 4 (Weak sharing incentive): Allocation A satisfies the weak
  sharing incentive property if there exists an equal partition A′ under
  which each user schedules no more tasks than under A, i.e.,
  N_i(A_i) ≥ N_i(A′_i), ∀i ∈ U
- DRFH is unanimously preferred to an equal allocation by all users


Comparison

  DRFH                      DRF (all-in-one model)
  ------------------------  ------------------------
  Pareto optimality         Pareto optimality
  Envy freeness             Envy freeness
  Truthfulness              Truthfulness
  Weak sharing incentive    Strong sharing incentive

- DRFH retains almost all the attractive properties of DRF


Trace-Driven Simulation

Task completion ratio w/ DRFH

Resource utilization

[Fig. 5: Time series of CPU and memory utilization, comparing Best-Fit
DRFH, First-Fit DRFH, and Slots scheduling.]
Completion time reduction

[Figure: reduction in job completion time with DRFH, by job size in
tasks: 62% (1–50), 43% (51–100), 25% (101–500), −1% (501–1000),
2% (>1000).]


Conclusions

- We have studied a multi-resource fair allocation problem in a
  heterogeneous cloud computing system
- We have generalized DRF to DRFH and shown that it possesses a set of
  highly attractive allocation properties
- We have designed an effective heuristic algorithm that implements DRFH
  in a real-world system

http://iqua.ece.toronto.edu/~weiwang/